Mod ndb formats
| mod_ndb |
|---|
Contents |
[edit] Output Formats in mod_ndb
An Output Format (or Result Format) is a method that mod_ndb uses to transform database result sets into HTTP response pages. Mod_ndb includes three built-in formats -- raw, JSON, and XML-- and also has the ability to use custom result formats that are defined in the Apache configuration file using a simple declarative programming language, described here.
[edit] Defining Output Formats
The following code describes a simple HTML table format, which can present result sets of any number and rows or columns as an HTML table.
<ResultFormat "HTML"> Format HTML = '$table$\n' Scan table = '<table> $row$ ...</table>' Row row = '<tr> $data$ ... </tr>\n' Record data = '<td>$value/x$</td>' </ResultFormat>
The definition of a format is contained within a <ResultFormat> container in httpd.conf, with the name of the format specified in the opening tag. The format name must be enclosed in qutoes. Each line of the definition describes one object. The definition lines may appear in any order. Definition lines have the syntax:
ObjectType ObjectName = Definition
The one exception to this is the object type "Record," which has the syntax:
ObjectType ObjectName = Definition [ or Definition ]
In this case, the first definition will be used for columns with non-null values, and the second for values that are NULL.
[edit] Object Types
There are four types on objects that may be defined in the format description.
- Format
- Every result format must contain a Format object, which, by convention, has the same name as the result format itself.
- Scan
- A Scan object functions as a loop over the rows in a result table. Scans must be defined for all queries that perform table scans or ordered index scans; they are not necessary for one-row lookups that use a primary key or unique index.
- Row
- A Row object functions as a loop over the columns of an individual result row.
- Record
- A Record object specifies how to format an individual result column, including how to encode characters and quote values. It optionally specifies a second encoding specifically for NULL values.
[edit] Object Names and References
Every object in a format definition must be named. An object name must start with a letter, and may not contain a dollar sign ($). An object is referenced from within another object by enclosing its name in dollar signs. For example, in the built-in JSON format, the row object is defined as ' { $item$ , ... }', where $item$ is a reference to an object named item.
[edit] Syntax of Object Definitions
Object definitions are parsed according to the following rules:
- Object definitions are contained within single quotes.
- Anything surrounded by dollar signs, e.g. $row$, is treated as an object reference.
- The ellipses, i.e. ... , may appear in Scan and Row definitions, where it is treated as a looping symbol.
- The sequence \n represents a line break.
- The sequence \' represents a single quote character.
- An identifier enclosed in dollar signs, such as $row$, is treated as an object reference or a column reference.
- Everything else in a format, including white space, is interpreted as literal text.
[edit] Column References
Column references are the meat of a result format, can take three basic forms:
- column ordinal
- $1$, $2$, etc. indicate the first, second, and successive columns within a row. Ordinal references can be used inside both Row objects and Record objects. In the output, the reference is replaced by the value of the associated column.
- $name$
- $name$ can be used only inside a Record object, which is always invoked in the context of a particular column. In the output, the reference is replaced by the name of the column.
- $value$
- $value$ can be used only inside a Record object, which is always invoked in the context of a particular column. In the output, the reference is replaced by the value of the column.
Column modifiers specify how particular columns should be encoded and quoted. They appear after the after the ordinal or name of a column reference, separated by a slash, and before the terminal dollar sign (e.g. $name/qx$). The following modifiers are available:
- x
- encode the value for XML. The characters <, >, &, and " are represented as <, >, &, and ", respectively.
- j
- encode the value for JSON. The ", \, /, backspace, formfeed, newline, carriage return, and tab characters are represented as \", \\, \/, \b, \f, \n, \r, and \t, respectively.
- Q
- enclose the value in quotes.
- q
- enclose the value in quotes if it is from a column of a string type (type CHAR, VARCHAR, DATE, TIME, DATETIME, or TEXT), but not if it is numeric.
Modifiers play a crucial role in the ensuring the correctness and security of mod_ndb applications, and should be used whenever possible. Neglecting to use appropriate encoding in an output format can lead to cross-site scripting vulnerabilities and other flaws. If your application requires a modifier or encoding that is not available, please send a note to the mod_ndb mailing list.
Examples of column references:
- $1/Qx$
- value of the first result column, quoted, and encoded for XML or HTML.
- $value/qj$
- value of the current column (inside a Record object), encoded for JSON, and quoted only if the column type is a string type.
- $name/Q$
- name of the current column (inside a Record object), quoted.
[edit] Object Specifications
[edit] Format
The definition of a Format may contain leading text, a reference to a core object, and trailing text.
When a Format object is invoked, the leading text is copied literally at the beginning of a result page; the core object is invoked to create the body of the page; and the trailing text is copied literally onto the end of the result page. Example:
Format JSON = '$scan$\n'
The core object in a Format may reference either a Row or a Scan. If the core is a Scan object, the format can be used for both single-row lookups and multi-row scans. If the core is a Row object, however, the format is usable only for single-row lookup queries using either a primary key or a unique index.
Note: Usually, a top-level Format object should include a terminal line feed in its trailing text, so that every response page ends with a line break.
[edit] Scan
The definition of a Scan object may contain optional leading text, an optional reference to a Row object, optional separator text terminated by an ellipses looping symbol, and finally optional trailing text.
When a Scan object is invoked, the leading text is copied literally into the output before any rows. The core Row object is invoked once for each result row. The separator text is copied literally into the output before each row other than the first row. Finally, the trailing text is copied literally into the output at the end of the table.
In one-row lookups (using a primary key or ordered index), the one result row is available to every object in the format, and no Scan object is required. In queries that involve a scan of many rows, however -- either a full-table scan or an ordered index scan -- the Scan object makes the result rows available to the objects it invokes. Objects referenced in the scan can access actual result rows, but the top level Format and Scan cannot access them.
In the internal JSON format, each table is presented as a JSON array, which begins with a bracket "[", contains members separated by commas, and ends with a closing bracket "]". The default JSON format adds line breaks for improved readability.
Scan scan = '[\n $row$,\n ... \n]'
[edit] Row
A Row object is a looping object, much like a Scan, though while a Scan describes a loop over the rows of a table, a Row describes a loop over the columns of an individual row.
The definition of a Row object may contain leading material, an optional reference to a Record object, and optional plain-text separator terminated by an ellipses looping symbol, and optional trailing material.
When a Row object is invoked, the leading material is copied into the output. The leading material may contain a combination of plain text and ordinal column references. The core Record object is invoked once for each column in the row. The separator text is copied literally into the output before each column, other than the first. The trailing material, which may contain a combination of plain text and ordinal column references, is copied into the end of the row.
In the internal JSON format, each row is presented as JSON object (as described in the JSON spec at [1]). A JSON object begins with a curly brace "{", contains name/value pairs, and ends with a closing brace "}". The name/value pairs are filled in by the inner Record object, item.
Row row = ' { $item$ , ... }'
[edit] Record
A Record object is invoked with the context of a single value within a result set. Unlike the other objects, a Record may contain two definitions; the first definition is used for non-null values, and the second for NULLs.
Each definition may contain a combination of plain text and column references. It may not contain references to other kinds of object.
The Record object in the standard JSON format is:
Record item = '$name/Q$:$value/qj$' or '$name/Q$:null'
Non-null values are represented by the column name (always quoted), then a colon, and finally the column value (encoded properly for JSON, and quoted if non-numeric). NULL values are represented by the quoted column name followed by a colon and the word null.
[edit] Examples
[edit] Hyperlink
Here is a simple format for a one-row lookup that formats two result columns as an HTML hyperlink. The first column contains the link URI, and the second is the link text.
<ResultFormat "href">
Format href = ' $row$ \n'
Row row = '<a href=$1/Qx$>$2/x$</a>'
</ResultFormat>
Both the link href ($1/Qx$) and the link text ($2/x$) are encoded for XML output. (If the output was not encoded, a malicious user might be able to insert Javascript code into the database, and cause that code to run inside the browsers of users who retrieved the data and followed the hyperlink).
[edit] Bulleted HTML
The commentlist format (used in the Roman Emperors demo application) takes a scan over a single VARCHAR column and presents it as a bulleted list in HTML:
<ResultFormat "commentlist">
Format commentlist = '$scan$'
Scan scan = ' $row$ ... '
Row row = ' • $1/x$ <br />\n'
</ResultFormat>
[edit] Trivial Format
Using the "fixed" result format here, any successful query will return a 200 OK response with a response body that simply says "Query successful."
<ResultFormat "fixed">
Format fixed = 'Query successful.\n'
</ResultFormat>