Rococo S-Expression is a standard based on Lisp S-Expressions. It was first used as the syntax of Rococo's Sexy script language.
A Rococo S-Expression is an ASCII string whose recommended codes are 32-127 plus line feed (code 10), return (code 13) and horizontal tab (code 9). The other codes in the range 1-31 are permitted, and ANSI characters 128-255 are also permitted, but both should be avoided. Generally human programmers should not use characters outside the basic ASCII range, but lexical analyzers should not report them as errors. S-Expressions are limited to 2^31-1 bytes, 1 byte short of 2 gigabytes. This allows a 32-bit signed integer to hold the length of any legitimate S-Expression. It also bounds the length of files: file editors can assume that any file of 2 GB or more should not be parsed as an S-Expression. Encoded UTF-8 sequences are also permitted, but were not given explicit support by the Rococo APIs at the time this document was written. Generally the only illegal character code in any context is the null character (code 0). An in-memory S-file should terminate the text block with a null character so that C-style APIs can interpret the S-file as a C-style string.
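The character and length rules above can be sketched as a small validation routine. This is only an illustration, not part of the Rococo APIs: the names `SCheck`, `kMaxSExpressionBytes` and `checkSExpressionText` are hypothetical. It rejects text that breaks a hard rule (length or a null character) and merely counts the permitted-but-discouraged bytes.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical result type for this sketch; not a Rococo type.
enum class SCheck { Ok, TooLong, ContainsNull };

// Largest permitted S-Expression length in bytes: 2^31 - 1.
constexpr std::size_t kMaxSExpressionBytes = 2147483647u;

// Applies the two hard rules (length limit, no null character).
// Bytes that are legal but discouraged (control codes other than the
// blank-space characters, and codes 128-255) are counted, never rejected.
SCheck checkSExpressionText(std::string_view text, std::size_t* discouraged = nullptr)
{
    if (text.size() > kMaxSExpressionBytes) return SCheck::TooLong;

    std::size_t count = 0;
    for (unsigned char c : text)
    {
        if (c == 0) return SCheck::ContainsNull;
        bool blank = (c == ' ' || c == '\n' || c == '\r' || c == '\t');
        if (!blank && (c < 32 || c > 127)) ++count;
    }

    if (discouraged) *discouraged = count;
    return SCheck::Ok;
}
```

A caller such as an editor could use the discouraged count to show a warning while still accepting the file.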
Space (code 32), line feed (code 10), return (code 13) and horizontal tab (code 9) define the complete set of blank-space characters. They are used to separate tokens. Parenthesis characters '(' and ')' are used to demarcate the beginning and end of compound expressions. There is a single quotation character - the double quote: ".
Comments follow C++ rules. Sexy Script was designed to script C++ programs, so it inherits some C++ characteristics to make it easier for C++ programmers to work with both.
The comment character '/' can be used to start either a line-comment or a block-comment.
If doubled, as in '//', then it begins a line-comment, and everything that follows is skipped by lexical analysis; analysis continues once a line feed or return character is hit.
If the comment character is followed by an asterisk, as in '/*', then a block-comment begins. Lexical analysis skips characters until the sequence '*/' is read. Blank-space characters do not terminate a block-comment, so block-comments are generally used either for a paragraph of comment text, or to mark a comment within a line.
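The comment rules can be sketched as a scanner helper that skips blank-space and comments. The function names are hypothetical and this is not the sexy.s-parser implementation. It also assumes that an unterminated block-comment runs to the end of the text, which this document does not specify.

```cpp
#include <cstddef>
#include <string_view>

// True for the four blank-space characters: space, line feed, return, tab.
inline bool isBlank(char c)
{
    return c == ' ' || c == '\n' || c == '\r' || c == '\t';
}

// Advances past any run of blank-space, line-comments and block-comments
// starting at pos. Returns the index of the next significant character,
// or text.size() if there is none.
std::size_t skipBlankAndComments(std::string_view text, std::size_t pos)
{
    while (pos < text.size())
    {
        char c = text[pos];
        bool hasNext = pos + 1 < text.size();

        if (isBlank(c))
        {
            ++pos;
        }
        else if (c == '/' && hasNext && text[pos + 1] == '/')
        {
            // Line-comment: skip until a line feed or return is hit.
            pos += 2;
            while (pos < text.size() && text[pos] != '\n' && text[pos] != '\r') ++pos;
        }
        else if (c == '/' && hasNext && text[pos + 1] == '*')
        {
            // Block-comment: skip until the sequence "*/" has been read.
            std::size_t close = text.find("*/", pos + 2);
            // Assumption: an unterminated block-comment consumes the rest of the text.
            pos = (close == std::string_view::npos) ? text.size() : close + 2;
        }
        else
        {
            break; // a significant character
        }
    }
    return pos;
}
```

Calling the helper before reading each token lets the rest of a lexer ignore comments entirely.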
A compound expression is defined recursively as a sequence of one or more tokens, separated by blank-space, with each token being an atomic expression, a null expression, a string literal expression or a compound expression. The tokens are termed the children of the compound expression, with counting beginning from zero and incrementing from left to right, so every compound expression has a child 0 as its left-most token. If a token is a compound expression, then it is demarcated with opening and closing parentheses.
(this is a (compound) expression with (8) children) /* A compound expression with eight children, two of which are compound expressions */
An atomic expression is a sequence of ASCII characters, at least one character in length, none of which is a blank-space, comment, parenthesis or quotation character.
(this-is-an-atomic-expression) // A compound expression with one child that is an atomic expression
An expression that has no children is like a compound expression with notionally no tokens, but it is not a legitimate compound expression. It is called the null expression. There is only one way to define a null expression: an opening and a closing parenthesis with at most blank-space between the two characters, like this: ( )
(This is a null expression: ( )) /* A compound expression with 6 children, the final one is the null expression */
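The definitions of atomic, null and compound expressions can be sketched as a small recursive parser. This is only an illustration with hypothetical names (`Kind`, `Node`, `parseChildren`), not the sexy.s-parser API, and it allocates per node rather than using a single block. To stay short it handles only blank-space, parentheses and atomic tokens. Comments and string literals are not handled, and throwing on unbalanced parentheses is an assumption of the sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

enum class Kind { Atomic, Null, Compound };

struct Node
{
    Kind kind;
    std::string text;            // used by Atomic nodes only
    std::vector<Node> children;  // used by Compound nodes only
};

inline bool isBlank(char c)
{
    return c == ' ' || c == '\n' || c == '\r' || c == '\t';
}

// Parses tokens until a closing parenthesis (nested == true) or the end of
// the text (nested == false). pos is advanced past everything consumed.
std::vector<Node> parseChildren(std::string_view s, std::size_t& pos, bool nested)
{
    std::vector<Node> out;
    while (true)
    {
        while (pos < s.size() && isBlank(s[pos])) ++pos;

        if (pos == s.size())
        {
            if (nested) throw std::runtime_error("missing ')'");
            return out;
        }

        char c = s[pos];
        if (c == ')')
        {
            if (!nested) throw std::runtime_error("unexpected ')'");
            ++pos;
            return out;
        }

        if (c == '(')
        {
            ++pos;
            std::vector<Node> kids = parseChildren(s, pos, true);
            // Parentheses with nothing but blank-space inside form the null expression.
            if (kids.empty()) out.push_back(Node{Kind::Null, {}, {}});
            else out.push_back(Node{Kind::Compound, {}, std::move(kids)});
        }
        else
        {
            // Atomic expression: runs until blank-space or a parenthesis.
            std::size_t start = pos;
            while (pos < s.size() && !isBlank(s[pos]) && s[pos] != '(' && s[pos] != ')') ++pos;
            out.push_back(Node{Kind::Atomic, std::string(s.substr(start, pos - start)), {}});
        }
    }
}
```

Children are stored in left-to-right order, so `children[0]` is child 0 as defined above.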
An expression that opens and closes with a double quote character is called a string literal. A lexical analyzer will treat it as one token, stripping the quote characters from the internal representation. In such a context the ampersand character '&' acts as the escape character. If the escape character is scanned then it is stripped from the internal representation, along with a number of trailing characters called the escape sequence, and substituted with the value of the escape sequence. The following escape sequences are defined:
| Escape Sequence | Decimal ASCII Value | Value |
|---|---|---|
| && | 38 | ampersand: & |
| &a | 7 | alert |
| &b | 8 | backspace |
| &e | 38 | ampersand: & |
| &f | 12 | form-feed |
| &r | 13 | carriage return |
| &n | 10 | line feed |
| &t | 9 | horizontal tab |
| &v | 11 | vertical tab |
| &' | 39 | single quote: ' |
| &" | 34 | double quote: " |
| &xAB | N/A | hexcode AB, where A and B are hexadecimal digits giving the character code |
("This is a string literal&n") // A compound expression with one child, a string literal.
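The escape table can be sketched as a decoder for the text between the quotes. The function names are hypothetical and the mapping simply mirrors the table as written. Two behaviours are assumptions of this sketch: unknown or truncated escape sequences throw, and hexadecimal digits are accepted in either case.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Value of one hexadecimal digit, or -1 if c is not a hexadecimal digit.
inline int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes the body of a string literal (the quotes already stripped),
// replacing each '&' escape sequence with its value.
std::string decodeStringLiteral(std::string_view body)
{
    std::string out;
    for (std::size_t i = 0; i < body.size(); ++i)
    {
        if (body[i] != '&') { out.push_back(body[i]); continue; }
        if (++i == body.size()) throw std::runtime_error("dangling escape character");

        switch (body[i])
        {
        case '&': case 'e': out.push_back('&'); break;
        case 'a': out.push_back('\a'); break;
        case 'b': out.push_back('\b'); break;
        case 'f': out.push_back('\f'); break;
        case 'r': out.push_back('\r'); break;
        case 'n': out.push_back('\n'); break;
        case 't': out.push_back('\t'); break;
        case 'v': out.push_back('\v'); break;
        case '\'': out.push_back('\''); break;
        case '"': out.push_back('"'); break;
        case 'x':
        {
            // &xAB: exactly two hexadecimal digits must follow.
            if (i + 2 >= body.size()) throw std::runtime_error("truncated hex escape");
            int hi = hexValue(body[i + 1]);
            int lo = hexValue(body[i + 2]);
            if (hi < 0 || lo < 0) throw std::runtime_error("bad hex digit in escape");
            out.push_back(static_cast<char>(hi * 16 + lo));
            i += 2;
            break;
        }
        default:
            throw std::runtime_error("unknown escape sequence");
        }
    }
    return out;
}
```

A lexer would first have to find the closing quote without stopping at an escaped `&"`, which this sketch leaves to the caller.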
In a plain-text file, parsed as a Rococo S-Expression file, the text forms the root expression. Root expressions have no parents, but may have children. For a file of zero length or constituted purely of blank-space - a blank file - the root expression is deemed a null expression. If non-blank-space characters are present then the root expression is a compound expression. Root expressions defined in text files do not have enclosing parentheses.
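The root rule can be sketched as a one-line classification. The names are hypothetical. The sketch follows the wording above literally, so a file holding only comments counts as non-blank here. Whether the real parser treats such a file as a null root is not stated in this document.

```cpp
#include <string_view>

// Hypothetical classification of a root expression for this sketch.
enum class RootKind { Null, Compound };

// A blank file (empty, or blank-space only) has a null root expression.
// Any other file has a compound root expression.
RootKind classifyRoot(std::string_view fileText)
{
    bool blankFile = fileText.find_first_not_of(" \t\r\n") == std::string_view::npos;
    return blankFile ? RootKind::Null : RootKind::Compound;
}
```

For example, a file containing `a b (c)` has a compound root with three children, even though the text carries no enclosing parentheses.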
The Rococo repository contains a project named sexy.s-parser. This is the foundational parser for the Sexy scripting language, and converts an S-Expression file into an S-Expression tree. The module is designed to minimize memory fragmentation and can convert arbitrarily large S-Expression files into expression trees with a single memory allocation - the S-block. This also has the advantage of keeping trees localized in memory, which may increase parsing speed by capitalizing on the cache behaviour of modern CPUs. The module is lightweight, so it is easily used in any project that benefits from S-Expression files.