Advantages of S-Expressions:
- Easily and universally parsable by computers.
- Easily parsable by humans.
Disadvantages of S-Expressions:
- Some people don't like all the parentheses.
- Prefix notation for code is not intuitive to most people. (However, it is possible to do infix notation as well.)
A list is: an opening parenthesis, then all the S-expressions it contains, terminated by a closing parenthesis.
An integer is: 0b[01]+ | 0[0-7]* | [1-9][0-9]* | 0[xX][0-9A-Fa-f]+
A string is: a double quote character, followed by characters and/or backslash-escaped characters, terminated by another double quote. A string need not, at the storage layer, consist of valid UTF-8, but it should for all the visible uses. A string may contain embedded NULs. (I'm thinking about a potential alternate storage mechanism for variables)
A token is: anything else, just like a string except not surrounded by quotes.
It should be obvious that integers and strings are merely specializations of a token, except for how much input is consumed.
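To make the integer rule concrete, here is a rough sketch of how a caller might decide whether a token is an integer under the grammar above. The name parse_integer is made up for illustration, and rejecting values that don't fit in a signed 64-bit integer is my assumption, not something the grammar says.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical helper: returns the value if `tok` matches
//   0b[01]+ | 0[0-7]* | [1-9][0-9]* | 0[xX][0-9A-Fa-f]+
// and std::nullopt otherwise (the caller then treats it as a plain token).
std::optional<int64_t> parse_integer(std::string_view tok)
{
    if (tok.empty())
        return std::nullopt;

    int base = 10;
    std::string_view digits = tok;
    if (tok[0] == '0')
    {
        if (tok.size() >= 2 && tok[1] == 'b')
        {
            base = 2;
            digits = tok.substr(2);
        }
        else if (tok.size() >= 2 && (tok[1] == 'x' || tok[1] == 'X'))
        {
            base = 16;
            digits = tok.substr(2);
        }
        else
        {
            base = 8;
            digits = tok.substr(1); // a lone "0" leaves no digits: value 0
        }
        if (base != 8 && digits.empty())
            return std::nullopt; // "0b" or "0x" with nothing after it
    }

    // Assumption: anything that does not fit in int64_t is rejected.
    const uint64_t max = static_cast<uint64_t>(std::numeric_limits<int64_t>::max());
    const uint64_t ubase = static_cast<uint64_t>(base);
    uint64_t value = 0;
    for (char c : digits)
    {
        int d;
        if (c >= '0' && c <= '9')
            d = c - '0';
        else if (c >= 'a' && c <= 'f')
            d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F')
            d = c - 'A' + 10;
        else
            return std::nullopt;
        if (d >= base)
            return std::nullopt; // e.g. '8' in an octal literal
        const uint64_t ud = static_cast<uint64_t>(d);
        if (value > (max - ud) / ubase)
            return std::nullopt; // would overflow
        value = value * ubase + ud;
    }
    return static_cast<int64_t>(value);
}
```

With this, "0x1F" and "017" come back as integers, while "3g" and "08" fall through to being ordinary tokens.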
There shall be extra-syntactical comments. I recommend # at least; // and /* */ could be nice.
Some callers might treat a string and a token identically - e.g. atcommands where the GM may or may not quote the character's name if it contains no whitespace or parentheses, or when defining (function foo body...) vs (function "foo" ...)
Ambiguous behavior:
- lack of whitespace between S-Expressions, e.g. ()()"abc"3g
- whitespace in a token, e.g. \ \t\x20
Errors:
- no closing parenthesis before eof
- no closing quote before eof
- unexpected closing parenthesis
- invalid backslash-escape in string or token
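As a sketch of how those four errors could be detected, here is a small scanner that only checks structure and builds nothing. The enum ReadError, the function names, and especially the set of accepted escapes (\\, \", \n, \t, backslash-space, and \xHH) are assumptions for illustration; the proposal does not pin down which escapes are valid.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical error codes, one per error listed above.
enum class ReadError { Ok, UnclosedList, UnclosedString, UnexpectedClose, BadEscape };

// `in[i]` is a backslash. On a recognised escape, advance `i` past it.
// The accepted set is an assumption, not part of the proposal.
static bool skip_escape(std::string_view in, std::size_t &i)
{
    if (i + 1 >= in.size())
        return false;
    const char c = in[i + 1];
    if (c == 'x')
    {
        if (i + 3 >= in.size()
            || !std::isxdigit(static_cast<unsigned char>(in[i + 2]))
            || !std::isxdigit(static_cast<unsigned char>(in[i + 3])))
            return false;
        i += 4;
        return true;
    }
    if (c == '\\' || c == '"' || c == 'n' || c == 't' || c == ' ')
    {
        i += 2;
        return true;
    }
    return false;
}

// Walks the whole input (pointer/length, so embedded NULs are fine)
// and reports the first structural error found.
ReadError scan(std::string_view in)
{
    std::size_t depth = 0;
    std::size_t i = 0;
    while (i < in.size())
    {
        const char c = in[i];
        if (c == '(')
        {
            ++depth;
            ++i;
        }
        else if (c == ')')
        {
            if (depth == 0)
                return ReadError::UnexpectedClose;
            --depth;
            ++i;
        }
        else if (c == '"')
        {
            ++i;
            for (;;)
            {
                if (i >= in.size())
                    return ReadError::UnclosedString;
                if (in[i] == '"')
                {
                    ++i;
                    break;
                }
                if (in[i] == '\\')
                {
                    if (!skip_escape(in, i))
                        return ReadError::BadEscape;
                }
                else
                    ++i;
            }
        }
        else if (c == '\\')
        {
            // Escape inside a bare token.
            if (!skip_escape(in, i))
                return ReadError::BadEscape;
        }
        else
            ++i;
    }
    return depth == 0 ? ReadError::Ok : ReadError::UnclosedList;
}
```

Comments and the two ambiguous cases are deliberately not handled here; a real reader would have to pick a behavior for them.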
- to load conf files
- required by all servers.
- it should be possible to run a server without any configuration files at all, except that the hard-coded defaults are currently for eathena, not tmwAthena (e.g. port 6900 instead of 6901) - this is on my list to fix.
- There are many options that have been deleted or changed over the course of my rewrite, so the conf files will need to be updated anyways.
- The parser won't have to handle imports of another file, whoever takes the S-expression will do that.
- to store accounts, characters, etc.
- handled by the login-server and char-server, which are isolated from "content"
- requires dynamic conversion (probably as a helper program, although it could be integrated)
- to store the mob and item dbs
- handled by the map server
- requires one-time conversion of existing content, which shouldn't be too hard.
- Does not require the server to write it out again
- this includes {script}, which would be changed to "script" or "{script}" as a temporary measure until s-expressions are implemented for scripts as well. (I'm not sure whether the script-runner requires the }).
- to reimplement scripts and magic
- map server
- This is the place where the existing code is the worst.
- This will require many changes to content, which I cannot do on my own
- This will require many changes to code (e.g. all builtin functions have to be rewritten)
- Some people aren't used to prefix notation.
- This will require voodoo if we are to support syntactical (rather than programmatic) array indexing. @array[@elt] vs (getarrayelt @array @elt)
- This would make loops and block if-then-else really easy
- This will allow scripts to be simplified and parsed at load time, rather than at execution time
- this is probably the worst-performing part of the current server
- this will allow constant evaluation and type checking
- optimize by changing item/mob ID/name into pointers to the internal data structures
- internal storage (which may or may not itself be stored as an S-expression with a pointer-to-other): expression-to-be-evaluated, next-if-true, next-if-false
- optimize gotos into that.
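To illustrate the expression / next-if-true / next-if-false idea, here is a toy version. Everything in it (Node, kEnd, run, count_to, and using std::function for the expression) is a stand-in of my own, not the planned internal representation. The point is only that loops, if-then-else, and gotos all collapse into "evaluate, then jump to one of two successors".

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical node: one expression plus two successor indices.
struct Node
{
    std::function<bool()> expr;  // expression-to-be-evaluated
    std::size_t next_if_true;    // where to go if it evaluates true
    std::size_t next_if_false;   // where to go otherwise
};

// Sentinel successor meaning "script finished".
constexpr std::size_t kEnd = static_cast<std::size_t>(-1);

// Runs from `start` until a node jumps to kEnd; returns nodes executed.
std::size_t run(const std::vector<Node> &prog, std::size_t start = 0)
{
    std::size_t steps = 0;
    for (std::size_t pc = start; pc != kEnd; ++steps)
    {
        const Node &n = prog.at(pc);
        pc = n.expr() ? n.next_if_true : n.next_if_false;
    }
    return steps;
}

// Usage sketch: "while (i < limit) ++i;" as two nodes.
int count_to(int limit)
{
    int i = 0;
    std::vector<Node> prog = {
        { [&] { return i < limit; }, 1, kEnd }, // node 0: loop test
        { [&] { ++i; return true; }, 0, 0 },    // node 1: body, back to the test
    };
    run(prog);
    return i;
}
```

Since successors are resolved to indices at load time, a goto costs nothing at execution time; it is just another successor.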
- What is currently "one line" in config/save/db files becomes "one top-level S-Expression". For generated S-Expressions (save files), this would likely still be one line, but it may be nice to wrap at known points for item/mob dbs.
- Every top level S-Expression should be a list, the first element of which should be the type, even in files that only contain one type of object.
- I recommend the first row of each object (not in conf files, just save and db files) be a header row to ensure forward and backward compatibility. Allowing multiple header rows would allow concatenation even across versions. There would be a helper function to read files like this, e.g. get_object_field("name").
- Bitfields in items/mobs should be transformed into a list of traits, e.g. (aggressive undead fire boss)
- Every list that is currently spread across multiple fields (e.g. mob drops) shall be stored as a list of individual whatevers (in this case, each element would be a list containing two elements, item ID and droprate)
- save files should write a magic token - or perhaps, an empty list - at the end of file to indicate clean saving. (Or perhaps, just surround the whole file in a set of parentheses? but that approaches "superfluous") This, plus a decent backup policy and set of ladmin commands, is everything that SQL support would give us, but much simpler (less computational overhead, easier to set up, less prone to errors). (If it's all on one hard drive and the whole disk dies, you still lose everything)
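Two of the points above are easy to sketch: turning a bitfield into a list of traits, and checking for the clean-save marker. The bit values, the names traits_to_sexpr and saved_cleanly, and the choice of an empty list on the last non-blank line as the marker are all assumptions for illustration; the real flag values live in the existing mob db code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical bit assignments, for illustration only.
struct Trait
{
    uint32_t bit;
    const char *name;
};
constexpr Trait kTraits[] = {
    {0x01, "aggressive"},
    {0x02, "undead"},
    {0x04, "fire"},
    {0x08, "boss"},
};

// Renders a bitfield as a list of trait tokens, e.g. "(undead boss)".
std::string traits_to_sexpr(uint32_t bits)
{
    std::string out = "(";
    for (const Trait &t : kTraits)
    {
        if (bits & t.bit)
        {
            if (out.size() > 1)
                out += ' ';
            out += t.name;
        }
    }
    out += ')';
    return out;
}

// Assumes the writer ends a clean save with "()" on a line of its own.
// Returns true if the last non-blank line of `text` is exactly that.
bool saved_cleanly(std::string_view text)
{
    const std::size_t end = text.find_last_not_of(" \t\r\n");
    if (end == std::string_view::npos)
        return false; // empty or all-whitespace file
    text = text.substr(0, end + 1);
    const std::size_t nl = text.find_last_of('\n');
    std::string_view last = (nl == std::string_view::npos) ? text : text.substr(nl + 1);
    // `last` ends in a non-whitespace character, so this cannot fail.
    last = last.substr(last.find_first_not_of(" \t\r"));
    return last == "()";
}
```

This marker check is only a cheap textual test. A save that was cut off would normally fail it, but the reader would still have to parse the file to be sure.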
I'm not working on this yet.
The parser should be relatively easy to code, and completely independent from the rest of my work on tmwAthena, so someone else who wants to get involved could do it.
- class SExpr should have a default constructor that initializes it to the empty list. This should be stored as all zero bytes.
- I'm not sure whether C++ copy-construction or assignment should be allowed
- It shall have a clear() method that frees previous data and makes it into the empty list
- It shall provide setters for string/token from a C and/or C++ string, for the integer from a signed C/C++ integer, and for the list from a pointer/length pair of sub-expressions
- It should have a destructor that calls clear()
- It must allow the addition of custom pointer types (which will be ignored in the destructor)
- It shall provide begin() and end() methods (for use with STL), and a safe subscript operator (returning a reference to a static, immutable SExpr of the empty list if passed an out-of-bounds index) for the case where it is a list (if it is not a list, should it throw an exception?)
- It shall provide a string conversion function that returns the empty string if type is not string or token.
- It shall provide an integer conversion operator, that returns 0 if type is not integer.
- The reader function should take a reference to an S-expression and return an error/warning code. It is the responsibility of the caller to (typically) kill the server when an error is returned.
- There should also be a reader that reads from memory instead of from a stream (It would be nice if this was pointer/length rather than nul-terminated pointer, to avoid the allocation and copying in the use-cases where we don't get a nul-terminated string).
- There should be a helper function for reading an entire file as a list of S expressions (note that they are not actually surrounded by one great set of parentheses in the file)
- There shall be no reference to any tmwAthena-specific types; I'm planning on making a new source directory, src/lib/, for such things.
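To show how the accessor rules in the list above fit together, here is a simplified sketch. It is not the proposed layout: it uses std::string and std::vector instead of raw pointers, so it does not satisfy the "all zero bytes" requirement, and it leaves out custom pointer types. The class name SExprSketch and all method names are made up. For the open question about indexing a non-list, this sketch picks the non-throwing option and returns the empty list.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

class SExprSketch
{
public:
    enum class Type : uint8_t { List, Integer, String, Token };

    SExprSketch() = default; // starts out as the empty list

    // Frees previous data and becomes the empty list again.
    void clear()
    {
        type_ = Type::List;
        ival_ = 0;
        text_.clear();
        elts_.clear();
    }

    void set_integer(int64_t v) { clear(); type_ = Type::Integer; ival_ = v; }
    void set_string(std::string s) { clear(); type_ = Type::String; text_ = std::move(s); }
    void set_token(std::string s) { clear(); type_ = Type::Token; text_ = std::move(s); }

    // List setter from pointer/length; copies first so `sub` may alias our own elements.
    void set_list(const SExprSketch *sub, std::size_t n)
    {
        std::vector<SExprSketch> tmp(sub, sub + n);
        clear();
        elts_ = std::move(tmp);
    }

    Type type() const { return type_; }
    std::size_t size() const { return elts_.size(); }
    auto begin() const { return elts_.begin(); }
    auto end() const { return elts_.end(); }

    // Safe subscript: out-of-bounds (or not a list) yields a shared empty list.
    const SExprSketch &operator[](std::size_t i) const
    {
        static const SExprSketch empty{};
        return i < elts_.size() ? elts_[i] : empty;
    }

    // Empty string unless this is a string or token.
    std::string str() const
    {
        return (type_ == Type::String || type_ == Type::Token) ? text_ : std::string();
    }

    // 0 unless this is an integer.
    explicit operator int64_t() const { return type_ == Type::Integer ? ival_ : 0; }

private:
    Type type_ = Type::List;
    int64_t ival_ = 0;
    std::string text_;
    std::vector<SExprSketch> elts_;
};
```

The upshot is that a caller can write something like expr[1].str() on malformed input and get an empty string back instead of a crash, which is what makes the "treat string and token identically" use case cheap.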
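#include <cstdint>

class SExpr
{
    // Length (for a string, token, or list) or the integer value.
    union
    {
        uint64_t len;
        int64_t ival;
    };
    // Payload pointer, interpreted according to `type`.
    union
    {
        char *string_or_token;
        SExpr *list_elts;
        void *anything_else;
    };
    // Discriminator for the two unions above.
    uint8_t type;
};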
Note that this integer size need not be the same size used internally by the scripting engine; it is only relevant for constants.
However, it would be nice if constants could cover the whole range of integers used by the scripting engine. Sometimes we have to deal with millisecond-precision times, and 49 days is really not an acceptable wrap time (the code has to jump through many hoops to avoid problems), whereas 500 million years is not my problem.
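For anyone who wants to check those wrap times, here is the arithmetic, assuming they refer to a millisecond counter stored in 32 versus 64 bits (that reading is my assumption). The function names are made up.

```cpp
#include <cmath>

// How long a millisecond counter with `bits` usable bits runs before wrapping.
double wrap_days(int bits)
{
    const double ms_per_day = 1000.0 * 60 * 60 * 24;
    return std::ldexp(1.0, bits) / ms_per_day; // 2^bits milliseconds, in days
}

double wrap_years(int bits)
{
    return wrap_days(bits) / 365.25;
}
```

wrap_days(32) comes out at about 49.7 days. wrap_years(64) is about 584 million years for an unsigned counter, and wrap_years(63) is about 292 million years for the positive range of a signed int64_t. Either way it is comfortably someone else's problem.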
I wonder how much I could have gotten done if I had been writing code instead of writing this post. But, since this proposed change would require significant attention from the content team and from the server admins, it's worth it. Besides, it's relaxing to take a break from the code.
I expect to release at the least a regression-hunt version before this conversion begins, however. One possibility is not to do the map-server specific changes in the current code base, but instead create a copy of the map server that did it, being protocol-compatible with char-server and client but not content-compatible with its twin.
Now, who spots the stealth pun in this post? I can guarantee that whoever implements the parser will get it.