tmwAthena data storage proposal

o11c
Knight
Posts: 2262
Joined: 20 Feb 2011, 22:09
Location: ^ ^

Post by o11c » 13 May 2011, 01:26

Currently, every place that reads a file uses its own parser. This is bad design in general, and tmwA implements it badly as well (e.g. by requiring tabs). I propose a new storage system based on S-Expressions, to be used for all inputs that need reading: saved data, mob/item dbs, constants, scripts, magic, and, to some degree, the arguments to @atcommands and #magic (among other things, to make their handling of quotes universal).

Advantages of S-Expressions:
  • Easily and universally parsable by computers.
  • Easily parsable by humans.
Disadvantages of S-Expressions:
  • Some people don't like all the parentheses.
  • Prefix notation for code is not intuitive to most people. (However, it is possible to do infix notation as well)
My definition of an S-expression contains four (public) types: lists of S-expressions, integers, strings, and tokens. Lists should be implemented as arrays rather than linked lists, because S-expressions should not be modified after being read. Other types may be used internally, e.g. during optimization of scripts.
A list is: an opening parenthesis, then all the S-expressions it contains, terminated by closing parenthesis.
An integer is: 0b[01]+ | 0[0-7]* | [1-9][0-9]* | 0[xX][0-9A-Fa-f]+
A string is: a double quote character, followed by characters and/or backslash-escaped characters, terminated by another double quote. A string need not, at the storage layer, consist of valid UTF-8, but it should for all the visible uses. A string may contain embedded NULs. (I'm thinking about a potential alternate storage mechanism for variables)
A token is: anything else, just like a string except not surrounded by quotes.
It should be obvious that integers and strings are merely specializations of a token, except for how much input is consumed.
There shall be extra-syntactical comments. I recommend # at least; // and /* */ could be nice.
Some callers might treat a string and a token identically - e.g. atcommands where the GM may or may not quote the character's name if it contains no whitespace or parentheses, or when defining (function foo body...) vs (function "foo" ...)

Ambiguous behavior:
lack of whitespace between S-Expressions, e.g. ()()"abc"3g
whitespace in a token, e.g. \ \t\x20

Errors:
  • no closing parenthesis before eof
  • no closing quote before eof
  • unexpected closing parenthesis
  • invalid backslash-escape in string or token
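
As a rough illustration of the grammar and error cases above, here is a minimal reader sketch in C++17. Everything in it (`Node`, `Kind`, `Err`, `Parser`) is a hypothetical name made up for this example, not existing tmwAthena code. It makes several simplifying assumptions: only `#` comments are handled, only the escapes `\n`, `\t`, `\"` and `\\` are accepted in strings, backslash escapes in tokens are not handled at all, and the "ambiguous" case of adjacent expressions without whitespace is resolved by ending a token at any parenthesis or quote.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// All names here are hypothetical; this is not tmwAthena code.
enum class Kind { List, Int, String, Token };
enum class Err { Ok, Eof, UnclosedList, UnclosedString, UnexpectedClose, BadEscape };

struct Node {
    Kind kind = Kind::List;
    std::int64_t ival = 0;   // meaningful only when kind == Int
    std::string text;        // payload when kind == String or Token
    std::vector<Node> elts;  // children when kind == List
};

class Parser {
public:
    explicit Parser(std::string_view in) : in_(in) {}
    // Reads the next S-expression into `out`; Err::Eof means clean end of input.
    Err read(Node &out) {
        skip_blank();
        if (pos_ >= in_.size()) return Err::Eof;
        char c = in_[pos_];
        if (c == ')') return Err::UnexpectedClose;
        if (c == '(') { ++pos_; return read_list(out); }
        if (c == '"') { ++pos_; return read_string(out); }
        return read_token(out);
    }
private:
    static bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    // Skips whitespace and '#' comments (the only comment style handled here).
    void skip_blank() {
        while (pos_ < in_.size()) {
            if (is_space(in_[pos_])) ++pos_;
            else if (in_[pos_] == '#') { while (pos_ < in_.size() && in_[pos_] != '\n') ++pos_; }
            else break;
        }
    }
    Err read_list(Node &out) {
        out = Node{};
        for (;;) {
            skip_blank();
            if (pos_ >= in_.size()) return Err::UnclosedList;  // no ')' before eof
            if (in_[pos_] == ')') { ++pos_; return Err::Ok; }
            Node child;
            if (Err e = read(child); e != Err::Ok) return e;
            out.elts.push_back(std::move(child));
        }
    }
    Err read_string(Node &out) {
        out = Node{};
        out.kind = Kind::String;
        while (pos_ < in_.size()) {
            char c = in_[pos_++];
            if (c == '"') return Err::Ok;
            if (c == '\\') {
                if (pos_ >= in_.size()) break;
                char e = in_[pos_++];
                if (e != 'n' && e != 't' && e != '"' && e != '\\') return Err::BadEscape;
                c = e == 'n' ? '\n' : e == 't' ? '\t' : e;
            }
            out.text.push_back(c);
        }
        return Err::UnclosedString;  // no closing quote before eof
    }
    // A token runs until whitespace, a parenthesis, or a quote.
    Err read_token(Node &out) {
        out = Node{};
        while (pos_ < in_.size() && !is_space(in_[pos_]) && in_[pos_] != '(' &&
               in_[pos_] != ')' && in_[pos_] != '"')
            out.text.push_back(in_[pos_++]);
        out.kind = parse_int(out.text, out.ival) ? Kind::Int : Kind::Token;
        return Err::Ok;
    }
    // Accepts 0b[01]+ | 0[0-7]* | [1-9][0-9]* | 0[xX][0-9A-Fa-f]+; overflow is not checked.
    static bool parse_int(const std::string &s, std::int64_t &v) {
        int base = 10;
        std::size_t i = 0;
        if (s.size() > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) { base = 16; i = 2; }
        else if (s.size() > 2 && s[0] == '0' && s[1] == 'b') { base = 2; i = 2; }
        else if (s[0] == '0') base = 8;
        else if (s[0] < '1' || s[0] > '9') return false;
        for (v = 0; i < s.size(); ++i) {
            unsigned char c = static_cast<unsigned char>(s[i]);
            int d = std::isdigit(c) ? c - '0' : std::isalpha(c) ? std::tolower(c) - 'a' + 10 : 99;
            if (d >= base) return false;
            v = v * base + d;
        }
        return true;
    }
    std::string_view in_;
    std::size_t pos_ = 0;
};
```

A caller would construct `Parser p(text)` and call `p.read(node)` in a loop until it returns `Err::Eof`, treating any other non-`Ok` code as one of the errors listed above. Because the parser only holds a `std::string_view`, the input buffer has to outlive it.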
Where tmwAthena will use S-Expressions:
  • to load conf files
    • required by all servers.
    • it should be possible to run a server without any configuration files at all, except that the hard-coded defaults are currently for eathena, not tmwAthena (e.g. port 6900 instead of 6901) - this is on my list to fix.
    • There are many options that have been deleted or changed over the course of my rewrite, so the conf files will need to be updated anyway.
    • The parser won't have to handle imports of another file, whoever takes the S-expression will do that.
  • to store accounts, characters, etc.
    • handled by the login-server and char-server, which are isolated from "content"
    • requires dynamic conversion (probably as a helper program, although it could be integrated)
  • to store the mob and item dbs
    • handled by the map server
    • requires one-time conversion of existing content, which shouldn't be too hard.
    • Does not require the server to write it out again
    • this includes {script}, which would be changed to "script" or "{script}" as a temporary measure until s-expressions are implemented for scripts as well. (I'm not sure whether the script-runner requires the }).
  • to reimplement scripts and magic
    • map server
    • This is the place where the existing code is the worst.
    • This will require many changes to content, which I cannot do on my own
    • This will require many changes to code (e.g. all builtin functions have to be rewritten)
    • Some people aren't used to prefix notation.
    • This will require voodoo if we are to support syntactical (rather than programmatic) array indexing. @array[@elt] vs (getarrayelt @array @elt)
    • This would make loops and block if-then-else really easy
    • This will allow scripts to be simplified and parsed at load time, rather than at execution time
      • this is probably the worst-performing part of the current server
      • this will allow constant evaluation and type checking
      • optimize by changing item/mob ID/name into pointers to the internal data structures
      • internal storage (which may or may not be stored as an S-expression with a pointer-to-other): expression-to-be-evaluated, next-if-true, next-if-false
      • optimize gotos into that.
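
To make the "expression-to-be-evaluated, next-if-true, next-if-false" idea a bit more concrete, here is a small C++ sketch. It is only one possible reading of that bullet: `Step`, `run`, `kEnd` and `make_counting_loop` are invented names, variables are reduced to a plain vector of integers, and `std::function` stands in for whatever the real expression representation would be. The point is that a loop's goto disappears at load time and becomes an index stored in a step.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical compiled form of a script: each step holds an expression to
// evaluate plus the index of the step to run next for a true or false result.
struct Step {
    std::function<long(std::vector<long> &)> expr;  // expression-to-be-evaluated
    std::size_t next_if_true;
    std::size_t next_if_false;
};

// Sentinel index meaning "the script ends here".
constexpr std::size_t kEnd = static_cast<std::size_t>(-1);

// Runs a non-empty compiled script; returns the number of steps executed.
inline std::size_t run(const std::vector<Step> &code, std::vector<long> &vars) {
    std::size_t pc = 0, executed = 0;
    while (pc != kEnd) {
        const Step &s = code[pc];
        pc = s.expr(vars) ? s.next_if_true : s.next_if_false;
        ++executed;
    }
    return executed;
}

// Example: "while (vars[0] < 3) vars[0] += 1;" with the jump back to the
// condition already resolved into a step index.
inline std::vector<Step> make_counting_loop() {
    return {
        // step 0: the condition; true goes to the body, false ends the script
        {[](std::vector<long> &v) { return long(v[0] < 3); }, 1, kEnd},
        // step 1: the body; either result jumps back to the condition
        {[](std::vector<long> &v) { return ++v[0]; }, 0, 0},
    };
}
```

Running `make_counting_loop()` with `vars = {0}` leaves `vars[0]` at 3. Since every target is resolved before execution, a reference to a label that does not exist can be rejected when the script is loaded rather than when a player triggers it.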
Structure within the S-Expressions:
  • What is currently "one line" in config/save/db files becomes "one top-level S-Expression". Generated S-Expressions (save files) would likely still be one line each, but it may be nice to wrap at known points for item/mob dbs.
  • Every top level S-Expression should be a list, the first element of which should be the type, even in files that only contain one type of object.
  • I recommend the first row of each object (not in conf files, just save and db files) be a header row to ensure forward and backward compatibility. Allowing multiple header rows would allow concatenation even across versions. There would be a helper function for reading files like this, e.g. get_object_field("name").
  • Bitfields in item/mobs should be transformed into list of traits, e.g. (aggressive undead fire boss)
  • Every list that is currently spread across multiple fields (e.g. mob drops) shall be stored as a list of individual whatevers (in this case, each element would be a list containing two elements, item ID and droprate)
  • save files should write a magic token - or perhaps, an empty list - at the end of file to indicate clean saving. (Or perhaps, just surround the whole file in a set of parentheses? but that approaches "superfluous") This, plus a decent backup policy and set of ladmin commands, is everything that SQL support would give us, but much simpler (less computational overhead, easier to set up, less prone to errors). (If it's all on one hard drive and the whole disk dies, you still lose everything)
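
The header-row idea can be sketched as follows, again in C++17. This is an assumption-laden illustration: `ObjectReader`, `Row` and `feed` are hypothetical, rows are shown as already-parsed strings rather than S-expressions, and only the name `get_object_field` comes from the proposal. It shows why a second header row in the middle of a file (e.g. after concatenating files from different versions) is harmless: lookups go by field name, never by position.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// A row is shown as already-parsed strings; a real version would hold S-expressions.
using Row = std::vector<std::string>;

class ObjectReader {
public:
    // A row whose type tag (row[0]) is "header" replaces the current column
    // names; any other row becomes the current object.
    void feed(const Row &row) {
        if (!row.empty() && row[0] == "header") header_ = row;
        else current_ = row;
    }

    // Hypothetical analogue of get_object_field("name"): looks up a field of
    // the current object by the name given in the most recent header row.
    std::optional<std::string> get_object_field(const std::string &name) const {
        for (std::size_t i = 1; i < header_.size() && i < current_.size(); ++i)
            if (header_[i] == name) return current_[i];
        return std::nullopt;  // unknown field, or object shorter than the header
    }

private:
    Row header_, current_;
};
```

Returning `std::nullopt` for a missing field lets the caller decide on a default, which is what makes older files readable by newer servers and the other way around.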
Implementation:
I'm not working on this yet.
The parser should be relatively easy to code, and completely independent from the rest of my work on tmwAthena, so someone else who wants to get involved could do it.
  • class SExpr should have a default constructor that initializes it to the empty list. This should be stored as all zero bytes.
  • I'm not sure whether C++ copy-construction or assignment should be allowed
  • It shall have a clear() method that frees previous data and makes it into the empty list
  • It shall provide setters for string/token from a C and/or C++ string, for the integer from a signed C/C++ integer, and from pointer/length to sub
  • It should have a destructor that calls clear()
  • It must allow the addition of custom pointer types (which will be ignored in the destructor)
  • It shall provide begin() and end() methods (for use with STL), and a safe subscript operator (returning a reference to a static, immutable SExpr of the empty list if passed an out-of-bounds index) for the case where it is a list (if it is not a list, should it throw an exception?)
  • It shall provide a string converting function, that returns the empty string if type is not string or token.
  • It shall provide an integer conversion operator, that returns 0 if type is not integer.
  • The reader function should take a reference to an S-expression and return an error/warning code. It is the responsibility of the caller to (typically) kill the server when an error is reported.
  • There should also be a reader that reads from memory instead of from a stream (It would be nice if this was pointer/length rather than nul-terminated pointer, to avoid the allocation and copying in the use-cases where we don't get a nul-terminated string).
  • There should be a helper function for reading an entire file as a list of S expressions (note that they are not actually surrounded by one great set of parentheses in the file)
  • There shall be no reference to any tmwAthena-specific types, I'm planning on making a new source directory, src/lib/, for such things.
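
A few of the interface points above (default-constructed empty list, clear(), setters, begin()/end(), the safe subscript, and the fallback conversions) could look roughly like this. This is a sketch under loud assumptions: `SExprSketch` and its member names are made up, it uses `std::string` and `std::vector` instead of the raw pointer layout, so it is not stored as all zero bytes, it allows copying (which the list above leaves undecided), and it resolves the open question about subscripting a non-list by returning the empty list instead of throwing.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface sketch; not the proposed memory layout.
class SExprSketch {
public:
    enum class Type : std::uint8_t { List, Int, String, Token };

    SExprSketch() = default;                 // starts out as the empty list
    void clear() { *this = SExprSketch(); }  // frees old data, back to empty list

    void set_int(std::int64_t v) { clear(); type_ = Type::Int; ival_ = v; }
    void set_string(std::string s) { clear(); type_ = Type::String; text_ = std::move(s); }
    void set_token(std::string s) { clear(); type_ = Type::Token; text_ = std::move(s); }
    // Appends a child; silently ignored unless this node is a list.
    void append(SExprSketch child) {
        if (type_ == Type::List) elts_.push_back(std::move(child));
    }

    Type type() const { return type_; }

    // Safe subscript: an out-of-range index (or a non-list node) yields a
    // reference to one shared, immutable empty list.
    const SExprSketch &operator[](std::size_t i) const {
        static const SExprSketch empty{};
        return i < elts_.size() ? elts_[i] : empty;
    }
    auto begin() const { return elts_.begin(); }
    auto end() const { return elts_.end(); }

    // Conversions fall back to "" or 0 when the type does not match.
    std::string str() const {
        return (type_ == Type::String || type_ == Type::Token) ? text_ : std::string();
    }
    std::int64_t to_int() const { return type_ == Type::Int ? ival_ : 0; }

private:
    Type type_ = Type::List;
    std::int64_t ival_ = 0;
    std::string text_;
    std::vector<SExprSketch> elts_;
};
```

With the safe subscript, chained lookups such as `expr[7][3].to_int()` never touch invalid memory; they just produce 0, so a caller that cares has to check `type()` or the list length explicitly.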
Recommended C++ class layout:

Code:

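#include <cstdint>

class SExpr
{
    // Either a length (presumably of the string, token, or list)
    // or the value of an integer, depending on `type`.
    union
    {
        uint64_t len;
        int64_t ival;
    };
    // The payload; which member is valid also depends on `type`.
    union
    {
        char *string_or_token;
        SExpr *list_elts;
        void *anything_else;
    };
    // Discriminator for the two unions above.
    uint8_t type;
};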
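If 64-bit integers are used, they should go before the pointer; if 32-bit integers are used, they should go after the pointer (presumably to keep padding to a minimum on platforms where the pointer and the integer differ in size).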
Note that this integer size need not be the same size used internally by the scripting engine, it is only relevant for constants.
However, it would be nice if constants could cover the whole range of integers used by the scripting engine. Sometimes we have to deal with millisecond-precision times, and 49 days is really not an acceptable wrap time (the code has to jump through many hoops to avoid problems), whereas 500 million years is not my problem.
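
For anyone who wants to check the two numeric points here, the following C++ snippet is a small sketch with made-up struct and function names. The four structs only mirror the field kinds of the layout above (integer, pointer, one-byte type) in both orders, so `sizeof` shows what the ordering costs on a given platform; the exact sizes depend on the ABI. `wrap_days` computes how long an unsigned millisecond counter of a given width lasts: 2^32 ms is about 49.7 days, and 2^64 ms is roughly 585 million years (about half that if the counter is signed).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the layout, with a 32-bit integer in both orders.
struct IntFirst32 { std::int32_t ival; void *ptr; std::uint8_t type; };
struct PtrFirst32 { void *ptr; std::int32_t ival; std::uint8_t type; };

// The same two orders with a 64-bit integer.
struct IntFirst64 { std::int64_t ival; void *ptr; std::uint8_t type; };
struct PtrFirst64 { void *ptr; std::int64_t ival; std::uint8_t type; };

// Days until an unsigned millisecond counter with `bits` bits wraps around.
constexpr double wrap_days(int bits) {
    double ms = 1.0;
    for (int i = 0; i < bits; ++i) ms *= 2.0;  // 2^bits milliseconds
    return ms / (1000.0 * 60 * 60 * 24);       // milliseconds per day
}
```

On a typical 64-bit Linux build, `sizeof(IntFirst32)` comes out larger than `sizeof(PtrFirst32)` because the 32-bit integer has to be padded up to the pointer's alignment, while on a build where pointers are 32 bits the 64-bit case is the one where order can matter. Printing the four sizes on the target platform is the reliable way to confirm.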

I wonder how much I could have gotten done if I had been writing code instead of writing this post. But, since this proposed change would require significant attention from the content team and from the server admins, it's worth it. Besides, it's relaxing to take a break from the code.

I expect to release at the least a regression-hunt version before this conversion begins, however. One possibility is not to do the map-server specific changes in the current code base, but instead create a copy of the map server that did it, being protocol-compatible with char-server and client but not content-compatible with its twin.

Now, who spots the stealth pun in this post? I can guarantee that whoever implements the parser will get it. :P
Former programmer for the TMWA server.
Crush
TMW Adviser
Posts: 8046
Joined: 25 Aug 2005, 17:08
Location: Germany

Re: tmwAthena data storage proposal

Post by Crush » 13 May 2011, 05:48

If I understood that correctly, you want to change the format of the data files.

If you are going to do that, then why not use XML like the client does? That way you can make the life of the content managers much easier, because they needn't learn two markup languages and they can use the same config files for client and server. This reduces maintenance work and is less error-prone.
  • former Manasource Programmer
  • former TMW Pixel artist
  • NOT a game master

Please do not send me any inquiries regarding player accounts on TMW.


You might have heard a certain rumor about me. This rumor is completely false. You might also have heard the other rumor about me. This rumor is 100% accurate.

Re: tmwAthena data storage proposal

Post by o11c » 13 May 2011, 18:21

Your concerns are valid, in the context of item and mob DBs (I assume that is what you mean by config files, rather than the server-only stuff in conf/). However, while XML is suitable for arbitrarily complex data structures handled only by computer systems, it is not as suitable for fixed data structures, and is prone to errors if generated by humans.

For example, during the last content update, a typo of a '-' instead of a '=' in the XML file caused updates not to be downloaded. (Not that I'm suggesting changing the client data format; this is just an example of buggy human behavior.)

In particular, I would not want to write npc or magic scripts in XML.

Also, keep in mind that what the client needs and what the server needs regarding items and mobs are quite different: the server needs to know mob stats, item scripts, etc, whereas the client needs to know all that graphical stuff. And, since these files are changed in practically every client update, the client version should contain the minimal necessary information.

Personally, I don't think items or mobs should be part of the client data at all, they should be sent as part of the server protocol. But that would break older clients, so I can't do that.

Lacking that, I would still like to see client and server mob/item data generated from the same source, and it's a lot easier to convert from S-Expressions to XML than vice versa.
nmaligec
Novice
Posts: 253
Joined: 08 Apr 2010, 02:55

Re: tmwAthena data storage proposal

Post by nmaligec » 13 May 2011, 21:32

o11c wrote: Where tmwAthena will use S-Expressions:
[...]
  • to store the mob and item dbs
    [...]
    • requires one-time conversion of existing content, which shouldn't be too hard.
    [...]
  • to reimplement scripts and magic
    [...]
    • This will require much changes to content, which I cannot do on my own
Are you really serious about changing the syntax for all user content? First off please provide example entries for the new mob_db and item_db files, along with a demo version of a new script file.

There are other projects besides The Mana World that use the server code base. Each of them would also have to update their content. Updating item and mob entries won't be so bad, but redoing all existing scripts is just plain not going to happen. At this point it would be more practical to hold off updating the server and wait for a stable server from the Mana Project.

If you are going to force the content change, then provide support for the old structures and add a toggle to the server config: content = old | new strict | mixed. The mixed setting would help test during the conversion process.

o11c wrote:For example, during the last content update, a typo of a '-' instead of a '=' in the XML file cause updates not to be downloaded. (Not that I'm suggesting changing the client data format, just this is an example of buggy human behavior).
Imagine the parenthesis mess that will occur with people not used to Scheme or Prolog. XML tags are at least a little more intuitive.

Re: tmwAthena data storage proposal

Post by o11c » 14 May 2011, 03:28

While I don't think it will be feasible to support both syntaxes for scripts in one binary, it will be possible to connect one map-server compiled for each syntax to a single char-server ("World").

I am quite aware that scripts will be hard to port, but they can be ported one map at a time by using 2 map-servers - one for legacy scripts and one for new ones.

example format for mobs, with only known-nonworking fields removed:

Code:

(header ID      Name            LV      HP      SP      EXP     JEXP    ATK     DEF     MDEF    stats           
    ranges  Scale   Race    Element     Mode                    Speed   Adelay  Amotion Dmotion
    drops
    mutations)
(mob    1002    "Maggot"        5       50      0       0       2       (5 10)  0       5       (1 1 1 0 6 5)
    (1 1 1) medium  plant   (water 2)   (canmove canattack)     800     1872    672     480     
    (("Maggot slime" 800) ("Cactus Drink" 150) ("Bug Leg" 400) ("Roasted Maggot" 150) ("Cactus Potion" 70) ("Sharp Knife" 10))
    (0 0))
The fields, of course, could be moved around - I recommend placing the variable-length ones, such as drops, last.
Using tokens instead of integers makes it pretty obvious that some of this mob's fields are wrong.

Of course, this could be improved by moving less-used fields out of "header-specified" indices and onto the tail of only the mobs that require them. I'd recommend making canmove and canattack implicit and instead specify cantmove and cantattack for the exceptions.
Here, all the extra fields are specified, but most of them would be left out in production.

Ideally, both EXP and JEXP would be automatically calculated.

Code:

(header ID      Name            LV      HP      ATK     DEF     MDEF    stats           
    ranges  Speed   Adelay  Amotion Dmotion
    drops
    mutations)
(mob    1002    "Maggot"        5       50      (5 10)  0       5       (1 1 1 0 6 5)
    (1 1 1) 800     1872    672     480     
    (("Maggot slime" 800) ("Cactus Drink" 150) ("Bug Leg" 400) ("Roasted Maggot" 150) ("Cactus Potion" 70) ("Sharp Knife" 10))
    (0 0)
    (SP 0) (EXP 0) (JEXP 2) (scale medium) (race plant) (element (water 2)) (mode (canmove canattack)))
Come to think of it, LV should not be specified, it should be generated from stats.

Re: tmwAthena data storage proposal

Post by o11c » 14 May 2011, 05:24

Any halfway decent text editor will highlight the matching parentheses, even if it doesn't have special support for LISP.

Remember that, with this as the script implementation, it would be possible to catch syntax errors at load time, rather than at run time as is the case now.

Code:

(npc    "Angela"
    (sprite 196)
    (map "031-2" 29 28)
    (script
        # Evaluated at load time
        (const @Q_Nivalis_state_MASK NIBBLE_5_MASK)
        (const @Q_Nivalis_state_SHIFT NIBBLE_5_SHIFT)
        (set @rescue_Cindy
            (>>
                (& QUEST_Nivalis_state @Q_Nivalis_state_MASK)
                @Q_Nivalis_state_SHIFT))
        (if (== @rescue_Cindy 4)
            (goto L_Hello_Again))
        (if (== @rescue_Cindy 3) 
            (goto L_Reward))
        (mes "...")
        (close)
    # Outdented for readability
    (label L_Reward)
        (mes "[Angela]")
        # mes will automatically concatenate all of its arguments
        # (there will also be an explicit way to concatenate to make a string)
        (mes "\"Hello "
            (strcharinfo 0)
            ", thank you again. I'm so glad Cindy is back home safe.\"")
        (next)
        (mes "\"As I told you, my husband is an adventurer. He is on one of his travels, so he couldn't rescue Cindy himself.\"")
        (next)
        (mes "\"I want to give you one of his treasures. Beside all the junk he brings, there are some very valuable things.\"")
        (next)
        (mes "\"This item is called the Rock Knife. When you wield it, you feel as robust as a rock.\"")
        
        (getinventorylist)
        (if (== @inventorylist_count 100)
            # I'm wavering on whether the block keyword would be required
            # or whether an extra set of parentheses will be needed even if
            # there's only one statement in the condition
            # Otherwise, it would be interpreted as calling the function
            # returned by invoking the first argument. Of course, I'm
            # not sure whether I should allow functions to be passed at all.
            # This isn't a real LISP, after all, it merely uses S-Expressions
            (block
                (mes "\"Oh, it seems you carry so much stuff - I will keep it for you until you can take it.\"")
                (close)))

        # default argument should be interpreted as 1 in this case
        # (it would be 0 if the getitem function didn't check the number of arguments)
        # Also, the getitem function should optimize at load time to use a pointer to the actual item data
        (getitem "RockKnife")
        
        (set @rescue_Cindy 4)
        # We're in the NPC's scope so we can just call it like this
        # (This would be resolved to actual function pointer at load time)
        (S_Update_Mask)
        
        (next)
        (mes "\"I hope this will be useful for you.\"")
        (next)
        (mes "\"I am so glad Cindy is safe. But there is still another problem. The Yetis took away all the white and yellow present boxes we wanted to bring to Santa!\"")
        (next)
        (mes "\"Usually, Yetis are very shy - I wonder why they did that. There is something strange going on.\"")
        (next)
        (mes "\"May I ask you for help again? I'll give you a small reward for every 3 boxes of one color you bring me.\"")
        (close)
        
    (label L_Hello_Again)
        (mes "[Angela]")
        (mes "\"Hello! Good to see you again. Please warm yourself.\"")
        (next)
        # Only add the relevant menu entries
        # Monolithic menus would also be supported
        (menuopt "I just wanted to say hello.")
        (if (countitem "YellowPresentBox")
            (menuopt "I have some yellow present boxes." L_Yellow))
        (if (countitem "WhitePresentBox")
            (menuopt "I have some white present boxes." L_White))
        # It might be illegal to have a (next) or other blocking call between the menuopt's and the domenu
        (domenu)
        (close)
        
    (label L_Yellow)
        (set @dq_level 70)
        (set @dq_cost 32)
        # (set @dq_count 3)
        # ending string variables with $ would be merely a convention
        # (set @dq_name$ "YellowPresentBox")
        (set @dq_friendly_name$ "yellow present box")
        (set @dq_money 5300)
        (set @dq_exp 1300)

        # Function not found in this scope, call the one in global scope instead
        # (all functions parsed at once, then resolved together)
        (DailyQuest "YellowPresentBox" 3)
        
        (next)
        
        (if (== @dq_return 4)
            (mes "\"Santa will be glad to have them back.\""))
        (close)

    (label L_White)
        (set @dq_level  80)
        (set @dq_cost 64)
        # (set @dq_count 3)
        # (set @dq_name$ "WhitePresentBox")
        (set @dq_friendly_name$ "white present box")
        (set @dq_money  10800)
        (set @dq_exp 2800)

        (DailyQuest "WhitePresentBox" 3)
        
        (next)
        
        (if (== @dq_return 4)
            (mes "\"You are a great help!\""))
        (close))
    # This is a function, rather than a script
    # It requires a name and argument list
    # but the old way, of passing through named variables, still works
    (function S_Update_Mask ()
        # proposed function
        (setnibble 5 QUEST_Nivalis_state @rescue_Cindy)
        # or, the old way
        (set QUEST_Nivalis_state
            (|  (&  QUEST_Nivalis_state
                    (~ @Q_Nivalis_state_MASK))
                (<< @rescue_Cindy @Q_Nivalis_state_SHIFT))))
# Traditionally, in Lisp, all trailing parentheses are appended
# However, I wouldn't mind if the top-most ones weren't
) # npc Angela


# In another file
(function DailyQuest ((item @dq_name$) (int @dq_count))
    (set @dq_earliest
        (- (gettimetick 2) 86400))
    (if (< DailyQuestTime @dq_earliest)
        (set DailyQuestTime @dq_earliest))

    //how many whole daily quest points the player has earned
    //we increment DailyQuestTime by the number of seconds in that many increments
    (set @dq_increments
        # Indentation makes it obvious what goes with what
        (/  (*  (- (gettimetick 2)
                    DailyQuestTime)
                BaseLevel)
            86400))
    (+= DailyQuestTime
        (/  (*  @dq_increments 86400)
            BaseLevel))

    (if (< DailyQuestPoints BaseLevel) 
        (block
        //normal recharging case - increment, but don't let it recharge more than a day's worth
        (+= DailyQuestPoints @dq_increments)
        (if (> DailyQuestPoints BaseLevel)
            (set DailyQuestPoints BaseLevel))))
    //bonus *is* allowed to push DailyQuestPoints above BaseLevel
    (if DailyQuestBonus
        (+= DailyQuestPoints DailyQuestBonus))
    (unset DailyQuestBonus)

    (if (< BaseLevel @dq_level)
        (goto L_Low_Level))
    (if (< DailyQuestPoints @dq_cost)
        (goto L_Not_Enough_Points))

    (mes "\"If you bring me " @dq_count " " @dq_friendly_name$ ", I will give you a reward.\"")
    (menu ("I have what you want." L_Trade)
        ("Ok, I'll get to work.")
        ("Nah, I'm not going to help you."))

    (set @dq_return 1)
    (goto L_Exit)

(label L_Trade)
    (if (< (countitem @dq_name$) @dq_count)
        (goto L_Not_Enough))
    # Hm, how can we make the optimizer aware of this?
    # My best guess is to make @dq_name$ a function parameter, so calls can
    # be checked at load time, so the type would be known even though the
    # value isn't
    (delitem @dq_name$ @dq_count)

    (+= zeny @dq_money)
    (getexp @dq_exp)

    (-= DailyQuestPoints @dq_cost)

       
    (mes "\"Thank you!\"")
    (mes "")
    (mes "[" @dq_money " money]")
    (mes "[" @dq_exp " experience points]")

    (return 4)

(label L_Not_Enough)
    (mes "\"I said " @dq_count " " @dq_friendly_name$ "; you should learn to count.\"")
    (return 3)

(label L_Low_Level)
    (mes "\"Hey, you should go kill some things to get stronger first.\"")
    (return 0)

(label L_Not_Enough_Points)
    (mes "\"You look exhausted, maybe you should rest a bit.\"")
    (return 2)
) # function DailyQuest

Re: tmwAthena data storage proposal

Post by Crush » 15 May 2011, 14:48

The changes you are proposing (new content data file format and new scripting language) mean a workload which would be pretty much equivalent to porting the current world to Manaserv.