Re: No servers available?

Posted: 19 Feb 2010, 01:46
by AnonDuck
Here's why it was down then:

The login server was laggy. The mmo_auth_sync() function, which writes the accounts database, was taking an inordinate amount of CPU time because of the many accounts on TMW. Because the login server is a single-threaded daemon, all other processing is suspended during this operation, leading to horribly long login delays. The problem was possibly exacerbated by the web-based account creation script, which likes to repeatedly call ladmin with bad parameters, possibly leading to more calls to mmo_auth_sync() than needed.

Several solutions were tried to mitigate this problem. The login-server was calling mmo_auth_sync() whenever any change was made to the accounts DB (which happens a lot), so first I removed most calls to the function and set it to be triggered on a 5-minute timer. This was not very effective, again possibly due to the ladmin issue: I had left the mmo_auth_sync() calls in the ladmin code paths, assuming anything from ladmin could be considered trusted input and should trigger an immediate DB write.
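
To illustrate the timer approach, here is a minimal sketch in C. It is not the login-server's actual code: write_accounts_db() is a hypothetical stand-in for mmo_auth_sync(), and the dirty flag, mark_accounts_changed(), maybe_sync(), and the way the interval is checked are all assumptions made for the example.

```c
#include <stdbool.h>
#include <time.h>

#define SYNC_INTERVAL_SECONDS (5 * 60)  /* the 5-minute timer */

static bool accounts_dirty = false;  /* assumed flag: set on any account change */
static time_t last_sync = 0;         /* time of the last completed write */
static int sync_count = 0;           /* counts writes, for illustration only */

/* Hypothetical stand-in for mmo_auth_sync(); a real one would dump the DB. */
static void write_accounts_db(void)
{
    sync_count++;
}

/* Call this wherever the code used to write the DB immediately. */
void mark_accounts_changed(void)
{
    accounts_dirty = true;
}

/* Call this from the main loop. Writes at most once per interval,
 * and only if something changed. Returns true if a write happened. */
bool maybe_sync(time_t now)
{
    if (!accounts_dirty)
        return false;
    if (now - last_sync < SYNC_INTERVAL_SECONDS)
        return false;

    write_accounts_db();
    accounts_dirty = false;
    last_sync = now;
    return true;
}
```

With this shape, a burst of account changes costs one write per interval instead of one write per change. The write itself still blocks the main loop while it runs.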

The next attempt involved removing all calls to mmo_auth_sync() except the one triggered by the timer. Jaxad had estimated that it takes around 30 seconds to dump the accounts DB to disk, so it was decided that it would be a good idea to fork(2) off the mmo_auth_sync() function to a child process. This solution had proved successful with the character server and had almost totally eliminated its lag issues (including party/whisper lag). I pushed the patch and went to sleep. While I was drooling on a pillow, Jax pushed this code live.
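
For readers unfamiliar with the pattern, forking off a slow write can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: dump_accounts(), sync_in_background(), reap_children(), and the file contents are hypothetical names and data invented for the example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the slow accounts dump. */
static int dump_accounts(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "0\texample_account\n");  /* placeholder record */
    /* fclose() flushes the buffer, so nothing is lost when the
     * child later leaves without running exit-time cleanup. */
    return fclose(fp) == 0 ? 0 : -1;
}

/* Fork a child to do the write. The parent gets the child's pid back
 * right away (or -1 if fork failed) and can keep serving logins. */
pid_t sync_in_background(const char *path)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: works on its own copy of the parent's memory. */
        int rc = dump_accounts(path);
        /* Leave without running handlers inherited from the parent. */
        _exit(rc == 0 ? 0 : 1);
    }
    return pid;
}

/* Collect finished children without blocking, so they do not pile up
 * as zombies. Returns how many were reaped. */
int reap_children(void)
{
    int reaped = 0;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```

The parent would call sync_in_background() from its timer and reap_children() somewhere in the main loop. The child writes a snapshot of the data as it was at the moment of the fork, so changes made by the parent afterwards only land in the next dump.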

Upon waking up I found that TMW was down. Unfortunately the login server operates slightly differently from the char-server, so what appeared to be a cut-and-dried copy/paste fix went a bit awry. The char-server relies on a SIGINT handler to clean up after itself when it exits. The login-server had to be funky and uses atexit(3) to run additional code on exit. Since the code that forks off the write calls exit() when the child process is through running mmo_auth_sync(), the atexit() handler was being called in the child. I won't bother with a full stack dump here, but the code the atexit() handler calls eventually closes down all sockets in the process. Now if you know anything about POSIX forking semantics, you would know that if a child closes a socket shared with the parent process, it's closed for the parent process also. This led to a condition where the parent login-server process was running its main select()/accept() loop on a closed socket, spewing errors to the log and chewing 100% CPU. Nice.

The solution to this issue was to replace the call to exit() with _exit(), which bypasses the atexit() handler and exits immediately. After I pushed these changes, Jax pushed them to the main repo and restarted the server... and here we are. It seems there still might be problems.
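
The difference between exit() and _exit() in a forked child can be shown with a small, self-contained sketch. It is not taken from the login-server: on_exit_cleanup(), install_cleanup(), handler_ran_in_child(), and the pipe used to observe the handler are all hypothetical, made up for the demonstration.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;  /* where the handler reports that it ran */

/* Stand-in for the server's exit-time cleanup. */
static void on_exit_cleanup(void)
{
    if (report_fd >= 0) {
        if (write(report_fd, "x", 1) != 1) {
            /* nothing useful to do in this sketch */
        }
    }
}

/* Register the handler once in the parent, before any fork. */
int install_cleanup(void)
{
    return atexit(on_exit_cleanup);
}

/* Fork a child that leaves via exit() or _exit().
 * Returns 1 if the inherited handler ran in the child,
 * 0 if it did not, -1 on error. */
int handler_ran_in_child(int use_underscore_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        report_fd = fds[1];
        if (use_underscore_exit)
            _exit(0);  /* skips atexit() handlers */
        exit(0);       /* runs atexit() handlers inherited from the parent */
    }

    close(fds[1]);
    char c;
    ssize_t n = read(fds[0], &c, 1);  /* 1 byte if the handler ran, EOF if not */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : 0;
}
```

After install_cleanup(), handler_ran_in_child(0) reports that the handler ran in the child, while handler_ran_in_child(1) reports that it did not. Handlers registered with atexit() are inherited across fork(), which is why a child that is only meant to do one job usually leaves with _exit().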

Satisfied with the explanation? It took me 10 minutes to write up. Time I could have been spending looking into this further.

Re: No servers available?

Posted: 19 Feb 2010, 02:12
by Big Crunch
Thank you for taking the time to explain. I assume that this was posted as a legit explanation and not as an attempt at an exaggeratedly precise post. I would have been satisfied with 'we are having some software issues and we hope to have it taken care of in the next few hours.' I do appreciate the level of detail you replied with, however. It indicates a high level of commitment to those of us who depend on you guys.


BC

Re: No servers available?

Posted: 19 Feb 2010, 02:12
by meway
Thank you, MC. In the meantime, if you have nothing better to do, there is the GM playground at meway.ath.cx ^_^ Just for now.

Re: No servers available?

Posted: 19 Feb 2010, 02:34
by thedarkfinder
Mad Camel

Thank you for getting our beloved game back up.

Re: No servers available?

Posted: 19 Feb 2010, 03:42
by Jaxad0127
There were some issues with some bug fixes to the login-server. It took a few tries to get everything working right. Everything should be fine now.

Re: No servers available?

Posted: 19 Feb 2010, 16:00
by Big Crunch
It is fixed and working better than ever, I might add. Thanks guys.

Re: No servers available?

Posted: 19 Feb 2010, 17:13
by meway
Big Crunch wrote: It is fixed and working better than ever i might add. Thanks guys.
Yes, the lag problems I was having before do not seem to be as present as before. I actually get no lag now :D