
a little analysis of lag

Posted: 21 Jan 2013, 07:52
by o11c
I played the game again for the first time in a while and experienced irregular lag.

Note that I did *not* measure network usage.

At the same time, I ran 'top' in an ssh session. This is what I observed:
  • login-server briefly uses 100% CPU in a forked child every 5 minutes. This has no effect on lag.
  • char-server also hits 100% CPU under a different pid (so it must be the forked child), and much more often than that. This has no effect on lag either, but should probably be fixed anyway.
  • About half the time when a lag spike happens, the status of the map-server process changes from S to D. I expect that it actually happens every time, but top and/or my eyes are not updating fast enough (the lag I experienced lasted less than a second). At that moment, map-server CPU usage is 1%. If the char-server is running a save, it is also in state D, and its CPU usage is between 10% and 24%.
Since TMW uses nonblocking sockets and does no name lookups except at process start, a status of D (uninterruptible sleep) probably means "waiting for the hard drive".
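Because top refreshes only every few seconds by default, sub-second D states are easy to miss. Below is a minimal sketch, assuming a Linux host with /proc mounted, that polls a single process's scheduler state at a much shorter interval and reports each stretch spent in D. The function names and the 10 ms default interval are illustrative choices, not part of the TMW tooling.

```python
import os
import sys
import time


def parse_state(stat_line):
    """Return the state letter (R, S, D, ...) from a /proc/<pid>/stat line."""
    # Field 2 is the command name in parentheses and may itself contain
    # spaces or ')', so read the state relative to the *last* ')'.
    return stat_line[stat_line.rindex(")") + 2]


def read_state(pid):
    with open(os.path.join("/proc", str(pid), "stat")) as f:
        return parse_state(f.read())


def watch(pid, interval=0.01, duration=60.0):
    """Poll `pid` every `interval` s for `duration` s; print each stretch in D."""
    entered = None  # monotonic time at which the current D stretch was first seen
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        now = time.monotonic()
        try:
            state = read_state(pid)
        except FileNotFoundError:
            print(f"pid {pid} has exited")
            return
        if state == "D" and entered is None:
            entered = now
        elif state != "D" and entered is not None:
            ms = (now - entered) * 1000
            print(f"{time.strftime('%H:%M:%S')} pid {pid} was in D for ~{ms:.0f} ms")
            entered = None
        time.sleep(interval)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        watch(int(sys.argv[1]))
    else:
        print(f"usage: {sys.argv[0]} PID", file=sys.stderr)
```

It could be run alongside top as, say, `python3 dstate.py "$(pgrep -x map-server)"`. Even a 10 ms poll misses very short D stretches, so the output is best read as a lower bound on how often the process blocks.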

After startup, the map-server accesses the hard drive for three reasons:
  • Reading the MOTD when a player logs in, and @help. I plan to fix both of these eventually, but since Platinum has plenty of free RAM (so these files should stay cached), they're unlikely to cause much of a problem.
  • Writing the log. Also, gzipping during log rotation uses system(), which blocks (but would not leave the process with status D).
  • Saving global variables. We should probably clear the obsolete ones, and maybe decrease the number of high scores from 10 to 5 (this affects the fluffy hunt as well as the new Illia quest).
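On the log-rotation point, here is a minimal sketch of an alternative to a blocking system() call: start gzip as a child process and collect its exit status later from the main loop instead of waiting for it. In the C++ server the equivalent would be fork()/exec() or posix_spawn() followed by waitpid() with WNOHANG. The Python below only mirrors that pattern, and the class name is hypothetical. Note that it removes the wait for gzip, not the disk writes of the log itself.

```python
import subprocess


class BackgroundCompressor:
    """Hypothetical helper: gzip rotated logs without blocking the caller."""

    def __init__(self):
        self.running = []  # Popen objects for gzip children not yet reaped

    def compress(self, path):
        # Popen returns as soon as the child has started; system() would
        # instead wait until gzip had finished.
        self.running.append(subprocess.Popen(["gzip", "-f", path]))

    def reap(self):
        """Collect exit codes of finished children without waiting on the rest."""
        finished, still_running = [], []
        for proc in self.running:
            code = proc.poll()  # None while the child is still running
            if code is None:
                still_running.append(proc)
            else:
                finished.append(code)
        self.running = still_running
        return finished
```

The main loop would call reap() periodically (for example, once per tick) so finished children are collected rather than lingering as zombies.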

Re: a little analysis of lag

Posted: 21 Jan 2013, 11:22
by Jenalya
o11c wrote: Saving global variables. We should probably clear the obsolete ones, and maybe decrease the number of high scores from 10 to 5 (affects fluffy hunt as well as the new Illia quest)
I just had a look at the save file that contains the global variables. It is 63 lines long in total.
- The fluffy hunt uses 33 lines, which I agree could be reduced to something lower.
- The Illia quest currently uses 14 lines. Most of that information is there to monitor how many people manage to beat the quest, to see whether it is well balanced. I talked to V0id, and he said he has gathered enough information to judge that, so most of this data can be removed. $Illia_Win_Counter would be kept, but the detailed information, which takes up most of the lines, can go.
- There are 7 variables for the Easter 2010 event and 5 for the Halloween 2010 event. I suppose we can delete those, as well as $Golbenez_Inn_Cost.
What's left is:
- $CandyOpsComplete, which sounds like an event variable to me, but I'm not sure.
- $NPC_NURSE, which is in use and not a problem, since it's only a single variable
- $state, which I have no idea about, thanks to its wonderfully descriptive name...

Regarding deleting variables: would it be safe to delete them directly from the save file during the next content release, while the servers are shut down? (Of course, we'd also remove them from the scripts where necessary.)
Or would it be better to e.g. add them to the clear_vars function?

Re: a little analysis of lag

Posted: 21 Jan 2013, 11:53
by Nard
In my experience, the laggiest period is roughly 18:00 to 24:00 server time, though lag also happens at the time you were playing. It could be interesting to repeat the experiment during that interval.

The laggiest areas in the game are Candor and the GY and Cindy events. This suggests that lag occurs mostly when there are many players, mobs and drops on the same map: as the load on the clients, the network and the server(s) increases, lag becomes more frequent and lasts longer. The CRC guild can offer Candor, characters and keys if you want to test it again during these events.

I watched ManaPlus's ping readings while playing and did not notice any lag periodicity at multiples of 5 minutes. Eyes are not a very reliable tool, though.
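Spotting a 5-minute pattern by eye is hard. Given timestamps of lag spikes in seconds (how they are collected, for example by noting them down or logging the ping, is an assumption here), a standard circular-statistics check is the mean resultant length R of the spike phases modulo the period. Values near 1 mean the spikes fall at the same point in every 5-minute cycle. Values near 0 mean no alignment. A minimal sketch:

```python
import math


def phase_alignment(timestamps, period=300.0):
    """Mean resultant length R of the phases t mod `period` (seconds).

    R is close to 1 if the events recur at the same point of every cycle
    and close to 0 if they are spread evenly around it.
    """
    if not timestamps:
        raise ValueError("need at least one timestamp")
    c = sum(math.cos(2 * math.pi * t / period) for t in timestamps)
    s = sum(math.sin(2 * math.pi * t / period) for t in timestamps)
    return math.hypot(c, s) / len(timestamps)


def rayleigh_p(timestamps, period=300.0):
    """Rough Rayleigh-test p-value, exp(-n * R^2); only reasonable for larger n."""
    n = len(timestamps)
    r = phase_alignment(timestamps, period)
    return math.exp(-n * r * r)
```

For example, spikes at 00:00:12, 00:05:10 and 00:10:14 (12, 310 and 614 seconds) would give R close to 1, pointing toward the 5-minute login-server fork. A low R over many spikes would argue against it.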

I would be interested in the history of the command/request buffers alongside the CPU load (this applies to the client too).

Re: a little analysis of lag

Posted: 21 Jan 2013, 13:58
by 0x0BAL
Another kind of lag occurs when a player drops a lot of items: the player dropping them may notice it, but other players lag very badly.

Re: a little analysis of lag

Posted: 21 Jan 2013, 17:28
by o11c
Jenalya wrote: - $NPC_NURSE, which is used and not a problem, since it's only one
- $state, which I have no idea about, thanks to its wonderfully descriptive name...
$state is from world/map/npc/007-1/voltain.txt

For these, how important is it really that they persist across restarts? Though I agree that, since each is only a single variable, they're relatively insignificant.

Jenalya wrote: Regarding deleting variables: would it be safe to delete them directly from the save file during the next content release, while the servers are shut down? (Of course, we'd also remove them from the scripts where necessary.)
Or would it be better to e.g. add them to the clear_vars function?
Rather, I was thinking of an OnInit function.

Re: a little analysis of lag

Posted: 21 Jan 2013, 21:54
by Jenalya
V0id and I made some commits to remove some global variables from the scripts, and I added an invisible NPC to clear the variables we want to remove: https://github.com/jtoelke/tmwa-server- ... 1b2162a85c
I tested it locally, and it fails to delete the string variables. How can I delete them properly?

Re: a little analysis of lag

Posted: 22 Jan 2013, 00:48
by Nard
Wouldn't the posts about variables be better placed in the variable exhaustion topic? :roll:

Re: a little analysis of lag

Posted: 22 Jan 2013, 07:40
by Jenalya
Nard wrote:Shouldn't the posts about variables be better under variable exhaustion topic? :roll:
That topic is about player variables, which are saved in world/save/athena.txt.
What I posted is about global variables, which the map-server saves in world/map/save/mapreg.txt. Based on what o11c observed and described in the first post, saving them might be a cause of the lag.

Re: a little analysis of lag

Posted: 22 Jan 2013, 21:16
by o11c
Jenalya wrote:V0id and I did some commits to remove some global variables from the scripts and I added an invisible NPC to clear the variables we want to remove: https://github.com/jtoelke/tmwa-server- ... 1b2162a85c
I tested locally and it fails to delete the string variables. How can I delete them properly?
Not sure... are you waiting long enough for it to actually save?

I might have time to check on this, but whether I do or not, the relevant breakpoints to set are on mapreg_setregstr and script_save_mapreg.

Re: a little analysis of lag

Posted: 22 Jan 2013, 22:14
by Jenalya
o11c wrote:Not sure ... are you sure you're waiting long enough for it to actually save?
The integer variables were successfully deleted, so yeah.

Re: a little analysis of lag

Posted: 23 Jan 2013, 15:42
by straelyn
Using the debug feature, I've been able to narrow down the two different types of lag I tend to see.
The first: occasionally, when fighting mobs, I see the ping (in the debug window's network tab) go from 160 ms to 1100 ms when there's a 1-second lag, or to ~2000 ms when there's a 2-second lag, and so on.
The second I only really see when there's a spawn party in town: my FPS drops from 50 to 5. If I enable texture compression in the performance settings, the FPS goes back to normal, but a lot of the images are displayed incorrectly. I assumed it's a problem with my system (both the laptop and the system I've been building are relatively new to me), and it rarely happens, mind you, but in case this is some kind of client bug, I figured I'd mention it here.

Edit: Correction regarding texture compression: I've now got it working (it would seem). Sorry I doubted. :wink:

Re: a little analysis of lag

Posted: 23 Jan 2013, 17:17
by o11c
straelyn wrote:Using the debug feature I've been able to narrow down the two different types of lag I tend to see.
This topic is about server-side lag (ping), not client-side pseudolag (FPS).

For future reference, though, I can't say for certain that what I observed is the *only* source of "true" lag.

Re: a little analysis of lag

Posted: 23 Jan 2013, 21:57
by Jenalya
o11c wrote:
Jenalya wrote:V0id and I did some commits to remove some global variables from the scripts and I added an invisible NPC to clear the variables we want to remove: https://github.com/jtoelke/tmwa-server- ... 1b2162a85c
I tested locally and it fails to delete the string variables. How can I delete them properly?
Not sure ... are you sure you're waiting long enough for it to actually save?

I might have time to check on this, but whether I do or not, the relevant breakpoints would be set on mapreg_setregstr and script_save_mapreg.
I had a second look at the issue and noticed a bug in my script: it skipped resetting the loop counter. Sorry for the confusion.
I fixed that, and with the current version we can reduce mapreg.txt to 18 lines (down from 63).
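The symptom (integer variables cleared, string variables not) fits a common pattern when one counter is reused for several loops. The sketch below is a hypothetical Python illustration, not the actual NPC script, and the variable names in it are made up. Two lists of variable names are cleared in turn. If the counter is not reset between them, the second loop starts at the first list's length and, when the second list is no longer than the first, clears nothing.

```python
def clear_vars(store, groups, reset_counter=True):
    """Remove every name in each group from `store` (a dict of variables).

    Uses one explicit index shared across groups, as a script with a single
    loop counter would.
    """
    i = 0
    for names in groups:
        if reset_counter:
            i = 0  # the step that was missing in the buggy version
        while i < len(names):
            store.pop(names[i], None)
            i += 1
    return store
```

With three integer names followed by two string names, `reset_counter=False` leaves both string variables in place, while the default removes everything.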

Re: a little analysis of lag

Posted: 23 Jan 2013, 22:30
by BoomerTheKran
This is just my 40% of a nickel. Dunno for sure if it's useful.

If you watch the client-side ping in ManaPlus's debug window while a player is experiencing lag, you can see the ping jump to higher numbers when someone talks, logs in or out, changes clothes, spins, or sometimes walks through a door. This happens frequently when any player does any of those actions.

Those CPU spikes in the server components are still probably an issue.

To verify whether there are true networking lag issues, you might try ntop (http://www.ntop.org) on the server, with logging enabled. It would take a while to sort through the log (though there are scripts to help), but it might show spikes and maybe uncover some culprits. I haven't used ntop in a while, but when I did, it pointed to changes in packet compression that sped things up (I made the router handle compression and had the server send everything uncompressed, as routers are more streamlined for that sort of thing). There might even be unneeded packets being sent in or out, which could be an issue anywhere in the chain (server to client, or even within the client or server individually).

Re: a little analysis of lag

Posted: 24 Jan 2013, 08:18
by shargom
Maybe just focus all your efforts on developing manaserv instead of literally wasting time on the eAthena server, codename "The NeverEnding Problems"?