a little analysis of lag

o11c
Grand Knight
Posts: 2262
Joined: 20 Feb 2011, 22:09
Location: ^ ^

a little analysis of lag

Post by o11c »

I played the game again for the first time in a while. I experienced lag irregularly.

Note that I did *not* measure network usage.

At the same time, I ran 'top' in an ssh session. This is what I observed:
  • login-server uses 100% CPU in a fork, briefly, every 5 minutes. This does not have any effect on lag.
  • char-server uses 100% CPU with a different pid (so it must be the fork), much more frequently than that. This does not have any effect on lag, but should probably be fixed anyway.
  • About half the time, when a lag spike happens, the status of the map-server process changes from S to D. I expect that it actually happens every time, but top and/or my eyes are not updating fast enough (since the lag I experienced was less than a second). At that moment, map-server CPU is 1%. If the char-server is running a save, it is also set to D, and CPU usage is between 10% and 24%.
Since TMW uses nonblocking sockets, and does not do name lookup except at process start, a status of D probably means "waiting for the hard drive".
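
Since top refreshes too slowly to catch sub-second spikes, one option is to poll the process state directly. The following is only a rough sketch, assuming a Linux host where /proc/<pid>/stat is readable; the function names, polling interval and sample count are made up for illustration and are not part of any TMW tooling.

```python
import time


def parse_state(stat_line: str) -> str:
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def sample_states(pid: int, interval: float = 0.05, samples: int = 200) -> dict:
    """Poll a process's state and count how often each state was seen.

    Assumption: Linux-only, and the process stays alive while sampling.
    """
    counts = {}
    for _ in range(samples):
        with open(f"/proc/{pid}/stat") as f:
            state = parse_state(f.read())
        counts[state] = counts.get(state, 0) + 1
        time.sleep(interval)
    return counts
```

Calling sample_states with the map-server's pid during a lag spike would show whether D turns up more often than top suggests.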

The map-server accesses the hard drive for three reasons (after startup that is):
  • the MOTD when a player logs in, and @help. I plan to fix both of these eventually, but since Platinum has plenty of free RAM, they're unlikely to cause much of a problem.
  • Writing the log. Also, gzipping during log rotation uses system(), which blocks (but would not leave the process with status D).
  • Saving global variables. We should probably clear the obsolete ones, and maybe decrease the number of high scores from 10 to 5 (affects fluffy hunt as well as the new Illia quest).
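
To illustrate the log-rotation point: compression can be moved off the main loop so that it does not block. This is only a sketch of the idea in Python, not how the TMWA server (which is C++) does or should do it; the function names are hypothetical, and unlike the gzip command it leaves the uncompressed file in place.

```python
import gzip
import shutil
import threading


def compress_log(path: str) -> None:
    """Write path + '.gz' holding the gzip-compressed contents of path."""
    with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)


def compress_log_in_background(path: str) -> threading.Thread:
    """Start compression in a worker thread and return immediately."""
    worker = threading.Thread(target=compress_log, args=(path,), daemon=True)
    worker.start()
    return worker
```

The caller gets the thread back right away and can join it later, or never, if it only cares that the main loop keeps running.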
Former programmer for the TMWA server.
Jenalya
TMW Adviser
Posts: 718
Joined: 22 Sep 2010, 20:28

Re: a little analysis of lag

Post by Jenalya »

o11c wrote: Saving global variables. We should probably clear the obsolete ones, and maybe decrease the number of high scores from 10 to 5 (affects fluffy hunt as well as the new Illia quest)
I just had a look at the save file which contains the global variables. It's overall 63 lines long.
- The fluffy hunting uses 33 lines, which I agree could be reduced to something lower.
- The Illia quest currently uses 14 lines. Most of that information is there to monitor how many people are able to beat the quest, to see whether it's well balanced. I talked to V0id and he said he has enough information to tell that, so that data can be reduced and mostly removed. $Illia_Win_Counter would be kept, but the detailed information, which takes most of the lines, can be removed.
- There are 7 variables about the Easter event 2010 and 5 variables about the Halloween event 2010. I suppose we can delete those, as well as $Golbenez_Inn_Cost.
What's left is:
- $CandyOpsComplete, which sounds like an event variable to me, but I don't know.
- $NPC_NURSE, which is used and not a problem, since it's only one
- $state, which I have no idea what it is, due to this wonderful descriptive name...

Regarding deleting variables. Would it be safe to delete them directly from the save file during the next content release while the servers are shut down? (Of course also remove them from scripts where necessary.)
Or would it be better to e.g. add them to the clear_vars function?
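
A per-variable tally like the one above can be produced with a short script. This is a sketch that assumes each line of the save file has the form name<TAB>value or name,index<TAB>value; that format is an assumption, so check it against the real mapreg.txt before relying on the output.

```python
from collections import Counter


def count_mapreg_lines(lines):
    """Count save-file lines per global variable name.

    Assumed line format (not confirmed in this thread):
        name<TAB>value   or   name,index<TAB>value
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key = line.split("\t", 1)[0]   # "name" or "name,index"
        name = key.split(",", 1)[0]    # drop the array index, if any
        counts[name] += 1
    return counts
```

Passing an open file object works too, e.g. count_mapreg_lines(open(path)).most_common() lists the variables taking up the most lines first.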
Nard
Knight
Posts: 1113
Joined: 27 Jun 2010, 13:45
Location: France, near Paris

Re: a little analysis of lag

Post by Nard »

In my experience, the laggiest period is roughly 18:00 to 24:00 server time, though lag also happens at the time you were playing. It could be interesting to repeat the experiment in that time interval.

The laggiest areas in game are Candor, GY and Cindies events. This suggests that lag occurs mostly when there are many players, mobs and drops on the same map. Thus, when the load on the clients, the network and the server(s) increases, the frequency and duration of lag increase too. The CRC guild can offer Candor, a character and keys if you want to test again during these events.

I watched Manaplus's pings while playing and did not notice any periodicity at multiples of 5 minutes in the lag. Eyes are not a very reliable tool, though.

I would be interested to see the history of the command/request buffers along with the CPU load (this applies to the client too).
"The language of everyday life is clogged with sentiment, and the science of human nature has not advanced so far that we can describe individual sentiment in a clear way." Lancelot Hogben, Mathematics for the Million.
“There are two motives for reading a book; one, that you enjoy it; the other, that you can boast about it.” Bertrand Russell, Conquest of Happiness.
"If you optimize everything, you will always be unhappy." Donald Knuth.
0x0BAL
Peon
Posts: 40
Joined: 19 Dec 2012, 11:36

Re: a little analysis of lag

Post by 0x0BAL »

Another kind of lag occurs when a player drops a lot of items: he may notice it himself, but other players lag very badly.
o11c
Grand Knight
Posts: 2262
Joined: 20 Feb 2011, 22:09
Location: ^ ^

Re: a little analysis of lag

Post by o11c »

Jenalya wrote: - $NPC_NURSE, which is used and not a problem, since it's only one
- $state, which I have no idea what it is, due to this wonderful descriptive name...
$state is from world/map/npc/007-1/voltain.txt

For these - how important is it really that they be persistent across restarts? Though I agree that as they're only one each, they're relatively insignificant.

Jenalya wrote: Regarding deleting variables. Would it be safe to delete them directly from the save file during the next content release while the servers are shut down? (Of course also remove them from scripts where necessary.)
Or would it be better to e.g. add them to the clear_vars function?
Rather, I was thinking in an OnInit function.
Jenalya
TMW Adviser
Posts: 718
Joined: 22 Sep 2010, 20:28

Re: a little analysis of lag

Post by Jenalya »

V0id and I did some commits to remove some global variables from the scripts and I added an invisible NPC to clear the variables we want to remove: https://github.com/jtoelke/tmwa-server- ... 1b2162a85c
I tested locally and it fails to delete the string variables. How can I delete them properly?
Nard
Knight
Posts: 1113
Joined: 27 Jun 2010, 13:45
Location: France, near Paris

Re: a little analysis of lag

Post by Nard »

Shouldn't the posts about variables be better under variable exhaustion topic? :roll:
Jenalya
TMW Adviser
Posts: 718
Joined: 22 Sep 2010, 20:28

Re: a little analysis of lag

Post by Jenalya »

Nard wrote:Shouldn't the posts about variables be better under variable exhaustion topic? :roll:
That topic is about player variables, which are saved in world/save/athena.txt.
What I posted is about global variables, which are saved in world/map/save/mapreg.txt by the map-server. Based on what o11c observed and described in the first post, saving them might be a cause of the lag.
o11c
Grand Knight
Posts: 2262
Joined: 20 Feb 2011, 22:09
Location: ^ ^

Re: a little analysis of lag

Post by o11c »

Jenalya wrote:V0id and I did some commits to remove some global variables from the scripts and I added an invisible NPC to clear the variables we want to remove: https://github.com/jtoelke/tmwa-server- ... 1b2162a85c
I tested locally and it fails to delete the string variables. How can I delete them properly?
Not sure ... are you sure you're waiting long enough for it to actually save?

I might have time to check on this, but whether I do or not, the relevant breakpoints would be set on mapreg_setregstr and script_save_mapreg.
Jenalya
TMW Adviser
Posts: 718
Joined: 22 Sep 2010, 20:28

Re: a little analysis of lag

Post by Jenalya »

o11c wrote:Not sure ... are you sure you're waiting long enough for it to actually save?
The integer variables were successfully deleted, so yeah.
straelyn
Novice
Posts: 117
Joined: 04 Jan 2013, 21:56

Re: a little analysis of lag

Post by straelyn »

Using the debug feature I've been able to narrow down the two different types of lag I tend to see.
The first is occasionally when fighting mobs I see (in debug/network tab) the ping go from 160ms to 1100ms (when there's a 1 second lag), or ~2000 (when there's a 2 second lag), etc.
The second I only really see when there's a spawn party in town, I noticed my fps drops from 50 down to 5. If I enable texture compression in performance settings, the fps goes back up to normal, but a lot of the images are displayed incorrectly. I assumed it's a problem with my system (relatively new laptop to me, relatively new system I've been building), and it rarely ever happens mind you, but in case there's a possibility this is some kind of client bug I figured I'd mention it here.

edit: Correction, regarding texture compression, I've now got it working (it would seem). Sorry I doubted :wink:
o11c
Grand Knight
Posts: 2262
Joined: 20 Feb 2011, 22:09
Location: ^ ^

Re: a little analysis of lag

Post by o11c »

straelyn wrote:Using the debug feature I've been able to narrow down the two different types of lag I tend to see.
This topic is about server-side lag (ping), not client-side pseudolag (fps).

Although for future reference, I can't say for certain that what I observed is the *only* source of "true" lag.
Jenalya
TMW Adviser
Posts: 718
Joined: 22 Sep 2010, 20:28

Re: a little analysis of lag

Post by Jenalya »

o11c wrote:
Jenalya wrote:V0id and I did some commits to remove some global variables from the scripts and I added an invisible NPC to clear the variables we want to remove: https://github.com/jtoelke/tmwa-server- ... 1b2162a85c
I tested locally and it fails to delete the string variables. How can I delete them properly?
Not sure ... are you sure you're waiting long enough for it to actually save?

I might have time to check on this, but whether I do or not, the relevant breakpoints would be set on mapreg_setregstr and script_save_mapreg.
I had a second look at the issue and noticed there was a fault in my script: it skipped the reset of the loop counter. Sorry for the confusion.
I fixed that, and with the current version we can reduce the size of the mapreg.txt to 18 lines (instead of 63 before).
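
For readers wondering how a missing counter reset can delete the integer variables but leave the string ones: the sketch below shows one way it could happen. It is a guess written in Python, not the actual NPC script (which is not shown in this thread and may differ); the variable names are hypothetical and deletion is modeled as removing a dict key.

```python
def clear_vars(store, int_names, str_names, reset_counter=True):
    """Remove the named variables from store using one shared loop counter."""
    i = 0
    while i < len(int_names):
        store.pop(int_names[i], None)
        i += 1
    if reset_counter:
        i = 0  # without this, i already equals len(int_names)
    while i < len(str_names):
        # With the stale counter this loop starts too far along and may
        # never run, so the string variables survive.
        store.pop(str_names[i], None)
        i += 1
    return store
```

With reset_counter=False and at least as many integer names as string names, the second loop body never executes, which matches the symptom of only the integers being cleared.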
BoomerTheKran
Peon
Posts: 47
Joined: 11 Dec 2010, 22:07
Location: Kentucky, USA

Re: a little analysis of lag

Post by BoomerTheKran »

This is just my 40% of a nickel. Dunno for sure if it's useful.

If you watch the client-side ping in debug window of manaplus, where lag exists for the player, you can see ping change to higher numbers when someone talks, logs in or out, or changes clothes, or spins, and sometimes walks through a door. This frequently happens when any player does any of those actions.

Those CPU spikes on the server parts are still probably an issue.

To verify whether there are truly networking lag issues, you might try ntop on the server, http://www.ntop.org, with the logging option. It would take a while to sort through the log (though there are scripts to help), but it might show spikes and maybe uncover some culprits. I haven't used ntop in a while, but when I did, it pointed to changes in packet compression that sped things up (I made the router handle compression and the server just spit things out uncompressed, as routers are more streamlined for that sort of thing). There might even be unneeded packets being sent in/out too, which could be an issue with anything in the chain (server to client, or even within the client or server individually).
This IS a sig
shargom
Newly Registered User
Posts: 10
Joined: 19 Jan 2013, 16:54

Re: a little analysis of lag

Post by shargom »

Maybe just focus all your powers on developing manaserv, instead of literally wasting time on eAthena server:codename "The NeverEnding Problems"?