In real life, I specialize in performance tuning of Linux servers. Here are some things I've learned about computer performance.
- If your process is running on CPU 1 and CPU 2 is busy, that's irrelevant. If CPU 3 is also busy, that's still irrelevant.
- If the computer has unused RAM, RAM is not a problem.
- Linux processes write to the filesystem buffer. Disk commits happen asynchronously (basically non-blocking), unless you write code that specifically calls for a sync. eAthena does not do this.
- Amount of disk used is irrelevant except in extreme situations (e.g. an SMTP server performing 4000 transactions per second, at 90% full). eAthena is nothing close to an extreme situation.
- A computer has a limited number of Input/Output operations it can process per second. Performance is generally unaffected until the computer gets close to that limit. More below.
- Performance data is independent of the tool used to measure it. Some tools are more accurate or more precise than others. This is why I get frustrated when someone insists that their favorite tool will give better answers. Unless you know how your tool is superior to the ones already used, you're arguing that your measuring tool will somehow improve absolute performance, which is nonsense.
Input/Output:
I have found I/O to be the most complex and poorly understood aspect of computer performance. Without going too far into the weeds, I'll relate some things I have learned from experience. Unfortunately, I don't have external sources for this.
Computers have different types of I/O. The most common limitation is disk access. (This is also the only type relevant on this server.) It's easy to see how any computer runs more slowly when it's busy accessing the hard drive. Read operations require the program to wait until it receives the data. Write operations are much more forgiving. Most operating systems, including Linux and Windows, buffer writes for a few seconds so that programs can continue without waiting. This works well unless the demands are so heavy that the computer cannot keep up. In that situation, programs must wait for space to free up in the buffer. The computer itself can also be so busy catching up on disk operations that everything else gets bogged down. Note that occasional and minor disk operations have no measurable effect on performance.
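To see the difference between buffered writes and writes that wait for the disk, here is a minimal Python sketch. It is only an illustration, not anything eAthena does: the function name `write_records`, the 512-byte record size, and the file names are made up for the example. The timings will vary a lot with the hardware and filesystem, and a temp directory that lives in RAM will show almost no gap.

```python
import os
import tempfile
import time


def write_records(path, count, sync_each_write):
    """Write `count` small records; optionally force each one to disk.

    Returns the elapsed wall-clock time in seconds.
    (Hypothetical helper for illustration only.)
    """
    record = b"x" * 512
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(record)
            if sync_each_write:
                f.flush()             # hand Python's buffer to the kernel
                os.fsync(f.fileno())  # block until the kernel commits it to disk
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        buffered = write_records(os.path.join(tmp, "buffered.dat"), 200, False)
        synced = write_records(os.path.join(tmp, "synced.dat"), 200, True)
        print(f"buffered: {buffered:.4f}s  synced: {synced:.4f}s")
```

On a real disk the synced run is normally the slower one, because each `os.fsync` call makes the program wait for the commit instead of leaving it to the operating system.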
To test whether disk I/O affects lag, I used iostat to measure disk operations on the Platinum server. I then used rsync to copy thousands of small files as fast as possible, asked players about lag in the game, and measured again. I cancelled the file copy, measured again, and again asked about lag.
Before the test, Platinum did about 100 reads+writes per second.
During the test, Platinum did 15000-23000 reads+writes per second.
After the test, Platinum did about 100 reads+writes per second.
Players reported no difference in lag when disk I/O increased by a factor of roughly 200, and I saw no difference either.
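The factor quoted above is a rounded figure. Working it out from the measurements listed (the variable names are just for this example):

```python
baseline_iops = 100               # approximate reads+writes per second before and after the test
test_iops_range = (15000, 23000)  # reads+writes per second observed during the test

low_factor = test_iops_range[0] / baseline_iops
high_factor = test_iops_range[1] / baseline_iops
print(f"I/O increased by a factor of {low_factor:.0f} to {high_factor:.0f}")
# prints: I/O increased by a factor of 150 to 230
```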
I think this test conclusively demonstrates that reducing disk operations on the server will have no measurable effect on lag.
Measurement tools:
Different tools have different degrees of accuracy and precision.
"top" is useful for an overall view. It presents data using a combination of averages (CPU metrics) and instantaneous samples (process state). Averages are not precise, and instantaneous samples are not necessarily accurate.
To measure disk activity, I used iostat at 1-second intervals. This gives detailed averages per device, and in my experience the true figure lies between the lowest and highest readings. (This is why I gave a range.)
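For anyone who wants to cross-check numbers like these without iostat, here is a rough Python sketch that samples the kernel's counters in /proc/diskstats directly. This is not how I took the measurements above; it is an alternative under stated assumptions. The function names and the device name "sda" in the usage note are hypothetical, and it counts completed reads plus completed writes per interval, then reports the lowest and highest values seen.

```python
import time


def parse_diskstats(text):
    """Return {device: (reads_completed, writes_completed)} from /proc/diskstats text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        # Field layout: major, minor, device name, reads completed, reads merged,
        # sectors read, ms spent reading, writes completed, ...
        counters[fields[2]] = (int(fields[3]), int(fields[7]))
    return counters


def ops_per_second(before, after, device, interval):
    """Reads+writes per second for `device` between two snapshots."""
    reads = after[device][0] - before[device][0]
    writes = after[device][1] - before[device][1]
    return (reads + writes) / interval


def sample_range(device, interval=1.0, samples=10, path="/proc/diskstats"):
    """Take `samples` measurements and return (lowest, highest) ops per second."""
    rates = []
    with open(path) as f:
        previous = parse_diskstats(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            current = parse_diskstats(f.read())
        rates.append(ops_per_second(previous, current, device, interval))
        previous = current
    return min(rates), max(rates)
```

Calling `sample_range("sda")` on a Linux machine would take ten 1-second samples and return a (low, high) pair, which is the same kind of range I reported. The sleep is not perfectly exact, so treat the results as approximate.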
One final note: if the main server is really limited by hardware performance, we should easily be able to reproduce the problem on the testing server, which is much weaker.