Combat data availability.

Post by **Freeyorp101** » 17 Apr 2013, 08:51

Combat has long been a problem with tmwAthena. Mechanics have typically been added in an ad-hoc way, rather than by following some overarching concept or plan.

Part of the problem with wider participation is that it's really quite hard to come up with reliable, reasonable data to discuss things from. In the past, there have been some modifications to clients for some degree of processing - the *.Q clients had a nice Killstats system, which many clients have now added or emulated in some form.

Unfortunately, this is still all very localised, and it's hard and time-consuming to get a decent overall impression from this.

However, there is another option available.

Around four years ago, Fate changed the map server to log, amongst other things, basic combat information. What gets attacked, with what, for how much, where, when and more. This is the closest TMW has had to a complete record of everything. The downside is that this is quite a lot of data - the sample data I asked for, a day's worth of logs, has nearly 3 million records.

To try to make sense of things, I've been working on and off on a tool to chart the information. Here's an old screenshot:

Screenshot (shuffling.png): from 30 Mar 2013 - it should have improved a little since then.

It's made using [crossfilter] (that page has a demo so you can get an idea of what it's like), so it's all interactive, which is half of the fun. A screenshot doesn't come close to conveying what you can find out. Unfortunately, this poses a problem.

The tool itself is, naturally, all [open source] and [publicly accessible]. The data, however, is not.

Historically, the logs have been restricted, leaving analysis to the tiny intersection of server administrators and active developers, with rare exceptions, such as Fate.
This is with good reason, as without processing, the raw logs contain all manner of interactions, such as every time someone logs on, every time someone logs off, every step of every trade and more.

The tool I've made can operate without needing this level of detail.
In particular, it uses or will use:

Every hit and miss in combat, the source, the target, the damage dealt, and the weapon used.
Every spell cast, and whether it succeeded or failed.
Every experience gain instance, whether from combat, healing, or scripts.
Stat allocations when someone logs in, out, or allocates points. The logging in and out isn't as important as the stat information - the timestamps can be zeroed out and the records moved to the start of the combined logfile.
The timestamp, map, and coordinate positions of all of the above. The logs use timestamps accurate to fractions of a second, but this can be blurred to a fuzzier level of detail without losing any information important to combat analysis.

Player characters are identified by their numeric database ID. This can be similarly blurred.
Everything else can be removed before it's made available.
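
As a rough illustration of the filtering described above, here is a minimal Python sketch. The actual log format is not shown in this thread, so the record layout (dicts with `kind`, `time`, `map` and `char_id` keys), the record-type names, the sample values and the `scrub` helper are all hypothetical.

```python
# Hypothetical record types; the real log vocabulary is not given in the thread.
COMBAT_KINDS = {"hit", "miss", "spell", "xp", "stats"}

def scrub(records, time_step=10):
    """Keep only combat-relevant records and coarsen their timestamps.

    `records` is an iterable of dicts; `time_step` is the blur granularity
    in seconds (an arbitrary choice for this sketch).
    """
    for rec in records:
        if rec["kind"] not in COMBAT_KINDS:
            continue  # logins, trades and so on never reach the public file
        out = dict(rec)
        if rec["kind"] == "stats":
            out["time"] = 0  # stat records only need their content, not their time
        else:
            # Round the timestamp down to a multiple of time_step.
            out["time"] = int(rec["time"] // time_step) * time_step
        yield out

# Made-up sample data for illustration only.
sample = [
    {"kind": "hit", "time": 1366188663.42, "map": "example-map", "char_id": 12345678, "damage": 40},
    {"kind": "trade", "time": 1366188670.10, "map": "example-map", "char_id": 12345678},
    {"kind": "stats", "time": 1366188600.00, "map": "example-map", "char_id": 12345678, "str": 30},
]
public = list(scrub(sample))
```

Here `public` keeps the hit and the stat record, with blurred or zeroed timestamps, and drops the trade entirely.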

The question now is how people feel about making combat data available.

On one hand, once someone can infer which character an ID corresponds to (regardless of whether it was blurred from the database ID), which isn't impossible if they already have a rough idea of what that person does anyway, they can link all of this precise activity information to that character.
I've always been an outspoken privacy advocate, and I can't imagine everyone would be comfortable with this possibility.

On the other hand, availability and making things open to participation has often been a problem for the project, and was certainly a problem the last time people tried to fix mechanics.
Discussions become awkward and uneven where one side can refer to matters that others cannot see, and with the limited developer resources the project has always had, making things as open, and with as low a barrier to entry, as possible has always been in the best interests of the project.
When the oracle is locked away, the weight is given to the priests, so to speak. Not to mention that it's kinda fun playing around with filters and seeing what you can find out.

At the very least, it should be possible to put up a test server specifically for data collection. It'll never be as representative or as useful as taking even a small snippet from main, but at least people might get something to refer to and talk about.

On the other side of things, how would people feel about going as far as ongoing collection and analysis on main, automatically keeping everything up to date?

A compromise could be keeping things to a data collection weekend, where people are informed in advance and in the news before they connect that combat logs will be made available for mechanics analysis and balancing efforts. How would people feel about this?

So, where do you stand? These are by no means the only options, so if you've got an idea, feel free to bring it up.

---Freeyorp

Post by **Big Crunch** » 17 Apr 2013, 14:19

If you eliminate the possibility of pinning X action on Y player, ie keeping the stats anonymous, then I'm 100% for this in either form.

BC

Post by **veryape** » 17 Apr 2013, 14:30

As Big Crunch said, keep it anonymous or make it an opt-in in some way.

A solution that I thought might work is to publish only data that is, say, a month old. That should make it really hard to track down who is who from the stats, since you can't possibly remember when a player did what and draw conclusions from that about who is who.

Another possibility might be to wipe the timestamps, or just set them to time 0 when the log starts and then count timestamps from that, and not reveal when the data was originally fetched. It would be really interesting to see those stats, and under the circumstances I described I don't feel that any "personal data" is revealed, since it would be nearly impossible to decipher which character is behind which data.

Post by **Freeyorp101** » 17 Apr 2013, 14:41

Big Crunch wrote:If you eliminate the possibility of pinning X action on Y player, ie keeping the stats anonymous, then I'm 100% for this in either form.

Data is anonymised, but must keep a consistent identifier within the dataset. This means that a numeric ID, say 12345678, is used to denote a character. This can either be the database ID directly, or with some form of noise applied to generate some blurred ID, say 12344383, which would make it even harder to figure out who's who.

One of the advantages offered by keeping this to specific events is the possibility of generating a different blurred identifier each time, which not only makes it very difficult to guess who is who, but also limits its relevance to the time for which it ran. If I see ID 12344383 is playing in the places where I think Big Crunch would play and guess that 12344383 is Big Crunch, I wouldn't be able to apply this to any later dataset. This would also make it near impossible to confirm that one's guess is correct.
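
One way to get an identifier that is consistent within a dataset but unrelated between datasets is a keyed hash with a fresh secret per release. The Python sketch below shows that idea only; it is not necessarily the "noise" scheme meant above, and the `make_blurrer` helper and the 8-digit output range are assumptions.

```python
import hashlib
import hmac
import secrets

def make_blurrer(key=None):
    """Return a function mapping database IDs to per-dataset pseudonyms."""
    # A fresh random secret per export; it would be discarded afterwards.
    key = key or secrets.token_bytes(32)

    def blur(char_id):
        digest = hmac.new(key, str(char_id).encode(), hashlib.sha256).digest()
        # Truncate to an 8-digit number. Collisions are possible, so a real
        # export would need to check for them.
        return int.from_bytes(digest[:8], "big") % 100_000_000

    return blur

weekend_one = make_blurrer()
weekend_two = make_blurrer()
same_within = weekend_one(12345678) == weekend_one(12345678)
```

Within one export the same character always maps to the same value, while a second export made with a new key gives values that cannot be matched up without the keys.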

---Freeyorp

Post by **Big Crunch** » 17 Apr 2013, 14:48

Freeyorp101 wrote:
Big Crunch wrote:If you eliminate the possibility of pinning X action on Y player, ie keeping the stats anonymous, then I'm 100% for this in either form.
Data is anonymised, but must keep a consistent identifier within the dataset. This means that a numeric ID, say 12345678, is used to denote a character. This can either be the database ID directly, or with some form of noise applied to generate some blurred ID, say 12344383, which would make it even harder to figure out who's who.

One of the advantages offered by keeping this to specific events is the possibility of generating a different blurred identifier each time, which not only makes it very difficult to guess who is who, but also limits its relevance to the time for which it ran. If I see ID 12344383 is playing in the places where I think Big Crunch would play and guess that 12344383 is Big Crunch, I wouldn't be able to apply this to any later dataset. This would also make it near impossible to confirm that one's guess is correct.

---Freeyorp

If the randomized ID is used, I see no reason why this shouldn't happen in either of the instances. I voted for the second merely because I would hate to think of someone having to constantly pull and handle the data (or commit to doing so regularly and frequently).

BC

Post by **AnonDuck** » 17 Apr 2013, 15:10

There is no way to anonymize the data, even by scrubbing the user ID and timestamps.

If I took 5 minutes to modify a client to generate similar logs, a 10-line Perl script could match everyone's identity in short order. To prevent this sort of thing the data would have to be mangled into near-uselessness.

A better suggestion is to allow users to opt-in for public data collection somehow.

Post by **argul** » 17 Apr 2013, 15:14

I voted for
> Give us full, ongoing, and up to date information

This makes sure that any participant in the balancing discussion has the same data basis and the discussion is fair and equal.

However I'd like to stress the point that the 'full' should be only as full as it respects the privacy of the players.
So do not gather data which would violate privacy in the first place, and blur out (randomize) any data which could be traced back to reveal a specific player's identity.

Post by **Nard** » 17 Apr 2013, 15:42

My opinion is that such data should be public and widely open (maybe with some encoding for account/character information). The reason is that the conclusions we may draw from data analysis can heavily depend on the method employed to represent or model them. So if any discussion arises about a specific point, those who disagree must have access to the same dataset in order to be constructive. I hope narrow conceptions of privacy will not prevent us from using such interesting data. Closed-source stats are useless, as you can make them say what you want.

As a side remark, when I need powerful computing for data analysis, I use R, which covers almost anything you could need, from the student to the most skilled statistician. Of course R is free software: it is a GNU project. It compiles and runs on a wide variety of UNIX platforms, as well as Mac OS X and Windows. The community of R users and developers is huge, so it is rather easy to get help with it.

Post by **AnonDuck** » 18 Apr 2013, 10:48

I guess nobody got what I said about anonymizing the data..

Say I modified my client to generate the same sort of logs. I'm in a popular area like the Hurnscald Mines just playing as normal.

When I am done playing I can compare my own logs to the server logs and look for sequences of actions from other players (move here, attack a slime for 40 damage, take 10 damage from another slime, healed by another player, runs away and sits). I can then cross-reference these unique sequences to de-anonymize the server log for all players I have witnessed perform actions. This is very easy to automate.
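
The cross-referencing described above can be sketched in a few lines of Python. The event tuples, the `link_identities` helper and all of the data are made up for illustration. The point is only that a handful of witnessed (time, map, action, amount) events tends to single out one pseudonymous ID.

```python
from collections import Counter, defaultdict

def link_identities(witnessed, published):
    """Guess which published ID belongs to each witnessed character name.

    witnessed: list of (name, event) pairs recorded by a modified client.
    published: list of (blurred_id, event) pairs from the released log.
    An event is any hashable tuple, e.g. (coarse_time, map, action, amount).
    """
    ids_by_event = defaultdict(set)
    for blurred_id, event in published:
        ids_by_event[event].add(blurred_id)

    votes = defaultdict(Counter)
    for name, event in witnessed:
        for blurred_id in ids_by_event.get(event, ()):
            votes[name][blurred_id] += 1

    # For each name, pick the ID that matched the most witnessed events.
    return {name: counts.most_common(1)[0][0] for name, counts in votes.items()}

# Made-up sample: two IDs share one event, but only one matches both.
published = [
    (12344383, (100, "mines", "hit", 40)),
    (12344383, (110, "mines", "hurt", 10)),
    (55501234, (100, "mines", "hit", 40)),
    (55501234, (120, "mines", "hit", 35)),
]
witnessed = [
    ("SomePlayer", (100, "mines", "hit", 40)),
    ("SomePlayer", (110, "mines", "hurt", 10)),
]
guesses = link_identities(witnessed, published)
```

In this toy case a single shared event is ambiguous, but the second witnessed event is enough to settle on one ID. Blurring timestamps only makes each event less distinctive, so longer sequences are needed.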

Freeyorp101 wrote: Data is anonymised, but must keep a consistent identifier within the dataset. This means that a numeric ID, say 12345678, is used to denote a character. This can either be the database ID directly, or with some form of noise applied to generate some blurred ID, say 12344383, which would make it even harder to figure out who's who.

This means that once I figure out who is who, I can track their every move throughout the entire data set.

There is no way I can think of to prevent this that wouldn't make the logs useless. Making the logs anonymous is not an option.

Post by **Nard** » 18 Apr 2013, 12:37

We are starting the endless talk which occurred with Pjotr's online list data all over again.
My opinion is that this information includes no personal data, and thus can be published. The only problem resides in compatibility with roleplay. The decision does NOT belong to players but to TMWC. Take your responsibilities, with this in mind: in statistics, open source means open data.
So the choice is a binary one for me: make the data public, or remove the useless logs and former data from the server; you will not be able to take decisions based on their results anyway, and the server load will be a bit lighter.

Edit, to MadCamel: make a watching bot (it requires just small modifications to a whisper bot); you will obtain good results a lot faster and with less effort than by cross-correlating these data (I already said that in the discussion about the online list).

Edit 2: I didn't vote because my opinion is not in the possible choices: give the data, as complete as possible over a long time period (a year, yearly or similar).

Post by **AnonDuck** » 19 Apr 2013, 05:45

I'm just making sure everyone knows the implications. The data can't be anonymized and includes quite a lot of information that is not otherwise available to the public. Full info on Trades, Stats, Item pickups, Spells, Exp gains, etc etc etc. About the only thing it doesn't log is private whispers.

This is a bit different from the online list, which is 100% public data and can be used any way people like (in my opinion).

The dataset includes private transactions between players and information that is not otherwise easily gained. For example if it's released publicly I'll know how much gp people I'm selling to have so I can raise prices to match.. Even if we exclude trades I'll still know how many red slimes they've killed in the past month without having to follow them everywhere (That'd be really HARD and time-consuming work)

And these are just the things *I* can think of and easily implement. Someone else might have even better ideas. Really it just opens a whole can of messy worms. I personally am not comfortable with other players having this level of detail about my activities.

I really do want some better analysis of the server data. I think it could really help the content people balance the game.. But going 100% public with it makes me uncomfortable. I think if anyone expresses interest in processing the data TMWC should talk to them and decide on a case-by-case basis if they should have it.. And they should agree not to share it with others.

Maybe even post a sticky topic with instructions on how to ask for data access.

Post by **Nard** » 19 Apr 2013, 07:14

When you make the scripts, the game and the code open source, you do not care if it breaks the balance and role play. And it breaks the balance and role play:
Players who are able to find their way around GitHub and to understand script files have an advantage over those who can't. (Wiki spoilers counter-balance this a bit). This even obliged the devs themselves to cheat with open source (Hitchhiker) by using magic to get answers from the player and hiding magic.conf, in order to keep the answers hidden.
Players who have programming skills, and who used or programmed bots, broke the balance and got huge advantages over others. The four-leaf clover price dropped (to 1/10 of its former value now) after ManaPlus users were able to select their targets. And money is not the only field where bot users have an unbalanced advantage...
Regarding the economy, thanks to TradeBot and ManaMarket, we can have an objective idea of price statistics; this makes the market a bit more fair. Their features should be included in the server as they are in other games. I don't think that the server data would change this a lot: they take into account gifts, or exchanges between friends or alts and mules, which will bias the global results.
Exp. gains and money accumulation are usually used in other games to rank players, guilds... Even if I don't care about that when forming an opinion about players, I cannot see anything private in this topic.
Who is trustworthy and who is not is also a big question. In my opinion there are even persons in TMWC who showed that they were not trustworthy.
My opinion is, just as it was with Pjotr's stats:
I really do want some better analysis of the server data. I think it could really help the content people balance the game.. Open source requires that the data are 100% public. I have an interest in processing the data and would use them only for TMW's interests. TMWC should not decide on a case-by-case basis who should have them, but only whether the data are kept or not. I will never agree not to share them with others, and will fight against any published result drawn from them which cannot be verified.
Statistics is a matter where it is easy to make misinterpretations, errors and mistakes, in the best case. In the worst one, it is somewhat easy to present partial results to make them say what you want them to say. Thus no public data means untrustworthy conclusions; every one of us, in our own countries, can easily find examples where two politicians draw opposite conclusions from the same public data.

Edit: Public data doesn't mean that conclusions are right: you must also state your hypotheses and computation process.

Post by **Nard** » 19 Apr 2013, 08:20

Example of a common mistake in data interpretation:

Consider the following data set:

X	Y
0	5,3433968
1	4,8812892
2	4,4735472
3	4,1201708
4	3,82116
5	3,5765148
6	3,3862352
7	3,2503212
8	3,1687728
9	3,14159
10	3,1687728
11	3,2503212
12	3,3862352
13	3,5765148
14	3,82116
15	4,1201708
16	4,4735472
17	4,8812892
18	5,3433968

If you compute the correlation coefficient of these data, you should find zero. Because we often forget that "correlation coefficient" is in fact "linear correlation coefficient" we may conclude that there is no relation between X and Y.
But we generated these data with the strong functional relation:

Y=0.0271828*(X-9)²+3.14159

The correct conclusion is: there is no linear relation between X and Y. It is impossible to find the mistake without the data, or with only a part of them: the data starting from X = 9 give a linear correlation coefficient of 0.96, from which you might conclude that there is a linear relation between Y and X, which is obviously false too. (A plot would show it, even though the data are not incompatible with a linear model.)
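
For anyone who wants to check these figures, here is a small Python sketch that regenerates the data from the formula above and computes the linear correlation coefficient by hand. The `pearson` helper is written for this example and is not taken from any particular library.

```python
import math

def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Regenerate the table: X = 0..18, Y = 0.0271828*(X-9)^2 + 3.14159
xs = list(range(19))
ys = [0.0271828 * (x - 9) ** 2 + 3.14159 for x in xs]

r_full = pearson(xs, ys)          # all 19 points
r_half = pearson(xs[9:], ys[9:])  # only the points with X >= 9
print(r_full, r_half)
```

The first value comes out as zero up to floating-point rounding, because the points are symmetric around X = 9, and the second is about 0.96, even though the same quadratic relation generated both.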

Post by **AnonDuck** » 19 Apr 2013, 09:30

You know way more about statistics etc than I do Nard.. That flew way over my head. But I can understand your point about verifiable results.

I think what I'm trying to say here and my motivation for arguing my points can be summed up pretty simply:

a) Anyone who thinks the data could be made anonymous is probably wrong.

b) If the data were made public I would like the ability to opt out.

c) I would also feel very bad about the project as a whole. TMW would be joining the ranks of Every Big Company On The Planet(tm) by gathering and distributing detailed information that could be used against people. By default. I find this type of behavior to be incredibly distasteful on a very personal level. Even when it's not being done for profit. I don't want to feel bad about TMW.

Good thing arguments with Nard are usually constructive.

Post by **Nard** » 19 Apr 2013, 13:08

MadCamel wrote: I would also feel very bad about the project as a whole. TMW would be joining the ranks of Every Big Company On The Planet(tm) by gathering and distributing detailed information that could be used against people. By default. I find this type of behavior to be incredibly distasteful on a very personal level. Even when it's not being done for profit. I don't want to feel bad about TMW.

I felt and feel very bad when I saw/see some persons in GHP/TMWC. I felt very bad before and after the move because some things were done and said which were beyond the limits of the normal relations between people who participate willingly in a common open source and free project, and even beyond the legal limits. I feel bad when a developer is able to hide information or even lie to justify his decisions. I felt and feel bad when a developer speaks about TMW as a "sinking boat" and still wears the dev cap. I feel bad when there is a confusion between real persons' privacy and character roleplay. I feel bad when a player announces out-of-proportion progression and asks to be able to skip the quests to make it even faster (what is the point of writing stories and drawing nice items then?). I feel bad when a drunk admin calls me a bastard or a GM calls me an asshole, and both call us to common sense or ban a nice guy for calling another player "idiot" without trying to understand what he could feel (even if he was wrong). I feel sad when I see nice persons despise others because they think differently, or get angry because they are unable to accept being wrong. Finally I wonder which planet I live on, and get rather angry, when I see people able to build a fake web site to phish for some virtual hats, or to spend even a little time to DoS or DDoS the server or its competitors/forks, because they want to grind a small part of virtual power in a tiny obscure project. What a shame! (and I don't speak about the intrigues of some past GM candidates)
But I feel deeply well when I see such a job as Illia's sisters. I feel wonderfully good when I see how people can give their time and money to host and support this project (yes I include Platyna). I feel well when I see how generous and nice the GM I quoted can be with other players. I am in a good mood for several days when I see Gina's trolls, or after her perpendicular events (parallels go in the same direction). I am full of admiration when I see the amount and quality of work accomplished here by Jenalya, Alige, 4144,... over the last two years, and before that by crush, i, FotherJ, o11c, modanung, all GMs... and many others whose work has been more hidden or even not released. Finally I am happy because I met very nice persons here, from 12 (maybe lower) to 60 (maybe more), from Mexico to Indonesia, from Russia to Australia and Canada to Poland via Sweden, from engineer to waitress or schoolboy to doctor, who all brought me this little something that makes real life worth living. (sometimes it can just be a big kick in the ass)

Which conclusions can I draw from that? Mainly that this project is rather human, that all developers of the various forks have roughly the same goals, and that almost all of them are, on the whole, trustworthy and nice persons.
There was a hot debate about Pjotr's online list data before. The points of friction were similar to those developed here. Anybody was able to get these data, and they remained public for a long time after the debate was closed (maybe they are still public). No one tried to exploit them. Why? Because, as I said at that time, the job was not worth the result, and you can get the same (maybe more) with a little observation time and some wisdom.
This is why I have no worry about these data.

Remark:
When we look with some distance at the past and all the misunderstandings and disagreements, not to say stupid behaviours, that occurred, a question arises: how much time and development power was lost with all the forks that appeared and died out? Wisdom should make us able to draw lessons from the past.
