Maintaining Accuracy

Kent Coble edited this page Feb 22, 2020 · 1 revision

How accurate is the data pulled?

There have been comments in the past questioning the accuracy of the data here, so let's go over a few things.

Polling Frequency

Polling can only happen as fast as the API allows. By default, Server API keys are limited to 5 IP addresses and 20 requests per second. Client API keys have no IP address limit but can only query up to 10 times per second. Assuming you ran everything on a PC at home with a Server API key:

20 requests/sec * 100 players/request = 2000 players/sec

1,083,600,000 - 1,073,740,000 = 9,860,000 PS4 Player IDs

15,200,000 - 5,000 = 15,195,000 Xbox Player IDs

9,860,000 + 15,195,000 = 25,055,000 players

25,055,000 players / 2000 players/sec = 12,527.5 seconds = 208.792 minutes = 3.48 hours
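The arithmetic above can be reproduced with a short script. This is only a sketch of the estimate, not the project's actual code: the rate limit, batch size, and ID ranges are the figures quoted above, while the names `ID_RANGES` and `full_pass_seconds` are made up for illustration.

```python
# Back-of-envelope estimate of one full polling pass, using the figures above.
REQUESTS_PER_SEC = 20      # default Server API key limit
PLAYERS_PER_REQUEST = 100  # account IDs sent per request

# (first ID, last ID) for each platform, as quoted above
ID_RANGES = {
    "ps4": (1_073_740_000, 1_083_600_000),
    "xbox": (5_000, 15_200_000),
}


def full_pass_seconds(id_ranges, requests_per_sec, players_per_request):
    """Seconds needed to query every ID in every range once."""
    total_ids = sum(high - low for low, high in id_ranges.values())
    return total_ids / (requests_per_sec * players_per_request)


seconds = full_pass_seconds(ID_RANGES, REQUESTS_PER_SEC, PLAYERS_PER_REQUEST)
print(f"{seconds:,.1f} s = {seconds / 60:.3f} min = {seconds / 3600:.2f} h")
# prints: 12,527.5 s = 208.792 min = 3.48 h
```

Passing 10 requests per second instead (the Client API key limit) doubles the estimate to roughly 6.96 hours.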

This is almost exactly what I see on my own server. Without additional help, it's not possible to run the query every hour. For my own purposes, running it daily is sufficient. Since players are queried in the same order and at around the same time each day (Xbox players first, PS players last), the roughly three-and-a-half-hour gap between the first and last player has no real impact. You still get a near-24-hour window between queries for each user.

Cutting a full pass down to under an hour would be impressive, but it serves no real purpose in terms of accuracy. Whether you poll every hour or every day, the cumulative data is the same. Suppose the last player to be queried plays 15 matches between the start of the run and the moment the server sends out the batch of work containing their account ID, while the first player queried plays 20 matches after being queried but before that last player is pulled from the API. Neither case has a real impact on the data: the battles are still counted, just attributed to one day or the next. We are looking at trends over time in how often players are active, and whether a battle is counted for the previous or the current day has little impact over a 100-day view.

The Source of the Data

In the link above, it is noted that the public API draws on a different database from the one WG employees use. They also claim that tank and player performance stats are accurate, but that other information is less trustworthy. They use this to argue that the models previously presented are therefore incorrect.

While we cannot confirm or deny this, this project relies on exactly the data they claim to be accurate. By tracking the number of battles a player has participated in, we can determine how alive the multiplayer experience is. Granted, we cannot see PvE or special competition events, but players are more concerned about multiplayer battles and the time spent waiting in queue. We don't have to look at any metadata, just the number of times a player has hopped into a tank and been processed by the matchmaker algorithm.
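As a rough illustration of the idea, activity can be measured by comparing two snapshots of per-player battle totals. This is a minimal sketch under assumptions, not the project's actual implementation: the function name `count_active`, the dictionary layout (account ID mapped to total battles), and the sample numbers are all hypothetical.

```python
def count_active(previous, current):
    """Count players whose battle total rose between two snapshots.

    Both arguments map account ID -> total battles played (hypothetical layout).
    Players missing from the earlier snapshot are skipped, because there is
    no baseline to compare against.
    """
    return sum(
        1
        for account_id, battles in current.items()
        if battles > previous.get(account_id, battles)
    )


# Made-up snapshots from two consecutive daily runs
yesterday = {1001: 250, 1002: 90, 1003: 0}
today = {1001: 265, 1002: 90, 1003: 4, 1004: 12}

active_today = count_active(yesterday, today)
print(active_today)  # prints: 2 (players 1001 and 1003 played at least one battle)
```

Only the battle counter is compared, so no other field from the API has to be trusted for this measure.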