Game Server Stress Test

OK, so I started stress testing my game server, Gamend, using k6, a great load-testing tool. I wanted to see where the limits of the system are right now.

I set the stress test to run 5000 flows over 1 minute. Each flow creates a new account (a database write) and then reads its own user data 10 times (database reads), sleeping 1 s in between.
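As a rough sketch of what one flow does, here it is in Python rather than as the actual k6 script. `FakeClient`, `create_account`, `get_self` and `run_flow` are hypothetical names, the client only counts calls instead of sending HTTP requests, and placing the 1 s sleep after each read is an assumption about what "in between" means.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeClient:
    """Hypothetical stand-in for an HTTP client; it only counts calls."""
    writes: int = 0
    reads: int = 0

    def create_account(self) -> int:
        self.writes += 1
        return self.writes  # pretend this is the new user id

    def get_self(self, user_id: int) -> dict:
        self.reads += 1
        return {"id": user_id}


def run_flow(client: FakeClient, reads: int = 10,
             sleep: Callable[[float], None] = time.sleep) -> None:
    """One flow: create an account, then read the own user data `reads` times."""
    user_id = client.create_account()   # one database write
    for _ in range(reads):
        client.get_self(user_id)        # one database read
        sleep(1.0)                      # assumed: pause 1 s after each read


if __name__ == "__main__":
    client = FakeClient()
    for _ in range(5000):
        run_flow(client, sleep=lambda seconds: None)  # skip real sleeping
    print(client.writes, client.reads)
```

Each flow is 1 write plus 10 reads, so 5000 flows that all run to completion add up to 5000 writes and 50,000 reads.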

The test showed me what the current bottlenecks are and what I need to improve.

Architecture

But first, the architecture. Clients call into:

  • HTTP Layer, then
  • Business Layer, then
  • Database Layer (SQLite or PostgreSQL) or
  • External Services (OAuth provider, Email Service, etc.).

Out of Memory Problems

Right away, a lot of the calls errored out or hung after a few seconds. Looking at the Grafana metrics (image below), I saw that the app ran out of memory (the red dotted line).

[Image: first result]

The resource limits were set to the lowest values, as I had only been in development until now:

  • 1 vCPU and 256 MB of RAM (about $3 per month).

So I increased to:

  • 4 vCPUs and 1 GB of RAM ($8 per month).

[Image: second result]

Now, even at 1000 simultaneous calls (a hardcoded limit in the app), it no longer died from running out of memory.

A quick note on the 1000-call hard limit: it guarantees that some of the 5000 flows will error, but this is by design, as I wanted to test that case too.
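A hard cap on simultaneous calls can be sketched roughly like this in Python. This is not Gamend's implementation: `ConcurrencyLimit` and `handle` are hypothetical names, and answering with a 503 when the cap is hit is an assumption, since the post only says that those flows error.

```python
import threading
from typing import Callable


class ConcurrencyLimit:
    """Reject work once `limit` calls are already in flight."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False right away when no slot is free.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def handle(limit: ConcurrencyLimit, work: Callable[[], None]) -> int:
    """Return an HTTP-style status: 200 on success, 503 when over the cap."""
    if not limit.try_acquire():
        return 503  # assumed status code for "too many calls in flight"
    try:
        work()
        return 200
    finally:
        limit.release()  # always free the slot, even if work() raises


# The hardcoded limit from the post would correspond to:
app_limit = ConcurrencyLimit(1000)
```

Rejecting immediately instead of queueing keeps memory use bounded, which fits the goal of not running out of memory under load.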

Database Timeout Errors

With the crashes out of the way, I got to a 77% success rate on the test: only that share of the calls succeeded, and the remaining 23% failed with 5xx errors:

[Image: failures]

I investigated a bit, and the issue was the database writes and reads. I have a queue for the database, and any database operation that waits in it for more than a few seconds is immediately cancelled and results in a 5xx error (this ensures that queued calls don’t pile up and cascade into one error after another).
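The queue-with-timeout behaviour can be sketched like this in Python. It is an illustration under assumptions, not the server's real code: `DbQueue`, `QueueTimeout` and `to_status` are hypothetical names, the wait limit is a parameter because the post only says "a few seconds", and 503 stands in for the unspecified 5xx code.

```python
import threading
from typing import Any, Callable


class QueueTimeout(Exception):
    """Raised when an operation waited too long for a database connection."""


class DbQueue:
    """Let at most `connections` operations run; the others wait in line."""

    def __init__(self, connections: int, max_wait_s: float) -> None:
        self._free = threading.Semaphore(connections)
        self._max_wait_s = max_wait_s

    def run(self, operation: Callable[[], Any]) -> Any:
        # Wait for a free connection, but give up after max_wait_s seconds.
        if not self._free.acquire(timeout=self._max_wait_s):
            raise QueueTimeout("gave up waiting for a database connection")
        try:
            return operation()
        finally:
            self._free.release()


def to_status(db: DbQueue, operation: Callable[[], Any]) -> int:
    """Map the outcome to an HTTP-style status code."""
    try:
        db.run(operation)
        return 200
    except QueueTimeout:
        return 503  # assumed 5xx code for a cancelled, queued operation
```

With a single writer, as is typical for SQLite, `connections` would be small, so a burst of account creations quickly fills the line and the slowest waiters time out.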

For context, since I try to minimise costs, I use SQLite (I pay less than $1 per month for the disk), but the server also supports PostgreSQL, which handles parallel writes and reads better.

Caching and Next Steps

The next step to improve the results will be to add a caching layer between the app and the database. There are two options for this, an in-memory one and a distributed one:

  • Cachex: an in-memory caching library for Elixir.
  • Redis: a distributed cache, slower than an in-memory one but great at scale.

I will probably start with the in-memory one, and then either add cache invalidation or move to Redis.
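To show what an in-memory cache with invalidation buys for this test, here is a minimal read-through sketch in Python. It does not use the Cachex API: `ReadThroughCache`, `load` and the TTL value are hypothetical, and a real setup would also need size limits and thread safety.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ReadThroughCache:
    """Serve reads from memory and fall back to `load` on a miss."""

    def __init__(self, load: Callable[[Hashable], Any], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load      # hypothetical function that reads the database
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                       # hit: no database read
        value = self._load(key)                   # miss or expired entry
        self._entries[key] = (now + self._ttl_s, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        # Call this after a write so the next read sees fresh data.
        self._entries.pop(key, None)
```

In the flow from the test, the 10 reads of the same user would then cost a single database read, as long as the entry has not expired or been invalidated by a write in between.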