The API server is write-heavy. It’s constantly saving state, setting people as online or offline, writing data about games, etc. There are more API writes than reads. I’m using SQLAlchemy and have highly optimized read queries… but writes… well…
I was finding that I was severely IO bound on writes. InnoDB, it seems, likes to write to the transaction log on disk every operation. Amazon’s RDS instances use EBS backing stores, which are nice and write-slow. At the lightest 3 updates per second, I was seeing per-operation disk writes spike up to 600ms and beyond.
I tweaked two settings:
The big one was innodb_flush_log_at_trx_commit – it seems that InnoDB synchronously writes to the transaction log (on disk) for every transaction commit – which would have been several times a second. Turning off innodb_flush_log_at_trx_commit stopped that from happening. Is that OK to do? Yup – because InnoDB will flush the log EVERY SECOND anyway. And it will do it asynchronously.
Now, I can handle the loss of data from 1 second. Some apps can’t. Your mileage will vary.
Turning off innodb_support_xa also moves from a 2-stage commit to a 1-stage commit. Again, it’s giving up a little transactional security, but I don’t particularly need it.
After… well, I’ll let this 24 hour log do my ‘splainin.
See? Much better. Since then, I’ve been trailing this sort of graph, where DB access is almost constant time:
So letting InnoDB write to the transaction log asynchronously is HUGE in an RDS, write-heavy environment. Or so it would seem.