InnoDB tweaking on AWS RDS for write-heavy apps

Along with the coolest game company on Earth, I launched a little game recently. The pick-up has been great… but I quickly learned a little something about RDS.

The API server is write-heavy. It’s constantly saving state, setting people as online or offline, writing data about games, etc. There are more API writes than reads. I’m using SQLAlchemy and have highly optimized read queries… but writes… well…

I was finding that I was severely I/O bound on writes. InnoDB, it seems, flushes its transaction log to disk on every commit. Amazon’s RDS instances use EBS backing stores, which are nice and write-slow. At as few as 3 updates per second, I was seeing per-operation disk writes spike to 600ms and beyond.

Wut. Slow.
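Here’s roughly how that shows up, as a minimal sketch (hypothetical table, endpoint and credentials – not my actual code): time a handful of single-row UPDATEs through SQLAlchemy and watch how long each commit takes.

```python
# A rough illustration (not production code): time individual UPDATE
# commits to see per-operation write latency. The table name, endpoint
# and credentials are placeholders.
import time

from sqlalchemy import create_engine, text

engine = create_engine("mysql://user:password@my-rds-endpoint/gamedb")

with engine.connect() as conn:
    for player_id in range(10):
        start = time.time()
        conn.execute(
            text("UPDATE players SET online = 1 WHERE id = :id"),
            {"id": player_id},
        )
        # With innodb_flush_log_at_trx_commit = 1 (the default), this commit
        # waits on an fsync of the InnoDB log out to the EBS volume.
        conn.commit()
        print("commit took %.1f ms" % ((time.time() - start) * 1000.0))
```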

I tweaked two settings:

The big one was innodb_flush_log_at_trx_commit – by default, InnoDB synchronously writes and flushes the transaction log (on disk) for every transaction commit – which, for me, was several times a second. Setting innodb_flush_log_at_trx_commit to 0 stopped that from happening. Is that OK to do? Yup – because InnoDB will still flush the log EVERY SECOND anyway. And it will do it asynchronously.

Now, I can handle losing up to 1 second of data. Some apps can’t. Your mileage will vary.
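If you want to see where your instance currently stands, a quick sketch (same assumed connection string as above) reads the variable back, with the documented value semantics in comments:

```python
# A sketch for reading the current flush setting. Endpoint and
# credentials are placeholders.
from sqlalchemy import create_engine, text

engine = create_engine("mysql://user:password@my-rds-endpoint/gamedb")

with engine.connect() as conn:
    row = conn.execute(
        text("SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit'")
    ).fetchone()
    print(row)  # e.g. ('innodb_flush_log_at_trx_commit', '1')

# Documented value semantics:
#   1 -- write and flush the log to disk at every commit (default, fully durable)
#   2 -- write the log at every commit, flush to disk about once per second
#   0 -- write and flush about once per second; commits don't wait on the disk at all
```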

Turning off innodb_support_xa also moves from a two-phase commit to a single-phase commit. Again, it’s giving up a little transactional safety, but I don’t particularly need it.
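On RDS you can’t edit my.cnf directly, so both changes go through a DB parameter group. Here’s a hedged sketch with boto3 (the group name is made up, and your instance has to be using that group):

```python
# A sketch of applying both parameters to an existing RDS DB parameter
# group via boto3. The group name is hypothetical. Both parameters are
# dynamic, so "immediate" should work; a static parameter would need
# ApplyMethod="pending-reboot" plus an instance reboot.
import boto3

rds = boto3.client("rds")

rds.modify_db_parameter_group(
    DBParameterGroupName="my-write-heavy-mysql",  # hypothetical group name
    Parameters=[
        {
            "ParameterName": "innodb_flush_log_at_trx_commit",
            "ParameterValue": "0",
            "ApplyMethod": "immediate",
        },
        {
            "ParameterName": "innodb_support_xa",
            "ParameterValue": "0",
            "ApplyMethod": "immediate",
        },
    ],
)
```

If the instance isn’t already in a custom parameter group, you’d attach one first (modify_db_instance) and reboot before the new values take effect.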

After… well, I’ll let this 24-hour log do my ‘splainin.

Much better. Since then, I’ve been seeing this sort of graph, where DB access is almost constant time.

So letting InnoDB flush the transaction log asynchronously is HUGE in a write-heavy RDS environment. Or so it would seem.

  • http://cbcg.net Toby DiPasquale

    Good stuff! Congrats on the successful release, too!

  • Pingback: Reducing Writes on Amazon RDS | Kloppmagic.ca

  • Robert Kulagowski

    Did you need to do something to actually activate this setting on RDS? I’ve made the change using the rds CLI, but don’t actually see any performance change in my application. Other folks have posted on the AWS RDS forum, but no definitive answers from Amazon.

  • http://davemartorana.com Dave Martorana

    Hey Robert,

    First, make sure your DB is in the appropriate parameter group – then, you’ll likely have to reboot your DB instance. That should get you there.

    If you continue to see huge write spikes, you might post to the forums – often an AWS engineer will do a quick hardware check to make sure the issue isn’t a hardware one.

  • http://www.facebook.com/ScrabbleBug Charlie Smith

    Does this mean that if innodb_flush_log_at_trx_commit=0, the insert/update query won’t take effect until after the 1 second? Or does it mean that if I did an insert and the system crashed, those transactions won’t be recoverable after the crash?

  • http://davemartorana.com Dave Martorana

    Hey Charlie,

    It’s not that the query won’t take effect – it does. It’s that the commit doesn’t flush the transaction to the transaction log, which would be used to rebuild your crashed DB.

    So, yes, you could theoretically lose up to 1 second’s worth of transactional commits. For me and what I do, that’s a no-brainer, considering the median loss would be half a second. For you, if you’re doing thousands of transactions per second, you may not want to open yourself to the possibility of losing up to one second’s worth of transactions.

    Dave

  • Josh Stuart

    Thank you for this!

  • bvanklinken

    Please be careful with this. Read the documentation on http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit

    For durability and consistency in a replication setup that uses InnoDB with transactions:
    * If binary logging is enabled, set sync_binlog=1.
    * Always set innodb_flush_log_at_trx_commit=1.

    Meaning that if you are using Multi-AZ, this may not be such a good idea.