MySQL: Time Delayed Replication
I was cruising the MySQL Forge Worklog when I came across the idea of Time Delayed Replication. I had never considered the benefits of deliberately keeping a slave server behind a master.
Kristian Koehntopp gives a good example:
Kristian Koehntopp writes:
TDS: Time delayed SQL_THREAD (Have a replication slave that is always
lagging 30 minutes behind).Currently, replication is a rolling recovery: To set up replication you
restore from a full dump with a binlog position. You then continously
download binlog and roll forward. In case of a master crash a slave is a
readily recovered instance (as opposed to a backup, which still has to be
restored).This protects against crashes, but not against oopses.
A time delayed slave (TDS) is a nice protection against oopses.
Sugar on top addendum: With the binlog being a table, any table type,
it will be very easy to delete an oopsing statement out of the
unapplied binlog queue. Currently, a simple time delayed slave will
protect you against oopses, but it will be very hard to extract the
relevant binlog portion out of the TDS replication log skipping the
oopsing statement.With a MyISAM binlog table, it is just a matter of DELETE FROM
REPLICATION.BINLOG WHERE STATEMENT_ID = 1717;
Definately a cool idea. Instead of having to restore from a backup, you just take down your master, pull out the busted query, let the server catch up, and bring the slave up as the new master. Doing a restore from our current database backups takes about 3-4 hours, and we’re not even huge. It seems like your downtime here would be limited to however long it takes your slave to catch up. Additionally, once you take down your master and remove your bogus query (truncate table perhaps?) you can allow the slave to replicate everything and catch up (no longer limited to 30 min behind, ideally). Depending on your traffic, this could limit your downtime in the case of a catastrophic loss to 20 minutes or so instead of hours, and your data loss would be far less.
7 Responses to MySQL: Time Delayed Replication
Leave a Reply Cancel reply
Recent Comments
- Anil on MySQL Triggers Tutorial
- Ashish on MySQL Triggers Tutorial
- David on iCal Agenda
- jon on IP address geolocation SQL database
- pim on IP address geolocation SQL database
- jnns on Redis Wildcard Delete
- K.C. Murphy on iCal Agenda
- BA on Experts Exchange should be removed from Google search results
- Andrew on Executing multiple curl requests in parallel with PHP and curl_multi_exec
- Stu on Executing multiple curl requests in parallel with PHP and curl_multi_exec
Recent Posts
- New Project: Jester
- Open New Terminal Tip
- Installing MySQLdb on MacOS Lion
- Headless VM Server Using Ubuntu 11.10
- Get rid of Facebook’s Awful Ticker
- Api Tester now hosted on Github
- Trac .11 jQuery bug
- Multiple Filetypes in Vim
- Git Tip: Setting Up Your Remote Server
- Install issue pymongo on OSX (setuptools out of date)
Categories
- amazon (1)
- answerbag (6)
- apache (9)
- apple (8)
- awk (2)
- bbedit (2)
- c++ (3)
- chrome (2)
- cluster (1)
- cocoa (1)
- collective intelligence (1)
- curl (3)
- db2 (1)
- demand media (1)
- ebay (1)
- eclipse (4)
- erlang (13)
- facebook (1)
- fortran (1)
- gen_server (1)
- git (5)
- google (4)
- haddad (1)
- hdf5 (1)
- html (1)
- innodb (1)
- itunes (1)
- java (2)
- jester (1)
- kvm (1)
- launchbar (1)
- leex (1)
- letsgetnuts.com (1)
- libvirt (1)
- links (6)
- linux (27)
- lucene (1)
- mac (16)
- memcached (1)
- misconception (1)
- mobile (1)
- mono (1)
- mssql (1)
- munin (1)
- mysql (31)
- numpy (1)
- oracle (1)
- php (23)
- puppet (4)
- pyparsing (1)
- pytables (1)
- python (11)
- q&a (1)
- quicksilver (1)
- rant (6)
- readynas (1)
- redis (2)
- regex (1)
- replication (1)
- search (1)
- shitty code (1)
- solr (3)
- spaces (1)
- sshfs (1)
- stored procedure (1)
- svn (5)
- textmate (2)
- tips (22)
- trac (1)
- tutorial (4)
- ubuntu (3)
- Uncategorized (4)
- unix (1)
- vim (3)
- virtual box (6)
- vmware (1)
- weird (3)
- wikipedia (1)
- windows (1)
- xcode (1)








been there, done that
we had a slave, lagging between 1-2hrs – not only for admin oopses, but also to recover from malicious user attacks.
since our traffic was hudge – during the day, the slave needed about the sime time to recover, as the lag was – eg, a 1h lag took 1h to catch up…
so every 15 minutes a perl script did start slave, checked if the lag is less than 2hrs – if it was, it did stop the slave, if it was larger, it left the slave running, and checked again in the next 15 minutes.
It’s useful enough, and enough people requested it, that Maatkit has a very elegant implementation of time-delayed replication. It does NOT rely on Seconds_behind_master, which is buggy and unreliable; neither does it read binary logs, which is unacceptably expensive.
I had completely forgot that I suggested that back in 2001. Heh.
I didn’t even realize this went back 7 years. Wow…
We can see a few tools floating around that implements delayed transaction by starting/stopping the replication thread on the slave. This method basically do the trick, but is indeed less than optimal. Not only because it’s a hack, and lacks fine grained control, and impose episodic burst loads on the network and master rep thread.
It would be in fact _way_ more useful if the slaves could still receive the replication logs in real time, as with standard replication, but they could only delay the effective execution of those logs on their database replicas. So that we could have a very efficient way to recover up to the last working point on a slave (ie. to tell a slave : “then now, dear slave, please apply every pending replog up to the transaction $nb, the one where the sysadmin borked the database / where some corruption happened”), then promote the slave as the new main database server, and get back to normal operations.
One could imagine a pool of daisy chained slaves running each with an increased delay (ie. slave1 lagging 1/2 h, slave2 lagging 1 h, slave3 1.5 h etc) : that way, we could choose to recover from the nearest (lesser lagging) slave that still have clean data, even if we discover the corruption a bit late, and we would have a very fast and flexible recovery-from-human-error mechanism. That would be, practically, a kind of real-time incremental hot standby backups (with almost-no-time recover from disaster)
What do you think about such a delayed replication flow ? is this possible/implementable in MySQL ?
Benpi –
I haven’t looked at the MySQL source, but I can’t imagine why it *wouldn’t* be possible. I just don’t think it’s a very high priority.
Obviously I like the idea, I said so in the above post
Just now examined the thread! great work.