MySQL, 40 Million Rows, MyISAM to InnoDB, 45 Minutes

Warning: This blog post is NOT a walk-through or tutorial. If you don’t know what you’re doing, you could seriously screw up your database. This is just talking theory and ideas on how I solved my problem.

The other night (really, the other morning) I had the wonderful pleasure of trying to convert a table with 40 millions from MyISAM to InnoDB. The reason for the conversion was this table had a high amount of queries, and a high amount of writes. With this table growing and updating daily, the table locks were killing us. Because MyISAM needs to peform a lock on the entire table, we would have dozens of queries backlog as locks and writes were taking place. It was causing the site to screech to a halt, waiting for the locks the be released. It was a total nightmare.

We decided to convert to InnoDB because it allowed for row level locking. There are some drawbacks to using InnoDB, it being slightly slower than MyISAM for reads, and our table is double the size after the conversion.

I started the convertion at 12:00 AM when the site’s traffic was at a low. I disconnected the website from the database so no reads-writes could be excecuted against it. I then created several backups of the entire database. It was a good thing I created several, because one of them actually got messed up and was useless. I downloaded a good copy of the database and restored it to my local machine. I tried issuing the ALTER TABLE command to the table, and after waiting for 2 hours, aborted it. I then did an export of just the table’s data, and truncated it. I then issued the ALTER TABLE command, which on a table with no data executed in less thant .01 seconds. I then restored the dumped data. At first, it would take 0.5 seconds for every 2000 records. However, by the I had restored just 4 million rows, it was taking 5 seconds or more every 2,o00 records. After some rough calculations (and being very tired at 3 AM), I figured it would take around 30 hours, and I said screw it and aborted the restore. I wrote an email to my boss saying that I wouldn’t be able to do the conversion in a timely manner, and we would have to seek a different solution.

Just before I was going to restore my original backup, I had one more idea. I had been reading so much about different storage engines that I remember reading about the MEMORY storage engine. Once again, if you are looking for simple tutorials, this isn’t for you, this is an advanced topic. I just don’t want to see someone saying in a comment that they converted their table to MEMORY, saw awesome performace, lost all their data in a reboot. The MEMORY engine does what it says, it stores data within memory. The HUGE issue is when the mysql instance is reset, the data is just lost.

However, I didn’t need a permamant use of the MEMORY table. If I understood the manuals correctly, InnoDB stores it’s data according to the Primary Key. This table Primary Key was a combination of two INT fields. MyISAM, however, didn’t have the data stored by the Primary Key. So the problem is as the restore proceeds further, its having to search where to insert the data in the table. So I thought “What if I can have it insert in the order it will store it?” I edited my local my.cnf file to allow for 4GB memory tables, and then altered my table locally to use the memory storage engine. Once again, I did this on my local database on my 8GB Memory machine on a dummy database. This was not my production database.

I then restored my dump of the table to my memory table. It went extremely quick, under five minutes easily. I then was able to issue a “ALTER TABLE tbl ORDER BY key1 ASC, key2 ASC”. Because it was in memory, it was extremely quick (although watching mysql use 4 GB of my memory was rather interesting). I then exported this newly sorted table. It was in an order like 1-1, 1-2, 1-3, 2-1, 2-2, 2-3, etc. I then uploaded this table dump to the production server, and tried importing it again.

The task completed in under 30 minutes.

Between exporting, sorting, and re-importing it  had taken about 45 minutes. 30 hours compared to 45 minutes total, it worked so much better. I’ve learned if I have the ability to take a database offline, if I need to perform serious actions of pruning or anything else like that, using the MEMROY engine can make things go a lot smoother and faster. I wish I had known this the first time, that way I could have started at 12:00 AM and be done 45 minutes later. Instead, it took till 6:00 AM, but next time I’ll know better.

As always, hopefully this can help someone else.

18 thoughts on “MySQL, 40 Million Rows, MyISAM to InnoDB, 45 Minutes

  1. I’m having tons of problems with INNODB – constant error 1206. I’ve upped the pool, modified queries to include pk’s … But still I can’t do large deletes, inserts or updates. Big deletes (over 10,000 records) take hours. Insert into’s of more than (400K records takes 6-8 hours).
    So should I modify to MyISAM?

    Like

  2. sweet. I will put this in my idea locker.

    Like

  3. hey, actually, did you do this manually? I’m wondering about scripting this solution and if you worked on that at all.

    thanks. seeya

    Like

  4. Hi, I’m back. So, did you take a look at scripting a “SELET INTO OUTFILE …” + “LOAD DATA INFILE …” data migration? I’ve found using INFILE/OUTFILE *greatly* reduces data migration time. But I don’t know if it is as much of a reduction as you had using this technique.

    Thanks, seeya…

    Like

  5. @JDS

    Interesting, I’ll have to try that out. My guess is that going between files like that skips any type of temporary tables that are created. I’ll have to test that out, because while putting everything into memory was cool, sometimes that much memory isn’t available. That table that has 40 Million Rows now has around 450 Million Rows, and I don’t think it could hold it in memory entirely.

    Like

  6. imran ahmed rahi February 8, 2010 — 1:18 pm

    Hi,

    I read your article after searching google for many hours and visiting several sites. I really liked the MEMORY Storage engine. I knew it theoratically, but never thought how useful it could in situation like yours.
    I am facing similar problem but with inserts of 3 million records from csv file into mysql db.
    I am uploading a csv file with 3 million email lists, parsing them and storing information in INNODB Table. The problem is it takes around 45 minutes to upload 1 lakh contacts. I am targetting to upload 3 million contacts and looking for optimal way to do that.
    Can you give me suggestions towards my approach? I would be very happy.

    Thanks,

    Like

  7. Terrific do the job! This could be the sort of information that ought to be shared around the web. Shame about the search engines for not positioning this post higher!

    Like

  8. Wouldn’t using mysqldump with an order by argument, have been much simpler?
    So:
    1- mysqldump that outputs the rows in sorted order (without schema)
    2- create new table (with new name) with existing schema but new engine.
    3- import data into new table.
    4- drop old table
    5- rename new table to the original table’s name.
    Cheers!

    Like

  9. Or actually: you don’t even have to do a dump. Just do:
    Step 2, and then:
    INSERT INTO new_table (id, etc) VALUES (SELECT id, etc FROM old_table ORDER BY id, etc);
    Then step 4 and 5.

    Like

  10. this doesn’t sound like an advanced topic.. more like someone who doesn’t know what they are doing…

    Like

  11. Looking at some of the latest comments, I thought I should post an update to my story. I had tried to have MySQL dump the data in an ordered fashion. However, it would have to sort through the 40 million rows and because of the memory limits on the DB server, it would have to do this with a temporary table on the disk. So I would issue the command, and it would sit there for an hour while I waited for it to perform it’s operations, and I had no idea how close or far I was for the query to finish, so I aborted it.

    Having mysql dump the data as it was stored was much, much faster, and then inserting into a memory table, reorganizing the data, and re-exporting it was much faster than just summerizing it all into one query. I could look back and see why dumping it in an ordered fashion wasn’t working like I expected, but at the time, this was the shortest distance I could think of to get our website back up and running.

    Like

  12. If you’re having trouble with tmp tables being created on disk, you could modify tmp_table_size for the thread doing the conversion.

    Something like:

    SET tmp_table_size=4*1024*1024*1024; — 4 GB

    INSERT INTO … SELECT … FROM … ORDER BY …;

    This way you end with simpler query and still use memory for the temporary table.

    Like

  13. Justin, this is very interesting ! I have a server with 16gb ram and 4 processors having 4 cores each. I would love to know how to convert my tables (150 aprox) from innodb to memory using some script, I don’t mind loosing data in case reseting server because the data is for demostration only, and I’m gonna have several persons accesing this info at same time, that’s why I need to speed it up.
    There is about 2 million records in the whole database.

    Best regards

    Like

  14. Great Article Justin Carmony. Really amazing. In past I was there in situation where I could have worked like you.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this:
search previous next tag category expand menu location phone mail time cart zoom edit close