Warning: This blog post is NOT a walk-through or tutorial. If you don’t know what you’re doing, you could seriously screw up your database. This is just talking theory and ideas on how I solved my problem.
The other night (really, the other morning) I had the wonderful pleasure of trying to convert a table with 40 millions from MyISAM to InnoDB. The reason for the conversion was this table had a high amount of queries, and a high amount of writes. With this table growing and updating daily, the table locks were killing us. Because MyISAM needs to peform a lock on the entire table, we would have dozens of queries backlog as locks and writes were taking place. It was causing the site to screech to a halt, waiting for the locks the be released. It was a total nightmare.
We decided to convert to InnoDB because it allowed for row level locking. There are some drawbacks to using InnoDB, it being slightly slower than MyISAM for reads, and our table is double the size after the conversion.
I started the convertion at 12:00 AM when the site’s traffic was at a low. I disconnected the website from the database so no reads-writes could be excecuted against it. I then created several backups of the entire database. It was a good thing I created several, because one of them actually got messed up and was useless. I downloaded a good copy of the database and restored it to my local machine. I tried issuing the ALTER TABLE command to the table, and after waiting for 2 hours, aborted it. I then did an export of just the table’s data, and truncated it. I then issued the ALTER TABLE command, which on a table with no data executed in less thant .01 seconds. I then restored the dumped data. At first, it would take 0.5 seconds for every 2000 records. However, by the I had restored just 4 million rows, it was taking 5 seconds or more every 2,o00 records. After some rough calculations (and being very tired at 3 AM), I figured it would take around 30 hours, and I said screw it and aborted the restore. I wrote an email to my boss saying that I wouldn’t be able to do the conversion in a timely manner, and we would have to seek a different solution.
Just before I was going to restore my original backup, I had one more idea. I had been reading so much about different storage engines that I remember reading about the MEMORY storage engine. Once again, if you are looking for simple tutorials, this isn’t for you, this is an advanced topic. I just don’t want to see someone saying in a comment that they converted their table to MEMORY, saw awesome performace, lost all their data in a reboot. The MEMORY engine does what it says, it stores data within memory. The HUGE issue is when the mysql instance is reset, the data is just lost.
However, I didn’t need a permamant use of the MEMORY table. If I understood the manuals correctly, InnoDB stores it’s data according to the Primary Key. This table Primary Key was a combination of two INT fields. MyISAM, however, didn’t have the data stored by the Primary Key. So the problem is as the restore proceeds further, its having to search where to insert the data in the table. So I thought “What if I can have it insert in the order it will store it?” I edited my local my.cnf file to allow for 4GB memory tables, and then altered my table locally to use the memory storage engine. Once again, I did this on my local database on my 8GB Memory machine on a dummy database. This was not my production database.
I then restored my dump of the table to my memory table. It went extremely quick, under five minutes easily. I then was able to issue a “ALTER TABLE tbl ORDER BY key1 ASC, key2 ASC”. Because it was in memory, it was extremely quick (although watching mysql use 4 GB of my memory was rather interesting). I then exported this newly sorted table. It was in an order like 1-1, 1-2, 1-3, 2-1, 2-2, 2-3, etc. I then uploaded this table dump to the production server, and tried importing it again.
The task completed in under 30 minutes.
Between exporting, sorting, and re-importing it had taken about 45 minutes. 30 hours compared to 45 minutes total, it worked so much better. I’ve learned if I have the ability to take a database offline, if I need to perform serious actions of pruning or anything else like that, using the MEMROY engine can make things go a lot smoother and faster. I wish I had known this the first time, that way I could have started at 12:00 AM and be done 45 minutes later. Instead, it took till 6:00 AM, but next time I’ll know better.
As always, hopefully this can help someone else.