Over the last week at Dating DNA, we’ve re-engineered our internal mail system that users use to send messages to each other. It took a few days longer to convert and re-engineer everything, but we finally made the change. With the exception of two small bugs, it all went off without a hitch. But before all the different things I learned doing it fade, I wanted to document them here.
First off, converting from one large set of InnoDB tables to a new structure. I’ve done something similar in the past, and I think I’ve honed in a little bit more the technique. So here are some thoughts regarding doing this process.
- Run Conversion on Separate Server – Looking back, I wish I had just spun up a beefy server at Rackspace Cloud, transferred the data, and gone to town running my conversion scripts. There were a few times where I tanked our Database server to the point of noticing slower load times on our website.
- Write to SQL Script Instead Directly – At first when I wrote my conversion scripts, it would read a dataset for a user, and the do inserts for each row into the Database to the new tables. If you’re doing less than 200,000 rows, this probably is fine. But doing 5 million or so takes too long.
- Insert Data By Primary Key Order – You’d think I would have learned my lesson from the last time I did this, but I found out even for a smaller data set of a few million (instead of tens of millions before). Inserting by the correct order saves a ton of time. The easiest way to think of it is if you had to put files into a filing cabinet. It would be much faster to just move them in pre-sorted, instead of sorting them as you add them. Also, InnoDB doesn’t know whether you are inserting 10 rows, or 10 million. So when you insert unsorted, it will spend a lot of resources shuffling around pages.
- Write Your Conversions Tools to be Robust – What you don’t want is for your script to only be able to run start to finish, and if it breaks, you have to start all over. What you want is to be able to run sections, test, rerun sections, convert large all sections, then right before you change to the new script, run the conversion of the newer entries.
- Monitor Status, Progress, & Time Left – One thing you’ll definitely want to know is how long your script will take to run. You will get to be like Windows and try to guess how long something will take (and hopefully be more accurate). I like to keep it simple. I’m converting X1 million rows, I started a Y1 time, and now I’ve completed X2 conversions, and the current Time is Y2. So I figure out how long its taken me to do the ones I’ve finished, and calculate how long it will take to do the rest at the same rate.
Hopefully this will help out someone else who is making a large scale conversion from one table structure to another.