Restoring Large MySQL Dump – 900 Million Rows

This last weekend I had the fun opportunity of restoring roughly 912 million rows to a database. 902 million of them belonged to one single table (902,966,645 rows, to be exact). To give an idea of growth, the last time I blogged about this database we had about 40 million rows. This giant table, the “Scores” table, has a very small schema: two ints, a tiny int, and a DECIMAL(5,2).
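
For reference, the table looked something like this. The column types come from the description above, but the column names and the key are just guesses for illustration:

mysql -u root -p database_name <<'SQL'
-- hypothetical recreation of the Scores table: two INTs, a TINYINT, and a DECIMAL(5,2)
-- (column names and the primary key are assumptions, not taken from the post)
CREATE TABLE Scores (
    user_id  INT UNSIGNED NOT NULL,
    level_id INT UNSIGNED NOT NULL,
    attempts TINYINT UNSIGNED NOT NULL,
    score    DECIMAL(5,2) NOT NULL,
    PRIMARY KEY (user_id, level_id)
);
SQL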

Problem

Our current backup system uses mysqldump. It produces a 25 GB SQL dump file, which compresses to about 2.5 GB using gzip. The last time we needed to restore a backup, the dump was only about 9 GB, and it still took several hours.
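
For context, the dump comes from a plain mysqldump run, something along these lines (the exact flags, user, and paths here are a guess, not our actual backup script):

# dump the whole database and compress it on the fly;
# mysqldump's default extended INSERTs are what pack so many rows onto each line
mysqldump -u backup_user -p database_name | gzip > /path/to/backup/database_dump.sql.gz

# decompress before restoring
gunzip -c /path/to/backup/database_dump.sql.gz > /path/to/backup/database_dump.sql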

This time, I created the database, and from the mysql prompt I issued the following command:

source /path/to/backup/database_dump.sql

It ran great until it got about 10% of the way through the Scores table, and then it started to slow down. Because the rows in our Scores table were so small, each INSERT statement held about 45,000-50,000 records, so each line of the dump was roughly 1MB of data.
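
A couple of one-liners make it easy to confirm that shape on any dump (not something from the original restore, just a sanity check you can run yourself):

# how many INSERT statements are in the dump, and how long the longest line is
grep -c '^INSERT INTO' /path/to/backup/database_dump.sql
awk '{ if (length > max) max = length } END { print max " bytes in longest line" }' /path/to/backup/database_dump.sql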

At first it would insert a set of 50,000 rows in half a second or so. After a few million records, however, it slowed to three seconds per statement, and eventually to about 10 seconds per INSERT statement. This was a huge problem: I had roughly 18,000 INSERT statements, and at 10 seconds per INSERT, the restore would take 50 hours. Our website was down during this restore, since it was our primary database, so being down for over two days was not an option.

While trying to diagnose the problem I noticed something. Using the MySQL command “show processlist”, the thread for the database restore would sit in the Sleep state for 9-10 seconds, and then the query would execute in under 0.2 seconds. So it wasn’t a problem with MySQL storing the data, but a problem with reading the data out of such a large database dump file.
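
This is easy to watch from a second terminal; a minimal sketch (credentials are placeholders, and watch can't answer a password prompt, so they need to come from a ~/.my.cnf or similar):

# refresh the process list every 2 seconds; the restore thread showed
# "Sleep" for 9-10 seconds, then the actual INSERT finished in ~0.2 seconds
watch -n 2 'mysql -u root -e "SHOW FULL PROCESSLIST"'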

So from the server’s command line I tried “mysql -u user_name -p database_name < /path/to/backup/database_dump.sql”, with the same result. The further into the file it got, the longer it took MySQL to read each query.

Solution

So, after some thinking at 3 AM, I came up with an idea: why not split the SQL dump up into multiple files? I used the Linux “split” command like this:

cd /path/to/backup/
mkdir splits
split -n 200 database_backup.sql splits/sql_

This produced a series of sequentially named files (sql_aa, sql_ab, sql_ac, and so on), and it took about 10 minutes; the -n option told split to break the dump into 200 chunks. Then I used cat to pipe the files to mysql, in order:

cd splits
cat sql_* | mysql -u root -p database_name
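
If you're worried that split cut the file in the middle of an INSERT statement, one quick sanity check is to confirm the pieces reassemble byte-for-byte into the original dump:

# the shell expands sql_* in the same alphabetical order split used for the suffixes,
# so the concatenation should be identical to the original file
cat sql_* | cmp - ../database_backup.sql && echo "splits match the original dump"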

The only problem with this method is that you don’t see a status report for each query executed; it just runs until it hits an error, and then displays that error. If no errors occur, it simply returns you to the prompt. So to monitor the progress I would run a “show processlist;” command in mysql to see how far along we were.
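
An alternative not used here, but worth knowing about: if the pv utility is installed, piping the dump through it gives a progress bar and throughput numbers for free.

# pv knows the file size, so it can show percent complete and an ETA as it streams into mysql
pv /path/to/backup/database_dump.sql | mysql -u root -p database_name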

4 1/2 hours later, the entire database was restored. One thing to note: I didn’t try just using cat on the original, unsplit file to see if mysql would read it any differently. But the important thing is that I got the database restored in a relatively timely manner.

Hopefully, in the very near future, we will have moved to a new score system that doesn’t have almost a billion rows in it.

15 thoughts on “Restoring Large MySQL Dump – 900 Million Rows”

  1. Thanks for the post.

    Doesn’t splitting the file break the SQL? Surely it will break in the middle of an INSERT INTO… VALUES query?

    Thanks

    Al.

  2. Well, the cat command will send the split files in order, so the stream the mysql command receives is the same as it would get from the original file; it can’t tell where one piece ends and the next begins. The split command was just to keep it from reading such a large file.

    So in short, if I did a cat command on all the split files, or a cat command on the original, the output would be the same.

  3. Aha… that’s awesome!! 🙂

    Thanks Justin, I will test this and then include it in our disaster recovery plan to speed up our 1m row table – in readiness for when we are at 100m too 🙂

    Thanks

    Al

  4. Is it possible to do the same thing in Windows?

  5. Ali, it is probably possible to implement this in Windows by using a toolset that implements the Linux shell commands like grep, cat, pipes (|), etc.
    I know such toolsets exist but I do not remember the name right now; 5-10 minutes on Google would give you the answer.

    Justin, I have enjoyed your posts and they are very useful. Thank you!

  6. btw, chewing gum pack $2, cigarettes pack $5, lunch with the team $10, dinner and movie with girlfriend $100, having the opportunity to work on a 900 mil rows table in the middle of the night? priceless! 😉
    you lucky dog!

  7. Hi,

    I would like to take a backup of a MySQL DB with 200 million records. Which tool makes it possible to take a backup every day?

    Could someone please help me.

    Thanks,

  8. My environment is Windows 7 64-bit. My mysql backup file for this table is about 1.6 GB, approximately 6M records. I am using the free Windows split tool found in “Swiss File Knife” and broke the file up into 51 MB pieces.

    So far I’m about 25% of the way through and I can’t see an appreciable difference between splitting the files and just running it as one file. It’s still taking 10-15 seconds per chunk of INSERTs. Each INSERT is about 3300 records. At least it was worth a shot….

    I also tried changing the my.ini setting max_allowed_packet = 1G and restarting MySQL, a suggestion I found on another forum. That didn’t help either.
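
    (One thing worth double-checking after a change like that is whether the running server actually picked the value up; a quick sketch, assuming command-line access to mysql:)

    # should report 1073741824 (1G) if the my.ini change took effect
    mysql -u root -p -e "SHOW VARIABLES LIKE 'max_allowed_packet'"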

  9. Allen,

    Another option to try is to make sure you add your indexes at the end, not before you insert all of your data. Index maintenance is a lot of overhead, and it’s supposed to be faster to add the indexes at the end versus maintaining them through all the INSERTs.

    Justin
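
    (A rough sketch of that suggestion; the secondary index name here is made up for the example:)

    # drop a secondary index before the big load, then rebuild it once the data is in
    mysql -u root -p database_name -e "ALTER TABLE Scores DROP INDEX idx_score"
    cat sql_* | mysql -u root -p database_name
    mysql -u root -p database_name -e "ALTER TABLE Scores ADD INDEX idx_score (score)"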

  10. Splendid technique Justin, it really works well, but in place of -n we need to use -l on RHEL5.
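
    (In other words, on an older split without -n, something like this; note that -l takes a line count per piece rather than a number of pieces:)

    split -l 200 database_backup.sql splits/sql_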

  11. Erm… so back to your original problem. Why does it take ages to seek through your dump? What file system are you using? What disks/controllers are in the server?

  12. For what it’s worth, I think the magic actually happens in the cat and not the split. My guess is that MySQL reads input files entirely into memory first but cat is a bit smarter about it. Anyhow, if you just cat the full sql file in as you do in your example you should get decent results. That worked for me.
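
    (That is, skipping the split step entirely:)

    cat /path/to/backup/database_backup.sql | mysql -u root -p database_name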

  13. Hey,

    I need one more suggestion from you. How do I go about inserting into the database when my table’s data is spread across 2 files? The data I am talking about is millions of records across the 2 files. How do I read one record from the first file, find the corresponding info in the other, and put it in the DB?

    Need your suggestion to architect the solution.

  14. Thanks for the Great solution

  15. Justin, I would just like to say that you are an absolute legend. You have saved me from the dreaded slow recovery I’m stuck with, and have made it into a breeze! Thank you so much.
