If you’ve ever had to do a huge MySQL import, you probably know the pain of having no idea how long it will take to complete.

At work we use the backup gem to store daily snapshots of our databases, the main one being several gigabytes in size. This gem basically does a mysqldump with configurable options and takes care of maintaining a number of old snapshots, compressing the data and sending notifications on completion and failure of backup jobs.

When the time comes to restore one of those backups, you basically have to run a mysql command with the exported SQL file as input, which can take ages to complete depending on the size of the file and the speed of the system.

The command used to import the database snapshot from the backup gem may look like this:

tar -x -v -O -f database_snapshot.tar path_to_the_database_file_inside_the_tar_file.sql.gz | zcat | mysql -u mysql_user -h mysql_host -ppassword database_name

What this command does is extract the gzipped dump from the tar archive, decompress it with zcat, and feed the result to the mysql client connected to the database you want to restore.

And then the waiting game begins.

There is a way, though, to get an estimate of the amount of work already done, which may be a big help for the impatient like myself. You only need to make use of the good old proc filesystem on Linux.

The first thing you need to do is find out the tar process that you just started:

ps ax | grep "database_snapshot\.tar" | grep -v grep

This last command assumes that no other processes have that string in their invocation command lines.

We are really interested in the pid of the process, which we can get with a few more Unix commands and pipes appended to the previous command:

ps ax | grep "database_snapshot\.tar" | grep -v grep | tail -n1 | cut -d" " -f 1

This gets the last line of the process list output (with tail), splits it into fields using the space character as a delimiter, and takes the first one (with cut). Note that depending on your OS and the format of the ps output you may have to tweak this.

After we have the pid of the tar process, we can see what it is doing through the proc filesystem. The information we are interested in is the file descriptors it has open, which are listed in the directory /proc/<pid>/fd. If we list the files in that directory, we will get an output similar to this one:

[rails@ip-10-51-43-240 ~]$ sudo ls -l /proc/7719/fd
total 0
lrwx------ 1 rails rails 64 Jan 22 15:38 0 -> /dev/pts/1
l-wx------ 1 rails rails 64 Jan 22 15:38 1 -> pipe:[55359574]
lrwx------ 1 rails rails 64 Jan 22 15:36 2 -> /dev/pts/1
lr-x------ 1 rails rails 64 Jan 22 15:38 3 -> /path/to/database_snapshot.tar

The important one for our purposes is the number 3 in this case, which is the file descriptor for the file tar is unpacking.
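Instead of eyeballing the ls output, you can resolve each descriptor with readlink. Here is a small sketch (find_fd is a hypothetical helper name, not part of the original script) that prints the descriptor numbers whose target matches a given glob pattern:

```shell
# find_fd <pid> <glob> -- print the fd numbers under /proc/<pid>/fd
# whose link targets match <glob>.
find_fd() {
    pid=$1
    pattern=$2
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd") || continue
        case "$target" in
            $pattern) printf '%s\n' "${fd##*/}" ;;
        esac
    done
}

# Example, using the pid from the listing above:
# find_fd 7719 "*database_snapshot.tar"
```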

We can get this number using a similar strategy:

ls -la /proc/7719/fd/ | grep "database_snapshot\.tar" | cut -d" " -f 9

With that number, we can now check the file /proc/<pid>/fdinfo/<fd_id>, which will contain something like this:

[rails@ip-10-51-43-240 ~]$ cat /proc/7719/fdinfo/3
pos:    4692643840
flags:  0100000

The useful part of this output is the pos field, which tells us the position in the file the process is currently at. Since tar processes the file sequentially, this position tells us what percentage of the file tar has worked through so far.

Now the only thing we need to do is check the original file size of the tar file and divide both numbers to get the percentage done.

To get the pos field we can use some more unix commands:

cat /proc/7719/fdinfo/3 | head -n1 | cut -f 2
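The pos line is tab-separated, which is why the plain cut works (cut's default delimiter is the tab). An equivalent that tolerates either tabs or spaces, wrapped in a hypothetical helper function:

```shell
# get_pos <fdinfo-file> -- print the value of the pos field.
get_pos() {
    awk '/^pos:/ {print $2}' "$1"
}

# Usage: get_pos /proc/7719/fdinfo/3
```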

To get the original file size, we can use the stat command:

stat -c %s /path/to/database_snapshot.tar

Finally we can use bc to get the percentage by just dividing both values:

echo "`cat /proc/7719/fdinfo/3 | head -n1 | cut -f 2`/`stat -c %s /path/to/database_snapshot.tar` * 100" | bc -l

To put it all together in a nice script, you can use this one as a template:

file_path="<full path to your tar db snapshot>"
file_size=`stat -c %s "$file_path"`
file="<filename of your db snapshot>"
pid=`ps ax | grep "$file" | grep -v grep | tail -n1 | cut -d" " -f 1`
fdid=`ls -la /proc/$pid/fd/ | grep "$file" | cut -d" " -f 9`
pos=`cat /proc/$pid/fdinfo/$fdid | head -n1 | cut -f 2`
echo "$pos / $file_size * 100" | bc -l

I developed this article and script following the tips in this Stack Overflow answer: http://stackoverflow.com/questions/5748565/how-to-see-progress-of-csv-upload-in-mysql/14851765#14851765