Check progress of a mysql database import
If you’ve ever had to do a huge mysql import, you’ll probably understand the pain of not being able to know how long it will take to complete.
At work we use the backup gem to store daily snapshots of our databases, the main one being several gigabytes in size. This gem basically runs a mysqldump with configurable options and takes care of keeping a number of old snapshots, compressing the data, and sending notifications when backup jobs complete or fail.
When the time comes to restore one of those backups, all you can do is run a mysql command with the exported sql file as input, which can take ages to complete depending on the size of the file and the speed of the system.
The command used to import the database snapshot from the backup gem may look like this:
tar -x -v -O -f database_snapshot.tar path_to_the_database_file_inside_the_tar_file.sql.gz | zcat | mysql -u mysql_user -h mysql_host -ppassword database_name
This command extracts the gzipped dump from the tar file, passes it through zcat to gunzip it, and feeds the result to a mysql command connected to the database you want to restore.
And then the waiting game begins.
There is a way, though, to get an estimate of the amount of work already done, which may be a big help for the impatient like myself. You only need to make use of the proc filesystem on Linux.
The first thing you need to do is find the tar process that you just started:
ps ax | grep "database_snapshot\.tar" | grep -v grep
This last command assumes that no other processes will have that string on their invocation command lines.
What we really want is the pid of the process, which we can get by appending a few more unix commands to the previous pipeline:
ps ax | grep "database_snapshot\.tar" | grep -v grep | tail -n1 | cut -d" " -f 1
This takes the last line of the process list output (with tail), splits it into fields using the space as a delimiter, and keeps the first one (with cut). Note that, depending on your OS and the output of its ps command, you may have to tweak this: for example, ps may pad short pids with leading spaces, which shifts the field that cut sees.
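If pgrep is installed (it usually ships in the same package as ps on Linux), the lookup can be sketched without depending on how ps pads its columns. The helper name find_import_pid is made up for this example, and it assumes that the newest process matching the pattern is the tar you just started:

```shell
# Hypothetical helper: print the pid of the newest process whose full
# command line matches the given pattern (an extended regular expression).
find_import_pid() {
    # -f matches against the whole command line, not just the process name;
    # -n keeps only the most recently started match.
    pgrep -n -f "$1"
}

# Usage sketch:
#   pid=$(find_import_pid 'database_snapshot\.tar')
```

pgrep never reports itself, so the extra grep -v grep step is not needed here.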
After we have the pid of the tar process, we can see what it is doing on the proc filesystem. The information we are interested in is the file descriptors it has open, which will be in the folder /proc/pid/fd. If we list the files in that folder, we will get an output similar to this one:
[rails@ip-10-51-43-240 ~]$ sudo ls -l /proc/7719/fd
total 0
lrwx------ 1 rails rails 64 Jan 22 15:38 0 -> /dev/pts/1
l-wx------ 1 rails rails 64 Jan 22 15:38 1 -> pipe:
lrwx------ 1 rails rails 64 Jan 22 15:36 2 -> /dev/pts/1
lr-x------ 1 rails rails 64 Jan 22 15:38 3 -> /path/to/database_snapshot.tar
The important one for our purposes is number 3 in this case, which is the file descriptor for the file tar is unpacking.
We can get this number using a similar strategy:
ls -la /proc/7719/fd/ | grep "database_snapshot\.tar" | cut -d" " -f 9
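Since every entry in /proc/pid/fd is a symlink named after its descriptor number, the same lookup can also be sketched by resolving the links directly rather than counting columns in the ls output. The helper name find_fd_for_file is hypothetical, and reading another user's process will still need sudo:

```shell
# Hypothetical helper: print the number of the file descriptor that
# process $1 has open on file $2.
find_fd_for_file() {
    # Resolve the file to its canonical absolute path first.
    target=$(readlink -f "$2") || return 1
    for link in /proc/"$1"/fd/*; do
        # Each entry is a symlink named after the descriptor number.
        if [ "$(readlink "$link")" = "$target" ]; then
            basename "$link"
            return 0
        fi
    done
    return 1
}

# Usage sketch:
#   fdid=$(find_fd_for_file "$pid" /path/to/your_snapshot.tar)
```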
With that number, we can now check the file /proc/pid/fdinfo/fd_id, which will contain something like this:
[rails@ip-10-51-43-240 ~]$ cat /proc/7719/fdinfo/3
pos:	4692643840
flags:	0100000
The useful part of this output is the pos field. It tells us the position in the file that the process is currently at. Since tar processes the file sequentially, this position tells us what fraction of the file tar has processed so far.
Now all we need to do is check the size of the tar file and divide the position by it to get the percentage done.
To get the pos field we can use some more unix commands:
cat /proc/7719/fdinfo/3 | head -n1 | cut -f 2
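As a variation, the value can be picked out by its label instead of by line number, which keeps working if the layout of the fdinfo file differs on your kernel. This is only a sketch, and the helper name read_pos is made up for the example:

```shell
# Hypothetical helper: print the value of the "pos:" line of an fdinfo file.
read_pos() {
    # awk splits on any whitespace, so the tab after "pos:" is handled.
    awk '$1 == "pos:" { print $2 }' "$1"
}

# Usage sketch:
#   pos=$(read_pos /proc/7719/fdinfo/3)
```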
To get the original file size, we can use the stat command:
stat -c %s /path/to/database_snapshot.tar
Finally, we can use bc to get the percentage by dividing both values:
echo "`cat /proc/7719/fdinfo/3 | head -n1 | cut -f 2`/`stat -c %s /path/to/database_snapshot.tar` * 100" | bc -l
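If bc is not installed, or you prefer a rounded figure, the same division can be sketched with awk. The helper name percent_done is hypothetical, and the file size in the usage line is an invented value chosen to be exactly twice the pos shown earlier:

```shell
# Hypothetical helper: print pos / size as a percentage with two decimals.
percent_done() {
    awk -v pos="$1" -v size="$2" 'BEGIN {
        # Refuse to divide by an empty or zero file size.
        if (size <= 0) exit 1
        printf "%.2f\n", pos / size * 100
    }'
}

# Usage sketch with a made-up file size of 9385287680 bytes:
percent_done 4692643840 9385287680   # prints 50.00
```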
To put it all together in a nice script, you can use this one as a template:
# Full path to the tar file that contains your db snapshot
file_path="<full path to your tar db snapshot>"
file_size=`stat -c %s "$file_path"`
# File name of the snapshot, used to find the tar process and its file descriptor
file="<filename of your db snapshot>"
pid=`ps ax | grep "$file" | grep -v grep | tail -n1 | cut -d" " -f 1`
fdid=`ls -la /proc/$pid/fd/ | grep "$file" | cut -d" " -f 9`
pos=`cat /proc/$pid/fdinfo/$fdid | head -n1 | cut -f 2`
echo "$pos / $file_size * 100" | bc -l
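The script prints a single reading. To follow the import over time, the last two steps can be wrapped in a loop, sketched here under the assumption that the pid, the descriptor number and the file size have already been found as described above. The helper name watch_progress and the default interval of 10 seconds are choices made for this example:

```shell
# Hypothetical helper: print the progress of descriptor $2 in process $1
# against a file of $3 bytes, every $4 seconds (default 10), until the
# process exits or closes the file.
watch_progress() {
    interval=${4:-10}
    while [ -r "/proc/$1/fdinfo/$2" ]; do
        # Stop quietly if the file disappears between the check and the read.
        pos=$(awk '$1 == "pos:" { print $2 }' "/proc/$1/fdinfo/$2") || break
        awk -v pos="$pos" -v size="$3" \
            'BEGIN { printf "%.2f%%\n", pos / size * 100 }'
        sleep "$interval"
    done
}

# Usage sketch:
#   watch_progress "$pid" "$fdid" "$file_size" 30
```

The loop ends on its own once tar finishes, because the fdinfo entry disappears together with the process.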
I wrote this article and script following the tips in this Stack Overflow answer: http://stackoverflow.com/questions/5748565/how-to-see-progress-of-csv-upload-in-mysql/14851765#14851765