Check progress of a MySQL database import
If you've ever had to do a huge MySQL import, you'll know the pain of having no idea how long it will take to complete.
At work we use the backup gem to store daily snapshots of our databases, the largest of which is several gigabytes in size. This gem essentially runs a mysqldump with configurable options and takes care of rotating old snapshots, compressing the data, and sending notifications when backup jobs complete or fail.
When the time comes to restore one of those backups, you basically have to run the mysql client with the exported SQL file as input, which can take ages to complete depending on the size of the file and the speed of the system.
The command used to import the database snapshot from the backup gem may look like this:
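A minimal sketch of that restore command; the archive, dump, and database names here are hypothetical stand-ins, so adjust them to your snapshot's actual layout (the guard just makes the sketch safe to run as-is):

```shell
# Hypothetical names; adjust to your snapshot's actual layout.
archive=databases.tar
dump=db/my_database.sql.gz   # path of the gzipped dump inside the archive
database=my_database

if [ -f "$archive" ]; then
  # -O extracts the member to stdout; zcat gunzips it; mysql replays the SQL.
  tar xfO "$archive" "$dump" | zcat | mysql -u root "$database"
else
  echo "archive $archive not found"
fi
```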
What this command does is untar the gzipped dump from the archive, pass it through zcat to gunzip it, and feed the result to the mysql client for the database you want to restore.
And then the waiting game begins.
There is a way, though, to get an estimate of the amount of work already done, which can be a big help for the impatient like myself. You only need to make use of the proc filesystem on Linux.
The first thing you need to do is find the tar process you just started:
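For example, assuming the import was started with a command line containing `tar xfO` (that pattern is an assumption; match whatever your invocation looks like):

```shell
# The [t] trick keeps the grep process itself from matching the pattern.
ps ax | grep '[t]ar xfO' || echo 'no tar process found'
```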
This last command assumes that no other processes will have that string on their invocation command lines.
We are really interested in the pid of the process, which we can get with a few Unix commands and pipes appended to the previous one:
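Appended to the previous command, the pipeline might look like this (the grep pattern is again an assumption):

```shell
# tail keeps the last matching line; cut takes the first space-separated
# field, which in `ps ax` output is the pid column.
tar_pid=$(ps ax | grep '[t]ar xfO' | tail -n 1 | cut -d ' ' -f 1)
echo "$tar_pid"
```

Because ps right-aligns the pid column with leading spaces, cut can return an empty field for short pids; `awk '{print $1}'` is a more robust alternative.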
This takes the last line of the process list output (with tail), splits it into fields using the space as a delimiter, and keeps the first one (with cut). Note that depending on your OS and your ps output format you may have to tweak this.
After we have the pid of the tar process, we can see what it is doing through the proc filesystem. The information we are interested in is the file descriptors it has open, which live in the folder /proc/pid/fd. If we list the files in that folder, we get an output similar to this one:
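To keep the sketch runnable on its own, it falls back to this shell's own pid when $tar_pid is not set; the listed files are illustrative:

```shell
tar_pid=${tar_pid:-$$}   # fall back to the current shell for demonstration
ls -l "/proc/$tar_pid/fd"
# Example output:
#   lrwx------ 1 user user 64 Feb  4 10:00 0 -> /dev/pts/0
#   lrwx------ 1 user user 64 Feb  4 10:00 1 -> /dev/pts/0
#   lrwx------ 1 user user 64 Feb  4 10:00 2 -> /dev/pts/0
#   lr-x------ 1 user user 64 Feb  4 10:00 3 -> /backups/databases.tar
```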
The important one for our purposes is the number 3 in this case, which is the file descriptor for the file tar is unpacking.
We can get this number using a similar strategy:
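A sketch: grep the listing for the archive name (databases.tar is an assumed name) and take the link name, which `ls -l` prints in the ninth column; the pid fallback is just so the sketch runs standalone:

```shell
tar_pid=${tar_pid:-$$}   # fall back to the current shell for demonstration
# Field 9 of `ls -l` is the entry name, i.e. the fd number itself.
fd_id=$(ls -l "/proc/$tar_pid/fd" | grep 'databases.tar' | awk '{print $9}')
echo "$fd_id"
```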
With that number, we can now check the file /proc/pid/fdinfo/fd_id, which will contain something like this:
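For example; the fallback values make the sketch runnable against this shell's own stdin, and the numbers shown are illustrative:

```shell
tar_pid=${tar_pid:-self}   # demo fallbacks; use the real pid and fd number
fd_id=${fd_id:-0}
cat "/proc/$tar_pid/fdinfo/$fd_id"
# Example output:
#   pos:    1382498304
#   flags:  0100000
#   mnt_id: 28
```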
The useful part of this list is the pos field, which tells us the byte offset the process has reached in the file. Since tar processes the file sequentially, knowing this position means we know what percentage of the file tar has read so far.
Now the only thing left to do is look up the original size of the tar file and divide the two numbers to get the percentage done.
To get the pos field we can use a few more Unix commands:
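The pos line is the first one in fdinfo and its fields are tab-separated (cut's default delimiter), so a head/cut pipeline works; the fallback values are just for demonstration:

```shell
tar_pid=${tar_pid:-self}   # demo fallbacks; use the real pid and fd number
fd_id=${fd_id:-0}
pos=$(head -n 1 "/proc/$tar_pid/fdinfo/$fd_id" | cut -f 2)
echo "$pos"
```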
To get the original file size, we can use the stat command:
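For example, with GNU stat's `%s` format, which prints the size in bytes (on BSD or macOS the equivalent is `stat -f %z`); the fallback file is just so the sketch runs, so point it at your archive:

```shell
tar_file=${tar_file:-/etc/passwd}   # substitute the real archive path
size=$(stat -c %s "$tar_file")
echo "$size"
```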
Finally, we can use bc to get the percentage by just dividing both values:
To put it all together in a nice script, you can use this one as a template:
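Putting the pieces together, a sketch of such a script; the process-matching pattern, field positions, and the heuristic of taking the first non-standard file descriptor are assumptions to tune for your system:

```shell
#!/bin/bash
# Print how far a running tar extraction has progressed through its input file.

# Percentage read so far for a given pid and file-descriptor number.
progress() {
  local pid=$1 fd=$2
  local file pos size
  file=$(readlink "/proc/$pid/fd/$fd")                  # file behind the fd
  pos=$(head -n 1 "/proc/$pid/fdinfo/$fd" | cut -f 2)   # current byte offset
  size=$(stat -c %s "$file")                            # total size in bytes
  echo "scale=2; $pos * 100 / $size" | bc
}

# Find the tar process (assumes it is the only match for this pattern).
tar_pid=$(ps ax | grep '[t]ar xfO' | tail -n 1 | awk '{print $1}')

if [ -n "$tar_pid" ]; then
  # Skip stdin/stdout/stderr (0-2) and take the first remaining descriptor.
  fd_id=$(ls "/proc/$tar_pid/fd" | awk '$1 > 2' | head -n 1)
  echo "$(progress "$tar_pid" "$fd_id")% done"
else
  echo 'no tar process found'
fi
```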
I developed this article and script following the tips in this Stack Overflow answer: http://stackoverflow.com/questions/5748565/how-to-see-progress-of-csv-upload-in-mysql/14851765#14851765