Creating backups of important files is easy to do, but many people only consider it once they have already lost their data. You should always keep multiple copies of your most important data, preferably with at least one copy stored off-site, away from the others. Fire, theft and hardware failure can easily lead to data loss, so you’ll want backups to call upon.
It’s also important to check your backups from time to time. Over time, storage media undergo a process known as ‘bit rot’, where the data stored on the device gradually degrades. It can be hard to tell when this is happening; often, by the time you notice, the data is already damaged: a skip in a music file, or a chunk missing from a photo. A good backup solution needs a way to detect this damage, and to ensure you have a known good copy of each file to call upon.
This article will show you how to use MD5 hashes to keep track of your data integrity, and easily check for damaged files in the future.
Generating the Hashes
To generate a hash for everything in the current folder, run
md5sum * > hashes.chk
The output can be found in the hashes.chk file.
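Each line of hashes.chk pairs an MD5 hash with the file it was generated from. With a couple of example files, the contents might look something like this (the filenames and hashes here are just placeholders):
d41d8cd98f00b204e9800998ecf8427e  notes.txt
9e107d9d372bb6826bd81d3542a419d6  holiday-photo.jpg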
If you also want to include files in subfolders, you can use find to run md5sum recursively.
find . -type f -exec md5sum "{}" + > hashes.chk
This will create a single file containing hashes for every file within or below the current directory.
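Because find is given the current directory as its starting point, the entries are recorded with relative paths, so a hashes.chk built this way might contain lines like these (placeholder hashes and names again):
d41d8cd98f00b204e9800998ecf8427e  ./notes.txt
9e107d9d372bb6826bd81d3542a419d6  ./photos/2019/holiday.jpg
Since the paths are relative, you’ll want to run the check described below from this same directory.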
Checking the Hashes
Checking the hashes is simple. md5sum includes an option specifically for this purpose. In the directory containing hashes.chk, run
md5sum -c hashes.chk
md5sum will work through each file and report any whose contents no longer match the hash recorded in hashes.chk. A mismatch indicates that the file is damaged and should be replaced from another backup.
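The output will look something like the following (the filenames are placeholders); files that still match are reported as OK, mismatches as FAILED, with a warning at the end:
notes.txt: OK
holiday-photo.jpg: FAILED
md5sum: WARNING: 1 computed checksum did NOT match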
If any of your backup files do need replacing, be sure to check that the copies you’re replacing them with aren’t damaged too. It’s also worth checking these hashes before creating new backups, or before replacing files in another backup.
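As a rough sketch of how this fits together, you could refuse to refresh a backup unless every hash still checks out. The /mnt/backup destination below is a hypothetical path, and the --quiet flag simply suppresses the per-file OK lines:
#!/bin/sh
# Sketch: only copy files into the backup if every recorded hash still matches.
# /mnt/backup is a hypothetical destination; adjust paths to suit your setup.
BACKUP_DIR=/mnt/backup

if md5sum --quiet -c hashes.chk; then
    # Everything verified, so it is safe to refresh the backup.
    cp -a . "$BACKUP_DIR"
else
    echo "Verification failed; fix the damaged files before backing up." >&2
    exit 1
fi
md5sum exits with a non-zero status if any file fails verification, so the copy only happens when every hash matches.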