[TriLUG] md5sum question
Kevin Hunter
hunteke at earlham.edu
Thu Nov 6 18:19:10 EST 2008
Jason,
Downthread you say this went about 3 inches over your head, so let me attempt
an explanation:
At 4:51pm -0500 Thu, 06 Nov 2008, Marc Wiatrowski wrote:
> find ./dir -type f -exec md5sum {} \; | sort | md5sum
Unix is all about breaking up and simplifying tasks. What Marc gave you is an
algorithm to get a "global" md5sum* of any set of files. In other words, he's
given you a way to calculate an effectively unique signature for your
set-of-files as a *single entity*. He has broken this up into 3 (or 4) steps.
I suggest running each piece on its own to see what it does:
1. find the necessary files to check
find ./dir # find/list all files "rooted" at some base path
find ./dir -type f # regular files only. No symlinks, directories, etc.
example output:
$ find . -type f
./uncles_stitches.jpg
./intrepid-desktop-amd64-kde.iso
./receipt_for_donation.pdf
./map_to_rachaels.pdf
./rename_pc.zip
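To see for yourself what -type f leaves out, here is a small sketch that builds a throwaway directory (the names sub, file.txt and link are made up) holding one regular file, one subdirectory and one symlink, then lists it both ways. It assumes a system with mktemp -d, which Linux and the BSDs have:

```shell
# Scratch directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# One of each kind of entry (hypothetical names).
mkdir "$tmp/sub"
echo hello > "$tmp/sub/file.txt"
ln -s sub/file.txt "$tmp/link"

# Everything under the base path: ., ./sub, ./sub/file.txt and ./link
(cd "$tmp" && find .)

# Regular files only: ./sub/file.txt (the directory and the symlink are skipped)
(cd "$tmp" && find . -type f)
```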
2. hash each file individually, saving the output
... -exec md5sum {} \; # odd syntax that effectively expands
# to: md5sum file1; md5sum file2; md5sum file3 ...
example output:
$ find . -type f -exec md5sum {} \;
3a97788d4b8d60b3c0b5b418620105c7 ./uncles_stitches.jpg
d3c846ef40b918ddf6154c3d4ce05379 ./intrepid-desktop-amd64-kde.iso
a3b9e35c8aeff480d2c4bb689d2078b9 ./receipt_for_donation.pdf
964420346ee1d95fa815162fd3c8cf74 ./map_to_rachaels.pdf
7e2b1f62fbdc52530a7c75c8c47a7296 ./rename_pc.zip
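If the {} \; syntax is still fuzzy, one trick is to put echo in front of the command so find prints each command line instead of running it. A sketch, using a scratch directory with three made-up files:

```shell
# Hypothetical scratch directory with three small files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf 'a\n' > one.txt
printf 'b\n' > two.txt
printf 'c\n' > three.txt

# With "echo" in front, find prints one command per file instead of running it:
#   md5sum ./one.txt
#   md5sum ./two.txt
#   md5sum ./three.txt
# (in whatever order the directory returns them, which is why step 3 sorts).
find . -type f -exec echo md5sum {} \;

# The backslash keeps the shell from treating ";" as its own command
# separator, so find receives it and knows where the -exec command ends.
find . -type f -exec md5sum {} \;
```

find also accepts -exec md5sum {} + (a plus sign instead of \;), which hands many file names to a single md5sum run. That's usually faster on big trees, and after sorting it yields the same list.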
3. create a consistent ordering
The pipe concept is the intellectual hurdle. You're taking the output of
find (all those md5sum lines) and handing it to another command, here sort.
example output:
$ find . -type f -exec md5sum {} \; | sort
3a97788d4b8d60b3c0b5b418620105c7 ./uncles_stitches.jpg
7e2b1f62fbdc52530a7c75c8c47a7296 ./rename_pc.zip
964420346ee1d95fa815162fd3c8cf74 ./map_to_rachaels.pdf
a3b9e35c8aeff480d2c4bb689d2078b9 ./receipt_for_donation.pdf
d3c846ef40b918ddf6154c3d4ce05379 ./intrepid-desktop-amd64-kde.iso
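One caveat worth knowing: sort orders lines according to the current locale (LANG, LC_COLLATE and friends), so two machines with different language settings can, in some cases, sort the same lines differently and end up with different global md5sums. Forcing the C locale makes sort compare plain bytes everywhere. A sketch (sorted_sums is just a name made up here) that shows the effect with two identical files, one named with a lowercase letter and one with an uppercase letter:

```shell
# sorted_sums DIR: per-file md5sums for DIR, sorted in byte order.
# (Hypothetical helper name.) LC_ALL=C overrides every locale setting for
# this one sort, so the ordering is the same on any machine.
sorted_sums() {
    (cd "$1" && find . -type f -exec md5sum {} \; | LC_ALL=C sort)
}

# Demo: two identical files, so their hashes tie and the names decide the order.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'same\n' > "$tmp/a.txt"
printf 'same\n' > "$tmp/B.txt"

# In the C locale ./B.txt comes first, because "B" (0x42) is a smaller byte
# than "a" (0x61); many UTF-8 locales would put ./a.txt first instead.
sorted_sums "$tmp"
```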
4. get the "global" snapshot
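Step 3, the sort, is the clever bit. find lists files in whatever order the
filesystem happens to return them, and that order can differ from one machine
to the next; sorting turns that output into a consistent, predictable list.
Marc then creates the "global" md5sum by hashing that consistently ordered
list: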
example output:
$ find . -type f -exec md5sum {} \; | sort | md5sum
a0aa83fc3e484425353797e194b92164 -
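Wrapped up as a reusable shell function, the whole pipeline might look like the sketch below. The function name dir_signature is made up, and it adds two assumptions to Marc's one-liner: it cd's into the directory first, so the recorded paths look like ./x on both machines no matter where the directory lives, and it runs sort under LC_ALL=C so the ordering doesn't depend on locale settings.

```shell
# dir_signature DIR: print a single MD5 "global" signature covering every
# regular file under DIR (hypothetical function name).
dir_signature() {
    # cd first so paths are recorded as ./name; LC_ALL=C fixes the sort order;
    # cut keeps only the hash and drops the "  -" md5sum prints for stdin.
    (cd "$1" && find . -type f -exec md5sum {} \; | LC_ALL=C sort | md5sum) |
        cut -d' ' -f1
}

# Demo: two scratch directories with the same files, created in different orders.
old=$(mktemp -d)
new=$(mktemp -d)
trap 'rm -rf "$old" "$new"' EXIT
printf 'one\n' > "$old/x"; printf 'two\n' > "$old/y"
printf 'two\n' > "$new/y"; printf 'one\n' > "$new/x"

dir_signature "$old"    # prints a 32-character hex hash...
dir_signature "$new"    # ...and the same hash again
```

Because the file names are part of what gets hashed, renaming or moving a file also changes the signature, even if its contents are untouched.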
Hope this helps,
Kevin
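* md5sum is what makes this all possible. It runs an algorithm (MD5) that
produces a short, fixed-length code or signature ("hash") from a file's
contents. Try creating a file and running it through md5sum. Change one byte
and run it again. The md5sum will be completely different and, for practical
purposes, unique. (Strictly speaking, different files can share an MD5 hash,
and such collisions can be constructed deliberately, so MD5 is fine for
catching accidental changes but shouldn't be trusted to detect tampering.)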
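The one-byte experiment from the footnote, as a short sketch (the file name sample.txt is arbitrary):

```shell
# Scratch file in a temporary directory, cleaned up on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'hello world\n' > "$tmp/sample.txt"
before=$(md5sum < "$tmp/sample.txt" | cut -d' ' -f1)

# Change a single byte: "w" -> "W"
printf 'hello World\n' > "$tmp/sample.txt"
after=$(md5sum < "$tmp/sample.txt" | cut -d' ' -f1)

# The two 32-character hashes bear no visible resemblance to each other.
echo "before: $before"
echo "after:  $after"
```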
Thus, with this algorithm you calculate the "signature" of the set-of-files
(as a single entity) on each system, and then check that the two signatures
match. If they do, the files are the same on the new system.
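If the two signatures don't match, the global md5sum can tell you *that* something differs but not *which* file. One option, an addition rather than part of Marc's recipe, is to keep the sorted per-file list from the old system and check it on the new one with md5sum -c (GNU coreutils; BSD's md5 tool uses different options). The sketch below simulates both machines with two scratch directories and deliberately changes one file:

```shell
# Two scratch directories stand in for the old and new systems; $list stands
# in for the checksum file you'd copy between machines.
old=$(mktemp -d)
new=$(mktemp -d)
list=$(mktemp)
trap 'rm -rf "$old" "$new" "$list"' EXIT

printf 'alpha\n' > "$old/a.txt"
printf 'beta\n'  > "$old/b.txt"
cp "$old/a.txt" "$old/b.txt" "$new/"
printf 'BETA\n'  > "$new/b.txt"    # simulate a file that differs on the new system

# "Old system": save the sorted per-file checksums.
(cd "$old" && find . -type f -exec md5sum {} \; | LC_ALL=C sort) > "$list"

# "New system": re-check every listed file; md5sum -c prints "name: OK" or
# "name: FAILED" per file and exits non-zero if any check fails.
if (cd "$new" && md5sum -c "$list"); then
    echo "all files match"
else
    echo "some files differ"
fi
```

md5sum -c only checks the files named in the list, so a file that exists only on the new system won't be reported; the global md5sum would still catch that.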