[TriLUG] md5sum question

Kevin Hunter hunteke at earlham.edu
Thu Nov 6 18:19:10 EST 2008


Jason,

Downthread you say it's 3 inches above your head, so let me attempt an 
explanation:

At 4:51pm -0500 Thu, 06 Nov 2008, Marc Wiatrowski wrote:
> find ./dir -type f -exec md5sum {} \; | sort | md5sum

Unix is all about breaking up and simplifying tasks.  What Marc gave you is an 
algorithm to get a "global" md5sum* of any set of files.  In other words, he's 
given you a way to calculate one signature for your set-of-files as a 
*single entity*, a signature that is unique for all practical purposes.  He has 
broken this up into 3 (or 4) steps.  I suggest running each piece on its own to 
see what it does:

1. find the necessary files to check
   find ./dir		# find/list all files "rooted" at some base path
   find ./dir -type f	# normal files only.  No symlinks, directories, etc.

example output:
$ find . -type f
./uncles_stitches.jpg
./intrepid-desktop-amd64-kde.iso
./receipt_for_donation.pdf
./map_to_rachaels.pdf
./rename_pc.zip
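
If you want to see what -type f filters out without touching your real files, 
here is a small sketch.  The scratch directory and the names in it are made up 
for the demo:

```shell
# Build a throwaway tree: one directory, one regular file, one symlink.
demo=$(mktemp -d)
mkdir "$demo/subdir"
printf 'data\n' > "$demo/subdir/real_file"
ln -s subdir/real_file "$demo/a_symlink"

# Plain find lists everything: ".", the directory, the file, the symlink.
all_entries=$(cd "$demo" && find . | wc -l)

# With -type f only the regular file is left.
regular_only=$(cd "$demo" && find . -type f)

echo "entries without -type f: $all_entries"
echo "entries with -type f:    $regular_only"
```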

2. check each file individually, saving the output
  ... -exec md5sum {} \;   # weird syntax, that effectively expands
                           # to: md5sum file1; md5sum file2; md5sum file3 ...
example output:
$ find . -type f -exec md5sum {} \;
3a97788d4b8d60b3c0b5b418620105c7  ./uncles_stitches.jpg
d3c846ef40b918ddf6154c3d4ce05379  ./intrepid-desktop-amd64-kde.iso
a3b9e35c8aeff480d2c4bb689d2078b9  ./receipt_for_donation.pdf
964420346ee1d95fa815162fd3c8cf74  ./map_to_rachaels.pdf
7e2b1f62fbdc52530a7c75c8c47a7296  ./rename_pc.zip
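
To convince yourself that -exec really is just "run this once per file", you 
can compare it against typing the commands by hand.  A rough sketch, with 
made-up file names in a scratch directory; the sort is only there so the 
comparison doesn't depend on the order find happens to use:

```shell
# Scratch directory and file names are invented for this demo.
demo=$(mktemp -d)
printf 'hello\n' > "$demo/a.txt"
printf 'world\n' > "$demo/b.txt"

# Let find run md5sum once for each file it finds.
via_find=$(cd "$demo" && find . -type f -exec md5sum {} \; | sort)

# Spell out the same commands by hand.
by_hand=$(cd "$demo" && { md5sum ./a.txt; md5sum ./b.txt; } | sort)

if [ "$via_find" = "$by_hand" ]; then
    echo "same output either way"
fi
```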

3. create a consistent ordering
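  The pipe concept is the intellectual hurdle.  You're taking the output of 
find (all those md5sums), and giving it to another command, in this case sort.  
find makes no promise about the order in which it lists files, so sorting is 
what makes the list come out the same on both systems.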

example output:
$ find . -type f -exec md5sum {} \; | sort
3a97788d4b8d60b3c0b5b418620105c7  ./uncles_stitches.jpg
7e2b1f62fbdc52530a7c75c8c47a7296  ./rename_pc.zip
964420346ee1d95fa815162fd3c8cf74  ./map_to_rachaels.pdf
a3b9e35c8aeff480d2c4bb689d2078b9  ./receipt_for_donation.pdf
d3c846ef40b918ddf6154c3d4ce05379  ./intrepid-desktop-amd64-kde.iso
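
Here is a tiny sketch of why the ordering step matters.  The "hashes" and paths 
are fake placeholders, and LC_ALL=C is my own addition (not in Marc's command) 
to make sort compare plain bytes no matter what locale a machine uses:

```shell
# The same two lines, listed in a different order (fake hashes and paths).
listing_one=$(printf '%s\n' 'bbb  ./two.txt' 'aaa  ./one.txt')
listing_two=$(printf '%s\n' 'aaa  ./one.txt' 'bbb  ./two.txt')

# Pipe each listing through sort; both come out in the same order.
sorted_one=$(printf '%s\n' "$listing_one" | LC_ALL=C sort)
sorted_two=$(printf '%s\n' "$listing_two" | LC_ALL=C sort)

if [ "$listing_one" != "$listing_two" ] && [ "$sorted_one" = "$sorted_two" ]; then
    echo "different before sorting, identical after"
fi
```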

4. get the "global" snapshot
  Step 3, the sort, is the clever bit; it turns the hubbub of output into a 
listing whose order is guaranteed and consistent.  So, he creates the "global" 
md5sum by summing that sorted output:

example output (the "-" means md5sum read from the pipe, not from a named file):
$ find . -type f -exec md5sum {} \; | sort | md5sum
a0aa83fc3e484425353797e194b92164  -
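
Putting the whole pipeline to work, here is a sketch of comparing two copies of 
a tree.  tree_md5 is a name I made up, the directories are scratch ones, and 
two details are my assumptions rather than part of Marc's command: I cd into 
the directory first so the listed paths start with "./" on both sides, and I 
use LC_ALL=C so sort behaves the same under any locale.

```shell
# tree_md5 is a hypothetical helper: one signature for a whole directory tree.
tree_md5() {
    # Run in a subshell so the caller's working directory is untouched.
    (cd "$1" && find . -type f -exec md5sum {} \; | LC_ALL=C sort | md5sum)
}

# Make a small tree and copy it, standing in for the old and new systems.
src=$(mktemp -d)
dst=$(mktemp -d)
printf 'some data\n' > "$src/file1"
mkdir "$src/sub"
printf 'more data\n' > "$src/sub/file2"
cp -R "$src/." "$dst/"

sum_src=$(tree_md5 "$src")
sum_dst=$(tree_md5 "$dst")

if [ "$sum_src" = "$sum_dst" ]; then
    echo "trees match"
else
    echo "trees differ"
fi
```

On two real machines you would run the pipeline once on each and compare the 
two resulting lines by eye or with a test like the one above.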

Hope this helps,

Kevin

* md5sum is what makes this all possible.  It's basically an algorithm that 
gives you a short code or signature ("hash") of a file.  Try making a file 
and running it through md5sum.  Change one byte, and run it again.  The md5sum 
will be completely different.  Two different files can in theory share a hash, 
but for catching accidental changes it is effectively unique.
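
The experiment looks roughly like this.  The scratch file and its contents are 
just placeholders:

```shell
# Hash a scratch file, change one byte, and hash it again.
scratch=$(mktemp)
printf 'hello world\n' > "$scratch"
before=$(md5sum < "$scratch")

printf 'hello World\n' > "$scratch"   # one byte changed: w became W
after=$(md5sum < "$scratch")

echo "before: $before"
echo "after:  $after"
```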

Thus, for the algorithm you're calculating the "signature" of the set-of-files 
(as a single entity), and then checking that the signature is the same on the 
new system.


