[TriLUG] Bit rot detection without ZFS/btrfs?

Sean Korb spkorb at gmail.com
Mon Jul 1 23:19:58 EDT 2013


The performance need (if there is one) might be at odds with maintaining
data integrity.  I was thinking of a fascinating panel discussion I went to
about exascale file systems.  The panel was unanimous that, at scale, POSIX
compliance would have to be abandoned and file handling, metadata, locking
and other duties would need to be in the software itself.  After all, the
software would know best what kind of tradeoffs were needed assuming it was
well designed.

I thought a fight was going to break out in the audience.  Such heresy!
Why, there were so many advances in Lustre or GPFS or pNFS or... well, it
went on for a bit and it was an exciting time all around.

I'm thinking that in traditional file systems POSIX can only enhance your
ability to prevent bit rot, but maybe a feature of the software itself
would be to maintain checksums for your data.  I can think of a few
scientists who would jump at the idea that their data was always checking
itself during job execution, and if you didn't have to maintain metadata
efficiently, you could have performance left over to do checksum duties.
Heck, a lot of embedded software just calls SQLite anyway.  Do you really
*need* to keep permissions, ownership and directories?
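
To make the checksum idea a bit more concrete, here is a rough Python
sketch of software keeping its own hashes in SQLite and checking them later.
Everything named here (open_db, record, verify, the "checksums" table and the
four status strings) is made up for illustration, not taken from any existing
tool.  It assumes that a file whose size and mtime are unchanged but whose
SHA-256 differs is worth flagging, which is the same heuristic suggested for
rsync further down the thread.

```python
import hashlib
import os
import sqlite3


def open_db(db_path=":memory:"):
    """Open (or create) the hypothetical checksum database."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS checksums ("
        "path TEXT PRIMARY KEY, sha256 TEXT, size INTEGER, mtime_ns INTEGER)"
    )
    return db


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def record(db, path):
    """Store the current checksum, size and mtime for path."""
    st = os.stat(path)  # stat first: a write during hashing changes mtime later
    db.execute(
        "INSERT OR REPLACE INTO checksums VALUES (?, ?, ?, ?)",
        (path, file_sha256(path), st.st_size, st.st_mtime_ns),
    )
    db.commit()


def verify(db, path):
    """Return 'new', 'ok', 'modified' or 'suspect' for path."""
    row = db.execute(
        "SELECT sha256, size, mtime_ns FROM checksums WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        return "new"
    sha, size, mtime_ns = row
    st = os.stat(path)
    if (st.st_size, st.st_mtime_ns) != (size, mtime_ns):
        return "modified"  # metadata changed: most likely a legitimate write
    if file_sha256(path) != sha:
        return "suspect"   # same size and mtime, different content
    return "ok"
```

A background scrub would just walk the tree, call verify() on each file, and
call record() again for anything reported as "new" or "modified".  Treating a
changed mtime as a legitimate write is a design choice that sidesteps most
false positives from files that are being written to, at the cost of missing
corruption that happens to land on a file that was also touched.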

Also this (not an example of a bit-rot mitigation plan but kind of neat):
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5750457&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5750457


On Mon, Jul 1, 2013 at 4:20 PM, Francois Dion <francois.dion at gmail.com> wrote:

> You'd have to write a tool to scrub your FS in the background, calculate a
> checksum and compare it against a checksum that is on a different area of
> the disk. That still doesn't give you end to end in flight data integrity,
> but that would be a good start.
>
> Francois
>
>
> On Mon, Jul 1, 2013 at 2:41 PM, Steve Litt <slitt at troubleshooters.com> wrote:
>
> > On Sun, 30 Jun 2013 20:42:22 -0400
> > "Randy Barlow" <randy at electronsweatshop.com> wrote:
> >
> > > Hello fellow Linux people,
> > >
> > > I've been wearing my tin foil hat a little bit too much lately, and
> > > I've started to become worried about all the files sitting on my
> > > various hard drives, bit rotting away.
> > >
> > > I am aware that ZFS and btrfs are designed to help with this problem,
> > > but ZFS has its licensing issues, and btrfs isn't yet "blessed" by
> > > the Linux elders.
> > >
> > > Do any of you use anything to detect file corruption on your disks?
> > > I'm mostly interested in detection at this point, as I think I can
> > > pretty well use my backups or off site backups to recover, but I need
> > > something to tell me when I need to do that.
> > >
> > > I've considered writing something homegrown to do this, as it's not
> > > terribly complicated. I could store a checksum on the FS extended
> > > attributes, or maybe just in a database of some kind.
> > >
> > > If anyone knows of a distro package that can do this, I'd love to
> > > hear about it. There are some interesting challenges to get past
> > > (lots of opportunities for false positives, for example when
> > > checksumming a file that is currently being written to by another
> > > process.)
> >
> > I wonder if there's a way to do this with rsync. You can make rsync
> > write to a log. On the destination machine, if a file changed but its
> > date didn't, that's pretty good evidence of bit rot.
> >
> > This has the added benefit of acting as a backup server, and if you use
> > cp -al each time, you keep older versions so you can restore
> > non-bit-rotted files after changing out your hard disk.
> >
> > Thanks,
> >
> > SteveT
> >
> > Steve Litt                *  http://www.troubleshooters.com/
> > Troubleshooting Training  *  Human Performance
> > --
> > This message was sent to: Francois Dion <francois.dion at gmail.com>
> > To unsubscribe, send a blank message to trilug-leave at trilug.org from that
> > address.
> > TriLUG mailing list : http://www.trilug.org/mailman/listinfo/trilug
> > Unsubscribe or edit options on the web  :
> > http://www.trilug.org/mailman/options/trilug/francois.dion%40gmail.com
> > Welcome to TriLUG: http://trilug.org/welcome
> >



-- 
Sean Korb spkorb at spkorb.org http://www.spkorb.org
'65,'68 Mustangs,'68 Cougar,'78 R100/7,'60 Metro,'59 A35,'71 Pantera #1382
"The more you drive, the less intelligent you get" --Miller
"Computers are useless.  They can only give you answers." -P. Picasso

