[TriLUG] fslint

David Black dave at jamsoft.com
Wed Jan 9 22:26:21 EST 2008


Are you thinking of using this kind of solution to de-duplicate files
owned by different users?
As another poster mentioned, hardlinks have a minor performance
advantage and a "self-maintaining" aspect over symlinks, at the cost of
a little flexibility.  Either way, each de-duplicated file has a single
owner+group and a single set of permission bits.
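
To make the shared-permissions point concrete, here is a small sketch in
Python (my choice of language, nothing to do with fslint itself).  The
file names and the demo_shared_permissions() helper are made up for
illustration.  It hardlinks two names to the same data and shows that a
chmod through one name is visible through the other, because both names
point at the same inode:

```python
import os
import stat
import tempfile


def demo_shared_permissions():
    """Hardlink two names together and chmod through only one of them."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.dat")  # hypothetical file names
        b = os.path.join(d, "b.dat")
        with open(a, "w") as f:
            f.write("data")
        os.link(a, b)        # b is now a second name for a's inode
        os.chmod(a, 0o600)   # change permissions via the first name only
        same_inode = os.stat(a).st_ino == os.stat(b).st_ino
        mode_seen_via_b = stat.S_IMODE(os.stat(b).st_mode)
        return same_inode, mode_seen_via_b


if __name__ == "__main__":
    same_inode, mode = demo_shared_permissions()
    print(same_inode, oct(mode))
```

The same applies to chown: owner and group live in the inode, not in the
directory entry, so there is no way to give each hardlink its own.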

I recall NetApp not too long ago talking publicly about a new automatic
de-duplication capability in Data ONTAP, which hopefully avoids the
above limitation.   Hm, could some advanced Linux filesystem already
have it?  I don't know.

IMHO as soon as multiple users and permissions enter the picture,
de-duplication wants to become a function of the filesystem itself - at
least via another level of inode indirection and automatic cloning of
data blocks when such files get modified.  Until then, the
hash-compare-delete-link utilities like fslint appear to be safe to use
only on a single user's set of files, and even then only with the
understanding that all links to each de-duplicated file will share the
same set of permission bits.
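
For what it's worth, the hash-compare-link idea restricted to one user
can be sketched roughly as below.  This is NOT how fslint is implemented;
it is just a minimal illustration in Python, and the function names
(file_digest, find_duplicates, link_duplicates) and the ".dedup-tmp"
suffix are my own inventions.  It assumes everything under the root is on
filesystems that support hardlinks, only considers regular non-empty
files owned by the given uid, and only groups files whose group and
permission bits already match, so linking them changes nothing visible:

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, uid):
    """Group identical regular files under root that are owned by uid.

    Candidates are bucketed by (size, device, gid, permission bits) first,
    so only files that already share ownership and mode get hashed together.
    """
    buckets = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_uid == uid and st.st_size > 0:
                key = (st.st_size, st.st_dev, st.st_gid, stat.S_IMODE(st.st_mode))
                buckets[key].append(path)
    groups = []
    for paths in buckets.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)


def link_duplicates(groups, dry_run=True):
    """Replace each duplicate with a hardlink to the first file in its group.

    Returns the list of (kept, replaced) pairs.  With dry_run=True nothing
    on disk is touched.
    """
    actions = []
    for keep, *dups in groups:
        keep_ino = os.lstat(keep).st_ino
        for dup in dups:
            if os.lstat(dup).st_ino == keep_ino:
                continue  # already the same inode, nothing to do
            actions.append((keep, dup))
            if not dry_run:
                tmp = dup + ".dedup-tmp"  # hypothetical temporary name
                os.link(keep, tmp)        # new link first ...
                os.replace(tmp, dup)      # ... then swap it over the duplicate
    return actions
```

You would call find_duplicates("/some/dir", os.getuid()), look over the
dry-run output of link_duplicates(), and only then run it again with
dry_run=False.  Note the sketch does nothing about the other caveat:
once linked, a write through any one name changes the data seen through
all of them.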

Dave

Mike Seda wrote:
> all,
> i have a problem where users are filling up a 2.3 TB partition much 
> quicker than expected.
>
> it turns out that there are about 5000 duplicate files there. each file 
> is around 200 MB. i was going to write my own deduplication script when 
>
>   


