[TriLUG] Disk space calculations in Linux
Owen
oberry at trilug.org
Fri Jan 25 10:00:14 EST 2008
On Fri, Jan 25, 2008 at 12:42:32PM +1100, Jeremy Portzer wrote:
> William Sutton wrote:
> > While we're discussing... how much space gets wasted in overhead of files
> > that allocate a particular block size but don't use all of the blocks?
>
> I think you mean, files that don't use all the bytes in a block.
>
> That is an important difference - du (disk usage) lists the actual
> disk usage. The output of du is always a multiple of the file system
> block size (I'm not sure exactly how this is determined, but on most
> of my ext3 filesystems the block size is 4096 bytes, as reported by
> "dumpe2fs" - there may be a simpler way to show this).
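
On the "simpler way" point: stat can report the block size of a mounted
filesystem without root. A minimal sketch, assuming GNU coreutils stat; the
function name fs_block_size is made up for illustration:

```shell
# Print the fundamental block size (in bytes) of the filesystem holding a path.
# Assumes GNU coreutils stat: -f reports on the filesystem, %S is its block size.
fs_block_size() {
    stat -f -c '%S' "${1:-.}"
}

# Block size of the filesystem that holds the current directory.
fs_block_size .

# For ext2/ext3 the superblock dump shows the same value (needs root;
# /dev/hda6 is only an example device name):
#   dumpe2fs -h /dev/hda6 | grep -i 'block size'
```

The value depends on how the filesystem was created, so it may not be 4096
on every machine.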
>
> For example, the following five files have these sizes shown by ls:
>
> $ ls -l dump*
> -rw-r--r-- 1 root root 0 Jan 24 20:40 dump0.txt
> -rw-r--r-- 1 root root 1 Jan 24 20:40 dump1.txt
> -rw-r--r-- 1 root root 120176 Jan 24 20:34 dump-hda6.txt
> -rw-r--r-- 1 root root 77492 Jan 24 20:34 dump-hdc1.txt
> -rw-r--r-- 1 root root 40910 Jan 24 20:33 dump.txt
>
> But du shows this:
>
> $ du -b dump*
> 0 dump0.txt
> 4096 dump1.txt
> 126976 dump-hda6.txt
> 81920 dump-hdc1.txt
> 40960 dump.txt
>
> Notice that a zero-byte file takes zero space on disk, but a 1-byte file
> takes 4096 bytes on disk, and all other files use whole multiples of 4096.
>
> For this reason, when you care about the actual space on disk, you
> should use "du" and not "ls". This difference normally doesn't amount
> to much, but it can if you have lots of very small files.
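
To put a number on William's question, the rounding can be sketched in a few
lines of shell. This assumes a 4096-byte block and counts data blocks only;
the function names are made up for illustration:

```shell
# Round a byte count up to a whole number of blocks (default block: 4096 bytes).
# Models data blocks only; filesystem metadata is not counted.
round_up_to_block() {
    sz=$1
    blk=${2:-4096}
    echo $(( (sz + blk - 1) / blk * blk ))
}

# Bytes left unused in the last block of a file of the given size.
slack_bytes() {
    sz=$1
    blk=${2:-4096}
    echo $(( $(round_up_to_block "$sz" "$blk") - sz ))
}

# The ls sizes from the listing: size, rounded size, slack.
for s in 0 1 40910 77492 120176; do
    echo "$s $(round_up_to_block "$s") $(slack_bytes "$s")"
done
```

This gives 0, 4096 and 40960 for the three smallest files, matching du. For
the two largest it gives 77824 and 122880, one block less than du reported.
My guess is that the extra 4096 bytes is an ext3 indirect block, which is
needed once a file outgrows the 12 direct block pointers in its inode, but I
have not verified that on Jeremy's system. Also note that on recent GNU
coreutils "du -b" implies --apparent-size, so "du -B1" may be needed to
reproduce the du listing.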
>
> Not sure if this answers a question anyone asked. :-)
Thanks Jeremy. Something else that is relevant to this subject is sparse
files. As per the example from Wikipedia:

$ dd if=/dev/zero of=sparse-file bs=1 count=1 seek=1M
$ ls -lh sparse-file
-rw------- 1 oberry oberry 1.1M 2008-01-25 09:55 sparse-file
$ du -sh sparse-file
5.0K sparse-file

The file's apparent size is just over 1M, but it only occupies 5K of disk
space. The most dramatic real-life example of this I've seen is with my
rtorrent client: the initial file size may be a few hundred MB, but disk
usage is only a few MB, which gradually increases as the file downloads.
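
To compare the two numbers for any file in one go, something like the
following should work. It is only a sketch, assuming GNU stat (%s is the size
in bytes, %b the number of allocated blocks, %B the size of each such block);
size_report is a made-up name:

```shell
# Print "apparent allocated" sizes in bytes for one file.
# Assumes GNU stat: %s = size in bytes, %b = blocks allocated,
# %B = bytes per block as counted by %b.
size_report() {
    apparent=$(stat -c '%s' "$1")
    allocated=$(( $(stat -c '%b' "$1") * $(stat -c '%B' "$1") ))
    echo "$apparent $allocated"
}

# Recreate the sparse file: a single byte written at offset 1 MiB.
tmpfile=$(mktemp)
dd if=/dev/zero of="$tmpfile" bs=1 count=1 seek=1048576 2>/dev/null
size_report "$tmpfile"
rm -f "$tmpfile"
```

The first number is 1048577 (the 1 MiB offset plus the one byte written). The
second depends on the filesystem; where sparse files are supported it should
be far smaller than the first.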
Reading:
* http://en.wikipedia.org/wiki/Sparse_file
* http://en.wikipedia.org/wiki/Comparison_of_file_systems
Owen