Having too large a block size wastes disk space (because only part of each block is filled), but it also wastes memory cache.
I verified that through a few simple tests.
On this Linux ext3 filesystem I have 4K blocks:
[root@mylinux test]# tune2fs -l /dev/VolGroup00/LogVol00
tune2fs 1.39 (29-May-2006)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 89e74fff-184b-489e-b83b-981a31b66756
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 76840960
Block count: 76816384
Reserved block count: 3840819
Free blocks: 61196276
Free inodes: 75955279
First block: 0
Block size: 4096
Fragment size: 4096
...
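(If you only want the block size, you can filter the same tune2fs output, e.g.:
tune2fs -l /dev/VolGroup00/LogVol00 | grep 'Block size'
which should print just the 'Block size: 4096' line shown above.)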
- Let's create 100000 files of 1 byte each, and observe the space used:
[root@mylinux test]# df /; for i in {1..100000} ; do echo > ./file$i ; done ; df /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00 297640376 59261952 223015148 21% /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00 297640376 59665048 222612052 22% /
[root@mylinux test]# echo $((59665048-59261952))
403096
=> We're using ~400 MB on disk (for 0.1 MB of data)
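That is close to what we'd expect if each 1-byte file occupies one full 4 KB block (the small remainder is presumably directory and other metadata blocks); a quick sanity check:
echo $((100000 * 4096 / 1024))    # = 400000 1K-blocks, i.e. ~400 MB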
Interestingly, 'ls' shows the proper file size, while 'du' reports the space actually used on disk:
[root@mylinux test]# ls -l | head
total 800000
-rw-r--r-- 1 root root 1 Jan 25 13:52 file1
-rw-r--r-- 1 root root 1 Jan 25 13:52 file10
-rw-r--r-- 1 root root 1 Jan 25 13:52 file100
-rw-r--r-- 1 root root 1 Jan 25 13:52 file1000
-rw-r--r-- 1 root root 1 Jan 25 13:52 file10000
-rw-r--r-- 1 root root 1 Jan 25 13:53 file100000
-rw-r--r-- 1 root root 1 Jan 25 13:52 file10001
-rw-r--r-- 1 root root 1 Jan 25 13:52 file10002
-rw-r--r-- 1 root root 1 Jan 25 13:52 file10003
[root@mylinux test]# ls | wc -l
100000
[root@mylinux test]# du -sh .
784M .
That is what we want, but it's an example where the sum of the 'ls' sizes doesn't match the 'du' or 'df' usage.
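For a single file, GNU stat can show both numbers side by side (%s is the apparent size, %b and %B describe the space actually allocated); on this filesystem it should report 1 byte but 8 blocks of 512 bytes, i.e. one full 4 KB block:
stat -c '%n: %s bytes, %b blocks of %B bytes allocated' ./file1
Similarly, 'du --apparent-size -s .' should roughly match the sum of the 'ls' sizes, while plain 'du -s .' reports the allocated blocks.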
- But perhaps more annoying (and less well known) is the cache usage:
Let's clear the cache, as much as we can:
[root@mylinux test]# sync ; echo 1 > /proc/sys/vm/drop_caches
Then read all these files:
[root@mylinux test]# free ; for i in {1..100000} ; do cat ./file$i > /dev/null ; done ; free
total used free shared buffers cached
Mem: 32935792 573268 32362524 0 424 22632
-/+ buffers/cache: 550212 32385580
Swap: 34996216 0 34996216
total used free shared buffers cached
Mem: 32935792 1042208 31893584 0 15640 471984
-/+ buffers/cache: 554584 32381208
Swap: 34996216 0 34996216
[root@mylinux test]# echo $((471984 - 22632))
449352
=> We're also using ~400 MB of memory cache for 0.1 MB of data!
That is because the cache holds whole blocks, so a large block size can also cause cache waste.
(The number is a bit larger than 400 MB because other OS activity also used some cache during the test, including reading the directory itself.)
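You can watch the same effect at a smaller scale through /proc/meminfo (a rough check, since other activity moves the counter too):
grep ^Cached /proc/meminfo    # note the value
cat ./file1 > /dev/null
grep ^Cached /proc/meminfo    # should have grown by a whole block, not by 1 byte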
- Now, what if the files are closer to the block size? Let's create 100000 files of about 3 KB:
[root@mylinux test]# df /; for i in {1..100000} ; do dd if=/dev/sda bs=3000 count=1 of=./medfile$i &> /dev/null ; done; df /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
297640376 59665088 222612012 22% /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
297640376 60068324 222208776 22% /
[root@mylinux test]# echo $((60068324-59665088))
403236
[root@mylinux test]# sync ; echo 1 > /proc/sys/vm/drop_caches
[root@mylinux test]# free ;for i in {1..100000} ; do cat ./medfile$i > /dev/null ; done ; free
total used free shared buffers cached
Mem: 32935792 867868 32067924 0 420 22468
-/+ buffers/cache: 844980 32090812
Swap: 34996216 0 34996216
total used free shared buffers cached
Mem: 32935792 1282484 31653308 0 14696 423976
-/+ buffers/cache: 843812 32091980
Swap: 34996216 0 34996216
[root@mylinux test]# echo $((423976-22468))
401508
Again we used ~400 MB of disk space and of cache, but this time for ~300 MB of data.
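The arithmetic is the same: each file still takes one full 4 KB block, while the data itself is only 100000 x 3000 bytes:
echo $((100000 * 4096 / 1024))    # on disk / in cache: 400000 KB
echo $((100000 * 3000 / 1024))    # actual data: ~292968 KB, i.e. roughly 300 MB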
This can matter in practice.
I am presently trying to optimize the compilation of a very large source tree (200,000 files), and a better-suited block size helps fit everything in the memory cache and hopefully improves performance.
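The block size can't be changed on an existing ext3 filesystem, but it can be chosen when the filesystem is created. For example, a dedicated build volume could be formatted with 1 KB blocks (the logical volume name below is just a placeholder; 1024 bytes is the smallest block size mke2fs accepts):
mke2fs -j -b 1024 /dev/VolGroup00/LogVolBuild    # -j = ext3 journal, -b = block size in bytes
tune2fs -l /dev/VolGroup00/LogVolBuild | grep 'Block size'    # verify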