Wednesday, January 25, 2012

block size matters for disk... and memory



A block size that is too large wastes disk space (because only part of each block is filled), but it also wastes memory cache.
I verified this with a few simple tests.

On this Linux ext3 filesystem I have 4K blocks:

[root@mylinux test]# tune2fs -l /dev/VolGroup00/LogVol00
tune2fs 1.39 (29-May-2006)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          89e74fff-184b-489e-b83b-981a31b66756
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              76840960
Block count:              76816384
Reserved block count:     3840819
Free blocks:              61196276
Free inodes:              75955279
First block:              0
Block size:               4096
Fragment size:            4096
...
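As a quick cross-check (an alternative shown here as a sketch, assuming GNU coreutils 'stat'), the block size can also be read directly from the mounted filesystem:

stat -fc '%S' /      # fundamental block size; should print 4096 here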


  • Let's create 100000 files of 1 byte each and observe the space used:


[root@mylinux test]# df /; for i in {1..100000} ; do echo  > ./file$i ; done ; df /


Filesystem                          1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00     297640376  59261952 223015148  21% /
Filesystem                          1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00     297640376  59665048 222612052  22% /


[root@mylinux test]# echo $((59665048-59261952))
403096


=> We're using ~400 MB on disk for 0.1 MB of data.
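That matches the expectation of one full 4 KB block per 1-byte file; the small remainder is presumably directory and other metadata growth:

echo $((100000 * 4))      # expected allocation in KB: 400000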

Interestingly, 'ls' shows the logical file size, while 'du' reports the space actually used on disk:

[root@mylinux test]# ls -l | head
total 800000
-rw-r--r-- 1 root root 1 Jan 25 13:52 file1
-rw-r--r-- 1 root root 1 Jan 25 13:52 file10
-rw-r--r-- 1 root root 1 Jan 25 13:52 file100
-rw-r--r-- 1 root root 1 Jan 25 13:52 file1000
-rw-r--r-- 1 root root 1 Jan 25 13:52 file10000
-rw-r--r-- 1 root root 1 Jan 25 13:53 file100000
-rw-r--r-- 1 root root 1 Jan 25 13:52 file10001
-rw-r--r-- 1 root root 1 Jan 25 13:52 file10002
-rw-r--r-- 1 root root 1 Jan 25 13:52 file10003
[root@mylinux test]# ls | wc -l
100000
[root@mylinux test]# du -sh .
784M    .

That is what we want, but it's an example where the sum of the 'ls' sizes doesn't match the 'du' or 'df' usage.
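The same discrepancy is visible per file with 'stat' (assuming GNU coreutils), which prints both the logical size and the space actually allocated:

stat -c '%n: %s byte(s), %b blocks of %B bytes allocated' ./file1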



  • But perhaps more annoying (and less well known) is the cache usage:


Let's clear the cache, as much as we can:

[root@mylinux test]# sync ; echo 1 > /proc/sys/vm/drop_caches
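(Writing 3 instead of 1 would additionally drop the dentry and inode caches, for an even more thorough flush:)

sync ; echo 3 > /proc/sys/vm/drop_caches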

And read all these files:

[root@mylinux test]# free ; for i in {1..100000} ; do cat ./file$i > /dev/null ; done ; free


             total       used       free     shared    buffers     cached
Mem:      32935792     573268   32362524          0        424      22632
-/+ buffers/cache:     550212   32385580
Swap:     34996216          0   34996216


             total       used       free     shared    buffers     cached
Mem:      32935792    1042208   31893584          0      15640     471984
-/+ buffers/cache:     554584   32381208
Swap:     34996216          0   34996216


[root@mylinux test]# echo $((471984 - 22632))
449352

=> We're also using ~400 MB of memory cache for 0.1 MB of data!
That is because the cache holds whole blocks, so a large block size also wastes cache.

(The measured number is a bit higher because other OS activity also used some cache during the test, including reading the directory itself, etc.)
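A rough way to see this per-file granularity (a sketch, not part of the original test) is to watch the 'Cached' counter in /proc/meminfo around a single small read; for a 1-byte file it should grow by roughly 4 KB, give or take other system activity:

sync ; echo 1 > /proc/sys/vm/drop_caches
grep ^Cached: /proc/meminfo       # cached memory before the read, in KB
cat ./file1 > /dev/null
grep ^Cached: /proc/meminfo       # roughly 4 KB higher for this 1-byte file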



  • Now, if the files are closer to the block size: let's create 100000 files of about 3 KB each:
(Note: I'm not using 4 KB because a file of exactly 4K would need 2 blocks, because of the i-node structure.)

[root@mylinux test]# df /; for i in {1..100000} ; do dd if=/dev/sda bs=3000 count=1 of=./medfile$i &> /dev/null ; done; df /


Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                     297640376  59665088 222612012  22% /
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                     297640376  60068324 222208776  22% /
[root@mylinux test]# echo $((60068324-59665088))
403236


[root@mylinux test]# sync ; echo 1 > /proc/sys/vm/drop_caches


[root@mylinux test]# free ;for i in {1..100000} ; do cat ./medfile$i > /dev/null ; done ; free


             total       used       free     shared    buffers     cached
Mem:      32935792     867868   32067924          0        420      22468
-/+ buffers/cache:     844980   32090812
Swap:     34996216          0   34996216
             total       used       free     shared    buffers     cached
Mem:      32935792    1282484   31653308          0      14696     423976
-/+ buffers/cache:     843812   32091980
Swap:     34996216          0   34996216


[root@mylinux test]# echo $((423976-22468))
401508


Again we used ~400 MB of disk space and cache, but this time to hold ~300 MB of data.
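The arithmetic confirms one 4 KB block per 3 KB file:

echo $((100000 * 3000 / 1024))    # ~292968 KB of actual data
echo $((100000 * 4096 / 1024))    # 400000 KB allocated on disk and held in cache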


This can matter in some cases.

I am currently trying to optimize the compilation of a very large source tree (200,000 files); a better-tuned block size helps fit everything in the memory cache and should improve performance.
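For reference, the block size is fixed when the filesystem is created, so changing it means re-making the filesystem with 'mkfs.ext3 -b' (a sketch only; the device path below is a placeholder, and the command erases existing data):

# WARNING: reformats the target device (placeholder path), destroying its contents
mkfs.ext3 -b 1024 /dev/VolGroupXX/LogVolXX

'tune2fs -l' will then report the new 'Block size', as shown at the top of this post.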










