Having too large a block size wastes disk space (because only part of each block is filled), but it also wastes memory cache.
I verified that through a few simple tests.
On this Linux ext3 filesystem I have 4K blocks:
[root@mylinux test]# tune2fs -l /dev/VolGroup00/LogVol00
tune2fs 1.39 (29-May-2006)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 89e74fff-184b-489e-b83b-981a31b66756
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 76840960
Block count: 76816384
Reserved block count: 3840819
Free blocks: 61196276
Free inodes: 75955279
First block: 0
Block size: 4096
Fragment size: 4096
...
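(If you only want the block size, you can filter the same tune2fs output, e.g.:
tune2fs -l /dev/VolGroup00/LogVol00 | grep 'Block size'
which should print just the 'Block size: 4096' line shown above.)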
- Let's create 100000 files of 1 byte each, and observe the space used:
[root@mylinux test]# df /; for i in {1..100000} ; do echo > ./file$i ; done ; df /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00 297640376 59261952 223015148 21% /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00 297640376 59665048 222612052 22% /
[root@mylinux test]# echo $((59665048-59261952))
403096
=> We're using ~400 MB on disk (for 0.1 MB of data)
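That is close to what we'd expect if each 1-byte file occupies one full 4 KB block (the small remainder is presumably directory and other metadata blocks); a quick sanity check:
echo $((100000 * 4096 / 1024))    # = 400000 1K-blocks, i.e. ~400 MB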
Interestingly, 'ls' shows the proper file size, while 'du' reports the space actually used on disk:
[root@mylinux test]# ls -l | head
total 800000
-rw-r--r-- 1 root root 1 Jan 25 13:52 file1
-rw-r--r-- 1 root root 1 Jan 25 13:52 file10
-rw-r--r-- 1 root root 1 Jan 25 13:52 file100
-rw-r--r-- 1 root root 1 Jan 25 13:52 file1000
-rw-r--r-- 1 root root 1 Jan 25 13:52 file10000
-rw-r--r-- 1 root root 1 Jan 25 13:53 file100000
-rw-r--r-- 1 root root 1 Jan 25 13:52 file10001
-rw-r--r-- 1 root root 1 Jan 25 13:52 file10002
-rw-r--r-- 1 root root 1 Jan 25 13:52 file10003
[root@mylinux test]# ls | wc -l
100000
[root@mylinux test]# du -sh .
784M .
That is what we want, but it's an example where the sum of the 'ls' sizes doesn't match the 'du' or 'df' usage.
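For a single file, GNU stat can show both numbers side by side (%s is the apparent size, %b and %B describe the space actually allocated); on this filesystem it should report 1 byte but 8 blocks of 512 bytes, i.e. one full 4 KB block:
stat -c '%n: %s bytes, %b blocks of %B bytes allocated' ./file1
Similarly, 'du --apparent-size -s .' should roughly match the sum of the 'ls' sizes, while plain 'du -s .' reports the allocated blocks.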
- But perhaps more annoying (and less well known) is the cache usage:
Let's clear the cache, as much as we can:
[root@mylinux test]# sync ; echo 1 > /proc/sys/vm/drop_caches
Then read all these files:
[root@mylinux test]# free ; for i in {1..100000} ; do cat ./file$i > /dev/null ; done ; free
total used free shared buffers cached
Mem: 32935792 573268 32362524 0 424 22632
-/+ buffers/cache: 550212 32385580
Swap: 34996216 0 34996216
total used free shared buffers cached
Mem: 32935792 1042208 31893584 0 15640 471984
-/+ buffers/cache: 554584 32381208
Swap: 34996216 0 34996216
[root@mylinux test]# echo $((471984 - 22632))
449352
=> We're also using ~400 MB of memory cache for 0.1 MB of data!
That is because the cache holds whole blocks, so a large block size can also cause cache waste.
(The number is a bit larger than 400 MB because other OS activity also used some cache during the test, including reading the directory itself.)
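You can watch the same effect at a smaller scale through /proc/meminfo (a rough check, since other activity moves the counter too):
grep ^Cached /proc/meminfo    # note the value
cat ./file1 > /dev/null
grep ^Cached /proc/meminfo    # should have grown by a whole block, not by 1 byte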
- Now, what if the files are closer to the block size? Let's create 100000 files of about 3 KB:
[root@mylinux test]# df /; for i in {1..100000} ; do dd if=/dev/sda bs=3000 count=1 of=./medfile$i &> /dev/null ; done; df /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
297640376 59665088 222612012 22% /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
297640376 60068324 222208776 22% /
[root@mylinux test]# echo $((60068324-59665088))
403236
[root@mylinux test]# sync ; echo 1 > /proc/sys/vm/drop_caches
[root@mylinux test]# free ;for i in {1..100000} ; do cat ./medfile$i > /dev/null ; done ; free
total used free shared buffers cached
Mem: 32935792 867868 32067924 0 420 22468
-/+ buffers/cache: 844980 32090812
Swap: 34996216 0 34996216
total used free shared buffers cached
Mem: 32935792 1282484 31653308 0 14696 423976
-/+ buffers/cache: 843812 32091980
Swap: 34996216 0 34996216
[root@mylinux test]# echo $((423976-22468))
401508
Again we used ~400 MB of disk space and of cache, but this time for ~300 MB of data.
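The arithmetic is the same: each file still takes one full 4 KB block, while the data itself is only 100000 x 3000 bytes:
echo $((100000 * 4096 / 1024))    # on disk / in cache: 400000 KB
echo $((100000 * 3000 / 1024))    # actual data: ~292968 KB, i.e. roughly 300 MB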
This can matter in practice.
I am presently trying to optimize the compilation of a very large source tree (200,000 files), and a better-suited block size helps fit everything in the memory cache and hopefully improves performance.
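The block size can't be changed on an existing ext3 filesystem, but it can be chosen when the filesystem is created. For example, a dedicated build volume could be formatted with 1 KB blocks (the logical volume name below is just a placeholder; 1024 bytes is the smallest block size mke2fs accepts):
mke2fs -j -b 1024 /dev/VolGroup00/LogVolBuild    # -j = ext3 journal, -b = block size in bytes
tune2fs -l /dev/VolGroup00/LogVolBuild | grep 'Block size'    # verify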