Real World Benchmarks Of The EXT4 File-System
Originally posted by drag View Post
Suse was an early adopter and proponent of ReiserFS v3. They have ReiserFS developers on staff. They show no sign of moving to support v4 in any meaningful way. They too depend heavily on the ability of Linux to compete with Unix, Windows, and especially Red Hat. So you would think that if v4 offered a substantial advantage over the more mundane Linux filesystems then they would jump at the chance to push their OS forward.
Suse dropped ReiserFS as its default filesystem because of several technical problems, as well as problems related to maintenance, especially after Chris Mason left (they were basically left holding the bag on maintaining it). That left basically Mahoney to look after it, and with its bug-ridden past it just became too big of a headache. It also wasn't so shit hot in performance or reliability either. I wouldn't be surprised if it is soon dropped from the supported filesystems altogether in Suse.
ReiserFS has no future. It's effectively dead. Time to put it up on the shelf with other innovations like the Superdisk 120 and the 80186.
Comment
Which bit of this didn't you understand?
LINUX FILESYSTEM BENCHMARKS
(includes Reiser4 and Ext4)
Some Amazing Filesystem Benchmarks. Which Filesystem is Best?
RESULT: With compression, REISER4 absolutely SMASHED the other filesystems.
No other filesystem came close (not even remotely close).
Using REISER4 (gzip), rather than EXT2/3/4, saves you a truly amazing 816 - 213 = 603 MB (a 74% saving in disk space), with little or no loss of performance when storing 655 MB of raw data. In fact, substantial performance increases were achieved in the bonnie++ benchmarks.
We use the following filesystems:
REISER4 gzip: Reiser4 using transparent gzip compression.
REISER4 lzo: Reiser4 using transparent lzo compression.
REISER4: Standard Reiser4 (with extents).
EXT4 default: Standard ext4.
EXT4 extents: ext4 with extents.
NTFS3g: Szabolcs Szakacsits' NTFS user-space driver.
NTFS: NTFS with the Windows XP driver.
Disk Usage in megabytes. Time in seconds. SMALLER is better.
Code:
.-------------------------------------------------.
|File         |Disk |Copy |Copy |Tar  |Unzip| Del |
|System       |Usage|655MB|655MB|Gzip |UnTar| 2.5 |
|Type         | (MB)| (1) | (2) |655MB|655MB| Gig |
.-------------------------------------------------.
|REISER4 gzip | 213 | 148 |  68 |  83 |  48 |  70 |
|REISER4 lzo  | 278 | 138 |  56 |  80 |  34 |  84 |
|REISER4 tails| 673 | 148 |  63 |  78 |  33 |  65 |
|REISER4      | 692 | 148 |  55 |  67 |  25 |  56 |
|NTFS3g       | 772 |1333 |1426 | 585 | 767 | 194 |
|NTFS         | 779 | 781 | 173 |  X  |  X  |  X  |
|REISER3      | 793 | 184 |  98 |  85 |  63 |  22 |
|XFS          | 799 | 220 | 173 | 119 |  90 | 106 |
|JFS          | 806 | 228 | 202 |  95 |  97 | 127 |
|EXT4 extents | 806 | 162 |  55 |  69 |  36 |  32 |
|EXT4 default | 816 | 174 |  70 |  74 |  42 |  50 |
|EXT3         | 816 | 182 |  74 |  73 |  43 |  51 |
|EXT2         | 816 | 201 |  82 |  73 |  39 |  67 |
|FAT32        | 988 | 253 | 158 | 118 |  81 |  95 |
.-------------------------------------------------.
The raw data (without filesystem meta-data, block alignment wastage, etc) was 655MB.
It comprised 3 different copies of the Linux kernel sources.
Disk Usage: The amount of disk used to store the data.
Copy 655MB (1): Time taken to copy the data over a partition boundary.
Copy 655MB (2): Time taken to copy the data within a partition.
Tar Gzip 655MB: Time taken to Tar and Gzip the data.
Unzip UnTar 655MB: Time taken to UnGzip and UnTar the data.
Del 2.5 Gig: Time taken to Delete everything just written (about 2.5 Gig).
Each test was performed 5 times and the average value recorded.
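(As a rough illustration only, and not the original test scripts: the tar/gzip and delete steps could be reproduced along these lines, with placeholder paths.)
Code:
# time compressing the kernel trees into a tarball, then deleting everything written
time tar czf /mnt/other/kernels.tar.gz -C /mnt/test kernels
time rm -rf /mnt/test/*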
To get a feel for the performance increases that can be achieved by using compression, we look at the total time (in seconds) to run the test:
bonnie++ -n128:128k:0 (bonnie++ is Version 1.93c)
Code:
.-------------------.
| FILESYSTEM | TIME |
.-------------------.
|REISER4 lzo |  1938|
|REISER4 gzip|  2295|
|REISER4     |  3462|
|EXT4        |  4408|
|EXT2        |  4092|
|JFS         |  4225|
|EXT3        |  4421|
|XFS         |  4625|
|REISER3     |  6178|
|FAT32       | 12342|
|NTFS-3g     |>10414|
.-------------------.
Last edited by Jade; 04 December 2008, 06:23 AM.
Comment
Originally posted by drag View Post
but keep in mind that unlike Ext2->Ext3->Ext4, each new Reiser filesystem is rewritten from scratch, and they are not related to one another in any direct manner.
Comment
Originally posted by mctop View Post
Hi,
first of all, thanks for the article and benchmark.
We are planning to buy a new RAID system with around 4TB of storage capacity (currently we have 2TB on ext3). On monthly scheduled administration days we reboot the main server for maintenance (new kernel, and of course kicking all NFS clients off ...). So, from time to time, the RAID system will check the data (tune2fs could avoid this, but for safety reasons we let the complete disk check run). This takes hours where you can just wait and wait ....
So, if ext4 would reduce this checking time, I would immediately change.
Any experience, or a way to check this?
Thanks in advance
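A side note on the forced checks mctop describes: on ext3 they are driven by the filesystem's mount count and check interval, which tune2fs can inspect and adjust. A minimal sketch, assuming a placeholder device /dev/sdb1:
Code:
# show the current mount-count / check-interval settings
tune2fs -l /dev/sdb1 | grep -Ei 'mount count|check'
# disable the mount-count and time-based forced checks
tune2fs -c 0 -i 0 /dev/sdb1
Whether you actually want to disable them, rather than just schedule the check yourself, is exactly the safety trade-off mctop mentions.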
Here is a Linux admin comparing ZFS with linux filesystems:
Here is a Linux guy setting up a home file server with ZFS:
ZFS + 48 SATA discs + dual Opteron and no hardware raid (just plain SATA controller), writes more than 2 GB/sec:
For some testing I'm creating right now 8 raid-5 devices under SVM with 128k interleave size. It's really amazing how much x4500 server can...
And Sun is selling a new storage device, the 7000 series. Read about "The Killer App". You can download and play with that analytics software, which uses the unique DTrace, in a VMware image (which simulates several discs with a ZFS raid):
Create a ZFS raid:
# zpool create myZFSraid disc0 disc1 disc2 disc3
and that is all. No formatting needed, just bang away immediately. Dead simple administration.
Last edited by kebabbert; 04 December 2008, 07:16 AM.
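Worth noting: a plain zpool create like the one above gives you a stripe with no redundancy. A hedged sketch of the redundant variant, with placeholder device names:
Code:
# single-parity raidz instead of a plain stripe
zpool create myZFSraid raidz disc0 disc1 disc2 disc3
# check pool health and layout
zpool status myZFSraid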
Comment
bonnie nonsense, and XFS tweaks
First, the bonnie++ benchmark is nonsense. I downloaded the benchmark suite, and
pts/test-resources/bonnie/install.sh makes a bonnie script that will run
Code:
./bonnie_/sbin/bonnie++ -d scratch_dir/ -s $2 > $LOG_FILE 2>&1
-n 30:50000:200:8 would be a more interesting test, probably. (file sizes between 50kB (not kiB) and 200B, 30*1024 files spread over 8 subdirs)
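For concreteness, a full invocation along those lines might look like this (mount point and user are placeholders; -s 0 skips the large-file sequential I/O phase so only the small-file tests run, and -u is needed if you run it as root):
Code:
bonnie++ -d /mnt/scratch -s 0 -n 30:50000:200:8 -u nobody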
A few people have pointed out that XFS has stupid defaults, but nobody posted a good recommendation. I've played with XFS extensively and benchmarked a few different kinds of workloads on HW RAID5 and on single disks. And I've been using it on my desktop for several years now. For general purpose use, I would recommend:
Code:
mkfs.xfs -l lazy-count=1,size=128m -L yourlabel /dev/yourdisk
and mount with -o noatime,logbsize=256k (put that in /etc/fstab).
-l size=128m: XFS likes to have big logs, and this is the max size.
mount -o logbsize=256k: That's log buffer size = 256kiB (of kernel memory). The default (and the max with v1 logs) is 32kiB. This makes a factor of > 2 performance difference on a lot of small-file workloads. I think logbufs=8 has a similar effect (the default is 2 log buffers of size 32k). I haven't tested logbufs=8,logbsize=256k. The XFS devs frequently recommend logbsize=256k to people asking about perf tuning on the mailing list, but they don't mention increasing logbufs too.
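For reference, a minimal /etc/fstab line with those mount options might look like this (device and mount point are placeholders):
Code:
/dev/sdb1   /data   xfs   noatime,logbsize=256k   0   0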
If you have an older mkfs.xfs, get the latest xfsprogs, 2.10.1 has better defaults for mkfs (e.g. unless you set RAID stripe params, agcount=4, which is about as much parallelism as a single disk can give you anyway. The old default was much higher agcount, which could slow down when the disk started to get full.)
Or just use your old mkfs.xfs and specify agcount:
Code:
mkfs.xfs -l lazy-count=1,size=128m -L label /dev/disk -d agcount=4 -i attr=2
If you want to start tuning, read up on XFS a bit. http://oss.sgi.com/projects/xfs/ (unfortunately, there's no good tuning guide anywhere obvious on the web site). Read the man page for mkfs.
You can't change the number of allocation groups without a fresh mkfs, but you can enable version 2 logs, and lazy-count, without mkfs. xfs_admin -j -c1 will switch to v2 logs with lazy-count enabled. xfs_growfs says growing the log size isn't supported, which is a problem if you have less than the max size of 128MB, since XFS loves large logs. A large log lets XFS keep more metadata ops in flight, instead of being forced to write them out sooner.
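A quick sketch of that conversion, assuming a placeholder device /dev/sdb1 mounted on /data (the filesystem must be unmounted while xfs_admin runs):
Code:
umount /data
# switch to version 2 logs and enable lazy-count
xfs_admin -j -c1 /dev/sdb1
mount /data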
If your FS is bigger than 1TB, you should mount with -o inode64, too. Note that contrary to the docs, noikeep is the default. I checked the kernel sources, and that's been the case for a while, I think. Otherwise I would recommend using noikeep to reduce fragmentation.
If you're making a filesystem of only a couple GB, like a root fs, a 128MB log will take a serious chunk of the available space. You might be better off with JFS. I'm currently benchmarking XFS with tons of different option combinations for use as a root fs... (XFS block size, and log size, lazy-count=0/1, mount -o logbsize=, and block dev readahead and io elevator)
I use LVM for /usr, /home, /var/tmp (includes /var/cache and /usr/local/src), so my root FS currently is a 1.5GB JFS filesystem that is 54% full. It's on a software RAID1.
Since I run Ubuntu, my /var/lib/dpkg/info has 9373 files out of the total 20794 regular files (27687 inodes) on the filesystem, most of them small.
Code:
export LESS=iM
find / -xdev -type f -ls | sort -n -k7 | less -S
then look at the % in less's status line. or type 50% to go to 50% of the file position.
<= 1k: 45%
<= 2k: 52%
<= 3k: 58% (mostly /var/lib/dpkg/info)
<= 4k: 59%
<= 6k: 62%
<= 8k: 64%
<= 16k: 71% (a lot of kernel modules...)
<= 32k: 85%
<= 64k: 93%
<= 128k: 96%
> 1M: 0.2% (57 files)
(I started doing this with find without -type f, and there are lots of small directories (that don't need any blocks outside the inode): < 1k: 59%; < 2k: 64%; < 3k: 68%)
Every time dpkg upgrades a package, or I even run dpkg -S, it reads /var/lib/dpkg/info/*.list (and maybe more), although dlocate usually works as a replacement for dpkg -S. This usually takes several seconds when the cache is cold on my current JFS filesystem, which I created ~2 years ago when I installed the system. This is what I notice as slow on my root filesystem currently. JFS is fine with hot caches, e.g. for /lib, /etc, /bin, and so on. But dpkg is always very slow the first time.
Those small files are probably pretty scattered now, and probably not stored in anything like readdir() order or alphabetical order. I'm hoping XFS will do better than JFS at keeping down fragmentation, although it probably won't. It writes files created at the same time all nearby (it actually tries to make contiguous writes out of dirty data). It doesn't look at where old files in the same directory are stored when trying to decide where to put new files, AFAIK. So I'll probably end up with more scattered files. At least with XFS's batched writeout, mkdir info.new; cp -a info/* info.new; mv ... ; rm -r ...; will work to make a defragged copy of the directory and files in it. (to just defrag the directory, mkdir info.new; ln info/* info.new/; That can make readdir order = alphabetical order. Note using *, which expands to a sorted list, instead of using just cp -a, which will operate in readdir order. dpkg doesn't read in readdir order, it goes (mostly?) alphabetically by package name (based on its status file).)
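Spelled out, the rewrite-the-directory trick described above might look like the following (a hypothetical sketch only; dpkg must not be running while you do it):
Code:
cd /var/lib/dpkg
mkdir info.new
cp -a info/* info.new/    # * expands alphabetically, so files are rewritten in sorted order
mv info info.old && mv info.new info
rm -r info.old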
Anyway, I'm considering using a smaller data block size, like -b size=2k or size=1k, (but -n size=8k, I definitely don't want smaller blocks for directories. There are a lot of tiny directories, but they won't waste 8k because there's room in the inode for their data. See directory sizes with e.g. ls -ld. Larger directory block sizes help to reduce directory fragmentation. And most of the directories on my root filesystem that aren't tiny are fairly large. xfs_bmap -v works on directories, too, BTW). XFS is extent-based, so a small block size doesn't make huge block bitmaps even for large files.
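For illustration, the kind of mkfs invocation being weighed up there, combining 2k data blocks with 8k directory blocks and the large log (device name and label are placeholders):
Code:
mkfs.xfs -b size=2k -n size=8k -l lazy-count=1,size=128m -L rootfs /dev/sdb1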
I think I was finding that smaller data block sizes were using more CPU than the default 4k (=max=page size) in hot-cache situations. I compared some results I've already generated, and 1k or 2k does seem slightly faster for untarring the whole FS; drop_caches; tar c | wc -c (so stat+read) ; drop_caches; untar again (overwrite); drop_caches; read some more, timing each component of that. My desktop has been in single-user mode for 1.5 days testing this. I should post my results somewhere when I'm done... And I need to find a good way to explore the 5 (or higher) dimensional data (time as a function of block size, log size, logbuf size, lazy-count=0/1, and deadline vs. cfq, and blockdev --setra 256, 512, or 1024 if I let my tests run that long...).
BTW, JFS is good, and does use less CPU. That won't reduce CPU wakeups to save power, though. FS code mostly runs when called by processes doing a read(2), or open(2), or whatever. Filesystems do usually start a thread to do async tasks, though. But those threads shouldn't be waking up at all when there's no I/O going on.
I decided to use JFS for my root FS a couple years ago after reading
http://www.sabi.co.uk/blog/anno05-4th.html#051226b. I probably would have used XFS, but I hadn't realized that to work around the grub-install issue you just have to boot grub from a USB stick or whatever, and type root (hd0,0); setup (hd0). I recently set up a bioinformatics cluster using XFS for root and all other filesystems. It works fine, except that getting GRUB installed is a hassle.
Also BTW, there's a lot of good reading on www.sabi.co.uk. e.g. suggestions for setting up software RAID, http://www.sabi.co.uk/blog/0802feb.html#080217, and lots of filesystem stuff:
XFS is wonderful for large files, and has some other neat features. If you download torrents, you usually get fragmented files because they start sparse and are written in the order the blocks come in. XFS can preallocate space without actually writing it, so you end up with a minimally-fragmented file. Azureus has an option to use xfs_io's resvsp command. Linux now has an fallocate(2) system call which should work for XFS and ext4. posix_fallocate(3) should use it. I'm not sure if fallocate is actually implemented for XFS yet, but I would hope so since its semantics are the same. And I don't know what glibc version includes an fallocate(2) backend for posix_fallocate(3).
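As a concrete example of the preallocation idea (file name and size made up), xfs_io can reserve the space up front and xfs_bmap can confirm the extent layout afterwards:
Code:
# reserve 700MB for a file before any data arrives
xfs_io -f -c "resvsp 0 700m" big-download.iso
# show the file's extents / fragmentation
xfs_bmap -v big-download.iso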
And XFS has nice tools, like xfs_bmap to show you the fragmentation of any file.
Last edited by Peter_Cordes; 06 December 2008, 08:09 PM.
Comment
Next time consider io_thrash for a benchmark
http://sourceforge.net/project/showf...kage_id=298597 is open source, well documented, and creates a workload that simulates a high end transaction processing database engine.
Disclosure: I manage the product/project (GT.M - http://fis-gtm.com and http://sourceforge.net/projects/fis-gtm) that released io_thrash.
Comment
The bonnie++ options used in the benchmarks above were:
bonnie++ -n128:128k:0
The -n128 means that the test wrote, read and deleted 128k (131,072) files. These were first sequentially, then randomly, written/read/deleted to/from the directory.
The :128k:0 means that every file had a random size between 128k (131,072 bytes) and zero. So the average file-size was 64k.
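In other words, spelled out with a placeholder test directory and the sizes in plain bytes, the run was equivalent to:
Code:
bonnie++ -d /mnt/testfs -n 128:131072:0
(The 128 is in units of 1024 files, hence the 131,072 files mentioned above.)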
Comment
Originally posted by Kazade View Post
I'll be honest, I'm a little confused about using games as a benchmark for a filesystem. Games load resources from the disk before the game play starts; everything from that point on is stored in either RAM or VRAM while the game is in play (unless of course you run out of memory). Only an insane game developer would read or write from the disk during gameplay because it would kill frame rate.
If you were timing the loading times (or game saves), fair enough, but using the frame rate as a benchmark seems pointless.
I think testing game performance isn't a bad idea, but average FPS isn't a good indicator. A utility that works like Fraps should be used, which will show lowest/highest FPS. The lowest FPS score would be the more interesting statistic in a game known to load textures on the fly, even if running under Wine.
Last edited by psycho_driver; 04 December 2008, 03:17 PM.
Comment
Originally posted by kjgust View Post
Oh dear... Well, first off, how can I say this... You just made me CHOKE on my coffee. Haha, you know, the only time I used ReiserFS it eventually turned into a bad experience. So even if it is faster, it's definitely not as proven or as reliable as something like EXT3. I personally wouldn't be surprised to see ReiserFS3 be removed from the Linux kernel eventually. Because from my experience at least, and what I've heard from others, it's really not that good.
I've used ReiserFS twice, and both times I had catastrophic filesystem failures within about a year.
Comment