Linux 4.3 vs. Liquorix 4.3 vs. Linux 4.4 Kernel Tests

  • #21
    And maybe try a kernel build with -march=native and -O3?
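    A minimal sketch of what such a build could look like, assuming a vanilla kernel tree (KCFLAGS is the standard kbuild variable for appending extra compiler flags):

        # Untested sketch: append aggressive optimization flags to the kernel build
        make -j$(nproc) KCFLAGS="-O3 -march=native"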

    Comment


    • #22
      Originally posted by Michael View Post
      Unfortunately given the ad-blocking situation ...
      I have to admit that I'm using uBlock Origin when reading your site, and I don't think I will disable it any time soon because I want to see it working (to see how good it is). I've never understood how ads work. Does it help you at all if I just look at the ads and never click on them? If it does, I'm happy to write a JS script and run it with PhantomJS to visit your site and maybe click on the ads as well (with the user agent set exactly the same as one coming from a real web browser).

      Comment


      • #23
        Originally posted by GraysonPeddie View Post
        Perhaps Phoronix Test Suite does not understand latency at all...
        Perhaps, but then I don't see the point of the test; these kernels/patchsets are tuned for latency.

        Comment


        • #24
          Originally posted by geearf View Post
          Perhaps, but then I don't see the point of the test; these kernels/patchsets are tuned for latency.
          The point of the tests then becomes to see whether or not the latency tuning has had a significant detrimental effect on its performance/bandwidth. And, for the most part, that does not seem to be the case for Liquorix.

          Which is good. Because that just confirms what I've already found through simply using Liquorix -- It's actually pretty damned good.

          Comment


          • #25
            Originally posted by F1esDgSdUTYpm0iy View Post
            The point of the tests then becomes to see whether or not the latency tuning has had a significant detrimental effect on its performance/bandwidth. ...
            Well, yes, of course, but then you don't get to know what latency gain you're getting.
            Though you might say it's better to see for yourself than to test, as you wrote.

            Comment


            • #26
              Originally posted by geearf View Post
              Though you might say it's better to see for yourself than to test, as you wrote.
              Exactly, also because latency is quite hard to measure reliably.
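              For scheduling latency specifically, cyclictest from rt-tests is the usual tool; run on each kernel, something like the sketch below gives comparable min/avg/max wakeup latencies (the flags are just one reasonable choice, not a definitive methodology):

                  # Sketch: one SCHED_FIFO thread measuring wakeup latency via clock_nanosleep
                  sudo cyclictest --mlockall --priority=80 --nanosleep --interval=1000 --loops=100000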

              Comment


              • #27
                Originally posted by dxxvi View Post
                I have to admit that I'm using uBlock Origin when reading your site, and I don't think I will disable it any time soon ...
                Simply viewing the ads is how I'm paid.
                Michael Larabel
                https://www.michaellarabel.com/

                Comment


                • #28
                  Originally posted by dxxvi View Post
                  I have to admit that I'm using uBlock Origin when reading your site, and I don't think I will disable it any time soon ...
                  In a few words, there are about three ways ads can pay:
                  - Pay per impression (that's what we have here; it means Michael gets some money for every thousand times an ad is shown). Impressions are just about you seeing the ad/brand (akin to TV ads).
                  - Pay per click (that's how Google makes most of its money). Clicks are for when you either go to a purchase page, or at least look further into whatever you're interested in (like restaurants and other businesses that call to people in the street to come see what they offer).
                  - Pay per conversion (much rarer, as the advertiser can easily lie to the publisher... but it still exists; for example, Michael's Amazon affiliate link. Not exactly an ad per se, but the principle is similar). This is the most straightforward: it's about paying for something the ad viewer actually did.

                  Comment


                  • #29
                    Originally posted by Michael View Post

                    Unfortunately, given the ad-blocking situation, the lack of other support, and just trying to make ends meet, it really isn't feasible to do the fully investigative part for articles like this, where I am not the domain expert on all areas. The point is to push out a lot of data, ensure all of the data/benchmarks are reproducible, and use relevant hardware, so that hopefully others interested in said results, with more domain expertise in particular areas, can share some insight or carry out their own research thanks to the data at hand, etc.

                    If you ask, I am sure that people with actual expertise would be happy to share. Brendan Gregg described fairly well how easy it is to produce misleading numbers:

                    Originally posted by Brendan Gregg
                    I've seen countless slide decks, blog posts, and articles that present an impressive bar chart of comparative results, but then no supporting technical evidence. It's been my job to get to the bottom of many of these, and I typically find that they are wrong or misleading almost every time. The primary reason is that they have been run passively, "fire and forget" style, with no additional analysis, and all problems were overlooked.
                    That said, I have seen my remarks ignored enough times in the past that I am not going to spend much of my time explaining things unless I see something done with what I say. People in other areas of the open source stack feel similarly, although I am not yet as pessimistic as they are, so here are a few tips.

                    You could run the fs micro-benchmarks Brendan Gregg published:

                    https://gist.github.com/brendangregg...9698c70d9e7496

                    The rationale of each is well defined and the results can be meaningful in that context, but they do not provide a complete picture as soon as you step outside of it.

                    To give an example, here is another "benchmark", although it is really just a sanity test:

                    http://kevinclosson.net/2012/03/06/y...s-versus-ext4/

                    That does not work very well on ext4 because all writes through `ext4_file_write_iter()` serialize on the inode lock. That includes AIO and DirectIO. That test will not work on ZFS because we do not support O_DIRECT (due to the lack of standardization and the XFS semantics being incompatible with CoW in general), but you can modify the script to use oflag=sync rather than oflag=odirect, which does work and actually does reveal a scaling issue.
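                    As a sketch of that modification (file paths here are hypothetical), the idea is several dd writers in parallel against the same filesystem, with oflag=sync in place of oflag=odirect:

                        # Concurrent synchronous writers; serialization shows up as poor scaling
                        for i in 1 2 3 4; do
                            dd if=/dev/zero of=/tank/fs/file$i bs=8k count=100000 oflag=sync &
                        done
                        wait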

                    That scaling issue is that although ZFS employs fine-grained locking to ensure that writes to different regions of a file can be done concurrently, synchronous operations (e.g. fsync, O_SYNC/O_DSYNC/O_FSYNC, msync, etc.) serialize in the ZIL commit on the per-dataset ZIL. There is some batching that allows multiple ZIL commits to be done simultaneously, but by the time a log commit has started, all other committers that missed that batch must wait for it to finish, with those committers aggregated into the next write-out. Consequently, ZFS will scale better than ext4 does with multiple synchronous writers, but neither can presently touch XFS in synchronous I/O when it uses O_DIRECT (which implies O_DSYNC on XFS). XFS does phenomenally here by avoiding serialization at both the inode and the log level.

                    As for the actual relevance of concurrent synchronous writes, they are important for keeping latencies down in workloads that rely on synchronous writes, such as databases (atomic commits and logging) and virtual machines (flushes), on low-latency solid-state media. This did not matter for rotational media, but it becomes a scaling bottleneck once things go solid state. The bottleneck can be fixed by changing the code to pick the location of the next log commit at the start of an in-progress commit, allowing others to go forward once that location is picked while blocking them on the completion of their in-flight predecessors. That is easier said than done, but it is doable and would give a low-latency version of what we have today without a disk format change. Lower latencies are possible with a disk format change, but that would logically come after pipelining the intent log.

                    It is important to note here that the hardware needs to actually support queue depths greater than one and have sufficient headroom for concurrency to matter. The former is not the case on certain early SATA disks that are internally PATA and use PATA-to-SATA bridge chips, various PATA hardware (unless it supported ATA's crippled TCQ) and most likely other hardware of which I am unaware. The latter is unlikely on most hardware, although it is not impossible. Running through something slow like USB 2.0 would be an obvious way to achieve it.
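                    Before drawing conclusions, it is worth checking what queue depth a disk actually offers; for example (device name is only an example):

                        # A queue depth of 1 means no effective command queueing on this device
                        cat /sys/block/sda/device/queue_depth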

                    More generally, you should try to abide by best practices when configuring things. In the context of the slightly modified variation of Kevin Closson's test, ZFS has a default 128KB recordsize, so even if it performed well on synchronous I/O, we would see a performance penalty unless proper configuration is used. Similarly, a database administrator is not going to run their 8KB-recordsize database on a filesystem with a 128KB recordsize, or put a pool into production that suffers read-modify-write on disk sectors (e.g. by assuming 512-byte sectors on 4096-byte-sector disks). Incidentally, that is one reason why separate datasets are recommended for databases' data files and logs, with another reason being recordsize optimization. Such advice is well documented.
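                    As a sketch of that advice (pool and dataset names are hypothetical, and the device path is elided): create the pool with ashift matching the physical sector size, then give the database's data files a dataset whose recordsize matches the database page size:

                        # ashift=12 assumes 4096-byte physical sectors; it is fixed at vdev creation
                        zpool create -o ashift=12 tank /dev/disk/by-id/...
                        # Match recordsize to an 8KB database page size; logs get their own dataset
                        zfs create -o recordsize=8K tank/db-data
                        zfs create tank/db-logs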



                    The remark about block device sector sizes also applies to other filesystems such as XFS, which used 512-byte sectors by default the last time I checked, though not to ext4 unless the device sector size exceeds ext4's 4096-byte default. When automating things, it is important to check these details. If you want numbers from worst-case configuration scenarios, which do have value when taken as such, you could do runs for both properly configured and unconfigured storage stacks.
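                    For example, checking a device's sector sizes and forcing 4096-byte sectors at mkfs time might look like this (device names are examples):

                        # Report the logical and physical sector sizes, respectively
                        blockdev --getss --getpbsz /dev/sda
                        # XFS: request 4096-byte sectors instead of the 512-byte default
                        mkfs.xfs -s size=4096 /dev/sda1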

                    I can say more, but I am mostly saying this to demonstrate that the advice can be made available should people interested in publishing benchmarks want to publish numbers that are actually meaningful.
                    Last edited by ryao; 15 January 2016, 01:44 PM.

                    Comment
