Announcement

**markg85** · 06 July 2020, 06:46 AM

Just for fun.
I ran strace -c on a folder opened in Dolphin. The folder only contains 2048 JPG files (wallpapers at Full HD resolution).

Without thumbnails shown (they were generated before this run):
28114 syscalls!
Look here for the full output: https://p.sc2.nl/TWKfB

With thumbnails shown (note that Dolphin only does this for the files currently in view, so this is already optimized):
39984 syscalls!
Look here for the full output: https://p.sc2.nl/un8r5

Keep in mind that this is Dolphin, so it uses KIO. That means you don't see the actual opening of the thumbnail files in strace, as that's done by IO slave handlers in other processes. Having said that, now I'm REALLY curious what those statx calls, reads and writes do and where they come from. It's also interesting to see so many errors! I think I have to investigate a bit to figure out what's going on.

Note the high time percentage of futex too. There's room for improvement, I think. But does it matter? Even with this many stat calls, Dolphin still starts in mere milliseconds and you're off browsing the files. In other words, reducing the stat calls here to the bare minimum is likely not going to result in any perceived speedup. It will, however, likely let the CPU finish its work (much) sooner, which is beneficial for energy savings. Whether it's worth it is another matter entirely.

**markg85** · 06 July 2020, 09:19 AM

Originally posted by atomsymbol

I hope you do realize that there exist far better [research] methods of how to increase performance of kernel-userspace transitions, by a factor of 10 at least compared to what Linux is currently doing.

Enlighten me.

As far as I know, io_uring is the best there is going to be, and it won't even come close to a 10x improvement (more like a 20% gain). The main difference is that it's fully async in nature.
So I'm really curious about your 10x claim. Do back it up with an article that shows the performance supremacy.

**pal666** · 06 July 2020, 09:52 AM

Originally posted by ssokolow

It's getting more popular to compile code to ABIs

It's API, not ABI. There are two reasons: it's faster and it avoids TOCTOU (time-of-check to time-of-use) races.

**pal666** · 06 July 2020, 09:53 AM

Originally posted by atomsymbol

Is there a real-world scenario in which open/read/close of sysfs and procfs files is a bottleneck?

/bin/top
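A tool like top re-reads small procfs files such as /proc/<pid>/stat for every process on every refresh, and each of those reads is an open/read/close sequence. As a rough illustration, assuming semantics similar to the proposed readfile() (one call taking a directory fd, a path and a buffer), here is a userspace emulation that shows the syscalls such a call would replace. The name readfile_emulated and its signature are hypothetical, not the proposed kernel API.

```c
#define _DEFAULT_SOURCE /* exposes openat() and O_CLOEXEC in glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: emulates in userspace what a single readfile()-style
 * call would do. Reads at most `bufsize` bytes from `path` (relative to
 * `dfd`, or absolute) into `buf`.
 * Returns the number of bytes read, or -1 with errno set. */
static ssize_t readfile_emulated(int dfd, const char *path, char *buf, size_t bufsize)
{
    assert(path != NULL && buf != NULL);

    int fd = openat(dfd, path, O_RDONLY | O_CLOEXEC);       /* syscall 1 */
    if (fd < 0)
        return -1;

    size_t total = 0;
    while (total < bufsize) {
        /* syscall 2+: read() until EOF or the buffer is full; a small
         * file typically takes two reads (the data, then 0 at EOF). */
        ssize_t n = read(fd, buf + total, bufsize - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        if (n == 0)
            break; /* end of file */
        total += (size_t)n;
    }

    close(fd);                                               /* final syscall */
    return (ssize_t)total;
}
```

For example, readfile_emulated(AT_FDCWD, "/proc/self/stat", buf, sizeof buf) costs at least three syscalls per file, multiplied by every process on every refresh.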

**pal666** · 06 July 2020, 10:18 AM

Originally posted by atomsymbol

Question: If it wasn't an issue not to have the readfile() Linux syscall on 386/486 CPUs in the '90s

It was.

Originally posted by atomsymbol

Just a note: SATA and NVMe have very similar random 4K read IOPS at queue depth 1 (about 10000 IOPS) because it (presumably) is limited by the technology and not limited by SATA speeds.

My NVMe drive is about 8 times faster, and it's not even DRAM-based. SATA has inherent overhead, including CPU overhead. And you can run more than one app at a time: if you run 32 apps in parallel, you get QD32 for free.
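The queue-depth point follows from Little's law. Under the idealized assumption that per-request latency stays constant as more requests are in flight (real devices eventually saturate), throughput scales with queue depth. Using the 10,000 IOPS at QD1 figure from the quote:

```latex
\text{IOPS} \approx \frac{\text{QD}}{\bar{t}_{\text{req}}}
\qquad
\text{QD}=1:\ \bar{t}_{\text{req}} \approx \frac{1}{10\,000\ \text{s}^{-1}} = 100\ \mu\text{s}
\qquad
\text{QD}=32:\ \text{IOPS} \lesssim \frac{32}{100\ \mu\text{s}} = 320\,000
```

The QD32 figure is an upper bound under that assumption, not a measurement.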

Originally posted by atomsymbol

Just a 2nd note: Many people are archiving data and storing videos on HDDs, not SSDs, because of capacity and because of cost per terabyte.

Such use cases won't benefit from the subject of this thread (readfile()).

**pal666** · 06 July 2020, 10:20 AM

Originally posted by atomsymbol

I hope you do realize that there exist far better [research] methods of how to increase performance of kernel-userspace transitions, by a factor of 10 at least compared to what Linux is currently doing.

I hope you do realize that using some imaginary OS instead of real Linux is out of the question.

**markg85** · 06 July 2020, 05:28 PM

Originally posted by atomsymbol

NaCl.

Enlighten me again, and this time with links and research to back your claims up.
I call your claim bullshit unless proven otherwise.

Your credibility with me is declining super fast now.
If I search for NaCl, all I find is "Google Native Client", which has absolutely nothing to do with this conversation.

So bring proof or just don't post, as this is getting tiresome.

Note: there actually is one system that could get up to 10x the performance. That's the PlayStation 5, but that again has nothing to do with this. I was asking about Linux, today, not hypothetical or other platforms.

**ssokolow** · 06 July 2020, 11:07 PM

Originally posted by pal666

It's API, not ABI. There are two reasons: it's faster and it avoids TOCTOU.

I meant what I said. I was referring to how, if you were to hack together some machine code by hand, the execution environment still wouldn't allow you to make calls outside the set described by the API and, in idealized form, would lack the "vocabulary" to describe such calls.

**markg85** · 07 July 2020, 01:35 PM

Originally posted by atomsymbol

NaCl is part of a research area. A person can choose to read articles from that research area or choose not to read them. Dismissing a whole research area based on literally just a couple of sentences from a web search engine isn't a good strategy.

If you are familiar with the research area to which NaCl belongs then please reply with an argument why you believe that the core idea behind NaCl cannot be used to achieve the speedup I am claiming can be achieved.

You claim something has a 10x improvement.
You expect me to read all there is about NaCl without giving me a single hint as to where I'm even supposed to look.

No deal!

I am dismissing NaCl, as I'm not going to read hundreds of articles and watch long videos to "verify your claim". I'm not here for that.
Until you prove your point with numbers and articles to back it up, I'm going to call your claims total bullshit.

It would be theoretically impossible to get a 10x improvement in low-level file reading when the hardware stays the same and only one part of the software stack changes. It "would" be possible to make that claim for end-user libraries, where loads of layers of indirection each make things a bit slower. But at the low level: bullshit. Just remember, again, that even io_uring doesn't improve performance much, only by a small margin, say 30 percent at most (about 1.3x). That's very far from 10x. The benefit, yet again, is in its async nature.
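To put numbers on the syscall-overhead part of this argument for a given machine, here is a rough micro-benchmark sketch (function names hypothetical, Linux/POSIX assumed). It times open()+read()+close() per iteration against a single pread() on a descriptor opened once. It only measures the page-cache case, not loading from disk, and it does not call readfile(), which is only a proposal; results depend heavily on the CPU, kernel version and speculative-execution mitigations.

```c
#define _DEFAULT_SOURCE /* exposes pread(), clock_gettime() and O_CLOEXEC in glibc */
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Monotonic timestamp in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average nanoseconds per open()+read()+close() of `path`. */
static double bench_open_read_close(const char *path, int iters)
{
    char buf[256];
    double start = now_ns();
    for (int i = 0; i < iters; i++) {
        int fd = open(path, O_RDONLY | O_CLOEXEC);
        assert(fd >= 0);
        ssize_t n = read(fd, buf, sizeof buf);
        assert(n >= 0);
        close(fd);
    }
    return (now_ns() - start) / iters;
}

/* Average nanoseconds per pread() on a descriptor opened once up front. */
static double bench_pread_only(const char *path, int iters)
{
    char buf[256];
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    assert(fd >= 0);
    double start = now_ns();
    for (int i = 0; i < iters; i++) {
        ssize_t n = pread(fd, buf, sizeof buf, 0);
        assert(n >= 0);
    }
    double elapsed = now_ns() - start;
    close(fd);
    return elapsed / iters;
}
```

Running both on the same small, cached file with, say, 100000 iterations and comparing the averages gives an optimistic estimate of what collapsing open/read/close into one call could save, since a readfile()-style call would still have to do the path lookup and open internally.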

**markg85** · 07 July 2020, 01:59 PM

Originally posted by atomsymbol

In my opinion, you are missing quite a lot by not ever reading the NaCl paper. It's just a single paper, not a hundred papers.

Well, how about this: If a file is cached in memory and its size is 1 byte then I am claiming it is possible to achieve a speedup of 100x.

Seriously... That's how you plan on winning this argument?
By cheating...

We OBVIOUSLY mean loading from disk.
End of discussion. Your arguments are without proof. I'm done responding to this.

New readfile() System Call Under Review For Reading Small~Medium Files Faster
