**ferry** · 20 November 2018, 01:58 PM

Originally posted by duby229

Sorry to say it, but CFQ wasn't really made for the desktop. Desktops should actually use the no-op scheduler if they are on a spinning hard drive; if you use an SSD, then BLK-MQ.

EDIT: I guess I should explain a bit. HDDs have integrated drive electronics with their own logic and their own caches, so the no-op scheduler is best for them. SSDs, on the other hand, are more or less RAID-like controllers; they need more external logic, and that's what BLK-MQ was designed for.

EDIT: CFQ was really made for SCSI-like controllers where there would be tons of disk accesses and latency is split between them all. So it's not made for a typical desktop, where there would be minimal disk accesses but latency is important.

EDIT: Windows basically issues IO in timeslices and it puts a context switch between every time slice. And in every scenario where the process isn't ready in time for the time slice, the time slice is filled with context switches till the next time slice...

Really? Have you tried running gitk on the Linux repo on an HDD and then scrolling down a bit? After continuous disk access starts and your mouse freezes, just wait a little longer until you get bored with it and reboot.

BTW this shows that any ordinary user can cause a Linux system to lock up.

Apparently IO waits are causing the kernel to lock up. If there had been any CPU time left you could have stopped (not killed) gitk and all would be fine. But AFAIK there is no scheduler available that would restrict a process's CPU time based on its disk access. And none are planned, because it is not considered the scheduler's job :-)

**duby229** · 20 November 2018, 02:07 PM

Originally posted by ferry

Really? Have you tried running gitk on the Linux repo on an HDD and then scrolling down a bit? After continuous disk access starts and your mouse freezes, just wait a little longer until you get bored with it and reboot.

BTW this shows that any ordinary user can cause a Linux system to lock up.

Apparently IO waits are causing the kernel to lock up. If there had been any CPU time left you could have stopped (not killed) gitk and all would be fine. But AFAIK there is no scheduler available that would restrict a process's CPU time based on its disk access. And none are planned, because it is not considered the scheduler's job :-)

And so I suppose you too are using CFQ, which is understandable, but it's not the right scheduler to use for desktop Linux. As I said, I would suggest no-op for a spinning HDD. Check your kernel configuration; there is a good chance it already has several schedulers built in. You can flip between them and test each one under the scenario you described above. It's a good use case, and I think if you tried something besides CFQ you'd get the results you expect.

EDIT: Please understand that CFQ was made to split latency between all disk accesses, so if 99.9% of that is some (highly parallel) load, then only 0.1% would be left for everything else. That's why CFQ isn't the right IO scheduler for desktops. So I advise you to experiment with others.
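For anyone wanting to try this: on kernels of this era the schedulers available for a disk are listed in `/sys/block/<device>/queue/scheduler`, with the active one in square brackets. Below is a minimal Python sketch for reading that file. The function names are made up for illustration, and the sample line in the comment is only an example of the format.

```python
def parse_scheduler_line(line):
    """Parse a sysfs scheduler line such as 'noop deadline [cfq]'.

    Returns (available, active): the list of scheduler names and the
    one shown in square brackets (None if nothing is bracketed).
    """
    available = []
    active = None
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return available, active


def read_schedulers(device):
    """Read the scheduler list for a block device, e.g. 'sda'.

    Hypothetical helper; it only reads the sysfs file and changes nothing.
    """
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return parse_scheduler_line(f.read())
```

Switching is done by writing one of the listed names into the same file as root, for example `echo noop > /sys/block/<device>/queue/scheduler`. The change applies to that device only and does not survive a reboot.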

**F.Ultra** · 20 November 2018, 03:36 PM

Originally posted by uid313

But Windows has async I/O.
Linux does not.

Linux doesn't? So what exactly have I been using select/poll/epoll for all these years?

**jacob** · 20 November 2018, 03:46 PM

Originally posted by Britoid

Linux already has advanced ACL.

No it doesn't. It only has the so-called POSIX ACLs, which are both more complicated to use AND less flexible than Windows-style (aka NFSv4) ACLs.

Another issue with Linux ACL support is that ACLs can only apply to files; all other types of kernel objects are subject to the brain-dead Unix-style access control model.

**jacob** · 20 November 2018, 03:48 PM

Originally posted by F.Ultra

Linux doesn't? So what exactly have I been using select/poll/epoll for all these years?

You have been using it for what it is used for, which is not Async IO.

**ferry** · 20 November 2018, 04:11 PM

Originally posted by duby229

And so I suppose you too are using CFQ, which is understandable, but it's not the right scheduler to use for desktop Linux. As I said, I would suggest no-op for a spinning HDD. Check your kernel configuration; there is a good chance it already has several schedulers built in. You can flip between them and test each one under the scenario you described above. It's a good use case, and I think if you tried something besides CFQ you'd get the results you expect.

EDIT: Please understand that CFQ was made to split latency between all disk accesses, so if 99.9% of that is some (highly parallel) load, then only 0.1% would be left for everything else. That's why CFQ isn't the right IO scheduler for desktops. So I advise you to experiment with others.

Great story. But did you note I started with: Really? Have you tried ...

Just to make sure, I did try your suggestion. Here is what I have:
SSD on sda, HDD on sdb. Both are btrfs but each in its own pool. My kernel has noop, cfq and deadline.
I have @ (/) and @home (/home) on sdb; /tmp is on sda and so is swap. I also have a /home/me/tmp on sda, and this holds the kernel repo.

Running gitk on the repo hangs the mouse and everything else within 10 seconds. It is not possible to ssh in to remedy it.

EDIT: It seems that to reliably bring the machine to its knees I need to have Chromium running. Now I am guessing that the lack of memory caused by gitk triggers something in Chromium that accesses the disk heavily.

**F.Ultra** · 20 November 2018, 04:58 PM

Originally posted by jacob

You have been using it for what it is used for, which is not Async IO.

Ok, so I fire away a read to network or disk, my code continues to run and later when the data have been read from the network/disk I can act on that event. In which world is that not async?
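The pattern described here can be sketched in Python with the standard `selectors` module, which wraps select/poll/epoll. This is only an illustration on a local socket pair; `readiness_demo` is a made-up name. Note what it shows: the kernel reports that the socket is ready, and the program then issues an ordinary `recv` itself.

```python
import selectors
import socket


def readiness_demo():
    """Register a socket, keep running, and read once it is reported ready."""
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)

    sel = selectors.DefaultSelector()
    sel.register(b, selectors.EVENT_READ)

    # Nothing has been sent yet, so a non-blocking poll reports no events.
    before = sel.select(timeout=0)

    a.send(b"hello")
    # The program is free to do unrelated work here.

    data = b""
    for key, _mask in sel.select(timeout=1):
        # Readiness notification only: the read is still issued by us.
        data = key.fileobj.recv(1024)

    sel.close()
    a.close()
    b.close()
    return before, data
```

Whether this counts as "async IO" is exactly the point being argued in this thread: the wait is non-blocking, but the data transfer happens in a normal call after the notification.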

**duby229** · 20 November 2018, 05:43 PM

Originally posted by ferry

Great story. But did you note I started with: Really? Have you tried ...

Just to make sure, I did try your suggestion. Here is what I have:
SSD on sda, HDD on sdb. Both are btrfs but each in its own pool. My kernel has noop, cfq and deadline.
I have @ (/) and @home (/home) on sdb; /tmp is on sda and so is swap. I also have a /home/me/tmp on sda, and this holds the kernel repo.

Running gitk on the repo hangs the mouse and everything else within 10 seconds. It is not possible to ssh in to remedy it.

EDIT: It seems that to reliably bring the machine to its knees I need to have Chromium running. Now I am guessing that the lack of memory caused by gitk triggers something in Chromium that accesses the disk heavily.

Cool, so flip the no-op scheduler on and try again; I bet you have a better experience. But for the SSD you may want to consider building a new kernel with all the multi-queue stuff to get the BLK-MQ scheduler as an option.

**TheCycoONE** · 20 November 2018, 06:25 PM

F.Ultra: because in Linux a userspace thread is blocked while writing to or reading from a file, and in Windows it doesn't have to be. In the case of epoll (which doesn't apply to files), a user thread has to call epoll_wait to find out if there are updates. In Windows you can pass a callback to the kernel for file operations instead.
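The remark that epoll doesn't apply to files can be checked directly: per the epoll_ctl man page, adding a regular file fails with EPERM, which Python surfaces as `PermissionError`. A small sketch follows; the helper name is made up, and it returns `None` on platforms where `select.epoll` does not exist.

```python
import select
import tempfile


def epoll_accepts_regular_file():
    """Return True/False, or None where select.epoll is unavailable."""
    if not hasattr(select, "epoll"):
        return None
    ep = select.epoll()
    try:
        with tempfile.TemporaryFile() as f:
            try:
                # epoll_ctl(EPOLL_CTL_ADD) on a regular file is expected
                # to fail with EPERM on Linux.
                ep.register(f.fileno(), select.EPOLLIN)
            except PermissionError:
                return False
            return True
    finally:
        ep.close()
```

select and poll do accept regular files, but they always report them as ready, so they give no useful information about disk IO either.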

Synchronous and Asynchronous I/O - Win32 apps

https://docs.microsoft.com/en-us/windows/desktop/fileio/synchronous-and-asynchronous-i-o

There are two types of input/output (I/O) synchronization: synchronous I/O and asynchronous I/O. Asynchronous I/O is also referred to as overlapped I/O.

**jacob** · 20 November 2018, 06:50 PM

Originally posted by F.Ultra

Ok, so I fire away a read to network or disk, my code continues to run and later when the data have been read from the network/disk I can act on that event. In which world is that not async?

All right, so how exactly do you do this? The Linux aio_*** API (or anything built on top of it) is only an imitation of async IO. Since as of 2018 the Linux kernel still doesn't have real, production quality AIO, those functions simply fire up worker threads that do normal synchronous IO in the background. Obviously this introduces overhead that can become very significant as the number of concurrent IO requests increases.
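The worker-thread approach described above (which is how glibc implements the POSIX `aio_*` calls in user space) can be imitated in a few lines of Python. This is a sketch of the general technique, not of glibc's code; all function names are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def blocking_read(path):
    """Ordinary synchronous read; this is what the worker thread runs."""
    with open(path, "rb") as f:
        return f.read()


def emulated_async_read(path, pool):
    """Hand the blocking read to a worker thread and return a future."""
    return pool.submit(blocking_read, path)


def demo():
    # Create a small temporary file to read back.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"payload")
        path = tmp.name
    try:
        with ThreadPoolExecutor(max_workers=2) as pool:
            future = emulated_async_read(path, pool)
            other_work = sum(range(1000))  # caller keeps running meanwhile
            data = future.result()         # collect the result later
    finally:
        os.unlink(path)
    return other_work, data
```

Every in-flight request occupies a thread for its whole duration, which is the overhead the post refers to once the number of concurrent requests grows.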

In comparison the NT kernel has built-in support for async IO where the read() and write() system calls can perform all the requests internally in kernel space and then send you a signal or invoke a callback function when they complete. There have been attempts to implement the same thing on Linux but so far none of them is ready to be merged in. Actually AIO and ACLs are two areas (some would even say "THE" two areas) where Linux is lagging behind the NT kernel.

Linux File-Systems Keeps Getting Better, But More Improvements Are Sought