Date: 2022-03

Networked workloads are intensive on the poll arming side, as most receive operations will be triggered async by poll. For that kind of poll triggering, we have allocated req->apoll dynamically and that serves as our poll entry. This means that the poll->events and poll->head are not part of the io_kiocb cachelines, and hence often not hot in the completion path. When profiling workloads, io_poll_check_events() shows up as hotter than it should be, exactly because we have to pull in this cacheline separately.



Cache state in the io_kiocb itself instead, which avoids pulling in unnecessary data in the poll task_work path. This reduces overhead by about 3-4%.
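
To make the idea concrete, here is a minimal userspace sketch of the pattern the commit message describes: keeping a copy of the poll mask inside the request structure so the completion path does not have to dereference a separately allocated poll entry. It is not the actual kernel patch. The names demo_poll, demo_req, cached_events and the demo_* functions are hypothetical stand-ins invented for illustration; only req->apoll, poll->events, poll->head and io_kiocb come from the text above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the dynamically allocated poll entry (req->apoll). */
struct demo_poll {
    void *head;       /* stand-in for the wait queue head (poll->head) */
    uint32_t events;  /* requested poll mask (poll->events) */
};

/* Hypothetical stand-in for io_kiocb. */
struct demo_req {
    struct demo_poll *apoll;  /* separate allocation, likely on another cache line */
    uint32_t cached_events;   /* copy of apoll->events stored inline in the request */
};

/* Arm: allocate the poll entry and mirror the mask into the request itself. */
static int demo_arm_poll(struct demo_req *req, uint32_t events)
{
    req->apoll = malloc(sizeof(*req->apoll));
    if (!req->apoll)
        return -1;
    req->apoll->head = NULL;
    req->apoll->events = events;
    req->cached_events = events;  /* the cached state */
    return 0;
}

/* Before: the check has to follow req->apoll, touching a second cache line. */
static uint32_t demo_check_events_indirect(const struct demo_req *req, uint32_t ready)
{
    assert(req->apoll != NULL);
    return ready & req->apoll->events;
}

/* After: the check reads only data that lives in the request structure. */
static uint32_t demo_check_events_cached(const struct demo_req *req, uint32_t ready)
{
    return ready & req->cached_events;
}

/* Disarm: release the poll entry. */
static void demo_disarm_poll(struct demo_req *req)
{
    free(req->apoll);
    req->apoll = NULL;
}
```

Both check functions return the same mask. The difference is only in which memory they touch, which is the effect the commit message attributes the overhead reduction to.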

After revolutionizing Linux storage I/O, the kernel's IO_uring interface continues to be buffed into shape for handling Linux networking needs too.

In recent months there has been work on IO_uring network zero-copy send and other efforts around making IO_uring appealing for network use-cases on Linux. Linux block subsystem maintainer and lead IO_uring developer Jens Axboe has been involved in this networking effort and on Wednesday announced more optimizations.

Axboe's new patch series achieves a 3~4% reduction in overhead for network-related workloads with IO_uring. That performance test is based on a model of Thrift's network handling.

Given all the interest and promising efforts around IO_uring, it will be fun to see where it ends up by year's end.