
AMD EPYC 7003 "Milan" Performance On Ubuntu Linux Six Months After Launch



    Phoronix: AMD EPYC 7003 "Milan" Performance On Ubuntu Linux Six Months After Launch

    It's been half a year already since AMD introduced the EPYC 7003 "Milan" processors, which continue to perform well and gain market share. While the recently released Ubuntu 21.10 is not a long-term support (LTS) release, for those wondering what this latest Linux distribution means for EPYC 7003 series performance, here is a look at how it performs across many benchmarks against Ubuntu 21.04, which was released right after the Milan launch, and Ubuntu 20.04, the current LTS stable series.

  • #2
    I can definitely verify these results. Ubuntu 21.10 + Linux 5.15 is legit. Haven't been happier.

    The nice thing about 21.10 is that it's a smooth upgrade to 22.04 LTS, and then you can stay locked in. GNOME 40 and 21.10 are in a solid place if you have newer hardware and are cautious about taking the jump on a newly released operating system. But if you're comfortable on 18.04 or newer, you should just stay put, grab a newer Mesa PPA, run a newer kernel, and start fresh on 22.04 LTS.

    *edit*: Michael, can you run some block scheduler tests soon comparing mq-deadline vs. kyber vs. none vs. bfq?

    The reason I ask is that I've noticed that with the block scheduler set to 'none', launching programs and general moving around the OS is *blazingly* fast. It performs slightly worse than BFQ in benchmarks, though; BFQ also launches programs fast, but not as noticeably fast as 'none'.

    Curious if anyone else has noticed the same. I haven't gamed in a bit, but I'd be interested in how none and bfq vary in gaming right now. I should mention I'm on a 3x 860 EVO 500GB -> 1.5TB RAID0 configuration, but apples to apples, none is fast. Which makes sense when you consider there's no (or very little) code being run for 'none', lol. Screen capture for those curious: -- that is stupid fast
    Last edited by perpetually high; 22 October 2021, 01:29 PM.
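For anyone wanting to run this comparison themselves: on Linux the scheduler for each block device is selected through sysfs, in /sys/block/<device>/queue/scheduler, which lists the available schedulers with the active one in brackets (for example "mq-deadline kyber [bfq] none"). Below is a minimal Python sketch, assuming sysfs is mounted at /sys; the helper names are hypothetical. Writing a scheduler name into the same file switches it, which requires root and does not persist across reboots.

```python
from __future__ import annotations

from pathlib import Path

SYSFS_BLOCK = Path("/sys/block")


def parse_scheduler_line(line: str) -> tuple[str | None, list[str]]:
    """Split e.g. 'mq-deadline kyber [bfq] none' into (active, available)."""
    active = None
    available = []
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # brackets mark the active scheduler
            active = token
        available.append(token)
    return active, available


def active_schedulers() -> dict[str, str | None]:
    """Map each block device (sda, nvme0n1, ...) to its active scheduler."""
    result: dict[str, str | None] = {}
    for sched_file in sorted(SYSFS_BLOCK.glob("*/queue/scheduler")):
        device = sched_file.parent.parent.name
        active, _ = parse_scheduler_line(sched_file.read_text())
        result[device] = active
    return result


def set_scheduler(device: str, name: str) -> None:
    """Switch one device's scheduler (root only; lost on reboot)."""
    (SYSFS_BLOCK / device / "queue" / "scheduler").write_text(name)


if __name__ == "__main__":
    for dev, sched in active_schedulers().items():
        print(f"{dev}: {sched}")
```

Running it prints one "device: scheduler" line per block device; set_scheduler("sda", "none") would then switch sda for the current boot only.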


    • #3
      Optimizations: 25% with MariaDB/MySQL, 10% with Cassandra. Good!


      • #4
        The Milan chips (Zen 3 EPYCs) look pretty awesome, and it's great to see the software stack improving. I figured they would be more available than the Ryzen chips (since the EPYCs are more expensive), but that doesn't seem to be the case. Does anyone know where to get an EPYC 7313P and motherboard, or a barebones setup, at somewhere near MSRP?


        • #5
          Originally posted by perpetually high
          I haven't gamed in a bit but I'd be interested in how none and bfq vary in gaming right now.
          Haven't done any benchmarks myself, but I always stick to BFQ (even with an NVMe SSD), because I like the fact that its default mode of operation prioritizes lower latency over throughput.

          However, if you are looking for some benchmarks, the original Italian author of this awesome I/O scheduler has run some himself here:

          In particular, I'm loving this statement:

          Because of the very high speed of the drive, results are essentially good with all schedulers. But only with BFQ no frame gets lost.


          • #6
            Originally posted by Linuxxx

            Haven't done any benchmarks myself, but I always stick to BFQ (even with an NVMe SSD), because I like the fact that its default mode of operation prioritizes lower latency over throughput.
            Linuxxx, I found some *really* good information that explains what I was noticing about the none scheduler and RAID.

            So I mentioned earlier I was using a RAID0 configuration of three Samsung 860 EVOs using Intel's BIOS-based RAID (you set it up in the BIOS and create the empty RAID volume there, and then the Ubuntu installer figures it out and installs the appropriate mdraid packages. Ridiculously easy, by the way, if anyone wants to try it out.)

            Anyway, check out the following. This explains why I was seeing that lightning-fast behavior from the 'none' scheduler:

            The NOOP scheduler is the simplest I/O scheduler for the Linux kernel. This scheduler was developed by Jens Axboe.

            The NOOP scheduler inserts all incoming I/O requests into a simple FIFO queue and implements request merging. This scheduler is useful when it has been determined that the host should not attempt to re-order requests based on the sector numbers contained therein. In other words, the scheduler assumes that the host is unaware of how to productively re-order requests.

            There are (generally) three basic situations where this situation is desirable:

            If I/O scheduling will be handled at a lower layer of the I/O stack. Examples of lower layers that might handle the scheduling include block devices, intelligent RAID controllers, Network Attached Storage, or an externally attached controller such as a storage subsystem accessed through a switched Storage Area Network. Since I/O requests are potentially rescheduled at the lower level, resequencing IOPs at the host level uses host CPU time on operations that will just be undone at the lower level, increasing latency/decreasing throughput for no productive reason.

            Because accurate details of sector position are hidden from the host system. An example would be a RAID controller that performs no scheduling on its own. Even though the host has the ability to re-order requests and the RAID controller does not, the host system lacks the visibility to accurately re-order the requests to lower seek time. Since the host has no way of judging whether one sequence is better than another, it cannot restructure the active queue optimally and should, therefore, pass it on to the device that is (theoretically) more aware of such details.

            Because read/write head movement doesn't impact application performance enough to justify the reordering overhead. This is usually the case with non-rotational media such as flash drives or solid-state drives (SSDs).
            Source: Wikipedia

            Sorry for all the bolds; the whole thing is relevant, but I don't want anyone to miss the important points.
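The "FIFO queue plus request merging" behavior described in the quote can be modeled in a few lines. The toy Python sketch below is illustrative only, not the kernel's code: requests are a hypothetical (op, start sector, sector count) record, and merging is simplified to back-merging a new request into the most recently queued one when it has the same direction and starts exactly where that one ends. Dispatch order never depends on sector numbers, which is the "no re-ordering" property the quote describes.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    op: str      # "read" or "write"
    sector: int  # first sector of the request
    count: int   # number of sectors

    @property
    def end(self) -> int:
        return self.sector + self.count


class FifoScheduler:
    """Toy noop-style scheduler: no sorting, dispatch in arrival order."""

    def __init__(self) -> None:
        self.queue: deque[Request] = deque()

    def add(self, req: Request) -> None:
        # Simplified back-merge: only the most recently queued request is checked.
        if self.queue:
            tail = self.queue[-1]
            if tail.op == req.op and tail.end == req.sector:
                tail.count += req.count
                return
        self.queue.append(req)

    def dispatch(self) -> Request | None:
        # FIFO: the oldest request goes first, regardless of its sector number.
        return self.queue.popleft() if self.queue else None
```

For example, adding reads at sectors 100 (8 sectors), 108 (8 sectors) and 0 (8 sectors) dispatches one merged 16-sector read at 100 first, then the read at 0, even though a seek-optimizing scheduler would prefer the lower sector.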

            Also check this out about BFQ:

            BFQ is a proportional-share I/O scheduler, with some extra low-latency capabilities. In addition to cgroups support (blkio or io controllers), BFQ’s main features are:

            BFQ guarantees a high system and application responsiveness, and a low latency for time-sensitive applications, such as audio or video players;

            BFQ distributes bandwidth, and not just time, among processes or groups (switching back to time distribution when needed to keep throughput high).

            In its default configuration, BFQ privileges latency over throughput. So, when needed for achieving a lower latency, BFQ builds schedules that may lead to a lower throughput. If your main or only goal, for a given device, is to achieve the maximum-possible throughput at all times, then do switch off all low-latency heuristics for that device, by setting low_latency to 0. See Section 3 for details on how to configure BFQ for the desired tradeoff between latency and throughput, or on how to maximize throughput.

            As every I/O scheduler, BFQ adds some overhead to per-I/O-request processing. To give an idea of this overhead, the total, single-lock-protected, per-request processing time of BFQ—i.e., the sum of the execution times of the request insertion, dispatch and completion hooks—is, e.g., 1.9 us on an Intel Core i7-2760QM@2.40GHz (dated CPU for notebooks; time measured with simple code instrumentation, and using the script of the S suite [1], in performance-profiling mode). To put this result into context, the total, single-lock-protected, per-request execution time of the lightest I/O scheduler available in blk-mq, mq-deadline, is 0.7 us (mq-deadline is ~800 LOC, against ~10500 LOC for BFQ).

            Scheduling overhead further limits the maximum IOPS that a CPU can process (already limited by the execution of the rest of the I/O stack). To give an idea of the limits with BFQ, on slow or average CPUs, here are, first, the limits of BFQ for three different CPUs, on, respectively, an average laptop, an old desktop, and a cheap embedded system, in case full hierarchical support is enabled (i.e., CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_BFQ_CGROUP_DEBUG is not set (Section 4-2):
            - Intel i7-4850HQ: 400 KIOPS
            - AMD A8-3850: 250 KIOPS
            - ARM Cortex™-A53 Octa-core: 80 KIOPS
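The link between per-request overhead and maximum IOPS is simple division: if every request has to spend t microseconds in a single-lock-protected section, that section alone caps throughput at 10^6 / t requests per second, however fast the SSD is. A back-of-the-envelope Python sketch using the two per-request figures quoted above; it ignores the rest of the I/O stack (which the quoted text notes already limits IOPS), and the CPUs differ from those in the measured list, so treat the result only as a rough upper bound.

```python
def iops_ceiling(per_request_us: float) -> float:
    """Max requests/second if every request spends per_request_us inside one lock."""
    return 1_000_000 / per_request_us


# Per-request scheduler costs quoted from the BFQ documentation above.
for name, cost_us in [("bfq", 1.9), ("mq-deadline", 0.7)]:
    print(f"{name}: ~{iops_ceiling(cost_us) / 1000:.0f} KIOPS")
# bfq: ~526 KIOPS
# mq-deadline: ~1429 KIOPS
```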

            So that all actually makes a ton of sense. "none" has pretty much no overhead, but it's a *dumb* scheduler. BFQ is a very smart scheduler that can allocate resources efficiently and favor either low latency (the default) or throughput through its tunables.
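The main latency/throughput tunable in the quoted BFQ documentation is low_latency, which BFQ exposes under /sys/block/<device>/queue/iosched/ while it is the device's active scheduler. A minimal Python sketch, assuming sysfs at /sys and root privileges for the write; the function names are hypothetical, and the setting does not survive a reboot.

```python
from pathlib import Path


def bfq_tunable_path(device: str, name: str = "low_latency") -> Path:
    """Location of a BFQ tunable for one block device (e.g. device='sda')."""
    return Path("/sys/block") / device / "queue" / "iosched" / name


def set_bfq_low_latency(device: str, enabled: bool) -> None:
    """Turn BFQ's low-latency heuristics on (1) or off (0) for one device."""
    path = bfq_tunable_path(device)
    if not path.exists():
        # low_latency only exists while BFQ is the device's active scheduler.
        raise RuntimeError(f"{device}: {path} not found; is BFQ active?")
    path.write_text("1" if enabled else "0")


# Example (as root): favor throughput over latency on sda.
# set_bfq_low_latency("sda", enabled=False)
```

Checking for the file first gives a clearer error than a failed write when the device is running a different scheduler.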

            So I think a good rule of thumb is: if you're running hardware RAID, opt for "none"; if you're running an SSD, go for bfq or mq-deadline.
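The rule of thumb above can be written down as a small decision function. This Python sketch is one possible encoding, not an established recommendation: the kernel reports spinning vs. solid-state media in /sys/block/<device>/queue/rotational, but whether a device sits behind a hardware RAID controller is passed in by the caller (an assumption of this sketch). Mapping "prefer latency" to BFQ and the lighter choice to mq-deadline follows the BFQ documentation quoted above, and spinning disks return None because the rule does not cover them.

```python
from __future__ import annotations

from pathlib import Path


def is_rotational(device: str) -> bool:
    """True if the kernel reports the device as spinning media (HDD)."""
    flag = Path("/sys/block") / device / "queue" / "rotational"
    return flag.read_text().strip() == "1"


def suggested_scheduler(
    hardware_raid: bool, rotational: bool, prefer_latency: bool = True
) -> str | None:
    """Encode the forum rule of thumb; None where the rule says nothing."""
    if hardware_raid:
        # A controller that reorders on its own makes host-side reordering redundant.
        return "none"
    if not rotational:
        # SSD: BFQ for latency (its default), mq-deadline as the lighter option.
        return "bfq" if prefer_latency else "mq-deadline"
    # Spinning disks without hardware RAID are not covered by the rule.
    return None
```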

            Side note, and fun fact about Jens Axboe (from his Wikipedia page): "Axboe is the current Linux kernel maintainer of the block layer and other block devices, along with contributing the CFQ I/O scheduler, Noop scheduler, Deadline scheduler and splice IO architecture." Guy's a flat-out winner.