AMD Ryzen 7040 Series Shows Great AVX-512 Performance For Laptops / Mobile / Edge

  • Tomin
    replied
    Originally posted by Michael

    Uhhh page 5? There are OpenVINO benchmarks.
    This was a reply to coder. They had done some calculations of the results in a previous comment.


  • Michael
    replied
    Originally posted by Tomin
    I missed that you only included some OpenVINO benchmarks, not all of them. That explains it, thanks!

    (It was even mentioned in the part I quoted, but I still didn't see it.)
    Uhhh page 5? There are OpenVINO benchmarks.


  • Tomin
    replied
    I missed that you included only some OpenVINO benchmarks, not all of them. That explains it, thanks!

    (It was even mentioned in the part I quoted, but I still didn't see it.)
    Last edited by Tomin; 16 July 2023, 07:22 AM. Reason: only included -> included only


  • coder
    replied
    Originally posted by Tomin
    That table looks a little strange. Previously, leaving out OpenVINO made the GeoMean swing from the AMD side to Intel's,
    This table shows the subscores, but it's based on the exact same data I used to recompute the overall GeoMeans.

    I didn't omit all OpenVINO benchmarks, just the fp16 ones. Those gave zero speedup on the Intel CPUs but more than a 2x speedup on Phoenix. Since this was simply due to the presence or absence of a few instructions, it doesn't really tell us how the overall implementations compare, and that's what I wanted to see. That was my rationale for omitting them from my recomputed scores.
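    A minimal Python sketch of that kind of recomputation, assuming the per-test results are stored as AVX-512 speedup ratios (on/off) keyed by test name. The test names and values below are hypothetical placeholders, not the article's data; the point is only that dropping a subset means recomputing the GeoMean over the remaining tests.

```python
import math

# Hypothetical per-test AVX-512 speedups (on/off ratio); placeholder values only.
speedups = {
    "OpenVINO fp16 A": 2.3,
    "OpenVINO fp16 B": 2.1,
    "OpenVINO fp32 A": 1.6,
    "oneDNN A": 1.5,
    "Embree A": 1.1,
}


def geomean(values):
    """Geometric mean: exp of the arithmetic mean of the natural logs."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geomean_excluding(results, exclude):
    """GeoMean over the tests whose name does not match the `exclude` predicate."""
    return geomean(v for name, v in results.items() if not exclude(name))


overall = geomean(speedups.values())
without_fp16 = geomean_excluding(speedups, lambda name: "fp16" in name)
print(f"all tests: {overall:.3f}  excluding fp16: {without_fp16:.3f}")
```

    A predicate keeps the exclusion rule explicit, so it's easy to check exactly which tests were dropped.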

    Originally posted by Tomin
    this looks like Intel is stronger on that benchmark, so how could that be true?
    So, you're right that Intel gained a bigger overall speedup on the remaining OpenVINO tests than Phoenix did. Since the core counts differ, we can't necessarily infer that Phoenix's AVX-512 implementation is that much worse. It could be that Phoenix simply benefits less because it's more memory-bottlenecked, with twice as many cores to feed (8 vs. 4).
    Last edited by coder; 16 July 2023, 03:03 AM.


  • Tomin
    replied
    Originally posted by coder
    More Insights
    Following up on the post where I found that, if you exclude the OpenVINO fp16 benchmarks, AVX-512 benefited Intel's Ice Lake and Tiger Lake more than Phoenix, here are some other points of interest.

    I computed the GeoMean for each benchmark program, to see where AVX-512 helped the most/least.
    That table looks a little strange. Previously, leaving out OpenVINO made the GeoMean swing from the AMD side to Intel's, but this looks like Intel is stronger on that benchmark, so how could that be true? Am I reading this wrong somehow?


  • coder
    replied
    More Insights

    Following up on the post where I found that, if you exclude the OpenVINO fp16 benchmarks, AVX-512 benefited Intel's Ice Lake and Tiger Lake more than Phoenix, here are some other points of interest.

    I computed the GeoMean for each benchmark program, to see where AVX-512 helped the most/least.
    Program              Benches  i7-1065G7  i7-1165G7  R7 7840U
    Embree 4.1                 3      1.070      1.117     1.189
    OpenVKL 1.3.1              1      1.293      1.302     1.239
    OSPRay 2.12                4      1.446      1.433     1.412
    OSPRay Studio 0.11         6      1.044      1.105     1.108
    oneDNN 3.1                 5      1.694      1.686     1.561
    Cpuminer-Opt 3.20.3        8      1.876      1.915     1.473
    OpenVINO 2022.3            8      1.750      1.755     1.635
    miniBUDE 20210901          2      1.288      1.233     1.280
    libxsmm 2-1.17-3645        1      1.065      0.986     1.082
    TensorFlow 2.12            6      1.550      1.509     1.858

    The Benches column indicates how many benchmarks each program had. This affects how strongly its performance is weighted in the final GeoMean. It also shows how one could tune the test suite to swing the final results one way or another. For instance, more Cpuminer-Opt tests would make the Intel CPUs' AVX-512 support look a lot better, while more TensorFlow tests would favor AMD's Phoenix.
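    To make the weighting concrete: if every benchmark counts equally, the overall GeoMean equals exp(sum(n_i * ln g_i) / sum(n_i)), where n_i is a program's Benches count and g_i its per-program GeoMean. The Python sketch below plugs in the table's values (rounded to three decimals, so results are approximate) and accepts a hypothetical `weights` override, e.g. to see what doubling the Cpuminer-Opt or TensorFlow tests would do.

```python
import math

# Per-program GeoMean AVX-512 speedups from the table:
# name: (benches, i7-1065G7, i7-1165G7, R7 7840U)
TABLE = {
    "Embree 4.1": (3, 1.070, 1.117, 1.189),
    "OpenVKL 1.3.1": (1, 1.293, 1.302, 1.239),
    "OSPRay 2.12": (4, 1.446, 1.433, 1.412),
    "OSPRay Studio 0.11": (6, 1.044, 1.105, 1.108),
    "oneDNN 3.1": (5, 1.694, 1.686, 1.561),
    "Cpuminer-Opt 3.20.3": (8, 1.876, 1.915, 1.473),
    "OpenVINO 2022.3": (8, 1.750, 1.755, 1.635),
    "miniBUDE 20210901": (2, 1.288, 1.233, 1.280),
    "libxsmm 2-1.17-3645": (1, 1.065, 0.986, 1.082),
    "TensorFlow 2.12": (6, 1.550, 1.509, 1.858),
}
CPUS = ("i7-1065G7", "i7-1165G7", "R7 7840U")


def overall_geomean(cpu, weights=None):
    """Overall GeoMean when each program's GeoMean counts once per benchmark.

    Equivalent to the GeoMean over all individual tests:
    exp(sum(n_i * ln g_i) / sum(n_i)).
    `weights` optionally overrides a program's bench count (hypothetical what-if).
    """
    col = 1 + CPUS.index(cpu)
    weights = weights or {}
    num = 0.0
    den = 0
    for name, row in TABLE.items():
        n = weights.get(name, row[0])
        num += n * math.log(row[col])
        den += n
    return math.exp(num / den)


for cpu in CPUS:
    print(f"{cpu}: {overall_geomean(cpu):.3f}")

# What-if: twice as many Cpuminer-Opt tests, or twice as many TensorFlow tests.
more_cpuminer = {"Cpuminer-Opt 3.20.3": 16}
more_tf = {"TensorFlow 2.12": 12}
for label, w in (("more Cpuminer-Opt", more_cpuminer), ("more TensorFlow", more_tf)):
    ratio = overall_geomean("i7-1165G7", w) / overall_geomean("R7 7840U", w)
    print(f"{label}: i7-1165G7 / R7 7840U = {ratio:.3f}")
```

    Because the overall GeoMean is a weighted average in log space, giving more weight to the program with the biggest Intel-vs-Phoenix gap (Cpuminer-Opt) or the biggest Phoenix-vs-Intel gap (TensorFlow) pulls the comparison in that direction.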

    BTW, a key part of my sanity checks, to make sure I hadn't made any data-entry errors, was to look at the per-test speedups and scan for outliers.
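    One way to automate that kind of sanity check, sketched in Python under my own assumptions (the test names, numbers, and the 0.5 log-space threshold are all hypothetical): convert each raw result into a speedup, taking care over whether the metric is higher- or lower-is-better, then flag tests that sit far from the median.

```python
import math
import statistics


def speedup(with_avx512, without_avx512, higher_is_better):
    """Per-test AVX-512 speedup; > 1 means AVX-512 helped.

    Benchmarks report either throughput (higher is better) or time (lower is
    better), so the ratio has to be flipped for the latter.
    """
    if higher_is_better:
        return with_avx512 / without_avx512
    return without_avx512 / with_avx512


def flag_outliers(speedups, threshold=0.5):
    """Names whose log-speedup is more than `threshold` away from the median.

    Working in log space treats a 2x gain and a 2x loss symmetrically.
    threshold=0.5 (about a factor of 1.65) is an arbitrary illustrative choice.
    """
    logs = {name: math.log(s) for name, s in speedups.items()}
    med = statistics.median(logs.values())
    return sorted(name for name, v in logs.items() if abs(v - med) > threshold)


# Hypothetical raw results: (with AVX-512, without AVX-512, higher_is_better)
raw = {
    "test A": (120.0, 100.0, True),
    "test B": (8.0, 10.0, False),
    "test C": (55.0, 50.0, True),
    "test D": (10.0, 8.0, False),
    "test E": (900.0, 100.0, True),  # suspicious 9x result, e.g. a data-entry slip
}
speedups = {name: speedup(*r) for name, r in raw.items()}
print(flag_outliers(speedups))  # -> ['test E']
```

    Comparing against the median rather than the mean keeps a single bad entry from dragging the reference point toward itself.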
    Last edited by coder; 15 July 2023, 11:29 PM.


  • coder
    replied
    Originally posted by MorrisS.
    Shouldn't it fall back to AVX2 instructions when AVX-512 is missing?
    Yes, but AVX2 doesn't have all of the corresponding instructions of AVX-512.

    To partially plug the hole left by ripping out AVX-512, Intel added AVX-VNNI, but none of the CPUs in this comparison have those instructions.
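    For illustration, the fallback idea can be sketched as runtime dispatch on CPU feature flags. This Python sketch reads the flag names Linux reports in /proc/cpuinfo (avx2, avx512f, avx512_bf16, avx512_vnni, avx_vnni); the returned path names are hypothetical labels, and real libraries do their own runtime ISA detection rather than anything like this. The AVX2 float path has no bf16 dot-product instruction to use, which is why it does the math in fp32.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU listed in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_float_path(flags):
    """Pick a (hypothetical) float kernel: prefer native bf16 dot products."""
    if "avx512_bf16" in flags:
        return "avx512_bf16"  # VDPBF16PS available
    if "avx512f" in flags:
        return "avx512_fp32"  # convert bf16 inputs to fp32 first
    if "avx2" in flags:
        return "avx2_fp32"  # same fp32 math, half the vector width
    return "scalar"


def pick_int8_path(flags):
    """Pick a (hypothetical) int8 kernel: EVEX VNNI, then VEX AVX-VNNI, then AVX2."""
    if "avx512_vnni" in flags:
        return "avx512_vnni"
    if "avx_vnni" in flags:
        return "avx_vnni"
    if "avx2" in flags:
        return "avx2"
    return "scalar"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = parse_cpu_flags(f.read())
    except OSError:
        flags = set()  # not Linux, or /proc unavailable
    print(pick_float_path(flags), pick_int8_path(flags))
```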


  • coder
    replied
    Originally posted by sophisticles
    Personally I want Intel to release a pure E-core processor for the desktop.
    It's not quite the same thing, but the N-series can give you a taste of what it would be like. The N300 and N305 even have 8 Gracemont cores.

    The downsides, as you probably know:
    • BGA, not socketed.
    • Only 1 memory channel.
    • I/O is only PCIe 3.0 @ 9 lanes.
    • Smaller iGPU, in some models.

    So, not a bad option for powering mini-PCs, but not something you can pair with a dGPU or otherwise use in an I/O-heavy configuration.

    Originally posted by sophisticles
    There are rumors Intel will be releasing a very high core count, E-core-only Xeon,
    Sierra Forest is more than a rumor.

    Originally posted by sophisticles
    I want a desktop version, something like 50 E-cores would be perfect for my use cases and I suspect for a lot of people's use cases.
    I've heard rumors of future desktop CPUs with up to 32 E-cores. Add 8 hyperthreaded P-cores (16 threads), and that would give you 48 threads.

    Maybe it'd be pointless, though. 2-channel memory is already a big enough bottleneck with just 32 threads.


  • sophisticles
    replied
    Originally posted by drakonas777
    The implementation of Intel's hybrid architecture is not elegant, to put it politely. They should either make the E-cores fatter or the P-cores leaner. Disabling AVX-512 is a workaround. Making the P-cores leaner is probably the better approach: AVX-512 is not that important on client platforms, and SMT/HT is not that important when there are a bunch of E-cores to handle highly parallel loads.
    Personally I want Intel to release a pure E-core processor for the desktop.

    There are rumors Intel will be releasing a very high core count, E-core-only Xeon. I want a desktop version; something like 50 E-cores would be perfect for my use cases, and I suspect for a lot of people's.


  • MorrisS.
    replied
    Originally posted by coder
    Especially because Zen 4 has the same overall dispatch width for AVX/AVX2 and AVX-512: 1536 bits per cycle.

    A significant portion of the benefit comes from capabilities that are unique to AVX-512, like:

    VDPBF16PS: Calculate dot product of two BFloat16 pairs and accumulate the result into one packed single-precision number
    If you don't have that, then OpenVINO is going to do the computation using fp32. So, that's a clear case where you get about double the throughput.
    Shouldn't it fall back to AVX2 instructions when AVX-512 is missing?
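    A scalar Python model of what VDPBF16PS computes may help explain the throughput point: each 32-bit lane holds two bf16 values, so a 512-bit register carries 32 bf16 inputs instead of 16 fp32 ones, and the instruction folds both products of a pair into one fp32 accumulator. This is a simplified sketch, not Intel's reference: it follows my reading of the documented pseudocode, rounds to fp32 via `struct`, and ignores the instruction's denormal flushing, NaN handling, and masking.

```python
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16_bits(x):
    """fp32 -> bf16 bit pattern with round-to-nearest-even (NaN/overflow not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_to_f32(b):
    """bf16 bit pattern -> float: bf16 is the upper 16 bits of an fp32."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


def dpbf16ps(acc, a, b):
    """Simplified model of a VDPBF16PS-style operation.

    acc  -- fp32 accumulators, one per 32-bit lane
    a, b -- bf16 bit patterns, two per lane (len == 2 * len(acc))
    Each lane adds both bf16 products of its pair into its accumulator.
    The product of two bf16 values is exact in fp32 (barring overflow/underflow);
    only the additions round.
    """
    assert len(a) == len(b) == 2 * len(acc)
    out = []
    for i, t in enumerate(acc):
        for j in (2 * i + 1, 2 * i):  # odd element first, per my reading of the pseudocode
            t = f32(t + bf16_to_f32(a[j]) * bf16_to_f32(b[j]))
        out.append(t)
    return out


a = [to_bf16_bits(x) for x in (1.0, 2.0, 3.0, 4.0)]
b = [to_bf16_bits(x) for x in (0.5, 0.25, 2.0, 1.0)]
print(dpbf16ps([0.0, 10.0], a, b))  # -> [1.0, 20.0]
```

    Without this instruction, the bf16 inputs have to be widened to fp32 first, so each register holds half as many of them, which is consistent with the roughly 2x throughput difference coder described.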
