RHEL9 Raises Base Target For x86_64 CPUs Plus Possible Optimized Libraries With glibc-hwcaps


  • #21
    Originally posted by ms178
    This is an implementation detail which is becoming less true with newer generations on Intel CPUs.
    Wrong. The 10900K and 11900K are both 14nm, and this downclock-to-1.7 GHz problem does not go away with newer generations of Intel CPUs as long as they use 14nm.
    It also does not go away with Intel's 10nm. Why? Simple: the 11900K reaches 5.2 GHz,
    while the 10nm CPUs on the newest 10+++ node only reach 4 GHz.
    Intel's 10nm node fits more transistors per mm² because it uses a 3D structure, but that does not fix the AVX-512 downclock problem, because even with the 3D structure the transistors are still "BIG".
    That means they put more transistors per mm², but they failed to make the transistors smaller...

    So you see, your point just shows complete incompetence: newer Intel CPUs like the 11900K do not solve this problem, and Intel's 10nm does not solve it either.
    Intel is building its 7nm node right now in the 2D style, with smaller transistors and without the 3D structure of 10nm. This means the 7nm node will raise the AVX-512 downclock from 1.7 GHz to maybe 3 GHz.
    What you do not get is this: Intel does not have the 7nm node YET, but AMD already produces in 5nm!!! And this is a fact. Apple was the first to get the 5nm node, and AMD also has 20-30% of the 5nm capacity after Apple.

    This means that by the time Intel has 7nm, it is a fact that AMD will already have 5nm CPUs everywhere. So your point fails even at 7nm, because AMD will have 5.5 GHz CPUs on 5nm by then.

    Originally posted by ms178
    Also, this doesn't mean that this will be an issue with upcoming AMD CPUs. There are workloads where, in spite of the downclocking, the benefits are still noticeable. These workloads are the last holdouts where Intel still trumps AMD in benchmarks. These might not be relevant to many people, but they are for some.
    I know 100% for sure that AMD WILL NOT build AVX-512 into each single core.
    Believe it or not, AMD has emulation to run the AVX-512 instruction set on 256-bit AVX units.
    Zen 4 also will not have AVX-512 per core. Instead it has a 256-bit AVX unit per core, and the AVX units of 2 cores can be combined to calculate AVX-512.

    Originally posted by ms178
    Let's wait and see how AMD will implement it this time. While being good enough for most use cases at that time, this implementation strategy had its drawbacks: Intel's AVX2 implementation was faster where it was fully utilized. There is a chance for AMD to implement AVX-512 better than Intel though, without downclocking. AMD always hinted: we will implement it when the cost (die size) is reasonable and when the downclocking can be avoided, and with each newer process node these issues fade away. They originally wanted these workloads to be shifted onto the GPU. I don't know if you've noticed, but their HSA efforts failed, as this programming model never got wide enough traction, and OpenCL versions above 2.0 haven't gotten any traction in the industry either.
    AMD's strategy is one of downsizing: they will only implement AVX units that can run at full speed without any downclocking. For Zen 4 on 5nm you can expect a 256-bit AVX unit per core, with 2 cores able to combine their units to perform AVX-512.

    Originally posted by ms178
    At least Matt Pharr thinks differently, but he could make use of AVX-512 to a bigger extent. The speedups he achieved speak for themselves and showed good scaling in both vectorization and parallelization, even in code which wasn't suited for this. He argued that from a performance per area perspective, wider vector engines were superior to putting more cores on the die. AMD went the opposite route with much success, but that to a large part only reflects Intel's inability to strike back on newer process nodes during the last several years. We are still stuck on a refined Skylake architecture on the desktop with Intel today. If they had kept innovating as they should have, AMD would have had a harder time coming back into the market. Also, Intel failed for a long time to push software to make better use of these advanced capabilities of their CPUs. It took ages until Red Hat finally drove this x86 feature level thing; why didn't Intel do so 5-10 years ago?
    I think outside of Intel no one cares about "He argued that from a performance per area perspective"; instead they all go after "performance per watt".
    Also, Intel's 10nm is a 3D structure, which means high transistor count per area (same bullshit),
    and AMD's 5/7nm means smaller transistors and more performance per watt.

    "Intel's inability to strike back on newer process nodes during the last several years."

    For years Intel made the big mistake of caring only about transistor count per dollar and profit. Because of this they built 3D transistors with many transistors per mm². Yes, this keeps profit high, but it fails on "performance per watt", because the transistors are still big and need a lot of power.
    AMD is multiple times better: the chiplet design keeps the costs low, and 5/7nm keeps the performance per watt up.

    Do not expect anything good from Intel in the next 2 years. Intel will have 7nm CPUs with AVX-512 in 2 years. For the next 2 years AMD will rule for sure with Zen 4 on 5nm...

    And keep in mind: even if Intel had 7nm right now, their monolithic die is more expensive than the chiplet design.

    Also keep in mind that Intel is going the downsizing way too, but they are going for an ARM-style big.LITTLE design.

    But Apple's M1 CPU shows that a big.LITTLE design will not save them, because you do not need more than 4 small cores. That means even a 16-core design with 8 fast and 8 small cores is a failure.



    • #22
      How to check those 7 items?
      Last edited by elatllat; 13 January 2021, 11:08 AM.



      • #23
        Ah
        CMPXCHG16B=CX16
        LAHF-SAHF=LAHF_LM

        HTML Code:
        grep -m 1 "model name" /proc/cpuinfo;grep -P "flags|Features" /proc/cpuinfo | cut -f 2 -d : | tr " " "\n" | sort -u | grep -iP "^(CX16|LAHF_LM|POPCNT|SSE4.1|SSE4.2|SSSE3)"
        model name : 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
        cx16
        lahf_lm
        popcnt
        sse4_1
        sse4_2
        ssse3
        I assume there is an error in the article, because SSE3 would not be listed when sse4_1 is.
        So 6/6.
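For anyone who wants to check all 7 items in one go, here is a minimal sketch of the same idea as a reusable shell function. It assumes the Linux flag names `cx16` (CMPXCHG16B), `lahf_lm` (LAHF-SAHF) and `pni` (the name under which Linux lists SSE3 in /proc/cpuinfo). The function name `check_v2` and its output wording are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# check_v2 is a hypothetical helper, not an existing tool.
# It reports which of the 7 baseline flags are missing from a
# cpuinfo-style file (defaults to /proc/cpuinfo).
check_v2() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Take the first "flags" line and split it into one flag per line.
    flags=$(grep -m 1 '^flags' "$cpuinfo" | cut -d : -f 2 | tr ' ' '\n')
    missing=""
    # Linux names: cx16 = CMPXCHG16B, lahf_lm = LAHF-SAHF, pni = SSE3.
    for f in cx16 lahf_lm popcnt pni sse4_1 sse4_2 ssse3; do
        # -x matches the whole line, so "sse3" cannot match "ssse3".
        if ! printf '%s\n' "$flags" | grep -qx "$f"; then
            missing="$missing $f"
        fi
    done
    if [ -z "$missing" ]; then
        echo "x86-64-v2: all 7 flags present"
    else
        echo "x86-64-v2: missing$missing"
    fi
}
```

Call it as `check_v2` on a Linux x86_64 box, or as `check_v2 some_saved_cpuinfo.txt` to check a copy taken from another machine.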
        Last edited by elatllat; 13 January 2021, 11:06 AM.

