An Introduction To Intel's Tremont Microarchitecture


  • #21
    Originally posted by milkylainen View Post

    Can you elaborate that question a bit please?
    6-wide x86 instruction decode:
    * Dual 3-wide clusters, out of order
    * Wide decode without the area of a uop cache
    * Optional single-cluster mode based on product targets

    The clusters and the removal of the micro-op cache seem ARM-like, real-time oriented. Then the optional single cluster... I'm just not sure what product target they have in mind. Robotics maybe? Weird.

    Comment


    • #22
      Originally posted by Space Heater View Post

      Yes, multi-tenancy is important, but if you have critical tasks that aren't embarrassingly parallel, we still have to deal with Amdahl's law, and therefore single-thread performance is still relevant.
      Heh, I guess you don't even know what that law means, do you? The law states the theoretical maximum speedup you can get when the number of cores is huge, like hundreds or thousands. In the real world, the core count is typically from 4 to 32. So even the law you refer to says you get serious speedups with more cores.

      Comment


      • #23
        Originally posted by Alex/AT View Post
        Design target: single-thread performance.
        Someone tell them it's 2019 already.
        I skimmed through it, but we're talking about a microarchitecture targeted at low-power, battery-powered devices, right?
        If that's so, single-thread performance matters because it lets you put cores into low-power states: if one or two cores can handle most of the work (say, messaging or some basic social networking), you save a ton of energy while still having plenty of power for more expensive tasks when needed.

        Comment


        • #24
          Originally posted by caligula View Post

          Heh, I guess you don't even know what that law means, do you? The law states the theoretical maximum speedup you can get when the number of cores is huge, like hundreds or thousands. In the real world, the core count is typically from 4 to 32. So even the law you refer to says you get serious speedups with more cores.
          Not sure what you're trying to say here; you seem to be implying that I think multi-core designs are not good or are misguided, which is just a complete fabrication on your part. I am saying single-threaded performance is still hugely important for general-purpose workloads and that we can't just dismiss it because "it's 2019". It's not a zero-sum game: we can't ignore single-threaded performance, and likewise we can't ignore thread-level parallelism.

          By the way, Amdahl's original paper was an argument for focusing on single-threaded performance in processor design: it holds that it is essentially impossible to get linear speedups as you increase the core count on almost all real-world workloads. Did you actually read the paper and believe I'm missing something, or did you just want to misconstrue what I'm saying?
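          For anyone who wants to plug in numbers, here is a minimal Python sketch of Amdahl's law (the function name and the 95%-parallel figure are illustrative choices on my part, not from the article or the paper):

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n),
    where p is the parallelizable fraction of the work
    and n is the number of cores."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelizable, the speedup
# saturates toward 1 / (1 - p) = 20x as cores grow:
for n in (4, 32, 1000):
    print(n, round(amdahl_speedup(0.95, n), 2))  # ~3.48, ~12.55, ~19.63
```

          Both posts show up in the numbers: 4 to 32 cores already buy you a 3.5x to 12.5x speedup, yet the 5% serial fraction caps the ceiling at 20x no matter how many cores you add.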

          Comment


          • #25
            Originally posted by milkylainen View Post
            Absolutely rubbish statement. Ask any ASIC CPU engineer or any CPU architect.
            Rubbish indeed. There are already ASICs and (in the worst case) ARM occupying the niches this highly-PR'd 'architecture' modification is positioned for.

            Comment


            • #26
              Originally posted by mrugiero View Post
              I skimmed through it, but we're talking about a microarchitecture targeted at low-power, battery-powered devices, right?
              No, we are not talking about ARM.

              Comment


              • #27
                Originally posted by Space Heater View Post

                Single-thread performance still matters in 2019, it's unfortunately not the case that every task can be parallelized to the nth degree.
                True, but multiple cores are very useful. For example, on my Nexus 6, for some idiotic reason, cores are shut down as the battery charge level decreases. When the battery reaches 75% or less, two cores are turned off and only two remain available. There's a huge drop in performance and responsiveness.

                It turns out that multiple cores able to run multiple processes/threads in parallel can significantly boost responsiveness - who knew?

                Comment


                • #28
                  Originally posted by F.Ultra View Post

                  There are lots of workloads that are single-core by nature, or where parallelization adds too much latency. Running "more than one task" is not detrimental to a CPU with high single-core performance.
                  Actually, it is. When you have multiple cores/processors, you're actually running things in parallel, not just providing the appearance of running things in parallel. That makes a big difference in responsiveness.
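                  The parallelism-vs-appearance distinction can be sketched in Python, where CPU-bound threads merely interleave under CPython's GIL while processes run truly in parallel on separate cores (the workload function and the worker counts here are arbitrary illustrations):

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn(n):
    """CPU-bound work: sum of squares below n."""
    return sum(i * i for i in range(n))

def timed(pool_cls, workers, jobs):
    """Run the jobs on the given executor type and time them."""
    start = time.perf_counter()
    with pool_cls(max_workers=workers) as pool:
        results = list(pool.map(burn, jobs))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    jobs = [2_000_000] * 4
    # Threads time-slice on the GIL: the *appearance* of parallelism.
    t_res, t_secs = timed(ThreadPoolExecutor, 4, jobs)
    # Processes run on separate cores: actual parallelism.
    p_res, p_secs = timed(ProcessPoolExecutor, 4, jobs)
    assert t_res == p_res  # same answers either way
    print(f"threads: {t_secs:.2f}s  processes: {p_secs:.2f}s")
```

                  On a multi-core machine the process pool typically finishes several times faster, while the thread pool takes roughly as long as running the jobs one after another.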

                  Comment


                  • #29
                    Originally posted by tildearrow View Post

                    Where is my 4.0GHz ARM processor?!

                    (also, they suck at tasks like compiling and video decoding...)
                    Haha, that's true. The only reason mobile devices work so well is heavy limitations and strict enforcement of those limitations: no swap, and lots of specialized chips for various tasks (video decoding and encoding, the GPU, a low-power sensor hub, special chips for recognizing hotwords like "OK Google", special camera image-processing chips, etc.).

                    This goes along with other measures such as killing background apps to reclaim memory, killing/suspending background apps to lower CPU usage, and enforcing strict entry points for app execution.

                    Comment


                    • #30
                      Originally posted by caligula View Post

                      Heh, I guess you don't even know what that Law means, do you? The law states the theoretical maximum you can get when the number of cores is huge, like hundreds or thousands. In real world, the core count is typically from 4 to 32. So even the law you refer to tells that you get serious speedups with more cores.
                      Hm, sounds like it can be applied to GPUs then. After all, they have thousands of "cores" that do computations in parallel. In fact, I believe that's why NVIDIA dGPUs perform better on average than AMD dGPUs: better power efficiency that allows them to run at significantly higher clock speeds. Thanks to AMD using 7 nm, they're now able to run at higher clock speeds too, enabling better performance and making them more competitive with NVIDIA.

                      Comment
