An Introduction To Intel's Tremont Microarchitecture


  • starshipeleven
    replied
    Originally posted by Alex/AT View Post
    Design target: single-thread performance.
    Someone tell them it's 2019 already.
    FYI: single-threaded performance is still very much a thing in 2019, especially for a weak core.



  • starshipeleven
    replied
    Originally posted by sandy8925 View Post
    True, but multiple cores are very useful. For example, on my Nexus 6, for some idiotic reason, cores are shut down as battery charge level decreases. So when the battery reaches 75% and less, 2 cores are turned off and only 2 are available. There's a huge drop in performance and responsiveness.

    It turns out, that multiple cores able to run multiple processes/threads in parallel can significantly boost responsiveness - who knew?
    FYI: Android is not a real-time OS by any stretch of the imagination, and it usually doesn't even use the soft-realtime features of the Linux kernel (so it won't just interrupt its processing when a high-priority input arrives). It's running 90% bloat, the CPU schedulers in the default firmware read like they were written by hitting the keyboard with a fist without looking at the screen, and so on and so forth.

    Really you can't use that as a reason to "add moar cores".
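    The scheduling-policy point is easy to check on any Linux system: a normally launched process runs under the default SCHED_OTHER policy, and the real-time SCHED_FIFO/SCHED_RR policies have to be requested explicitly. A minimal sketch (Linux-only):

    ```python
    import os

    # On Linux, an ordinary process runs under the completely fair
    # scheduler (SCHED_OTHER); the real-time policies (SCHED_FIFO /
    # SCHED_RR) must be requested explicitly and usually need
    # elevated privileges, which Android apps don't get.
    policy = os.sched_getscheduler(0)  # 0 = the calling process
    print(policy == os.SCHED_OTHER)    # True unless someone elevated us
    ```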
    Last edited by starshipeleven; 25 October 2019, 09:15 AM.



  • sykobee
    replied
    Originally posted by c117152 View Post
    What's up with that decoder's width?
    In typical usage, it's half that width.

    The decoder is built as two 3-wide clusters. At a predicted-taken branch, the fetch stream is split and the next chunk of instructions is handed to the other cluster, so both clusters can decode different basic blocks in parallel, like a relay race. Straight-line code keeps only one cluster busy at a time, which is why the effective width is half the peak.
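    The round-robin handoff can be sketched as a toy model (this is only an illustration of the idea, not Intel's actual hardware; the block names are made up):

    ```python
    def assign_blocks(blocks, clusters=2):
        """Hand successive basic blocks (runs of instructions ending at a
        predicted-taken branch) to decode clusters round-robin, so branchy
        code keeps both clusters busy while straight-line code uses one."""
        lanes = [[] for _ in range(clusters)]
        for i, block in enumerate(blocks):
            lanes[i % clusters].append(block)
        return lanes

    # Four basic blocks alternate between the two 3-wide clusters:
    print(assign_blocks(["blkA", "blkB", "blkC", "blkD"]))
    # [['blkA', 'blkC'], ['blkB', 'blkD']]
    ```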



  • DavidC1
    replied
    Originally posted by duby229 View Post

    Then I guess it might be surprising to you that AMDs GPUs have many times more execution units than nVidias...
    What? No they don't. The RTX 2080 Ti has 4352 CUDA cores; the Radeon VII has 3840 stream processors. GFLOPS-wise they are about the same, since each stream processor or CUDA core can do 2 FLOPs per cycle (one FMA).
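    The numbers check out as a back-of-the-envelope calculation (boost clocks are approximate published specs, assumed here as 1545 MHz and 1750 MHz):

    ```python
    def peak_gflops(units, flops_per_cycle, clock_ghz):
        """Peak single-precision throughput: units x FLOPs/cycle x clock."""
        return units * flops_per_cycle * clock_ghz

    # RTX 2080 Ti: 4352 CUDA cores at ~1.545 GHz boost
    print(round(peak_gflops(4352, 2, 1.545)))  # 13448 GFLOPS (~13.4 TFLOPS)

    # Radeon VII: 3840 stream processors at ~1.75 GHz boost
    print(round(peak_gflops(3840, 2, 1.750)))  # 13440 GFLOPS (~13.4 TFLOPS)
    ```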



  • uid313
    replied
    Originally posted by tildearrow View Post

    Where is my 4.0GHz ARM processor?!

    (also, they suck at tasks like compiling and video decoding...)
    Most ARM processors don't run at such high frequencies because they're aimed at mobile devices, but I guess a 4 GHz ARM processor would be possible if it were designed for workstations and servers.

    I don't know about compiling, but aren't ARM chips really good at video decoding, considering all the phones and tablets that decode video with very little power usage?



  • archsway
    replied
    Originally posted by sandy8925 View Post

    True, but multiple cores are very useful. For example, on my Nexus 6, for some idiotic reason, cores are shut down as battery charge level decreases. So when the battery reaches 75% and less, 2 cores are turned off and only 2 are available. There's a huge drop in performance and responsiveness.

    It turns out, that multiple cores able to run multiple processes/threads in parallel can significantly boost responsiveness - who knew?
    It turns out, that Android is extremely bloated with far too many background processes doing nothing useful and needs many cores to be responsive - who knew?

    Originally posted by sandy8925
    Actually, it is. When you have multiple cores/processors, you're actually running things in parallel. Not just providing the appearance of running things in parallel. It does make a big difference as far as responsiveness.
    That's called the RT patchset.

    Originally posted by sandy8925
    Haha, that's true. The only reason mobile devices work so well is due to heavy limitations and strict enforcement of those limitations - no swap,
    Have you heard of zram?
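    For context, zram creates a compressed block device in RAM that can be used as swap, which is how many Android devices get "swap" without flash wear. A minimal sketch of setting it up on a stock Linux kernel (run as root; the size and compressor below are illustrative, not recommendations):

    ```shell
    # Load the zram module and create one device (assumes zram is built for this kernel).
    modprobe zram num_devices=1

    # Pick a fast compressor and a compressed swap size (illustrative values).
    echo lz4 > /sys/block/zram0/comp_algorithm
    echo 1G > /sys/block/zram0/disksize

    # Format it as swap and enable it with higher priority than any disk swap.
    mkswap /dev/zram0
    swapon -p 100 /dev/zram0
    ```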

    use a lot of specialized chips for various tasks (video decoding and encoding, GPU, low power sensor hub, special chips for recognizing hotwords like Ok Google,
    "That chip over there is the OTG USB controller (with buggy drivers that cause kernel panics), and right next to it is what we call the 'Ok' chip. We plan to put an 'Alexa' chip on next year's version."

    and special camera image processing chips etc.),

    This along with other actions such as killing background apps to reclaim memory,
    Because even the app launcher uses 300MB of RAM.

    killing/suspending background apps to lower CPU usage
    But obviously only the ones not sending tracking data to Google.

    and enforcing strict entry points for app execution.



  • mrugiero
    replied
    Originally posted by Alex/AT View Post
    No, we are not talking about ARM
    Oh. I was misled by the people comparing it with ARM, actually.



  • ldesnogu
    replied
    Originally posted by sandy8925 View Post
    Haha, that's true. The only reason mobile devices work so well is due to heavy limitations and strict enforcement of those limitations - no swap, use a lot of specialized chips for various tasks (video decoding and encoding, GPU, low power sensor hub, special chips for recognizing hotwords like Ok Google , and special camera image processing chips etc.),

    This along with other actions such as killing background apps to reclaim memory, killing/suspending background apps to lower CPU usage and enforcing strict entry points for app execution.
    You're mixing things that are unrelated. The OS does things to save power, and so does the CPU, but one doesn't imply the other.

    Also, FYI: Apple's ARM CPUs, yes, the ones in phones, beat many current Intel/AMD CPUs at the CPU-only SPEC 2006 benchmark. Welcome to the 21st century.



  • duby229
    replied
    Originally posted by sandy8925 View Post

    No it doesn't surprise me. I said that NVIDIA's performance lead was due to higher clock speeds and not due to higher number of execution units.

    That's not the full story. NVIDIA's execution units are hella more complex than AMD's and have a much higher IPC. And even that's not the full story, because NVIDIA instructions aren't equivalent to AMD instructions, so IPC isn't a meaningful metric for comparing them.
    Last edited by duby229; 25 October 2019, 01:29 AM.



  • Guest
    Guest replied
    Originally posted by duby229 View Post

    Then I guess it might be surprising to you that AMDs GPUs have many times more execution units than nVidias...
    No it doesn't surprise me. I said that NVIDIA's performance lead was due to higher clock speeds and not due to higher number of execution units.

