AMD AOCC 4.0 Arrives For Squeezing More Performance Out Of Zen 4

gganesh replied

23 November 2022, 11:59 AM
Originally posted by ptr1337

The only warning/error the complete log does show:

https://paste.cachyos.org/p/884367a.log

I'm on CachyOS (Arch Linux); the kernel being compiled was 6.0.9.
Config: https://paste.cachyos.org/p/fd6becb.9-2-cachyos-lto
CPU: 5900X

Thank you! We didn't have much to offer with ThinLTO. We have root-caused the issue. We are working on resolving it and are also looking into reducing the compilation time!
Will keep you posted!
ptr1337 replied

20 November 2022, 04:50 PM
Originally posted by gganesh

Hi! I work at AMD on AOCC.
Can you please let us know what error you are facing and your compilation settings? We would like to track this issue down and resolve it.
Thank you!

The only warning/error the complete log does show:

https://paste.cachyos.org/p/884367a.log

I'm on CachyOS (Arch Linux); the kernel being compiled was 6.0.9.
Config: https://paste.cachyos.org/p/fd6becb.9-2-cachyos-lto
CPU: 5900X

Clang 14, 15 and 16 are working correctly on the same source.

As for the performance of the compiler itself:

AOCC
Up to the point where the compiler failed to compile the kernel:
```
Executed in   23.25 mins    fish           external
   usr time  480.67 mins  131.00 micros  480.67 mins
   sys time   23.72 mins  112.00 micros   23.72 mins
```

Clang 15, compiled with ThinLTO and PGO and optimized with BOLT. Source: https://mirror.cachyos.org/llvm-bolt-15.tar.zst
I stopped the compilation at the point where AOCC failed:
```
________________________________________________________
Executed in  594.25 secs    fish           external
   usr time  180.36 mins  200.00 micros  180.36 mins
   sys time   19.06 mins   90.00 micros   19.06 mins
```

How can a compiler be that much slower?
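
For scale, here is a quick sketch of the arithmetic behind the two `time` reports above. The numbers are copied from them; the variable names are only for illustration. It works out to roughly 2.3x the wall-clock time and roughly 2.7x the user CPU time for AOCC over the same stretch of the build.

```python
# Timings copied from the two `time` reports above.
aocc_wall_s = 23.25 * 60      # AOCC: 23.25 min wall clock, converted to seconds
clang_wall_s = 594.25         # Clang 15 (ThinLTO + PGO + BOLT): 594.25 s wall clock

aocc_usr_min, clang_usr_min = 480.67, 180.36   # user CPU time, minutes
aocc_sys_min, clang_sys_min = 23.72, 19.06     # system CPU time, minutes

# Ratios above 1.0 mean AOCC needed more time than Clang 15.
wall_ratio = aocc_wall_s / clang_wall_s
usr_ratio = aocc_usr_min / clang_usr_min
sys_ratio = aocc_sys_min / clang_sys_min

print(f"wall clock: {wall_ratio:.2f}x")
print(f"user CPU:   {usr_ratio:.2f}x")
print(f"system CPU: {sys_ratio:.2f}x")
```

The system time is close between the two runs, so the gap is almost entirely in user-space compiler work.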

Regards

Peter Jung

Last edited by ptr1337; 20 November 2022, 05:05 PM.
ptr1337 replied

20 November 2022, 04:19 PM
Originally posted by gganesh

Hi! I work at AMD on AOCC.
Can you please let us know what error you are facing and your compilation settings? We would like to track this issue down and resolve it.
Thank you!

Oh, sorry, I had not noticed the message. Sorry for the late response.
I did not save the log. I will run the compilation once again with AOCC, ThinLTO and 6.0.10.
It mainly fails at the linking step because of missing modules.
I will let you know as soon as I have the log.

If you are a developer on AOCC, did you test the performance of the compiler itself? Why is it so much slower than Clang 15 or AOCC 3.0 (which was also a bit slow)?
I think AMD could easily build AOCC with ThinLTO and PGO; that improves the compiler's performance a lot.
I ran some benchmarks on building Clang with ThinLTO + PGO and, on top of that, applying BOLT to the clang binary. You can find them here:
https://github.com/ptr1337/llvm-bolt...-15-pgothinlto

Please really consider doing this for a final release of the compiler.
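
As a rough illustration of what a ThinLTO + PGO compiler build involves, the sketch below only assembles the command lines for the usual three steps with upstream LLVM's CMake options: build an instrumented clang, merge the profiles it writes while compiling a training workload, then rebuild with ThinLTO and the merged profile. This is not the recipe from the linked repository or anything AMD uses; all paths and function names are hypothetical, and the BOLT step is left out.

```python
import shlex
from pathlib import Path


def instrumented_stage(src: Path, build: Path) -> list[str]:
    """Step 1: configure a clang that writes .profraw profiles when it runs."""
    return [
        "cmake", "-G", "Ninja", "-S", str(src / "llvm"), "-B", str(build),
        "-DCMAKE_BUILD_TYPE=Release",
        "-DLLVM_ENABLE_PROJECTS=clang;lld",
        "-DLLVM_BUILD_INSTRUMENTED=IR",   # IR-level PGO instrumentation
    ]


def merge_profiles(raw_profiles: list[Path], merged: Path) -> list[str]:
    """Step 2: merge the raw profiles collected from the training workload."""
    return ["llvm-profdata", "merge", f"-output={merged}",
            *[str(p) for p in raw_profiles]]


def optimized_stage(src: Path, build: Path, merged: Path) -> list[str]:
    """Step 3: configure the final clang with ThinLTO and the merged profile."""
    return [
        "cmake", "-G", "Ninja", "-S", str(src / "llvm"), "-B", str(build),
        "-DCMAKE_BUILD_TYPE=Release",
        "-DCMAKE_C_COMPILER=clang",
        "-DCMAKE_CXX_COMPILER=clang++",
        "-DLLVM_USE_LINKER=lld",          # ThinLTO needs an LTO-capable linker
        "-DLLVM_ENABLE_PROJECTS=clang;lld",
        "-DLLVM_ENABLE_LTO=Thin",
        f"-DLLVM_PROFDATA_FILE={merged}",
    ]


# Hypothetical locations, used only to print example command lines.
src = Path("llvm-project")
merged = Path("clang.profdata")
steps = [
    instrumented_stage(src, Path("build-instrumented")),
    merge_profiles([Path("profiles/run1.profraw")], merged),
    optimized_stage(src, Path("build-optimized"), merged),
]
for cmd in steps:
    print(shlex.join(cmd))
```

Nothing here runs a build; each printed line would still need a `ninja` invocation and, between steps 1 and 2, a representative training compile.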

Last edited by ptr1337; 20 November 2022, 04:48 PM.
gganesh replied

16 November 2022, 06:05 AM
Originally posted by ptr1337

Interesting. I was a bit excited about this release, but it's really sad that it is based on LLVM 14.
Anyway, I just tested a compilation of the Linux kernel with AMD AOCC Clang + full LTO and it was not successful.
With the default installed Clang, also based on the latest Clang 14 release, the kernel compiled fine.

Besides that:

The compiler is slow... I usually build a full-LTO kernel in 15-20 min on my 5900X.
By the time AOCC failed at linking vmlinuz, the compilation had already been running for more than half an hour.

Hi! I work at AMD on AOCC.
Can you please let us know what error you are facing and your compilation settings? We would like to track this issue down and resolve it.
Thank you!
PerformanceExpert replied

14 November 2022, 10:04 AM
Originally posted by david-nk

So they want their own shiny product just for the sake of it, that 100% falls under NIH syndrome for me. It's not really feasible to develop a competitive compiler from scratch, especially not for AMD, so it makes sense that it's based on LLVM. And of course it's not in any official distro repos, forcing users to jump through hoops if they want the full performance from their CPU. Most of AMD's decisions are really questionable lately.

No, you can't blame this on AMD at all - they were forced to do it by Intel. AMD sued Intel for anti-competitive practices and the way Intel used ICC to get ahead of AMD on SPEC. AMD won and got major payouts. The end result was AMD developing AOCC to counter ICC. Both compilers only exist to give good SPEC scores. Both do it by adding very questionable optimizations that would never be accepted by the open source community. Both compilers have nasty EULA clauses that only allow you to use them/report results on Intel or AMD hardware.
david-nk replied

14 November 2022, 08:52 AM
Originally posted by coder

According to PerformanceExpert, it has some proprietary optimizations that AMD doesn't want to share.

I think the article (or another commenter?) said it's based on LLVM, so I wouldn't say it's NIH-syndrome.

So they want their own shiny product just for the sake of it, that 100% falls under NIH syndrome for me. It's not really feasible to develop a competitive compiler from scratch, especially not for AMD, so it makes sense that it's based on LLVM. And of course it's not in any official distro repos, forcing users to jump through hoops if they want the full performance from their CPU. Most of AMD's decisions are really questionable lately.
coder replied

14 November 2022, 02:58 AM
Originally posted by david-nk

I'm not really in the loop on this. Why does AOCC exist in the first place? Are the AMD guys having trouble getting their patches accepted upstream, or is this a manifestation of NIH syndrome?

According to PerformanceExpert, it has some proprietary optimizations that AMD doesn't want to share.

I think the article (or another commenter?) said it's based on LLVM, so I wouldn't say it's NIH-syndrome.
david-nk replied

14 November 2022, 02:23 AM
I'm not really in the loop on this. Why does AOCC exist in the first place? Are the AMD guys having trouble getting their patches accepted upstream, or is this a manifestation of NIH syndrome?
coder replied

13 November 2022, 07:09 PM
Originally posted by PerformanceExpert

Instruction scheduling on OoO cores has been pretty much abandoned - there isn't much a compiler can do there besides perhaps reducing register pressure and spills. So that's why this gets low priority.

I've found the -mtune option in GCC 10 to be worth a few %, when comparing against baseline x86-64 -- a significant amount, in today's competitive market. That said, I didn't try comparing between a recent vs. current micro-architecture.

I think cost models are useful for more than just scheduling. I believe they can influence which instructions the compiler generates. Given differences in the number of ports and their restrictions & limitations, it stands to reason that could still be a relevant factor.

Also, I believe name99 indicated that recent Apple cores can perform instruction fusion. However, because this happens early, the instructions to be fused must be consecutive. Therefore, LLVM contains a list of such instructions so the compiler can emit them in pairs, which Apple does take care to update.

I know Intel does micro-op fusion, though I'm not sure about AMD. Perhaps that's subject to similar restrictions?
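
To make the "must be consecutive" point concrete, here is a toy sketch. The pair table and the instruction sequences are made up for illustration and are not taken from LLVM or from any vendor's documentation; the only idea it shows is that the same instructions fuse or not depending on whether the compiler emits them back to back.

```python
# Hypothetical table of (first, second) mnemonic pairs a core could fuse.
# Illustrative only; real tables live in the compiler's per-CPU descriptions.
FUSIBLE_PAIRS = {("cmp", "b.ne"), ("cmp", "b.eq"), ("adrp", "add")}


def count_fused(instrs: list[str]) -> int:
    """Count adjacent fusible pairs, using each instruction at most once."""
    fused = 0
    i = 0
    while i + 1 < len(instrs):
        if (instrs[i], instrs[i + 1]) in FUSIBLE_PAIRS:
            fused += 1
            i += 2      # both halves of the pair are consumed
        else:
            i += 1
    return fused


# The load does not touch the flags, so a scheduler is free to place it
# either between the compare and the branch or ahead of both.
separated = ["cmp", "ldr", "b.ne"]   # compare and branch split apart
adjacent = ["ldr", "cmp", "b.ne"]    # compare and branch back to back

print(count_fused(separated), count_fused(adjacent))
```

A fusion-aware scheduler would prefer the second ordering, which is why the pair list has to be kept current for each new core.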

Last edited by coder; 13 November 2022, 07:13 PM.
PerformanceExpert replied

13 November 2022, 10:57 AM
Originally posted by coder

This is somewhat self-defeating logic, because I think fewer people are basing their purchasing decisions on SPEC or using AOCC/ICC, as time goes on. Furthermore, regardless of what you say, it's really in their interest to get basic stuff, like instruction cost models, updated for new/upcoming CPUs. Once there are engineering samples in the wild, that stuff is no longer secret.

I wonder if AMD might not have prioritized it, due to the lack of serious competition for Genoa. That could change, when their 3D cache model faces off against Sapphire Rapids Xeon Max (HBM).

You're right that SPEC as a benchmark is becoming less important, but closed-source compilers are still popular for selling servers. They allow for workload-specific optimizations that cannot be used by your competitors (Intel's ICC is infamous for this). I much prefer comparisons using GCC/LLVM (including for SPEC), since then you compare real CPU performance rather than who wrote the best compiler tricks!

Generally the most important aspect of supporting new CPUs is ensuring you can generate code for the correct ISA. For Zen 4 that would be the use of AVX-512, as that gives the largest gains by far. This is not as easy as it seems given the rather complex ISA and its many extensions - there was a lot of work in glibc recently to ensure that AVX-512 string functions correctly check for the ISA extensions they require (this may be an issue in other AVX-512 code out there, since it has been written for and tested only on Intel CPUs).
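
A minimal sketch of the kind of check meant here: AVX-512 is a family of extensions, so a routine has to test for every extension it actually uses rather than for "AVX-512" as a whole. The variant names and the selection function are hypothetical (glibc does this in C with its own dispatch machinery); the flag names follow the spelling Linux uses in `/proc/cpuinfo`.

```python
def read_cpu_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Return the CPU feature flags Linux reports, or an empty set if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


# Hypothetical variants, best first; each lists every extension its code uses.
VARIANTS = [
    ("avx512_bw_vl", {"avx512f", "avx512bw", "avx512vl"}),
    ("avx2", {"avx2"}),
]


def pick_variant(cpu_flags: set[str]) -> str:
    """Pick the first variant whose full requirement set is present."""
    for name, needed in VARIANTS:
        if needed <= cpu_flags:   # subset test: all required flags available
            return name
    return "baseline"


# A CPU that has the AVX-512 foundation but lacks BW/VL must not get the
# AVX-512 variant, even though "avx512f" is set.
print(pick_variant({"avx2", "avx512f"}))
print(pick_variant(read_cpu_flags()))
```

Checking only `avx512f` would pick the wrong variant in the first call, which is the class of bug described above.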

Fine tuning cost models and schedulers gives far less gain. Micro-architectures have converged and are fairly similar nowadays, so the tuning from previous generation(s) works just fine. Instruction scheduling on OoO cores has been pretty much abandoned - there isn't much a compiler can do there besides perhaps reducing register pressure and spills. So that's why this gets low priority.
