Announcement

**Jumbotron** · 21 December 2022, 06:40 AM

Wow! Nothing more to say.

**gosh000** · 21 December 2022, 09:01 AM

Originally posted by Jumbotron

Wow! Nothing more to say.

I have something to say: what are the numbers for C and Rust?

**coder** · 21 December 2022, 09:16 AM

Originally posted by gosh000

I have something to say: what are the numbers for C and Rust?

I don't think it should matter, unless there's something I'm missing about Rust's memory-management behavior.

Even then, you can't lump all C programs into a single category. It just depends on the program.

Here are some more MGLRU benchmarks, run on a dual-socket EPYC 75F3 server (32 cores per socket):

https://www.phoronix.com/news/MGLRU-...22-Performance

**gosh000** · 21 December 2022, 10:25 AM

Thank you, coder

I was curious what the numbers are for languages/runtimes without a garbage collector.

**igxqrrl** · 21 December 2022, 02:14 PM

Decades ago I took a parallel programming class. We were using an nCUBE system with I don't remember how many CPUs. One of the assignments was to write a parallel matrix multiply algorithm and look at how it scaled as we increased the number of CPUs used.

One of the guys in the class was a Russian wunderkind. His single-CPU performance blew everyone else's full parallel performance out of the water, because he had looked at the CPU architecture and tweaked his algorithm so that its code and data fit entirely in the cache.

We hardware engineers get excited about new CPUs that offer 15% higher IPC, but there is still so much performance to be gained from software alone. This is a great example.
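For anyone curious what "fitting the data in the cache" looks like for matrix multiply, here is a minimal C sketch of loop tiling (cache blocking). It is not the student's code, which the post doesn't show; the function names and the `tile` parameter are hypothetical, and the best tile size depends on the actual cache, so treat the value mentioned in the comment as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Naive i-j-k order: C = A * B for n x n row-major matrices.
 * The inner loop reads B with a stride of n doubles, so for large n
 * nearly every iteration touches a different cache line. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Tiled (cache-blocked) version. `tile` is a hypothetical tuning knob:
 * choose it so that three tile x tile blocks of doubles fit in the cache
 * you are targeting (tile = 32 gives 3 * 32 * 32 * 8 bytes = 24 KiB). */
void matmul_tiled(size_t n, const double *A, const double *B, double *C,
                  size_t tile)
{
    assert(tile > 0);
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += tile)
        for (size_t kk = 0; kk < n; kk += tile)
            for (size_t jj = 0; jj < n; jj += tile) {
                size_t i_end = min_size(ii + tile, n);
                size_t k_end = min_size(kk + tile, n);
                size_t j_end = min_size(jj + tile, n);
                /* Accumulate one block of A times one block of B into the
                 * matching block of C; the innermost loop walks rows of
                 * B and C with unit stride. */
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Both functions compute the same product; the tiled one just reorders the loops so each block is reused while it is still cached. How much that buys you depends on n, the tile size, and the compiler, so measure on the target machine.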

**gosh000** · 21 December 2022, 02:56 PM

Being resourceful helps. I recently read that Amazon Web Services contributed code to the FFmpeg project: https://aws.amazon.com/blogs/opensou...on-processors/

**coder** · 21 December 2022, 04:32 PM

Originally posted by igxqrrl

Decades ago I took a parallel programming class. We were using an nCUBE system with I don't remember how many CPUs. One of the assignments was to write a parallel matrix multiply algorithm and look at how it scaled as we increased the number of CPUs used.

One of the guys in the class was a Russian wunderkind. His single-CPU performance blew everyone else's full parallel performance out of the water, because he had looked at the CPU architecture and tweaked his algorithm so that its code and data fit entirely in the cache.

Perhaps one of the lessons you learned was that column-major access (walking down the columns of row-major data) is a great way to trigger cache thrashing? This is definitely true of image processing, as image widths have an annoying tendency to be a multiple of some significant power of 2, so successive rows of a column map to the same few cache sets.
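To see why a power-of-2 row pitch hurts, here is a small C sketch that counts how many cache sets a column walk lands in. The cache geometry (64-byte lines, 64 sets, 8 ways, roughly a 32 KiB L1D) and the set-index rule are assumptions chosen for illustration, and the function names are hypothetical; real CPUs differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache geometry, roughly a 32 KiB, 8-way L1 data cache
 * with 64-byte lines: 32768 / (64 * 8) = 64 sets. Real CPUs vary. */
#define LINE_BYTES 64u
#define NUM_SETS   64u

/* Set index for a byte offset from a line-aligned buffer start, assuming
 * the set is picked by the address bits just above the line offset
 * (common, not universal). */
static size_t cache_set(size_t byte_offset)
{
    return (byte_offset / LINE_BYTES) % NUM_SETS;
}

/* Walk down one column of a row-major, 1-byte-per-pixel image and
 * count how many distinct cache sets those accesses map to. */
size_t sets_touched_by_column(size_t pitch_bytes, size_t rows, size_t col)
{
    assert(col < pitch_bytes);
    bool used[NUM_SETS] = { false };
    size_t distinct = 0;
    for (size_t r = 0; r < rows; r++) {
        size_t set = cache_set(r * pitch_bytes + col);
        if (!used[set]) {
            used[set] = true;
            distinct++;
        }
    }
    return distinct;
}
```

Under these assumptions, a 4096-byte pitch sends every row of a column to the same set, so with 8 ways only 8 rows' lines can stay resident at once. Padding each row by one cache line (a 4160-byte pitch) spreads 64 rows across all 64 sets, which is why image libraries often pad their row stride.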

**miskol** · 21 December 2022, 05:35 PM

Originally posted by gosh000

Being resourceful helps. I recently read that Amazon Web Services contributed code to the FFmpeg project: https://aws.amazon.com/blogs/opensou...on-processors/

I also see many merge requests for arm64 in x264 and x265.

Linux MGLRU Results Are Looking Great On Ampere Altra
