Hammering The AMD Ryzen 7 1800X With An Intense, Threaded Workload
-
If memory bandwidth is the limiting factor, you could test the 5960X with only two channels populated instead of four.
-
Originally posted by defaultUser:
Probably a good idea would be to add to the article a run of the Triad benchmark, which measures memory bandwidth. This would show the possible memory-bandwidth advantage of the Core i7 over the Ryzen.
This would also bode very well for the upcoming Zen server chips, which will have *eight* memory channels per socket.
-
Probably a good idea would be to add to the article a run of the Triad benchmark, which measures memory bandwidth. This would show the possible memory-bandwidth advantage of the Core i7 over the Ryzen.
-
Originally posted by Del_:
For the solver parts, it is questionable how much AVX has to offer, since the load is limited by memory bandwidth. However, for the numerical differentiation, efforts are under way to exploit it:
This is a work-in-progress PR which demonstrates possible performance improvements for flow_ebos, originally brought up by @andlaus. When specializing the Evaluation class for a given number of...
Yes, both benches are memory-bandwidth limited when you scale up. The first combines numerical differentiation with an incomplete LU factorization preconditioner and a conjugate gradient linear solver. The second is an elliptic problem, and hence uses an algebraic multigrid based solver. Both solver approaches are known to be memory-bandwidth limited and cover a large and important part of HPC, namely numerical solutions to partial differential equations. Typically, SMT has nothing to offer on this type of problem.
-
Originally posted by chuckula:
Do you know whether or not this package uses AVX heavily?
For the solver parts, it is questionable how much AVX has to offer, since the load is limited by memory bandwidth. However, for the numerical differentiation, efforts are under way to exploit it:
This is a work-in-progress PR which demonstrates possible performance improvements for flow_ebos, originally brought up by @andlaus. When specializing the Evaluation class for a given number of...
Originally posted by bridgman:
Nice article. I had not been aware of OPM before this.
At first glance the results suggest memory is the limiting factor - was the 5960X running quad-channel? If so, have you been able to get decently high memory speeds on the Ryzen mobos yet, or are you still running dual-channel 2133 MHz?
-
Originally posted by Brane215:
1. Not true. I am frequently looking for bang/buck optimums.
2. Since Intel was until now the only player here, it was the only choice to optimize for. Now the picture is changing and I suspect that someone will look into what can be done with core allocations etc. Heck, even current compilers don't have Zen backends.
3. It's of no use to compare the price of a new part with that from eBay. Apples and oranges.
4. There is no data about memory frequency, which directly influences CCX communication bandwidth.
5. Code was compiled with -mtune=generic, which is far from what one would use for real work.
-
Originally posted by chuckula:
This goes to show the strength of Intel's overall core integration strategy at heavy-duty workloads that aren't just L1-cache-centric microbenchmarks.
If an AMD part from 2014 had the same margin of victory that we see here over a higher-clocked Intel part with an equivalent core count that was just released this month, then not one person here would be calling the Intel part good, even if it was somewhat cheap (although a $500 chip on a platform that has seen major motherboard support issues isn't exactly "cheap" in any book).
Incidentally, even if the 5960X out of a new box is still expensive, if you are smart you can find very good open box deals.
1. Not true. I am frequently looking for bang/buck optimums.
2. Since Intel was until now the only player here, it was the only choice to optimize for. Now the picture is changing and I suspect that someone will look into what can be done with core allocations etc. Heck, even current compilers don't have Zen backends.
3. It's of no use to compare the price of a new part with that from eBay. Apples and oranges.
4. There is no data about memory frequency, which directly influences CCX communication bandwidth.
5. Code was compiled with -mtune=generic, which is far from what one would use for real work.
-
Nice article. I had not been aware of OPM before this.
Originally posted by chuckula:
This goes to show the strength of Intel's overall core integration strategy at heavy-duty workloads that aren't just L1-cache-centric microbenchmarks.
Michael: At first glance the results suggest memory is the limiting factor - was the 5960X running quad-channel? If so, have you been able to get decently high memory speeds on the Ryzen mobos yet, or are you still running dual-channel 2133 MHz?
At one thread, the upscaling test showed similar single-core performance between the Ryzen 7 1800X and the Xeon E3 1270 v5, which are clocked the same, but with the Xeon E3 part retailing for almost $200 less for a quad-core + HT workstation CPU.
The single-thread run *is* useful as a measurement of single-core performance, but cost/performance comparisons really should be done either at 8 threads or at "best results for each chip", i.e., 8 threads for the Ryzen and 4 threads for the Xeon.
Last edited by bridgman; 16 March 2017, 03:13 PM.
-
Hey Michael, thanks for the in-depth analysis.
Two questions:
1. Do you know whether or not this package uses AVX heavily?
2. Would it be possible to show some power-consumption numbers during one of the longer runs that uses all of the cores? [Just the 5960X and 1800X would be fine.]
Last edited by chuckula; 16 March 2017, 02:18 PM.