PCI Express 4.0 Is Ready, PCI Express 5.0 In 2019


  • #51
    Originally posted by Michael View Post

    I'll need to see if I have any motherboards that allow limiting PCI-E 3.0 slots to PCI-E 2.0... Back in the day I know some DRM drivers had options for whether to run at PCI-E 1.0 or 2.0, but I don't recall anything for 2.0 vs. 3.0... Or am I missing something?
    I remember wanting to find good information on this too; for some reason it didn't occur to me that Phoronix most definitely has the hardware resources and experience to do a good job at this :P I've heard GPUs can go into x1 and x4 slots too (either requiring a bit of modification to the slot or a "riser"). I'd be curious how well that works.

    While at it, if the results include cases where the bandwidth required is higher than what Thunderbolt supports (which eGPUs rely on), it'd be nice to have confirmation of whether high-end cards get capped performance on certain workloads (like compute with heavy bandwidth usage). I know there was a YouTube video covering the Razer eGPU and PCI-E 3.0 speeds of x16 and x8, I think (maybe x4 too), but I don't think it was informative enough.

    Comment


  • #52
    Originally posted by quaz0r View Post

    I'm sure you were truly dying to know that.

    "There's probably a better way."
    No? Your idea of an improvement was to move the GPU from a flexible PCIe slot to a socket that only supports GPUs? That's why I asked about multi-GPU, which PCIe handles without a problem; these days you can usually have two cards plus an integrated GPU. I imagine paying for an additional socket per GPU would drive the price up even more?

    I don't see what is wrong with GPUs using PCIe slots; many other cards benefit from them as well, USB and Thunderbolt controllers being good examples. High-speed networking cards too.

    Comment


  • #53
    Originally posted by polarathene View Post

    I remember wanting to find good information on this too; for some reason it didn't occur to me that Phoronix most definitely has the hardware resources and experience to do a good job at this :P I've heard GPUs can go into x1 and x4 slots too (either requiring a bit of modification to the slot or a "riser"). I'd be curious how well that works.

    While at it, if the results include cases where the bandwidth required is higher than what Thunderbolt supports (which eGPUs rely on), it'd be nice to have confirmation of whether high-end cards get capped performance on certain workloads (like compute with heavy bandwidth usage). I know there was a YouTube video covering the Razer eGPU and PCI-E 3.0 speeds of x16 and x8, I think (maybe x4 too), but I don't think it was informative enough.
    I'm actually running an R290 in a PCI-E 2.0 slot; the various tests I've seen show no difference at all, so I haven't prioritised buying a new motherboard (with all that it entails).
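    For anyone who wants to check what link their own card has actually negotiated, the kernel exposes it through sysfs. Below is a minimal sketch in Python, assuming a Linux box; the PCI address is a hypothetical placeholder you'd replace with whatever lspci reports for your GPU.

```python
# Read the negotiated PCIe link speed/width of a device from sysfs.
# PCI_ADDR is a placeholder; substitute the address lspci reports for your GPU.
from pathlib import Path

PCI_ADDR = "0000:01:00.0"  # hypothetical slot address
dev = Path("/sys/bus/pci/devices") / PCI_ADDR

def read_attr(name):
    """Return a sysfs attribute as a stripped string, or None if it's absent."""
    path = dev / name
    return path.read_text().strip() if path.exists() else None

if __name__ == "__main__":
    print("current link speed:", read_attr("current_link_speed"))  # e.g. "5 GT/s" on a 2.0 link
    print("current link width:", read_attr("current_link_width"))  # e.g. "16"
    print("max link speed:    ", read_attr("max_link_speed"))
    print("max link width:    ", read_attr("max_link_width"))
```

    If you'd rather not script it, lspci -vv reports the same information in the LnkCap/LnkSta lines.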

    Comment


  • #54
    Originally posted by Nelson View Post
    Overkill? How do you feed multiple Xeon Phis or GPUs for compute?
    A single PCI-E x1 slot is fine for Xeon Phi or GPU compute. Google "bitcoin rig" to see real-world examples. A GPU in an x1 PCI-E slot performs no differently than in an x16 PCI-E slot for compute applications. I've benchmarked it myself. That's the entire point - the bandwidth-intensive stuff happens *inside* the card, with its own dedicated processor and memory, so the only thing traversing the PCI-E bus is the initial data set getting loaded in, and the results coming out. These faster 4.0 and 5.0 PCI-E revisions do nothing to benefit GPU compute applications.
    Last edited by torsionbar28; 11 June 2017, 02:20 PM.
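    As a rough illustration of the claim above (not a benchmark), here's a back-of-the-envelope sketch in Python. The per-lane figure is the usual approximation for PCIe 3.0 (~0.985 GB/s usable per direction); the workload numbers are made up to represent a mining-style job that uploads its data once and then crunches on-card.

```python
# Back-of-the-envelope: how much does lane count matter when a compute job
# moves data once and then runs on-card for a long time?
GBPS_PER_LANE = 0.985      # approx. usable GB/s per PCIe 3.0 lane, per direction
DATASET_GB = 1.0           # hypothetical input, uploaded once
RESULTS_GB = 0.01          # hypothetical results, read back once
COMPUTE_SECONDS = 300.0    # hypothetical on-card crunch time

for lanes in (1, 4, 16):
    transfer_s = (DATASET_GB + RESULTS_GB) / (GBPS_PER_LANE * lanes)
    total_s = transfer_s + COMPUTE_SECONDS
    share = 100.0 * transfer_s / total_s
    print(f"x{lanes:<2}: {transfer_s:5.2f} s on the bus out of {total_s:6.1f} s total ({share:.2f}%)")
```

    With those assumptions the bus time is noise at any width, which is consistent with the bitcoin-rig case; workloads that stream data continuously are a different story (see #58).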

    Comment


  • #55
    Originally posted by quaz0r View Post
    OMG seriously it is time to honestly re-evaluate where we are at and where we are going with regard to computing, and implement a better overall design. This PCI expansion slot business is a throwback to the early days of computing where anything and everything might and did exist as an expansion card. This PCI 5000.0 with x3000 lanes and whatever is for one thing and one thing only: a GPU. Whether you are into compute or gaming or whatever, that is what it is for. Just make motherboards with a GPU socket already, and dispense with this expansion slot nonsense. Or whatever the answer is, surely there must be an answer more appropriate than a x3000 PCI slot that you plug this huge board + processor into and oftentimes have to plug in a bunch of extra power connectors and everything too. You are practically plugging a second motherboard+proc+ram+power into a friggin PCI slot. It is ridiculous.
    Sounds like someone doesn't know his history and isn't familiar with modern enterprise computing. It's not uncommon to have multiple 10Gb Ethernet cards and multiple 32Gb Fibre Channel cards installed in a server. Add a few GPUs for compute and you can quickly consume a whole lot of PCI-E expansion slots. Power is more efficiently delivered to these cards via larger-gauge wiring, as opposed to motherboard traces to the slot. To keep costs low, technology like the PCI-E expansion bus is shared between servers and PCs. Are you proposing that we go back to the days of extremely expensive proprietary hardware, rather than low-cost commodity PCs? Because that's what it sounds like. Visit apple.com and consider buying a MacBook or an iPad - you won't have to worry yourself with all this complicated technical stuff.

    Comment


  • #56
    How different is graphics memory, really? The bulk of it can be single-ported DRAM. Only the framebuffer makes use of dual-porting, but that's usually a small fraction.

    It makes me wonder whether unifying system and video RAM makes sense, where computing resources are seen from an asymmetric multi-processing and NUMA point of view. This may require a higher degree of standardization.

    Comment


  • #57
    Originally posted by bug77 View Post

    Yeah, we're not exactly bandwidth-starved (outside of special use cases), but it doesn't hurt to have it readily available in the future. What we are starting to be short of is PCIe lanes. Pretty soon all internal storage will be connected to PCIe, external stuff will go Thunderbolt. If not more lanes, we'd need some kind of multiplexer that can shove several slower peripherals onto a single PCIe lane or something like that.
    Why do we need more PCIe lanes then? Why aren't 22 lanes enough? Do we run more than 22 devices? I think not.

    In the future, devices that are currently x16 or x8 can run on x4 and x2, and perhaps even x1, since per-lane bandwidth roughly doubles with each generation (see the rough numbers sketched below).

    Ideally, we could have one device per lane, with only special exceptions. It'd make the wiring far simpler.
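    To put rough numbers on that, here is a sketch using the commonly cited per-lane figures for each PCIe generation (approximate usable bandwidth per direction, after encoding overhead):

```python
# Approximate usable GB/s per lane, per direction:
# 1.x: 2.5 GT/s, 8b/10b; 2.0: 5 GT/s, 8b/10b; 3.0: 8 GT/s, 128b/130b;
# 4.0: 16 GT/s, 128b/130b; 5.0: 32 GT/s, 128b/130b.
PER_LANE_GBPS = {"1.x": 0.25, "2.0": 0.5, "3.0": 0.985, "4.0": 1.969, "5.0": 3.938}

target = PER_LANE_GBPS["3.0"] * 16  # roughly what a PCIe 3.0 x16 GPU slot delivers today

for gen, per_lane in PER_LANE_GBPS.items():
    print(f"PCIe {gen}: x16 = {per_lane * 16:6.1f} GB/s; "
          f"~{target / per_lane:4.1f} lanes match a 3.0 x16 link")
```

    In other words, a 4.0 x8 or 5.0 x4 link carries about the same traffic as today's 3.0 x16 slot, which is why devices could drop to narrower links as the generations advance.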

    Comment


  • #58
    Originally posted by torsionbar28 View Post
    A single PCI-E x1 slot is fine for Xeon Phi or GPU compute. Google "bitcoin rig" to see real-world examples. A GPU in an x1 PCI-E slot performs no differently than in an x16 PCI-E slot for compute applications. I've benchmarked it myself. That's the entire point - the bandwidth-intensive stuff happens *inside* the card, with its own dedicated processor and memory, so the only thing traversing the PCI-E bus is the initial data set getting loaded in, and the results coming out. These faster 4.0 and 5.0 PCI-E revisions do nothing to benefit GPU compute applications.
    You seem to be assuming that bitcoin mining and/or whatever you have benchmarked is representative of all GPGPU applications. If that were true, why, for instance, did NVIDIA go through all the trouble and expense of creating NVLink for their high-end Teslas instead of just using an x1 PCI-E connector?
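    The flip side of the sketch under #54: if a job has to stream its working set over the bus rather than load it once, the link itself becomes the bottleneck. A sketch with the same ~0.985 GB/s per PCIe 3.0 lane assumption and a made-up workload that wants 10 GB of fresh data for every second of compute:

```python
# Counter-example to "x1 is always enough": a workload that streams data
# continuously instead of loading it once. Workload numbers are hypothetical.
GBPS_PER_LANE = 0.985      # approx. usable GB/s per PCIe 3.0 lane, per direction
STREAM_GB_PER_SEC = 10.0   # hypothetical: data the GPU must ingest per second of compute

for lanes in (1, 4, 16):
    link_gbps = GBPS_PER_LANE * lanes
    # If the link can't keep up, the GPU sits idle waiting for data.
    busy = min(1.0, link_gbps / STREAM_GB_PER_SEC)
    print(f"x{lanes:<2}: link {link_gbps:5.2f} GB/s -> GPU busy ~{busy:5.1%} of the time")
```

    Which is presumably the class of workload that interconnects faster than a PCIe 3.0 x16 link, NVLink included, are aimed at.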

    Comment


  • #59
    Originally posted by edgmnt View Post
    How different is graphics memory, really? The bulk of it can be single-ported DRAM. Only the framebuffer makes use of dual-porting, but that's usually a small fraction.
    I think the main difference between the GDDRx used in consumer graphics cards and the DDRx used for CPUs is that GDDR is designed to be soldered onto the board and can thus provide more bandwidth than the slot-based design of plain DDR. The flip side is that you can expand DDR memory.

    That being said, the memory controllers are vastly different. CPU memory controllers are optimized to minimize latency for a few threads, and make extensive use of caching, prefetching and so on. GPUs throw all that out of the window and instead rely on a massive number of hardware threads to drive the memory subsystem in parallel. Thus a GPU-optimized design doesn't care much about latency; aggregate bandwidth is all that matters (see the rough numbers sketched below).

    Higher-end GPU-type devices (NVIDIA Tesla, Xeon Phi) use HBM2/HMC nowadays rather than GDDR/DDR, which is, again, different.

    It makes me wonder whether unifying system and video RAM makes sense, where computing resources are seen from an asymmetric multi-processing and NUMA point of view. This may require a higher degree of standardization.
    As alluded to above, unified memory means compromises for either the CPU side, the GPU side, or both. That being said, from a programming perspective unified, or at least cache-coherent, memory is easier, and I think this is something vendors are looking into.
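    To make the bandwidth gap concrete, a quick sketch using the standard peak-bandwidth formula (data rate per pin × bus width / 8); the example parts are illustrative, not a survey of real products:

```python
# Theoretical peak bandwidth = data rate per pin (Gbit/s) * bus width (bits) / 8.
examples = [
    # (description,                         Gbit/s per pin, bus width in bits)
    ("DDR4-2400, dual channel (128-bit)",   2.4,            128),
    ("GDDR5 at 7 Gbps on a 256-bit bus",    7.0,            256),
    ("HBM2 at 2 Gbps, one 1024-bit stack",  2.0,            1024),
]

for name, rate, width in examples:
    print(f"{name:38s} ~{rate * width / 8:6.1f} GB/s peak")
```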

    Comment


  • #60
    Originally posted by zamadatix View Post

    ExpressCards aren't common anymore because Thunderbolt can provide x4 PCIe, USB 3.1, and DisplayPort over the same connector - there is no need to update the ExpressCard standard, it's been superseded.
    No. It's far better to have an internal but easily swappable card than a stupid external device on a cord.

    Comment
