Originally posted by defaultUser
The big difference in the system is that you get Broadwell CPUs and PCIe P100s. You can actually roughly match the CPU side of the DGX-1 config in a ~$80k setup (or get 30-40% better CPU/cache at most); on the other hand, the PCIe version of the P100 is clocked ~10% lower and has no NVLink. There is probably little to no space for Ethernet and IB cards, but you can get ConnectX-4 EDR for <=$1k per adapter, so it's not a huge bump in price. Of course, if you need IB, you'll have to pull out some of the GPUs.
Whether PCIe vs. NVLink matters depends on the problem. If you ask the marketing folks, you'll hear one thing, but if you ask the engineers, you might very well hear another. Many problems have enough independent parallelism that, as long as you program things smartly, you can overlap communication and computation and get decent performance even though you have to go over PCIe. What matters a lot, though, is what kind of PCIe architecture/topology they implemented; depending on what that looks like, it can be shit or great (see this blog by R Walker if you're interested in the details: https://exxactcorp.com/blog/explorin...communication/).
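To make the overlap point concrete, here's a minimal sketch of the standard CUDA-streams pattern: chunk the work, give each chunk its own stream, and let one chunk's PCIe transfers run while another chunk computes. The saxpy kernel and the sizes are my own illustrative choices, not anything from the systems discussed above.

[CODE]
// Minimal sketch: overlapping host<->device PCIe transfers with compute
// using CUDA streams. Kernel and sizes are illustrative assumptions.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void) {
    const int n = 1 << 20;          // elements per chunk
    const int chunks = 4;           // process the input in 4 chunks
    size_t bytes = n * sizeof(float);

    // Pinned host memory is required for truly asynchronous copies.
    float *h_x, *h_y;
    cudaMallocHost(&h_x, chunks * bytes);
    cudaMallocHost(&h_y, chunks * bytes);
    for (int i = 0; i < chunks * n; ++i) { h_x[i] = 1.0f; h_y[i] = 2.0f; }

    float *d_x, *d_y;
    cudaMalloc(&d_x, chunks * bytes);
    cudaMalloc(&d_y, chunks * bytes);

    cudaStream_t stream[chunks];
    for (int c = 0; c < chunks; ++c) cudaStreamCreate(&stream[c]);

    // Each chunk's H2D copy, kernel, and D2H copy go into its own stream,
    // so one chunk's transfers overlap with another chunk's compute.
    for (int c = 0; c < chunks; ++c) {
        size_t off = (size_t)c * n;
        cudaMemcpyAsync(d_x + off, h_x + off, bytes, cudaMemcpyHostToDevice, stream[c]);
        cudaMemcpyAsync(d_y + off, h_y + off, bytes, cudaMemcpyHostToDevice, stream[c]);
        saxpy<<<(n + 255) / 256, 256, 0, stream[c]>>>(n, 2.0f, d_x + off, d_y + off);
        cudaMemcpyAsync(h_y + off, d_y + off, bytes, cudaMemcpyDeviceToHost, stream[c]);
    }
    cudaDeviceSynchronize();
    printf("y[0] = %f\n", h_y[0]);  // expect 4.0

    for (int c = 0; c < chunks; ++c) cudaStreamDestroy(stream[c]);
    cudaFree(d_x); cudaFree(d_y);
    cudaFreeHost(h_x); cudaFreeHost(h_y);
    return 0;
}
[/CODE]

If you want to see what PCIe topology a given box actually implements, nvidia-smi topo -m prints the GPU/NIC connectivity matrix, which tells you whether GPU pairs sit on the same PLX switch, the same socket, or have to cross QPI.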
Secondly, much if not all of the software that NVIDIA develops works equally well on System76, Dell, Supermicro, etc. machines, even on your own hack-box -- as long as you buy Tesla. There are optimizations they can do with NVLink present, but whether you need it and how much difference it makes is i) very problem-dependent and ii) in many cases a non-trivial high-performance engineering (often research) question. AFAIK, model-parallel learning actually scales on PCIe (PLX trees) about as well as on NVLink. Check this talk if you want to learn more, Scott Le Grand is a super-smart guy (second half, from about 18 minutes):
video: http://on-demand.gputechconf.com/gtc...deo/S6492.html
PDF: http://on-demand.gputechconf.com/gtc...r-dynamics.pdf
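On the model-parallel-over-PCIe point: what makes direct GPU<->GPU traffic across a PLX switch (or NVLink) work is CUDA peer-to-peer access. Here's a minimal sketch of checking and enabling it; device IDs 0 and 1 and the 64 MB copy are assumptions for illustration.

[CODE]
// Sketch: query and enable GPU peer-to-peer access, then do a direct
// GPU->GPU copy that bypasses host memory. Device IDs assumed.
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);  // can GPU 0 access GPU 1?
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    printf("P2P 0->1: %d, 1->0: %d\n", can01, can10);

    if (can01 && can10) {
        // Enable P2P in both directions; after this, peer copies (and
        // kernels dereferencing remote pointers) skip the host bounce.
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);

        // Illustrative copy: 64 MB straight from GPU 1 to GPU 0.
        size_t bytes = 64 << 20;
        void *d0, *d1;
        cudaSetDevice(0); cudaMalloc(&d0, bytes);
        cudaSetDevice(1); cudaMalloc(&d1, bytes);
        cudaMemcpyPeer(d0, 0, d1, 1, bytes);
        cudaDeviceSynchronize();
        cudaFree(d1);
        cudaSetDevice(0); cudaFree(d0);
    }
    return 0;
}
[/CODE]

Whether those peer copies route over a PLX switch, across the root complex, or over NVLink is exactly the topology question above; same code, very different bandwidth.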