
NVIDIA Opens Up The Code To StyleGAN - Create Your Own AI Family Portraits


  • #21
    Originally posted by Stefem
    What many fail to realize is that the critical and demanding part isn't learning to write CUDA (which is basically C, or another common language, with some extensions) but designing or redesigning the algorithm for massively parallel, throughput-oriented processors. Once you have done that, it really doesn't matter whether it's written in C/C++ CUDA, CUDA Fortran or whatever; porting it to OpenCL is trivial. That's especially true considering that OpenCL followed the path laid down by CUDA and has a very similar programming model.
    Programming for a GPU (or any other throughput-oriented processor) requires a completely different approach than programming for a CPU. You are forced to change your programming paradigm to take advantage of its strengths (that's why being x86, although not binary compatible, wasn't a selling point for Intel's Xeon Phi). This forces you to rethink what you are doing so as to exploit as much parallelism as possible, to the extent that the new code will probably run better even on your old CPU-only cluster.
    HPC developers also have needs which can't be fulfilled by OpenCL alone; that's why they put so much pressure on NVIDIA to develop the CUDA Fortran, C++ and Python variants.
    Basically, what you're trying to say is that it weeds out trash developers who think they are good because they "make super simple code" with a million branches everywhere so that any moron can understand it. GPU programming is somewhat closer to actual hardware implementations, so of course it's harder for those simpletons.
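
    To illustrate the quoted point that CUDA is "basically C with some extensions", here is a minimal, self-contained sketch (written for illustration only, not taken from any real project):

    // saxpy.cu: plain C plus three CUDA extensions, namely the __global__
    // qualifier, the <<<grid, block>>> launch syntax, and the built-in
    // thread indices. The hard part is the parallel decomposition of the
    // algorithm, not the syntax.
    #include <cstdio>

    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        // One thread per element; the grid replaces the serial loop.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    int main()
    {
        const int n = 1 << 20;
        float *x, *y;
        cudaMallocManaged(&x, n * sizeof(float));  // unified memory, visible to CPU and GPU
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
        cudaDeviceSynchronize();

        printf("y[0] = %f (expected 4.0)\n", y[0]);
        cudaFree(x);
        cudaFree(y);
        return 0;
    }

    Once the per-element decomposition exists, the kernel body ports to an OpenCL kernel almost line for line, which is the quoted point.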

    Comment


    • #22
      Originally posted by Weasel
      Basically, what you're trying to say is that it weeds out trash developers who think they are good because they "make super simple code" with a million branches everywhere so that any moron can understand it. GPU programming is somewhat closer to actual hardware implementations, so of course it's harder for those simpletons.
      I'm not sure I correctly understood what you said; that's my fault, of course.

      Comment


      • #23
        Originally posted by Stefem
        I'm not sure I correctly understood what you said; that's my fault, of course.
        I'll try to summarize.

        The number one enemy of auto-parallelization is branching and divergent code paths: things like conditional if statements, if you don't know what that is.

        However, most devs who preach writing "simple, clean code" for uneducated monkeys and other bullshit like that (reminder: that's an opinion; to someone in hardware design, for example, branchless code is very clean) love polluting it with branches upon branches upon branches for "simplicity" and to keep the code "clean" and "readable" and "maintainable" and other hand-wavy buzzwords (which are not measurable and have no scientific basis; you can only go along with them or disagree, you can't argue about them with facts).

        Meanwhile, if you want to write efficient, parallelizable code, or GPU code, you have to adopt a different mindset and use some actual brainpower instead of writing "simple code" for monkeys. You have to use masks and think in terms of signal flow instead of placing branches everywhere. This is also how hardware gets designed, albeit in specialized hardware-description languages; masked code is still much closer to that than code full of branches. Hardware is by definition branchless.
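
        To make that concrete, here is a minimal CUDA sketch (hypothetical kernels, written only to illustrate the two mindsets) of the same per-element clamp:

        // Branchy version: threads of a warp that take different paths
        // serialize (warp divergence).
        __global__ void clamp_branchy(int n, float lo, float hi, float *v)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            if (v[i] < lo)      v[i] = lo;
            else if (v[i] > hi) v[i] = hi;
        }

        // Masked version: compute unconditionally and select. fminf/fmaxf
        // compile down to predicated/select instructions, so every thread
        // in the warp executes the same instruction stream: mask, not branch.
        __global__ void clamp_masked(int n, float lo, float hi, float *v)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            v[i] = fminf(fmaxf(v[i], lo), hi);
        }

        Same result either way; the second form is what "thinking in signal flow" looks like in code.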

        Comment


        • #24
          Originally posted by Weasel
          I'll try to summarize.

          The number one enemy of auto-parallelization is branching and divergent code paths: things like conditional if statements, if you don't know what that is.

          However, most devs who preach writing "simple, clean code" for uneducated monkeys and other bullshit like that (reminder: that's an opinion; to someone in hardware design, for example, branchless code is very clean) love polluting it with branches upon branches upon branches for "simplicity" and to keep the code "clean" and "readable" and "maintainable" and other hand-wavy buzzwords (which are not measurable and have no scientific basis; you can only go along with them or disagree, you can't argue about them with facts).

          Meanwhile, if you want to write efficient, parallelizable code, or GPU code, you have to adopt a different mindset and use some actual brainpower instead of writing "simple code" for monkeys. You have to use masks and think in terms of signal flow instead of placing branches everywhere. This is also how hardware gets designed, albeit in specialized hardware-description languages; masked code is still much closer to that than code full of branches. Hardware is by definition branchless.
          Now it's clear. Sorry, but English is not my first language, and I couldn't tell whether you were being serious or sarcastic.

          Comment
