System76 Launches The Thelio Mega With Threadripper + Four GPUs


  • #21
    It is a nice classy case!



    • #22
      Originally posted by s_j_newbury View Post
      As I understand it HIP should be source translatable from CUDA and work on AMD. Has anybody tried running CUDA applications translated through "hipify" on ROCm?

      My hardware doesn't support ROCm because it lacks PCIe atomics, as I've previously mentioned, so I can't try it myself.
      Last time I experimented with converting CUDA code to OpenCL, it worked OK for simple stuff, but was... less than successful for anything else. I've not tried recently, though; it may be more forgiving now.

      ...

      Anyone else confused? How can you get 256GB of system memory ("including 128GB ECC memory")?
      Originally posted by System76 Blog Page
      Send this shopping list straight to your purchasing manager: Thelio Mega maxes out at 4 NVIDIA Quadro GPUs, 128 threads on an AMD Ryzen Threadripper CPU, 256GB of memory (including 128GB ECC memory), 96TB of storage (including 32TB NVMe storage), and Dual 10GB Ethernet ports, which operate at speeds 10 times faster than the speed of a standard Ethernet port.
      I didn't think you could mix ECC and non-ECC RAM... or at least, not while keeping the ECC working.

      I'd like to see this sort of thing but with that Asrock Epyc board which can take 1TB (2TB when the 256GB DIMMs arrive) of RAM, and still have quad GPUs.

      edit: It says how power hungry the RTX 3090 is that the maximum number of those is three (presuming supply ever arrives, I guess)... I'm actually quite impressed: I maxed a box out (quad RTX 8000s, 256GB RAM, 3990WX, 32TB of NVMe, 24TB of SATA SSDs) and it came to less than $50,000. More than half that is just the GPUs, so that's not bad going. If I lived in the US, I'd think seriously about using them for work systems.
      Last edited by Paradigm Shifter; 21 October 2020, 01:03 AM.



      • #23
        Originally posted by Paradigm Shifter View Post

        Last time I experimented with converting CUDA code to OpenCL, it worked OK for simple stuff, but was... less than successful for anything else. I've not tried recently, though; it may be more forgiving now.
        HIP isn't OpenCL, it's AMD's "CUDA".
        Originally posted by AMD
        The Heterogeneous Computing Interface for Portability (HIP) is a vendor-neutral C++ programming model for implementing highly tuned workload for GPUs. HIP (like CUDA) is a dialect of C++ supporting templates, classes, lambdas, and other C++ constructs.
        I'm actually surprised it hasn't gained more traction.



        • #24
          Given the price and speed ratings of ECC UDIMMs, an EPYC system with ECC RDIMMs works out cheaper, unless you play in the 32 to 64GB range, where it doesn't matter and you don't really need ECC: just run your code twice and compare the end results. If you need >256GB of RAM, the run time makes it hard to run everything twice; in my case that's 2 days vs 4 days of run time.
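For anyone wanting to try the "run it twice and compare" approach, here is a minimal Python sketch. It is only an illustration: the names `run_twice_and_compare` and `sum_of_squares` are made up for this example, and it assumes the computation is deterministic, so that any mismatch points at a fault (a flipped bit, for instance) rather than at expected run-to-run variation.

```python
import hashlib
import pickle


def run_and_digest(compute, *args):
    """Run the computation once and return (result, SHA-256 digest of the result)."""
    result = compute(*args)
    digest = hashlib.sha256(pickle.dumps(result)).hexdigest()
    return result, digest


def run_twice_and_compare(compute, *args):
    """Run the computation twice; raise if the two results do not match."""
    first, digest_a = run_and_digest(compute, *args)
    _, digest_b = run_and_digest(compute, *args)
    if digest_a != digest_b:
        raise RuntimeError("results differ between runs: memory error or nondeterminism")
    return first


def sum_of_squares(n):
    # Stand-in for a long-running job (hypothetical workload).
    return sum(i * i for i in range(n))


checked = run_twice_and_compare(sum_of_squares, 1000)
```

Comparing digests rather than full results means only a short hash from the first run has to be kept around, but as the post says, the cost is still double the run time.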



          • #25
            Originally posted by s_j_newbury View Post
            HIP isn't OpenCL, it's AMD's "CUDA". I'm actually surprised it hasn't gained more traction.
            Ah, I just assumed they'd rebranded their implementation of it, rather than completely abandoned OpenCL as a compute infrastructure and gone about spending all the time, energy and money on making a direct CUDA competitor, after having spent so much time, energy and money trying to push OpenCL.

            And while I'd love to see a competitor to CUDA (which is one of the reasons I'm actually cheering for Intel to make oneAPI a success) most of the places that use CUDA heavily... like stability, or rather, dislike change.

            It'll take at least 5 years of consistent, dedicated support without platform shakeups for both oneAPI and HIP to be looked at seriously, and probably another 5 after that for them to make any actual inroads against CUDA.

            Perhaps I'm being pessimistic, but CUDA has an absolutely dominant position, and nVidia show no signs of slowing their GPGPU push.

            Originally posted by tchiwam View Post
            Given the price and speed ratings of ECC UDIMMs, an EPYC system with ECC RDIMMs works out cheaper, unless you play in the 32 to 64GB range, where it doesn't matter and you don't really need ECC: just run your code twice and compare the end results. If you need >256GB of RAM, the run time makes it hard to run everything twice; in my case that's 2 days vs 4 days of run time.
            Agreed. Except multiply the timescales I deal with by months, rather than days.



            • #26
              Originally posted by Paradigm Shifter View Post
              Ah, I just assumed they'd rebranded their implementation of it, rather than completely abandoned OpenCL as a compute infrastructure and gone about spending all the time, energy and money on making a direct CUDA competitor, after having spent so much time, energy and money trying to push OpenCL.
              We have not abandoned OpenCL - it's just another language that runs on the ROCm framework, like HIP and HCC.



              • #27
                Originally posted by bridgman View Post
                We have not abandoned OpenCL - it's just another language that runs on the ROCm framework, like HIP and HCC.
                Thanks.



                • #28
                  Originally posted by Paradigm Shifter View Post

                  Agreed. Except multiply the timescales I deal with by months, rather than days.
                  I do have some code that runs in a while-true loop forever; there will never be an end date on those 2 calculations, and I get a pearl once in a while. But these 2 pieces of code would almost benefit from a random bit error. The program code is small enough to fit in the L1 cache, and the data fits in L2 in one case and L1 in the other, so it is at least protected. And any errors should, in theory, go away and recover.

                  I did a type of trolley save in my long-running processing codes: it goes through the pipe locking one state at a time, allowing the stages after it to flush and the ones before it to prime, and gets through all states once a day. It has saved my bacon a few times and allowed me to do some replays. It took 1 thread (IO speed limit) and a few locks.
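A much simplified Python sketch of the general idea (lock the state, snapshot it, write it out, replay from it later) might look like the following. This is not the actual scheme described above, which walks a multi-stage pipeline; it assumes a single state dictionary, and the class name `CheckpointedState` and the file name are hypothetical.

```python
import json
import os
import tempfile
import threading


class CheckpointedState:
    """Toy long-running job whose state can be saved and replayed."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()
        self.state = {"step": 0, "total": 0}

    def advance(self, value):
        # Normal processing: mutate the state under the lock.
        with self.lock:
            self.state["step"] += 1
            self.state["total"] += value

    def save(self):
        # Hold the lock only long enough to copy the state.
        with self.lock:
            snapshot = dict(self.state)
        # Write to a temp file, then atomically swap it into place so a
        # crash mid-write never leaves a half-written checkpoint behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            json.dump(snapshot, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, self.path)

    def load(self):
        # Replay: restore the last saved state.
        with open(self.path) as handle:
            restored = json.load(handle)
        with self.lock:
            self.state = restored


path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
job = CheckpointedState(path)
for value in (1, 2, 3):
    job.advance(value)
job.save()
job.advance(100)  # work done after the checkpoint
job.load()        # roll back to the last saved state
```

Copying the state under the lock and doing the slow disk write outside it keeps the processing threads from stalling on IO, which fits the "1 thread and a few locks" budget mentioned above.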
