AMD Begins Prototyping CRIU Support For ROCm Compute

  • MadeUpName
    replied
    Miner wants to switch his rig over to do his day job and then switch it back at the end of the day.


  • baryluk
    replied
    Originally posted by bezirg
    What is the use case for this? Genuinely curious.
    One of the uses is HPC (High Performance Computing). You might be running a multi-day (or multi-week) simulation on a cluster of machines with GPUs and CPUs. If any of them fails or crashes during that period, your computation is basically scrapped, which is a huge waste of resources and money. Researchers usually apply for a grant of HPC cluster time and are then allowed to use that time but no more, so if the system, the program, or a single node crashes, they are screwed, possibly delaying their research by a year or two. Granted, not all HPC uses are like that, but a lot of them are.

    With CRIU, you periodically (e.g. every 2 hours) dump the state of the CPU, GPU, memory, network sockets, open files, etc. to storage, then continue. The dump usually takes just minutes. If the system crashes, you can restore from the last good checkpoint and continue. You pay a little in efficiency, plus the time to test that the code can actually checkpoint and restore (which can be done on smaller jobs or even on a single workstation), but you improve reliability a lot, and you are protected from the risks I mentioned above.
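
The periodic dump described above could be driven by a small wrapper around the `criu` command line. This is only a rough sketch under stated assumptions: the helper names (`criu_dump_cmd`, `criu_restore_cmd`, `checkpoint_periodically`) and the `ckpt-NNNN` directory layout are made up for illustration, `criu` normally needs root privileges, and whether GPU state is captured depends on the ROCm support being prototyped, which this sketch does not address.

```python
import subprocess
import time
from pathlib import Path


def criu_dump_cmd(pid, images_dir):
    """Build a `criu dump` command for the process tree rooted at `pid`.

    --leave-running keeps the job alive after the dump, which is what a
    periodic checkpoint wants (the default is to kill the dumped tree).
    """
    return [
        "criu", "dump",
        "--tree", str(pid),
        "--images-dir", str(images_dir),
        "--leave-running",
    ]


def criu_restore_cmd(images_dir):
    """Build the matching `criu restore` command for a checkpoint directory."""
    return ["criu", "restore", "--images-dir", str(images_dir)]


def checkpoint_periodically(pid, base_dir, interval_s, count,
                            run=subprocess.run, sleep=time.sleep):
    """Take `count` checkpoints of `pid`, `interval_s` seconds apart.

    `run` and `sleep` are injectable (hypothetical convenience) so the loop
    can be exercised without criu installed. Returns the directories written.
    """
    taken = []
    for i in range(count):
        sleep(interval_s)
        images_dir = Path(base_dir) / f"ckpt-{i:04d}"
        # criu expects the images directory to exist already.
        images_dir.mkdir(parents=True, exist_ok=True)
        run(criu_dump_cmd(pid, images_dir), check=True)
        taken.append(images_dir)
    return taken
```

With the "every 2 hours" figure from the post, a call might look like `checkpoint_periodically(job_pid, "/scratch/ckpt", 2 * 60 * 60, 12)`, and after a crash the newest directory would be passed to `criu_restore_cmd`. Keeping each dump in its own directory means a crash in the middle of a dump still leaves the previous good checkpoint intact.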


  • Setif
    replied
    Originally posted by bezirg
    What is the use case for this? Genuinely curious.
    Containers.
    CRIU - Wikipedia


  • tildearrow
    replied
    I just hope this comes in handy for GPU resets....

    (by the way, CRIU sounds a lot like cryo)


  • bezirg
    replied
    What is the use case for this? Genuinely curious.


  • phoronix
    started a topic AMD Begins Prototyping CRIU Support For ROCm Compute

    AMD Begins Prototyping CRIU Support For ROCm Compute

    Phoronix: AMD Begins Prototyping CRIU Support For ROCm Compute

    As part of AMD's growing HPC focus and the maturing of their Radeon Open eCosystem GPU compute stack, they closed out this week by making public a prototype implementation of CRIU support for ROCm...

    https://www.phoronix.com/scan.php?pa...CRIU-Prototype