Announcement

**bezirg** · 01 May 2021, 02:08 PM

What is the use case for this? Genuinely curious.

**tildearrow** · 01 May 2021, 02:18 PM

I just hope this comes in handy for GPU resets....

(by the way, CRIU sounds a lot like cryo)

**Setif** · 01 May 2021, 03:02 PM

Originally posted by bezirg

What is the use case for this? Genuinely curious.

Containers.
CRIU - Wikipedia

**baryluk** · 01 May 2021, 10:11 PM

Originally posted by bezirg

What is the use case for this? Genuinely curious.

One of the uses is HPC (High Performance Computing). You might be running a multi-day (or multi-week) simulation on a cluster of computers with GPUs and CPUs. If any of them fails or crashes during that period, your computations are basically scrapped, which is a huge waste of resources and money (researchers usually apply for a grant for HPC cluster time and are then allowed to use that time, but no more, so if the system, the program, or one node crashes, they are screwed, possibly delaying their research by a year or two). Granted, not all HPC uses are like that, but a lot of them are. With CRIU, you periodically (e.g. every 2 hours) dump the state of the CPU, GPU, memory, network sockets, open files, etc. to storage, then continue. This usually takes just minutes. If the system crashes, you can restore from the last good checkpoint and continue. You pay a little in efficiency, plus the time to test that the code can actually checkpoint and restore (that can be tested on smaller jobs or even on a single workstation), but you improve reliability a lot and are safe from the risks I mentioned above.
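
A rough sketch of that periodic dump-and-continue loop, assuming the stock `criu` command-line tool with its `dump` and `restore` subcommands. The helper names, the directory layout and the interval are made up for illustration, and the GPU side would depend on the ROCm support being prototyped, which is not shown here.

```python
from __future__ import annotations

import subprocess
import time
from pathlib import Path


def checkpoint_dir(root: Path, index: int) -> Path:
    # Hypothetical layout: one zero-padded directory per checkpoint.
    return root / f"ckpt-{index:04d}"


def dump_command(pid: int, images_dir: Path) -> list[str]:
    # --leave-running keeps the job alive after the dump, so it just continues.
    # --shell-job is needed when the job was started from a terminal.
    return [
        "criu", "dump",
        "--tree", str(pid),
        "--images-dir", str(images_dir),
        "--leave-running",
        "--shell-job",
    ]


def restore_command(images_dir: Path) -> list[str]:
    # Restores the process tree from the images written by an earlier dump.
    return [
        "criu", "restore",
        "--images-dir", str(images_dir),
        "--shell-job",
        "--restore-detached",
    ]


def checkpoint_loop(pid: int, root: Path, interval_s: float, rounds: int) -> Path:
    """Dump `pid` every `interval_s` seconds; return the last checkpoint dir.

    Assumption: criu is installed and the caller has enough privileges
    (criu normally needs root or equivalent capabilities).
    """
    last = checkpoint_dir(root, 0)
    for index in range(rounds):
        time.sleep(interval_s)
        last = checkpoint_dir(root, index)
        last.mkdir(parents=True, exist_ok=True)
        subprocess.run(dump_command(pid, last), check=True)
    return last
```

After a crash, something like `subprocess.run(restore_command(last_good_dir), check=True)` would bring the job back from the most recent directory that was written completely. Testing this on a small job first, as mentioned above, is the cheap way to find out whether the program survives a dump and restore at all.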

**MadeUpName** · 02 May 2021, 04:41 PM

Miner wants to switch his rig over to do his day job and then switch it back at the end of the day.


AMD Begins Prototyping CRIU Support For ROCm Compute
