Announcement

**sinepgib** · 08 March 2023, 09:58 AM

Originally posted by Weasel View Post

The fact that they have to change the C side API shows how much Rust sucks.

It's so sad that they get to fix bugs :'(

[PATCH RFC 10/18] drm/scheduler: Add can_run_job callback - Asahi Lina

https://lore.kernel.org/dri-devel/[email protected]/

[PATCH RFC 11/18] drm/scheduler: Clean up jobs when the scheduler is torn down - Asahi Lina

https://lore.kernel.org/dri-devel/[email protected]/

Those are the only changes to existing C code. There are many little helpers added to Rust specific files, but that's all that actually changes for other drivers. Bug fixes. Terrible.

**mdedetrich** · 08 March 2023, 11:23 AM

Originally posted by Weasel View Post

The fact that they have to change the C side API shows how much Rust sucks.

Or more accurately, reflects how much you have no clue about what you are talking about

**Weasel** · 08 March 2023, 04:06 PM

Originally posted by sinepgib View Post

It's so sad that they get to fix bugs :'(

[PATCH RFC 10/18] drm/scheduler: Add can_run_job callback - Asahi Lina

https://lore.kernel.org/dri-devel/[email protected]/

[PATCH RFC 11/18] drm/scheduler: Clean up jobs when the scheduler is torn down - Asahi Lina

https://lore.kernel.org/dri-devel/[email protected]/

Those are the only changes to existing C code. There are many little helpers added to Rust specific files, but that's all that actually changes for other drivers. Bug fixes. Terrible.

What bug fixes? The first one is literally a change to accommodate her Rust driver. LOL. (funnily how it got shot down immediately, too)

And I wasn't even talking about these but the other changes they wanted... (but kept it minimal "for now")

Why clowns like to post links when they don't or can't even read them? Are you oiaohm's disciple?

**Weasel** · 08 March 2023, 04:08 PM

Originally posted by mdedetrich View Post

Or more accurately, reflects how much you have no clue about what you are talking about

Yeah just like the guy above you, link without even understanding what they link to.

**mdedetrich** · 08 March 2023, 07:13 PM

Originally posted by Weasel View Post

Yeah just like the guy above you, link without even understanding what they link to.

The first one is nothing specific to Rust, its adding a generic callback and the second one is resolving a bug.

Get a bigger shovel, you are going to need it

**smitty3268** · 09 March 2023, 12:29 AM

Originally posted by Weasel View Post

And I wasn't even talking about these but the other changes they wanted... (but kept it minimal "for now")

Please point out these other changes they wanted which show that Rust sucks.

I'm sure we're all interested to see it, since the patches this article was about apparently don't cover it.

**Ironmask** · 09 March 2023, 01:08 AM

Don't engage Weasel, he's just a troll. Last thread he was in he derailed it for several pages arguing that any software that doesn't handle literally every possible allocation failure is bad software made by bad programmers. Trust me, he has nothing interesting to say.

**oiaohm** · 09 March 2023, 03:28 AM

Originally posted by Weasel View Post

What bug fixes? The first one is literally a change to accommodate her Rust driver. LOL. (funnily how it got shot down immediately, too)

And I wasn't even talking about these but the other changes they wanted... (but kept it minimal "for now")

Why clowns like to post links when they don't or can't even read them? Are you oiaohm's disciple?

No Weasel read again. The first post is to support a feature the rust driver is first to use. Yes you want to claim it to accommodate rust driver but that not really the case.

[PATCH RFC 11/18] drm/scheduler: Clean up jobs when the scheduler is torn down - Asahi Lina

https://lore.kernel.org/dri-devel/[email protected]/

Need to read second patch the person quoted.

drm_sched_fini() currently leaves any pending jobs dangling, which
causes segfaults and other badness when job completion fences are
signaled after the scheduler is torn down.

Explicitly detach all jobs from their completion callbacks and free
them. This makes it possible to write a sensible safe abstraction for
drm_sched, without having to externally duplicate the tracking of
in-flight jobs.

This shouldn't regress any existing drivers, since calling
drm_sched_fini() with any pending jobs is broken and this change should
be a no-op if there are no pending jobs.

I like how rust driver developer write should not call drm_sched_fini() with any pending jobs. Has it happened before where drm_sched_fini() is called with jobs not closed and there be issues reported yes it has with amdgpu in 2022 with some really quirky behavior then 2021 with Intel. This problem has not just happened once.

Making each open source drm driver track Job status individually has been a source of bugs.

Both patches are linked to the same problem Weasel. Before the rust GPU drivers the design is all GPU drivers basically host their own internal job tracking. After both of these changes the generic job tracking is the only one. Duplicate tracking of in-flight jobs is cause of sync errors if not careful yes historic ways AMD and Intel and other in kernel graphics drivers got this wrong.

Yes I found out many times post the key information as the second link and lazy weasel would not read the second one and go off on a target with the first one as normal without seeing the first one why it is is explained by the second one.

Please note those both patches are part of the same patch set as in the same collective work.

Re: [PATCH RFC 11/18] drm/scheduler: Clean up jobs when the scheduler is torn down - Asahi Lina

https://lore.kernel.org/dri-devel/[email protected]/#t

I actually don't know of any way to actively abort jobs on the firmware,
so this is pretty much the only option I have. I've even seen
long-running compute jobs on macOS run to completion even if you kill
the submitting process, so there might be no way to do this at all.
Though in practice since we unmap everything from the VM anyway when the
userspace stuff gets torn down, almost any normal GPU work is going to
immediately fault at that point (macOS doesn't do this because macOS
effectively does implicit sync with BO tracking at the kernel level...).

Weasel the Apple GPU is cursed with lack of documentations and quirks. Lot of the issues this is running into have been issue for other reversed engineered drivers.

And if you also read it all the threads about it you will also find its rust mandated safety that the rust using developer wants to unsafe code as safe as possible.

Yes the can_run_job callback is partly explained by the quirk here you put job into Apple GPU you may not be able to stop it so have just tanked the complete system. So limited GPU knowledge there are more cases that a job cannot schedule you don't have option of start a job on a GPU and go opps I had to kill that and reQueue. This is why there is a lot of debate if can_run_job callback will or will not be let in.

Weasel something to consider lot of the Linux DRM stack is design on the theory that you have full GPU documentation access and those reversing drivers don't. These features the AGX Apple DRM driver developer is asking for could have been asked for by a person coding another driver written in C that is also missing full GPU documentation. Same set of problems here.
1) Lack of means to kill stuff on demard.
2) lack of understanding of what fencing need to be done.

Do note how AMD and Nvidia developers talk about how everything should be explicit sync and here is the AGX apple devices with implicit sync saying jobs will not terminate on GPU until implicit sync is complete. AGX is not the only embedded GPU with wackiness like this. Of course it can get worse in embedded GPu where you do the soft reset and the tasks before the hard reset and in fact still running though the gpu so limiting max jobs until they are cleared out.(yes when on soc chip power cycling the GPU is not a option)

So weasel both of these changes could be pure generic just by badness the first user just happens to be a rust coded driver. The problems of jobs not ending when expected predate AGX Apple DRM causing quite a few reported kernel panics with embedded GPUs. The Linux kernel DRM generic stack design has presumed well behaving GPU hardware bad news the real world is not always that case then it been up to the GPU driver to cover for this these changes are update the DRM generic stack to handle these cases better.

**Weasel** · 09 March 2023, 10:10 AM

Originally posted by oiaohm View Post

No Weasel read again. The first post is to support a feature the rust driver is first to use. Yes you want to claim it to accommodate rust driver but that not really the case.

And I bet it's needed due to the way Rust handles "safety". And she'd have to use "unsafe" otherwise with the proposed solutions to the patch. Oh no.

**Weasel** · 09 March 2023, 10:11 AM

Originally posted by mdedetrich View Post

The first one is nothing specific to Rust, its adding a generic callback and the second one is resolving a bug.

Get a bigger shovel, you are going to need it

See my post to oiaohm. I bet she doesn't like the alternatives proposed because she'd have to use more "unsafe" in Rust.

Announcement

Initial Rust DRM Abstractions, AGX Apple DRM Driver Posted For Review

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment