Originally posted by Weasel
"Not accounted for" is correct. Yes, "not intended" could also be correct. But there is a small miss. Issues where some kernel quirk causes wineserver to deadlock get reported, once reduced to unique cases, about 10 times a year. Of these cases, less than 1 percent are a full kernel stop. This means that for 99%+ of these issues a watchdog process would be able to pick up that wineserver has stopped and step in to prevent everything from going out of control. With a full kernel stop the system is already dead.
This is a mistake you can in fact mostly counter in the design by adding a watchdog.
Car crashes are hard to account for, yet we still build in seat belts and airbags.
The kernel blocking itself, with the scheduler then not giving a thread any more CPU time, happens more than we would like, and it is a much more common issue.
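To make "not getting any more CPU time" concrete, here is a rough sketch of how an outside process on Linux could notice it. It assumes the /proc/<pid>/stat layout (state in field 3, utime and stime in fields 14 and 15); the function names are made up for illustration and nothing here comes from Wine itself. Keep in mind that an idle server also uses no CPU, so this signal alone does not prove a deadlock and would have to be combined with some liveness check.

```python
from typing import Tuple


def parse_stat(stat_line: str) -> Tuple[str, int]:
    """Return (state, utime + stime in clock ticks) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split on the LAST ')' before tokenising the rest.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    state = rest[0]        # field 3: process state, e.g. S, R, D
    utime = int(rest[11])  # field 14: user-mode CPU time
    stime = int(rest[12])  # field 15: kernel-mode CPU time
    return state, utime + stime


def got_cpu_time(before: str, after: str) -> bool:
    """True if the process consumed any CPU time between the two samples."""
    return parse_stat(after)[1] > parse_stat(before)[1]


def read_stat(pid: int) -> str:
    """Read the raw stat line for a pid (Linux only, hypothetical helper)."""
    with open(f"/proc/{pid}/stat") as f:
        return f.read()
```

A watchdog could sample read_stat() for the server's pid a few seconds apart and treat "no CPU time while clients are waiting" as a hint that something is wrong.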
Please note that an OS-deadlocked process (type 3 problem) is not unique to Linux/BSD/Unix/macOS. When you look into stuck processes on Windows, you find that this happens there as well, as a rare event.
The thing is, a wineserver deadlocked by any cause can result in the Windows PE applications going into totally stupid code paths. This becomes a major design issue when the Windows PE applications Wine is running go out of control because wineserver is deadlocked.
A minor design change, adding a watchdog that detects when wineserver has stalled and terminates all applications connected to that wineserver, is the counter to 99+% of the reported cases of type 3 deadlocking.
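As a rough illustration of the kind of watchdog I mean, here is a minimal sketch. Everything in it is an assumption made for the example: the class and function names are made up, how the watchdog learns that the server is alive (the beat() call) is left open, and the list of client pids is simply passed in. It is not how Wine does or would do it, just the general shape.

```python
import os
import signal
import time
from typing import Callable, Iterable, List


class HeartbeatWatchdog:
    """Remembers the last time the monitored server proved it was alive."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_beat = clock()

    def beat(self) -> None:
        # Call this whenever the server answers a liveness ping
        # (the ping mechanism itself is hypothetical and not shown).
        self.last_beat = self.clock()

    def stalled(self) -> bool:
        # Stalled means: no heartbeat for longer than the timeout.
        return self.clock() - self.last_beat > self.timeout_s


def terminate_clients(pids: Iterable[int],
                      kill: Callable[[int, int], None] = os.kill) -> List[int]:
    """Send SIGTERM to every client pid; return the pids that were signalled."""
    signalled = []
    for pid in pids:
        try:
            kill(pid, signal.SIGTERM)
            signalled.append(pid)
        except ProcessLookupError:
            pass  # client already exited on its own
    return signalled


def check_once(watchdog: HeartbeatWatchdog, client_pids: Iterable[int],
               kill: Callable[[int, int], None] = os.kill) -> List[int]:
    """One watchdog pass: terminate the clients only if the server has stalled."""
    if watchdog.stalled():
        return terminate_clients(client_pids, kill)
    return []
```

The watchdog has to run as its own process, outside the server, so that it keeps getting scheduled when the server does not. A real one would probably escalate to SIGKILL for clients that ignore SIGTERM.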
Weasel, any deadlock mitigation that does not have a watchdog of some form is not designed to control deadlocks to the maximum extent possible. MS Windows does have a few watchdogs against type 2 and type 3 deadlocks, but they are not perfect.
With Wine aiming to be bug-for-bug compatible with Windows, it really should have some watchdog against type 2 and type 3 deadlocks.
Weasel, like it or not, this is a design flaw. Yes, a design flaw, because you have failed to account for something in the design.
Getting deadlocked with no way out due to an OS quirk is, in the majority of cases, a design flaw: you forgot in your design to allow for the fact that an OS kernel can have quirks that result in processes no longer getting CPU time. For servers, being stalled out this way without a watchdog is particularly bad.