Originally posted by arQon
The reality is that if you can totally break the system by attempting to run too many threads, in many cases you cannot get safety-critical certification under IEC 61508.
IEC 61508 turns out to be a real pain in the butt when designing an RTOS. Yes, stopping processes from starting can make a solution fail IEC 61508, and so can letting the system go soft.
Remember, the point of killing processes under IEC 61508 is that you go into a controlled failure, not an uncontrolled one. The same applies to letting the system go soft for a short while in order to perform a controlled crash.
Yes, the safety-critical side of the Linux kernel's version of real-time does mean that if you are pushing too hard and have not set cgroup hard limits, you fall through to soft real-time when more threads are pushed than the CPU can handle. This is not an error on the Linux kernel side. It is an error of not understanding safety-critical hard real-time kernel behaviour: you have to declare that the system should fail if the CPU is overloaded, otherwise failing to soft is to be presumed. Yes, declared hard limits mean that the OS should stop you from starting threads that could possibly overload the CPU.
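To make the "declared hard limits" idea concrete, here is a minimal sketch of a utilization-based admission check: a new RT thread is refused up front if its declared budget would push the total past what the CPUs can deliver. This is only an illustration of the principle, not how the kernel or cgroups implement it. `RtTask`, `admit`, and the 0.95 `rt_share` default are all hypothetical names and values made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RtTask:
    """A periodic real-time task (hypothetical model, not a kernel structure)."""
    name: str
    runtime_us: int  # declared worst-case CPU time needed per period
    period_us: int   # length of one period


def utilization(task: RtTask) -> float:
    """Fraction of one CPU the task claims."""
    return task.runtime_us / task.period_us


def admit(running: list[RtTask], candidate: RtTask, cpus: int,
          rt_share: float = 0.95) -> bool:
    """Return True only if the candidate fits inside the declared RT budget.

    rt_share is an assumed fraction of each CPU reserved for RT work;
    the value is illustrative, not taken from the post.
    """
    budget = cpus * rt_share
    used = sum(utilization(t) for t in running)
    return used + utilization(candidate) <= budget


if __name__ == "__main__":
    half_cpu = RtTask("control-loop", runtime_us=500, period_us=1000)
    running = [half_cpu, half_cpu, half_cpu]     # 1.5 CPUs already claimed
    print(admit(running[:2], half_cpu, cpus=2))  # third task still fits
    print(admit(running, half_cpu, cpus=2))      # fourth would overload: refuse
```

Refusing the thread at start time is the controlled failure. Without a declared limit, nothing stops the overload from happening, and degrading to soft real-time is the remaining controlled option.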
A safety-critical hard real-time RTOS has some extra defined behaviours over a plain hard real-time OS. Yes, a hard real-time RTOS where you can break the system completely because you tried to schedule too many threads is not a safety-critical RTOS.
Red Hat and other parties want safety-critical RTOS solutions for cars and so on.
Linux kernel real-time behaviour does not make sense to some people because they are not thinking about the fact that the Linux kernel is being designed for safety-critical usage long term, or that the Linux kernel has SIL 1 and SIL 2 safety-critical certifications done and is working on SIL 3.
Falling through from hard real-time to soft real-time is something that people who don't understand safety-critical requirements think is an error, when in fact it is a functional requirement. Not knowing the safety-critical requirements also leads you to think that scheduling more RT threads than you have cores to handle them is a valid reason for the system to crash completely, since it is up to the person developing the real-time solution to avoid the problem. That is not valid under safety-critical rules because you also have to try to mitigate developer errors. Safety critical means an uncontrolled error equals someone dead, so every error must, if possible, become a controlled failure. A controlled failure might still kill someone, but it is far less likely to do so than an uncontrolled failure.
Originally posted by arQon
Please note that PREEMPT_RT getting mainlined is not the end of the story either. After that comes building up more automated testing to make sure new patches are not breaking the real-time support.
Real-time and safety critical happen to be different things, but where they overlap they create some horrible headaches for OS and solution designers.