Linux Kernel Patches Posted For Bringing Up Tesla's Full Self-Driving SoC


  • #11
    Originally posted by fl1pm View Post
    that makes it even more bizarre that Tesla is able to run its so-called "Full Self-Driving" software on Linux
    I don't think it's weird at all. They can do a lot of development, prototyping, and evaluation using Linux, and only deploy the finished deep learning models, software, and algorithms on the proper self-driving OS. There's a lot you need for testing models and algorithms that you wouldn't want to build into the RTOS used in the final product: debugging & visualization tools, and even media pipelines for streaming simulated or pre-recorded video into the video-analysis code.

    What they gain by upstreaming their changes is eliminating the burden of maintaining out-of-tree support for their hardware. Maybe they could even provide dev kits with their SoC to robotics labs at leading universities.



    • #12
      Originally posted by Linuxxx View Post
      Most people seem to think that completely autonomous "killer" drones need to be incredibly sophisticated machines to be useful in real-world combat scenarios.

      As a matter of fact, I know how a certain class of autonomous "suicide bomber" drones, already put to very effective, tide-turning use in a recent real war, are built by a certain country:
      Terrorists and combatants in various conflicts are most certainly doing this, already.

      BTW, the concept of "suicide drones" arguably goes back as far as Germany's WW2-era V2 rockets. In more recent years, cruise missiles would fall into this category, if not other types of guided missiles as well.



      • #13
        Originally posted by coder View Post
        So, are those "neural processing units" of Jim Keller's design, or some derivative thereof?

        I'm honestly surprised by this, since I'd read someone claim that Linux isn't capable of being certified for self-driving. So, I wonder why they'd bother with Linux support, unless it's just for the convenience of internal software development (e.g. testing the deep learning models and algorithms).
        Some parts of the self-driving stack, i.e. the high-level stuff, are:

        - not what you'd call hard real-time
        - not really feasible to predict what they should do, and thus hard to supervise

        I.e. you can get away without an RTOS, but you can't use safety measures beyond the most generic ones, like running 2-3 separate hardware instances and cross-checking them.

        (And of course, they want to run their hardware in some server racks for AI training and testing.)
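
        At its core, that kind of cross-check is just a voter: each hardware instance computes the same command, and a much simpler comparator only passes it through when enough of them agree. A minimal sketch in C, assuming three instances and a made-up steering/brake command struct (nothing here is from Tesla's actual design):

        Code:
        #include <math.h>
        #include <stdbool.h>

        /* Hypothetical command produced independently by each of the three instances. */
        typedef struct {
            float steering_rad;   /* commanded steering angle      */
            float brake_frac;     /* commanded braking, 0.0 .. 1.0 */
        } command_t;

        /* Two results "agree" if they differ by less than a small tolerance. */
        static bool agrees(const command_t *a, const command_t *b)
        {
            return fabsf(a->steering_rad - b->steering_rad) < 0.01f &&
                   fabsf(a->brake_frac   - b->brake_frac)   < 0.02f;
        }

        /* 2-out-of-3 voter: accept a command only if at least two instances agree.
         * Returns false on no consensus; the caller then falls back to a safe state
         * (hand back to the driver, or perform a controlled stop). */
        static bool vote_2oo3(const command_t c[3], command_t *out)
        {
            if (agrees(&c[0], &c[1]) || agrees(&c[0], &c[2])) { *out = c[0]; return true; }
            if (agrees(&c[1], &c[2]))                         { *out = c[1]; return true; }
            return false;
        }
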
        discordian
        Senior Member
        Last edited by discordian; 14 January 2022, 05:55 AM.



        • #14
          fl1pm
          Junior Member
          Lockstep execution (the arrangement you are describing) is mandatory for high ASIL levels; however, it is nowhere near enough. It is, in fact, just a starting point.

          Redundant execution does protect you from hardware failures (power-supply sags, radiation and cosmic rays, etc.), but it does nothing against logic errors (which are actually the most dangerous), whether due to bugs or, even more importantly, wrong requirements (this is why ASIL is so heavy on documentation).

          In these types of systems you typically have multiple separate clusters: one might run Linux and manage high-level features, connectivity, etc., while another runs an RTOS (hard real-time, not the soft real-time that PREEMPT_RT provides) and manages the safety-critical aspects of the design.

          The two then talk via very well-defined protocols, if they must.
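
          For illustration, such a protocol is typically a small fixed-layout message with a sequence counter, a checksum and a delivery deadline, and the safety side treats any violation as a fault. A rough sketch in C (all names and field choices are invented, not taken from any real design):

          Code:
          #include <stdint.h>
          #include <stdbool.h>

          /* Hypothetical frame sent from the Linux cluster to the safety RTOS. */
          typedef struct {
              uint32_t seq;          /* monotonically increasing sequence counter */
              int16_t  steering;     /* requested steering, in 0.01 degree steps  */
              uint16_t speed_limit;  /* requested speed cap, in cm/s              */
              uint32_t check;        /* checksum over the fields above            */
          } hl_request_t;

          /* Simple checksum over the payload (a real design would use a proper CRC). */
          static uint32_t checksum(const hl_request_t *f)
          {
              return 0xA5A5A5A5u ^ f->seq ^ (uint16_t)f->steering ^ f->speed_limit;
          }

          /* The RTOS side accepts a frame only if the checksum matches, the sequence
           * counter advanced, and the frame arrived within its deadline; otherwise it
           * counts a fault and, after a few consecutive faults, falls back to its own
           * safe behaviour regardless of what the Linux side wants. */
          static bool frame_valid(const hl_request_t *f, uint32_t last_seq,
                                  uint32_t now_ms, uint32_t last_rx_ms)
          {
              if (checksum(f) != f->check)   return false;  /* corrupted in transit   */
              if (f->seq == last_seq)        return false;  /* stale/duplicated frame */
              if (now_ms - last_rx_ms > 50u) return false;  /* missed its deadline    */
              return true;
          }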




          • #15
            Originally posted by discordian View Post
            Some parts of the self-driving stack, i.e. the high-level stuff, are:

            - not what you'd call hard real-time
            - not really feasible to predict what they should do, and thus hard to supervise

            I.e. you can get away without an RTOS, but you can't use safety measures beyond the most generic ones, like running 2-3 separate hardware instances and cross-checking them.
            Wow. That's a lot.

            First, a deep learning network is basically a systolic processing graph, which typically can be evaluated in deterministic time. I presume there are some deep learning networks with feedback that can take a non-deterministic amount of time to converge, but that doesn't seem like a practical architecture for realtime control applications.
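
            To make that concrete: a plain feed-forward layer is just fixed-size loops of multiply-accumulates, so its operation count (and hence its worst-case execution time on a given chip) doesn't depend on the input values. A toy sketch in C, with made-up dimensions:

            Code:
            #include <stddef.h>

            #define IN_DIM  16
            #define OUT_DIM  8

            /* One fully connected layer with a ReLU activation. The loop bounds are
             * compile-time constants and there is no data-dependent branching or early
             * exit, so every call performs exactly OUT_DIM * IN_DIM multiply-adds. */
            static void dense_relu(const float in[IN_DIM],
                                   const float w[OUT_DIM][IN_DIM],
                                   const float b[OUT_DIM],
                                   float out[OUT_DIM])
            {
                for (size_t j = 0; j < OUT_DIM; j++) {
                    float acc = b[j];
                    for (size_t i = 0; i < IN_DIM; i++)
                        acc += w[j][i] * in[i];
                    out[j] = acc > 0.0f ? acc : 0.0f;   /* ReLU */
                }
            }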

            Second, it doesn't follow that algorithms that are "not supervisable", in terms of their answers, therefore don't need an RTOS, if that's what you're saying. The RTOS exists to ensure that threads get their requisite amount of processing time & resources, so they maintain the necessary degree of responsiveness.
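
            As a rough illustration (Linux-flavored rather than a real RTOS, and just a skeleton): a control thread is typically a fixed-period loop woken on absolute deadlines, and the scheduler's job is to keep the wakeup latency bounded; the loop itself can at least detect when it slips.

            Code:
            #define _POSIX_C_SOURCE 200809L
            #include <stdio.h>
            #include <time.h>

            #define PERIOD_NS 10000000L          /* 10 ms control period */

            /* Fixed-rate loop using absolute wakeup times, so jitter doesn't accumulate.
             * An RTOS (or SCHED_FIFO under PREEMPT_RT) is what bounds the wakeup latency;
             * here we at least detect when a deadline slips. */
            int main(void)
            {
                struct timespec next, now;
                clock_gettime(CLOCK_MONOTONIC, &next);

                for (;;) {
                    /* ... read sensors, run one control step, write actuators ... */

                    next.tv_nsec += PERIOD_NS;
                    if (next.tv_nsec >= 1000000000L) { next.tv_nsec -= 1000000000L; next.tv_sec++; }
                    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);

                    clock_gettime(CLOCK_MONOTONIC, &now);
                    long long late_ns = (long long)(now.tv_sec - next.tv_sec) * 1000000000LL
                                      + (now.tv_nsec - next.tv_nsec);
                    if (late_ns > 1000000LL)     /* woke up more than 1 ms late */
                        fprintf(stderr, "deadline overrun: %lld ns late\n", late_ns);
                }
            }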

            As for what to do when your self-driving pilot produces some bad output or otherwise outright fails, there ought to be a lower-level obstacle avoidance system that tries to safely bring the vehicle to a stop. Even with redundancy, you still need that for an algorithm operating safety-critical machinery with an unconstrained set of inputs.

            Originally posted by discordian View Post
            (And of course, they want to run their hardware in some server racks for AI training and testing.)
            That's basically what I'm talking about. Maybe these chips aren't used directly for training, but at least for testing new algorithms and deep learning models.



            • #16
              Originally posted by coder View Post
              Wow. That's a lot.

              First, a deep learning network is basically a systolic processing graph, which typically can be evaluated in deterministic time. I presume there are some deep learning networks with feedback that can take a non-deterministic amount of time to converge, but that doesn't seem like a practical architecture for realtime control applications.

              Second, it doesn't follow that algorithms that are "not supervisable", in terms of their answers, therefore don't need an RTOS, if that's what you're saying. The RTOS exists to ensure that threads get their requisite amount of processing time & resources, so they maintain the necessary degree of responsiveness.
              The RTOS part is outside of the analysis and decision-making. What I mean by "not supervisable" (I am not a native English speaker, so I probably picked a bad word) is that there is no easy way to decide if the data coming out makes sense, other than running another instance and seeing whether they agree.
              Compare that to most math problems, where you can easily check if a solution is valid or highly likely to be valid; with machine learning you can make few assumptions about what you get as an answer, even if it's deterministic.

              Yeah, an RTOS would need to check whether the system responds and doesn't report some error it can self-diagnose. But the high-level stuff is pretty much a black box.

              Originally posted by coder View Post
              As for what to do when your self-driving pilot produces some bad output or otherwise outright fails, there ought to be a lower-level obstacle avoidance system that tries to safely bring the vehicle to a stop. Even with redundancy, you still need that for an algorithm operating safety-critical machinery with an unconstrained set of inputs.
              You won't be able to detect "bad output", in the sense that "bad decisions" are practically baked into the AI. You can only test that the system doesn't have operational hiccups like bad memory, overheating, or instability.
              The safety-critical part is then quite similar to classical cars: steering, brakes and so on are tested and redundant. No one is safe from a human driver making bad decisions; you could at most check some operational status like heart rate, sleep, or pulse.

              Originally posted by coder View Post
              That's basically what I'm talking about. Maybe these chips aren't used directly for training, but at least for testing new algorithms and deep learning models.
              I would expect those are the same chips. BTW, you can today use RT below Linux (Xenomai) or isolate cores and HW for RT (Jailhouse); in both cases Linux support would help.



              • #17
                Originally posted by discordian View Post
                What I mean by "not supervisable" (I am not a native English speaker, so I probably picked a bad word) is that there is no easy way to decide if the data coming out makes sense, other than running another instance and seeing whether they agree.
                Sure you can. You can run a collision-detection test to see if the control inputs from the algorithm are predicted to hit anything, and engage emergency measures if so. The test must necessarily be lower-level and more stable than the algorithm, obviously. That should also make it cheaper to do.
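
                As a sketch of what I mean (purely illustrative, with made-up types and numbers): the checker only needs a coarse occupancy grid and the commanded motion, not the planner's internals.

                Code:
                #include <stdbool.h>
                #include <math.h>

                #define GRID_W 64
                #define GRID_H 64
                #define CELL_M 0.5f              /* each cell covers 0.5 m x 0.5 m */

                /* Occupancy grid around the vehicle, filled by low-level sensor fusion. */
                typedef struct { bool occupied[GRID_H][GRID_W]; } grid_t;

                /* Roll the commanded speed/yaw rate forward for two seconds with a simple
                 * kinematic model; if the predicted path crosses an occupied cell, the
                 * supervisor vetoes the command and engages the emergency stop instead. */
                static bool command_safe(const grid_t *g, float speed_mps, float yaw_rate_rps)
                {
                    float x = GRID_W * CELL_M / 2, y = GRID_H * CELL_M / 2, yaw = 0.0f;

                    for (float t = 0.0f; t < 2.0f; t += 0.05f) {  /* 2 s lookahead, 50 ms steps */
                        x   += speed_mps * cosf(yaw) * 0.05f;
                        y   += speed_mps * sinf(yaw) * 0.05f;
                        yaw += yaw_rate_rps * 0.05f;

                        int cx = (int)(x / CELL_M), cy = (int)(y / CELL_M);
                        if (cx < 0 || cx >= GRID_W || cy < 0 || cy >= GRID_H)
                            break;                    /* left the supervised area */
                        if (g->occupied[cy][cx])
                            return false;             /* predicted collision: veto */
                    }
                    return true;
                }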

                Originally posted by discordian View Post
                with machine learning you can make few assumptions about what you get as an answer,
                If the inputs are unknown or the objective is unclear, then you can't evaluate the quality of its solution. However, certain inputs are knowable, as are certain constraints and when they're about to be violated.

                Originally posted by discordian View Post
                in both cases Linux support would help.
                I think whatever OS is hosting a VM with a safety-critical guest must also be certified for safety-critical applications, as a bug or failure in the host/hypervisor can invalidate the testing & assumptions made by the guest.



                • #18
                  Originally posted by coder View Post
                  Sure you can. You can run a collision-detection test to see if the control inputs from the algorithm are predicted to hit anything, and engage emergency measures if so. The test must necessarily be lower-level and more stable than the algorithm, obviously. That should also make it cheaper to do.
                  If you have some alternative algorithm, this helps a lot in shielding you from systematic errors. However, the safety concept is rather vague about such complicated algorithms; most of the time, certification requires you to prove you can detect "simple errors".
                  The highest cert levels still require stuff like a periodic RAM self-test: changing bits and then checking the whole memory to see whether other bits got flipped. Stuff that you can do with 32K of RAM.
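
                  For a flavour of what such a test looks like, here's a much-simplified sketch in C; real certified implementations use proper march algorithms (e.g. March C-) that also catch coupling between neighbouring cells, and run with interrupts masked:

                  Code:
                  #include <stdint.h>
                  #include <stdbool.h>

                  /* Very simplified periodic RAM self-test: for each word, save it, write two
                   * complementary patterns and read them back, then restore the original
                   * contents. This only catches stuck-at bits; see the note above about what
                   * real march tests additionally cover. */
                  static bool ram_selftest(volatile uint32_t *start, uint32_t words)
                  {
                      for (uint32_t i = 0; i < words; i++) {
                          uint32_t saved = start[i];

                          start[i] = 0x55555555u;
                          if (start[i] != 0x55555555u) return false;
                          start[i] = 0xAAAAAAAAu;
                          if (start[i] != 0xAAAAAAAAu) return false;

                          start[i] = saved;           /* restore before moving on */
                      }
                      return true;
                  }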

                  Or you have multiple, less reliable/safe systems and test whether they agree. I know of several industrial safety-relevant systems using Linux in their components; even if some are only SIL1, or just adhere to parts of it, the whole plant can be SIL2 because of redundancy.
                  Originally posted by coder View Post
                  If the inputs are unknown or the objective is unclear, then you can't evaluate the quality of its solution. However, certain inputs are knowable, as are certain constraints and when they're about to be violated.
                  Yeah, and to check the constraints you need similar hardware. AI is still new; I doubt there's a conclusion yet on how to tackle it in safety-critical applications. Tesla tries to argue that having gobs of recorded input lets them thoroughly test the trained AI.

                  Originally posted by coder View Post
                  I think whatever OS is hosting a VM with a safety-critical guest must also be certified for safety-critical applications, as a bug or failure in the host/hypervisor can invalidate the testing & assumptions made by the guest.
                  This block alone will never be safe; at best it's a matter of generating a response in a given timeframe. It can still be part of a safety-critical system, you would just have to argue how unlikely an operational fault is. That's the case with everything, down to the chance of getting flipped bits in a Cat cable.
                  discordian
                  Senior Member
                  Last edited by discordian; 15 January 2022, 12:45 PM.
