Linux Kernel Developers Debate Priority-Based Shutdown Support

  • #51
    Originally posted by billyswong
    Okay so we get many different types of "poweroff" now.

    1. Normal shutdown
    2. Abnormal shutdown, with proper electrical power available to the memory chip
    3. Abnormal shutdown, without proper electrical power available to the memory chip
    4. Normal power cycling, without proper electrical power available to the memory chip

    The kernel code merge request is about case 3 & 4. The CNC machine error we are discussing is case 2 & 3.
    The CNC emergency shutdown can in fact be case 2, 3 or 4, depending on what happened.

    You have power for now, but it is finite because you are on a UPS. You cannot be sure the UPS itself is undamaged when a CNC machine has gone wrong, so you cannot be sure you have the full UPS capacity. So you have to be as power-frugal as you possibly can while performing the system shutdown and recording the final data. Being power-frugal sometimes still means leaving displays on, even though that costs power, so that a human does not move closer to a machine that is now going out of control. When a screen goes black, human first nature for some reason is to move towards it, and with a failing CNC machine that is, depending on where the screen is placed, either exactly the wrong direction or exactly the right direction.

    The kernel merge request does overlap with the hell-hit-the-fan CNC abnormal shutdown, that is, a major CNC failure that damaged the controller hardware as well as parts of the UPS.

    In some cases a CNC emergency shutdown will be closer to case 4; it depends on the design of the Linux CNC control software.

    So it's not just the memory chip.

    CNC emergency shutdown is a particularly horrible case.
    1) You really do want to configure the shutdown order so that you kill displays/interfaces in the right order to move the human the right way.
    2) You don't want to waste power during shutdown in case you don't have as much backup power as you were projecting. Again, you want to be able to tune the order of shutdown.
    3) If possible, you want the final actions the machine was performing before the emergency shutdown started to be stored on media or transmitted out (this does depend on how dangerous an operation the CNC machine is doing).

    There is more to making a CNC emergency shutdown as safe as possible than most would presume, and it is not as simple as just cutting all power.

    Being able to configure, in something like a device tree file, the order in which the system will be taken offline would suit the CNC case well. I suspect there are other cases where controlling the hardware shutdown order would be useful.
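To make the idea of a configurable shutdown order concrete, here is a minimal sketch in Python. It is not the kernel patch under discussion: the `Device` class, the `priority` field, the "higher number stops first" convention, and the device names are all hypothetical, chosen only to illustrate ordering by a per-device setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    priority: int  # hypothetical convention: higher value is shut down earlier


def shutdown_order(devices):
    """Return device names in the order they should be powered down."""
    # sorted() is stable, so devices with equal priority keep their listed order.
    return [d.name for d in sorted(devices, key=lambda d: -d.priority)]


# Illustrative CNC-style configuration (all names and numbers are made up).
devices = [
    Device("log-storage", 10),
    Device("spindle-drive", 100),
    Device("operator-display", 0),
    Device("network", 50),
]

print(shutdown_order(devices))
```

In this made-up configuration the power-hungry spindle drive stops first and the operator display stays on longest, which matches points 1) and 2) above. A real system would read such priorities from its hardware description instead of a Python list.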



    • #52
      oiaohm Nothing I read in the mailing list suggested a desire to save state. Rather, they have X00,000 eMMC chips deployed in the field that have a small probability of bricking themselves on power loss, if not prepared in just the right way (which is something more than sync & unmount of all filesystems). Perhaps this failure mode was not discovered until N=X00,000 parts were exposed to multiple power cycles a day.

      Maybe the hardware shouldn't have this problem. Perhaps it could be fixed if whatever computer this goes into was powered by a separate battery independent of the motor start battery. Maybe it would've been fine if some power sequencing requirement had been more clearly documented in the eMMC datasheet. But, "shoulda, coulda, woulda", and replacing some PCB in N=X00,000 vehicles is massively more costly than distributing a software update. And even if this particular problem is known now, there will always be more hardware bugs. Even NASA deploys software workarounds! "Fix it in hardware," will never happen.

      If anyone is tempted to reply with something like, "the cost doesn't matter. It's just a megacorp being cheap," think for a moment on why cars cost many thousands of dollars. Hint: the answer is not and does not contain either the word "capitalism", or "greed".



      • #53
        Originally posted by Volker Schmidt
        And that is where I come together with Greg from another perspective. Use the right tool for the right problem, and sorry, Linux is not the right tool for such systems. I get it that in, e.g., the automotive area they love to use it for infotainment, where it is a good fit, but coupling this with critical functions in that environment is asking for trouble in the long run.
        You can't use it "just" for infotainment. Even if the machine is fully segregated from anything safety critical, that just means nobody dies when it fails. It doesn't mean the owner of the car is happy or your brand reputation is intact. My mother owns a Mazda, and it has seen three power window motor failures. Two on the same window. Think I'm ever going to buy a Mazda?

        Unreliability is only acceptable when there is a ready supply of spares and failover is cheap. (Hard drives in a large array, for example.)



        • #54
          Originally posted by yump
          oiaohm Nothing I read in the mailing list suggested a desire to save state. Rather, they have X00,000 eMMC chips deployed in the field that have a small probability of bricking themselves on power loss,
          Here is a solution so you can turn your Raspberry Pi handheld computer on and off from the power plug (power cycle it) just like any other appliance in your home! And it can all be done through only one setting. With a normally set-up Raspberry Pi, power cycling can cause your Micro-SD card to become corrupted. Thus, safely shutting down via the operating system or a safe shutdown button is crucial. Let's be clear: when you safely shut down the Raspberry Pi, it will first check that it has completely stopped writing new information to the Micro-SD card, and once the Pi is sure, it will then stop supplying power to its circuit board.

          There are situations, however, where it is inconvenient to turn the device off safely every time (be it due to location or time constraints). So, knowing that Micro-SD card corruption only occurs when you turn off the Raspberry Pi in the instant that it is writing new information to the Micro-SD card, let's completely stop the ability for the Raspberry Pi to write new files to the Micro-SD card. Then corruption will never occur and you can be free to power cycle it whenever you desire! In this guide, we are making our Raspberry Pi Palm-Sized Computer Read-Only. Note that this is also a reversible process. Below are the contents of this guide.

          - What You Need
          - Overview of Process
          - Demonstration
          - How to Revert Process
          - Situations Where This Is Perfect

          This set-up is perfect for kiosks, multi-use work terminals, educators managing a classroom worth of Raspberry Pi boards, and completely finished projects that you want the ease of power cycling. With no concern of corruption, your Micro-SD card will be able to run its natural life, which should be 10+ years.

          There are going to be some concessions. Once you have turned your Raspberry Pi into Read-Only, no changes (be it creating new code, changing files, deleting directories) will be remembered next time you turn on the Raspberry Pi.
          For instance, if you create a directory on the desktop of a Read-Only Raspberry Pi, when it restarts that folder will have disappeared. This is because we have completely disabled the write function of the Raspberry Pi. All these created files on a Read-Only Raspberry Pi are going to be temporarily stored in the RAM. This means if you create or try to download files larger than the RAM available, the system OS won't allow it (or be able to do it). Data can be pulled from a Read-Only Raspberry Pi via USB drive or in any other normal manner. Crontab (a method of getting the software to run on boot) and time synchronisation (pulling the time information from the internet) will also work perfectly fine with a Read-Only Raspberry Pi. As always, if you have any questions, queries, or things to add, please let us know your thoughts!

          What You Need

          Below is everything you need to set up your Raspberry Pi to be Read-Only. No soldering or extra hardware is needed to make a safe power cycling Raspberry Pi. This process can also be done headless, meaning you wouldn't need to use peripherals; check this guide on how.

          - Raspberry Pi Palm-Sized Computer (this process will work with all varieties)
          - Micro SD Card flashed with most recent Raspberry Pi OS (a quick how-to linked here)
          - Power Supply
          - Monitor
          - HDMI Cord
          - Mouse and Keyboard

          Overview of Process

          So, with your Raspberry Pi set up with all your settings and data exactly how you want, let's effectively take a permanent snapshot of this moment (which the system will revert back to whenever it is rebooted) by turning the Raspberry Pi to Read-Only. The steps are as follows (but first, for real, back up your Micro-SD card; super really, back up that card). With Raspberry Pi OS displaying like normal, open up a new terminal using the black button at the top left of the screen. The method demonstrated from here on out will be the exact same process whether you are accessing the Raspberry Pi headlessly or directly.
          In this new terminal, type and enter the line below. This will open up the configuration menu of the Raspberry Pi. See further below for an image of this line written into the terminal. This line starts with | sudo |, which means it will run the following with admin privileges.

              sudo raspi-config

          The menu will look like the image below; you can navigate this space using your keyboard and select options by pressing enter. Old-school graphics, but super incredible and useful. Navigate down to Performance Options and press enter.

          After pressing enter on that option it will look like the image below, presenting a couple of other options. On this page you are going to navigate down to the | Overlay File System |. This is where you are going to enable or disable the Read-Only file system. Press enter on this setting.

          Having done this, you will see two things happen one after the other. First, it will bounce back to the black terminal for less than 30 seconds to update the normal root files. As it is doing that, the terminal will display exactly like the image below.

          Then it will jump back to the configuration menu, display the below screen, and present the message | The overlay file system is enabled | to the user. This step can be seen in the image below.

          Then, after pressing enter on that screen, it will show a new message, which can be seen in the below image. Make sure to say Yes here, as this will make the boot partition (the Micro-SD Card) write-protected. If the disk is write-protected, then it is Read-Only, and that is exactly what we want to have happen.

          Having pressed enter on Yes, it displayed the below page. Then, on rebooting the system, my Raspberry Pi 4 Model B is completely in Read-Only Mode. Thus no new permanent information can be written to the Micro-SD card ever.

          Demonstration

          I've created a new directory on the desktop (called in this example 'New Annoying Folder'). See below in the image for this happening.
          To double clarify, I am creating this new folder on the desktop after I have rebooted the Raspberry Pi OS. So right now the Micro-SD card is Read-Only. Just because it is Read-Only doesn't mean you can't alter and create new files; they just will not be saved upon reboot.

          Now that this has happened, let's reboot the system. Knowing that the setup is Read-Only, we would expect that after a reboot the file would have disappeared. As we can see in the image below, this is exactly what has happened!

          How to Revert the Process

          You can reverse this process nicely and easily, but it will require two reboots.

          Simply, you can go through the same process to revert your Raspberry Pi from Read-Only back to the default Read and Write. The setting | Overlay File System | acts as a toggle, so once you navigate back to this setting using the Raspberry Pi Configuration Menu like before, you can simply toggle Write On and Off. Remember, though, it will take a reboot to disable the overlay file system and then another reboot for the boot partition to become write-enabled. Each reboot will give the setting a chance to stick (so don't do any important work until you do make your system Read and Write, lest your data be lost). See the images below for each step of reverting the process; each option highlighted is the one to do. This process has been compressed into a single image, as it is very similar to before (right click and open image in a new window if the writing is too small to see).

          Part of the reason why the naming is a little confusing and two reboots are required (I believe) is that it is done on purpose to prevent young nefarious agents reverting it back to normal Read and Write and then causing mischief. Worth noting: another method to turn the Raspberry Pi OS back to Read and Write is simply re-flashing the Raspberry Pi 4 Model B Micro-SD Card.
          Situations Where This Is Perfect

          Kiosk Applications and Digital Signage are often not properly shut down but instead simply unplugged at the end of the day. The same goes for Video Looping Machines. Repeating this increases the risk of file system corruption.

          This is also very valuable for educators. Say you have pre-loaded files and settings on a whole bunch of Raspberry Pis. That way, when your students come in and do the course work, they do not need to jump through the preliminary process of doing those settings and installing the files for the course. Thus, when the students arrive they can get right into the content that the educator wants to teach. But the fear of every straight-edged educator (and rightfully so) is that those curious students will start downloading all kinds of things, mess around with the settings, download games, remove the course work, and cause all kinds of mischief. Then, after the lesson, damage control would need to occur. You would then need to go through the process of wiping the Micro-SD card, fixing the settings, and re-installing the desired coursework ready for the next class, for each of the Raspberry Pi boards. And that can become a serious investment of time for 20+ Micro-SD Cards (we know this and even created a Mass SD Card Image Writer to make this task faster).

          What if instead, each time you power cycled your Raspberry Pi, the data reverted back to that point at the start of the class? All that mischief and those setting changes will be forgotten and you'll be back at your original custom set-up. And that is exactly what a Read-Only Raspberry Pi can do for you.

          Another scenario where you would want this is when you have fully completed a project and you want to turn it on and off from the wall switch or via an external timer. By converting the Raspberry Pi into Read-Only, it will never corrupt and your system will work perfectly (or at least until the natural lifespan of an SD card, which is around 10 years).
          Also, say you are operating in an environment with intermittent power and the power shuts down on you unawares (and you haven't got a brilliant UPS hat like the one from PiJuice); then those lingering half-written files can render the SD card completely unbootable. There are apparently methods to perhaps patch up the corruption, but sometimes there is no recourse but to whip out the card and reinstall everything.

          And to those who say "My project doesn't even write any data": it may be true that your application or program does not write any data, but your operating system (Raspberry Pi OS) or GUI (such as the Chromium or Firefox browser) certainly does. Constantly during operation, they will be writing temporary files, log files, cache files, etc. There is a lot going on under the hood, and this is what eventually causes your SD card to corrupt in the mid-to-long run of not safely shutting down. So why not prevent it altogether?
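One way to confirm that the overlay setting took effect after the reboot is to look at what is mounted at `/`. The sketch below assumes, without having verified it on every Raspberry Pi OS release, that the overlay option shows up as a root mount of filesystem type `overlay` in `/proc/mounts`. The function name and the sample mount lines are illustrative, not captured from a real device.

```python
def root_fs_type(mounts_text):
    """Return the filesystem type mounted at '/', given /proc/mounts-style text."""
    fs_type = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line is: device mountpoint fstype options dump pass
        if len(fields) >= 3 and fields[1] == "/":
            fs_type = fields[2]  # a later mount on '/' shadows an earlier one
    return fs_type


# Illustrative samples only; real output will differ in devices and options.
SAMPLE_NORMAL = "/dev/mmcblk0p2 / ext4 rw,noatime 0 0\nproc /proc proc rw 0 0\n"
SAMPLE_OVERLAY = (
    "/dev/mmcblk0p2 /media/root-ro ext4 ro,relatime 0 0\n"
    "overlayroot / overlay rw,relatime 0 0\n"
)

print(root_fs_type(SAMPLE_NORMAL))   # ext4
print(root_fs_type(SAMPLE_OVERLAY))  # overlay
```

On a live system the same function can be fed the contents of `/proc/mounts`, for example `root_fs_type(open("/proc/mounts").read())`.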

          eMMC losing data because it lost power at just the wrong moment is exactly to specification. Yes, people learn this with microSD cards and the Raspberry Pi all the time.

          The small probability is not that small. If you happen to be writing to the eMMC when it loses power, you basically have a 100 percent chance of corruption because of how the wear leveling works, and it's in the specifications.

          Yes, with CNC machines we want the data we are logging to go onto storage while that storage still has power to store the data correctly.
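On the software side, "get the log onto storage while it still has power" usually starts with forcing each record out of the application and OS buffers. The following is a minimal Python sketch, with a hypothetical `append_durably` helper and file path. Note what it does not do: `os.fsync` only asks the operating system to push the data to the device, so it cannot protect a card whose controller loses power mid-write, which is the failure discussed above.

```python
import os


def append_durably(path, record):
    """Append `record` (bytes) to `path` and ask the OS to flush it to the device."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # move data from Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write its cached data to the device


if __name__ == "__main__":
    # Hypothetical log file name, for illustration only.
    append_durably("cnc-last-actions.log", b"spindle stop requested\n")
```

Calling this once per log record trades throughput for a smaller window of unwritten data, which is usually the right trade for "last actions before failure" logs.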

          Also, the other thing is that eMMC has a limited number of write cycles. There is a common problem with particular cars where, at around 5 to 6 years old, the eMMC in the engine management unit gives up the ghost, and the owner is up for a 4 to 5 thousand dollar replacement bill.

          eMMC/NVMe or any other storage device with backup capacitors costs more than the same device without them.

          Here is the extra kicker. You know how M.2 NVMe drives have backup capacitors so that an instant power-out does not leave them in a horrible state? Those capacitors could have given up the ghost in 4 to 5 years. A car can be on the road for over 20 years. So the company spends more on eMMC with capacitors and, in time, still ends up back in the same place: needing to save early, because the power support hardware integrated with the eMMC has failed.

          Sod's law. Every power protection circuit you can design so that old-school spinning rust or new solid-state drives get put into a good state can itself fail to function correctly.

          In a lot of ways you want your power support hardware as its own replaceable part. You also want to be able to shut down in the right order to give the UPS hardware more room to fail, so that the data devices you are recording to can complete their recording process before power goes away.

          Yes, this right order does not always mean the data storage device is first in the shutdown queue, because you might want to shut down some massively power-hungry part first to make the finite power protection/UPS supply last longer.
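The point about ordering and a finite supply can be checked with simple arithmetic. The sketch below assumes devices are stopped one after another and that everything still running keeps drawing power in the meantime. The function name, the wattages, and the stop times are invented for illustration and do not come from any real machine.

```python
def energy_until_stopped(order, target, draw_watts, stop_seconds):
    """Joules consumed from the backup supply until `target` has finished stopping.

    Devices stop sequentially in `order`; while one is stopping, it and every
    device after it in the order are still drawing power.
    """
    running = dict(draw_watts)
    joules = 0.0
    for name in order:
        joules += sum(running.values()) * stop_seconds[name]
        del running[name]
        if name == target:
            return joules
    raise ValueError(f"{target!r} is not in the shutdown order")


# Made-up numbers: a power-hungry spindle, a slow-to-flush storage device, a display.
draw = {"spindle": 400, "storage": 5, "display": 20}      # watts
stop = {"spindle": 0.5, "storage": 2.0, "display": 0.1}   # seconds to stop

storage_first = energy_until_stopped(["storage", "spindle", "display"], "storage", draw, stop)
spindle_first = energy_until_stopped(["spindle", "storage", "display"], "storage", draw, stop)
print(storage_first, spindle_first)  # 850.0 262.5
```

With these invented figures, stopping storage first costs 850 J before the log is safe, while stopping the spindle first costs 262.5 J. If the damaged UPS only had 500 J left, only the second order would get the data written.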

          yump, here is something to be aware of: there are milling machines over 200 years old still in use that have been modernized into CNC machines. When control parts could be in place for 50 to 60 years, circuit old-age issues are a real factor. CNC hardware makes car lifetimes look like minor problems. How long these machines are going to remain in active use is a factor in what design solutions you can use.

          Yes, "just fix it in hardware" normally does not work with CNC, because a CNC machine will in a lot of cases be in deployment long enough for supercapacitors and batteries to start failing.



          • #55
            Well, the capacitor limitation is partially why many consumer computer components have switched to "solid state" capacitors. No, they don't have infinite lifetime. But they can almost always live longer than the NAND flash write cycles. So we don't have to face "backup capacitor" failure before we replace the out-of-life SSD, and the capacitors get replaced together with it as a package. It is the raw eMMC / UFS / SD cards that are the problem. Their design / form factor can't carry their own capacitors to be thrown away together with them.

            But it is not the fault of those cards! They were designed for battery-powered consumer handheld devices, not industrial CNC machines that need fault-tolerance. We are facing wrong tools for the wrong jobs.



            • #56
              Originally posted by billyswong
              It is the raw eMMC / UFS / SD cards that are the problem. Their design / form factor can't carry their own capacitors to be thrown away together with them.
              There is something you missed.

              Originally posted by billyswong
              But it is not the fault of those cards! They were designed for battery-powered consumer handheld devices, not industrial CNC machines that need fault-tolerance. We are facing wrong tools for the wrong jobs.
              There is something more a CNC machine needs besides fault tolerance: removable storage, so that logs can be checked before power is applied after a failure. Having the logs of the last actions the machine was performing lets you double-check that you have inspected everywhere that needs inspecting. It could be that a fragment has gone off in an odd direction because of what the machine was doing and is now in the UPS; if you have not spotted this when you apply mains again, you now have an explosion and a lot of other things go wrong.

              When a CNC machine has gone wrong, you want to check the logs of what was happening with the machine before you power it back on.

              Originally posted by billyswong
              But they can almost always live longer than the NAND flash write cycles.
              I like the "almost always". Solid state capacitors have less thermal tolerance than NAND flash. This kind of explains why SD cards don't have capacitors. Your big SD cards do have enough room for capacitors, and the original patent design of the SD card included capacitors; durability testing and cost are why they are not there. If you leave an SD card sitting on the dash of your car and it had capacitors, then no matter the type of capacitor, the capacitors would be the first bit to fail.

              This leads to an interesting problem, right? To have the strongest removable storage device, you give up on capacitors/UPS circuits in it.

              Yes, they did not include a power-loss pin in the NAND flash controller design for controllers not directly connected to capacitors. There is a design fault here, but it is really one we have to live with.

              As I said, you really need the power backup circuit/UPS and the storage independent of each other to make the strongest device. This is about making the strongest device. Yes, a communication wire between the two halves would have been great: a wire connected to all storage devices that signals, by no longer carrying voltage, that system power is no longer good. But we don't have that. With such a wire, devices like SD cards could alter their writing.

              The reality here is that we have to work with the broken hardware designs we have. Yes, I know it is like attempting to make a silk purse out of a pig's ear.

              Before you say SD cards are totally wrong again, I will tell you what you find in old CNC machines for removable log storage that is totally wrong. Old CNC machines' removable log storage is a floppy disc, normally a 3 1/2 inch, but you can find 5 1/4s. That's right: running a floppy drive in a fine metal dust environment, resulting in either the heads of the drive being sanded out of existence or the surface of the floppy disc being sanded out of existence. So you have no logs and have to start the machine up basically praying that the inspection instructions had you inspecting the right areas, because you have no record of what the machine was doing.

              Many countries' OHS laws require that the removable CNC logs be readable by general consumer hardware, which leads to very stupid things.



              • #57
                From data I can find, consumer-grade NAND flash operates only up to 70 °C, while industrial-grade NAND flash operates only up to 85 °C. Meanwhile, solid state capacitors are said to have their lifetime shortened to that of an electrolytic capacitor when the temperature reaches 105 °C.

                oiaohm I need stronger proof of your claim that solid state capacitors are weaker than NAND flash in high-temperature endurance. If one only cares about "can the NAND flash still be used after being toasted", then maybe yes, the capacitor may be broken after extremely high temperature while the NAND flash is still "usable". But the data within the flash may have evaporated long before the capacitor is boiled to death.



                • #58
                  Originally posted by billyswong
                  From data I can find, consumer-grade NAND flash operates only up to 70 °C, while industrial-grade NAND flash operates only up to 85 °C. Meanwhile, solid state capacitors are said to have their lifetime shortened to that of an electrolytic capacitor when the temperature reaches 105 °C.
                  I should have been more exact about "thermal tolerance". Yes, a solid state capacitor has a higher temperature limit than NAND. But what I am talking about with thermal tolerance is thermal shock cracking. You don't need to get that hot. It's how well the device tolerates thermal change.

                  NASA in 2007 found that solid capacitors could have a functional life of fewer than 1000 thermal cycles of heating up and cooling down if the change happens rapidly. A rapid change between 60 and 25 °C is enough to cause cracking: hot car into air-conditioned office. Those making SD cards found the same thing.

                  It's one of those fun things. Solid state capacitors can operate at a higher rated temperature than NAND but cannot take the same speed of thermal change.

                  With removable storage, one of the problems you have is rapid thermal change.

                  I was referring to thermal tolerance inside temperatures a human can handle. Normally you don't let removable storage get to 44 °C; 60 °C is in the abnormal range.

                  Another problem here is how fast a human can basically heatsink 44 °C down to body temperature. There is a level of thermal shock tolerance you need if something is going to be a small removable storage device.

                  This is where this gets to be a problem so fast. SD cards don't have capacitors because of the thermal shock problem. This is not heating a capacitor to the point that the heat kills it; this is the speed of change stressing the part, resulting in cracking that then leads to failure.

                  This solid state capacitor cracking is what makes us want software that can correctly manage SD cards and other small, human-handled storage devices in case of power failure.

                  Yes, it is a funny one that the solid state capacitor on your motherboard/M.2 is mostly fine, because you normally don't touch it, so it does not cool rapidly because you are not connecting a human heatsink. But if you put a solid state capacitor in an SD card, every time a human pulls it out it will cool rapidly, and you have fewer of these cycles than write cycles.

                  Yes, this is a true devil in the details.
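To put the thermal cycling figure in perspective, here is a back-of-the-envelope calculation. It takes the "fewer than 1000 rapid thermal cycles" number quoted above as an upper bound and assumes, purely for illustration, that each removal of the card counts as one rapid cycle. The removal rates are hypothetical.

```python
def years_until_cycle_limit(cycle_limit, cycles_per_day):
    """Years of service before `cycle_limit` rapid thermal cycles are used up."""
    return cycle_limit / cycles_per_day / 365


# 1000 cycles is the figure from the post; removals per day are assumptions.
daily = years_until_cycle_limit(1000, 1)        # card pulled once a day
weekly = years_until_cycle_limit(1000, 1 / 7)   # card pulled once a week
print(round(daily, 1), round(weekly, 1))  # 2.7 19.2
```

Under these assumptions a card pulled daily for log collection would exhaust the cycle budget in under three years, which is short next to the machine lifetimes mentioned earlier in the thread.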



                  • #59
                    Interesting information. But we don't need to put machine logs onto SD cards. If the requirement is only "readable by general consumer hardware", any SATA SSD will do. Hot-swappable drive bays have existed since before we had SSDs. As long as we don't pull it in/out in routine use and only inspect it off the machine in special cases, there will be no human temperature factor for that SATA drive. So, as said before, we are probably "facing wrong tools for the wrong jobs."
                    Last edited by billyswong; 01 December 2023, 05:04 AM.



                    • #60
                      Originally posted by billyswong
                      Interesting information. But we don't need to put machine logs onto SD cards. If the requirement is only "readable by general consumer hardware", any SATA SSD will do. Hot-swappable drive bays have existed since before we had SSDs. As long as we don't pull it in/out in routine use and only inspect it off the machine in special cases, there will be no human temperature factor for that SATA drive. So, as said before, we are probably "facing wrong tools for the wrong jobs."
                      SD cards appeared in 1999. SATA appeared in the year 2000. Your early SATA drives do not tolerate hotswap, and not all SATA controllers tolerate hotswap. Yes, this is fun: hotswap a SATA drive on a SATA controller that does not support hotswap and you can have bricked the controller for good. The SD card standard from day one says that as long as you do the safe-remove procedure, hotswap is functional. You are not going to brick an SD card controller by hotswapping. By the SD card's design, the most you might brick is a card that was not in a state to be removed.

                      There is another human factor that comes into play with the first SATA drives. Think of a CNC floor with 100+ machines where you are going around collecting and replacing the log storage. 100 floppies one human can carry, no problem. For 100 old spinning-rust SATA hard drives you are going to need a cart, and the drives are not going to like it. 100 SD cards do not take up much space at all.

                      Please note I agree that this is kind of wrong tools for the wrong job. But we have a problem: in a lot of ways the SD card is the best tool we have where the user will not do something too bad to the device, and the user/manager can trust that they will not. Breaking a consumable SD card you can live with: minor downtime as you replace the card. Breaking the PCIe or SATA controller in the machine, and so requiring a new, highly expensive custom controller board that will take 6 to 8 months to source, you cannot live with, just because you attempted to hotswap on a system that did not support hotswap SATA or NVMe. Human error will strike here: if you have one machine out of 20-30 with a controller that cannot tolerate hotswap, you can bet that at some point some human will incorrectly swap it if you are swapping those devices.

                      I personally would like to see an extra wire added to SD cards, so that you could put capacitor power support on the SD card power rails and have a pin fall low if system power has been disrupted, triggering the SD controller to save its state and journal the currently queued operations to storage so that the SD card can come up in a non-broken state.

                      What I am talking about is being able to place power support capacitors on the socket side of the SD card, which we currently cannot do. Capacitors placed in that location are not exposed to the human problem. This would allow keeping the storage device compact and tolerant of human handling. The problem is we don't have SD cards designed this way.

                      As you said, the SD card design presumed you are on battery or something equivalent, where power is not going to pull the rapid disappearing act. With a few minor alterations to the SD card socket and card design, it could be made tolerant of unexpected power loss without losing the SD card's key features of compact size and tolerance of human handling.

                      Wrong tool, wrong job is horribly common. There is also the problem you hit with CNC machines using SD cards: back in the day there was no ideal part, the closest one was chosen, and now that imperfect choice has become the standard we have to live with. The horrible part is that it is not uncommon for milling/CNC machines to stay in operation for more than 50 years. So those choices from 1999 are going to be around until 2049 at minimum.
