Linus Torvalds Injects Tabs To Thwart Kconfig Parsers Not Correctly Handling Them

  • #41
    Originally posted by ddriver



    My configuration API is model driven and binary, it gives you introspection, type safety and real time verification to define the configuration. And yes, for legacy fetish dinosaurs, it does support parsing and generating text from the model as a fallback.

    But get with the times already, and I don't mean you, I mean "contemporary software". Relying on text for critical stuff - that's stone age technology at this point...
    I will quote from "The Art of Unix Programming", Chapter 10, Section 3:

    In Chapter 5 we described a somewhat different set of design rules for textual data-file formats, and discussed how to optimize for different weightings of interoperability, transparency and transaction economy. Run-control files are typically only read once at program startup and not written; economy is therefore usually not a major concern. Interoperability and transparency both push us toward textual formats designed to be read by human beings and modified with an ordinary text editor.
    It isn't a law, as such, but binary files, however useful, are not easily readable by humans (perhaps I am generalising inappropriately from my inability to read them efficiently). A lot of people don't think this is important, but many do. Given that the processing and storage capacity of computers has increased somewhat over the years, the performance benefits of binary format configuration files seem to be ever-closer to minimal.

    • #42
      Originally posted by Old Grouch

      I will quote from "The Art of Unix Programming", Chapter 10, Section 3:



      It isn't a law, as such, but binary files, however useful, are not easily readable by humans (perhaps I am generalising inappropriately from my inability to read them efficiently). A lot of people don't think this is important, but many do. Given that the processing and storage capacity of computers has increased somewhat over the years, the performance benefits of binary format configuration files seem to be ever-closer to minimal.
      Pardon me, I was so wrong on that "stuck in the 80s". That would actually be "stuck in the 60s".

      Having the configuration defined as a model allows you to view and edit it in human-readable form while it stays in binary format. And you don't even need a text editor: just run the application in configuration mode. Plus the type safety and input validation and whatnot. And it actually shows you the configuration tree, so you don't have to know it by heart or look it up elsewhere.

      How is dumb old text better?

      Obviously, it is not just about performance. In fact, the starting point was critical safety. But still, the performance is appreciated.

      We have a serious legacy issue with software, but you, being an old-timer, might have grown into it. Most frameworks still serialize to big-endian, even though almost all code executed today runs on little-endian CPUs. This means all saved and loaded data is being converted, each and every time.

      My approach is "keep native endianness and only do conversion if necessary". There's absolutely no good reason for a little-endian platform to flip bytes each and every time data is read or written; it is just crusty old notions still being dragged along in times when they are no longer applicable.
      Last edited by ddriver; 17 April 2024, 07:51 AM.
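
A minimal Python sketch of keeping data in native byte order and converting only when necessary. The one-byte order marker and the `serialize`/`deserialize` names are assumptions made up for this illustration, not ddriver's format; `array.byteswap()` and `sys.byteorder` are standard-library features.

```python
import sys
from array import array

# Hypothetical container layout, assumed for this sketch only:
# byte 0 is a byte-order marker (b"L" = little-endian, b"B" = big-endian),
# followed by packed unsigned 32-bit integers.
ORDER_MARKER = {"little": b"L", "big": b"B"}


def serialize(values: list[int], byteorder: str = sys.byteorder) -> bytes:
    """Pack values in the requested order (host order by default) and tag that order."""
    items = array("I", values)
    assert items.itemsize == 4, "this sketch assumes a 4-byte C unsigned int"
    if byteorder != sys.byteorder:
        items.byteswap()  # swap only when writing a non-native order
    return ORDER_MARKER[byteorder] + items.tobytes()


def deserialize(data: bytes) -> tuple[list[int], bool]:
    """Unpack values and report whether a byte swap was needed."""
    stored = "little" if data[:1] == b"L" else "big"
    items = array("I")
    items.frombytes(data[1:])
    swapped = stored != sys.byteorder
    if swapped:
        items.byteswap()  # convert only when the stored order differs from the host's
    return items.tolist(), swapped


if __name__ == "__main__":
    native = serialize([1, 0x12345678])
    print(deserialize(native))           # ([1, 305419896], False)
    print(serialize([1], "big").hex())   # 4200000001
```

A file read back on a host with the same byte order as the writer needs no swap at all; a file in the other order is still readable, at the cost of one `byteswap()` pass.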

      • #43
        Originally posted by ddriver

        Pardon me, I was so wrong on that "stuck in the 80s". That would actually be "stuck in the 60s".

        Having the configuration defined as a model allows you to view and edit it in human-readable form while it stays in binary format. And you don't even need a text editor: just run the application in configuration mode. Plus the type safety and input validation and whatnot. And it actually shows you the configuration tree, so you don't have to know it by heart or look it up elsewhere.

        How is dumb old text better?


        Obviously, it is not just about performance. In fact, the starting point was critical safety. But still, the performance is appreciated.

        We have a serious legacy issue with software, but you, being an old-timer, might have grown into it. Most frameworks still serialize to big-endian, even though almost all code executed today runs on little-endian CPUs. This means all saved and loaded data is being converted, each and every time.

        My approach is "keep native endianness and only do conversion if necessary". There's absolutely no good reason for a little-endian platform to flip bytes each and every time data is read or written; it is just crusty old notions still being dragged along in times when they are no longer applicable.
        Sometimes it can be useful to read a configuration file (or analyse a log) when the original application is not available, or not working properly.

        I take your point about "type safety and input validation and whatnot." I will admit to having had my frustrations with subtly malformed config files, but conversely, a subtly corrupted binary config file risks being unreadable and unrecoverable. Swings and roundabouts. A good (text) config file parser in application startup will flag errors in a sensible way. This can be done with binary, but again, rectifying problems becomes an issue. I don't really want to resort to hex editors.

        As for "And it actually shows you the configuration tree", that depends on the application not lying to you, which some do by design and some by accident. I've had to deal with 'hidden' values in binary config files before.

        It is also beneficial that text-based files are closer to being self-documenting. It is obviously true that a text-based file can be opaque, and look like a random selection of letters, but giving meaningful names to parameters and using comments makes interpretation so much easier: much like debugging programs with debug symbols, or not.

        I like my crufty old config files, with liberal comments, but as you say, I'm probably an old stick-in-the-mud (growing up with stuff means you get used to foibles that are unacceptable to people encountering them for the first time). If the rest of your application is so good that the best thing you can do is optimize the config files, then I'm in awe.
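
To make the point about a good text parser concrete, here is a minimal Python sketch of a commented `key = value` format that reports every error with its line number instead of stopping at the first one. The option names in `SCHEMA` are hypothetical, and treating everything after `#` as a comment means values cannot contain `#`; a real format would need an escaping rule.

```python
from dataclasses import dataclass

# Hypothetical schema for this sketch: option name -> expected type.
SCHEMA = {"port": int, "hostname": str, "verbose": bool}


@dataclass
class ConfigError:
    line: int
    message: str


def parse_bool(text: str) -> bool:
    lowered = text.lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


def parse_config(text: str) -> tuple[dict, list[ConfigError]]:
    """Parse 'key = value' lines, skipping '#' comments; collect every error with its line number."""
    values: dict = {}
    errors: list[ConfigError] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # strip() removes tabs as well as spaces
        if not line:
            continue
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep:
            errors.append(ConfigError(lineno, f"expected 'key = value', got {line!r}"))
        elif key not in SCHEMA:
            errors.append(ConfigError(lineno, f"unknown option {key!r}"))
        else:
            convert = parse_bool if SCHEMA[key] is bool else SCHEMA[key]
            try:
                values[key] = convert(value)
            except ValueError as exc:
                errors.append(ConfigError(lineno, f"{key}: {exc}"))
    return values, errors


if __name__ == "__main__":
    sample = "# web server\nport = 8080\t# default\nhostname = example\nverbose = maybe\ncolour = blue\n"
    values, errors = parse_config(sample)
    print(values)  # {'port': 8080, 'hostname': 'example'}
    for err in errors:
        print(f"line {err.line}: {err.message}")
```

Collecting all errors in one pass means a user fixing a hand-edited file sees every problem at once, each tied to the line where it occurs.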

        • #44
          I've been using spaces, because the unfortunate fact is that spaces won.

          But deep down I'm a tabs person, as tabs are simply superior in most ways. This is because they carry more semantic value.

          Aligning stuff with spaces is like centring a line with spaces in a text editor.
          It works, but you lose the semantic value of "this should be centred".

          If something needs to be aligned, it should be aligned with tabs. Spaces are to separate words. Tabs are for alignment.
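
The distinction between separating words and aligning text is also where the failure mode in the thread title lives: a tool that treats only the space character as a separator misreads a line as soon as a tab appears. This generic Python sketch uses a made-up, Kconfig-style line; it is not the kernel's Kconfig parser or any particular third-party tool.

```python
def tokens_space_only(line: str) -> list[str]:
    """Fragile: only the space character counts as a separator."""
    return [tok for tok in line.split(" ") if tok]


def tokens_any_whitespace(line: str) -> list[str]:
    """Robust: str.split() with no argument splits on any run of whitespace, tabs included."""
    return line.split()


line = "\tdepends on\tFOO"  # made-up Kconfig-style line using tabs for indentation and separation
print(tokens_space_only(line))      # ['\tdepends', 'on\tFOO']
print(tokens_any_whitespace(line))  # ['depends', 'on', 'FOO']

# A tab means "advance to the next tab stop", not "insert N spaces":
print(repr("key\tvalue".expandtabs(8)))  # 'key     value'
```

The last line shows the alignment meaning of a tab: `str.expandtabs()` pads up to the next multiple of the tab size rather than inserting a fixed number of spaces.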

          • #45
            Originally posted by Old Grouch
            If the rest of your application is so good that the best thing you can do is optimize the config files, then I'm in awe.
            It is not a matter of some insane micro optimization, it is an intrinsic property of the software architecture. I didn't go out of my way to write a redundantly efficient configuration system, it just works "as is" on top of the object and memory models that I am using across all my software.

            One of the benefits is I will definitely not find myself in a situation I have to inject tabs into text to avoid critical failures. That's quite ridiculous to be honest. A dumb solution to a dumb problem that should have long been rectified.

            Today you can afford to parse text for no good reason, tomorrow you can afford to inject tabs while at it. Where does it end? You can't keep patching bad software design with more bloat or waste hardware resources on it - that only makes it worse over time.
            Last edited by ddriver; 17 April 2024, 12:11 PM.

            • #46
              Originally posted by curfew
              This time he's wrong. Tabs have no purpose in anything. It's an ancient gimmick aimed at saving meaningless amounts of storage space and computing resources by inventing a silly character that can substitute multiple somewhat redundant characters, while also including annoying side-effects, and sometimes even bugs as evident. Those days are gone, have been for 30 years already.
              Why I enjoy Assembly programming.

              • #47
                Originally posted by curfew
                That sounds like a highly specialized concept with very strict restrictions on usage then.
                The tab stop was originally intended for tabulation and indentation, and that is exactly what it does with elastic tabstops. Early typewriter models had adjustable-width tabs (https://www.instructables.com/Making...th-Typewriter/), and later models had tabulation modes that applied an algorithm for automated text alignment and such: https://shorthandtypist.wordpress.co...er-tabulators/

                Elastic tabstops are simply the next logical step, given the differences between typewriters and text editors. But regardless, using tabs for indentation and having an adjustable tab width in one's editor is simply using tabs for their intended purpose, adapted for computers rather than typewriters.
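
A simplified Python sketch of elastic tabstops, where each tab-terminated cell is padded so that its column fits the widest cell. As an assumption of this sketch, the whole input is treated as one block; the actual proposal sizes each column over the run of adjacent lines that share it. `elastic_align` and its `padding` parameter are hypothetical names.

```python
def elastic_align(lines: list[str], padding: int = 2) -> list[str]:
    """Pad each tab-terminated cell to the widest cell in its column, plus `padding` spaces.

    Simplification: all lines form one block; the elastic tabstops proposal
    sizes each column only over adjacent lines that share that column.
    """
    rows = [line.split("\t") for line in lines]
    widths: dict[int, int] = {}
    for cells in rows:
        for col, cell in enumerate(cells[:-1]):  # text after the last tab ends the line
            widths[col] = max(widths.get(col, 0), len(cell))
    aligned = []
    for cells in rows:
        padded = "".join(cell.ljust(widths[col] + padding) for col, cell in enumerate(cells[:-1]))
        aligned.append(padded + cells[-1])
    return aligned


if __name__ == "__main__":
    source = ["int x = 1;\t// short", 'const char *name = "tab";\t// longer line']
    for row in elastic_align(source):
        print(row)
```

Because the width comes from the content rather than a fixed tab size, editing the longest cell realigns the whole column, which is the behaviour fixed-width tab stops cannot provide.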

                • #48
                  Originally posted by ddriver

                  It is not a matter of some insane micro optimization, it is an intrinsic property of the software architecture. I didn't go out of my way to write a redundantly efficient configuration system, it just works "as is" on top of the object and memory models that I am using across all my software.

                  One of the benefits is I will definitely not find myself in a situation I have to inject tabs into text to avoid critical failures. That's quite ridiculous to be honest. A dumb solution to a dumb problem that should have long been rectified.

                  Today you can afford to parse text for no good reason, tomorrow you can afford to inject tabs while at it. Where does it end? You can't keep patching bad software design with more bloat or waste hardware resources on it - that only makes it worse over time.
                  I'm afraid we shall have to agree to disagree. I respect your position, but I would regard binary configuration files as 'an intrinsic property of the software architecture' to be a warning signal that I've got something wrong. One of the reasons for making things human-readable (and writeable) is the improved ease of recovering from unexpected errors. I certainly agree that writing robust parsers is non-trivial.

                  Binary files tend to be non-portable, and difficult to analyse if you don't have a schema defining their structure, and unless you have taken extraordinary measures, are unlikely to be self-documenting.

                  I certainly don't expect to convince you to change your practice: but at least I hope that you acknowledge that there are other ways of doing things that other people might prefer, even if for reasons that you personally believe to be invalid or irrelevant.

                  May all your programs be bug free!

                  • #49
                    Originally posted by Old Grouch

                    I'm afraid we shall have to agree to disagree. I respect your position, but I would regard binary configuration files as 'an intrinsic property of the software architecture' to be a warning signal that I've got something wrong. One of the reasons for making things human-readable (and writeable) is the improved ease of recovering from unexpected errors. I certainly agree that writing robust parsers is non-trivial.

                    Binary files tend to be non-portable, and difficult to analyse if you don't have a schema defining their structure, and unless you have taken extraordinary measures, are unlikely to be self-documenting.

                    I certainly don't expect to convince you to change your practice: but at least I hope that you acknowledge that there are other ways of doing things that other people might prefer, even if for reasons that you personally believe to be invalid or irrelevant.

                    May all your programs be bug free!

                    You continue to misunderstand. When you hear "binary configuration" you think of the clumsy problematic legacy way that would be handled, and its resulting downsides of incompatibility and poor manual maintenance. And I actually agree - nobody wants that!

                    But when I say "binary configuration" I mean a binary portable format, generated in and from a portable object model definition.

                    In legacy "coding", you type text, which is parsed and ultimately converted to a static object model for further code generation. Then a static binary blob with nearly zero introspection is generated, where the CPU executes code blind, on the presumption that there is no undefined behavior.

                    In my architecture, I use a dynamic object model which in turn can generate context specific code from definitions in the object model format. The code is generated from the definitions, so it is intrinsically compatible with them, and any change to the definitions either dynamically resolves or returns the error before the code ever runs into it. Since the full system model is always available, you can detect logical errors or breaking changes even between multiple applications or libraries via real time static analysis of the entire system state before you ever run into them during execution.

                    So it is quite advanced stuff actually, it can be expected to properly handle the binary format it defines, shocking as that might be!

                    However, it is still damn nice of you to regret that you won't sway me from my erroneous ways. I would have so much liked to be convinced to go back to the mess that motivated me to unlearn the tedious, time-wasting and creativity-restricting standard practices and come up with something a tad more useful.
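
The model-driven approach described above can be illustrated, very loosely, with a single model that generates the binary layout, the validation and a human-readable view. This is a minimal Python sketch under stated assumptions, not ddriver's system: `Field`, `MODEL` and the option names are hypothetical, and the standard `struct` module stands in for his object model.

```python
import struct
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Field:
    name: str
    fmt: str                      # struct format code for one value
    check: Callable[[Any], bool]  # validation rule
    doc: str                      # description shown in the human-readable view


# Hypothetical model; field names and limits are illustrative only.
MODEL = [
    Field("port", "H", lambda v: isinstance(v, int) and 1 <= v <= 65535, "TCP port to listen on"),
    Field("workers", "B", lambda v: isinstance(v, int) and 1 <= v <= 64, "number of worker threads"),
    Field("verbose", "?", lambda v: isinstance(v, bool), "enable debug logging"),
]

# The binary layout is derived from the model, so encoder and decoder cannot drift apart.
# "<" fixes little-endian order with no padding, so the layout is identical on every host.
LAYOUT = struct.Struct("<" + "".join(f.fmt for f in MODEL))


def validate(config: dict) -> list[str]:
    errors = []
    for f in MODEL:
        if f.name not in config:
            errors.append(f"{f.name}: missing")
        elif not f.check(config[f.name]):
            errors.append(f"{f.name}: invalid value {config[f.name]!r}")
    return errors


def encode(config: dict) -> bytes:
    errors = validate(config)
    if errors:
        raise ValueError("; ".join(errors))
    return LAYOUT.pack(*(config[f.name] for f in MODEL))


def decode(blob: bytes) -> dict:
    return dict(zip((f.name for f in MODEL), LAYOUT.unpack(blob)))


def describe(config: dict) -> str:
    """Human-readable view generated from the same model."""
    return "\n".join(f"{f.name} = {config[f.name]!r}  # {f.doc}" for f in MODEL)
```

`decode` cannot make sense of the four bytes without `MODEL`, so whoever reads the file needs the same model definition.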

                    • #50
                      Originally posted by ddriver
                      But when I say "binary configuration" I mean a binary portable format, generated in and from a portable object model definition.

                      In my architecture, I use a dynamic object model which in turn can generate context specific code from definitions in the object model format. The code is generated from the definitions, so it is intrinsically compatible with them, and any change to the definitions either dynamically resolves or returns the error before the code ever runs into it. Since the full system model is always available, you can detect logical errors or breaking changes even between multiple applications or libraries via real time static analysis of the entire system state before you ever run into them during execution.

                      So it is quite advanced stuff actually, it can be expected to properly handle the binary format it defines, shocking as that might be!
                      Thank you for the extensive reply. Upvoted.

                      If I understand correctly, you need to be running your software, or at least something that can interpret the object model in order to correctly interpret the binary configuration.

                      Which means that sharing a config file with someone for comments, or to fix configuration mishaps, isn't generally possible; and if your object model changes, the config could change as well, so being able to interpret the binary config correctly could well be software-version dependent. It also makes independent detection and possible correction of corruption rather difficult.

                      If you have come up with a better way, then perhaps it should become standard practice. Standard practices do need testing every now and then to see if they are still relevant or can be improved, and perhaps your way is the way forward. I will be surprised if it is generally adopted, but life is full of surprises.

                      Thank you again for your comments. It's always interesting to see another point of view, and every so often I learn something and change my mind - quite probably something I don't do enough of, but then again, I am an Old Grouch.
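
One common partial answer to the corruption and version-dependence concerns is to wrap the binary payload in a header carrying a signature, a schema version and a checksum. The Python sketch below assumes a made-up `CFG1` header layout; `zlib.crc32` and `struct.Struct` are standard-library APIs. A CRC detects corruption but cannot repair it, so this covers detection, not correction.

```python
import struct
import zlib

# Hypothetical header layout for this sketch: 4-byte signature, 2-byte schema
# version, 4-byte CRC-32 of the payload, all little-endian.
MAGIC = b"CFG1"
HEADER = struct.Struct("<4sHI")


def wrap(payload: bytes, schema_version: int) -> bytes:
    """Prefix a binary config payload with a signature, schema version and checksum."""
    return HEADER.pack(MAGIC, schema_version, zlib.crc32(payload)) + payload


def unwrap(blob: bytes, expected_version: int) -> bytes:
    """Return the payload, or raise ValueError describing what is wrong with the file."""
    if len(blob) < HEADER.size:
        raise ValueError("file too short for header")
    magic, version, crc = HEADER.unpack_from(blob)
    payload = blob[HEADER.size:]
    if magic != MAGIC:
        raise ValueError("unrecognised file signature")
    if version != expected_version:
        raise ValueError(f"schema version {version}, expected {expected_version}")
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch: payload is corrupted")
    return payload
```

The explicit version field lets a reader refuse a file written against a different model with a clear message, instead of silently misinterpreting it.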
