Intel AVX-512 A Big Win For... JSON Parsing Performance


  • #41
    Originally posted by uid313 View Post

    While it is true that JSON does not support comments, there are many parsers that seem to support comments anyway, either by default or as an option. For example, in .NET there is JsonSerializerOptions.ReadCommentHandling.
    Having written an actual JSON parser/AST in Scala, I can say that this violates the JSON spec, which disallows any form of comments. The creator of the JSON standard (Douglas Crockford) was very deliberate about this: he wanted to prevent companies from creating proprietary extensions in the form of comments, which is what happened with Microsoft and HTML in their rendering engines (I guess it's not surprising that the parser you linked is from Microsoft's .NET).

    No JSON parser out there should be able to handle comments, and if you need comments then JSON is the wrong format (typically this comes up with configuration files, in which case you should use YAML, HOCON, TOML, etc.).
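To see what a strict parser does with a commented document, here is a minimal Python sketch using the standard `json` module. The two sample documents and the helper name `parses_as_strict_json` are made up for illustration.

```python
import json

# A plain JSON document and the same idea with a // comment added.
strict_doc = '{"name": "example", "retries": 3}'
commented_doc = '''{
    // number of retries before giving up
    "retries": 3
}'''

def parses_as_strict_json(text: str) -> bool:
    """Return True if Python's standard json module accepts `text`."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(parses_as_strict_json(strict_doc))     # plain JSON
print(parses_as_strict_json(commented_doc))  # JSON with a comment
```

Parsers that accept the second document are doing so as an opt-in or non-standard extension, which is the behaviour being argued about here.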

    Comment


    • #42
      Originally posted by coder View Post
      If you find JSON readable, maybe you're just not picking through datasets large enough.

      XML gives you context. You have a path and names with (hopefully) meaningful semantics. Wading through masses of JSON, trying to figure out why it's broken, can be the stuff of nightmares.
      XML is notoriously difficult to parse and is not that CPU-friendly in terms of performance. It has a huge number of extensions, which leads to enormous bloat, and there are a lot of ambiguities in the format which can make it quite easy to shoot yourself in the foot later down the track. A great example of this is whether you make something an attribute or a nested element: if you make the wrong decision (which is quite easy), you have to break your schema later down the track.
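A small Python sketch of the attribute-versus-element choice, using the standard `xml.etree.ElementTree` module. The `<user>` documents and the two reader functions are hypothetical; the point is only that code written for one shape silently finds nothing in the other.

```python
import xml.etree.ElementTree as ET

# The same piece of data modelled two ways.
as_attribute = ET.fromstring('<user id="42"/>')
as_element = ET.fromstring('<user><id>42</id></user>')

def read_id_attribute(user):
    """Reader written for the attribute form."""
    return user.get("id")

def read_id_element(user):
    """Reader written for the nested-element form."""
    return user.findtext("id")

print(read_id_attribute(as_attribute), read_id_attribute(as_element))
print(read_id_element(as_element), read_id_element(as_attribute))
```

Switching from one form to the other later means every consumer has to change, which is the schema break described above.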

      While JSON definitely has its shortcomings, there are things like JSON Schema (https://json-schema.org/) which alleviate most of its issues, and I definitely do not want to go back to the XML days.
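As a rough illustration of the kind of check JSON Schema enables, here is a hand-rolled Python sketch that understands only three keywords (`type`, `required`, `properties`). It is not a conforming JSON Schema implementation; the `check` function, the `TYPE_MAP` table and the menu schema are all made up for this example.

```python
# Maps a few JSON Schema type names to Python types (assumption: tiny subset).
TYPE_MAP = {"object": dict, "array": list, "string": str,
            "integer": int, "boolean": bool}

def check(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the instance passed."""
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, TYPE_MAP[expected])
        # bool is a subclass of int in Python, so reject it for "integer".
        if expected == "integer" and isinstance(instance, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(check(instance[key], sub_schema, f"{path}.{key}"))
    return errors

menu_schema = {
    "type": "object",
    "required": ["id", "value"],
    "properties": {"id": {"type": "string"}, "value": {"type": "string"}},
}

print(check({"id": "file", "value": "File"}, menu_schema))
print(check({"id": 7}, menu_schema))
```

A real project would use an existing validator rather than something like this; the sketch only shows that structural rules can live outside the data itself.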
      Last edited by mdedetrich; 27 May 2022, 06:34 AM.

      Comment


      • #43
        Originally posted by carewolf View Post

        Comments are a standard extension for JSON, just like 64-bit integers are. But the fact that we need extensions for JSON to make it useful puts it in with the same problems as XML-RPC. It is underdefined and kinda shit.
        That's interesting. I have never looked at JSON extensions, because I take the premise that if it's not built in, it's not meant to be used. I like JSON for basic back-and-forth communication in web dev, and I think that is the primary reason JSON exists. The double-edged sword is that the "ecosystem" around JSON makes it really convenient and easy to use outside what it was meant to do.

        Just my opinion for sure.
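On the 64-bit integer point: the JSON grammar does not limit the size of a number, but parsers that store every number as an IEEE 754 double cannot represent all 64-bit integers exactly. A minimal Python sketch, where passing `parse_int=float` is only a stand-in for such a parser and not a claim about any specific implementation:

```python
import json

big = "9007199254740993"  # 2**53 + 1, the first integer a double cannot hold

# Python's json module keeps integers at arbitrary precision.
exact = json.loads(big)

# Emulate a parser that stores every number as a double.
as_double = json.loads(big, parse_int=float)

print(exact)
print(int(as_double))
print(exact == int(as_double))
```

This is why APIs that exchange 64-bit IDs often send them as strings.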

        Comment


        • #44
          I would be interested to see whether the compiler could make even more use of AVX-512. So maybe compile it once natively and once with the x86-64-v3 target.

          Comment


          • #45
            As far as I am concerned, this AVX-512 thing is a PR stunt. Using very large registers for JSON parsing is of very little help. The only purpose that I can think of is quickly detecting escape sequences or the ending quote when parsing string values.

            AVX-512 can scan 64 bytes of data in a single operation but, AFAIK, the typical JSON message has very short strings, i.e. less than 10 bytes per string. I guess that if you use a JSON input with extremely large strings specifically crafted for a benchmark, the AVX-512 implementation can be superior.

            I am not saying that the AVX-512 implementation is useless but it will be useful only in very specific situations... Probably not that useful in most typical JSON inputs...
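To make the scanning idea concrete, here is a scalar Python emulation of it: look at 64 bytes at a time, build a bitmask of the positions holding a quote or a backslash, and jump to the lowest set bit. This is not real AVX-512 and not the code of any particular parser; `special_mask` stands in for what a vector compare plus mask extraction would return in one step, and all names here are hypothetical.

```python
BLOCK = 64  # bytes covered by one 512-bit register

QUOTE = 0x22      # '"'
BACKSLASH = 0x5C  # '\'

def special_mask(block: bytes) -> int:
    """Bit i is set when block[i] is a double quote or a backslash."""
    mask = 0
    for i, byte in enumerate(block):
        if byte == QUOTE or byte == BACKSLASH:
            mask |= 1 << i
    return mask

def find_string_end(data: bytes, start: int) -> int:
    """Index of the closing quote of a string whose content begins at
    `start`, or -1 if the string is unterminated."""
    pos = start
    while pos < len(data):
        mask = special_mask(data[pos:pos + BLOCK])
        if mask == 0:
            pos += BLOCK  # the whole block is plain characters, skip it
            continue
        offset = (mask & -mask).bit_length() - 1  # lowest set bit
        hit = pos + offset
        if data[hit] == QUOTE:
            return hit
        pos = hit + 2  # backslash: skip it and the character it escapes
    return -1

short = b'{"k":"a\\"b"}'            # the text {"k":"a\"b"}
long_doc = b'"' + b"x" * 200 + b'"'

print(find_string_end(short, 6))
print(find_string_end(long_doc, 1))
```

With short strings the first block already contains the closing quote, so the wide scan saves little; with the 200-byte string three whole blocks are skipped before the quote is found, which matches the intuition in this post.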

            Comment


            • #46
              Originally posted by mdedetrich View Post

              Having written an actual JSON parser/AST in Scala, I can say that this violates the JSON spec, which disallows any form of comments. The creator of the JSON standard (Douglas Crockford) was very deliberate about this: he wanted to prevent companies from creating proprietary extensions in the form of comments, which is what happened with Microsoft and HTML in their rendering engines (I guess it's not surprising that the parser you linked is from Microsoft's .NET).

              No JSON parser out there should be able to handle comments, and if you need comments then JSON is the wrong format (typically this comes up with configuration files, in which case you should use YAML, HOCON, TOML, etc.).
              Honestly, it sounds like Microsoft implemented something halfway between JSON and JSON5 (tagline: "JSON for Humans"), where it's up to you to configure proper compliance with one or the other.

              Comment


              • #47
                Originally posted by schmidtbag View Post
                I have a hard time understanding that:

                In the first example, the JSON has awkward spacing and a whole bunch of } where it's difficult to see which one is associated with what at a glance. In the XML example, there are fewer lines, it's tidier, and you know exactly what thing is closing and when.

                In the 2nd example, the JSON is 11 lines, the XML is 7.

                In the 4th example, the JSON is shorter but you have to take additional steps to read external files, which does nothing to help you more easily read the data as a human and adds additional work as a developer to handle more files. The XML equivalent is a mess but at least everything you need is in one place and simpler to parse.

                Remember too: I was saying XML is easier for basic data. With more complex data, XML is a bit of a mess.
                I could insert awkward spacing into the XML example too. If you look at the examples that follow, they don't have that spacing, so it seems to me like that was just a formatting mistake on his part. In the second example, true, the XML is shorter, but now you have the repetitive "menuitem" clutter. The JSON could have been written like this instead:

                Code:
                {"menu": {
                    "id": "file",
                    "value": "File",
                    "popup": [
                        {"value": "New", "onclick": "CreateNewDoc()"},
                        {"value": "Open", "onclick": "OpenDoc()"},
                        {"value": "Close", "onclick": "CloseDoc()"}
                        ]
                    }
                }
                There is no value in including "menuitem" because it's obvious from the context (popup & the value names themselves) that this is a kind of menu you are dealing with. You might even be able to shorten this further, but there is no context from a surrounding program to know for sure. There are probably cases where the extra label information of XML is useful, but most of the time they come across as unnecessary clutter to me. Does every element really need its own name?
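For what it's worth, a consumer of the reshaped document does not need the "menuitem" label either. A minimal Python sketch reading the structure above; the variable names are arbitrary.

```python
import json

doc = json.loads("""
{"menu": {
    "id": "file",
    "value": "File",
    "popup": [
        {"value": "New", "onclick": "CreateNewDoc()"},
        {"value": "Open", "onclick": "OpenDoc()"},
        {"value": "Close", "onclick": "CloseDoc()"}
    ]
}}
""")

# The list position under "popup" already tells us these are menu entries.
labels = [item["value"] for item in doc["menu"]["popup"]]
handlers = {item["value"]: item["onclick"] for item in doc["menu"]["popup"]}

print(labels)
print(handlers["Open"])
```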

                Comment


                • #48
                  Originally posted by coder View Post
                  If you find JSON readable, maybe you're just not picking through datasets large enough.

                  XML gives you context. You have a path and names with (hopefully) meaningful semantics. Wading through masses of JSON, trying to figure out why it's broken, can be the stuff of nightmares.
                  Mind giving me an example? A lot of that just seems like unnecessary clutter to me. If you look here at the second XML example, you see <menuitem> repeated there. Is that really necessary? Can I not deduce that it's a menu item from the implicit context of "popup" and the fact that you see values like new, open, and close?

                  How are masses of XML better than masses of JSON? It's not like you can't write a JSON schema and validate against it. It seems to me like there is some bias stemming from the problems you deal with daily, which are more likely to be JSON problems rather than XML ones.

                  Look at the fourth example on that page (the long one) and tell me that the XML is more readable with a straight face.

                  Comment


                  • #49
                    Originally posted by mdedetrich View Post

                    Having written an actual JSON parser/AST in Scala, I can say that this violates the JSON spec, which disallows any form of comments. The creator of the JSON standard (Douglas Crockford) was very deliberate about this: he wanted to prevent companies from creating proprietary extensions in the form of comments, which is what happened with Microsoft and HTML in their rendering engines (I guess it's not surprising that the parser you linked is from Microsoft's .NET).

                    No JSON parser out there should be able to handle comments, and if you need comments then JSON is the wrong format (typically this comes up with configuration files, in which case you should use YAML, HOCON, TOML, etc.).
                    Agreed, and I think Crockford was right to do it. If your options are complex enough to require additional documentation, that's something that should be handled in a FAQ on a website somewhere, not in comments. (Transmission uses JSON for its configuration and that never bothered me)

                    Comment


                    • #50
                      Originally posted by krzyzowiec View Post

                      Agreed, and I think Crockford was right to do it. If your options are complex enough to require additional documentation, that's something that should be handled in a FAQ on a website somewhere, not in comments. (Transmission uses JSON for its configuration and that never bothered me)
                      I sort of agree, except for the "in a FAQ on a website somewhere". I strongly believe in self-documenting applications, and one of the reasons I chose TOML for the projects I started recently is so that the option to write out a default configuration file can produce a file which embeds the documentation for the fields provided.

                      I'm no Donald Knuth, but I do think that "write out a starter config file" options should have the same kind of intermingling of code and documentation as something like Doxygen or JavaDoc or rustdoc comments.
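A minimal Python sketch of that "write out a starter config file" idea: each option carries its own documentation string, and the writer emits it as a TOML `#` comment above the key. The option names, defaults and helper functions are hypothetical, and only booleans, integers and strings are handled.

```python
# (key, default value, documentation) for a hypothetical service.
OPTIONS = [
    ("port", 8080, "TCP port the service listens on."),
    ("log_level", "info", "One of: debug, info, warn, error."),
    ("use_tls", False, "Enable TLS for incoming connections."),
]

def toml_value(value) -> str:
    """Render a bool, int or str as a TOML literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported type: {type(value).__name__}")

def render_default_config(options) -> str:
    """Build a commented starter config as a single string."""
    lines = []
    for key, default, doc in options:
        lines.append(f"# {doc}")
        lines.append(f"{key} = {toml_value(default)}")
        lines.append("")
    return "\n".join(lines)

print(render_default_config(OPTIONS))
```

Because the documentation lives next to the defaults in one table, the generated file and the program's idea of its options cannot drift apart.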
                      Last edited by ssokolow; 28 May 2022, 08:31 PM.

                      Comment
