Intel AVX-512 A Big Win For... JSON Parsing Performance


  • #51
    Originally posted by Raka555 View Post
    The biggest problem with XML is how much CPU it burns.
    Depending on what features you want in a parser, you might be interested in RapidXML, which offers "parsing speed approaching that of strlen function executed on the same data."
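
    To give a feel for it, here's a minimal sketch of in-situ parsing with RapidXML (the element and attribute names are invented for illustration; parse<0>() modifies the buffer in place, which is a big part of how it stays near strlen() speed):

    #include "rapidxml.hpp"   // header-only
    #include <cstdio>
    #include <cstring>
    #include <vector>

    int main() {
        const char* src = "<config><param name='threads' value='8'/></config>";
        // RapidXML parses in situ, so it needs a mutable, null-terminated buffer.
        std::vector<char> buf(src, src + std::strlen(src) + 1);

        rapidxml::xml_document<> doc;
        doc.parse<0>(buf.data());  // one pass over the buffer, no string copies

        rapidxml::xml_node<>* param = doc.first_node("config")->first_node("param");
        std::printf("%s = %s\n",
                    param->first_attribute("name")->value(),
                    param->first_attribute("value")->value());
        return 0;
    }

    The strings returned by value() are pointers into the original buffer, so the buffer has to outlive the document.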



    • #52
      Originally posted by reba View Post
      We parse a lot of JSON on $dayjob from external sources, which sometimes deliver broken and/or invalid JSON structure and/or values.

      For this we use JSON Schemas (examples).
      By the time you start writing schemas, you should probably be asking yourself if JSON is really the right format for what you're doing.

      BTW, XML namespaces provide the ability to support XML embedding, which is used to good effect in XSLT. You can also use them to combine different schemas. I guess we can look forward to namespaces in the next version of JSON Schema.

      Originally posted by reba View Post
      These serve:
      - as an interface specification for our partners (which fields are mandatory, which are optional, what their type and value range are, and what each field means)
      - and also directly in-code, as a schema to validate the incoming JSON data against for correct structure/fields/values
      - to specify the API on our outgoing side, so partners know exactly what to expect

      tl;dr: A JSON document itself is just whatever it likes to be and cannot be trusted at all. If you use it to pass data around, you need to verify and validate it. JSON Schemas (+ implementing libs) do this, and at the same time they serve as great API documentation.
      That's a good argument for schemas, but it's not at all specific to JSON Schema.
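
      For reference, a schema of the kind reba describes might look something like this minimal draft-07 sketch (the fields are invented, not taken from any real API):

      {
          "$schema": "http://json-schema.org/draft-07/schema#",
          "type": "object",
          "required": ["id", "amount"],
          "properties": {
              "id":     { "type": "string", "description": "Partner-assigned record ID" },
              "amount": { "type": "number", "minimum": 0, "maximum": 1000000, "description": "Invoice total" },
              "note":   { "type": "string", "description": "Optional free-text note" }
          },
          "additionalProperties": false
      }

      Here id and amount are mandatory, note is optional, and amount carries a value range; the same document serves as an interface spec for partners and as machine-checkable validation in code.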



      • #53
        Originally posted by reba View Post
        While XML at first seems great and has nice additions (very expressive, XPATH, WS-Security), it's pentesters paradise and too powerful for its own good (entity bombs, document inclusion,
        As I mentioned, entity bombs are addressed in most modern parsers. And DTD external subsets were typically disabled by parsers, even way back in the early days.

        Originally posted by reba View Post
        too much logic for a data transport format).
        If you disable DTDs, then it doesn't have appreciably more logic than a JSON parser. That's why RapidXML can be nearly as fast as strlen().



        • #54
          Originally posted by mdedetrich View Post
          XML is notoriously difficult to parse,
          That's rubbish. Since the early days, parsers like expat have shown just how simple a fast parser can be.

          Originally posted by mdedetrich View Post
          huge amount of extensions
          Such as? Namespaces were added in a separate spec, shortly after XML 1.0. You could count schema and maybe XInclude, if it's relevant for what you're doing. A parser can be ignorant of all three, for simple applications.

          Originally posted by mdedetrich View Post
          which leads to enormous bloat
          The core standard really is pretty simple, especially if you don't use DTDs.

          Originally posted by mdedetrich View Post
          there are a lot of ambiguities in the format
          No, there aren't any ambiguities in the format.

          Originally posted by mdedetrich View Post
          which can make it quite easy to shoot yourself in the foot later down the track. Great example of this is whether you make something a property or a nested element and if you make the wrong decision (which is quite easy) then you have to break your schema later down the track.
          Okay, so the only case of this is the equivalence of attributes and simple elements. That's not "a lot", that's one case where you can do the same thing one way or another.

          But, guess what? JSON gives you a choice of whether to make a container an object or an array. Someone could certainly "shoot themselves in the foot" by using an array, where they should've used an object. Or they could use a string, where they really should've used an array or an object.

          Any data modeling exercise presents one with pitfalls and opportunities to "shoot themselves in the foot", by later having to make non-backward-compatible revisions.

          Originally posted by mdedetrich View Post
          I definitely do not want to go back to XML days.
          I think you mean that you don't want to go back to the days of needing an excuse not to learn about XML.

          XML scales well, which is a struggle for JSON. It did suffer from a bit of hype and it's not the best solution to all problems, but it really has no peer.

          You don't have to like it and I don't care if you don't use it, but please don't spread FUD about it.
          Last edited by coder; 28 May 2022, 10:59 PM.



          • #55
            Originally posted by krzyzowiec View Post
            Look at the fourth example on that page (the long one) and tell me that the XML is more readable with a straight face.
            The XML schema designer clearly decided they valued extensibility more than conciseness. There's no reason you need an init-param element with both a param-name and a param-value element. One could nest a set of param elements with name and value attributes inside an init-params container, with little more verbosity (and no less extensibility) than the supposedly-equivalent JSON.

            <init-params>
                <param name="configGlossary:installationAt" value="Philadelphia, PA"/>
                ...
            </init-params>

            That example is also somewhat dishonest, in that they used favorable JSON formatting and sloppy XML formatting. One could use a dense or sloppy JSON formatting and a clearer XML formatting to tip the balance more in favor of XML.
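
            For instance, the same parameter as in the snippet above, in a perfectly legal but dense one-line JSON rendering:

            {"init-params":{"configGlossary:installationAt":"Philadelphia, PA"}}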



            • #56
              Originally posted by coder View Post
              That's rubbish. Since the early days, parsers like expat have shown just how simple a fast parser can be.
              Sure, by old standards.

              Originally posted by coder View Post
              Such as? Namespaces were added in a separate spec, shortly after XML 1.0. You could count schema and maybe XInclude, if it's relevant for what you're doing. A parser can be ignorant of all three, for simple applications.
              I am talking about the proprietary stuff.


              Originally posted by coder View Post
              Okay, so the only case of this is the equivalence of attributes and simple elements. That's not "a lot", that's one case where you can do the same thing one way or another.

              But, guess what? JSON gives you a choice of whether to make a container an object or an array. Someone could certainly "shoot themselves in the foot" by using an array, where they should've used an object. Or they could use a string, where they really should've used an array or an object.
              So the issue is that the difference between an array and an object is a lot more significant than the example that I gave. That's a terrible example you provided, because it's really not that ambiguous.


              Originally posted by coder View Post
              Any data modeling exercise presents one with pitfalls and opportunities to "shoot themselves in the foot", by later having to make non-backward-compatible revisions.
              Yes, and JSON has a lot fewer; that's the point, and that's why it largely ended up winning in the modern day.

              Originally posted by coder View Post
              I think you mean that you don't want to go back to the days of needing an excuse not to learn about XML.

              XML scales well, which is a struggle for JSON. It did suffer from a bit of hype and it's not the best solution to all problems, but it really has no peer.

              You don't have to like it and I don't care if you don't use it, but please don't spread FUD about it.
              I am not spreading FUD about it. I have been a backend/web developer for over a decade and I have dealt with both extensively (i.e. written a JSON parser/AST in Scala, as well as a WSDL code generator for XML SOAP). There is a reason why everyone is moving away from XML if possible: it's becoming a dead format. Today we have much better options depending on what you want to do, i.e. JSON + OpenAPI, YAML, protobuf, TOML, JSONB, etc.



              • #57
                Originally posted by mdedetrich View Post
                I am talking about the proprietary stuff.
                Huh? Apparently, you need to spell it out, because I have no idea what you mean. Nor does anyone else, presumably.

                What I'm talking about is what's standard and what people actually use.

                Originally posted by mdedetrich View Post
                So the issue is that the difference between an array and an object is a lot more significant than the example that I gave. That's a terrible example you provided, because it's really not that ambiguous.
                I'm not sure you're using the word "ambiguous", correctly. Providing a user with multiple options doesn't make a language ambiguous.

                I think it's an apt analogy. One could choose to use an array, in order to model a data subset as a tuple, or they could use an object to affix names to the members.

                JSON gives you that flexibility. It didn't need to. Objects are truly redundant. One could use an array of string-value tuples (i.e. 2-element arrays), to achieve the same effect.
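
                A quick sketch of that equivalence (the keys are made up): the object form on top, the array-of-pairs form below.

                { "host": "example.com", "port": 8080 }

                [ ["host", "example.com"], ["port", 8080] ]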

                Originally posted by mdedetrich View Post
                Yes, and JSON has a lot fewer,
                All you've cited is the option to use elements vs. attributes to hold simple strings. I'll even hand you another: XML leaves it up to the application, whether whitespace within an element is significant.

                For its part, JSON has two types of containers: arrays and objects. Further, it gives one the option to use numbers, bool, string, or null. It might not always be obvious when null should be used vs. an empty string, for instance. And one could make the mistake of inserting a number as a string, for another.
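
                For instance, both of the following parse fine, but they mean different things to a consumer (illustrative field names):

                { "retries": 3,   "note": null }
                { "retries": "3", "note": ""   }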

                Originally posted by mdedetrich View Post
                that's the point, and that's why it largely ended up winning in the modern day.
                That's motivated reasoning.

                Originally posted by mdedetrich View Post
                I am not spreading FUD about it,
                Indeed, you are. You're posing an opinion as fact, and yet you're incredibly vague and misleading about the specifics supporting that opinion. That absolutely qualifies as FUD.

                Originally posted by mdedetrich View Post
                I have been a backend/web developer for over a decade
                Qualifications mean zero, to me. If you have to resort to qualifications, it means you've lost on merit. That we cannot verify your qualifications is that much further reason to disregard them.

                What I care about is the amount and quality of the information supporting your position. And that's what's lacking.

                Originally posted by mdedetrich View Post
                There is a reason why everyone is moving away from XML if possible,
                Often, the best solutions on merit don't win the day. People like having a small learning curve and minimal barriers to adoption. Just as XML piggybacked on HTML/SGML, JSON piggybacked on Javascript. The then-ubiquity of Javascript is a lot of the reason, right there.

                Originally posted by mdedetrich View Post
                Today we have much better options depending on what you want to do, i.e. JSON + OpenAPI, YAML, protobuf, TOML, JSONB, etc.
                I doubt any of those has the same power and flexibility of XML, much less the same supporting ecosystem and standards.

                The existence of one good solution never stops people trying to invent other solutions, better-suited to their background or immediate priorities. There are reasons "reinventing the wheel" is such a common analogy. XML wasn't the first to attempt what it did, and it'd be presumptuous to think it'd be the last.



                • #58
                  Originally posted by coder View Post
                  Huh? Apparently, you need to spell it out, because I have no idea what you mean. Nor does anyone else, presumably.

                  What I'm talking about is what's standard and what people actually use.
                  Yeah, and that's the point: most people have stopped using XML, and the few companies/projects still willing to use XML are doing so for historical/legacy reasons.

                  Originally posted by coder View Post
                  I'm not sure you're using the word "ambiguous", correctly. Providing a user with multiple options doesn't make a language ambiguous.

                  I think it's an apt analogy. One could choose to use an array, in order to model a data subset as a tuple, or they could use an object to affix names to the members.

                  JSON gives you that flexibility. It didn't need to. Objects are truly redundant. One could use an array of string-value tuples (i.e. 2-element arrays), to achieve the same effect.
                  The analogy is not apt, because you are not quantifying the difference. With a JSON object vs a JSON array, the difference is very clear: if you need ordering you put it in a JSON array, otherwise you don't.

                  With nested elements vs properties (as an example), it's a lot less clear how to model things.

                  Originally posted by coder View Post
                  All you've cited is the option to use elements vs. attributes to hold simple strings. I'll even hand you another: XML leaves it up to the application, whether whitespace within an element is significant.

                  For its part, JSON has two types of containers: arrays and objects. Further, it gives one the option to use numbers, bool, string, or null. It might not always be obvious when null should be used vs. an empty string, for instance. And one could make the mistake of inserting a number as a string, for another.
                  Yes, and as I said earlier, the difference is a lot less clear. I'm not saying JSON doesn't have issues, but the issues are not what you cite (an actual real issue is whether JSON numbers are IEEE-754 doubles or not, because the spec leaves it ambiguous). There is still a difference in magnitude, though.
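
                  That ambiguity is easy to demonstrate. A small sketch in plain C++ (no JSON library involved) of what happens when a parser stores every number as an IEEE-754 double:

                  #include <cstdint>
                  #include <cstdio>

                  int main() {
                      // 2^53 + 1 is a perfectly valid JSON number, but it is not
                      // representable as an IEEE-754 double, so a parser that stores
                      // all numbers as double silently rounds it down to 2^53.
                      std::int64_t big = 9007199254740993LL;        // 2^53 + 1
                      double as_double = static_cast<double>(big);  // 9007199254740992
                      std::printf("%lld -> %.0f\n", static_cast<long long>(big), as_double);
                      return 0;
                  }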

                  Originally posted by coder View Post
                  That's motivated reasoning.


                  Indeed, you are. You're posing an opinion as fact, and yet you're incredibly vague and misleading about the specifics supporting that opinion. That absolutely qualifies as FUD.
                  XML is dying, that's a fact, and that's what I meant. Almost every new backend/web service that is made these days serves JSON via REST principles. Even Microsoft is moving away from SOAP/XML when it can; heck, even cases where you could argue XML makes sense (such as LSP, the Language Server Protocol) don't use XML.

                  When people say "dead" in this context, they don't mean literally dead as in no one uses it; they mean it's dead in the sense that not much new technology is deciding to use it. It's dead in the sense that COBOL or CORBA is dead, although XML is earlier in the timeline.


                  Originally posted by coder View Post
                  Qualifications mean zero, to me. If you have to resort to qualifications, it means you've lost on merit. That we cannot verify your qualifications is that much further reason to disregard them.

                  What I care about is the amount and quality of the information supporting your position. And that's what's lacking.
                  Says the person not quantifying their arguments.

                  Originally posted by coder View Post
                  Often, the best solutions on merit don't win the day. People like having a small learning curve and minimal barriers to adoption. Just as XML piggybacked on HTML/SGML, JSON piggybacked on Javascript. The then-ubiquity of Javascript is a lot of the reason, right there.
                  That argument holds if the better solution never gained traction vs an incumbent that is not as good but has larger market power (e.g. VHS vs Beta). But in this case, we had an incumbent that was dominant (XML) and it lost its position, which is actually the same as my previous examples of COBOL/CORBA. In almost every case when this happens, one of the reasons is because it's "better", for the definition of "better" that is appropriate in context.


                  Originally posted by coder View Post
                  I doubt any of those has the same power and flexibility of XML, much less the same supporting ecosystem and standards.
                  For the most part they do, and if they don't have the same amount of power as XML then it's a good thing, because such cases are almost always out-of-scope/over-engineered solutions.

                  Originally posted by coder View Post
                  The existence of one good solution never stops people trying to invent other solutions, better-suited to their background or immediate priorities. There are reasons "reinventing the wheel" is such a common analogy. XML wasn't the first to attempt what it did, and it'd be presumptuous to think it'd be the last.
                  JSON didn't reinvent the wheel, though; it became popular because it really exposed the limitations of XML, and that's why it ended up being used.



                  • #59
                    XML is a markup language! At any point you can intersperse and embed tags within tags in a run of text. That's great to mark up some text as I just did, but it's irrelevant for many data representation tasks, and you often want to disallow this or namespace it, particularly for a configuration payload.

                    You can't mark up text within JSON with more JSON (at least, not without turning it into verbose nested span objects), which makes it simpler to process and, to most people, a better fit for a configuration payload. JSON simply turns into a nested object/array structure in most programming languages (you could just `eval` it in JavaScript, hence the name), while XML has to be constrained to do the same, because it has so many more features and, again, it's markup, not an object.
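
                    A rough sketch of the difference (the tag and key names are made up): XML's mixed content on top, and below it one way to approximate the same thing in JSON with nested span objects.

                    <p>This is <em>mixed</em> content: text and <b>tags</b> freely interleaved.</p>

                    { "p": ["This is ", { "em": "mixed" }, " content: text and ", { "b": "tags" }, " freely interleaved."] }

                    The JSON version still round-trips as a plain array, but it's clearly the more awkward fit for marked-up text, which is the point here.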

                    I like them both. XSL was cool in 2004, JSON Schema was cool in 2014.

                    Comments by convention are easy in JSON, just add
                    "//": "The value of the // key is an uninterpreted comment, like this"
                    to any object.



                    • #60
                      Originally posted by mdedetrich View Post
                      Yeah, and that's the point: most people have stopped using XML, and the few companies/projects still willing to use XML are doing so for historical/legacy reasons.
                      There's more than a hint of wishful thinking, in this. XML certainly has been displaced from its status as a de facto choice, which it once enjoyed.

                      However, you'd have to be nuts to use JSON for something like manpages. There are plenty of cases where XML remains the best choice among available options.

                      Originally posted by mdedetrich View Post
                      The analogy is not apt, because you are not quantifying the difference. With a JSON object vs a JSON array, the difference is very clear: if you need ordering you put it in a JSON array, otherwise you don't.
                      Order might or might not be significant. Objects carry more overhead and force each item to be named, which one might not want to do.

                      Originally posted by mdedetrich View Post
                      With nested elements vs properties (as an example), it's a lot less clear how to model things.
                      First, they're called "attributes". Second, if you have some value you're very unlikely to want to extend with richer content, you make it an attribute.

                      In fact, the very same dilemma exists in JSON, when one is deciding whether to model something as a simple string value or an object.

                      Originally posted by mdedetrich View Post
                      in this case, we had an incumbent that was dominant (XML) and it lost its position
                      People are lazy and JSON seems easy. It's really as simple as that.

                      XML was definitely over-hyped and therefore, being used in some ways and places where it wasn't necessary. So, it's not at all surprising to see some retrenchment.

                      Originally posted by mdedetrich View Post
                      if they don't have the same amount of power as XML then it's a good thing, because such cases are almost always out-of-scope/over-engineered solutions.
                      I'd take over-engineered before under-engineered, any day of the week. And I've definitely seen JSON used in some instances of the latter.

                      Originally posted by mdedetrich View Post
                      JSON didn't reinvent the wheel, though; it became popular because it really exposed the limitations of XML, and that's why it ended up being used.
                      And what limitations are those?

                      I get that you don't like XML, but at this point you're just pulling arguments out of your ass.
