Testing The First PCIe Gen 5.0 NVMe SSD On Linux Has Been Disappointing


  • #41
    Originally posted by L_A_G View Post
    Upvotes aren't really a proof of anything. They're just as likely to be from people who took your posts at face value and didn't know that what you wrote was factually incorrect.
    Whatever makes you feel better.
    People who work on big datasets doing design and visualization also don't just load a big lump in at the start of the workday and then work on it for the rest of the day. The datasets are usually too big to fit into memory all at once anyway, so the application will be swapping data in and out from the disc. Add to that the fact that they're processing this data, so they'll also be writing additional data onto the disc as part of that work. For some tasks this may even be more data than the original dataset.
    And as I keep saying, if you're handling that much data, you're straddling the line of what warrants a server.
    Maybe, but that's not the case until you start getting into the territory of datasets so big they don't even fit on a single consumer-grade disc, i.e. literally multiple terabytes of data. In video production you usually keep an archive of the footage on the server and then make local copies of that as you're editing a new project. Fast discs on the server mean you can make those even faster local copies sooner.
    Again: server territory, not workstations. And I've already said from the beginning that things like video production are one of the few examples where high-performance SSDs make perfect sense on a workstation.
    Once again; Pointing out clear factual errors and correcting misunderstandings when they're used as arguments isn't nitpicking.
    It's a factual error when you take it too literally, which you did.
    You remind me of those people who comment against a layman's explanation of physics, like "planets orbit the sun because they want to move away but gravity pulls them in" and then there's you saying "AKSHULLY, planets don't want to move anywhere as they have no ability to desire anything, and not all planets exist in our solar system".
    On topic; No. A 25 GB database is not big by today's standards. Nor is the access rate you described. I'm running something fairly similar on what is essentially the lowest end cloud instance that Google currently provides and it's almost overkill for the job.
    How many times do I have to tell you it's a table, not a database. Do you know what tables are? And like I said: plain text. Sometimes it's touched non-stop for several hours; the 15-minute figure is the longest it ever goes without being read.
    The point is, past a certain dataset size it doesn't matter whether the table is 25GB or 2500GB: you're not going to fit it all into RAM, and if you handle the data appropriately, you don't need to fit it all into RAM and you don't need blazing-fast access either.
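    The "you don't need to fit it all into RAM" argument can be sketched with a toy SQLite example (the table name, column and row count here are made up for illustration): once a column is indexed, a lookup touches only a few pages regardless of how large the table grows, which the query plan confirms.

```python
import sqlite3

# In-memory toy stand-in for a large on-disk table; the principle is the same:
# with an index, the engine reads a few pages per lookup, never the whole table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO logs (payload) VALUES (?)",
    ((f"row-{i}",) for i in range(100_000)),
)
conn.execute("CREATE INDEX idx_payload ON logs (payload)")

# EXPLAIN QUERY PLAN shows the lookup uses the index rather than scanning.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM logs WHERE payload = ?", ("row-54321",)
).fetchone()
print(plan[-1])

row = conn.execute(
    "SELECT id FROM logs WHERE payload = ?", ("row-54321",)
).fetchone()
print(row)
```

    The same plan shape holds whether the table has a hundred thousand rows or a few billion; only the index depth changes.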
    What? Is basic arithmetic too much to ask now?
    What "basic arithmetic" are you referring to?
    Hypothetical examples? All of the ones I pointed out are very much real. The fact that whatever office you support doesn't do that kind of work doesn't mean that there aren't a lot of those kinds of use cases. The 250GB dataset that I mentioned? That's a LIDAR scan of the Helsinki area tram network commissioned by the city planning department and has been used for planning out the maintenance, improvement and expansion of the Helsinki tram network.
    250GB sounds rather small for what you described. If that is your real-world example, I'm not seeing where you need a high-speed SSD. And if for some reason you are getting significant I/O issues on the latest SSDs, either the software you're using is poorly optimized (it's friggin LIDAR - just reduce the data density as you zoom out and I'm sure it'll look fine) or you're overcomplicating your workflow.
    So yeah, I still say you're speaking hypothetically.
    The main point of btrfs is anything but compression. Its forte is stuff like file integrity, avoidance of fragmentation and auto-defragmentation, dynamic volume re-sizing, and online load balancing. Its compression is primarily native support for things like zlib, which has been pretty much ubiquitous in games for the last 15 years. The devkits of the Playstation 3, Xbox 360 and Wii all had it included as a standard library from the beginning.
    You have a knack for rambling about things that are beside the point. I'm well aware of the main point of btrfs, but it is one of the better filesystems for on-the-fly compression.
    To put it as simply as I can; Your primary example is a game from before the use of zlib became standard. Your solution; To use what's already an industry standard.
    And yet, many games within the past 15 years still have room for more compression, hence me bringing up Steam's download size and btrfs.
    A. The Playstation 5 and Xbox Series consoles all have a fast PCIe SSD as standard so you can almost guarantee that people have them.
    B. It's something Sony's early PC ports of Playstation 5 games have already begun to do. Housemarque's Returnal has 32 GB of RAM as a requirement
    C. Zlib has been an industry standard for over 15 years so the kind of gains you're thinking of don't exist beyond stuff like generated assets.

    Oh and before you start going on about the potential of generated assets and stuff like this 177k on disc demo, I'll have to point out that they always spend more time generating those assets than it takes to read them off a half-decent disc.
    A. Not if you use external/additional storage...
    B. See my last comment. Regardless of that, using zlib isn't going to be as effective as purpose-built compressors. There are many ways to compress textures and audio. Seems to me, devs could be taking advantage of this.
    C. Who said anything about industry standards? No wonder you think there's no room for more compression...
    EDIT:
    Didn't even think about the last thing.
    Last edited by schmidtbag; 13 March 2023, 09:59 PM.



    • #42
      Originally posted by schmidtbag View Post
      Whatever makes you feel better.
      In case you didn't notice; I've also gotten upvotes in this thread. One comment even got four of them. However, who's right or wrong isn't a popularity contest.

      And as I keep saying, if you're handing that much data, you're straddling the line of what warrants a server.
      When you're doing visualization and design you don't use a server other than for distributing that data and for archival use. The latency from using an external server will simply break the workflow. The only time you work with the server directly is when multiple people are working on the same data at the same time, doing something relatively latency-insensitive (like database queries) or computationally non-demanding, or working with a truly massive multi-terabyte dataset.

      Again: server territory, not workstations. And I've already said from the beginning that things like video production are one of the few examples where high-performance SSDs make perfect sense on a workstation.
      Again; In these tasks the servers are only for archival and distribution of these datasets. You pull in the dataset you want, work on it iteratively with your higher-end SSD, and then when you're done push the resulting edited dataset onto the server. A server is NOT going to have the high-end compute hardware you need to process that dataset, especially when it's quite often a graphics card that you use for that processing.

      It's a factual error when you take it too literally, which you did.
      When it's presented as a literal statement, not a joke or anything, then it is a factual error. You can't just declare something to be a joke afterwards because you don't want to be wrong.

      You remind me of those people who comment against a layman's explanation of physics, like "planets orbit the sun because they want to move away but gravity pulls them in" and then there's you saying "AKSHULLY, planets don't want to move anywhere as they have no ability to desire anything, and not all planets exist in our solar system".
      ...and you remind me of the people I used to run into on 4chan. Make a false statement, get corrected by multiple people, then make it again, get corrected by multiple people again, get it pointed out that you've been corrected multiple times on this, acknowledge this but refuse to admit you're wrong, rinse and repeat. How do they delude themselves that they're not wrong? By being the last one to reply to a thread, so they can think they're right on account of being the "last man standing" in the thread. Not because everyone got sick of correcting them when they just made the same false statements over and over again.

      How many times do I have to tell you it's a table, not a database. Do you know what tables are? And like I said: plain text. Sometimes it's touched non-stop for several hours; the 15-minute figure is the longest it ever goes without being read. The point is, past a certain dataset size it doesn't matter whether the table is 25GB or 2500GB: you're not going to fit it all into RAM, and if you handle the data appropriately, you don't need to fit it all into RAM and you don't need blazing-fast access either.
      Table or whole database doesn't matter. You're talking about working on this one specific 25 GB dataset. It's simply not a large dataset to work on by 2020s standards. No ifs or buts.

      What "basic arithmetic" are you referring to?
      The one where you complained about me not being specific enough when I talked about having a dataset 10 times the size of your "large" one literally on hand at that moment.

      250GB sounds rather small for what you described. If that is your real-world example, I'm not seeing where you need a high-speed SSD. And if for some reason you are getting significant I/O issues on the latest SSDs, either the software you're using is poorly optimized (it's friggin LIDAR - just reduce the data density as you zoom out and I'm sure it'll look fine) or you're overcomplicating your workflow.
      I already said that it's not even a particularly big dataset anymore and it's almost a decade old at this point. Yet it's still 10 times the size of your "big" dataset. We have another test dataset that's a couple of years newer, made with a then (2017) newer scanner, that's 40GB for an area that's about 500 x 50 meters. You don't get high-resolution scans unless you want to use them, so you're in no position to give any advice to literal subject matter experts.

      So yeah, I still say you're speaking hypothetically.
      I think it's pretty clear that you're just a know-it-all with a bad case of Dunning-Kruger who thinks being able to get the last word in means you're right.

      You have a knack for rambling about things that are besides the point. I'm well aware of the main point of btrfs, but, it is one of the better filesystems for on-the-fly compression.
      Beside the point? I pointed out that btrfs's compression is the exact same tool that's been ubiquitous in game development for the last 15 years.

      To put it as simply as I can; Games are already being compressed with the exact same compression tool that btrfs uses.

      And yet, many games within the past 15 years still have room for more compression, hence me bringing up Steam's download size and btrfs.
      Yeah... I'm probably arguing against one of the same shitposters I got sick of on 4chan here.

      I've pointed out to you multiple times that PC releases have up until recently contained higher LoDs, i.e. additional data, that didn't use to be included in the console versions due to consoles not having the performance/memory to use those assets. With the latest console generation, however, those higher LoDs are also used by consoles, so they ARE included in their versions, and hence console installs aren't really any smaller than the PC versions.

      Repeating a false statement, specially to a person who's pointed out that it is false, doesn't make it any less false.

      A. Not if you use external/additional storage...
      Seeing how you didn't know; You can't run current generation versions of games off external storage anymore. Not on the Playstation 5. Not on the Xbox Series consoles.

      B. See my last comment. Regardless of that, using zlib isn't going to be as effective as purpose-built compressors. There are many ways to compress textures and audio. Seems to me, devs could be taking advantage of this.
      Yet you still suggested zlib as a solution... (If you didn't get it; btrfs compression IS zlib)

      Sony built in dedicated hardware compression into the Playstation 5 and install sizes are still almost exactly that of the Xbox Series versions of those same games.

      C. Who said anything about industry standards? No wonder you think there's no room for more compression...
      Again; Your suggested solution is something that's already been an industry standard for over 15 years and your suggestion as to what can be achieved with it are from before it became an industry standard.



      • #43
        Originally posted by L_A_G View Post
        When you're doing visualization and design you don't use a server other than for distributing that data and for archival use. The latency from using an external server will simply break the workflow. The only time you work with the server directly is when multiple people are working on the same data at the same time, doing something relatively latency-insensitive (like database queries) or computationally non-demanding, or working with a truly massive multi-terabyte dataset.
        Or, you do the computation directly on the server. It's strange how you accuse me of needing to get with the times yet seem unaware of how much more powerful a typical server is compared to a workstation.
        When it's presented as a literal statement, not a joke or anything, then it is a factual error. You can't just declare something to be a joke afterwards because you don't want to be wrong.
        How did I present it to be literal?
        ...and you remind me of the people I used to run into on 4chan. Make a false statement, get corrected by multiple people, then make it again, get corrected by multiple people again, get it pointed out that you've been corrected multiple times on this, acknowledge this but refuse to admit you're wrong, rinse and repeat. How do they delude themselves that they're not wrong? By being the last one to reply to a thread, so they can think they're right on account of being the "last man standing" in the thread. Not because everyone got sick of correcting them when they just made the same false statements over and over again.
        Nobody else here is correcting me, because they understand there isn't anything to correct. Other people got the gist of what I was saying. As I mentioned before, I wasn't disagreeing with anything you said other than how common high-performance SSDs are needed. This is practically a one-sided argument.
        Table or whole database doesn't matter. You're talking about working on this one specific 25 GB dataset. It's simply not a large dataset to work on by 2020s standards. No ifs or buts.
        Except there are other tables... And like I said, the table is large enough that [on an SSD] it makes very little difference how much bigger it could possibly get, since no matter what, you're not loading the whole thing into RAM and modern databases are built to find data quickly.
        The one where you complained about me not being specific enough when I talked about having a dataset 10 times the size of your "large" one literally on hand at that moment.
        Still not sure what your question is asking then, not that it matters because you're just being facetious and rhetorical anyway.
        I already said that it's not even a particularly big dataset anymore and it's almost a decade old at this point. Yet its still 10 times the size of your "big" dataset. We have another test dataset that's a couple of years newer made with a then (2017) newer scanner that's 40GB for an area that's about 500 x 50 meters. You don't get high-resolution scans unless you want to use them so you're in no position to give any advice to literal subject matter experts.
        When you record data, any kind of data, it makes perfect sense to capture it at the highest resolution possible. That's how you do a job appropriately. But, much like any other industry that does its job right, you load the appropriate amount of data to perform the work. In the case of LIDAR, I don't see why you'd need full-res detail for most of the job. I can see how in some situations it is imperative to have that detail, but not (for example) when you're just "flying" through the scene or when you're virtually 1000m away from it. I imagine most LIDAR/SLAM software is similar to other programs where the data loaded and/or rendered varies depending on what you're doing.
        So, I find it hard to believe you need a high-performance SSD for a 40GB scene. Seems to me either your software is poorly designed or you're just loading in more detail than necessary.
        I think it's pretty clear that you're just a know-it-all with a bad case of Dunning-Kruger who thinks being able to get the last word in means you're right.
        I could argue the same of you.
        Beside the point? I pointed out that btrfs's compression is the exact same tool that's been ubiquitous in game development for the last 15 years.
        Except not all games use the same level of compression and not all of them use zlib. So, sometimes you will see some significant gains. In other cases, not so much. btrfs also allows for LZO and ZSTD, but it seems like zlib is all you know about.
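        For reference, btrfs's algorithm choice is a mount-time (or per-file) setting; a sketch, with placeholder device and paths:

```shell
# Mount a btrfs volume with transparent zstd compression (level 3).
# zlib and lzo are selected the same way: compress=zlib:9, compress=lzo.
mount -o compress=zstd:3 /dev/nvme0n1p2 /mnt/data

# Or force compression even for data heuristically judged incompressible:
mount -o compress-force=zstd:3 /dev/nvme0n1p2 /mnt/data

# Per-directory, without remounting (applies to newly written files):
btrfs property set /mnt/data/games compression zstd
```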
        I've pointed out to you multiple times that PC releases have up until recently contained higher LoDs, i.e. additional data, that didn't use to be included in the console versions due to consoles not having the performance/memory to use those assets. With the latest console generation, however, those higher LoDs are also used by consoles, so they ARE included in their versions, and hence console installs aren't really any smaller than the PC versions.
        Yeah, keep mentioning it. I'm not disagreeing with that, so how about move on, eh? Most of your gripe with me is all in YOUR head.
        Seeing how you didn't know; You can't run current generation versions of games off external storage anymore. Not on the Playstation 5. Not on the Xbox Series consoles.
        Seems you are right about that, I'll give you that one.



        • #44
          Originally posted by schmidtbag View Post
          Or, you do the computation directly on the server. It's strange how you accuse me of needing to get with the times yet seem unaware of how much more powerful a typical server is compared to a workstation.
          The average server does not have the performance, especially in graphically intensive workloads, to do workstation jobs. Even if it had, it's a server. Used by multiple people simultaneously. Those big Epyc and Xeon Gold server CPUs aren't running desktop applications. They're running non-realtime compute jobs. Which is incidentally what I wrote my master's thesis on. There's a reason why workstation workloads haven't moved onto servers and instead only things like thin client desktops have had any success.

          How did I present it to be literal?
          How did you present it as anything but that? When you say something as a joke or some other less-than-serious manner you don't just say it as a matter of fact. You said it as a matter of fact. I can't read your mind over the internet to know that it wasn't meant to be interpreted as it was written.

          Nobody else here is correcting me, because they understand there isn't anything to correct. Other people got the gist of what I was saying. As I mentioned before, I wasn't disagreeing with anything you said other than how common high-performance SSDs are needed. This is practically a one-sided argument.
          Who else is even bothering to reply to a Dunning-Kruger case like you except me? Not only are you wrong about the number of use cases for high-performance SSDs, you're also wrong in your justifications. Your arguments are either flat-out incorrect or based on an idea of the kinds of workloads people run on their machines that's 20+ years out of date.

          Except there are other tables... And like I said, the table is large enough that [on an SSD] it makes very little difference how much bigger it could possibly get, since no matter what, you're not loading the whole thing into RAM and modern databases are built to find data quickly.
          You're really doubling down on this one ultimately irrelevant detail aren't you? It really doesn't matter if it's the whole database or just that one table when you talked about working on that one table specifically as if it's this monster thing you can't fit into RAM. Which again shows how badly out of touch you are.

          My previous workstation, a now 10-year-old hand-me-down from my boss when I started at my current job, had 32GB of RAM and thus much more than enough to fit that "huge" table in RAM at any given time. My current one, a 5-year-old original Threadripper box, has 64GB. It similarly doesn't matter what kind of super fast NoSQL database you're running; Persistent memory is never going to be able to compete with volatile memory of the same vintage.

          Still not sure what your question is asking then, not that it matters because you're just being facetious and rhetorical anyway.
          You're the one who began to complain about me not being specific enough when I pointed out that I had a dataset 10 times the size of what you were touting as a huge dataset and that it wasn't even a particularly big dataset by today's standards.

          When you record data, any kind of data, it makes perfect sense to read in the highest resolution possible. That's how you do a job appropriately. But, much like any other industry who does their job right, they load the appropriate amount of data to perform their work. In the case of LIDAR, I don't see how it's necessary to need full-res detail for most of the job. I can see how in some situations it is imperative to have that detail, but not (for example) when you're just "flying" through the scene or when you're virtually 1000m away from the scene. I imagine most LIDAR/SLAM software is similar to other programs where the data loaded and/or rendered varies depending on what you're doing.
          Do you really think we're working entirely off the disc, or that we don't include downsampled stand-ins for when things are viewed at a distance or while the high-res chunks are being loaded in off the disc? Because we absolutely do, and this has been a standard feature of software in this industry for decades already.

          This is exactly what I mean; You're a know-it-all who tries to explain things well outside of your own area of expertise to literal subject matter experts in that exact thing.

          So, I find it hard to believe you need a high-performance SSD for a 40GB scene. Seems to me either your software is poorly designed or you're just loading in more detail than necessary.
          You could actually try to read what I wrote. I said it's a test dataset. A subset of a far, far larger dataset we use to test our software.

          I could argue the same of you.
          Says the layman trying to explain their own area of expertise to a literal subject matter expert...

          Except not all games use the same level of compression and not all of them use zlib. So, sometimes you will see some significant gains. In other cases, not so much. btrfs also allows for LZO and ZSTD, but it seems like zlib is all you know about.
          As I already told you; The use of zlib has been ubiquitous in games for well over a decade. LZO is also just an implementation of Lempel-Ziv that only produces results on very small, mostly text, files while zstd is just a faster implementation of Lempel-Ziv.

          To put it as simply as I can; With the kinds of archive sizes we're talking about here, Lempel-Ziv is useless. Yes, you get the files in one neat package, but you don't get any appreciable compression of data. That's something you only get if you're still in the 90s or if you're compressing something fairly small like source code.
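          How much Lempel-Ziv actually buys depends heavily on the input, which can be checked directly with Python's zlib: repetitive text shrinks dramatically, while random bytes (a stand-in for already-compressed textures and audio) barely change. This doesn't settle whose archives still have headroom, only why the two sides can both point at real numbers.

```python
import os
import zlib

text = b"the quick brown fox jumps over the lazy dog " * 10_000
random_blob = os.urandom(len(text))  # stand-in for already-compressed assets

packed_text = zlib.compress(text, level=9)
packed_blob = zlib.compress(random_blob, level=9)

# Repetitive data compresses by orders of magnitude; incompressible data
# comes out essentially unchanged (plus a few bytes of framing overhead).
print(f"text:   {len(text)} -> {len(packed_text)} bytes")
print(f"random: {len(random_blob)} -> {len(packed_blob)} bytes")
```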

          Yeah, keep mentioning it. I'm not disagreeing with that, so how about move on, eh? Most of your gripe with me is all in YOUR head.
          I keep pointing it out because you keep making the same false statement that PC versions of games are bigger than their console versions because the console versions use better compression. For some reason the correction that "No, they use the exact same compression and PC versions used to contain extra data, now they're the same size" just keeps going over your head time and time again.



          • #45
            Originally posted by L_A_G View Post
            The average server does not have the performance, especially in graphically intensive workloads, to do workstation jobs. Even if it had, it's a server. Used by multiple people simultaneously. Those big Epyc and Xeon Gold server CPUs aren't running desktop applications. They're running non-realtime compute jobs. Which is incidentally what I wrote my master's thesis on. There's a reason why workstation workloads haven't moved onto servers and instead only things like thin client desktops have had any success.
            I understand they're not running desktop applications. I didn't say they were, though, some servers are built with desktop virtualization in mind - I've been to companies that do such things. We're talking about some pretty blurred lines here of what warrants a server and what doesn't. There's a certain point where you have so much data that it isn't practical to load on a workstation, where getting 10Gbps ethernet makes more sense. But even if you do have that much data, that doesn't necessarily mean you need the fastest SSD available. By "need", I mean where you are losing several minutes or even hours throughout a normal workday waiting for the disk and where there is nothing you can do to optimize the workflow.
            How did you present it as anything but that? When you say something as a joke or some other less-than-serious manner you don't just say it as a matter of fact. You said it as a matter of fact. I can't read your mind over the internet to know that it wasn't meant to be interpreted as it was written.
            Uh... no? How often do you go out and socialize? When someone says "ugh I'm starving!" they're not literally starving, but it's not a joke either. When someone says "French citizens speak French", it's given there are people there who speak other languages, or, that there are expats who maybe (for some reason) don't speak French. Obviously not a joke either.
            When you say games within the past 15 years use ZLIB, I'm aware that not literally every single game takes advantage of ZLIB for all of their assets.
            When I say there are games that have room for more compression, I'm not saying all games aren't using compression, or, that all assets aren't compressed.
            Who else is even bothering to reply to a Dunning-Kruger case like you except me? Not only are you wrong about the number of use cases for high performance SSDs, you're also wrong in your justifications. Arguments that are flat out incorrect or based on an outdated idea of the kinds of workloads people run on their machines 20+ years out of date.
            I wasn't aware you were the authority of such cases. My bad, I'll be sure to pass people your way next time they have such arguments.
            You're really doubling down on this one ultimately irrelevant detail aren't you? It really doesn't matter if it's the whole database or just that one table when you talked about working on that one table specifically as if it's this monster thing you can't fit into RAM. Which again shows how badly out of touch you are.
            It's you doubling-down, because you refuse to get the gist of what I'm trying to say, only because my examples aren't on the same tier that only the largest of datacenters deal with (which BTW, isn't going to work on your fancy workstation either). You keep acting like this one table is the only thing the server has to deal with. You're not going to want 25GB of just 1 table (of hundreds) stored entirely in RAM when there are other workloads that also demand attention. And you think I'm the one out of touch...
            You're the one who began to complain about me not being specific enough when I pointed out that I had a dataset 10 times the size of what you were touting as a huge dataset and that it wasn't even a particularly big dataset by today's standards.
            So you're assuming I complained, you assumed that I used the word "huge", and you continue to disregard that it's just 1 table.
            Do you not see how you're the problem here? You either take things way too literally or you take creative liberties in what I say when it suits your narrative.
            Do you really think we're working entirely off the disc, or that we don't include downsampled stand-ins for when things are viewed at a distance or while the high-res chunks are being loaded in off the disc? Because we absolutely do, and this has been a standard feature of software in this industry for decades already.
            Precisely my point: so shut up and stop arguing with me. If the downsampled stand-ins are still slow enough to impede workflow when loading from a decent SATA drive, then we can add this to the short list of applications that demand high-end SSDs.
            You could actually try to read what I wrote. I said it's a test dataset. A subset of a far, far larger dataset we use to test our software.
            What difference does it make? Like I said before: there's a point where a bigger dataset has a nearly negligible impact on your workload. Assuming your test dataset is the same LoD as the complete dataset, the application isn't going to change that much in what it loads at any given time.
            In another perspective:
            Game developers could have hundreds of GB of raw assets. If they decide to take enough of those assets to work on a single level on a separate computer, their total disk load doesn't really change much compared to having the complete dataset. You're not loading everything, so it doesn't matter how big it gets.
            That's why I was using my 25GB table example - it's not that big, but it's big enough that it's not practical to dump the whole thing in RAM.
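            The "you're not loading everything" pattern described above can be sketched with memory mapping: the OS pages in only the regions you actually touch, so the file's total size barely matters for sparse access. A minimal sketch using a throwaway temp file (the 64 MiB size and offset are arbitrary):

```python
import mmap
import os
import tempfile

# Throwaway 64 MiB (sparse) file standing in for a much larger on-disk table.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 64 * 1024 * 1024)
os.close(fd)

offset = 32 * 1024 * 1024  # touch one spot in the middle of the file
with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
    mm[offset:offset + 4] = b"HERE"
    # Only the pages around `offset` are ever faulted into memory;
    # the rest of the file never leaves the disk.
    marker = bytes(mm[offset:offset + 4])

os.remove(path)
print(marker)
```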
            Says the layman trying to explain their own area of expertise to a literal subject matter expert...
            I'm an expert too. My proof is because I said so - I even wrote a masters thesis on it.
            As I already told you; The use of zlib has been ubiquitous in games for well over a decade. LZO is also just an implementation of Lempel-Ziv that only produces results on very small, mostly text, files while zstd is just a faster implementation of Lempel-Ziv.
            Right... I'm sure you know how the entire industry works. After all, you are the ostensible expert. Surely, I should believe everything you say.
            As I already told you, there are more methods of compression than just zlib. Using zlib to compress audio and textures is outright stupid when there are purpose-built alternatives. If you're using zlib on top of all that, that's fine, I guess.
            I keep pointing it out because you keep making the same false statement that PC versions of games are bigger than their console versions because the console versions use better compression. For some reason the correction that "No, they use the exact same compression and PC versions used to contain extra data, now they're the same size" just keeps going over your head time and time again.
            It's not a false statement, but you, for whatever bewildering reason, think I'm suggesting it's an issue with all games. You act like everything is compressed with zlib and only zlib.
            Hah... expert.
            Last edited by schmidtbag; 15 March 2023, 04:03 PM.

            Comment


            • #46
              Originally posted by schmidtbag View Post
              I understand they're not running desktop applications. I didn't say they were, though, some servers are built with desktop virtualization in mind - I've been to companies that do such things. We're talking about some pretty blurred lines here of what warrants a server and what doesn't. There's a certain point where you have so much data that it isn't practical to load on a workstation, where getting 10Gbps ethernet makes more sense. But even if you do have that much data, that doesn't necessarily mean you need the fastest SSD available. By "need", I mean where you are losing several minutes or even hours throughout a normal workday waiting for the disk and where there is nothing you can do to optimize the workflow.
              We're not talking about standard thin client type desktop applications here. We're talking about some pretty compute-heavy and latency sensitive workloads.

              These workloads are also often fairly iterative so slowing down the workflow is also going to reduce the quality of the end result. Workloads like these also include a lot of batch-processing of data, which on a workstation will be IO-limited. I've seen job processing times go down from over an hour to less than 20 minutes just by installing a PCIe SSD. Add to this the amortization time of years and most of the people who use these applications being well paid professionals and fast SSDs make a lot of business sense.

              Uh... no? How often do you go out and socialize? When someone says "ugh I'm starving!" they're not literally starving, but it's not a joke either. When someone says "French citizens speak French", it's given there are people there who speak other languages, or, that there are expats who maybe (for some reason) don't speak French. Obviously not a joke either.
              How often do you go out? Because this thing called "context" is something I expected you to understand, but apparently I expected too much of you. Non-literal statements like that are made in a context where it's clear they're not meant to be taken literally. There was nothing in the context that would've suggested this was a joke or anything.

              I shouldn't have to explain basic social skills to you, but here we are...

              When you say games within the past 15 years use ZLIB, I'm aware that not literally every single game takes advantage of ZLIB for all of their assets.
              When I say there are games that have room for more compression, I'm not saying all games aren't using compression, or, that all assets aren't compressed.
              You do realize that your "evidence" for PC games not being properly compressed was all either well out of date or a misunderstanding? What you wrote was simply wrong.

              There's a difference between being slightly hyperbolic and being flat-out wrong.

              I wasn't aware you were the authority of such cases. My bad, I'll be sure to pass people your way next time they have such arguments.
              ... or maybe someone doing desktop support for a standard office environment shouldn't extrapolate from their own personal experiences as much as you do.

              It's you doubling-down, because you refuse to get the gist of what I'm trying to say, only because my examples aren't on the same tier that only the largest of datacenters deal with (which BTW, isn't going to work on your fancy workstation either). You keep acting like this one table is the only thing the server has to deal with. You're not going to want 25GB of just 1 table (of hundreds) stored entirely in RAM when there are other workloads that also demand attention. And you think I'm the one out of touch...
              You brought up this thing as an example of what you considered to be a big dataset to be working on. I pointed out that it isn't and that you thinking a 25 GB dataset is big by the standards of 2023 shows you're very badly out of touch. A properly big dataset by today's standards is something in the class of literal terabytes of data and it's fairly clear that you haven't worked on anything even close to that size.

              Also, you're generally not going to have multiple datasets and thus multiple jobs under work at the same time in memory. At least unless you've got multiple people trying to do work on the same machine at the same time. Something I already pointed out wasn't practical for workload type jobs.

              So you're assuming I complained, you assumed that I used the word "huge", and you continue to disregard it's just 1 table.
              Do you not see how you're the problem here? You either take things way too literally or you take creative liberties in what I say when it suits your narrative.
              You brought it up as this impractically huge dataset and you didn't say that it's only a small fraction of something far, far larger until I mocked you for it. It was only that one table in isolation and only now is it part of something far, far larger. Which leads me to believe that it's just the largest table in a database where none of the other tables are anywhere near that big.

              Precisely my point: so shut up and stop arguing with me. If the downsampled stand-ins are still slow enough to impede workflow when loading from a decent SATA drive, then we can add this to the short list of applications that demand high-end SSDs.
              Shut up and stop arguing with you? Sounds like somebody was an only child or otherwise got doted on by their parents too much...

              Seeing how you didn't realize the obvious again; Those lower density sets aren't free. They take up space on disc too and that's never infinite. So you generally only have one and then pull in from the original once the viewport is anywhere near it. The utility of these downsampled sets when actually looking closer at a part of the pointcloud is questionable when they're going to be viewed through a 4K monitor these days.

              What difference does it make? Like I said before: there's a point where a bigger dataset has a nearly negligible impact on your workload. Assuming your test dataset is the same LoD as the complete dataset, the application isn't going to change that much in what it loads at any given time.
              What difference does it make? If a 50x500m test dataset is 40GB, how big do you think a production use set with the same density is going to be? What do you think that's going to do to the amount of time you spend waiting for things to be loaded off disc on a SATA drive versus a PCIe one?

              To answer that obvious question; It's going to be in the hundreds of gigabytes and the bigger it is, the more you're going to benefit from having a fast disc as you load and un-load chunks of it into RAM and VRAM.
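              A quick back-of-the-envelope calculation makes the gap being argued about concrete. This is only a sketch: the 40 GB figure comes from the posts above, but the throughput numbers are typical sequential-read ballparks I've assumed, not measurements from either poster.

```python
# Rough load-time comparison for the 40 GB dataset mentioned above.
# Throughput figures are assumed typical sequential-read rates,
# not benchmarks from this thread.
dataset_gb = 40
sata_gb_per_s = 0.55   # ~550 MB/s, practical SATA III ceiling
nvme_gb_per_s = 7.0    # ~7 GB/s, a fast PCIe 4.0 NVMe drive

sata_s = dataset_gb / sata_gb_per_s   # time to read it all over SATA
nvme_s = dataset_gb / nvme_gb_per_s   # time to read it all over NVMe

print(f"SATA: ~{sata_s:.0f} s  NVMe: ~{nvme_s:.0f} s")
```

              At those assumed rates a full sequential read drops from over a minute to a handful of seconds; real workloads load chunks rather than the whole set, but the ratio carries over to each chunk.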

              Game developers could have hundreds of GB of raw assets. If they decide to take enough of those assets to work on a single level on a separate computer, their total disk load doesn't really change much compared to having the complete dataset. You're not loading everything, so it doesn't matter how big it gets.
              That's why I was using my 25GB table example - it's not that big, but it's big enough that it's not practical to dump the whole thing in RAM.
              If that one table is what you're primarily working on at a time then on modern workstations that 25GB, or a large chunk of it, is very much practical to be read and kept in memory. When you're working with datasets that are genuinely large by today's standards your software will be reading big chunks of it into memory as you're working with it and it'll be doing so constantly. Volatile memory is still massively faster than non-volatile memory regardless of whether it sits behind a bus designed and spec'd for mechanical disc drives (SATA) or something more suited for the kind of performance NAND flash memory can provide (PCIe).
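              The read-in-chunks pattern both sides are describing can be sketched as follows. This is a minimal, hypothetical illustration: `process_in_chunks`, `total_bytes` and the 64 MiB window size are my own, not anything from either poster's software.

```python
# Minimal sketch of the chunked-access pattern described above: stream a
# large on-disk dataset through a fixed-size RAM window instead of
# loading the whole file. Names and window size are illustrative only.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB window; tune to available RAM

def process_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield fixed-size chunks so peak memory stays near chunk_size."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

def total_bytes(path):
    """Example consumer: tally file size without holding it in memory."""
    return sum(len(chunk) for chunk in process_in_chunks(path))
```

              With this shape, the disk's sequential read speed sets how fast each window refills, which is exactly where a faster drive does or doesn't pay off.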

              I'm an expert too. My proof is because I said so - I even wrote a masters thesis on it.
              Aah yes... The desktop support guy trying to talk down to somebody who actually works on the kinds of applications that genuinely benefit from fast disc drives.

              Right... I'm sure you know how the entire industry works. After all, you are the ostensible expert. Surely, I should believe everything you say.

              You don't have to be an expert to know something about what's in the platform providers' toolkits used by every developer who works on a particular platform. If you don't believe me you're free to go look things up and fact check me. However, you trying to "Nu-uuh" me like you're 5 years old is less than impressive.

              As I already told you, there's more methods of compression than just zlib. Using zlib to compress audio and textures is outright stupid, when there are purpose-built alternatives. If you're using zlib on top of all that, that's fine, I guess.
              No, you used examples of alternatives that really aren't suited and proved that you don't know what you're talking about.

              It's not a false statement, but you, for whatever bewildering reason, think I'm suggesting it's an issue with all games. You act like everything is compressed with zlib and only zlib.
              Remember why I brought up zlib? Because you suggested using btrfs to compress games and zlib, a game industry standard going back over a decade, is what btrfs uses for file compression.

              In other words; You suggested using something that's already been an industry standard for a decade and a half.

              Hah... expert.
              When arguing with a Dunning-Kruger case like you, it doesn't even take much to be an expert by comparison.

              Comment


              • #47
                Originally posted by L_A_G View Post
                We're not talking about standard thin client type desktop applications here. We're talking about some pretty compute-heavy and latency sensitive workloads.

                These workloads are also often fairly iterative so slowing down the workflow is also going to reduce the quality of the end result. Workloads like these also include a lot of batch-processing of data, which on a workstation will be IO-limited. I've seen job processing times go down from over an hour to less than 20 minutes just by installing a PCIe SSD. Add to this the amortization time of years and most of the people who use these applications being well paid professionals and fast SSDs make a lot of business sense.
                Exactly my point - these are so complex that they teeter on the need of servers. One thing servers are particularly useful for is iterative workflows.
                I shouldn't have to explain basic social skills to you, but here we are...
                Says the expert autist who, even in the last quoted segment, doesn't realize we're in agreement.
                You do realize that your "evidence" for PC games not being properly compressed was all either well out of date or a misunderstanding? What you wrote was simply wrong.
                And you know this how? It's not like this stuff is constantly being reported. I don't keep tabs on this crap, but there's no reason it wouldn't still be happening.
                ... or maybe someone doing desktop support for a standard office environment shouldn't extrapolate from their own personal experiences as much as you do.
                I'm not doing desktop support, but since we're making baseless accusations of who we are: you're not an expert and you've done little to prove otherwise.
                You brought up this thing as an example of what you considered to be a big dataset to be working on. I pointed out that it isn't and that you thinking a 25 GB dataset is big by the standards of 2023 shows you're very badly out of touch. A properly big dataset by today's standards is something in the class of literal terabytes of data and it's fairly clear that you haven't worked on anything even close to that size.
                You continue to miss the underlying point here but since you're so hellbent on framing who I am and what I know, there is little point in reiterating.
                What I will reiterate is: you undermine your own point when you talk about those literal terabytes, because it's incredibly niche for a workstation to have that much stuff. Not saying it doesn't happen, but you're the one saying this stuff is common.
                Also, you're generally not going to have multiple datasets and thus multiple jobs under work at the same time in memory. At least unless you've got multiple people trying to do work on the same machine at the same time. Something I already pointed out wasn't practical for workload type jobs.
                If you've got a single workstation and your job is hyper-focused on a single application, then yeah, you're working with a rather fixed dataset. If your workload involves many different applications, you're going to be swapping out memory frequently, or perhaps even loading the same dataset between different programs (of which are highly unlikely to share memory).
                You brought it up as this impractically huge dataset and you didn't say that it's only a small fraction of something far, far larger until I mocked you for it. It was only that one table in isolation and only now is it part of something far, far larger. Which leads me to believe that its just the largest table in a database where none of the other tables are anywhere near that big.
                No, I didn't, and I hadn't implied it was "impractically huge" either. In fact, I was actually stating that despite its size and frequent access, it hardly demands a high-end SSD.
                Obviously it's only a fraction of something larger; what kind of business only runs on a single database table? You should know this by now - after all, you are ostensibly an expert. I had to clarify there were more tables because you're so obtuse in your way of thinking that you kept insisting there was only 1. The thing is, the other tables are hardly relevant; the only reason the other tables matter (and even then, not exclusive to the other tables) is that it isn't practical to fit 25GB of just 1 table into RAM. You do know there's more to databases than just running queries, right? An expert would know this. The whole point I'm trying to drill into you is that it doesn't matter how big your dataset gets, because there's a certain point where it makes no difference to the workflow. Of course, this varies dramatically depending on what the workflow is, hence the limited situations where high-performance SSDs make perfect sense on a workstation.
                Shut up and stop arguing with you? Sounds like somebody was an only child or otherwise got doted on by their parents too much...
                Yeah, parents who unleash their frustration on their child due to their own personal issues, knowing damn well the child is right. We're practically in agreement yet you try so hard to find any possible way you can argue with me.
                Seeing how you didn't realize the obvious again; Those lower density sets aren't free. They take up space on disc too and that's never infinite. So you generally only have one and then pull in from the original once the viewport is anywhere near it. The utility of these downsampled sets when actually looking closer at a part of the pointcloud is questionable when they're going to be viewed through a 4K monitor these days.
                This is laughable. You're the one trying to convince me that shelling out big money for a high-performance NVMe drive is necessary for your line of work but you can't justify 40GB-250GB of temporary low-density sets? Your complete dataset could be stored on a separate server (and frankly probably should be), accessed through 10Gbps, and then downsampled locally on your workstation. Worst-case scenario, you do a set of RAID1 HDDs to locally store your complete dataset and then use your working data on an SSD. A configuration like that ought to yield good-enough read performance, it guarantees data integrity, you get oodles of local capacity, all for a low price. Surely, an expert would figure this out.
                As for looking through a 4K monitor, seems to me you're just spoiled. "Oh no! I can notice some gaps in the pointcloud! How will I ever do my work!?"
                Cry me a river.
                What difference does it make? If a 50x500m test dataset is 40GB, how big do you think a production use set with the same density is going to be? What do you think that's going to do to the amount of time you spend waiting for things to be loaded off disc on a SATA drive versus a PCIe one?
                As stated before, you're not loading all that data simultaneously, so it doesn't matter how much bigger the production use is.
                If that one table is what you're primarily working on at a time then on modern workstations that 25GB, or a large chunk of it, is very much practical to be read and kept in memory. When you're working with datasets that are genuinely large by today's standards your software will be reading big chunks of it into memory as you're working with it and it'll be doing so constantly. Volatile memory is still massively faster than non-volatile memory regardless of whether it sits behind a bus designed and spec'd for mechanical disc drives (SATA) or something more suited for the kind of performance NAND flash memory can provide (PCIe).
                As I already said, there's more to databases than just running queries. In my particular case, the CPU actually tends to be the bottleneck (not always).
                So, given your presumptions, it's pretty understandable why you're so quick to mock, because yeah - a workload of nothing but querying 25GB is insignificant even 10 years ago and would be very cheap to put into RAM. But again: in what universe does it make sense where a whole business runs on a single 25GB table?
                You don't have to be an expert to know something about what's in the platform providers' toolkits used by every developer who works on a particular platform. If you don't believe me you're free to go look things up and fact check me. However, you trying to "Nu-uuh" me like you're 5 years old is less than impressive.
                The fact you treat your knowledge as so ubiquitous and the only truth just undermines your self-proclaimed expertise.
                No, you used examples of alternatives that really aren't suited and proved that you don't know what you're talking about.
                I used examples of what the user can do to further compress data, proving that there is in fact room for more compression. This is a worst case scenario, hence me pointing out there are purpose-built compression methods for specific types of assets.
                So, if a game compressed via ZLIB can continue to inefficiently be compressed by the user via ZLIB, that shows there is in fact much more room for more optimized compression methods.
                In other words; You suggested using something that's already been an industry standard for a decade and a half.
                No actually, I didn't suggest it. Didn't imply it either. I said using those methods shows there is more room for compression, of which I am indisputably correct about. Go ahead - try it yourself. Use compression level 9 on a game you know for a fact uses the zlib license. To reiterate: this is not the appropriate way to do it, but just to exemplify more-optimized methods could be done.
                Last edited by schmidtbag; 16 March 2023, 09:52 AM.

                Comment


                • #48
                  Originally posted by schmidtbag View Post
                  Exactly my point - these are so complex that they teeter on the need of servers. One thing servers are particularly useful for is iterative workflows.
                  Do you even read what you respond to? Servers are NOT good for iterative, real-time and low-latency workflows. I've already pointed out that they don't have the processing power of a workstation. Especially when that server needs to be serving multiple people simultaneously. Workstation jobs are still done on workstations for good reason.

                  Says the expert autist who, even in the last quoted segment, doesn't realize we're in agreement.
                  Uuuh... I explained why you're wrong and you're going "But that proves me right" because you don't have an argument anymore. I've proven your arguments wrong and now you're trying to pretend like the explanations as to why you're wrong are somehow in agreement with you.

                  And you know this how? It's not like this stuff is constantly being reported. I don't keep tabs on this crap, but there's no reason it wouldn't still be happening.
                  How do I know this? Because I, unlike you as you've just admitted, pay attention to this stuff. I remember this exact complaint from ages ago and I looked into it. I also remembered it when the current generation of consoles came out and could see that install sizes on games went up to about what they are on PC.

                  I'm not doing desktop support, but since we're making baseless accusations of who we are: you're not an expert and you've done little to prove otherwise.
                  Desktop support was my assessment based on your actual level of expertise versus how much you think you know. It's easy to develop a case of Dunning-Kruger like yours when you're working with near computer illiterate people, less so when you're working with highly experienced engineers and people considered "gurus" in the field.

                  You continue to miss the underlying point here but since you're so hellbent on framing who I am and what I know, there is little point in reiterating. What I will reiterate is: you undermine your own point when you talk about those literal terabytes, because it's incredibly niche for a workstation to have that much stuff. Not saying it doesn't happen, but you're the one saying this stuff is common.
                  How many times do I have to remind you that you need to read what you're replying to before you do.

                  I wrote that it's not until you get into single datasets of terabytes that using a server starts to make sense. Those are the massive datasets where using a server becomes necessary. Even then the most practical thing to do is to get the subset you're working with that'll fit on your workstation's PCIe SSD and then upload the results back to the server.

                  If you've got a single workstation and your job is hyper-focused on a single application, then yeah, you're working with a rather fixed dataset. If your workload involves many different applications, you're going to be swapping out memory frequently, or perhaps even loading the same dataset between different programs (of which are highly unlikely to share memory).
                  You do realize that operating systems have universally had this thing called "virtual memory" that allows them to use the persistent storage to offload the memory contents of applications not being used onto the disc since at least the 1990s? I shouldn't have to point this out to you, but here we are.

                  No, I didn't, and I hadn't implied it was "impractically huge" either. In fact, I was actually stating that despite its size and frequent access, it hardly demands a high-end SSD.
                  Obviously it's only a fraction of something larger; what kind of business only runs on a single database table? You should know this by now - after all, you are ostensibly an expert. I had to clarify there were more tables because you're so obtuse in your way of thinking that you kept insisting there was only 1. The thing is, the other tables are hardly relevant; the only reason the other tables matter (and even then, not exclusive to the other tables) is that it isn't practical to fit 25GB of just 1 table into RAM. You do know there's more to databases than just running queries, right? An expert would know this. The whole point I'm trying to drill into you is that it doesn't matter how big your dataset gets, because there's a certain point where it makes no difference to the workflow. Of course, this varies dramatically depending on what the workflow is, hence the limited situations where high-performance SSDs make perfect sense on a workstation.
                  Again, you used that 25 GB table as this supposedly impractically huge thing and the fact that you keep harping on about it and being evasive about the rest of the database makes it very clear none of the other tables are anywhere near that size. It's pretty clear that the 32GB of my decade old workstation can fit that whole database in RAM with room to spare for the OS and basic applications.

                  Even assuming the total database is, say, 500 GB, when you're working on it any well written application will be dynamically caching large chunks of it in RAM rather than just reading through individual columns on-disc. During compute-heavy processing it will be constantly reading big chunks into RAM to speed up access and read/write speeds. With a dataset that big and heavy enough processing, the application may end up reading the whole thing into memory more than once over the course of the job.

                  Yeah, parents who unleash their frustration on their child due to their own personal issues, knowing damn well the child is right. We're practically in agreement yet you try so hard to find any possible way you can argue with me.
                  I think it's pretty clear your parents were the overly doting type who'd never correct you when you were wrong and as a result you now can't handle being wrong as an adult.

                  This is laughable. You're the one trying to convince me that shelling out big money for a high-performance NVMe drive is necessary for your line of work but you can't justify 40GB-250GB of temporary low-density sets? Your complete dataset could be stored on a separate server (and frankly probably should be), accessed through 10Gbps, and then downsampled locally on your workstation. Worst-case scenario, you do a set of RAID1 HDDs to locally store your complete dataset and then use your working data on an SSD. A configuration like that ought to yield good-enough read performance, it guarantees data integrity, you get oodles of local capacity, all for a low price. Surely, an expert would figure this out.
                  Big money? Maybe you're used to government type tightfisted budgeting, but in the private sector a 500€ drive for a machine that costs 3000€ or more (sans peripherals) used by someone with a pre-tax salary of at least 4000€ is not "big money" by any stretch of the imagination. This is again showing how out of touch you are.

                  Again with the dyslexic straw men; I explicitly wrote that the 250 GB dataset was an example of an almost decade old real world dataset to show that your "huge" 25 GB table wasn't that big in 2023 and the 40 GB was to explain the context that it isn't even a big dataset by today's standards.

                  Your idea that customers should be downsampling high density datasets that cost multiple times their salary to produce is clear proof that you're clearly a know-it-all who doesn't even know how little he really knows. You're not just a bad case, you're an extreme case of Dunning-Kruger.

                  As for looking through a 4K monitor, seems to me you're just spoiled. "Oh no! I can notice some gaps in the pointcloud! How will I ever do my work!?" Cry me a river.
                  Yes, you tell that to paying customers who spent at least 10.000€ just getting a high density scan done and we'll see how long you stay in business. In this business even 50.000€ scans of large areas like that old "big" dataset aren't exactly unheard of.

                  In case you didn't get it; My employer doesn't use that kind of software, we make it.

                  It's pretty clear now that you don't do anything that's customer facing or for customers. Assuming you're employed at all, your work is obviously internal systems and more specifically desktop support. "Have you tried turning it off and on again"

                  As stated before, you're not loading all that data simultaneously, so it doesn't matter how much bigger the production use is.
                  Not all in one go, but in a series of big chunks that can be bigger than that 25GB table. Chunks that will be loaded in and out of memory during runtime. Where a slow drive will inevitably cause stuttering while you're waiting for the next chunk to be read from disc.

                  As I already said, there's more to databases than just running queries. In my particular case, the CPU actually tends to be the bottleneck (not always).
                  If you're CPU-limited then you're either using a badly optimized application or it's not a very IO-heavy workload. Just a moderately sized dataset.

                  So, given your presumptions, it's pretty understandable why you're so quick to mock, because yeah - a workload of nothing but querying 25GB is insignificant even 10 years ago and would be very cheap to put into RAM. But again: in what universe does it make sense where a whole business runs on a single 25GB table?
                  You used that 25GB table, from what is clearly a database that isn't multiples of that in total size, as an example of something big. I pointed out that it absolutely isn't and hasn't been for a long time. It shouldn't be hard for you to process this, but here we are.

                  The fact you treat your knowledge as so ubiquitous and the only truth just undermines your self-proclaimed expertise.
                  If my knowledge was ubiquitous I wouldn't have to point out that you're wrong about so many things. If you don't believe what I'm saying; You can fact check me and prove me wrong with sources. But you've been unable/unwilling to do so.

                  I used examples of what the user can do to further compress data, proving that there is in fact room for more compression. This is a worst case scenario, hence me pointing out there are purpose-built compression methods for specific types of assets.
                  Maybe you don't know this, but running something through the exact same compression algorithm multiple times doesn't really further compress files. At best you get a slight extra compression, but most probably you're going to get a slight increase in size.

                  So, if a game compressed via ZLIB can continue to inefficiently be compressed by the user via ZLIB, that shows there is in fact much more room for more optimized compression methods.
When a game has already been compressed with zlib, like almost all bigger titles for the better part of a decade, a user trying to run it through zlib again will not produce results of any significance. At best you reduce the size by a few hundred KB at the cost of increased disc access times; more probably you increase the size by a few hundred KB and also increase disc access times while you're at it.

No actually, I didn't suggest it. Didn't imply it either. I said using those methods shows there is more room for compression, about which I am indisputably correct. Go ahead - try it yourself. Use compression level 9 on a game you know for a fact uses zlib. To reiterate: this is not the appropriate way to do it, just a demonstration that more optimized methods could be used.
You literally did so in the previous paragraph. You're absolutely not "indisputably correct" about it when your "proof" was compressing a game from 20 years ago, and hence from before zlib became an industry standard. You didn't even know games used zlib or how ubiquitous its usage was before I pointed it out to you in this thread.



                  • #49
                    Originally posted by L_A_G View Post
                    Do you even read what you respond to? Servers are NOT good for iterative, real-time and low-latency workflows. I've already pointed out that they don't have the processing power of a workstation. Especially when that server needs to be serving multiple people simultaneously. Workstation jobs are still done on workstations for good reason.
                    Ah, so that's why databases, web hosting, network storage, render farms, physics simulations, etc are done on workstations!
                    /s
                    Uuuh... I explained why you're wrong and you're going "But that proves me right" because you don't have an argument anymore. I've proven your arguments wrong and now you're trying to pretend like the explanations as to why you're wrong are somehow in agreement with you.
                    Your behavior and line of thinking is bizarre. How do I have an argument "anymore" if I wasn't really disagreeing with you in the first place?
How do I know this? Because I, unlike you as you've just admitted, pay attention to this stuff. I remember this exact complaint from ages ago and I looked into it. I also remembered it when the current generation of consoles came out and could see that install sizes on games went up to about what they are on PC.
                    You have this annoying tendency to look at things in an all-or-nothing perspective. I don't know how many times I have to tell you that not every game has this problem, and that you don't know how every development studio works. But go ahead, feign expertise - it's totally working out for you.
                    Desktop support was my assessment based on your actual level of expertise versus how much you think you know. It's easy to develop a case of Dunning-Kruger like yours when you're working with near computer illiterate people, less so when you're working with highly experienced engineers and people considered "gurus" in the field.
                    You have yet to prove how you are all that special. Just because you claim authority, doesn't make you have any. This is the internet, kid - you're nothing here. Having access to a decent workstation and working with 40-250GB chunks of datasets doesn't make you an expert.
You do realize that operating systems have universally had this thing called "virtual memory", which allows them to use persistent storage to offload the memory contents of applications not being used onto the disc, since at least the 1990s? I shouldn't have to point this out to you, but here we are.
                    Exactly: virtual memory is a 1990s thing. Today, there isn't much advantage in it so long as you design your system around your workflow properly. RAM is relatively cheap and abundant. If you depend on virtual memory, you're either being way too cheap, you're doing things wrong/inefficiently or you have a very niche situation where perhaps you only need it very temporarily.
Again, you used that 25 GB table as this supposedly impractically huge thing, and the fact that you keep harping on about it while being evasive about the rest of the database makes it very clear none of the other tables are anywhere near that size. It's pretty clear that the 32GB of my decade-old workstation can fit that whole database in RAM with room to spare for the OS and basic applications.
                    No... I didn't, and insisting that doesn't make it true, just like insisting you're an expert doesn't make it true. Deliberately ignoring the complete argument does nothing to make you look competent or intelligent. You pick and choose the parts that seem stupid by themselves, but you ignore the rest. So, why should I elaborate on anything else when you have the attention span and memory of a goldfish?
Even assuming the total database is, say, 500 GB, any well-written application working on it will be dynamically caching large chunks of it in RAM rather than just reading through individual columns on-disc. During compute-heavy processing it will constantly be reading big chunks into RAM to speed up access and read/write speeds. With a dataset that big and heavy enough processing, the application may end up reading the whole thing into memory more than once over the course of the job.
                    Once again, you are inadvertently agreeing with me. Kinda amusing.
                    Now, let's say there are hundreds or thousands of smaller tables rather than one big one. Then, sprinkle in some extra processing where you're not just simply doing select/insert statements. That's what my system looks like.
                    So I agree, there absolutely will be large swaths of data being re-read multiple times; adding more RAM could help, but only if I/O wait is a problem (which for me it usually isn't).
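Whether I/O wait actually is the bottleneck is easy to eyeball, by the way; here's a minimal sketch that assumes Linux's standard /proc/stat layout (Linux-only, and only a rough cumulative figure):

```python
# Cumulative CPU time split since boot, from Linux's /proc/stat.
# The "cpu" line's fields are: user nice system idle iowait irq softirq ...
with open("/proc/stat") as f:
    fields = f.readline().split()

total = sum(int(x) for x in fields[1:])
iowait = int(fields[5])  # 5th numeric field = iowait jiffies

print(f"iowait share since boot: {100 * iowait / total:.2f}%")
# A share near zero means the CPUs almost never sat idle waiting on the
# disc, i.e. a faster drive wouldn't buy you much.
```

Tools like iostat give a per-interval and per-device view, but this is enough to settle whether storage is even in the picture.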
                    I think it's pretty clear your parents were the overly doting type who'd never correct you when you were wrong and as a result you now can't handle being wrong as an adult.
                    Says the one with an authority complex.
                    Big money? Maybe you're used to government type tightfisted budgeting, but in the private sector a 500€ drive for a machine that costs 3000€ or more (sans peripherals) used by someone with a pre-tax salary of at least 4000€ is not "big money" by any stretch of the imagination. This is again showing how out of touch you are.
And how long do you keep that machine? 1, maybe 2 years? Considering it costs most of a month's salary, that seems rather expensive, relatively speaking. You can't ignore peripherals either; after all, this isn't a server!
                    It's pretty rare for a company to have a looser fist than a government. Spending 500€ extra so you can see more dots would only be agreeable if your boss was clueless.
Again with the dyslexic straw men; I explicitly wrote that the 250 GB dataset was an example of an almost decade-old real-world dataset to show that your "huge" 25 GB table wasn't that big in 2023, and the 40 GB was to explain the context that it isn't even a big dataset by today's standards.
                    None of that changes my point; funny how I'm thought to be the strawman.
Your idea that customers should be downsampling high-density datasets that cost multiple times their salary to produce is clear proof that you're a know-it-all who doesn't even know how little he really knows. You're not just a bad case, you're an extreme case of Dunning-Kruger.
                    I find your interpretation of my idea confusing; no wonder you think I know nothing.
                    If you like throwing out trendy psychological terms as if it contributes anything: you've got an extreme case of confirmation bias.
Not all in one go, but in a series of big chunks that can be bigger than that 25GB table. Chunks that will be loaded in and out of memory during runtime, where a slow drive will inevitably cause stuttering while you're waiting for the next chunk to be read from disc.
                    If your software loads in 25GB+ chunks during runtime when (in your case) the data points do not depend on one another, then you're not a good developer.
                    I can't wait for you to correct me on how clueless I am about this!
You used that 25GB table, from a database that from what I can see clearly isn't multiples of that in total size, as an example of something big. I pointed out that it absolutely isn't and hasn't been for a long time. It shouldn't be hard for you to process this, but here we are.
                    There you go again, only quoting the parts that are convenient for you.
If my knowledge were ubiquitous I wouldn't have to point out that you're wrong about so many things. If you don't believe what I'm saying, you can fact-check me and prove me wrong with sources. But you've been unable/unwilling to do so.
                    Exactly, so it stands to reason that you don't know as much as you think. HuRr DuRr Dunning-Kruger!!!
                    I provided examples throughout to back up my claims. The only thing you have to back up yourself is "trust me, I'm an expert", over-generalized claims, and a personal anecdote. I asked for more sources and you didn't provide them. Surely, it should be easy if you're that certain.
Maybe you don't know this, but running something through the exact same compression algorithm multiple times doesn't really compress files any further. At best you get a slight extra compression, but most probably you're going to get a slight increase in size.
                    That's why I said to do compression level 9... Obviously, you won't see any noteworthy difference doing the same exact method, and I never suggested that.
                    You literally did so in the previous paragraph. You're absolutely not "indisputably correct" about it when your "proof" was compressing a game from 20 years ago and hence before zlib became an industry standard. You didn't even know games used zlib or how ubiquitous its usage was before I pointed it out to you in this thread.
                    You are as dense as you want your LIDAR points to be.
Using zlib at level 9 to further compress an already compressed game is a worst-case-scenario example that I don't suggest anyone actually use. The point is, it manages to shave off more than just a couple percent, which shows how much potential there is for more compression. NOT more compression via zlib, but methods optimized for specific assets. Y'know, like WebP vs BMP, or FLAC vs WAV. Go with lossy methods and you can save even more.
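To make the asset-aware point concrete, here's a toy sketch using only Python's stdlib; the "signal" is synthetic, and the delta step is just a crude stand-in for the prediction stage a real codec like FLAC uses, not how any shipping codec actually works:

```python
import math
import zlib

# Synthetic smooth "audio-like" signal (made up for illustration)
signal = bytes(int(127 + 120 * math.sin(i / 50)) for i in range(100_000))

# Generic compression straight on the samples
raw = zlib.compress(signal, 9)

# Asset-aware step first: store differences between neighbouring samples
# (neighbours are close in a smooth waveform, so deltas are tiny),
# then hand the transformed stream to the same generic compressor
delta = bytes([signal[0]]) + bytes(
    (signal[i] - signal[i - 1]) % 256 for i in range(1, len(signal))
)
smaller = zlib.compress(delta, 9)

print(len(raw), len(smaller))  # the transformed stream compresses far better
```

Same data, same zlib, but a representation chosen for the asset type beats the generic pass handily, and the transform is fully reversible.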
                    Seems to me you're a strong case of Dunning-Kruger in this regard.
                    Last edited by schmidtbag; 16 March 2023, 03:56 PM.



                    • #50
                      You guys do realise that in English we have

                      ...

We use it to do this:

                      "First line of sentence in quote blahblahblah...
                      ...last line of sentence in quote blahblahblah"

Saves us all from the wall of text that we have already read. I'm pretty effing sure your public display lost most people's interest (including the interested) and could have been conducted in private if you didn't want the audience.

                      So, spare MY advice a thought, yeah?

                      ...

                      We might just keep reading. And throw you a bone (like).
                      Hi

