Quick, overall system performance suite?

  • #21
    I think compiling is a terrible way to test disk scores. You need a way to test the pure disk performance.

For normal usage, random read/write is more important than sequential. So could you limit IOZone to only test 4K random reads/writes? According to the interesting article written by Anand from Anandtech, random write performance is just about the most noticeable aspect of your disk subsystem, so we could give random writes more weight.
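(As a rough illustration of what such a test measures, here is a hypothetical Python sketch of a 4K random-write micro-benchmark. It is not how IOZone or the suite implements it, and the file name, file size, and operation count are arbitrary.)

```python
import os, random, time

def random_write_iops(path="disk_test.bin", file_size=256 * 2**20,
                      block=4096, ops=20000):
    """Crude 4 KiB random-write micro-benchmark (illustration only)."""
    # Pre-allocate the file so writes land at random offsets inside it.
    with open(path, "wb") as f:
        f.truncate(file_size)
    buf = os.urandom(block)
    fd = os.open(path, os.O_WRONLY)
    start = time.time()
    for _ in range(ops):
        offset = random.randrange(file_size // block) * block
        os.lseek(fd, offset, os.SEEK_SET)
        os.write(fd, buf)
    os.fsync(fd)  # flush, so the OS page cache doesn't hide most of the cost
    os.close(fd)
    elapsed = time.time() - start
    os.remove(path)
    return ops / elapsed  # 4K random writes per second

if __name__ == "__main__":
    print(f"~{random_write_iops():.0f} random 4 KiB writes/s")
```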

I also think using power values doesn't seem quite right. When one compares a score of 3000 to a score of 6000, the "6000" PC is roughly twice as fast. A linear scale makes sense...

A score composed of:
    * Fastest thread
    * Total processing power
    * Random Disk performance
    * 2D (Gui) performance
    * 3D performance

It might be useful to split 3D into two parts, e.g. Simple 3D and Advanced 3D.
Then Unigine can be used for Advanced 3D, and if it fails for whatever reason, a score of 0 is acceptable.

It would be great to have one number, but the problem is that a single number is very misleading. So we need to show the sub-values (e.g. 6 of them) quite prominently.

    Comment


    • #22
Thanks for the input, grigi. We need more people to jump in!

      Originally posted by grigi View Post
      I think compiling is a terrible way to test disk scores. You need a way to test the pure disk performance.

For normal usage, random read/write is more important than sequential. So could you limit IOZone to only test 4K random reads/writes? According to the interesting article written by Anand from Anandtech, random write performance is just about the most noticeable aspect of your disk subsystem, so we could give random writes more weight.
That's an interesting idea (adding a fast, dedicated disk efficiency test). We're trying to focus on real-world (as opposed to synthetic) measurements, though. Is there any "real world" test that could serve? Otherwise, the idea is interesting, and we may just want to do what you are proposing.

Regarding the average, did you read the third post on the first page of the thread in detail (and the Wikipedia article)? I think a geometric mean makes a lot of sense in this case. If all the components are 3 times faster than the baseline system, your score is 3. Unlike the arithmetic mean, the geometric mean also rewards improving the slowest component. Consider 4 components with scores 3, 8, 9, 10 (one subsystem is much slower). The geometric mean is 6.82 and the arithmetic mean 7.50. If you speed up the slow component from 3 to 6, your score jumps 19% with the geometric mean, to 8.11, but only 10% with the arithmetic mean. Still, we could use the arithmetic mean if everyone feels it's best.
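To make the comparison concrete, here is a small Python sketch (my own illustration, not part of the suite) that reproduces the numbers above for four equally weighted component scores:

```python
from math import prod

def geometric_mean(scores):
    return prod(scores) ** (1 / len(scores))

def arithmetic_mean(scores):
    return sum(scores) / len(scores)

before = [3, 8, 9, 10]   # one subsystem much slower than the rest
after  = [6, 8, 9, 10]   # the slow subsystem doubled in speed

for name, mean in [("geometric ", geometric_mean), ("arithmetic", arithmetic_mean)]:
    b, a = mean(before), mean(after)
    print(f"{name}: {b:.2f} -> {a:.2f}  (+{100 * (a / b - 1):.0f}%)")

# geometric : 6.82 -> 8.11  (+19%)
# arithmetic: 7.50 -> 8.25  (+10%)
```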

      Coming up with a single number is always arbitrary, but useful to have an idea of where your system stands. Regardless, yes, the idea is to show all the individual scores.

One of the goals is to keep the whole test as fast as possible; two 3D tests would probably be overkill, no?

      Many thanks!
      Last edited by mendieta; 14 April 2009, 11:31 AM.

      Comment


      • #23
        Just to be very specific:

        Originally posted by grigi View Post

I also think using power values doesn't seem quite right. When one compares a score of 3000 to a score of 6000, the "6000" PC is roughly twice as fast. A linear scale makes sense...
The geometric mean does scale linearly if all components scale linearly. If each factor is, say, twice as fast, your score is multiplied by (2^n)^(1/n) = 2. With four components, the new score (with all components twice as fast) is multiplied by the fourth root of 2^4, that is, 2. Maybe the notation is not clear; the Wikipedia article is nicer :-)
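Spelled out in my own notation (not from the original post): with n component scores s_1, ..., s_n, scaling every component by the same factor k scales the geometric mean by exactly k:

```latex
\left( \prod_{i=1}^{n} k\, s_i \right)^{1/n}
  = \left( k^n \prod_{i=1}^{n} s_i \right)^{1/n}
  = k \left( \prod_{i=1}^{n} s_i \right)^{1/n}
```

With n = 4 and k = 2 this is the "fourth root of 2^4" case above.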

        Comment


        • #24
Eh, sorry, I missed that. A geometric mean makes more sense, since we want lower scores to weigh more heavily (lower scores tend to indicate some bottleneck, and we users notice the bottlenecks).

The reason I'm thinking of two 3D tests is that Unigine doesn't run on any of the open-source drivers at the moment, even though the overall gaming experience on them isn't too bad. Maybe we should get one 3D app that can fall back to fewer features, but not at the expense of quality (not going to happen).

          Comment


          • #25
            Originally posted by mendieta View Post
            Questions:

[1] Does the composite aggregate all the individual tests of scimark2? That would be best.
            [2] The regular test for build-apache builds it 3 times, way too long for this. Can we build it just once in this test?
            [3] Draw circle may be limited. Can we run all "draw" tests sequentially and aggregate? (maybe not)
            [4] What do we do in cases where Unigine fails? (older cards). Maybe we should only show the cpu score in that case.

            I think we are getting there. Best!
1. Yes, internally it does that, I believe. The composite option is within the SciMark2 program itself, but I believe that's roughly how it behaves.

2. I could add a force option quite easily, but I'll need to think about whether it's the right thing to do, since just one run could be inaccurate in some cases.

4. Fall back to reporting 0 for graphics, or something.
            Michael Larabel
            https://www.michaellarabel.com/

            Comment


            • #26
              Originally posted by grigi View Post
The reason I'm thinking of two 3D tests is that Unigine doesn't run on any of the open-source drivers at the moment, even though the overall gaming experience on them isn't too bad. Maybe we should get one 3D app that can fall back to fewer features, but not at the expense of quality (not going to happen).
I agree; somewhere in the thread I proposed something similar. If we go the way you propose, we should have a "normal" score (using Unigine for 3D) and a "legacy" score when using the legacy 3D test. Do you have any suggestions for a legacy test (the lighter and quicker, the better)?

              Comment


              • #27
                Originally posted by grigi View Post
For normal usage, random read/write is more important than sequential. So could you limit IOZone to only test 4K random reads/writes? According to the interesting article written by Anand from Anandtech, random write performance is just about the most noticeable aspect of your disk subsystem, so we could give random writes more weight.
Do you have a link to the article? It would be useful! It strikes me that lots of small random reads/writes will mostly hit the disk cache for the writes (not for the reads) ... or is it that if you push the disk to the limit, it is unable to use the cache?

Also, I am by no means a guru, but in my overclocking experience a few years back with my current system, I used a compilation test to measure progress, and it seemed clear to me that the disk was the bottleneck. Of course you are reading and writing files all the time during a build, but it may be that all those reads/writes mostly hit the disk's fairly fast cache. In the end, we care about the disk in terms of how it slows down loading a game, booting up, compiling (if it has an effect), etc.
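As a rough, hypothetical illustration of how much the write cache can hide (a Python sketch for Linux/POSIX, not part of the suite): writing the same 4 KiB blocks with and without O_SYNC shows the gap between writes absorbed by the page cache and writes forced through to the device.

```python
import os, time

def write_4k_blocks(path, n=5000, sync=False):
    """Write n 4 KiB blocks sequentially; with sync=True every write must reach the disk."""
    flags = os.O_WRONLY | os.O_CREAT | (os.O_SYNC if sync else 0)
    fd = os.open(path, flags, 0o644)
    buf = os.urandom(4096)
    start = time.time()
    for _ in range(n):
        os.write(fd, buf)
    os.close(fd)
    mib_per_s = n * 4096 / (time.time() - start) / 2**20
    os.remove(path)
    return mib_per_s

if __name__ == "__main__":
    print(f"buffered: {write_4k_blocks('buffered.bin'):.1f} MiB/s")           # mostly absorbed by the page cache
    print(f"O_SYNC:   {write_4k_blocks('synced.bin', sync=True):.1f} MiB/s")  # forced through to the device
```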

                Again, we might as well use a synthetic test for disk. The discussion itself is fun anyways :-)

                Comment


                • #28
                  A little more info on compilation and disk performance. This guy finds a 20% speedup by compiling in RAM:



The disk clearly speeds up compilation, but maybe not by that much (RAM is like an infinitely fast disk, and even that only gives you around 20%).

                  Comment


                  • #29
But that is the point: the disk may be a bottleneck if it is TOO SLOW, but once it is fast enough (or the CPU is slow enough), it doesn't matter anymore.

                    Hence the compiling test does not scale with disk performance past a certain point.

                    A "reccomended" benchmark should be able to scale indefnitely.

                    Comment


                    • #30
                      Anandtech article:


                      It is a very long article, but very educational. Read it all.

                      Comment
