Running The Latest Windows 10 vs. Ubuntu Linux OpenGL/Vulkan Benchmarks


  • #31
    subscribed, this seems like a very interesting thread



    • #32
      Unapproved, see next post.
      Last edited by Eliasvan; 09 July 2016, 09:45 AM.



      • #33
        Following up on my previous post, which contained several metrics to quantify 'smoothness' and jitter,
        in this post I will list the results of these metrics applied to the following test data:
        https://openbenchmarking.org/result/...SO-1307181SO75 (child of http://www.phoronix.com/scan.php?pag...amp;px=MTQxNDI).
        I'll pay most attention to the "Reaction Quake 3" and "Unvanquished 1920 x 1080" benchmarks, since they are the most interesting ones IMHO.

        Before diving into the results, let's take a look at the distribution of frametime for each benchmark:





        Judging based on the median frametime (percentile = 50%), AMD is the overall winner.
        The same can be said about the average fps: see here and here.

        Now, we'll dive into the results of our candidate 'smoothness' metrics:

        Code:
        Testing method "StdDev of Frametime In Milliseconds, Less is Better":
            5.847    "OpenArena, AMD A10-6800K"
            5.427    "OpenArena, Intel Core i7 4770K"
            6.250    "Reaction Quake 3, AMD A10-6800K"
            2.575    "Reaction Quake 3, Intel Core i7 4770K"
            4.687    "Unvanquished 1680 x 1050, AMD A10-6800K"
            5.847    "Unvanquished 1680 x 1050, Intel Core i7 4770K"
            4.887    "Unvanquished 1920 x 1080, AMD A10-6800K"
            6.061    "Unvanquished 1920 x 1080, Intel Core i7 4770K"
        
        Testing method "StdDev of Inter-frametime Difference In Milliseconds, Less is Better":
            5.528    "OpenArena, AMD A10-6800K"
            6.592    "OpenArena, Intel Core i7 4770K"
            3.929    "Reaction Quake 3, AMD A10-6800K"
            2.615    "Reaction Quake 3, Intel Core i7 4770K"
            1.440    "Unvanquished 1680 x 1050, AMD A10-6800K"
            3.072    "Unvanquished 1680 x 1050, Intel Core i7 4770K"
            1.494    "Unvanquished 1920 x 1080, AMD A10-6800K"
            3.138    "Unvanquished 1920 x 1080, Intel Core i7 4770K"
        
        Testing method "Median Absolute Inter-frametime Difference In Milliseconds, Less is Better":
            1.000    "OpenArena, AMD A10-6800K"
            1.000    "OpenArena, Intel Core i7 4770K"
            3.000    "Reaction Quake 3, AMD A10-6800K"
            2.000    "Reaction Quake 3, Intel Core i7 4770K"
            0.000    "Unvanquished 1680 x 1050, AMD A10-6800K"
            1.000    "Unvanquished 1680 x 1050, Intel Core i7 4770K"
            0.000    "Unvanquished 1920 x 1080, AMD A10-6800K"
            1.000    "Unvanquished 1920 x 1080, Intel Core i7 4770K"
        
        Testing method "Median Rolling StdDev of Frametime In Milliseconds, Less is Better":
            1.194    "OpenArena, AMD A10-6800K"
            2.011    "OpenArena, Intel Core i7 4770K"
            3.420    "Reaction Quake 3, AMD A10-6800K"
            2.152    "Reaction Quake 3, Intel Core i7 4770K"
            1.459    "Unvanquished 1680 x 1050, AMD A10-6800K"
            3.271    "Unvanquished 1680 x 1050, Intel Core i7 4770K"
            1.556    "Unvanquished 1920 x 1080, AMD A10-6800K"
            3.373    "Unvanquished 1920 x 1080, Intel Core i7 4770K"
        
        Testing method "Percentile Low-High Gap of Frametime In Milliseconds, Less is Better":
            25.000    "OpenArena, AMD A10-6800K"
            48.040    "OpenArena, Intel Core i7 4770K"
            35.000    "Reaction Quake 3, AMD A10-6800K"
            13.000    "Reaction Quake 3, Intel Core i7 4770K"
            20.000    "Unvanquished 1680 x 1050, AMD A10-6800K"
            25.000    "Unvanquished 1680 x 1050, Intel Core i7 4770K"
            22.000    "Unvanquished 1920 x 1080, AMD A10-6800K"
            27.000    "Unvanquished 1920 x 1080, Intel Core i7 4770K"
        
        Testing method "Median Rolling Percentile Low-High Gap of Frametime In Milliseconds, Less is Better":
            4.000    "OpenArena, AMD A10-6800K"
            7.000    "OpenArena, Intel Core i7 4770K"
            13.000    "Reaction Quake 3, AMD A10-6800K"
            8.710    "Reaction Quake 3, Intel Core i7 4770K"
            5.000    "Unvanquished 1680 x 1050, AMD A10-6800K"
            12.000    "Unvanquished 1680 x 1050, Intel Core i7 4770K"
            5.710    "Unvanquished 1920 x 1080, AMD A10-6800K"
            12.130    "Unvanquished 1920 x 1080, Intel Core i7 4770K"
        As you can see, although AMD is the best in terms of average fps, the metrics show that Intel is the best in terms of 'smoothness' in the case of "Reaction Quake 3": Intel scores lower there on all six metrics.
        This is also what you would expect from this graph at OpenBenchmarking.

        Some other interesting things about the difference between the metrics:
        • for the "OpenArena" benchmark, there is a rare case where the normally similar metrics "StdDev of Frametime In Milliseconds" and "Percentile Low-High Gap of Frametime In Milliseconds" disagree: Intel wins on the former (5.427 vs. 5.847), while AMD takes the lead on the latter (25.000 vs. 48.040);
          judging from this graph, AMD should win IMHO
        • the "Median Absolute Inter-frametime Difference In Milliseconds" metric disappoints me because it fails to differentiate in the "OpenArena" benchmark (both score 1.000), and sometimes produces the value zero;
          this can be improved by calculating the average instead of the median, but then the "StdDev of Inter-frametime Difference In Milliseconds" metric is better, because it penalizes large inter-frametime differences more heavily (they are squared) than small ones
        • the "Median Rolling StdDev of Frametime In Milliseconds" and "Median Rolling Percentile Low-High Gap of Frametime In Milliseconds" metrics agree on the winner in every benchmark, with the latter being roughly 3.3 to 4 times the former;
          note however that the former seems to be more fine-grained than the latter; more on that in the next paragraphs
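
To make the difference between the non-rolling metrics above concrete, here is a minimal Python sketch using only the standard library. It is not the code from the gist linked at the end of this post. The assumptions are mine: population StdDev (`statistics.pstdev`), linearly interpolated percentiles, placeholder low/high percentiles of 5% and 95% (the actual values are defined in the previous post), and made-up frametimes in whole milliseconds.

```python
import statistics


def inter_frametime_diffs(frametimes):
    """Difference between each frametime and the one before it."""
    return [b - a for a, b in zip(frametimes, frametimes[1:])]


def percentile(values, pct):
    """Linearly interpolated percentile of values, pct in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def stddev_frametime(frametimes):
    return statistics.pstdev(frametimes)


def stddev_inter_frametime_diff(frametimes):
    return statistics.pstdev(inter_frametime_diffs(frametimes))


def median_abs_inter_frametime_diff(frametimes):
    return statistics.median([abs(d) for d in inter_frametime_diffs(frametimes)])


def mean_abs_inter_frametime_diff(frametimes):
    return statistics.mean([abs(d) for d in inter_frametime_diffs(frametimes)])


def percentile_low_high_gap(frametimes, low=5.0, high=95.0):
    # low/high are placeholder percentiles, not the author's settings.
    return percentile(frametimes, high) - percentile(frametimes, low)


# Made-up data: mostly steady frames with a single 30 ms hitch.
frametimes_ms = [16, 16, 16, 17, 17, 17, 16, 16, 30, 16, 16, 16]

print("StdDev of frametime:", stddev_frametime(frametimes_ms))
print("StdDev of inter-frametime diff:", stddev_inter_frametime_diff(frametimes_ms))
print("Median abs inter-frametime diff:", median_abs_inter_frametime_diff(frametimes_ms))
print("Mean abs inter-frametime diff:", mean_abs_inter_frametime_diff(frametimes_ms))
print("Percentile low-high gap:", percentile_low_high_gap(frametimes_ms))
```

On this toy series more than half of the consecutive frames have identical frametimes, so the median of the absolute differences is zero even though there is a visible hitch. The mean picks the hitch up, and the StdDev of the differences weights it even more because the differences are squared.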


        Now that we've had an overview of these six metrics, let's take a closer look at the (IMO) two most interesting metrics: "Median Rolling StdDev of Frametime In Milliseconds" and "Median Rolling Percentile Low-High Gap of Frametime In Milliseconds".
        Here we'll investigate whether using the median is the right choice, and it's also a good opportunity to compare the behavior of both metrics.
        To do this, we'll look at the metric at each percentile (101 in total). Remember that a percentile of 50% is the same as the median, and thus should correspond with the previous results.
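
The per-percentile view described above can be sketched roughly as follows. This is an illustration rather than the gist code, and it assumes the following: a window of 30 frames, population StdDev, linearly interpolated percentiles, and synthetic frametimes in which a single 60 ms frame stands in for a scene change.

```python
import random
import statistics


def rolling_pstdev(frametimes, window=30):
    """Population StdDev of every full window of consecutive frametimes."""
    return [statistics.pstdev(frametimes[i:i + window])
            for i in range(len(frametimes) - window + 1)]


def percentile(values, pct):
    """Linearly interpolated percentile of values, pct in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


# Made-up data: 300 frames of 16-19 ms plus one 60 ms hitch.
rng = random.Random(0)
frametimes_ms = [16 + rng.randint(0, 3) for _ in range(300)]
frametimes_ms[150] = 60

rolling = rolling_pstdev(frametimes_ms)

# One value per percentile 0%, 1%, ..., 100%: 101 points in total.
curve = [percentile(rolling, p) for p in range(101)]

print("median (50%):", round(curve[50], 3))
print("95%:", round(curve[95], 3))
```

The hitch only affects the 30 windows that contain it, so it moves the high end of the curve while the 50% point stays close to the jitter of the steady frames.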

        "Rolling StdDev of Frametime In Milliseconds, Less is Better"





        "Rolling Percentile Low-High Gap of Frametime In Milliseconds, Less is Better"





        Note how similar the shapes of the curves of both metrics are for a given benchmark, with one important difference: the "Rolling StdDev of Frametime In Milliseconds" metric is a lot smoother than the "Rolling Percentile Low-High Gap of Frametime In Milliseconds" metric.
        If we chose a higher percentile, say 95%, we would risk capturing the effect of the scene changes, which is why I think 50% (median) is safer.
        Since the former metric is the smoother of the two, it is also more stable than the latter (i.e., a small change in percentile will yield a small change in the output value), and thus IMHO the preferred metric.
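
For anyone who wants to poke at the two rolling metrics side by side, here is a rough sketch that builds both percentile curves from the same data. The names `low_high_gap`, `rolling` and `distinct_levels` are mine, the 5%/95% percentiles are placeholders (the real ones are in the previous post), and counting distinct values on a curve is only a crude stand-in for how fine-grained a metric is; it is not how the plots above were made.

```python
import random
import statistics


def percentile(values, pct):
    """Linearly interpolated percentile of values, pct in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def low_high_gap(values, low=5.0, high=95.0):
    # low/high are placeholder percentiles, not the author's settings.
    return percentile(values, high) - percentile(values, low)


def rolling(frametimes, func, window=30):
    """Apply func to every full window of consecutive frametimes."""
    return [func(frametimes[i:i + window])
            for i in range(len(frametimes) - window + 1)]


def percentile_curve(values):
    """Value of the metric at percentiles 0%, 1%, ..., 100%."""
    return [percentile(values, p) for p in range(101)]


def distinct_levels(curve, digits=6):
    """Number of distinct values on a curve: a crude granularity proxy."""
    return len({round(v, digits) for v in curve})


# Made-up frametimes in whole milliseconds.
rng = random.Random(0)
frametimes_ms = [16 + rng.randint(0, 3) for _ in range(300)]

std_curve = percentile_curve(rolling(frametimes_ms, statistics.pstdev))
gap_curve = percentile_curve(rolling(frametimes_ms, low_high_gap))

print("distinct levels, rolling StdDev:", distinct_levels(std_curve))
print("distinct levels, rolling gap:   ", distinct_levels(gap_curve))
```

Changing `window` in `rolling` is also an easy way to see how sensitive the median of either curve is to the window size.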

        To conclude, IMO "Median Rolling StdDev of Frametime In Milliseconds, Less is Better" is the overall winner, and it would be nice if this one could be implemented in the Phoronix Test Suite,
        and featured in future Vulkan benchmarks (for the benchmarks that support exposing total frame latencies).
        (I'm willing to do the implementation, but before anything, I'd like to get some feedback.)

        @Michael: if you plan to use one of the rolling window metrics, make sure to carefully set the window size you think fits best for a given benchmark (my recommendation: 30, see previous post).

        For your reference, I published the (quick and dirty) code here:
        - Metric calculation: https://gist.github.com/Eliasvan/3db...rkingjitter-py
        - Plot generation: https://gist.github.com/Eliasvan/26d...arkingjitter-m



        • #34
          Related to the previous two posts (https://www.phoronix.com/forums/foru...554#post878554 and https://www.phoronix.com/forums/foru...624#post883624),
          take a look at these videos (see the video descriptions for how they relate):




          PS: does anyone know why YouTube severely degraded the quality of the first video?
          I used the YouTube profile of SimpleScreenRecorder.
