Phoronix testing methodology proposal

  • Phoronix testing methodology proposal

    I simply love this site. I've been a long-time reader and only recently became an active member; even though it's been a few months since I registered, I had been a news reader for quite a while before that.

    However, there has always been one thing that "bothers" me... the testing methodology is rather inconsistent. I know, I know... on Linux we do not have the nice set of benchmark applications that test every possible aspect of the system that our Windows brethren have, which is why we have to be more clever about how we test Linux on our systems. Even without the plethora of benchmarking applications Windows has, Linux offers enough tools to give (if not completely accurately, save for some tools) a general impression of what you could expect from a given combination of software and hardware. Without further ado, here's how I think Linux should ideally be tested:

    First, divide the tests into different categories (I know system testing/benchmarking is quite time-consuming, and with what you're about to read, even more so). The way I see it, there are at least three: system, gaming, and special cases.

    As you would expect, the bulk of the performance bottlenecks will be located in the "System" category, so here's what I think could be tested (and stuff for which we do have at least some tools). Needless to say, the following are items you'd expect to exercise during normal system use.
    • File compression/decompression. This is one common task many users perform, even if they don't know they do: depending on the distribution, most users update and install new software, and I see this as part of that. Some distros even dynamically compress their logs so they don't end up too bulky. (See the compression sketch after this list.)
    • CPU speed. Not a real test, but most of us know that Linux performs a quick and dirty speed check at boot, and thus we have BogoMIPS. Seriously though, /proc/cpuinfo has quite a lot of useful information about the CPUs.
    • RAM. Fortunately we have some *real* synthetic tests at our disposal: RAMspeed. It is a rather tiresome application to use, as it has quite a few options to test; however, Michael here at Phoronix has been using it for quite some time and seems to understand the tool better than I do.
    • HDD access speed. It may not reflect real-life usage, but it is darn close; hdparm is quite a good tool to give you an idea of what you can expect from your shiny new 1 TB drive. (See the hdparm sketch after this list.)
    • USB access. I'm not sure how to do this, but I remember there being a tool that helped calculate USB throughput. As per this post to the KernelNewbies mailing list, it should at least be possible to measure USB throughput.
    • Optical drive access speed. Maybe not a critical aspect of any system, especially by today's standards, but hdparm yields some interesting data about these drives. Important for:
    • Optical drive burn speed. It is not uncommon to see optical drives fail to reach their top speeds when burning, and (in my experience, at least) it is very dependent on the kernel you may be running. Some kernels behave better than others when burning; I seldom see my DVD drive reach its full 16X write speed.
    • Boot time. Not relevant for many of us (on Linux you don't have to reboot except for a kernel upgrade, and with kexec you may never see your BIOS POST ever again), but with some rather nifty stuff on the horizon (loading services in parallel rather than sequentially, and other good stuff), the boot process stresses not only the services being started but also the HDD and memory subsystems (systems with faster memory boot faster than the same systems with slower memory).
    • Compression of digital audio/video files. It is certainly not uncommon to rip your favorite tracks from CD to your HDD as Ogg Vorbis, MP3, or FLAC. Some people also compress small videos taken on their camcorders to send to family and friends via e-mail. These are resource-intensive operations, but I place them here (and not in the special cases) due to their "domestic" use; this test is revisited more seriously in the special cases. It could also serve as a multitasking test, especially with apps that rip and encode at the same time (e.g. Grip) and put the system under quite a bit of stress.
    • System responsiveness. It may not be as much of an issue any more, especially with preemptive kernels, but it is always nice to see how snappy the system "feels" while under load. This is a rather subjective matter, but a system that manages to stay "snappy" under load will yield a better user experience.
    • 2D acceleration. Many of us sit for hours during the day at tasks that require a lot of text scrolling, window switching, etc., and even with the advent of nice GUI enhancements like Compiz, many of those tasks are still inherently drawn by the 2D renderer (for example, Firefox scrolling with fglrx and AIGLX). I'm not sure how this could be tested; I remember there's a simple 2D "benchmark" in X that tests text antialiasing and a few other parameters, but I'm not sure it is still part of X in recent releases.
    • XVideo performance. While we are at 2D acceleration testing, what about video playback performance? I'm not sure how to test this with any real accuracy.
    • Networking speed/latency. We know that, in the best of Unix traditions, Linux has a superb networking stack, but this varies from system to system and from NIC to NIC (mostly drivers). (See the network sketch after this list.)
    • Given that these are tests representing normal use, what about presentation playback (PP[T/S]/ODP)? This is actually (IMO) one of the low points of OOo, and if you're like me, friends and family constantly share with you pointless, useless, yet funny .pp[t/s] or (if educated) .odp files. I don't know what factors are behind OOo Impress's low performance during presentations, but this sure seems like a test!
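
    To make the file compression/decompression item a bit more concrete, here's a rough sketch of how it could be timed from a shell script. The "testdata" directory is just a placeholder, and gzip/bzip2 stand in for whatever compressors end up being tested:
    [code]
    #!/bin/bash
    # Rough sketch: time compression and decompression of a sample tree.
    # "testdata" is a placeholder directory; any reasonably large data set works.
    set -e

    # Pack the tree once so only the compressor itself is being measured.
    tar cf testdata.tar testdata

    echo "== gzip =="
    time gzip -c testdata.tar > testdata.tar.gz
    time gunzip -c testdata.tar.gz > /dev/null

    echo "== bzip2 =="
    time bzip2 -c testdata.tar > testdata.tar.bz2
    time bunzip2 -c testdata.tar.bz2 > /dev/null
    [/code]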

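    For the HDD access speed item, something as simple as the following would already give comparable numbers; /dev/sda is an assumed device name and the commands need root:
    [code]
    #!/bin/bash
    # Rough sketch: hdparm read timings. Needs root; /dev/sda is an assumed device.
    # -T measures cached reads (memory/bus), -t measures buffered disk reads.
    DEV=${1:-/dev/sda}

    for i in 1 2 3; do   # repeat a few times and eyeball the spread
        hdparm -T "$DEV"
        hdparm -t "$DEV"
    done
    [/code]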

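    For the networking speed/latency item, a minimal sketch could look like this. "otherhost" is a placeholder for a second machine on the LAN, and the throughput half assumes iperf is installed and already running in server mode (iperf -s) on that machine:
    [code]
    #!/bin/bash
    # Rough sketch: latency and throughput for a given NIC/driver combination.
    # "otherhost" is a placeholder for another machine on the same LAN.
    HOST=${1:-otherhost}

    # Latency: round-trip time statistics over 20 packets.
    ping -c 20 "$HOST"

    # Throughput: assumes "iperf -s" is running on $HOST.
    iperf -c "$HOST"
    [/code]
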
    Game testing is actually one of the most prominent areas of testing conducted here at Phoronix. Most of us know (from years and years of losing countless hours in front of the monitor) that different games stress different aspects of our systems. There are games which are CPU bound, in that they require the CPU to do a lot of work; others which are memory bound, in that memory speed determines how fluid the animation or effects are (for instance, how quickly your character responds to a weapon pickup/change in a heavy firefight in an FPS); and others which are network bound (multiplayer games are heavily impacted by link latency, and some games cap the FPS to the network "ticks", so in some cases a poor connection means poor performance). So here's what I propose to check, besides how well the application runs:
    • General game performance. I don't think further explanations are needed.
    • System load while running the game. This could be determined with the aid of a script, for instance one invoked from the game's launch script, that samples system load at given intervals while the game is running (it could yield nice graphs showing when the game taxes the system the most); see the sketch after this list.
    • Network latency. Some games even provide their own diagnostic tools.
    • gl2Benchmark. Even though it is still in the early stages of development, it might prove quite useful for testing the performance of several "gaming" scenarios, plus graphics/game-engine features (pixel/vertex shaders, physics, etc.).

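    As for the load-sampling script mentioned above, here's the rough kind of thing I have in mind; the wrapper approach, the 5-second interval, and the CSV output are just assumptions for the sake of illustration:
    [code]
    #!/bin/bash
    # Rough sketch: sample system load while a game is running.
    # Usage: ./loadsample.sh <game command...>
    LOG=load-$(date +%s).csv
    echo "elapsed_s,load1,load5,load15" > "$LOG"

    "$@" &                          # launch the game in the background
    GAME_PID=$!
    START=$(date +%s)

    while kill -0 "$GAME_PID" 2>/dev/null; do
        NOW=$(date +%s)
        read -r L1 L5 L15 REST < /proc/loadavg
        echo "$((NOW - START)),$L1,$L5,$L15" >> "$LOG"
        sleep 5                     # sampling interval
    done
    echo "Load samples written to $LOG"
    [/code]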

    I want to believe that all of us who use Linux are special, and that Linux itself is special, and as such there are special cases where Linux and high-performance computing meet. Be it visual work or server work, some tasks are simply not run on a regular basis...
    • SPECviewperf. The de facto professional graphics industry-standard benchmark. Need I say more?
    • 3D Rendering:
      1. POV-Ray
      2. Blender
    • Professional audio/video compression:
      1. Encoding of long audio tracks.
      2. Multi-pass encoding of video.
    • Professional-grade backup, using tar and pbzip2 instead of regular bzip2 (see the comparison sketch after this list). As per the [url=http://compression.ca/pbzip2/]web site[/url]:
      PBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines. The output of this version is fully compatible with bzip2 v1.0.2 or newer
    • Inkscape. Applying various resource-hungry effects to complex drawings (blur, etc.). Remember that SVGs are dynamically "rendered"; this could be a good 2D/3D benchmark once Inkscape adopts a hardware-accelerated Cairo renderer.
    • GIMP. Applying several resource-hungry filters and effects to large images, and opening very large images (GIMP suffers a LOT when it is memory constrained).

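    For the tar + pbzip2 backup item, a minimal comparison sketch could be as simple as this; "backupdata" is a placeholder directory:
    [code]
    #!/bin/bash
    # Rough sketch: single-threaded bzip2 vs parallel pbzip2 on the same archive.
    # "backupdata" is a placeholder directory.
    tar cf backup.tar backupdata

    echo "== bzip2 (one core) =="
    time bzip2 -c backup.tar > backup.tar.bz2

    echo "== pbzip2 (all cores) =="
    time pbzip2 -c backup.tar > backup.tar.pbz2
    [/code]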

    Don't mind these suggestions too much; it is 4 in the morning and I had to get this off my chest. There could also be a Server category, with stuff like *SQL queries, PHP and Ruby processing, etc. It would also be interesting to see whether some of the more demanding tests are affected at all by running them on 32-bit versus 64-bit Linux.

  • #2
    Originally posted by Thetargos View Post
    As we can see, performance evaluation is a complex subject. Your proposal is a good direction, except...

    there is no single unified measure of performance, which is why we have to start from our goals. If the methodology has too many parameters, evaluating performance becomes error-prone.
    So, I would recommend dividing all users into groups:
    • HPC users,
    • enterprise users,
    • home users.
    These categories can then be subdivided more precisely.
    Then one should assign the typical tasks and parameters that are valuable for each group. After that, hardware testing should be done according to the tasks of that group.
    P.S. The idea with GIMP is very interesting.
