PathScale EKOPath 5.0 Beta Compiler Performance


  • janisozaur
    replied
    Hi,
    I know it's quite an old thread, but whatever happened with the GitHub repositories? They seem not to have been touched in a year or so. Is EKOPath no longer open source?
    Thanks



  • pszilard
    replied
    GROMACS 4.6 does not compile

    Originally posted by codestr0m View Post
    Thanks for posting this!
    ----------------------------
    I'm curious if anyone else reading the forums can post benchmarks using their own codes + processor/system details.
    Just tried to compile GROMACS with AVX-256 CPU acceleration, but the 02-11 nightly choked on our SIMD intrinsics kernels:

    Code:
    $ make
    [  0%] Generating version information
    [  0%] Built target gmx_version
    [  0%] Building C object src/gmxlib/CMakeFiles/gmx.dir/nonbonded/nb_kernel_avx_256_single/nb_kernel_ElecCoul_VdwCSTab_GeomW3W3_avx_256_single.c.o
    Signal: Segmentation fault in Code_Expansion phase.
    Error: Signal Segmentation fault in phase Code_Expansion -- processing aborted
    *** Internal stack backtrace:
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be() [0xcfbb51]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be() [0xcfcd99]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be(ErrMsg_Report+0x55) [0xcfaa7d]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be(ErrMsgLine+0xbe) [0xcfac26]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be() [0xcfd58c]
        /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x2afb0d5a54a0]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be() [0x9c5c5b]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be() [0x9c588a]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be() [0x9c47d4]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be() [0x9cda50]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be() [0x9c4348]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be() [0x9c1b3d]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be() [0x9c18ea]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be(_Z20Convert_WHIRL_To_OPsP2WN+0x139) [0x9c09e5]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be(CG_Generate_Code+0x279) [0x8e51cb]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be() [0x7346d9]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be() [0x733b7b]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be(main+0x55f) [0x732dbb]
        /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x2afb0d59076d]
        /opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be() [0x732629]
    pathcc ERROR: execute '/opt/ekopath-5.0.0_nightly-2013-02-11/lib/5.0.0/x8664/be' failed: Died due to unknown signal
    Note that I was compiling current git GROMACS 4.6 (release-4-6 branch) with default configuration on a Sandy Bridge machine. Feel free to drop me a mail if you need help with compilation or testing.

    PS: If I disable AVX and fall back to SSE4.1, the code compiles, but mdrun segfaults immediately after startup.
    Last edited by pszilard; 13 February 2013, 12:42 AM.



  • trueblue
    replied
    installation needs license file

    When I try to run the installation using the GUI, it asks for a license file. How do I get one?



  • XorEaxEax
    replied
    Happy to see some info on EKOPath; it's been quiet since the open source announcement. As for the results, I seem to recall that EKOPath was optimized for AMD CPUs, or am I mistaken?

    As for the tests, why are there again so many where no optimization level is declared, SCIMARK for example? It's impossible to draw any worthwhile conclusions from those tests; for all we know they could have been run at -O0.

    Looking at the tests where we do have an optimization setting (hence tests which are of any interest), Ekopath seems to do quite well with the exception of the BLAKEv2 test where it does horribly, and to a lesser extent the Himeno benchmark.

    Again, can Michael please fix the benchmarks so that they declare optimization level for all tests, else they are of little interest as we don't know what level is being compared. Using -O3 across the board would be the obvious choice if only one optimization level is used per benchmark (as is the case here).



  • codestr0m
    replied
    Originally posted by ChrisXY View Post
    Well, if it would work I wouldn't want to set it manually.
    I'll use the CPU info you provided and see if we can get both sets of bugs fixed in the driver. Give us a couple of days and hopefully I'll remember to reply to this thread once it's fixed. Alternatively, pull another nightly in a few days or a week and yell if it's not. (Squeaky wheel)



  • ChrisXY
    replied
    Originally posted by codestr0m View Post
    The correct way to get -march=auto or -march=native behaviour is to not set -march at all. EKOPath/ENZO, unlike other compilers, automatically pick the best CPU profile for the current host system. Set it only when you need to target another system. (I think we may have a bug here and I'll double check)

    We switched over to CPUID instead of parsing /proc/cpuinfo. Can you post your /proc/cpuinfo output and the processor info? /* This is one of those areas where I'd like the most feedback. */
    Well, if it would work I wouldn't want to set it manually.

    I have 8 of these:
    Code:
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 58
    model name      : Intel(R) Core(TM) i7-3632QM CPU @ 2.20GHz
    stepping        : 9
    microcode       : 0x13
    cpu MHz         : 1200.000
    cache size      : 6144 KB
    physical id     : 0
    siblings        : 8
    core id         : 0
    cpu cores       : 4
    apicid          : 0
    initial apicid  : 0
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 13
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
    bogomips        : 4391.75
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 36 bits physical, 48 bits virtual
    edit: by cpu information you mean that? http://pastebin.com/4TbSia7Q

    Originally posted by codestr0m View Post
    About AVX - With the exception of a corner case on AMD - Please tell me where it would be a performance win compared to SSE4.1/4.2. With that in mind we've disabled it by default and in fact AVX can cause performance *degradation* if not used properly. For more information on this please reference Agner's work on CPU instruction timing data.
    Thanks for the explanation. I wasn't aware of that.

    Originally posted by codestr0m View Post
    bdver1 doesn't support 3DNOW - that was dropped
    Then all is good.

    Originally posted by codestr0m View Post
    Lastly - sorry about the manpage location - Most of our users use --prefix when installing and never use a "default".
    Actually I used --prefix=/usr
    It's usually in /usr/share/man/ somewhere.
    Last edited by ChrisXY; 09 February 2013, 03:34 PM.



  • codestr0m
    replied
    Originally posted by ChrisXY View Post
    Well, a noticeable improvement is that it does not return with an error with -march=native.

    But "-march=native" is not recognized and as all nonrecognized march parameters it activates the generic profile:
    Code:
    /usr/lib/5.0.0/x8664/ipl -VHO:rotate -LIST:source=off:notes=off -PHASE:p:i -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -show -LANG:=ansi_c -TARG:abi=n64 -TARG:processor=generic -TARG:sse=on -TARG:sse2=on -TARG:sse3=off -TARG:ssse3=off -TARG:sse4a=off -TARG:sse4_1=off -TARG:sse4_2=off -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=off -TARG:pclmul=off -TARG:3dnow=off -fB,/tmp/pathcc-B-1934caf9.B -fp,hello.o hello.c -cmds pathcc -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -TARG:abi=n64 -TARG:processor=generic -TARG:sse=on -TARG:sse2=on -TARG:sse3=off -TARG:ssse3=off -TARG:sse4a=off -TARG:sse4_1=off -TARG:sse4_2=off -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=off -TARG:pclmul=off -TARG:3dnow=off
    The correct way to autochoose the cpu is -march=auto:
    "-march=auto -Ofast"
    Code:
    /usr/lib/5.0.0/x8664/ipl -VHO:rotate -LIST:source=off:notes=off -PHASE:p:i -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -show -LANG:=ansi_c -TARG:abi=n64 -TARG:processor=pentium4 -TARG:sse=on -TARG:sse2=on -TARG:sse3=on -TARG:ssse3=on -TARG:sse4a=off -TARG:sse4_1=on -TARG:sse4_2=on -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=on -TARG:pclmul=off -TARG:3dnow=off -fB,/tmp/pathcc-B-19683af2.B -fp,hello.o hello.c -cmds pathcc -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -TARG:abi=n64 -TARG:processor=pentium4 -TARG:sse=on -TARG:sse2=on -TARG:sse3=on -TARG:ssse3=on -TARG:sse4a=off -TARG:sse4_1=on -TARG:sse4_2=on -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=on -TARG:pclmul=off -TARG:3dnow=off
    Slightly better, it builds for SSE3, but for Pentium 4?! This is an Ivy Bridge mobile CPU, an i7-3632QM! If you could just copy & paste the CPU recognition from another compiler, that would be great.


    The installer installs the manpages to /usr/docs/man/man1/ which is not in the man search path on archlinux, but I don't know about other systems. But it seems nonstandard to me. Use "man -l /file" to open files directly with man.
    Code:
           -march=<cpu-type>
                   (For x86) Compiler will optimize code for the selected cpu type: opteron, opteron-sse3, xeon, em64t, nocona, prescott, core, core2, wolfdale, harpertown, nehalem, barcelona, shanghai, istanbul, sandy, bdver1, auto. auto means to optimize for the host platform that the compiler is running on. Core refers to the
                   Intel Core Microarchitecture, used by 64-bit CPUs such as Woodcrest. The default is auto.
    It seems none of the cpu profiles, even bdver1 enable the use of avx by default. In fact it says
    Code:
    pathcc -o hello_pathcc hello.c -march=bdver1 -O3 -mavx -show
    pathcc ERROR: Target processor does not support AVX.
    I'm not that familiar with exactly what is supported on which CPUs, but I thought Bulldozer supported AVX right from the beginning?

    So the closest for me would probably be using -march=sandy -Ofast and perhaps -mavx, -mxop, -maes, -mpclmul.
    Unfortunately sandybridge did not support fma and xop so I can't activate it directly.

    Intel's cpus don't support 3dnow but I saw that the parameter to activate 3dnow is not documented in the manpage (it's pretty clear that it's -m3dnow though. It says it's not supported for bdver1, by the way, not sure if this is right).
    The correct way to get -march=auto or -march=native behaviour is to not set -march at all. EKOPath/ENZO, unlike other compilers, automatically pick the best CPU profile for the current host system. Set it only when you need to target another system. (I think we may have a bug here and I'll double check)

    We switched over to CPUID instead of parsing /proc/cpuinfo. Can you post your /proc/cpuinfo output and the processor info? /* This is one of those areas where I'd like the most feedback. */

    About AVX - With the exception of a corner case on AMD - Please tell me where it would be a performance win compared to SSE4.1/4.2. With that in mind we've disabled it by default and in fact AVX can cause performance *degradation* if not used properly. For more information on this please reference Agner's work on CPU instruction timing data.

    bdver1 doesn't support 3DNOW - that was dropped

    The closest CPU recognition we may be able to get "inspiration" from would be libav and their cpu check stuff.

    Lastly - sorry about the manpage location - Most of our users use --prefix when installing and never use a "default".



  • mattst88
    replied
    There isn't much to see out of Parallel BZIP2 Compression.
    Are you building only pbzip2 with the various compilers? pbzip2 is sort of just a front end for libbzip2, which is where the work actually happens.



  • ChrisXY
    replied
    Well, a noticeable improvement is that it does not return with an error with -march=native.

    But "-march=native" is not recognized and as all nonrecognized march parameters it activates the generic profile:
    Code:
    /usr/lib/5.0.0/x8664/ipl -VHO:rotate -LIST:source=off:notes=off -PHASE:p:i -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -show -LANG:=ansi_c -TARG:abi=n64 -TARG:processor=generic -TARG:sse=on -TARG:sse2=on -TARG:sse3=off -TARG:ssse3=off -TARG:sse4a=off -TARG:sse4_1=off -TARG:sse4_2=off -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=off -TARG:pclmul=off -TARG:3dnow=off -fB,/tmp/pathcc-B-1934caf9.B -fp,hello.o hello.c -cmds pathcc -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -TARG:abi=n64 -TARG:processor=generic -TARG:sse=on -TARG:sse2=on -TARG:sse3=off -TARG:ssse3=off -TARG:sse4a=off -TARG:sse4_1=off -TARG:sse4_2=off -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=off -TARG:pclmul=off -TARG:3dnow=off
    The correct way to autochoose the cpu is -march=auto:
    "-march=auto -Ofast"
    Code:
    /usr/lib/5.0.0/x8664/ipl -VHO:rotate -LIST:source=off:notes=off -PHASE:p:i -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -show -LANG:=ansi_c -TARG:abi=n64 -TARG:processor=pentium4 -TARG:sse=on -TARG:sse2=on -TARG:sse3=on -TARG:ssse3=on -TARG:sse4a=off -TARG:sse4_1=on -TARG:sse4_2=on -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=on -TARG:pclmul=off -TARG:3dnow=off -fB,/tmp/pathcc-B-19683af2.B -fp,hello.o hello.c -cmds pathcc -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -TARG:abi=n64 -TARG:processor=pentium4 -TARG:sse=on -TARG:sse2=on -TARG:sse3=on -TARG:ssse3=on -TARG:sse4a=off -TARG:sse4_1=on -TARG:sse4_2=on -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=on -TARG:pclmul=off -TARG:3dnow=off
    Slightly better, it builds for SSE3, but for Pentium 4?! This is an Ivy Bridge mobile CPU, an i7-3632QM! If you could just copy & paste the CPU recognition from another compiler, that would be great.


    The installer installs the manpages to /usr/docs/man/man1/ which is not in the man search path on archlinux, but I don't know about other systems. But it seems nonstandard to me. Use "man -l /file" to open files directly with man.
    Code:
           -march=<cpu-type>
                   (For x86) Compiler will optimize code for the selected cpu type: opteron, opteron-sse3, xeon, em64t, nocona, prescott, core, core2, wolfdale, harpertown, nehalem, barcelona, shanghai, istanbul, sandy, bdver1, auto. auto means to optimize for the host platform that the compiler is running on. Core refers to the
                   Intel Core Microarchitecture, used by 64-bit CPUs such as Woodcrest. The default is auto.
    It seems none of the cpu profiles, even bdver1 enable the use of avx by default. In fact it says
    Code:
    pathcc -o hello_pathcc hello.c -march=bdver1 -O3 -mavx -show
    pathcc ERROR: Target processor does not support AVX.
    I'm not that familiar with exactly what is supported on which CPUs, but I thought Bulldozer supported AVX right from the beginning?

    So the closest for me would probably be using -march=sandy -Ofast and perhaps -mavx and -mpclmul.
    Unfortunately sandybridge did not support fma and xop so I can't activate it directly. Are there any real cpu specific optimizations or is it just for choosing which instructions to use (i.e. generic with all the supported stuff enabled one by one being equally good)?

    Intel's cpus don't support 3dnow but I saw that the parameter to activate 3dnow is not documented in the manpage (it's pretty clear that it's -m3dnow though. It says it's not supported for bdver1, by the way, not sure if this is right).

    The benchmark is not that good I think, because it very probably uses the generic cpu build profile (Michael is using an Ivy Bridge cpu too). It may be fair in that gcc is set to the generic build profile too but that's not really where ekopath is supposed to shine, right?
    Last edited by ChrisXY; 09 February 2013, 03:19 PM.



  • codestr0m
    replied
    Originally posted by Cyborg16 View Post
    So the real reason to be excited about EKOPath is automatic GPGPU usage? I am involved in some "scientific computing" but so far haven't had a reason to use anything other than GCC and clang.
    s/EKOPath/ENZO/g
    ---------
    I'm biased, but I'd certainly recommend you test EKOPath and Intel compilers if you don't have a GPU. If you can get access to a system with a GPU (Tesla 2050, 2070 or 2090) *and* you're willing to add some pragma or directives to your code ENZO may be interesting. (The performance gains can be well worth the effort) We're working on support for -autogpu which like autovectorization or other automatic optimizations requires zero code changes. This isn't ready for production and just "noteworthy" at this point. (Honestly, give us a couple more months)

