Linux 5.7 Netfilter To See AVX2 Optimizations For Big Performance Boost - Can Be Up To ~420%


  • ZeDestructor
    replied
    Originally posted by newwen View Post
    Doesn't executing AVX2 instructions consume a lot of power? If the processing time decreases, it's logical for the total energy consumed by the process to decrease, at the expense of higher peak/instantaneous power consumption, but has anyone tested it?
    It does, but when you're looking at 0-30% more power for 30-420% better throughput, it's still more efficient overall.
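
The trade-off above comes down to energy = power × time. As a rough sketch rather than a measurement, assume "N% better throughput" means a fixed job finishes in 1/(1 + N/100) of the original time, and that the stated power increase applies for the whole run. The function name `relative_energy` is made up for illustration.

```c
/* Relative energy for a fixed amount of work, compared with the baseline.
 * Assumptions (illustrative, not measured):
 *   - power_increase_pct: extra average power while the optimized code runs
 *   - throughput_gain_pct: extra throughput, so runtime scales by 1 / (1 + gain)
 * Since energy = power * time, the ratio is (1 + p) / (1 + t).
 */
double relative_energy(double power_increase_pct, double throughput_gain_pct)
{
    double power_ratio = 1.0 + power_increase_pct / 100.0;
    double time_ratio = 1.0 / (1.0 + throughput_gain_pct / 100.0);
    return power_ratio * time_ratio;
}
```

With the figures from the post, the least favourable pairing (30% more power for 30% better throughput) is break-even, and every other combination in those ranges uses less energy than the baseline.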


  • computerquip
    replied
    Originally posted by Mario Junior View Post

    This BTFO everyone who says: "but muhhh, the compiler knows how to optimize assembly code better than you can by hand."
    There are obviously going to be cases where hand-optimized assembler is faster. The problem is that it's less readable, harder to understand, much more verbose, and far less portable than C in general. And the compiler *does*, in the vast majority of situations, generate better assembly than you could write by hand.


  • sophisticles
    replied
    Originally posted by Raka555 View Post
    I assume it is hand-optimized assembler?
    All SIMD programming is done using either assembler or compiler intrinsics, which are basically C-style function calls that map 1-to-1 to assembler instructions.

    By definition SIMD is hand-optimized, unless of course you rely on compiler optimizations (auto-vectorization).
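
To make "compiler intrinsics" concrete, here is a minimal sketch that ANDs two arrays of 64-bit words, four words per step, using AVX2 intrinsics. It is a generic illustration and not the netfilter code. The function names (`bitmap_and`, `and_avx2`, `and_scalar`) are hypothetical, and the sketch assumes GCC or Clang for the `target` attribute and `__builtin_cpu_supports`. On an AVX2 build, `_mm256_and_si256` typically compiles to a single `vpand` instruction.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_SIMD 1
#endif

/* Portable fallback: one 64-bit word per iteration. */
static void and_scalar(uint64_t *dst, const uint64_t *a,
                       const uint64_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] & b[i];
}

#ifdef HAVE_X86_SIMD
/* AVX2 path: four 64-bit words (256 bits) per iteration.
 * The target attribute enables AVX2 for this function only,
 * so the file builds without -mavx2. */
__attribute__((target("avx2")))
static void and_avx2(uint64_t *dst, const uint64_t *a,
                     const uint64_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_and_si256(va, vb));
    }
    /* Handle the remaining 0-3 words. */
    for (; i < n; i++)
        dst[i] = a[i] & b[i];
}
#endif

/* Pick the AVX2 path at run time when the CPU supports it. */
void bitmap_and(uint64_t *dst, const uint64_t *a,
                const uint64_t *b, size_t n)
{
#ifdef HAVE_X86_SIMD
    if (__builtin_cpu_supports("avx2")) {
        and_avx2(dst, a, b, n);
        return;
    }
#endif
    and_scalar(dst, a, b, n);
}
```

The run-time check keeps the sketch usable on CPUs without AVX2. Code that instead requires AVX2 at build time would drop the fallback and fail to compile when the target lacks it.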


  • newwen
    replied
    Doesn't executing AVX2 instructions consume a lot of power? If the processing time decreases, it's logical for the total energy consumed by the process to decrease, at the expense of higher peak/instantaneous power consumption, but has anyone tested it?


  • ssokolow
    replied
    Originally posted by Mario Junior View Post

    This BTFO everyone who says: "but muhhh, the compiler knows how to optimize assembly code better than you can by hand."
    To be fair, the best solution is sort of a middle-ground: compiler intrinsics.

    You get the "either use the fast path or fail to compile" guarantee of calling SIMD instructions explicitly, but without the "compiler has no idea what optimizations are safe to perform on the data flow surrounding this instruction" problem of assembly.

    That's why Microsoft decided to omit inline assembly support from Visual C++ for 64-bit targets.


  • Mario Junior
    replied
    Originally posted by tiennou View Post

    A look at the commit (https://git.kernel.org/pub/scm/linux...94d765c8eecbe1) points in that direction.
    This BTFO everyone who says: "but muhhh, the compiler knows how to optimize assembly code better than you can by hand."


  • ldesnogu
    replied
    Originally posted by cbxbiker61 View Post
    It's always welcome to see performance improvements. Optimizing for ARM would no doubt hit a larger user base, with all of the ARM-based OpenWrt and DD-WRT routers (I've got 5 routers running OpenWrt and one of them would benefit from netfilter performance improvements).
    It seems the author agrees:
    https://git.kernel.org/pub/scm/linux...94d765c8eecbe1
    "A similar strategy could be easily reused to implement specialised versions for other SIMD sets, and I plan to post at least a NEON version at a later time."


  • milkylainen
    replied
    Originally posted by r08z View Post
    This is what ClearLinux does on a regular basis for all libraries and programs, with a few simple AVX2 intrinsics patches to help the program make better use of the -march=haswell compiler flag.
    Vectorization and architecture support are not the same as structured, optimized assembly for complex data.
    But if the code structure was different, maybe more standard vectorization would have helped.
    Replacing a function or call with a hand-optimized intrinsic is not the same either.


  • r08z
    replied
    This is what ClearLinux does on a regular basis for all libraries and programs, with a few simple AVX2 intrinsics patches to help the program make better use of the -march=haswell compiler flag.


  • Setif
    replied
    I hope it's not about some operations that occur once or twice a day, changed from taking 4.2 ms to 1.0 ms (420%).
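
How much a 4.2x faster routine matters depends on the share of total time it accounts for, which Amdahl's law makes concrete. The sketch below is only an illustration: `overall_speedup` is a made-up name, and the fractions used in the note are hypothetical, not measurements of netfilter.

```c
/* Amdahl's law: overall speedup when a fraction of the total runtime
 * (0.0 to 1.0) becomes local_speedup times faster and the rest is unchanged. */
double overall_speedup(double fraction, double local_speedup)
{
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

If the accelerated path were only 1% of the runtime, a 4.2x local speedup would improve the total by well under 1%. If it dominated the runtime, the overall gain would approach 4.2x.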
