Linux 5.7 Netfilter To See AVX2 Optimizations For Big Performance Boost - Can Be Up To ~420%
-
Originally posted by Mario Junior View Post
This BTFOs everyone who says: "but muhhh, the compiler knows how to optimize assembly code better than you can do it manually."
- Likes 1
-
Originally posted by Raka555 View Post
I assume it is hand-optimized assembler?
By definition SIMD is hand-optimized, unless of course you rely on compiler optimizations.
-
Doesn't executing AVX2 instructions consume a lot of power? If the processing time decreases, it's logical that the total energy consumed by the process would decrease too, at the expense of higher peak/instantaneous power consumption, but has anyone tested it?
- Likes 1
-
Originally posted by Mario Junior View Post
This BTFOs everyone who says: "but muhhh, the compiler knows how to optimize assembly code better than you can do it manually."
Intrinsics give you the "either use the fast path or fail to compile" of explicitly calling SIMD instructions directly, but without the "compiler has no idea what optimizations are safe to perform on the data flow surrounding this instruction" of assembly.
That's why Microsoft decided to omit inline assembly support from Visual C++ for 64-bit targets.
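To make the point about calling SIMD instructions from C without dropping to assembly concrete, here is a minimal sketch using AVX2 intrinsics. The `and_bytes` helper is hypothetical and is not taken from the kernel commit; a bytewise AND is used only because it is short. The `#if defined(__AVX2__)` guard is also my own choice so the file still builds without AVX2; drop the guard and you get the "fast path or fail to compile" behaviour instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper (not from the kernel patch): dst[i] = a[i] & b[i]. */
void and_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && a != NULL && b != NULL);

#if defined(__AVX2__)
    /* AVX2 path: 32 bytes per iteration, unaligned loads and stores.
     * Only compiled when AVX2 is enabled, e.g. -mavx2 or -march=haswell. */
    for (; i + 32 <= n; i += 32) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_and_si256(va, vb));
    }
#endif

    /* Scalar tail; also the whole loop when AVX2 is not enabled. */
    for (; i < n; i++)
        dst[i] = (uint8_t)(a[i] & b[i]);
}
```

Because the intrinsics are ordinary typed expressions, the compiler still does register allocation and scheduling around them, which it can't do as freely across an opaque inline assembly block.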
- Likes 2
-
Originally posted by tiennou View Post
A look at the commit (https://git.kernel.org/pub/scm/linux...94d765c8eecbe1) points in that direction.
- Likes 1
-
Originally posted by cbxbiker61 View Post
It's always welcome to see performance improvements. Optimizing ARM would no doubt hit a larger user base with all of the ARM-based OpenWRT and DD-WRT routers (I've got 5 routers running OpenWRT and one of them would benefit from netfilter performance improvements).
A similar strategy could be easily reused to implement specialised versions for other SIMD sets, and I plan to post at least a NEON version at a later time.
- Likes 4
-
Originally posted by r08z View Post
This is what ClearLinux does on a regular basis for all libraries and programs, with a few simple AVX2 intrinsics patches to help the program make better use of the -march=haswell compiler flag.
But if the code structure were different, maybe more standard vectorization would have helped.
Replacing a function or call with a hand-optimized intrinsic is not the same either.
- Likes 3
-
This is what ClearLinux does on a regular basis for all libraries and programs, with a few simple AVX2 intrinsics patches to help the program make better use of the -march=haswell compiler flag.
-
I hope it's not about some operation that occurs once or twice a day going from 4.2 ms to 1.0 ms (420%).
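For what it's worth, the arithmetic behind that worry can be sketched like this. Both function names are made up for illustration, and `runs_per_day` is an assumed input, not something measured in the article.

```c
#include <assert.h>

/* Old run time as a percentage of the new run time (4.2 ms -> 1.0 ms gives ~420). */
double speedup_percent(double before_ms, double after_ms)
{
    assert(after_ms > 0.0);
    return before_ms / after_ms * 100.0;
}

/* Total time saved per day in ms; runs_per_day is a hypothetical input. */
double saved_ms_per_day(double before_ms, double after_ms, double runs_per_day)
{
    return (before_ms - after_ms) * runs_per_day;
}
```

With two runs a day, `saved_ms_per_day(4.2, 1.0, 2.0)` comes to about 6.4 ms, so the headline percentage only matters if the operation sits on a hot path.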
- Likes 1