Originally posted by PuckPoltergeist
The code submitted is LGPL (https://en.wikipedia.org/wiki/GNU_Le...Public_License), so while it is likely optimized in Intel's favor, anyone is free to rewrite it however they choose, including optimizing it for a specific architecture.
It's up to the CPU, whether from AMD, ARM, Intel or anyone else, to eat whatever it is given and execute it as quickly as possible. If a piece of code, however it is written, runs much slower on one architecture and much faster on another, the faster CPU is the winner (on speed alone, not necessarily on wattage or performance per watt).
It's up to the programmer to make the code short (particularly for ARM) and fast; it's up to the CPU to whip through it quickly and power-efficiently.
Indeed, specialized libraries are developed for specific applications, where some loops are unrolled and some instructions reordered (not caring about size, readability, or anything except speed), while other portions are written to conserve memory or to be particularly easy to understand. It depends on the goal.
Usually speed is the priority, but where a huge saving in memory usage can be had, that sometimes becomes an important consideration.
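To make the size-versus-speed trade concrete, here is a minimal sketch in C of the same sum written compactly and unrolled by four. The function names are made up for illustration and have nothing to do with Intel's actual submission; whether the unrolled form is really faster depends on the compiler and the CPU, since a modern compiler may well unroll the simple loop by itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Compact version: small code size, easy to read. */
int64_t sum_simple(const int32_t *a, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by four: more code, but only one loop-counter
 * check and branch per four elements. */
int64_t sum_unrolled4(const int32_t *a, size_t n)
{
    int64_t total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    for (; i < n; i++)   /* leftover elements when n is not a multiple of 4 */
        total += a[i];
    return total;
}
```

Both functions return the same result for any input; the second one just spends code size to save loop overhead.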
As a simplified example, you likely don't want a half dozen math instructions in a row all accessing the same register, followed by a half dozen memory transfer instructions all accessing the same memory address; you would interleave them.
In writing for a specific architecture, one would intersperse memory accesses with other instructions (particularly with Bulldozer) so that the core isn't left stalled in a wait state while a memory access completes and always has other work to do (like multitasking at the level of a single thread).
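A rough sketch of that interleaving idea in C, again with made-up function names: the first version funnels every add through a single accumulator, while the second issues the loads for two elements up front and keeps two independent accumulators, so a multiply waiting on one load does not hold up the other. Treat this as an illustration under those assumptions only; an optimizing compiler and an out-of-order CPU already do a lot of this reordering themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* One long dependency chain: every add waits on the previous one. */
int64_t dot_chained(const int32_t *x, const int32_t *y, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int64_t)x[i] * y[i];
    return acc;
}

/* Interleaved: the loads for both pairs are issued before the math,
 * and two accumulators keep the adds independent of each other. */
int64_t dot_interleaved(const int32_t *x, const int32_t *y, size_t n)
{
    int64_t acc0 = 0, acc1 = 0;
    size_t i = 0;

    for (; i + 2 <= n; i += 2) {
        int32_t x0 = x[i],     y0 = y[i];      /* memory access */
        int32_t x1 = x[i + 1], y1 = y[i + 1];  /* memory access */
        acc0 += (int64_t)x0 * y0;              /* math on first pair */
        acc1 += (int64_t)x1 * y1;              /* math on second pair */
    }
    if (i < n)                                 /* odd element left over */
        acc0 += (int64_t)x[i] * y[i];
    return acc0 + acc1;
}
```

Integer math is used here so both versions give exactly the same answer; with floating point, splitting the accumulator can change the rounding slightly.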
So everyone should feel free to read and improve the submission. The LGPL even allows Intel's submission to be used in proprietary software (so don't say that Intel never gives anything away).
Thanks Intel.