As the developer mentioned in their posts, -Os has traditionally been the fastest option because most of the code is cold and hardly ever gets executed, so the smaller output from -Os gives better caching behavior.
It sounds like they've narrowed the issue down to the way gcc is selecting what code to inline when given the -Os flag. Apparently it's not inlining some code even when doing so would result in smaller output.
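To make the inlining point concrete, here's a rough sketch of the kind of code where inlining can shrink the output: a tiny helper whose body is shorter than the call sequence needed to reach it. The struct and function names are made up for illustration, and this isn't meant to reproduce the actual issue, since whether gcc inlines the plain helper depends on the version and its heuristics. The second helper uses gcc's always_inline attribute to force the decision so the two can be compared.

```c
#include <stddef.h>

/* Hypothetical type, used only for illustration. */
struct buffer {
    size_t len;
    size_t cap;
};

/* Plain static helper: with -Os, gcc's heuristics decide whether to inline it. */
static size_t buffer_remaining(const struct buffer *b)
{
    return b->cap - b->len;
}

/* Same body, but inlining is forced with gcc's always_inline attribute. */
static inline __attribute__((always_inline))
size_t buffer_remaining_forced(const struct buffer *b)
{
    return b->cap - b->len;
}

/* Two callers, so the generated code for each path can be compared. */
size_t use_heuristic(const struct buffer *b)
{
    return buffer_remaining(b);
}

size_t use_forced(const struct buffer *b)
{
    return buffer_remaining_forced(b);
}
```

Building this with something like `gcc -Os -S example.c` and reading the assembly, or running `size` on the object file, shows whether the heuristic path ended up as a call or was inlined on a given gcc version.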