Ah yes, of course, there is no compiler option "this will run in Ring 0", so it would be all too easy for someone to accidentally compile kernel code with this type of frame and screw everything up…
Imagine "PUSHAQ/POPAQ" if they hadn't been removed… The number of registers and the size of each register would be doubled wrt PUSHAD/POPAD, so it would just push 128 bytes on the stack in one fell swoop. I can see why it was removed all right!
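A quick check of that arithmetic, as a minimal sketch. PUSHAQ never existed, so the "doubled" figures below are purely the hypothetical from the post above; the variable names are made up for illustration.

```python
# PUSHAD pushes the eight 32-bit general-purpose registers.
PUSHAD_REGS = 8
PUSHAD_REG_BYTES = 4
pushad_bytes = PUSHAD_REGS * PUSHAD_REG_BYTES

# Hypothetical PUSHAQ: twice as many registers, each twice as wide.
pushaq_regs = 2 * PUSHAD_REGS
pushaq_reg_bytes = 2 * PUSHAD_REG_BYTES
pushaq_bytes = pushaq_regs * pushaq_reg_bytes

print(pushad_bytes, pushaq_bytes)
```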
Fedora's FESCo Rejects The Idea Of "-fno-omit-frame-pointer" As Default Compiler Flag
Originally posted by PluMGMK
Ah yes, because on the original 8086, you could only use the registers BP, BX, SI and DI for addressing…
Is that strictly true? In Protected/Long Mode, don't interrupts in usermode automatically switch the stack anyway? Or could the kernel still decide to mess up the user-mode stack while handling the interrupt?
Like ENTER, for one, which as I understand it basically does what you're saying but, of course, using the two registers! Nice job Intel…
Ever since protected mode was introduced with the 80286, the stack is switched whenever the privilege level changes, which on most modern operating systems happens whenever a user-mode program is interrupted.
However, when execution is already inside the kernel, a new interrupt will not switch the stack. Disabling interrupts does not help either, because exceptions can still occur on any instruction, and there are also non-maskable interrupts.
So in a user-mode program it might be possible to first store the old frame pointer (taken from RSP) at the base of the future stack frame, then allocate that frame by decreasing RSP so that it points to the saved frame pointer, and hope that the future stack frame is not corrupted before RSP is updated. That scheme does not work in any function that is used in the kernel or in a device driver.
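The race described above can be sketched with a toy model. This is only an illustration under stated assumptions: the addresses, the frame size, and the number of words the interrupt writes (`INTERRUPT_WORDS`) are made-up values, and `allocate_frame` is a hypothetical helper, not a model of any real kernel.

```python
INTERRUPT_WORDS = 5   # assumed number of 8-byte words the interrupt writes below RSP
WORD = 8

def allocate_frame(interrupt_between: bool, switches_stack: bool) -> bool:
    """Return True if the saved frame pointer survives the two-step allocation."""
    memory = {}              # address -> value, a toy stack
    rsp = 0x1000             # RSP doubles as the frame pointer in this scheme
    frame_size = 32
    old_rsp = rsp
    new_rsp = rsp - frame_size

    # Step 1: save the old frame pointer at the base of the FUTURE frame.
    # That location is still below RSP, so nothing protects it yet.
    memory[new_rsp] = old_rsp

    # An interrupt lands between the two steps.
    if interrupt_between and not switches_stack:
        # No stack switch (already in the kernel): the interrupt data is
        # written on the current stack, immediately below RSP.
        for i in range(1, INTERRUPT_WORDS + 1):
            memory[rsp - WORD * i] = "interrupt data"

    # Step 2: move RSP down to the new frame.
    rsp = new_rsp
    return memory[rsp] == old_rsp

print(allocate_frame(False, False))  # no interrupt
print(allocate_frame(True, True))    # interrupt in user mode, stack switched
print(allocate_frame(True, False))   # interrupt in the kernel, same stack
```

In this model only the third case loses the saved pointer, which matches the argument that the two-instruction scheme can be tolerable in user mode but not in kernel or driver code.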
This could be made to work by having different compilation modes for user mode and for kernel mode programs, where only for the user-mode programs it would be guaranteed that the stack can be examined by following the chain of frame pointers. Even now, most operating system kernels, including the Linux kernel, use certain compilation options that are not normally used in non-privileged programs, but such a frame pointer implementation would greatly increase the risk that some function might be compiled with the wrong options and used in some device driver, causing serious memory corruption.
Moreover, on x86, this kind of stack implementation, while having optimal speed, may slightly increase program size, because for saving and restoring registers it replaces the 1-byte (first 7 registers) or 2-byte (the remaining registers) PUSH/POP instructions with 5-byte MOV instructions. On IBM POWER there was no such problem, because it had load/store instructions that could save or restore any number of registers with a single instruction, while on x86 the PUSHA/POPA instructions were removed by AMD in the transition to 64-bit (they were bad anyway, because they could not specify the registers to be saved/restored).
Last edited by AdrianBc; 04 December 2022, 04:45 AM.
- Likes 2
-
Originally posted by AdrianBc
Having 2 distinct registers, one to be used as stack pointer and one to be used as frame pointer, is a mistake inherited by x86 from Intel 8086. However this is a mistake that has been done in many CPU instruction sets.
Originally posted by AdrianBc
On x86, a couple of instructions would be needed for the same effect, but they cannot be used because an interrupt can arrive between the 2 instructions, which can corrupt the stack.
Originally posted by AdrianBc
In theory, it would be possible to add to x86 an instruction that would atomically subtract a value from RSP, to allocate a new stack frame where the new value of RSP would be the frame pointer, while the old value of RSP, i.e. the old frame pointer, would be saved at the base of the new stack frame, ensuring that the stack always is a linked list of the stack frames, which can be examined for debugging.
x86 already has plenty of much more complex and less useful instructions.
- Likes 2
-
I don't understand why in hell someone would want to make it global.
Like 99% of the workload you're going to have beyond the process you're using right now will be tied to either talking to the kernel (and the kernel doing something) or talking to something like glibc. There is not a single reason why every single package should be built with it by default.
For Python or Java stuff, why in hell would you even want it? They are profiled/debugged using tools dedicated to those specific languages, not gcc flags.
Literally, if you have a problem profiling one specific thing outside of the program you are developing, why the hell don't you recompile that one specific thing to your needs? You can put everything you want there. Heck, even if you did profile it and found something you can improve there, what are you going to do now? If you maintain your own version for your own needs, then you maintain your own compilation flags there.
There is only one exception where it sort of may make sense: KDE/GNOME developers. That is the case where you have automated bug reporting, and that automated bug reporting would make more sense with such a compilation flag applied; sure, go for it. But outside of automated bug reporting for software known to have a lot of bugs (like DEs in Linux), absolutely not.
- Likes 3
-
Originally posted by archkde
The point here is not general debugging, but profiling. This is (other than in the most extreme cases) not something you can do with printf.
Anyway, the purpose of profiling seems to be optimizing for performance. Wasting a register on a convenience frame pointer runs contrary to that goal. There will always be those inner loops that need exactly one register more than you have, and then this bites.
-
Why don't they use Gentoo? They can have whatever default flags they want for their entire system.
-
A better solution would be to adopt a shadow stack, like clang CFI.
This would not only make profiling easier, but also prevent ROP attacks.
-
Originally posted by coder
Is this strictly an x86 thing? On AArch64, the performance impact of carrying a frame pointer should be negligible, due to double the general-purpose registers. I would like to see some benchmarks on that, but you'd obviously want to use a more recent core than something like the Pi's A72. Michael, do you have access to an Ampere Altra?
Having 2 distinct registers, one to be used as stack pointer and one to be used as frame pointer, is a mistake inherited by x86 from the Intel 8086. However, this is a mistake that has been made in many CPU instruction sets.
One notable ISA where this is done right is the IBM POWER, where a single register can be used both as a stack pointer and as a frame pointer. Therefore there is never any need to make compromises between performance and debugging convenience.
The reason why this is possible on IBM POWER, but impossible on x86, is that the former has an instruction that can update atomically both the stack pointer/frame pointer register and the location on the stack where the previous frame pointer is saved, whenever a new stack frame is allocated.
On x86, a couple of instructions would be needed for the same effect, but they cannot be used because an interrupt can arrive between the 2 instructions, which can corrupt the stack.
In theory, it would be possible to add to x86 an instruction that would atomically subtract a value from RSP, to allocate a new stack frame where the new value of RSP would be the frame pointer, while the old value of RSP, i.e. the old frame pointer, would be saved at the base of the new stack frame, ensuring that the stack always is a linked list of the stack frames, which can be examined for debugging.
x86 already has plenty of much more complex and less useful instructions.
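To illustrate the "linked list of stack frames" idea, here is a minimal sketch. It models the proposed atomic allocate-and-save step as a single Python function call; `push_frame`, `walk_frames`, the addresses and the frame sizes are all hypothetical names and values chosen for the example, not real instructions or a real ABI.

```python
def push_frame(memory: dict, sp: int, size: int) -> int:
    """Model of the atomic step: allocate `size` bytes and save the old SP
    at the base of the new frame, with no observable intermediate state."""
    new_sp = sp - size
    memory[new_sp] = sp      # word at the new SP links to the previous frame
    return new_sp

def walk_frames(memory: dict, sp: int, stack_top: int) -> list:
    """Follow the saved pointers from the current frame up to the stack top."""
    frames = []
    while sp != stack_top:
        frames.append(sp)
        sp = memory[sp]      # next link in the chain
    return frames

memory = {}
top = 0x1000
sp = top
for size in (32, 64, 16):    # three nested calls with different frame sizes
    sp = push_frame(memory, sp, size)

backtrace = walk_frames(memory, sp, top)
print([hex(addr) for addr in backtrace])
```

Because the single register always points at a valid link, a debugger or profiler could walk the chain at any moment without needing a separate frame-pointer register.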
However, it is unlikely that the inertia can be overcome, because there is a huge number of software tools for x86 that are built upon the model with 2 distinct registers, RSP and RBP, even if this means having to choose between higher execution performance and easier debugging.
- Likes 5
-
People seem to be forgetting that global settings like this should reflect the character of the clear majority of the distribution's users/stakeholders. That's not the case here, even with certain groups coming out in favor of the change. If the clear majority of Fedora's users were Google, Meta, GNOME, and KDE developers who wanted these changes, then it would be appropriate to make such a change, but it's obvious from the sheer popularity of the distribution alone that this is not the case.
What it does look like to me is that there is enough manpower to create a Fedora spin and/or SIG with appropriate packages that focus on this particular topic for those interested. Sorry, but "upstreaming our changes" refers to code changes for maintenance that benefit the broader community, not screwing with compiler and build options that are on balance demonstrably detrimental to the rest of the user base. You can't tell me the billions of dollars in ill-gotten wealth from Meta and Google can't be used to create and support the long-term maintenance of a profiling-friendly Fedora spin. If nothing else, just terminate the C-suite bonus packages at either company and you can support such a project indefinitely. Not a bit of sympathy here.
- Likes 4