Announcement

**coder** · 02 December 2022, 03:03 PM

Originally posted by Hafting

I believe there is still only a linear slowdown for recalculating the frame when needed?

That still might not be acceptable for realtime workloads.

**piotrj3** · 02 December 2022, 10:59 PM

I don't understand why in hell someone would want to make it global.

Like 99% of the workload you're gonna have beyond the process you use right now will be tied to either talking with the kernel (and the kernel doing something) or talking to something like glibc. There is not a single reason why every single package is supposed to be built with it by default.

For Python or Java stuff, why in hell would you even want it? They are profiled/debugged using tools dedicated to those specific languages, not gcc flags.

Literally, if you have a problem profiling one specific thing outside of the program you are developing, why the hell don't you recompile that one specific thing to your needs? You can put everything you want there. Heck, even if you did profile it and found something you can improve there, what are you gonna do now? If you maintain your own version for your own needs, then you maintain your own compilation flags there.

There is only one exception where it sort of may make sense: KDE/GNOME developers. That is the case where you have automated bug reporting and that automated bug reporting makes more sense with such a compilation flag applied; sure, go for it. But outside of automated bug reporting for software known to have a lot of bugs (like DEs on Linux), absolutely no.

**PluMGMK** · 03 December 2022, 10:55 AM

Originally posted by AdrianBc

Having 2 distinct registers, one to be used as stack pointer and one to be used as frame pointer, is a mistake that x86 inherited from the Intel 8086. However, this is a mistake that has been made in many CPU instruction sets.

Ah yes, because on the original 8086, you could only use the registers BP, BX, SI and DI for addressing…

Originally posted by AdrianBc

On x86, a couple of instructions would be needed for the same effect, but they cannot be used because an interrupt can arrive between the 2 instructions, which can corrupt the stack.

Is that strictly true? In Protected/Long Mode, don't interrupts in usermode automatically switch the stack anyway? Or could the kernel still decide to mess up the user-mode stack while handling the interrupt?

Originally posted by AdrianBc

In theory, it would be possible to add to x86 an instruction that would atomically subtract a value from RSP, to allocate a new stack frame where the new value of RSP would be the frame pointer, while the old value of RSP, i.e. the old frame pointer, would be saved at the base of the new stack frame, ensuring that the stack always is a linked list of the stack frames, which can be examined for debugging.

x86 already has plenty of much more complex and less useful instructions.

Like ENTER, for one, which as I understand it basically does what you're saying but, of course, using the two registers! Nice job Intel…
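To make the "linked list of stack frames" idea concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the `ToyStack` class, its methods, and the addresses are made up for illustration, and it mimics the conventional two-register prologue (`push rbp; mov rbp, rsp; sub rsp, N`) rather than the single-register scheme proposed in the quote.

```python
WORD = 8  # bytes per stack slot on x86-64

class ToyStack:
    """Hypothetical model of a downward-growing stack with frame pointers."""

    def __init__(self, size_words=64):
        self.mem = [0] * size_words
        self.rsp = size_words * WORD  # empty stack: RSP just past the top
        self.rbp = 0                  # 0 marks the end of the frame chain

    def _store(self, addr, value):
        self.mem[addr // WORD] = value

    def _load(self, addr):
        return self.mem[addr // WORD]

    def call(self, return_addr, locals_bytes):
        # CALL pushes the return address.
        self.rsp -= WORD
        self._store(self.rsp, return_addr)
        # push rbp; mov rbp, rsp: save the caller's frame pointer, link the new frame.
        self.rsp -= WORD
        self._store(self.rsp, self.rbp)
        self.rbp = self.rsp
        # sub rsp, N: reserve space for locals.
        self.rsp -= locals_bytes

    def backtrace(self):
        # Follow saved frame pointers; the return address sits one word above each.
        addrs = []
        fp = self.rbp
        while fp != 0:
            addrs.append(self._load(fp + WORD))
            fp = self._load(fp)
        return addrs

stack = ToyStack()
stack.call(0x1000, 16)
stack.call(0x2000, 32)
stack.call(0x3000, 0)
print([hex(a) for a in stack.backtrace()])
```

The walk needs no debug information: each frame stores the address of its caller's frame, so a profiler or debugger only has to chase pointers until it reaches the terminating zero.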

**AdrianBc** · 04 December 2022, 04:18 AM

Originally posted by PluMGMK

Ah yes, because on the original 8086, you could only use the registers BP, BX, SI and DI for addressing…

Is that strictly true? In Protected/Long Mode, don't interrupts in usermode automatically switch the stack anyway? Or could the kernel still decide to mess up the user-mode stack while handling the interrupt?

Like ENTER, for one, which as I understand it basically does what you're saying but, of course, using the two registers! Nice job Intel…

Since protected mode was introduced in the 80286, the stacks are switched whenever the privilege level changes, which on most modern operating systems happens whenever a user-mode program is interrupted.

However, when the execution is already inside the kernel, a new interrupt will not switch the stack. Even if the interrupts are disabled, that does not help as there are still exceptions that can happen in any instruction and there are also non-maskable interrupts.

So even if user-mode programs could store the old frame pointer (taken from RSP) into the future stack frame, then allocate the stack frame by decreasing RSP so that it points to the saved frame pointer at the base of the new stack frame, and hope that the future stack frame is not corrupted before the stack pointer is updated, this does not work in any function that is used in the kernel or in a device driver.

This could be made to work by having different compilation modes for user mode and for kernel mode programs, where only for the user-mode programs it would be guaranteed that the stack can be examined by following the chain of frame pointers. Even now, most operating system kernels, including the Linux kernel, use certain compilation options that are not normally used in non-privileged programs, but such a frame pointer implementation would greatly increase the risk that some function might be compiled with the wrong options and used in some device driver, causing serious memory corruption.
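The hazard with the two-step sequence can be sketched with a toy simulation. Everything here is an assumption made for illustration: the `allocate_frame` function, the addresses, the frame size, and the `0xDEAD` marker are hypothetical, and the "interrupt" simply writes five words below the current RSP, roughly like the state that x86-64 pushes when an interrupt arrives without a stack switch.

```python
WORD = 8  # bytes per stack slot on x86-64

def allocate_frame(interrupt_between_steps):
    """Toy model of a non-atomic, two-step frame allocation."""
    mem = {}               # sparse toy memory: address -> value
    rsp = 0x1000           # current stack pointer, doubling as the old frame pointer
    frame_size = 4 * WORD  # hypothetical size of the new frame

    # Step 1: save the old frame pointer at the base of the future frame,
    # which lies below the current RSP and is therefore unprotected.
    new_rsp = rsp - frame_size
    mem[new_rsp] = rsp

    # An interrupt taken here without a stack switch pushes its state
    # (modelled as five words) just below the still-unchanged RSP.
    if interrupt_between_steps:
        for i in range(1, 6):
            mem[rsp - i * WORD] = 0xDEAD

    # Step 2: move the stack pointer down to the new frame base.
    rsp = new_rsp
    return mem[rsp]        # what a debugger would read as the saved frame pointer

print(hex(allocate_frame(False)))  # saved frame pointer intact
print(hex(allocate_frame(True)))   # saved frame pointer overwritten
```

In the second call the saved frame pointer has been clobbered, which is why a single atomic instruction (or a guaranteed stack switch) would be needed for this scheme to be safe.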

Moreover, on x86, this kind of stack implementation, while having optimal speed, may slightly increase the program size, because for saving and restoring the registers it replaces the 1-byte (first 7 registers) or 2-byte (the remaining registers) PUSH/POP instructions with 5-byte MOV instructions. On IBM POWER, there was no such problem, because it had load/store instructions that could save or restore any number of registers with a single instruction, while on x86 the PUSHA/POPA instructions were removed by AMD in the transition to 64-bit (they were bad anyway, because they could not specify the registers to be saved/restored).
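As a rough back-of-the-envelope check of the size difference, the instruction sizes quoted above can be plugged into a few lines of Python. This is only a sketch: the `save_cost` helper is hypothetical, the byte counts are the ones from the post, and the register list is just one example (the callee-saved registers of the System V AMD64 ABI).

```python
# Encoded sizes in bytes, taken from the figures quoted in the post.
PUSH_LEGACY = 1    # PUSH of a register that needs no REX prefix
PUSH_EXTENDED = 2  # PUSH of r8..r15 (REX prefix + opcode)
MOV_STORE = 5      # MOV of a 64-bit register to a stack slot

EXTENDED_REGS = {f"r{i}" for i in range(8, 16)}

def save_cost(registers):
    """Return (bytes using PUSH, bytes using MOV) to save the given registers."""
    push_bytes = sum(
        PUSH_EXTENDED if reg in EXTENDED_REGS else PUSH_LEGACY
        for reg in registers
    )
    mov_bytes = MOV_STORE * len(registers)
    return push_bytes, mov_bytes

# Example choice: callee-saved registers in the System V AMD64 ABI.
callee_saved = ["rbx", "rbp", "r12", "r13", "r14", "r15"]
print(save_cost(callee_saved))
```

Restoring the registers costs the same again in each scheme (POP versus a MOV load), so under these assumptions a function that saves all six registers would grow by a few dozen bytes.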

**PluMGMK** · 04 December 2022, 08:36 AM

Ah yes, of course, there is no compiler option "this will run in Ring 0", so it would be all too easy for someone to accidentally compile kernel code with this type of frame and screw everything up…

Imagine "PUSHAQ/POPAQ" if they hadn't been removed… The number of registers and the size of each register would be doubled wrt PUSHAD/POPAD, so it would just push 128 bytes on the stack in one fell swoop. I can see why it was removed all right!

Fedora's FESCo Rejects The Idea Of "-fno-omit-frame-pointer" As Default Compiler Flag
