**Nuc!eoN** · 05 February 2020, 04:58 AM

This is a game changer!

**Murple** · 05 February 2020, 08:17 AM

Care to enlighten me?

**Space Heater** · 05 February 2020, 01:23 PM

Originally posted by Murple

Care to enlighten me?

I can make an attempt.

An atomic operation is an operation that either fully completes a sequence of actions or does not complete any of them. This is why the term "atomic" is used: the sequence is done either all in one go or not at all.

Atomic operations are used to serialize access to shared data between threads. For example, if I want thread A to modify some shared data, I want to make sure that thread B is not currently accessing it. So in this example thread A performs an atomic operation on a lock in memory in an attempt to acquire the lock, and thus prevent other threads from accessing that data while thread A is reading and/or writing it.

But in order to update this lock we must perform several actions, for example:

Load the value from memory into a register.
Check the value.
If it is 0, write 1 to memory to mark it as taken and move on to modifying the shared data.
If it is 1, then another thread has the lock, and the current thread must wait or do something else until the lock is marked as free.
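
The steps above can be sketched in C as a deliberately naive try-lock. This is only an illustration of the problem, not something to use: the name `naive_try_lock` is made up here, and the load, the check and the store are three separate steps, so another thread can slip in between them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, deliberately NON-atomic try-lock.
   Returns true if the caller now holds the lock. */
bool naive_try_lock(volatile int *lock)
{
    assert(lock != NULL);

    int value = *lock;   /* load the value from memory into a register */
    if (value != 0) {    /* check the value */
        return false;    /* it is 1: another thread has the lock */
    }
    *lock = 1;           /* it is 0: write 1 to mark the lock as taken */
    return true;
}
```

With a single thread this behaves as expected; the trouble only shows up when two threads run it at around the same time.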

Without atomic operations, a problem can occur when two or more threads try to acquire the same lock at around the same time.

For example, say the lock is free and threads A and B are both attempting to take the same lock for the same shared data.

Thread A loads the value of the lock into a register.
Thread B loads the value of the lock into a register.
Thread A checks the value of the register and sees that it is 0, so it is free to obtain the lock.
Thread B checks the value of the register and sees that it is 0, so it is free to obtain the lock.
Thread A writes 1 to memory to mark the lock as taken.
Thread B writes 1 to memory to mark the lock as taken.
Thread A starts modifying shared data.
Thread B starts modifying shared data.
Result: Both threads are in the same critical section, which is a big problem.
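
For comparison, here is a minimal sketch of how the load, check and write can be collapsed into one atomic step using C11 `<stdatomic.h>`. The function names `try_lock` and `unlock` are my own, and a real lock would also need a waiting strategy, which is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Atomically: if *lock is 0, set it to 1 and return true.
   Otherwise leave it alone and return false. */
bool try_lock(atomic_int *lock)
{
    assert(lock != NULL);
    int expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, 1);
}

/* Mark the lock as free again. */
void unlock(atomic_int *lock)
{
    assert(lock != NULL);
    atomic_store(lock, 0);
}
```

Because the compare and the store happen as one indivisible operation, only one of thread A and thread B can see 0 and write 1; the other gets `false` and has to wait or retry.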

The amd64 ISA allows unaligned atomic operations for legacy reasons, and I don't know of any other current ISA that allows this.

An unaligned operation is one that accesses memory that is not aligned to a word boundary; in other words, it accesses a byte address that is not a multiple of the word length (which on amd64 is 64 bits, or 8 bytes).
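
As a rough sketch, the alignment test described here is just a remainder check. `WORD_BYTES` and `is_word_aligned` are names invented for this example, and the 8-byte word length is the one assumed in this post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WORD_BYTES 8u  /* assumption from the post: 64-bit words on amd64 */

static_assert(WORD_BYTES == sizeof(uint64_t), "word length should be 8 bytes");

/* True if addr is a multiple of the word length. */
bool is_word_aligned(uintptr_t addr)
{
    return addr % WORD_BYTES == 0;
}
```

So an address such as 0x1008 counts as aligned, while 0x1003 does not.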

Why is this a problem? To understand this you need to know a bit about how atomic operations are implemented.

On systems with a single processor, all the operating system needs to do is disable interrupts. That prevents the current thread from being interrupted, so the sequence of operations that we want to be atomic (such as compare and swap) completes without interference.

But most systems today have multiple cores, so we need to do more than just disable interrupts: we need a way to prevent other threads on other cores from modifying the memory location of the lock while a thread is trying to acquire it.

How can this be done?

The fast and common way to do this is to "lock/freeze" only a single cache line on every processor: the cache line that contains the memory address of the lock. This is fast because other processors are minimally affected and can still perform most of their memory operations while one of the cores is acquiring a lock.

But the above method only works if a lock is entirely contained in a single cache line, and this is guaranteed only if the lock is word aligned. This is where the term "split lock" comes from: the locking structure is split across two cache lines.
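
To make the "split" concrete, here is a small sketch that checks whether an access of a given size touches two cache lines. The 64-byte line size is an assumption (it is typical for current amd64 parts, but not guaranteed), and `crosses_cache_line` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u  /* assumption: typical amd64 cache line size */

/* True if the bytes [addr, addr + size) span more than one cache line. */
bool crosses_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t first_line = addr / CACHE_LINE_BYTES;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_BYTES;
    return first_line != last_line;
}
```

An 8-byte lock at offset 56 sits entirely in one line, but the same lock at offset 60 covers bytes 60 to 67 and therefore straddles two lines.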

When there is a lock that is not aligned, and thus potentially lies across two cache lines, the processor still needs to guarantee atomicity but cannot simply lock one cache line. In this scenario the processor falls back to locking the entire memory bus (a global bus lock, as Intel calls it) for all other cores.

This global bus lock prevents all other processors from performing any memory operations while the one thread is attempting to acquire the unaligned lock. This can devastate performance if you have a system with a lot of cores, and several threads are attempting to acquire locks. This can even become a denial of service attack in a shared hosting environment where one malicious guest continues to lock the memory bus of the entire system by attempting to acquire an unaligned lock over and over in a loop.

Real-time systems are also impacted by this. A common configuration is a system where one core is running a safety-critical RTOS that is supposed to be isolated from the other cores that run a non-real-time OS. But if the non-real-time OS can accidentally or intentionally block the core running the RTOS from accessing memory, it can completely break any real-time guarantees of the system, and therefore impair a potentially safety-critical system.

While it is natural to ask why the hardware does not just lock two cache lines, from what I understand this would add an enormous amount of complexity to the hardware for a very rare occurrence.

So instead of trying to handle this in hardware, the idea is that if any application tries to access an unaligned lock, an #AC trap is raised (AC = Alignment Check) and we just kill the application.

The reasoning behind this is to force firmware and OS developers/companies (as everyone else should be shielded from these low level details) to not ship broken software, and force them to fix it. At first glance outright killing the application may seem extreme, but killing their software dead in its tracks as soon as the problem happens both aids in debugging, and is ultimately the best way to ensure they don't inflict broken software on the world.

**stompcrash** · 05 February 2020, 03:54 PM

Useful info, Space Heater. But I wonder, doesn't the compiler normally align allocated variables to word boundaries? Are we only talking about situations where programmers force the use of memory addresses which are unaligned? And if so, isn't this something which should be quite rare, in particular on desktop programs where programmers shouldn't be fussing too much with memory and making use of language features to sort that out for them?

**xnor** · 05 February 2020, 04:19 PM

This is not at all a game changer.

stompcrash Yes, this can happen when the programmer didn't understand what he's doing.

I consider such memory accesses as bugs. There are also tools to help detect and fix this, like gdb.

**Space Heater** · 05 February 2020, 04:39 PM

Originally posted by stompcrash

isn't this something which should be quite rare, in particular on desktop programs where programmers shouldn't be fussing too much with memory and making use of language features to sort that out for them?

Yes, it is not common, so the number of cases where this would kill an application should be pretty limited.
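
One way a lock can still end up unaligned is a packed structure. The sketch below assumes GCC or Clang on amd64 (`__attribute__((packed))` is a compiler extension), and the struct names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler pads so that lock is 4-byte aligned. */
struct normal_msg {
    uint8_t  tag;
    uint32_t lock;
};

/* Packed layout: padding is removed, so lock starts at offset 1. */
struct __attribute__((packed)) packed_msg {
    uint8_t  tag;
    uint32_t lock;
};

static_assert(offsetof(struct normal_msg, lock) == 4, "padded to alignment");
static_assert(offsetof(struct packed_msg, lock) == 1, "no padding when packed");
```

A misaligned field like this is not automatically a split lock; it only becomes one when the field happens to straddle a cache line boundary and is then used in an atomic operation.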

**Murple** · 05 February 2020, 05:46 PM

Space Heater you are a dude. That was conscientiously thorough and I could actually understand it. Thanks

**Ungweliante** · 06 February 2020, 06:49 AM

Originally posted by Space Heater

I can make an attempt.

...

Hoping for more comments like this in Phoronix!! Thank you very much!!

**hwertz** · 29 March 2022, 09:46 PM

Just saw this: I started up Steam and, as it started downloading some updates, got lines like:

Code:

[ 9335.290665] x86/split lock detection: #AC: CHTTPClientThre/82514 took a split_lock trap at address: 0xeb6a9713
[ 9338.075841] split_lock_warn: 16 callbacks suppressed
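
If you want to pull the task name, PID and address out of lines like these, a `sscanf` pattern is enough for a quick sketch. This assumes the message keeps exactly the wording shown above and that the task name contains no `/`; the struct and function names are made up. As far as I know the reported address is the instruction pointer of the offending instruction rather than the address of the lock itself.

```c
#include <assert.h>
#include <stdio.h>

struct split_lock_event {
    char comm[16];          /* task name as printed (at most 15 characters) */
    int pid;
    unsigned long address;  /* address reported in the message */
};

/* Returns 1 if line is a split_lock trap message and was parsed, else 0. */
int parse_split_lock_line(const char *line, struct split_lock_event *ev)
{
    assert(line != NULL && ev != NULL);
    /* Skip everything before '#', then match the fixed wording. */
    return sscanf(line,
                  "%*[^#]#AC: %15[^/]/%d took a split_lock trap at address: %lx",
                  ev->comm, &ev->pid, &ev->address) == 3;
}
```

Lines that do not match, such as the "callbacks suppressed" one, simply return 0.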

Luckily I'm only on a dual-core here so it's not stalling all that many cores...

The Linux Kernel Will Be Able To Detect Split-Locks To Then Warn Or Kill Offending Apps