Announcement

**pegasus** · 19 July 2022, 07:19 AM

What would be possible use cases for this? I vaguely remember something like this being present in i960MX ...

**Vistaus** · 19 July 2022, 08:49 AM

-----

**yump** · 19 July 2022, 08:59 AM

The documentation and commit message say "metadata", but imagine the struct packing possibilities for -march=native in languages that allow the compiler to freely arrange member variables. Anything that has a pointer in it along with some field(s) that fit in 6/15 bits can be made smaller, without having to insert code to clear the upper bits before using the pointer.
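As a rough illustration of the packing idea, here is a minimal C sketch. It assumes a LAM_U57-style layout in which bits 62..57 of a user-space pointer (6 bits) are free; the names `pack_ptr`, `unpack_ptr`, `unpack_tag` and the two `node` structs are hypothetical. Because LAM has to be enabled for a process before the hardware ignores those bits, the sketch still clears them explicitly in `unpack_ptr`; with LAM active, that masking step is the code that could be dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: LAM_U57-style layout, bits 62..57 of a user pointer are
 * free for metadata (6 bits). Bit 63 is left untouched. */
#define TAG_SHIFT 57
#define TAG_BITS  6
#define TAG_MASK  (((uint64_t)1 << TAG_BITS) - 1) /* 0x3f */

/* Hypothetical helper: store a 6-bit field in the upper pointer bits. */
static uint64_t pack_ptr(void *p, unsigned tag) {
    uint64_t raw = (uint64_t)(uintptr_t)p;
    assert(((raw >> TAG_SHIFT) & TAG_MASK) == 0); /* bits must be unused */
    assert(tag <= TAG_MASK);
    return raw | ((uint64_t)tag << TAG_SHIFT);
}

/* Read the 6-bit field back out. */
static unsigned unpack_tag(uint64_t packed) {
    return (unsigned)((packed >> TAG_SHIFT) & TAG_MASK);
}

/* Without LAM the tag must be cleared before dereferencing; with LAM
 * enabled for the process, the hardware would ignore these bits. */
static void *unpack_ptr(uint64_t packed) {
    return (void *)(uintptr_t)(packed & ~(TAG_MASK << TAG_SHIFT));
}

/* Pointer plus a small field: 16 bytes on x86-64 because of padding. */
struct node_unpacked {
    struct node_unpacked *next;
    uint8_t color;
};

/* Same information folded into a single 8-byte word. */
struct node_packed {
    uint64_t next_and_color;
};
```

On x86-64, `struct node_unpacked` occupies 16 bytes (an 8-byte pointer plus one byte padded up to alignment), while `struct node_packed` occupies 8.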

**bob l'eponge** · 19 July 2022, 10:13 AM

Mainly for atomic operations. Since there is no double compare and swap (that is, compare two 64-bit values and swap if both are equal) in AMD64, you can't write some lock-free structures without this (like a doubly linked list or a binary tree). Using these free bits means you can now have those, since you can store a "revision" counter in these 7 bits (9 bits if you count the 2 low bits too), so you can be safe with up to 2^9 simultaneous operations on a shared 64-bit value (like an A-B-A problem that's corrupted after 1 swap, you can still be safe after 2^8 swaps)
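The revision-counter idea can be sketched in C11 with ordinary 64-bit atomics. This is a minimal sketch under stated assumptions: the counter lives in bits 62..57 (the 6 bits LAM_U57 leaves free, rather than the 7+2 bits mentioned above), and `make_ref`, `ref_ptr`, `ref_ver` and `versioned_cas` are hypothetical names. Since a plain compare-and-swap compares all 64 bits, a slot that went from A to B and back to A carries a newer version, so a thread still holding the old value fails its CAS.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bits 62..57 are free (LAM_U57-style) and hold a 6-bit
 * version counter that wraps modulo 64. */
#define VER_SHIFT 57
#define VER_MASK  ((uint64_t)0x3f)
#define ADDR_MASK (~(VER_MASK << VER_SHIFT))

/* Hypothetical helper: combine an address and a version into one word. */
static uint64_t make_ref(void *p, unsigned ver) {
    uint64_t addr = (uint64_t)(uintptr_t)p;
    assert((addr & ~ADDR_MASK) == 0); /* version bits must be unused */
    return addr | (((uint64_t)ver & VER_MASK) << VER_SHIFT);
}

/* Strip the version; explicit masking stands in for LAM here. */
static void *ref_ptr(uint64_t ref) {
    return (void *)(uintptr_t)(ref & ADDR_MASK);
}

static unsigned ref_ver(uint64_t ref) {
    return (unsigned)((ref >> VER_SHIFT) & VER_MASK);
}

/* Swing *slot from `expected` to `new_ptr`, bumping the version.
 * The CAS compares address and version together, so an A -> B -> A
 * history in between makes it fail. */
static bool versioned_cas(_Atomic uint64_t *slot, uint64_t expected,
                          void *new_ptr) {
    uint64_t desired = make_ref(new_ptr, ref_ver(expected) + 1);
    return atomic_compare_exchange_strong(slot, &expected, desired);
}
```

With only 6 bits the counter wraps after 64 updates, so a thread that stalls for exactly a multiple of 64 swaps could still be fooled; the counter narrows the ABA window rather than closing it.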

**pegasus** · 19 July 2022, 10:29 AM

i960MX had this "tag bit" thing that made it possible to implement memory protection in hardware. Could this be used in the same way? These "free" bits could represent bitmasks that would map to processes (or users) and would represent ownership of memory addresses ...

**archkde** · 19 July 2022, 11:45 AM

Originally posted by bob l'eponge

Mainly for atomic operations. Since there is no double compare and swap (that is, compare two 64-bit values and swap if both are equal) in AMD64, you can't write some lock-free structures without this (like a doubly linked list or a binary tree). Using these free bits means you can now have those, since you can store a "revision" counter in these 7 bits (9 bits if you count the 2 low bits too), so you can be safe with up to 2^9 simultaneous operations on a shared 64-bit value (like an A-B-A problem that's corrupted after 1 swap, you can still be safe after 2^8 swaps)

While it's presumably more expensive than a 64-bit operation, "lock cmpxchg16b" (the 64-bit variant of the infamous "lock cmpxchg8b") is a double compare and swap.

**bob l'eponge** · 19 July 2022, 12:22 PM

Originally posted by archkde

While it's presumably more expensive than a 64-bit operation, "lock cmpxchg16b" (the 64-bit variant of the infamous "lock cmpxchg8b") is a double compare and swap.

It's a double-width compare and swap, not a double compare and swap. It's useful, but it forces the two words to be laid out contiguously in memory, which is harder.
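To make the distinction concrete, here are two sequential C models of the semantics. They are not atomic and only show what each primitive compares; `struct pair`, `dwcas_model` and `dcas_model` are hypothetical names. A double-width CAS such as `lock cmpxchg16b` checks two adjacent words in one 16-byte block, while a true DCAS checks two independent words at unrelated addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sequential models of the semantics only: neither function is atomic.
 * Real hardware would perform each body as one indivisible step. */

/* Double-width CAS (what lock cmpxchg16b provides): both 64-bit words
 * must sit next to each other in one 16-byte-aligned block. */
struct pair {
    _Alignas(16) uint64_t lo;
    uint64_t hi;
};

static bool dwcas_model(struct pair *p, struct pair expected,
                        struct pair desired) {
    if (p->lo != expected.lo || p->hi != expected.hi)
        return false;
    *p = desired;
    return true;
}

/* Double CAS (DCAS): two independent words at unrelated addresses,
 * e.g. a field in one list node and a field in another. */
static bool dcas_model(uint64_t *x, uint64_t *y,
                       uint64_t old_x, uint64_t old_y,
                       uint64_t new_x, uint64_t new_y) {
    if (*x != old_x || *y != old_y)
        return false;
    *x = new_x;
    *y = new_y;
    return true;
}
```

The only difference between the two is where the operands live; that placement constraint is what makes DWCAS harder to use for structures like doubly linked lists.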

**archkde** · 19 July 2022, 01:09 PM

Originally posted by bob l'eponge

It's a double-width compare and swap, not a double compare and swap. It's useful, but it forces the two words to be laid out contiguously in memory, which is harder.

Makes sense, but don't your examples have the same problem?

**bob l'eponge** · 19 July 2022, 01:46 PM

It depends. If the CPU actually ignores those bits when accessing the pointed-to item but really compares them in the CAS operation, then it behaves like a DCAS, since it's comparing X and Y at unrelated addresses, and you can fiddle with the bits in Y and X however you like. The operation is something like

Code:

X = X | some marker in high bits; Z = Y & high bits mask;  DCAS(X, Y, Z, new value)

.

If instead it ignores the high bits in all instructions, then it's a regular DWCAS, as you said.
I don't know this technology, so I can't say. We'll certainly see people use it, though.

Intel Revs Its Linear Address Masking Patches For Linux