Google Engineer Shows "SESES" For Mitigating LVI + Side-Channel Attacks - Code Runs ~7% Original Speed
-
Originally posted by kravemir View Post
First time seeing zig and jag languages,.. will take a bit better look later, in free time.
Well, I hated garbage collection almost ten years ago too. However, that was from a user's perspective, ignoring the costs of the "faster" manual memory management. There are various ways to do automatic (language-guaranteed) memory management, each with different advantages and costs in performance and in how much programmer assistance and correctness they demand. Reference counting frees an object immediately when its count reaches zero, but is prone to memory leaks from cyclical references that are no longer reachable from any thread, so it needs the programmer's assistance. Weak references can solve that issue, but the mechanism is a bit more complex and costs some performance (a linked list of weak references per "object"), can lead to unwanted freeing of still-useful "objects", and also needs programmer assistance. Unique pointers are the most restrictive, and so on. Garbage collection offers the easiest way to write safe code without memory leaks, but probably at the highest hardware cost: memory usage climbs above what is strictly needed, because collection only runs at certain thresholds, and the collection itself consumes computing power.
So it's a matter of taste and, mainly, of use case (the type of software). Business will bet on the safest and easiest way (currently Java and similar languages are winning). Systems and desktop-application programming usually goes for stable, predictable performance (i.e. not garbage collection). So Rust-like languages might be the best for common system/desktop applications, and Go-like languages for customer-specific web server applications.
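The reference-counting cycle problem and the weak-reference fix described above can be sketched with Rust's `Rc`/`Weak` types (the `Node` struct and its fields are just an illustration, not from the original posts): the child points back at its parent through a `Weak`, which does not contribute to the strong count, so the cycle cannot keep both objects alive.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// A parent holds strong references to its children; each child holds
// only a weak back-reference to its parent, breaking the cycle.
struct Node {
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn main() {
    let parent = Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    });
    let child = Rc::new(Node {
        parent: RefCell::new(Rc::downgrade(&parent)),
        children: RefCell::new(Vec::new()),
    });
    parent.children.borrow_mut().push(Rc::clone(&child));

    // The weak back-pointer can still be upgraded while the parent lives...
    assert!(child.parent.borrow().upgrade().is_some());
    // ...but it does not count as a strong reference, so dropping `parent`
    // (and its children vector with it) frees everything: no leak.
    assert_eq!(Rc::strong_count(&parent), 1);
    assert_eq!(Rc::strong_count(&child), 2); // local var + parent's vec
}
```

This is the "needs programmer's assistance" part: the programmer must decide which edge of the cycle becomes weak, and handle the case where `upgrade()` returns `None` because the target was already freed.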
- Likes 1
Comment
-
Originally posted by milkylainen View Post
What you're probably thinking of is a fully statically scheduled machine like Intel's EPIC (IA-64 Itanium).
There you have full explicit control over instruction scheduling. I definitely distinguish between EPIC and VLIW.
To me, Itanium is a VLIW/EPIC machine.
VLIW/EPIC does not change the fact of side channels in a CPU. It only changes where ILP is extracted.
EPIC was first developed by HP, later in partnership with Intel, and eventually taken over entirely by Intel.
HP did that to cut production costs (bringing Intel into the mix). It was a flop: AMD, left out of the deal and feeling threatened, developed x86 further instead, and that's a big part of why EPIC died. Well, not the only one...
EPIC was very exotic. There were special builds of Windows heavily adapted to run on it, but I never saw a Microsoft Windows OS running on EPIC, only HP-UX, and it was amazing!
I used to work on some mainframes based on Itaniums, running HP-UX (using csh) and the great Informix.
Those systems are still alive and kicking!
They are very good.
Nowadays CPUs usually have pipelining, several execution units, and a hardware scheduler (one of the big problems from a security perspective, if not the biggest). They can do in-order or out-of-order execution, everything done in hardware the superscalar way, and even speculative execution.
In VLIW, pipelining works differently: you load a whole bundle of instructions at once, and the scheduler is software-based. So any patch you eventually need is made to the compiler, and the problem is solved; there are no hardware schedulers messing around with out-of-order execution and speculative execution.
It's not a perfect world, as some types of exploit could still exist, but it is far less prone to them, and the problems that do arise are easier to solve, since the compiler plays the bigger role, and the compiler is software.
Of course, the software scheduler needs a sane environment. I think everybody understands by now that speculative execution is like a virus, and we are seeing it now, with Google getting 7% of a CPU's performance, or 22% of the real performance by patching the compilers.
So isn't a CPU that doesn't speculate, but delivers 75% of the performance, better? It would be orders of magnitude more performant after all.
You want more performance? Increase the width of the fetch.
Elbrus 2000 does 25 instructions per fetch cycle; someone above said the Elbrus 8/16 will do 50 instructions per fetch cycle.
The only reservation I have about VLIW with very large fetch sizes is when your program is small.
Imagine a very small application with 30 instructions: on the Elbrus 2000 it would need two fetches, and the second would only execute 5 instructions in parallel, so there is a penalty in that kind of situation. But it's still better than running at 7% or 22% performance all the time on x86. Last edited by tuxd3v; 21 March 2020, 02:58 PM.
- Likes 1
Comment
-
Originally posted by Raka555 View Post
This is very interesting and I like the idea.
So basically the compiler does the pipelining. It would solve this whole mess.
Are there any working CPUs that one can buy? It would be great if there were something like an RPi with this technology.
I don't know of any VLIW CPUs in production today other than the Elbrus ones (and those are very restricted).
I confess I have already tried to acquire one, several times.
I have never succeeded!
Stopped either by the price or by restrictive measures.
Comment
-
Originally posted by dweigert View Post
Real-time Java got around some of this by running the garbage collector more often, in shorter, interleaved bursts. It was fast enough to work in an app streaming real-time video. Please note, I'm not a fan of Java in general, but I do respect the work that was put in to make it handle real-time workloads.
It can also make your application faster.
Imagine a situation where you need 1000 objects of a given type.
In C/C++ you have to create the objects (malloc them) and then initialize them, losing a lot of time doing that kind of operation repeatedly.
With a garbage collector in place, those objects may already exist in the collection, and you only need to take pointers to previously "deleted" objects and re-initialize them.
So it's a mechanism that permits re-utilization of "previously deleted" objects.
Middleware takes advantage of this.
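The reuse pattern described above — handing back a "previously deleted" object instead of calling the allocator again — can be sketched as a toy free-list pool. This is a minimal illustration of the idea, not the mechanism any particular GC or middleware actually uses; the `Pool` type and its methods are hypothetical names.

```rust
// A toy free-list pool: returned buffers go onto the list and are
// handed out again on the next request, skipping the allocator.
struct Pool {
    free: Vec<Box<[u8; 256]>>, // recycled, ready-to-reuse buffers
}

impl Pool {
    fn new() -> Self {
        Pool { free: Vec::new() }
    }

    // Reuse a recycled buffer if one exists, otherwise allocate a fresh one.
    fn get(&mut self) -> Box<[u8; 256]> {
        self.free.pop().unwrap_or_else(|| Box::new([0u8; 256]))
    }

    // Instead of freeing, keep the buffer around for later reuse.
    fn put(&mut self, buf: Box<[u8; 256]>) {
        self.free.push(buf);
    }
}

fn main() {
    let mut pool = Pool::new();
    let a = pool.get(); // first call hits the allocator
    pool.put(a);        // "deleting" just recycles the buffer
    let _b = pool.get(); // second call reuses it: no new allocation
    assert_eq!(pool.free.len(), 0);
}
```

The caller must still re-initialize a reused buffer, exactly as the post says: the pool saves the allocation cost, not the initialization cost.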
Comment
-
Originally posted by tuxd3v View Post
Garbage collectors introduce some benefits when you deal with large datasets too. At the same time, you also need a system with plenty of memory (that's the downside).
It can also make your application faster.
Imagine a situation where you need 1000 objects of a given type.
In C/C++ you have to create the objects (malloc them) and then initialize them, losing a lot of time doing that kind of operation repeatedly.
With a garbage collector in place, those objects may already exist in the collection, and you only need to take pointers to previously "deleted" objects and re-initialize them.
So it's a mechanism that permits re-utilization of "previously deleted" objects.
- Allocators like jemalloc implement mechanisms to maximize the efficiency of memory reuse, such as pooled allocation.
- Collections like Rust's Vec<T> implement a distinction between capacity (number of items that memory has been allocated for) and length (number of items currently stored) and provide APIs like with_capacity(...) which allow you to pre-allocate to a size you expect to need which will then remain fixed unless you exceed that and force it to reallocate. (To amortize the cost of memory allocation, Vec<T> doesn't shrink unless specifically asked. It just waits until the whole container is dropped to free the memory. It also takes inspiration from read-ahead caching and requests more memory than immediately necessary when it grows unless you manually take over managing the allocated capacity.)
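The capacity-versus-length distinction described above can be demonstrated directly (the assertions rely only on documented `Vec` guarantees: `with_capacity` reserves at least the requested space, and `clear` keeps the allocation):

```rust
fn main() {
    // Pre-allocate room for 1000 items; length starts at zero.
    let mut v: Vec<u32> = Vec::with_capacity(1000);
    assert_eq!(v.len(), 0);
    assert!(v.capacity() >= 1000);

    // Pushing within the reserved capacity never triggers a reallocation.
    for i in 0..1000 {
        v.push(i);
    }
    assert_eq!(v.len(), 1000);
    assert!(v.capacity() >= 1000);

    // Clearing drops the elements but keeps the allocation for reuse;
    // Vec only shrinks when explicitly asked (e.g. shrink_to_fit).
    v.clear();
    assert_eq!(v.len(), 0);
    assert!(v.capacity() >= 1000);
}
```

This is the same amortization idea as the GC reuse discussed above, but made explicit and deterministic: the programmer states the expected size up front instead of relying on a collector to recycle dead objects.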
Last edited by ssokolow; 21 March 2020, 10:01 PM.
Comment
-
I don't understand one thing: all speculative execution issues are the result of threads running on one CPU core having access to cached data from other threads, right?
Then why wouldn't just invalidating the caches for insecure threads before switching context to them be enough? Hell, if the whole environment is insecure, invalidate on each context switch; it would still be a lot faster than Google's solution.
Comment
-
Originally posted by blacknova View Post
I don't understand one thing: all speculative execution issues are the result of threads running on one CPU core having access to cached data from other threads, right?

Originally posted by blacknova View Post
Then why wouldn't just invalidating the caches for insecure threads before switching context to them be enough?
- Likes 1
Comment
-
Originally posted by blacknova View Post
I don't understand one thing: all speculative execution issues are the result of threads running on one CPU core having access to cached data from other threads, right?

The mitigations make things so slow because the only reliable way to request safety, with the interfaces available to software, boils down to asking the CPU to throw out all of its caches and start over.
Comment