Google Engineer Shows "SESES" For Mitigating LVI + Side-Channel Attacks - Code Runs ~7% Original Speed


  • #31
    Originally posted by ssokolow View Post
    From the VLIW Wikipedia page:

Outside embedded processing markets, Intel's Itanium IA-64 explicitly parallel instruction computing (EPIC) and Elbrus 2000 appear as the only examples of widely used VLIW CPU architectures.


    Comment


    • #32
      Originally posted by kravemir View Post

First time seeing the zig and jag languages... I'll take a better look at them later, in my free time.

Well, I hated garbage collection almost ten years ago. However, that was from a user's perspective, ignoring the costs behind "faster" memory management. There are various ways to do automatic (language-guaranteed) memory management, each with different advantages (performance) and costs (performance, plus the programmer's assistance and correctness it requires). Reference counting frees an object immediately when its count reaches zero, but it is prone to memory leaks from cyclical references that are no longer reachable from any thread, so it needs the programmer's assistance. Weak references can solve that issue, but the mechanism is a bit more complex, impacts performance slightly (a linked list of weak references per object), can lead to unwanted freeing of still-useful objects, and also needs the programmer's assistance. Unique pointers are the most restrictive. And so on. Garbage collection offers the easiest way to write safe code without memory leaks, but it probably has the highest hardware cost: memory usage grows beyond what is strictly needed, since collection only runs at certain thresholds, and the collector itself consumes computing power while it runs.

So it's a matter of taste and mainly of use case (the type of software). Business will bet on the safest and easiest way (currently Java and similar languages are winning). Systems and desktop-application programming usually goes for stable and predictable performance (i.e. no garbage collection). So Rust-like languages might be best for common system/desktop applications, and Go-like languages for customer-specific web server applications.

Real-time Java got around some of this by running the garbage collector more often, and in shorter interleaved bursts. It was fast enough to work on an app streaming real-time video. Please note, I'm not a fan of Java in general, but I do respect the work that was put into making it run real-time workloads.
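To make the reference-counting cycle mentioned in the quote concrete, here is a minimal Rust sketch (the Parent/Child types are purely illustrative): with two strong Rc links the pair would never be freed, while downgrading the back-reference to Weak breaks the cycle.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Illustrative only: a parent/child pair that would leak if both links were
// strong Rc references, because each would keep the other's count above zero.
struct Parent {
    child: RefCell<Option<Rc<Child>>>,
}

struct Child {
    // Weak breaks the cycle: this back-reference does not keep the parent alive.
    parent: RefCell<Weak<Parent>>,
}

fn main() {
    let parent = Rc::new(Parent { child: RefCell::new(None) });
    let child = Rc::new(Child { parent: RefCell::new(Rc::downgrade(&parent)) });
    *parent.child.borrow_mut() = Some(Rc::clone(&child));

    // The weak back-reference can still be followed while the parent lives...
    assert!(child.parent.borrow().upgrade().is_some());
    // ...but it does not keep the parent alive: the strong count stays at 1,
    // so dropping `parent` at the end of main frees the whole structure.
    assert_eq!(Rc::strong_count(&parent), 1);
}
```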

      Comment


      • #33
        Originally posted by milkylainen View Post

What you're probably thinking of is a fully statically scheduled machine like Intel's EPIC (IA-64 Itanium).
There you have full explicit control over instruction scheduling. I definitely distinguish between EPIC and VLIW.
To me, Itanium is a VLIW/EPIC machine.

VLIW/EPIC does not change the fact that a CPU has side channels. It only changes where ILP is extracted.
EPIC is an HP-developed architecture:
first developed by HP, later acquired in part by Intel, and even later taken over entirely by Intel.

HP did that to cut production costs (by adding Intel to the mix). It was a flop, partly because AMD, left aside and feeling threatened, developed x86 further; that's part of why EPIC died, though not the only reason.
EPIC was very exotic. There were some special builds of Windows heavily adapted to run on it, but I have never seen an MS Windows OS running on EPIC, only HP-UX, and it was amazing!

I used to work on some mainframes based on Itaniums, running HP-UX (using csh) and the great Informix.
These systems are still alive and kicking!
They are very good.

Nowadays CPUs usually have pipelining, several execution units, and a hardware scheduler (one of the big problems from a security perspective, if not the biggest). They can do in-order or out-of-order execution, and even speculative execution; everything is done in hardware, the superscalar way.

In VLIW, scheduling is pulled out of the hardware pipeline: you load a whole bundle of instructions at once and the scheduler is software-based, so any patch you eventually need is made to the compiler, and the problem is solved there. You don't have hardware schedulers messing around with out-of-order execution and speculative execution.

It's not a perfect world, as there could still be some types of exploits, but it's far less prone to them, and the problems that arise are easier to solve, since the compiler plays the bigger role, and the compiler is software.
Of course, the software scheduler needs a sane environment. I think everybody understands by now that speculative execution is like a virus, and we are seeing it now, with Google getting 7% of a CPU's performance, or 22% of the real performance by patching the compilers.

So isn't a CPU that doesn't do that, and has 75% of the performance, better? It would be an order of magnitude more performant after all.
You want more performance? Increase the width of the fetch.
The Elbrus 2k does 25 instructions per fetch cycle, and someone above mentioned that the Elbrus 8/16 will do 50 instructions per fetch cycle.

The only reservation I have about VLIW with very large fetch widths is when your program is small.

Imagine a very small application with 30 instructions:
on the Elbrus 2k it will need two fetches, but the second one will only execute 5 instructions in parallel, so there is a penalty in that kind of situation. But it's still better than running at 7% or 22% of the original performance all the time on x86.
        Last edited by tuxd3v; 21 March 2020, 02:58 PM.

        Comment


        • #34
          Originally posted by Raka555 View Post
This is very interesting and I like the idea.
So basically the compiler does the pipelining. It would solve this whole mess.
Are there any working CPUs that one can buy? It would be great if there were something like an RPi with this technology.
As others suggested, in the past the embedded space used VLIW.
I don't know of any VLIW CPUs in production today other than the Elbrus ones (and they are very restricted).

I confess I have already tried to acquire one, several times,
and I have never succeeded!

Stopped by the price or by restrictive measures.

          Comment


          • #35
            Originally posted by dweigert View Post
Real-time Java got around some of this by running the garbage collector more often, and in shorter interleaved bursts. It was fast enough to work on an app streaming real-time video. Please note, I'm not a fan of Java in general, but I do respect the work that was put into making it run real-time workloads.
A garbage collector introduces some benefits when you deal with large datasets too. At the same time, you also need a system with plenty of memory (that's the downside of it).

It can make your application faster.
Imagine a situation where you will need 1000 objects of a given type.

In C/C++ you have to create the objects (malloc them) and later initialize them.

With a garbage collector in place,
it could be that you already have them in the collection, and you only need to take pointers to previously deleted objects and initialize them.

In C/C++ you have to malloc them first (losing a lot of time doing that kind of operation repetitively) and initialize them later.
So it's a mechanism that permits re-utilization of "previously deleted" objects.

In middlewares you take advantage of it.
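The reuse effect described above can also be made explicit without a GC, for example with a simple object pool. A minimal Rust sketch follows; the Connection type and the pool API are hypothetical, just for illustration:

```rust
// Hypothetical object pool: instead of allocating a fresh object each time,
// "deleted" objects are parked in a free list and handed back out, which is
// roughly the reuse effect described above.
struct Connection {
    id: u32,
}

struct Pool {
    free: Vec<Connection>, // previously released objects, ready for reuse
}

impl Pool {
    fn new() -> Self {
        Pool { free: Vec::new() }
    }

    // Reuse a parked object if one exists, otherwise create a new one.
    fn acquire(&mut self, id: u32) -> Connection {
        let mut conn = self.free.pop().unwrap_or(Connection { id: 0 });
        conn.id = id; // (re)initialize before handing it out
        conn
    }

    // "Deleting" just returns the object to the pool instead of freeing it.
    fn release(&mut self, conn: Connection) {
        self.free.push(conn);
    }
}

fn main() {
    let mut pool = Pool::new();
    let c = pool.acquire(1);
    pool.release(c);
    let c2 = pool.acquire(2); // reuses the object released above
    println!("reused connection id: {}", c2.id);
}
```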

            Comment


            • #36
              Originally posted by tuxd3v View Post

A garbage collector introduces some benefits when you deal with large datasets too. At the same time, you also need a system with plenty of memory (that's the downside of it).

It can make your application faster.
Imagine a situation where you will need 1000 objects of a given type.

In C/C++ you have to create the objects (malloc them) and later initialize them.

With a garbage collector in place,
it could be that you already have them in the collection, and you only need to take pointers to previously deleted objects and initialize them.

In C/C++ you have to malloc them first (losing a lot of time doing that kind of operation repetitively) and initialize them later.
So it's a mechanism that permits re-utilization of "previously deleted" objects.
              That isn't an inherent difference.
• Allocators like jemalloc implement mechanisms to maximize the efficiency of memory reuse, such as pooled allocation.
• Collections like Rust's Vec<T> implement a distinction between capacity (number of items that memory has been allocated for) and length (number of items currently stored) and provide APIs like with_capacity(...) which allow you to pre-allocate to a size you expect to need, which will then remain fixed unless you exceed it and force a reallocation. (To amortize the cost of memory allocation, Vec<T> doesn't shrink unless specifically asked. It just waits until the whole container is dropped to free the memory. It also takes inspiration from read-ahead caching and requests more memory than immediately necessary when it grows, unless you manually take over managing the allocated capacity.) See the short sketch at the end of this post.
              The inherent difference is that a tracing garbage collector or other system that knows every pointer in play can move allocations around. (Classic MacOS and Windows 3.1 accomplished that for languages like C by using "pointer to a global table of pointers" references where you had to acquire the memory handle before dereferencing it, which would lock it so the memory allocator couldn't move the allocation behind your back, then release it when you're done with it for the moment.)
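A minimal sketch of the capacity/length behaviour described in the second bullet, using only the standard Vec API:

```rust
fn main() {
    // Pre-allocate room for 1000 items up front; length starts at 0.
    let mut v: Vec<u32> = Vec::with_capacity(1000);
    assert_eq!(v.len(), 0);
    assert!(v.capacity() >= 1000);

    // Pushing within the reserved capacity never reallocates.
    for i in 0..1000 {
        v.push(i);
    }
    assert!(v.capacity() >= 1000);

    // Clearing drops the elements but keeps the allocation, so the buffer
    // can be reused without going back to the allocator.
    v.clear();
    assert_eq!(v.len(), 0);
    assert!(v.capacity() >= 1000);

    // Shrinking only happens when explicitly requested.
    v.shrink_to_fit();
    println!("capacity after shrink_to_fit: {}", v.capacity());
}
```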
              Last edited by ssokolow; 21 March 2020, 10:01 PM.

              Comment


              • #37
                Originally posted by Mario Junior View Post
                mitigations=off and fuck this shit!
How much did the NSA pay you for that?

                Comment


                • #38
I don't understand one thing: all speculative execution issues are the result of threads running on one CPU core having access to cached data from other threads, right?

Then why wouldn't just invalidating the caches for insecure threads before switching context to them be enough? Hell, if the whole environment is insecure, invalidate on every context switch; it would still be a lot faster than Google's solution.

                  Comment


                  • #39
                    Originally posted by blacknova View Post
I don't understand one thing: all speculative execution issues are the result of threads running on one CPU core having access to cached data from other threads, right?
No. Some attacks are within the same thread (JS in a browser reading data from outside its sandbox, for example).

                    Originally posted by blacknova View Post
Then why wouldn't just invalidating the caches for insecure threads before switching context to them be enough?
Even if we ignore attacks within the same thread, SMT exists: sibling hardware threads share a core's caches and execution resources at the same time, so there is no context switch at which to flush.

                    Comment


                    • #40
                      Originally posted by blacknova View Post
I don't understand one thing: all speculative execution issues are the result of threads running on one CPU core having access to cached data from other threads, right?
No, it has to do with the fact that, on superscalar CPUs, performance is achieved by being parallel within a single core. (e.g. Spectre comes from how the CPU speculatively executes code within a single thread but fails to clean up properly if it turns out to have guessed wrong about which branch the code will take. That allows malicious code to trick the CPU into leaking information about sensitive code it "didn't execute", bypassing the defined security restrictions.)

                      The mitigations make things so slow because the only reliable way to request safety with the interfaces available to software boils down to requesting that the CPU throw out all caches and start over.
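For reference, a minimal sketch of the kind of bounds-check-bypass gadget described above, in the spirit of the classic Spectre v1 example (the function and array names are just illustrative):

```rust
// Illustrative Spectre-v1-style gadget. Architecturally this never reads out
// of bounds: the `if` guards the accesses. But a mistrained branch predictor
// can run the body speculatively with an out-of-range `idx`, and the
// secret-dependent load into `probe` leaves a cache footprint that survives
// the rollback and can be recovered by timing which cache line is warm.
pub fn victim(public: &[u8], probe: &[u8; 256 * 64], idx: usize) -> u8 {
    if idx < public.len() {
        // Architecturally safe: idx was just checked against len().
        let value = unsafe { *public.get_unchecked(idx) };
        // An index derived from the (possibly speculative) value touches one
        // of 256 distinct cache lines.
        return probe[(value as usize) * 64];
    }
    0
}
```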

                      Comment
