Why FreeBSD Doesn't Aim For OpenMP Support Out-Of-The-Box


  • #31
    Originally posted by jrch2k8
    2.) Autovectorization: it is not part of OpenMP but of the compiler, and regardless, the compiler is nowhere near as good as handwritten SIMD code except for the very, very obvious low-hanging fruit. Yes, not even the almighty ICC. Compilers can show you ASM output for a reason, people ....
    The advantage of OpenMP's loops is that you write the loops once, and when the compiler adopts new instructions, a recompilation is enough to speed up your old program. For instance: SSE -> AVX -> AVX2 -> AVX-512 (buy a Xeon, or Cannon Lake when it arrives).
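
A minimal sketch of the "write the loop once" idea, assuming a compiler that implements the `simd` directive from OpenMP 4.0. The function name `saxpy` and its signature are illustrative, not taken from the thread. No OpenMP runtime call is used, so the code stays correct (just scalar) when the pragma is ignored.

```c
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i]. The loop is plain C and is written once. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* Tells the compiler the iterations are independent and may be
       executed with SIMD instructions. Which instruction set is used
       is decided by the target flags at compile time, not by the source. */
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Building the same file with, say, `cc -O2 -fopenmp-simd -msse2` and later with `-mavx2` is the "recompile and get wider vectors" step. Whether the generated code comes close to hand-written intrinsics is exactly what the quoted post disputes, and the sketch does not settle that.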

    4.) OpenMP is really performant and is not dead by any standard; it is just good enough to remain stable for a while.
    What I meant is that OpenMP is an ugly band-aid for legacy imperative languages. There are much better ideas and frameworks for higher-level languages. C/C++ aren't that good for the semantic analysis which could help with the 'autovectorization'. The syntax is awful; it's like an overgrown preprocessor which also needs some compiler support. There's also Cilk Plus now, which offers a slightly different point of view but is basically the same. People might also want software transactional memory, auto-tuning, skeleton libraries, futures etc.

    6.) OpenMP is an INNER LOOP tool; it is not for starting functions, not for doing arithmetic on 3 ints, not for parallelizing operations, etc.
    Since OpenMP 3 there's also been some task support.
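
A rough sketch of what the task construct looks like, to show that it goes beyond inner loops. The names `sum_range` and `parallel_sum` and the cutoff of 1024 elements are assumptions made for illustration. Only pragmas are used, so the code also compiles and runs serially without OpenMP enabled.

```c
#include <stddef.h>

/* Recursively sums data[lo..hi). Each half of a large range becomes a task. */
static long sum_range(const int *data, size_t lo, size_t hi)
{
    if (hi - lo <= 1024) {              /* small ranges are summed serially */
        long s = 0;
        for (size_t i = lo; i < hi; i++) {
            s += data[i];
        }
        return s;
    }

    size_t mid = lo + (hi - lo) / 2;
    long left = 0, right = 0;

    /* Each task may be picked up by any thread of the enclosing team. */
    #pragma omp task shared(left)
    left = sum_range(data, lo, mid);

    #pragma omp task shared(right)
    right = sum_range(data, mid, hi);

    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return left + right;
}

long parallel_sum(const int *data, size_t n)
{
    long total = 0;

    /* One thread creates the root of the task tree; the rest of the
       team executes tasks as they become available. */
    #pragma omp parallel
    #pragma omp single
    total = sum_range(data, 0, n);

    return total;
}
```

The cutoff is there because creating a task has a cost; below some size it is cheaper to just do the work.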



    • #32
      Originally posted by cade
      "FreeBSD 1" was released in November 1993, not in the 1970s.
      Unlike OpenMP, PThreads is generic enough to be used for any kind of parallelism.
      The reason for this is that PThreads is a lower-level API while OpenMP represents a higher-level API.

      PThreads offers more "plumbing", so more time goes into the design/implementation/testing of
      threaded solutions, but these solutions can be more creative/novel than the threaded solutions
      that are possible with the restricted (less plumbing-like) nature of OpenMP.

      There's a reason why threading solutions (paradigms) like PThreads, OpenMP, etc. exist.
      They offer different programming paradigms that cater for different types of
      thread-solution-implementations.

      It is foolish to think that OSes in the class of FreeBSD had a problem
      taking advantage of multi-{CPU, core} hardware.
      You make the false assumption that I don't know what POSIX threads are. Nowadays, as OpenMP has tasks and all, it's pretty much a platform-agnostic superset of POSIX threads. Sure, there might be some special corner cases that still need pthreads, but for parallel programming, pthreads and OpenMP are more or less the same. They're also based on the same paradigm. Sure, OpenMP encourages fork/join parallelism, but you can program it like pthreads just as well. There is no real performance benefit to be gained from picking pthreads instead. Your post just gives the impression that you have no idea how large OpenMP has become. Besides, even OpenMP is pretty low level. Languages like Clojure, Scala or Haskell have totally different abstractions.



      • #33
        Originally posted by caligula
        Well, I guess FreeBSD doesn't run on multi-core systems then. Seems rather stupid to waste 50-96% of the performance potential since your programming paradigm was invented in the 1970s. The basic work-sharing constructs in OpenMP are already years old. I already implemented some programming assignments in school some 10 years ago with OpenMP 2.0.
        Its firewall, pf, is fully SMP-capable, so the base OS itself should have no issues with it either.



        • #34
          Originally posted by aht0

          Its firewall, pf, is fully SMP-capable, so the base OS itself should have no issues with it either.
          The only problem is, if some existing multimillion-LOC program already uses OpenMP, Cilk+, OpenCL, OpenACC, or C++/boost wrappers for those, or whatever new parallel programming framework (Akka actors etc.), it will expect runtime support for those. Pthreads isn't sufficient, since people have already switched to other platforms (even if platforms like OpenMP actually depend on a pthreads backend). It's rarely the case that *BSD is the platform of choice for new software, so you need to have the appropriate support, or multi-threading gets disabled as a compile-time option.



          • #35

            You could have googled the necessary information, that OpenMP support is there, just not enabled by default, in less time than it took you to write that 4-line statement. Or just concluded it from reading the Phoronix "source" news itself.

            Why post if you really don't have the least idea what FreeBSD is or isn't capable of? I can't logically conclude anything else from the statement. It's just a presumption that there isn't any support, and why that's bad.

            I've been accused of the same thing about Linux, but at least I have the excuse of actually using Linux.

            You do not even have to do anything tricky: just download the most recent ports tree, extract it, go to the directory of the software you need and type make (or make config), or just run portmaster on that directory. It's going to quiz you about the options you'd like, and also about the options for all the other pieces of software that depend on your choice. By default it offers a pre-selection of options, but you can add or remove things based on your needs.

            If you need it painted red and illustrated



            • #36
              Originally posted by caligula

              Let me explain what the problem is. When generating thumbnails of various sizes, the input can be, for example, a 50-megapixel RAW image. Such a file can grow up to 50 MB these days. They sometimes come with embedded previews, sometimes not, and sometimes the previews have a useless size. When you scale down a 50 MP image into a Full HD preview or, let's say, a 128x128 pixel icon, the rendering speed is not an issue. Indeed, the icons can be drawn sequentially. The backend task for scaling the image is 100% independent. The program only depends on the output image. You can even do it in a seccomp sandbox. Scaling the image may take a few seconds. Synchronizing two threads takes a few milliseconds at most. If you ever use a program like Adobe Lightroom, you'll see that this operation is really demanding and stresses the CPU. You really want to max out all the cores. If one thumbnail is generated in a second, a 200-picture photoshoot will take over 3 minutes to render. That's way too long when you just wanted to see the contents of one folder. Libjpeg-turbo might speed this up by a factor of 2 or so, but an OpenMP solution can easily give a 16x speedup or more on a Xeon.
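
The batch described in the quote could be sketched roughly like this. Everything here is a stand-in: `struct image`, `make_thumbnail` (a naive box filter on 8-bit grayscale data) and `make_all_thumbnails` are hypothetical names, and a real program would decode RAW files rather than receive ready-made buffers. The point is only that each iteration touches its own input and output, so the loop over pictures can be split across cores.

```c
#include <stddef.h>

/* Hypothetical 8-bit grayscale image; pixels holds width * height bytes. */
struct image {
    int width, height;
    unsigned char *pixels;
};

/* Box-filter downscale by an integer factor; stands in for the real scaler.
   dst->pixels must have room for (width / factor) * (height / factor) bytes. */
static void make_thumbnail(const struct image *src, struct image *dst, int factor)
{
    dst->width = src->width / factor;
    dst->height = src->height / factor;

    for (int ty = 0; ty < dst->height; ty++) {
        for (int tx = 0; tx < dst->width; tx++) {
            unsigned long acc = 0;
            for (int dy = 0; dy < factor; dy++) {
                for (int dx = 0; dx < factor; dx++) {
                    size_t row = (size_t)(ty * factor + dy);
                    size_t col = (size_t)(tx * factor + dx);
                    acc += src->pixels[row * (size_t)src->width + col];
                }
            }
            dst->pixels[(size_t)ty * (size_t)dst->width + (size_t)tx] =
                (unsigned char)(acc / ((unsigned long)factor * (unsigned long)factor));
        }
    }
}

/* One thumbnail per iteration; iterations share no state. */
void make_all_thumbnails(const struct image *src, struct image *dst,
                         int count, int factor)
{
    /* dynamic scheduling hands out pictures one at a time, which helps
       when the images differ a lot in size */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < count; i++) {
        make_thumbnail(&src[i], &dst[i], factor);
    }
}
```

With the numbers from the quote, 200 pictures at one second each is 200 seconds serially, and an ideal 16-way split would be about 12.5 seconds. That is an upper bound under the assumption that the work is CPU-bound; if memory or I/O is the limit, the gain will be smaller.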
              On a Xeon... Should have started with that. Because as a libjpeg (v8 & v9+drop patches APIs) library user, I've coded and scripted enough client- and server-side crap with it to know that any meaningful load will burden the bus rather than the CPU.
              Maybe now, as NVM is trickling down to the consumer lines, we'll see some bottlenecks shifting... But don't tell me OpenMP is general purpose. Because for the most part, it's an optimization for the high end that has performance costs for the low end.



              • #37
                Originally posted by c117152

                On a Xeon... Should have started with that. Because as a libjpeg (v8 & v9+drop patches APIs) library user, I've coded and scripted enough client- and server-side crap with it to know that any meaningful load will burden the bus rather than the CPU.
                Maybe now, as NVM is trickling down to the consumer lines, we'll see some bottlenecks shifting... But don't tell me OpenMP is general purpose. Because for the most part, it's an optimization for the high end that has performance costs for the low end.
                Like I said, OpenMP also supports tasks, not just loop parallelization. Tasks can run as long as you want, for instance for an hour or a day. The cost of handling tasks is not relevant once the tasks become big enough. I agree that compared to some truly concurrent / parallel languages, OpenMP doesn't look that useful.



                • #38
                  Originally posted by caligula

                  Like I said, OpenMP also supports tasks, not just loop parallelization. Tasks can run as long as you want, for instance for an hour or a day. The cost of handling tasks is not relevant once the tasks become big enough. I agree that compared to some truly concurrent / parallel languages, OpenMP doesn't look that useful.
                  It's not that it's not as useful. It's that it doesn't occupy any use space at all: if you're targeting the high end, you're going to parallelize manually. If you're targeting the low end, OpenMP is too slow.
                  Of course, this is all in the sphere of parallelism. For general concurrency that abstracts multi-processing with some model or language, you might actually find OpenMP buried somewhere in the compilers with the appropriate uses (like cores >= 12). I think some of the CSP/green-thread Python libraries mix and match Duff's devices, OpenMP, pthreads and good old locks and mutexes in their code. I suppose that's a legitimate use case.



                  • #39
                    Originally posted by c117152

                    It's not that it's not as useful. It's that it doesn't occupy any use space at all: if you're targeting the high end, you're going to parallelize manually. If you're targeting the low end, OpenMP is too slow.
                    Of course, this is all in the sphere of parallelism. For general concurrency that abstracts multi-processing with some model or language, you might actually find OpenMP buried somewhere in the compilers with the appropriate uses (like cores >= 12). I think some of the CSP/green-thread Python libraries mix and match Duff's devices, OpenMP, pthreads and good old locks and mutexes in their code. I suppose that's a legitimate use case.
                    You're full of crap.



                    • #40
                      Originally posted by jrch2k8

                      Latency and throughput on extremely high network loads; the BSD network TCP stack has quite cool features for these cases.
                      I'd love to see examples. If it's true, I wonder why BSD is nearly nonexistent in server market share, mission-critical computing and so on.

                      Solaris' firewall in its time was more robust than iptables in certain very, very complex routing scenarios (I think bpf took a lot from it)
                      There's a better option than iptables: nftables.

                      OpenBSD is quite tough on network security (or at least was for a good while), equivalent to Linux+grsec+hardening
                      I've posted an article a few times which shows it's not. Generic Linux distributions like Ubuntu and others that come with MAC are more secure.

