
LLVM Clang 15 Enables Faster Square Root Instructions For AMD Zen


  • #21
    Originally posted by AdrianBc View Post
    ... but the reciprocal approximation instructions are not microcoded, but hardwired, ...
    This is what I was wondering. Do you know this for a fact? Because all I see is talk about latency and this does not encourage me to believe there is much hardwiring going on.



    • #22
      Originally posted by coder View Post
      My experience of compressing images to attach to emails and bug reports matches up with what brucethemoose said. JPEG compression with 4:2:0 color subsampling and quality of ~85 looks nearly perfect and is rather consistently smaller than a 24-bit PNG. Once you go to 8-bit, PNGs are very rarely still "bit-perfect".
      Using 4:2:0 chroma subsampling on webpages with text is shooting yourself in the foot. It is not so terrible for black-on-white text, but the moment you have text on a different background or colored text, the image starts to look like total crap. 4:2:0 chroma does a good job on movies and on photos of the real world, where you have noise etc. anyway, but not on anything engineered on a pixel-by-pixel basis (like fonts).

      What is Chroma Subsampling and where is this visible? Chroma subsampling is a type of compression that reduces the color information in a signal in favor of luminance data.


      Also, PNG is better at compressing images that contain large areas of the same color, which most graphs/webpages do. Unless you want to try JPEG-XL or AVIF, PNG is quite a no-brainer over JPG in such cases. Leave JPG to what it was designed for, like photos.
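To illustrate the point about colored text, here is a minimal sketch of what 4:2:0 does to a single 2x2 pixel block. It assumes the full-range BT.601 (JFIF) color conversion and a plain average of the four chroma samples; real encoders may use a different downsampling filter, and quantization is ignored entirely. The function names are made up for this example.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion, as used by JFIF."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse conversion, rounded and clamped to 0..255."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return tuple(max(0, min(255, round(v))) for v in (r, g, b))


def subsample_420(block):
    """block is a 2x2 list of RGB tuples.

    Luma is kept per pixel; Cb and Cr are averaged over the whole block
    (assumption: simple box filter) and shared by all four pixels.
    """
    ycc = [[rgb_to_ycbcr(*px) for px in row] for row in block]
    cb = sum(p[1] for row in ycc for p in row) / 4
    cr = sum(p[2] for row in ycc for p in row) / 4
    return [[ycbcr_to_rgb(p[0], cb, cr) for p in row] for row in ycc]


# Vertical edge: left column is "text", right column is "background".
mono = [[(0, 0, 0), (255, 255, 255)], [(0, 0, 0), (255, 255, 255)]]
colored = [[(255, 0, 0), (0, 0, 255)], [(255, 0, 0), (0, 0, 255)]]

print(subsample_420(mono) == mono)        # True: the edge lives in luma only
print(subsample_420(colored) == colored)  # False: both colors drift toward purple
```

Black on white survives because both colors have neutral chroma, so averaging changes nothing. Red on blue differs mostly in chroma, and that is exactly the information 4:2:0 throws away.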



      • #23
        Originally posted by sdack View Post
        This is what I was wondering. Do you know this for a fact? Because all I see is talk about latency and this does not encourage me to believe there is much hardwiring going on.
        Agner Fog claims it decodes to 1 uOP, on AMD CPUs, as far back as the K7.


        Contrast that to their transcendental functions, which still decode to between 30 and 130 uOPS on Zen 3.



        • #24
          Originally posted by piotrj3 View Post
          Using 4:2:0 chroma subsampling on webpages with text is shooting yourself in the foot. It is not so terrible for black-on-white text, but the moment you have text on a different background or colored text, the image starts to look like total crap.
          Yes, I know. Text is mostly designed to contrast with the background in the luma channel. You're right that there will be more artifacts where there are also color contrasts.

          Originally posted by piotrj3 View Post
          Also, PNG is better at compressing images that contain large areas of the same color, which most graphs/webpages do.
          I know, right? That's why it seems counterintuitive, at first. However, JPEG scans each DCT block in zigzag order and run-length encodes the zero AC coefficients, which means that if you have a constant color (i.e. your AC coefficients are all zero), the entire DCT block basically collapses to a single end-of-block code. Then, the DC coefficients are coded as differences from the previous block, so identical neighboring blocks cost only a few bits each. This makes it amazingly efficient at encoding a region of constant color!

          Originally posted by piotrj3 View Post
          PNG is quite a no-brainer over JPG in such cases. Leave JPG to what it was designed for, like photos.
          Or, you could actually try it yourself. You might be surprised!
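As a rough check of the constant-color argument, the sketch below runs a naive 8x8 DCT-II (with the usual JPEG scaling) over a flat block and lists the non-zero coefficients. It only shows why such a block is cheap; it is not an encoder, quantization and Huffman coding are left out, and the sample value 100 is arbitrary. `dc_differences` is a hypothetical helper showing the block-to-block DC differences that baseline JPEG codes.

```python
import math


def dct_8x8(block):
    """Naive 2D DCT-II of an 8x8 block with JPEG's normalization."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out


def dc_differences(dc_values):
    """Difference of each block's DC value from the previous block's."""
    prev = 0
    diffs = []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs


flat = [[100] * 8 for _ in range(8)]  # one constant-color block
coeffs = dct_8x8(flat)
nonzero = [(u, v) for u in range(8) for v in range(8)
           if abs(coeffs[u][v]) > 1e-9]

print(nonzero)                          # [(0, 0)]: only the DC term survives
print(dc_differences([800, 800, 800]))  # [800, 0, 0]: repeated blocks cost almost nothing
```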



          • #25
            Originally posted by sdack View Post
            This is what I was wondering. Do you know this for a fact? Because all I see is talk about latency and this does not encourage me to believe there is much hardwiring going on.
            Yes.

            The throughput and latency of the Intel and AMD CPUs can be found on the Web in several places, e.g. in Agner Fog's instruction tables or at the site http://instlatx64.atw.hu/ .

            The instructions to be searched are RSQRTSS, RSQRTPS, VRSQRTSS, VRSQRTPS.

            On most CPUs, the throughput of these instructions is 1 per cycle, but many of the older or cheaper CPUs, especially those from Intel, do not have enough reciprocal sqrt computation units to cover the entire width of a 256-bit or 128-bit register. Those CPUs might need 2 cycles for a 256-bit result, or even 4 cycles for a 256-bit result and 2 cycles for a 128-bit result.

            However, most recent CPUs have enough reciprocal sqrt computation units to have a throughput of 1 instruction per cycle at any register width.

            A reciprocal sqrt unit, which is a dedicated hardware device, always has a throughput of 1 per cycle, but the computation itself might take longer than 1 cycle, in which case the unit is pipelined. So the throughput stays at 1 per cycle while the latency may be larger; on many older CPUs it might be up to 5 cycles.

            The same is true for most floating-point operations: due to their complexity, they are implemented by pipelined units whose throughput is 1 per cycle but whose latency is typically between 3 and 6 cycles.
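The difference between throughput and latency can be made concrete with a toy cycle count. This is only a sketch of an idealized, fully pipelined unit; the function names are invented here, and real CPUs add scheduling, port contention and memory effects that are ignored.

```python
def cycles_independent(n_ops, latency, issue_interval=1):
    """Cycles to finish n independent operations on one pipelined unit.

    A new operation enters every `issue_interval` cycles, and the last
    one still needs `latency` cycles to leave the pipeline.
    """
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1) * issue_interval


def cycles_dependent(n_ops, latency):
    """Cycles when every operation needs the previous result as input."""
    return n_ops * latency


# 100 reciprocal sqrt operations on a unit with 5 cycles of latency
# and a throughput of 1 per cycle:
print(cycles_independent(100, latency=5))  # 104
print(cycles_dependent(100, latency=5))    # 500
```

With independent inputs the latency is paid once, so the cost approaches 1 cycle per result. A dependency chain pays the full latency every time, which is why both numbers appear in the instruction tables.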







            • #26
              There are some nice youtube videos regarding fast sqrt in Quake.
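For reference, the Quake III trick those videos cover can be sketched in a few lines. This is a Python transcription of the well-known bit hack (magic constant 0x5F3759DF plus Newton-Raphson refinement), using `struct` to reinterpret the float bits. It assumes a positive, normal single-precision input and is meant as an illustration of the estimate-then-refine pattern, not as something to use instead of a hardware instruction.

```python
import struct


def fast_rsqrt(x, newton_steps=1):
    """Approximate 1/sqrt(x) for positive x, Quake III style."""
    # Reinterpret the 32-bit float as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Initial estimate: the shift roughly halves and negates the exponent.
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # Each Newton-Raphson step roughly doubles the number of correct bits.
    for _ in range(newton_steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


for value in (0.25, 2.0, 4.0, 100.0):
    approx = fast_rsqrt(value)
    exact = value ** -0.5
    print(value, approx, abs(approx - exact) / exact)
```

One refinement step is what the original code used; passing `newton_steps=2` tightens the result further at the cost of a few more multiplications.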

              Originally posted by birdie View Post
              PNG compresses screenshots made mostly of text a lot better than JPEG unless JPEG at 70% with a ton of fringing is OK for you.
              JPEG: 230,595 bytes (looks like absolute crap)
              PNG: 183,786 bytes (100% pristine and bit perfect)
              You compared 2 different images, but I recompressed your PNG as a JPG of the same size (180 kB) and the quality difference is visible. While I don't think it's absolute crap, it's objectively worse. And yeah, I don't get it either; it would be super easy to let your software compress with both methods and select PNG if it's smaller.
              Another hint: you made a screenshot with subpixel rendering, which is bad if someone has a different LCD panel or is not viewing at 1:1 pixel size. Also, your PNG would be smaller without it.
              And another one: reducing the color palette gives only small gains, while the reduced image quality is sometimes visible.



              • #27
                Originally posted by Anux View Post
                There are some nice youtube videos regarding fast sqrt in Quake.


                You compared 2 different images, but I recompressed your PNG as a JPG of the same size (180 kB) and the quality difference is visible. While I don't think it's absolute crap, it's objectively worse. And yeah, I don't get it either; it would be super easy to let your software compress with both methods and select PNG if it's smaller.
                Another hint: you made a screenshot with subpixel rendering, which is bad if someone has a different LCD panel or is not viewing at 1:1 pixel size. Also, your PNG would be smaller without it.
                And another one: reducing the color palette gives only small gains, while the reduced image quality is sometimes visible.
                I could have tried to convert it to an 8-bit palette, but that would have been unfair and people could start screaming that I'm cheating. Disabling antialiasing is also beneficial for PNG and also a sort of cheating. So, I just went ahead and captured and saved it as is.



                • #28
                  Originally posted by birdie View Post
                  I could have tried to convert it to an 8-bit palette, but that would have been unfair and people could start screaming that I'm cheating.
                  Nah, I recommended not using an 8-bit palette because of the small gains. BTW, many PNG encoders automatically detect if there are fewer than 257 colors in an image and use a custom 8-bit palette in that case, with no quality loss.
                  Disabling antialiasing is also beneficial for PNG and also a sort of cheating. So, I just went ahead and captured and saved it as is.
                  Not disabling anti aliasing, that would look awful, only subpixel AA. If you zoom in on your image you should see weird colors around each letter, for example look here:
                  lk7mCZW.png
                  The lower the DPI of your display, the more it shows; it works well only at high DPI (> 100). And of course, it looks extra horrible on panels with a different subpixel layout.
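The palette remark earlier in this post boils down to counting distinct colors. A minimal sketch of that check, assuming pixels arrive as RGB tuples; the function name and return strings are made up here and do not correspond to any particular PNG library:

```python
def png_color_mode(pixels, max_palette=256):
    """Return 'indexed' if the image fits an 8-bit palette losslessly.

    pixels is any iterable of (r, g, b) tuples. Counting stops early
    once the palette limit is exceeded.
    """
    seen = set()
    for px in pixels:
        seen.add(px)
        if len(seen) > max_palette:
            return "truecolor"
    return "indexed"


# A two-color "text" image fits a palette; a wide gradient does not.
text_like = [(0, 0, 0), (255, 255, 255)] * 1000
gradient = [(r, g, 0) for r in range(32) for g in range(32)]

print(png_color_mode(text_like))  # indexed
print(png_color_mode(gradient))   # truecolor
```

Subpixel rendering works against this: the colored fringes around each glyph add many extra distinct colors, which is one reason a screenshot taken with it tends to be larger.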

