It's Past Time To Stop Using egrep & fgrep Commands, Per GNU grep 3.8


  • #41
    I wholeheartedly agree with ssokolow.

    I would also go so far as to say that the much meme'd "many small programs that communicate over stdin/stdout with pipes in between" concept was never a particularly wonderful idea, and arose at the time because C was difficult to link to libraries.

    The truth is that while stdin/stdout are adequate for the most basic interactive console applications, attempting to shove any significant amount of data through them will bring them to their knees. It doesn't even take that much: if you write a console game thinking it will be easier because you don't have to deal with assets and whatever else, you'll very quickly learn that you need to batch your writes to stdout rather than naively writing to the screen line by line. To further underline how dramatic the effect of using stdin/stdout as an IPC method is: I have some utilities I wrote for use at work, and as it turned out, doing:

    Code:
    foo bar.txt > baz.txt
    caused the utility to take 15 minutes or more to process the file. Simply adding an -o flag to the utility and writing the output directly to a file brought the time down to a matter of seconds.

    Once you realize that, it quickly becomes obvious that writing against libraries rather than programs is far preferable, especially for things like text processing.
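
For readers who have not run into this, here is a minimal sketch of what "batch your writes" can mean in C. It is not the utility described above: `write_lines_batched` is a hypothetical helper and the 64 KiB size is an arbitrary assumption. It asks stdio for full buffering so that many small writes can reach the kernel as a few large ones.

```c
#include <stdio.h>

/* Hypothetical helper (not from the thread): write `count` copies of `line`
 * to `out` with full buffering, so many small writes are batched together.
 * Returns 0 on success, -1 on error. */
int write_lines_batched(FILE *out, const char *line, long count)
{
    /* Must be called before any other I/O on the stream. With a NULL
     * buffer the C library allocates its own; the size is only a hint. */
    if (setvbuf(out, NULL, _IOFBF, 1 << 16) != 0)
        return -1;

    for (long i = 0; i < count; ++i) {
        if (fputs(line, out) == EOF)
            return -1;
    }

    /* Push whatever is still sitting in the buffer out to the OS. */
    return (fflush(out) == 0) ? 0 : -1;
}
```

A program could call `write_lines_batched(stdout, "some line\n", 1000000)` as its first output operation; whether that changes anything depends on how the stream was already being buffered.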


    • #42
      Originally posted by Luke_Wolf View Post
      the much meme'd "many small programs that communicate over stdin/stdout with pipes in between" concept was never a particularly wonderful idea, and arose at the time because C was difficult to link to libraries.
      Citation needed.

      Originally posted by Luke_Wolf View Post
      To further underline how dramatic the effect of using stdin/stdout as an IPC method is: I have some utilities I wrote for use at work, and as it turned out, doing:

      Code:
      foo bar.txt > baz.txt
      caused the utility to take 15 minutes or more to process the file. Simply adding an -o flag to the utility and writing the output directly to a file brought the time down to a matter of seconds.

      Once you realize that, it quickly becomes obvious that writing against libraries rather than programs is far preferable, especially for things like text processing.
      Challenge accepted.

      I tried them in the reverse of the order you described. First, direct output:
      Code:
      # sync
      # time dd if=/dev/zero bs=65536 count=65536 of=/tmp/4GB
      
      65536+0 records in
      65536+0 records out
      4294967296 bytes (4.3 GB, 4.0 GiB) copied, 2.82847 s, 1.5 GB/s
      
      real    0m2.846s
      user    0m0.010s
      sys     0m1.379s
      Now, stdout:
      Code:
      # rm /tmp/4GB
      # sync
      # time dd if=/dev/zero bs=65536 count=65536 > /tmp/4GB
      
      65536+0 records in
      65536+0 records out
      4294967296 bytes (4.3 GB, 4.0 GiB) copied, 2.6421 s, 1.6 GB/s
      
      real    0m2.644s
      user    0m0.067s
      sys     0m1.322s
      This suggests your issue is an instance of PEBCAK. Whenever you think you're smarter than Linux, check yourself.
      Last edited by coder; 05 September 2022, 06:28 AM.


      • #43
        Originally posted by DRanged View Post

        sh and ksh are still heavily used in Solaris, AIX, HP-UX, Tru64 and some others.
        YEP! OpenBSD defaults to ksh.
        Had to look NetBSD up, but it defaults to sh.
        FreeBSD uses sh or tcsh depending on the user account, but 14.0 will be sh for everyone (adding history to sh).
        Mac OS X defaulted to tcsh before an ancient bash and now uses zsh.


        • #44
          Originally posted by Lycanthropist View Post
          What is the difference between "$@" and "$@@"? I've never come across the latter before.
          Phoronix quoted the GNU info page (with all its original markup); bad choice, but oh well. @ is a metacharacter in that markup (Texinfo), so @@ produces a literal at-sign.


          • #45
            Originally posted by TuesdayPogo View Post
            Well, compared to glibc DT_HASH deprecation, at least this one has a sane deprecation process.
            grep has started the deprecation process now, and you can expect it in... maybe 1 year to be pulled through.
            DT_HASH was on the deprecation plank for... 16-17 years.

            Not sure if you're jesting or serious. 16 years is an insane (unnecessary) amount of time, but (some people may argue that) 1 year is also an insane speed.


            • #46
              ssokolow, I largely have the same thoughts as you, except that in my case I use Scala as a scripting language, which is supported on various levels and even without the JVM (scala-native/GraalVM lets you build statically linked binaries). There is also stuff like https://ammonite.io/. Scala also has some really nice libraries for things like parsing CLI arguments (e.g. https://ben.kirw.in/decline/).

              To me, Python is too brittle (once you start using static typing you never go back) and Rust is overkill (Rust is fine for the problems it was originally designed to solve, but I find its syntax painful to read, and it can be a bit ceremonious in some circumstances due to its zero-cost abstraction principles). I would, however, gladly use either instead of shell; I also had to deal with getting woken up a decade ago when writing shell scripts.


              • #47
                Originally posted by DRanged View Post

                sh and ksh are still heavily used in Solaris, AIX, HP-UX, Tru64 and some others.
                Still a non-issue. If you are not using bash, don't rely on bash-specific RC files and functions. Not to mention that you are unlikely to be using GNU grep in the first place on those systems...


                • #48
                  Originally posted by coder View Post
                  Citation needed.
                  Someone clearly hasn't written in C/C++ or else you would know. There's a reason that nobody really uses raw makefiles, and GNU Autotools is referred to as Autohell. CMake isn't particularly wonderful either compared to other modern build systems.

                  Originally posted by coder View Post
                  Challenge accepted.

                  I tried them in the reverse order as what you described. First, direct output:
                  Code:
                  # sync
                  # time dd if=/dev/zero bs=65536 count=65536 of=/tmp/4GB
                  
                  65536+0 records in
                  65536+0 records out
                  4294967296 bytes (4.3 GB, 4.0 GiB) copied, 2.82847 s, 1.5 GB/s
                  
                  real 0m2.846s
                  user 0m0.010s
                  sys 0m1.379s
                  Now, stdout:
                  Code:
                  # rm /tmp/4GB
                  # sync
                  # time dd if=/dev/zero bs=65536 count=65536 > /tmp/4GB
                  
                  65536+0 records in
                  65536+0 records out
                  4294967296 bytes (4.3 GB, 4.0 GiB) copied, 2.6421 s, 1.6 GB/s
                  
                  real 0m2.644s
                  user 0m0.067s
                  sys 0m1.322s
                  This suggests your issue is an instance of PEBCAK. Whenever you think you're smarter than Linux, check yourself.
                  No, rather it suggests that you have never tried to write such things on your own. Much has been written on the subject of getting around the performance issues. I don't have it offhand, but there was an article where one of the uutils devs talked about all the jank he had to do to get the application he was writing up to the speed of the GNU util he was replacing. It's a common complaint on game dev forums, and there's plenty of cases of others talking about how to mitigate the issue https://perl.plover.com/FAQs/Buffering.html

                  Also see: https://danluu.com/term-latency/

                  I suppose I should also mention that my particular case involves throwing files containing tens or hundreds of millions of lines of text output through the program, which is rather different from your single-line-of-0s case.
                  Last edited by Luke_Wolf; 05 September 2022, 05:13 PM.


                  • #49
                    Originally posted by Luke_Wolf View Post
                    Someone clearly hasn't written in C/C++ or else you would know.
                    You made a specific, historical claim that:

                    the ... "many small programs that communicate over stdin/stdout with pipes in between" ... idea ... arose at the time because C was difficult to link to libraries.

                    Please cite at least one good source on that, or don't pull such BS out of your ass.

                    Originally posted by Luke_Wolf View Post
                    No, rather it suggests that you have never tried to write such things on your own.
                    I countered your claim with evidence produced by a specific, repeatable test. You've got to come back with better than that.

                    Originally posted by Luke_Wolf View Post
                    there's plenty of cases of others talking about how to mitigate the issue https://perl.plover.com/FAQs/Buffering.html
                    That, in no way, supports your claim. In fact, it even goes some way towards refuting your claim, stating: "only filehandles attached to the terminal are line-buffered by default." So, that + experimental evidence makes it pretty clear your performance problem wasn't simply due to redirecting stdout vs. writing directly to a file.
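
The quoted rule can be checked from inside a program. The sketch below is an illustration under assumptions, not part of the FAQ: `default_buffering` is a made-up name, and the strings describe what C libraries typically do by default (stderr unbuffered, terminals line-buffered, everything else fully buffered), which an implementation is free to vary.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: describe the buffering mode a stdio stream
 * typically gets by default. This reports the usual convention; it
 * does not query the stream's actual mode. */
const char *default_buffering(FILE *stream)
{
    if (stream == stderr)
        return "unbuffered";

    /* isatty() is nonzero only when the descriptor refers to a terminal. */
    return isatty(fileno(stream)) ? "line-buffered" : "fully buffered";
}
```

Under that convention, `foo bar.txt > baz.txt` and an `-o baz.txt` option should both land on the fully buffered path, since in either case the output is a regular file rather than a terminal.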

                    Originally posted by Luke_Wolf View Post
                    Also see: danluu.com/term-latency/
                    Again, this is apropos of nothing. It seems like you're just spamming us with random links, hoping something somewhere in them might somehow back up your nonsense claim.

                    Originally posted by Luke_Wolf View Post
                    I suppose I should also mention that my particular case involves throwing files containing tens or hundreds of millions of lines of text output through the program, which is rather different from your single-line-of-0s case.
                    Sounds like another testable claim.

                    Code:
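                    #include <stdio.h>
                    #include <stdlib.h>
                    #include <sys/types.h>  /* ssize_t (POSIX) */

                    /* Usage: printf_perf <bytes> [outfile]; writes to stdout if no outfile is given. */
                    int main( int argc, char *argv[] )
                    {
                        const char str[] = "Lorem ipsum dolor sit amet, consectetur adipiscing elit"
                            ", sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\n";

                        /* Number of lines needed to reach roughly the requested byte count. */
                        const ssize_t count = (argc > 1) ? atol( argv[1] ) / (sizeof (str) - 1) : 1;
                        FILE *file = (argc > 2) ? fopen( argv[2], "w" ) : stdout;
                        if (!file) { perror( "fopen" ); return 1; }

                        /* Pass str as data rather than as the format string. */
                        for (ssize_t i = 0; i < count; ++i) fprintf( file, "%s", str );

                        fclose( file );

                        return 0;
                    }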
                    This time, stdout first:
                    Code:
                    # sync
                    # time ./printf_perf 4294967296 > /tmp/4GB
                    
                    real    0m3.879s
                    user    0m1.676s
                    sys     0m2.203s
                    Now, direct file output:
                    Code:
                    # rm /tmp/4GB
                    # sync
                    # time ./printf_perf 4294967296 /tmp/4GB
                    
                    real    0m3.831s
                    user    0m1.472s
                    sys     0m2.359s
                    Again, no substantial difference. The difference between them is less than the successive run-to-run variability.
                    Last edited by coder; 05 September 2022, 06:17 PM.


                    • #50
                      Originally posted by Luke_Wolf View Post
                      I whole heartedly agree with ssokolow .

                      I would also go so far as to say that the much meme'd "many small programs that communicate over stdin/stdout with pipes in between" concept was never a particularly wonderful idea, and arose at the time because C was difficult to link to libraries.
                      Funny thing about that. The original formulation of the UNIX philosophy says nothing about shell pipes or line-oriented plaintext formats. It just says "Expect the output of every program to become the input to another, as yet unknown, program. Don't clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don't insist on interactive input."

                      That applies equally well to things like "Provide a --json flag or equivalent" and "Don't write GUI-only tools". Fundamentally, it's just about "be as accommodating as possible to people who want to automate/compose their tooling".
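
As a hedged illustration of the "--json flag or equivalent" point, a tool can keep its human-readable output and offer a machine-readable one built from the same data. The names below (`struct entry`, `format_entry`) are hypothetical, and the sketch assumes names that need no JSON escaping.

```c
#include <stdio.h>

/* Hypothetical record for a tool that reports file sizes. */
struct entry {
    const char *name;
    long bytes;
};

/* Format one entry into dst. When `json` is nonzero, emit one JSON object
 * per line; otherwise emit a padded, human-oriented line.
 * Assumption: `name` contains no characters that need JSON escaping.
 * Returns what snprintf returns (the length that was or would be written). */
int format_entry(char *dst, size_t cap, const struct entry *e, int json)
{
    if (json)
        return snprintf(dst, cap, "{\"name\":\"%s\",\"bytes\":%ld}\n",
                        e->name, e->bytes);

    return snprintf(dst, cap, "%-20s %10ld bytes\n", e->name, e->bytes);
}
```

A real tool would set `json` from its command-line parsing; the point is only that the second program in a pipeline gets a format it can parse without guessing at column widths.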

                      (The original 1978 formulation did a much better job of staying technology-agnostic than the paraphrase from 1994 by someone else that people tend to quote because it introduced "Write programs to handle text streams, because that is a universal interface.")
