Mercurial Revision Control System Continues Rust'ing For Better Performance


  • #21
    Originally posted by habilain View Post
    case in point: in order to get Git to work well for the Windows code base, Microsoft had to write a custom file system driver
    Can you provide a link to that? I would be interested to see that.

    • #22
      Originally posted by phoronix View Post
      ... working to optimize its performance in part by transitioning more of it to the Rust programming language...

      http://www.phoronix.com/scan.php?pag...More-Rust-2021
      Moving to Rust is about code correctness, not about performance.
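
      For what it's worth, here's a minimal Rust sketch (my own illustration, nothing to do with Mercurial's actual code) of the kind of guarantee that "correctness" argument refers to: the borrow checker rejects, at compile time, an aliasing bug that the equivalent C would compile happily and only hit at runtime.

      ```rust
      // Keeping a shared reference into a Vec while also mutating the Vec is
      // rejected at compile time; in C, the equivalent pattern (keeping a
      // pointer into a buffer that later gets realloc'd) compiles fine and is
      // undefined behaviour at runtime.
      fn main() {
          let mut revisions = vec!["a1b2c3", "d4e5f6"];

          // Shared borrow into the vector.
          let first = &revisions[0];

          // Uncommenting the next line turns a potential dangling pointer into
          // a build failure (error E0502: cannot borrow `revisions` as mutable
          // because it is also borrowed as immutable):
          // revisions.push("0a0b0c");

          println!("first revision: {first}");
      }
      ```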

      • #23
        Originally posted by lowflyer View Post

        Can you provide a link to that? I would be interested to see that.


        But it wasn't Git's fault. Microsoft's code was just all in one big repo.

        • #24
          Originally posted by bug77 View Post

          But it wasn't Git's fault. Microsoft's code was just all in one big repo.
          It's a combination of Git's and Microsoft's fault. Git is designed around a particular workflow and a particular set of assumptions. One of those assumptions is that you don't have truly big repos (because, if nothing else, Git's diff operation gets terribly slow at this point). Microsoft knew that, but decided to use Git anyway for political reasons (i.e. to show that they're engaging with the open source community).

          And for anyone who says "monorepos are dumb": they have their place. Git's not designed to handle that use case, and that's a perfectly valid design decision. But the fact is that big monolithic repositories do exist, and they exist for valid reasons. For those repos, Git isn't a good choice simply because it wasn't designed to handle them, and you'd be better off with another tool (like Mercurial or Perforce, perhaps). Similarly, you wouldn't use Git to track large binary files, because Git is a source code management tool and, by design, doesn't handle large binary files well (there are some extensions for this, but to be honest they're not as good as some other tools).
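
          To make the scaling point concrete, here's a rough sketch I put together (not Git's actual implementation) of why a status/diff-style operation slows down on a huge working tree: with nothing helping it, the cost is at least one metadata lookup per tracked file, so it grows with the total number of files rather than with the number of files actually touched.

          ```rust
          // Rough sketch of a status/diff-style scan (not Git's real code):
          // walk the working tree and stat every tracked file, comparing its
          // mtime against a recorded value. The walk is O(total files), which
          // is what hurts on a monorepo, however few files actually changed.
          use std::collections::HashMap;
          use std::fs;
          use std::io;
          use std::path::{Path, PathBuf};
          use std::time::SystemTime;

          // Hypothetical stand-in for the index: path -> last recorded mtime.
          type Index = HashMap<PathBuf, SystemTime>;

          fn changed_files(worktree: &Path, index: &Index) -> io::Result<Vec<PathBuf>> {
              let mut changed = Vec::new();
              let mut stack = vec![worktree.to_path_buf()];
              while let Some(dir) = stack.pop() {
                  for entry in fs::read_dir(&dir)? {
                      let entry = entry?;
                      let path = entry.path();
                      if path.is_dir() {
                          stack.push(path); // descend into subdirectories
                      } else {
                          // One metadata lookup per file, changed or not.
                          let mtime = entry.metadata()?.modified()?;
                          match index.get(&path) {
                              Some(known) if *known == mtime => {} // unchanged
                              _ => changed.push(path),
                          }
                      }
                  }
              }
              Ok(changed)
          }

          fn main() -> io::Result<()> {
              // Against an empty index everything counts as changed; the point
              // here is the full walk, not the comparison.
              let changed = changed_files(Path::new("."), &Index::new())?;
              println!("stat'ed {} files", changed.len());
              Ok(())
          }
          ```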

          • #25
            Originally posted by habilain View Post

            It's a combination of Git's and Microsoft's fault. Git is designed around a particular workflow and a particular set of assumptions. One of those assumptions is that you don't have truly big repos (because, if nothing else, Git's diff operation gets terribly slow at this point). Microsoft knew that, but decided to use Git anyway for political reasons (i.e. to show that they're engaging with the open source community).

             And for anyone who says "monorepos are dumb": they have their place. Git's not designed to handle that use case, and that's a perfectly valid design decision. But the fact is that big monolithic repositories do exist, and they exist for valid reasons. For those repos, Git isn't a good choice simply because it wasn't designed to handle them, and you'd be better off with another tool (like Mercurial or Perforce, perhaps). Similarly, you wouldn't use Git to track large binary files, because Git is a source code management tool and, by design, doesn't handle large binary files well (there are some extensions for this, but to be honest they're not as good as some other tools).
            I'm pretty sure you don't get to 300GB with source code only. They simply didn't break things up where it made sense to do so (probably Windows' #1 problem) and shoved too many assets in there.
            You can say not being able to manage that big a repository is a Git limitation, but honestly, if Git could handle that, could a human? You'd get lost in no time at all. (And no, I'm not saying Git shouldn't be optimized whenever some bottleneck is identified.)

            • #26
              Originally posted by habilain View Post

              It's a combination of Git's and Microsoft's fault. Git is designed around a particular workflow and a particular set of assumptions. One of those assumptions is that you don't have truly big repos (because, if nothing else, Git's diff operation gets terribly slow at this point). Microsoft knew that, but decided to use Git anyway for political reasons (i.e. to show that they're engaging with the open source community).

               And for anyone who says "monorepos are dumb": they have their place. Git's not designed to handle that use case, and that's a perfectly valid design decision. But the fact is that big monolithic repositories do exist, and they exist for valid reasons. For those repos, Git isn't a good choice simply because it wasn't designed to handle them, and you'd be better off with another tool (like Mercurial or Perforce, perhaps). Similarly, you wouldn't use Git to track large binary files, because Git is a source code management tool and, by design, doesn't handle large binary files well (there are some extensions for this, but to be honest they're not as good as some other tools).
              Do you even git submodule?

              • #27
                Originally posted by bug77 View Post

                I'm pretty sure you don't get to 300GB with source code only. They simply didn't break things up where it made sense to do so (probably Windows' #1 problem) and shoved too many assets in there.
                 You can say not being able to manage that big a repository is a Git limitation, but honestly, if Git could handle that, could a human? You'd get lost in no time at all. (And no, I'm not saying Git shouldn't be optimized whenever some bottleneck is identified.)
                The bigger issue isn't the 300GB of history; as I understand it, it's more to do with Git's diff not really scaling to big repos with large numbers of actively changed files and commits. There are lots of ways to work around this: for example, Linux has a federated committing model which reduces a lot of churn in the commits (and it's where the GC approach pays dividends in keeping the number of commits in history lower), but some organisations will want to keep that history, if only to keep tabs on what their employees are doing each day. When you start doing that, i.e. working against Git's design choices, that's when Git starts to be a problem.

                Originally posted by Luke_Wolf View Post
                Do you even git submodule?
                At the risk of repeating myself, the issue arises when, for some reason, an organisation decides not to follow Git's best usage guidelines. Using submodules and breaking the repository up into smaller chunks would let Git perform well. However, I've named three of the biggest, most successful tech companies (Facebook, Google, Microsoft) that have decided not to follow that approach, so I suspect that in some places there are genuine benefits to approaches other than what Git's optimum workflow prescribes.

                • #28
                  Originally posted by dirlewanger88

                  I'm a professional C programmer. I can code circles around your slow Rust junk.
                  At least Rust programmers never need to deal with GNU autoshit.

                  Also, "slow Rust junk" shows that you don't know jackshit about Rust, you brain-dead troll. Unless you're also saying Clang/LLVM is "slow junk".

                  • #29
                    Originally posted by habilain View Post
                     The bigger issue isn't the 300GB of history; as I understand it, it's more to do with Git's diff not really scaling to big repos with large numbers of actively changed files and commits. There are lots of ways to work around this: for example, Linux has a federated committing model which reduces a lot of churn in the commits (and it's where the GC approach pays dividends in keeping the number of commits in history lower), but some organisations will want to keep that history, if only to keep tabs on what their employees are doing each day. When you start doing that, i.e. working against Git's design choices, that's when Git starts to be a problem.
                    Did you read the article I linked? It clearly says Microsoft worked around this by implementing a virtual file system to fetch files on demand. That's not a fix for a slow diff.
                    (Not to mention it's a direct violation of the DVCS philosophy that every copy is a fully working copy.)

                    • #30
                      Originally posted by bug77 View Post

                      Did you read the article I linked? It clearly says Microsoft worked around this by implementing a virtual file system to fetch files on demand. That's not a fix for a slow diff.
                      (Not to mention it's a direct violation of the DVCS philosophy that every copy is a fully working copy.)
                      Yep, I'm aware. However, one of the other things that VFS for Git does is manage Git's internals so that Git doesn't have to check every file for diffs, only the ones that have actually changed, and that is how you fix a slow diff: by reducing the number of files to diff, where possible. To quote from https://vfsforgit.org/

                      "VFS for Git also manages Git's internal state so that it only considers the files you have accessed, instead of having to examine every file in the repository. This ensures that operations like status and checkout are as fast as possible."

                      And indeed, VFS for Git works by essentially turning Git from a DVCS into a centralised VCS. However, that's also inevitable for any kind of megarepo, which is what it's designed to handle.
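
                      As a toy model of what "only considers the files you have accessed" buys (my own sketch, not VFS for Git's real code): if the virtualised filesystem records which paths have actually been hydrated, a status/diff pass only has to examine that set, so its cost tracks the developer's working set rather than the size of the monorepo.

                      ```rust
                      // Toy model (not VFS for Git's real code): only the paths a
                      // developer has actually hydrated are examined, so the cost
                      // of a status-style pass tracks the working set, not the
                      // total number of files in the repository.
                      use std::collections::{HashMap, HashSet};
                      use std::path::PathBuf;
                      use std::time::SystemTime;

                      struct VirtualWorktree {
                          // Everything the repository knows about, per the index.
                          index: HashMap<PathBuf, SystemTime>,
                          // Paths the user has actually opened; usually a tiny subset.
                          accessed: HashSet<PathBuf>,
                      }

                      impl VirtualWorktree {
                          fn changed_files(&self) -> Vec<&PathBuf> {
                              self.accessed
                                  .iter()
                                  .filter(|path| {
                                      match (std::fs::metadata(path.as_path()), self.index.get(*path)) {
                                          (Ok(meta), Some(known)) => meta.modified().ok() != Some(*known),
                                          _ => true, // new, deleted, or unreadable: report it
                                      }
                                  })
                                  .collect()
                          }
                      }

                      fn main() {
                          let worktree = VirtualWorktree {
                              index: HashMap::new(),
                              accessed: HashSet::new(), // nothing hydrated yet, nothing to scan
                          };
                          println!("{} changed files", worktree.changed_files().len());
                      }
                      ```
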
                      Last edited by habilain; 12 February 2021, 11:03 PM.
