Debian Wheezy To Take Up 73 CDs Or 11 DVDs


  • #16
Does apt even support installing more than one deb at a time?



    • #17
      Originally posted by curaga View Post
      I was thinking more about Debian, TFA. I sure hope they don't go packing their debs at those extreme settings.

      Fedora yeah, it's been too heavy for that class for years. BTW, what phone has 1gb ram? Most come with half a gig if that.
      A phone I bought in April 2011 has a gig of RAM. So does my current one, and my mom's. My next planned upgrade has 2GB. http://www.phonearena.com/phones/Sam...Verizon_id7114

      BTW, the compression / decompression memory for LZMA/LZMA2 (of which Xz is just one implementation) is asymmetrical. It may require several gigs to compress but only a few hundred MB to decompress. And this is per archive with no regard for how big that archive is. So if you have 10 packages and each one is in an archive, and one of them is 512 KB, one of them is 10 MB, and one of them is 60 GB, all of them are going to occupy the same amount of resident program memory during decompression. And because dpkg can only install one package at a time, they won't be decompressing simultaneously. Have you ever watched a large update? It goes through the list "Unpacking" each package in serial (i.e. one at a time). So the amount of memory used by Xz decompression is only going to be about 64 MB total (and that memory will get allocated, then freed, then re-allocated again as each dpkg process unpacks a different archive file).
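
For anyone who wants to check, xz will report how much memory a given archive needs to decompress; it depends on the dictionary size the archive was created with, not on how much data is inside. A quick sketch (the file name is just a placeholder):
Code:
    # show the memory limits xz thinks it has to work with
    xz --info-memory

    # verbose listing of an archive; the "Memory needed" field is the
    # decompression requirement, which is set by the dictionary size
    xz --list --verbose data.tar.xz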



      • #18
        Originally posted by GreatEmerald
        What do they put into them, anyway? Every single existing DEB package that works with that Debian release?
        Yes.

        Originally posted by curaga View Post
        BTW, what phone has 1gb ram? Most come with half a gig if that.
I wasn't expecting this many, but there you go: http://bit.ly/OXy7ni



        • #19
          Originally posted by allquixotic View Post
          Have you ever watched a large update? It goes through the list "Unpacking" each package in serial (i.e. one at a time).
Now that you mention it, wouldn't it be a smart thing to analyze the dependencies before the update process and decompress/install packages that don't affect each other in parallel?



          • #20
            Originally posted by devius View Post
Now that you mention it, wouldn't it be a smart thing to analyze the dependencies before the update process and decompress/install packages that don't affect each other in parallel?
            Only if the decompression algorithm isn't using all your CPUs. From what I know of LZMA, it's technically capable of being parallelized, but it's not embarrassingly parallel, so you can't just scale it up effortlessly.

            You could actually decompress all of the packages in parallel regardless of their dependencies, and start installing (in serial) at the lowest level leaf of the dependency tree that is decompressed.
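
Just to sketch that "decompress everything up front" idea with ordinary tools (assuming the data tarballs have already been pulled out of the .debs into a staging directory, which is not how dpkg actually works):
Code:
    # decompress every archive in the staging area, four jobs at a time;
    # --keep leaves the .xz files in place, -P controls the parallelism
    ls /var/tmp/staging/*.tar.xz | xargs -P 4 -n 1 xz --decompress --keep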

            Problem is, most filesystems fall down under heavy parallel load. You get a lot of context switching because the kernel tries to give each decompression process its own fair share of CPU time, and the filesystem tries to give each process its own fair share of IO time, and so on. Look at some of Michael's benchmarks for 64MB random reads/writes with 8 or 16 threads, even on processors that have that many physical threads. It goes down to a crawl.

            Why? Because, assuming the individual archives have relatively low internal fragmentation, you introduce a lot more seeks into the operation when you have many processes doing reads and writes in parallel. There aren't really any good solutions that maintain >90% throughput under this type of load, without being inordinately unfair to one process and just letting that one process run the show for an extended period of time. But our modern stacks are configured for just the opposite scenario because of the demand for "responsive" desktops.

            Of course, if you have an SSD, seeks are basically a no-op. Telling an SSD to seek just elicits a reply like "LOL OK" because there are no moving parts involved that need to move before the data can be retrieved. So you can get great parallel performance on an SSD.

            Alternatively, if you have an insane amount of RAM (greater than 16GB), you could store all the archives in a RAM disk and decompress them from there -- in parallel. That would be "holy hell" fast, and then you could just directly copy the uncompressed data from the ramdisk to the HDD/SDD for long-term storage during package installation.

            Shoot; the packages would probably already be in RAM anyway, because you just downloaded them. If only there were a way to "force" those pages to stay in memory between the network-downloading phase and the unpacking phase, so that you don't have to download the packages, write them to disk, read the compressed data from disk, then write the uncompressed data back to disk. You'd just download them straight to RAM, then write the uncompressed data to disk. That eliminates writing the compressed data to disk and then reading the compressed data from disk. But you need as much RAM as the size of the packages you're downloading.
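
You can get most of the way there today just by downloading into a tmpfs, so the compressed copy only ever lives in RAM; a rough sketch (the path and package names are only examples, and dependency ordering is ignored):
Code:
    # fetch the .debs into RAM-backed storage instead of the on-disk cache
    mkdir -p /dev/shm/apt-stage && cd /dev/shm/apt-stage
    apt-get download eclipse gimp     # downloads the .debs into the current directory
    sudo dpkg -i ./*.deb              # unpack and install straight from RAM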

            OR...

            OK, this is cool. This is cool. We're getting somewhere.

            Download each package to a buffer in RAM. As they are downloading, directly wire up Xz (which is a streaming decompressor) to the buffer, so that you are decompressing in parallel with the download. Since a lossless decompressor reading from RAM is going to be many orders of magnitude faster than any internet connection, you're practically guaranteed that the Xz decompressor will be sitting there twiddling its thumbs for minutes on end, occasionally waking up to read a block of data and decode it.

            As it's decoding, it's writing its results to disk. So at the same time as you're pulling the data off of the network, you could even set it up so that you mmap the network buffer itself to make it a zero copy architecture, so that the data travels: NIC -> buffer in RAM -> XZ reads data from buffer (zero copy) -> XZ writes data to disk.

            Then, once Xz writes the decoded data to disk, you just "mv" (re-link) the files to the correct locations.

            Installation of packages would go from taking many minutes to taking exactly as long as it takes to download the packages. And the effect would be better the slower your internet connection is.
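
A rough shell-level sketch of that streaming pipeline, using a plain .tar.xz for simplicity (a real .deb is an ar container, so you would first have to peel out its data.tar.xz member; the URL and paths are placeholders):
Code:
    # network -> xz decoder -> files on disk, with no compressed copy ever written out
    mkdir -p /var/tmp/unpack
    curl --silent http://example.org/some-package.tar.xz \
        | xz --decompress --stdout \
        | tar --extract --file - --directory /var/tmp/unpack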

            I think I'm on to something. The only caveat is that you'd have to have enough dependency information up front to know the correct order to download the packages (because presumably you can't install packages before all their dependencies are installed, because the install scripts may depend on some other package already being there). But if you have multiple isolatable independent tree structures (for example if you install Eclipse and all of its dependencies and GIMP and all of its dependencies in the same go) you could parallelize those separate trees and use all of your CPU cores while your network downloads multiple files from the network and decodes them on the fly......

            I'm smart.
            Last edited by allquixotic; 07-10-2012, 03:16 PM.



            • #21
              Originally posted by devius View Post
Now that you mention it, wouldn't it be a smart thing to analyze the dependencies before the update process and decompress/install packages that don't affect each other in parallel?
              Even packages that do not depend on each other might need to update the same configuration file(s). Also, how do you handle failed installations? Do you abort all others, or do you wait for them to complete before failing? How do you write a log file that's actually readable (or do you write multiple log files and merge them post-install)? I'm not saying it can't be done, but it's probably too much work for too little gain.

              Originally posted by garegin View Post
just to show the user unfriendliness of debian. it's the only major distro that does not have a single install disk. it's either netinstall, which has to pull the install content from the internet, or multiple disks. the only sane option is to use debian live, which does come as a single disk
AFAIK you only need the first disk for the base installation; all the other disks contain optional software (at least it used to be this way the last time I used rotating media, a.k.a. hundreds of years ago). I've gone with netinstall for the last few installations though. I need a very limited number of packages, so why would I download a full CD/DVD? Netinstall was definitely faster than download+burn.

              Originally posted by rohcQaH View Post
              By the time you've finished downloading and burning all those discs, half of the packages contained within are outdated.
You are aware you're talking about Debian, right? So unless it takes you more than a couple of years, you'll probably only need to upgrade a few packages post-install.



              • #22
                Originally posted by Wildfire View Post
You are aware you're talking about Debian, right? So unless it takes you more than a couple of years, you'll probably only need to upgrade a few packages post-install.
                Unless you use unstable/experimental, or compile from source on top of Debian

                Also, he might have 56k, leave him alone



                • #23
                  Originally posted by allquixotic View Post
                  Only if the decompression algorithm isn't using all your CPUs. From what I know of LZMA, it's technically capable of being parallelized, but it's not embarrassingly parallel, so you can't just scale it up effortlessly.
I've monitored CPU usage during decompression, and even with an SSD it hardly uses more than one core. Actually, even real-world compression isn't as parallelized as the synthetic benchmarks make it look. I/O seems to be much more important; that's why I loved this idea:

                  Originally posted by allquixotic View Post
                  Download each package to a buffer in RAM. As they are downloading, directly wire up Xz (which is a streaming decompressor) to the buffer, so that you are decompressing in parallel with the download. (...) As it's decoding, it's writing its results to disk. So at the same time as you're pulling the data off of the network, you could even set it up so that you mmap the network buffer itself to make it a zero copy architecture, so that the data travels: NIC -> buffer in RAM -> XZ reads data from buffer (zero copy) -> XZ writes data to disk.

                  Then, once Xz writes the decoded data to disk, you just "mv" (re-link) the files to the correct locations.

                  Installation of packages would go from taking many minutes to taking exactly as long as it takes to download the packages. And the effect would be better the slower your internet connection is.
As for the dependencies, it would require that the package contents can be read without the package being completely downloaded and that the package metadata is the first thing in the file. Or something like that. Maybe a kind of uncompressed package that contains an information file and the actual compressed package contents. Does this exist? I mean, in the open-source world.

                  Anyway, even decompressing while downloading would probably also provide some gains by itself.



                  • #24
                    Originally posted by devius View Post
As for the dependencies, it would require that the package contents can be read without the package being completely downloaded and that the package metadata is the first thing in the file. Or something like that. Maybe a kind of uncompressed package that contains an information file and the actual compressed package contents. Does this exist? I mean, in the open-source world.

                    Anyway, even decompressing while downloading would probably also provide some gains by itself.
                    The metadata (including dependency info) is already available in a separate file for all modern distros. For Fedora it's usually in a file called something like primary.xml.gz (or xz). The file is several megabytes worth of dependency info. If you didn't have this up-front, then figuring out how to "yum install" some package would involve downloading EVERY package in the repository to figure out the dependency graph.

To put it simply, you are underestimating the information we already have available. You can assume that cheaply/efficiently obtaining the dependency graph for the desired packages is an easy and standard feature for Linux package managers.
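
On the Debian/apt side the same information lives in the Packages indices that apt has already downloaded, so the graph can be walked without fetching a single .deb. For example (the package name is arbitrary):
Code:
    # direct dependencies, read from the local package index
    apt-cache depends gimp

    # a full ordered install plan, computed without downloading or installing anything
    apt-get --simulate install gimp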



                    • #25
                      Originally posted by allquixotic View Post
                      BTW, the compression / decompression memory for LZMA/LZMA2 (of which Xz is just one implementation) is asymmetrical. I... about 64 MB total (and that memory will get allocated, then freed, then re-allocated again as each dpkg process unpacks a different archive file).
I'm aware of all of that. 64 MB to decompress one archive is _way_ too much.

As for dpkg not supporting parallel installation, I haven't used Debian in about five years. I'd have thought they would have improved in that time, as even back then it was rather annoying to be unable to install something else in another terminal while one apt-get was doing its thing.

                      (yes, that problem could be solved simply by queuing. But true parallel install should be possible.)
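
You can fake the queuing part today by serializing your own invocations on a lock file, so the second command waits instead of tripping over dpkg's lock; a minimal sketch (the lock path and package name are arbitrary):
Code:
    # each invocation waits for the lock, so installs run one after another
    sudo flock /var/lock/apt-queue.lock apt-get install --yes some-package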



                      • #26
                        dear allquixotic;

                        How To Optimize Apt Archives
                        Code:
                            sudo chmod u+x /etc/rc.local
                        Add following line into /etc/fstab
                        Code:
                            tmpfs /var/cache/apt/archives/ tmpfs defaults,noatime 0 0
                        Add following line into /etc/rc.local
                        Code:
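    # tmpfs mounts start out empty at every boot, and apt needs the partial/ subdirectory to exist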
                            mkdir /var/cache/apt/archives/partial
All deb packages will be downloaded into the tmpfs, and keep in mind not to do too many upgrades at one time, to save RAM; usually 512 MB should be enough for most installs and upgrades.

Close enough, right? Credit for the above commands goes here
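
If anyone tries this, it's worth checking after a reboot that the mount actually took effect and that the partial directory came back; something like:
Code:
    # confirm the archives directory is RAM-backed and partial/ exists
    df -h /var/cache/apt/archives
    ls -ld /var/cache/apt/archives/partial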



                        • #27
                          Software on a stick.

I like the idea of having one big image to put on a USB stick. Could this be an ISO bigger than the 4 GB limit? Is there a way to make an 80 GB ISO image to put on a USB 3.0 stick? Tons of top-quality free software, all on one stick.



                          • #28
Ridiculous

Can't I just use 1 CD?



                            • #29
A Debian package is an ar archive (the same format as static libraries) containing two compressed tarballs: the first holds the metadata, the second the package content (see the example below). I think they want to move from tar.gz to tar.xz for these two tarballs.
                              Debian policy requires that anything that can be compressed must be. That includes manpages, fonts, and much of the documentation. Besides that, you have shell scripts and stripped binaries (plus debug symbols for some). There isn't much that's highly compressible.
It's quite possible to install from 1 CD; however, the "Debian operating system" includes every package in main. Hence a 73-CD media set; almost no one wants all of it, but it is available (for those who want to set up a workstation offline, or such).
                              Last time I tried installing everything (press + over uninstalled in aptitude), there were ~400 conflicts, it would take ~90 GB, and it took nearly 2 minutes to resolve the order. The archives are a lot larger now, so it might be near 200 GB.
                              The install media they're talking about is for the next Debian stable; that means you might have a DVD or two worth of updates by the next release. And yes, they do offer a disk containing all updates.
                              A minimal install of Debian is around 300 MB. It will run on i486, with minimum RAM in the 32-64 MB range.
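
You can see that layout for yourself on any cached package; for example (the file name is just an example):
Code:
    # list the ar members: debian-binary, control.tar.gz, data.tar.xz (or .gz)
    ar t /var/cache/apt/archives/gimp_2.8.0-2_amd64.deb

    # or let dpkg-deb show the metadata and the content listing
    dpkg-deb --info     /var/cache/apt/archives/gimp_2.8.0-2_amd64.deb
    dpkg-deb --contents /var/cache/apt/archives/gimp_2.8.0-2_amd64.deb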

Does anyone know of a distro that does real package management and allows parallel operations?
Systems that don't handle dependencies are irrelevant; I mean something where you can't get a race condition by, say, starting a GNOME install and then (while that's in progress) uninstalling GTK.



                              • #30
                                Originally posted by linuxease.com View Post
                                I like the idea of having one big image to put on a usb stick. Could this be an iso bigger than the 4gig limit - is there a way to make an 80gig iso image to put on a usb 3.0 stick? All top quality free software tons on one stick.
If you put it on USB, why would you use an ISO image at all?

