Multi-Core Scaling In A KVM Virtualized Environment


  • Multi-Core Scaling In A KVM Virtualized Environment

    Phoronix: Multi-Core Scaling In A KVM Virtualized Environment

    Phoronix: Earlier this week we published benchmarks comparing Oracle VM VirtualBox to Linux KVM and to the host system's performance. Some of the feedback said it was a bad idea to give the hardware-virtualized guest all twelve CPU threads of the Intel Core i7 "Gulftown", since virtualization technologies deal poorly with many virtual CPUs. But is this really the case? Unable to find any concrete benchmarks in that area, we carried out another set of tests to see how well the Linux Kernel-based Virtual Machine scales against the host as the number of CPU cores available to each is increased. The results are both good and bad for Linux virtualization.

    http://www.phoronix.com/vr.php?view=15554

  • #2
    Looking at the graph, it looks like more than 4 cores causes a bottleneck. I wonder where it's located. Is KVM memory-restricted, cache-restricted, or even bus-restricted? Knowing this would surely help developers improve their code. Could you rerun the tests using slower/faster settings for RAM, cache and bus?



    • #3
      Michael, quick tip: you don't need to go into the BIOS to enable/disable CPUs, both real and HT; you can do it all with /sys.

      For example, on a dual-quadcore Xeon machine (with HT) I get:
      Code:
      root:/sys/devices/system/cpu# cat cpu0/topology/core_siblings_list 
      0,2,4,6,8,10,12,14
      This means that these 8 CPUs form the first package (the odd-numbered CPUs are the other).

      So, we have 8 CPUs per package. To find out which ones are paired in the same physical core by HT:
      Code:
      root:/sys/devices/system/cpu# cat cpu0/topology/thread_siblings_list    
      0,8
      This means that (0,8) are an HT pair. By going through all the others I built a picture to help me when I want to test specific configurations (specific to this machine, but an example):
      Code:
      Package #0              Package #1
      ╔══════════════╗        ╔══════════════╗
      ║┌─────┐┌─────┐║        ║┌─────┐┌─────┐║
      ║│  0  ││  2  │║        ║│  1  ││  3  │║
      ║│  8  ││ 10  │║        ║│  9  ││ 11  │║
      ║└─────┘└─────┘║        ║└─────┘└─────┘║
      ║┌─────┐┌─────┐║        ║┌─────┐┌─────┐║
      ║│  4  ││  6  │║        ║│  5  ││  7  │║
      ║│ 12  ││ 14  │║        ║│ 13  ││ 15  │║
      ║└─────┘└─────┘║        ║└─────┘└─────┘║
      ╚══════════════╝        ╚══════════════╝

      So now it's just a matter of doing echo 0 or 1 > /sys/devices/system/cpu/cpuN/online to disable/enable the cores.

      I have a quick hackish script to do this second part, here:
      Code:
      #!/usr/bin/env python3
      # -*- coding: utf-8 -*-
      
      ## cpucontrol.py version 0.12
      ##      Quick hack to enable/disable cores on a Linux machine
      ##      by Ivo Anjo <knuckles@gmail.com>
      
      import sys
      
      prefix = '/sys/devices/system/cpu/'
      
      def get_max_cpu_id():
          # "present" looks like "0-15"; take the upper bound
          with open(prefix + "present") as f:
              return int(f.read().split("-")[1])
      
      def change_cpu_state(cpu, state):
          # Writing 0/1 to cpuN/online takes the core off/online (needs root)
          with open(prefix + "cpu" + str(cpu) + "/online", "w") as f:
              f.write(str(state))
      
      if len(sys.argv) != 2:
          print("syntax: " + sys.argv[0] + " comma-separated list of cpus to put online")
          print("\t\texample: 1,2,3")
          print("\t\texample: all")
          print("\t\tnote: cpu0 cannot be disabled")
          sys.exit(1)
      
      # disable all cpus except cpu0
      for i in range(1, get_max_cpu_id() + 1):
          change_cpu_state(i, 0)
      
      if sys.argv[1].strip().lower() == 'all':
          # enable all
          for i in range(1, get_max_cpu_id() + 1):
              change_cpu_state(i, 1)
      else:
          # enable only the ones requested
          for i in sys.argv[1].split(','):
              if int(i) != 0:
                  change_cpu_state(int(i), 1)
      
      with open(prefix + "online") as f:
          print("Online:", f.read(), end="")
      with open(prefix + "present") as f:
          print("Possible:", f.read(), end="")

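      Building the package/HT-pair picture above programmatically can be sketched like this (a hypothetical helper, not part of the script above; it only relies on the standard Linux sysfs topology files already shown):
      ```python
      from pathlib import Path

      SYS = Path("/sys/devices/system/cpu")

      def expand(s):
          """Expand a sysfs CPU list like '0-3,8' into [0, 1, 2, 3, 8]."""
          out = []
          for part in s.strip().split(","):
              if "-" in part:
                  lo, hi = part.split("-")
                  out.extend(range(int(lo), int(hi) + 1))
              else:
                  out.append(int(part))
          return out

      def ht_pairs():
          """Return the thread-sibling groups (HT pairs), one per physical core."""
          pairs = set()
          for cpu in SYS.glob("cpu[0-9]*"):
              f = cpu / "topology" / "thread_siblings_list"
              if f.exists():
                  pairs.add(tuple(expand(f.read_text())))
          return sorted(pairs)
      ```
      On the machine above, ht_pairs() would give you (0, 8), (1, 9) and so on, which saves going through each cpuN by hand.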


      • #4
        I think one of the more interesting conclusions is that hyper-threading and virtualization aren't always a good combination.

        But what I find weirdest is that going from 4 to 6 cores causes slowdowns!

        I would think that KVM has an option to limit the number of cores given to a virtual machine, right? If so, I would be interested to see what happens if you left the host at 6 cores (no hyperthreading) and scaled the number of cores given to the guest from 1-6, then ran the host with 6 cores/12 threads (hyperthreading on) with the guest being scaled/tested with 1-12 cores.

        I do wonder how much the different generations of hyperthreading hurt or help performance: P4 HT, Atom 330 HT, Atom D510 HT, early i7 HT (e.g. 920), and later i7 HT (e.g. the 960 you have, and maybe an 860 as well).
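        For what it's worth, KVM does let you cap the guest's core count. With libvirt it is a one-line setting in the domain XML; the sketch below pins a 6-vCPU guest to host cores 0-5 (the values are hypothetical, just to illustrate the kind of sweep being proposed):
        ```xml
        <!-- libvirt domain fragment: 6 vCPUs, restricted to host CPUs 0-5 -->
        <vcpu placement='static' cpuset='0-5'>6</vcpu>
        ```
        With plain QEMU the equivalent knob is the -smp option, so scaling the guest from 1 to 12 vCPUs while the host stays fixed is straightforward to script.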



        • #5
          Interesting tests. However, in the preponderance of the tests, enabling 6 or 12 cores does result in a performance increase, or at least, the lack of a decrease. TTSIOD, NAS LU.A, and GraphicsMagick are the only tests that clearly show that enabling more than 4 cores in the guest results in a marked performance decrease.

          This may mean that added overhead due to the number of cores is limiting the effectiveness of the core count scalability in the guest, but even a slight performance increase or flatline does not convince me to disable some of my cores or HT in the guest.

          It can, and probably will get better from here, as new hardware support for virtualization is introduced over the years, and as the software hypervisors get smarter. But even today it is apparent to me that running at least KVM in a server workload environment with this many SMP threads is not a problem, unless you constantly run benchmarks and/or GraphicsMagick as your daily workload.



          • #6
            Also noteworthy is the observation that the tests where the guest shows a performance decrease with >4 cores are the same tests that show a logarithmic flattening or even a slight decrease in performance on the host. If the host's cores aren't scaling well with that benchmark, then maybe the fault lies with the benchmark itself, and not with the virtualization platform.



            • #7
              Originally posted by allquixotic View Post
              Interesting tests. However, in the preponderance of the tests, enabling 6 or 12 cores does result in a performance increase, or at least, the lack of a decrease. TTSIOD, NAS LU.A, and GraphicsMagick are the only tests that clearly show that enabling more than 4 cores in the guest results in a marked performance decrease.

              This may mean that added overhead due to the number of cores is limiting the effectiveness of the core count scalability in the guest, but even a slight performance increase or flatline does not convince me to disable some of my cores or HT in the guest.

              It can, and probably will get better from here, as new hardware support for virtualization is introduced over the years, and as the software hypervisors get smarter. But even today it is apparent to me that running at least KVM in a server workload environment with this many SMP threads is not a problem, unless you constantly run benchmarks and/or GraphicsMagick as your daily workload.
              Those performance decreases are synthetic, as you pointed out. However, as I said previously, they result from a bottleneck somewhere, and some real-world usage could be impacted by it. Having the ability to benchmark it and pin down where the bottleneck lies could greatly help the developers. Since Phoronix has the infrastructure, I think it would be really nice for them to do so. At the same time, it could show how to optimize your server (i.e., does faster RAM result in better virtualization performance?). It would be a win-win situation. Anyhow, they are free to do what they want; I'm only suggesting.
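              The kind of probe being suggested can be sketched in a few lines: run the same fixed CPU-bound batch with an increasing number of worker processes and watch where the wall-time improvement flattens. This is a minimal illustration (names and parameters are made up for the example), not the benchmarks Phoronix runs:
              ```python
              import time
              from multiprocessing import Pool

              def burn(n):
                  # CPU-bound busy work: sum of squares up to n
                  return sum(i * i for i in range(n))

              def scaling_probe(work=200_000, tasks=8, max_workers=4):
                  """Time the same batch of tasks with 1..max_workers processes."""
                  timings = {}
                  for w in range(1, max_workers + 1):
                      start = time.perf_counter()
                      with Pool(w) as pool:
                          pool.map(burn, [work] * tasks)
                      timings[w] = time.perf_counter() - start
                  return timings

              if __name__ == "__main__":
                  for workers, secs in scaling_probe().items():
                      print(f"{workers} worker(s): {secs:.3f}s")
              ```
              Run once on the host and once inside the guest: if the guest's curve flattens earlier than the host's, the gap points at virtualization overhead rather than the workload itself.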



              • #8
                Originally posted by werfu View Post
                Those performance decreases are synthetic, as you pointed out. However, as I said previously, they result from a bottleneck somewhere, and some real-world usage could be impacted by it. Having the ability to benchmark it and pin down where the bottleneck lies could greatly help the developers. Since Phoronix has the infrastructure, I think it would be really nice for them to do so. At the same time, it could show how to optimize your server (i.e., does faster RAM result in better virtualization performance?). It would be a win-win situation. Anyhow, they are free to do what they want; I'm only suggesting.
                As we usually do in these cases. If people have constructive ways of quantifying the issues, we are open to it. Realistically, in the case of the virtualization projects, the companies behind them most likely have more hardware than is necessary.

                I liked the results; the historic claim that virtualization doesn't work with multiple CPUs has now been reduced to "for some workloads" it collapses at some point. There are probably half a dozen more targeted tests worth doing (with/without HT, per package with/without HT, etc.).



                • #9
                  Hi,

                  I'm running KVM guests on 4x 12-core Opterons at work (real cores, not hyperthreading), so I can run tests for you if you want!



                  • #10
                    Hmmm... According to tests I did some time ago, Apache scaled very well inside KVM on an HP DL380 (2 CPUs, 4 cores, HT enabled):

                    http://www.tauceti.net/kvm-benchmark.../composite.xml

                    Here are my complete benchmarks (text is in German, but just click the links for the other benchmarks):

                    http://www.tauceti.net/roller/cetixx...ozone_graphics

