Thread: Multi-Core Scaling In A KVM Virtualized Environment

  1. #1
    Join Date
    Jan 2007
    Posts
    15,652

    Multi-Core Scaling In A KVM Virtualized Environment

    Phoronix: Multi-Core Scaling In A KVM Virtualized Environment

    Earlier this week we published benchmarks comparing Oracle VM VirtualBox to Linux KVM and to the performance of the Linux host system. Some readers said it was a bad idea to make all twelve CPU threads of the Intel Core i7 "Gulftown" available to the hardware-virtualized guest, since virtualization technologies deal poorly with multiple virtual CPUs. But is this really the case? Unable to find any concrete benchmarks in that area, we carried out another set of tests to see how well the Linux Kernel-based Virtual Machine scales against the host as the number of CPU cores available to each is increased. The results are both good and bad for Linux virtualization.

    http://www.phoronix.com/vr.php?view=15554

  2. #2
    Join Date
    Sep 2010
    Posts
    56

    Looking at the graphs, it looks like going beyond 4 cores causes a bottleneck. I wonder where it is located. Is KVM memory-restricted, cache-restricted, or even bus-restricted? Knowing that would surely help developers improve their code. Could you run the tests again using slower/faster settings for RAM, cache and bus?

  3. #3
    Join Date
    Jun 2006
    Location
    Portugal
    Posts
    543

    Michael, quick tip: you don't need to go into the BIOS to enable/disable CPUs, whether real cores or HT threads; you can do it all with /sys.

    For example, on a dual-quadcore Xeon machine (with HT) I get:
    Code:
    root:/sys/devices/system/cpu# cat cpu0/topology/core_siblings_list 
    0,2,4,6,8,10,12,14
    This means that these 8 logical CPUs are in the first package (the odd-numbered ones are in the other).

    So, we have 8 logical CPUs per package. To find out which ones are paired in the same physical core by HT:
    Code:
    root:/sys/devices/system/cpu# cat cpu0/topology/thread_siblings_list    
    0,8
    This means that 0 and 8 are an HT pair. By going through all the others, I built a picture to help me when I want to test specific configurations (it is specific to this machine, but serves as an example):
    Code:
    Package #0              Package #1
    ╔══════════════╗        ╔══════════════╗
    ║┌─────┐┌─────┐║        ║┌─────┐┌─────┐║
    ║│  0  ││  2  │║        ║│  1  ││  3  │║
    ║│  8  ││ 10  │║        ║│  9  ││ 11  │║
    ║└─────┘└─────┘║        ║└─────┘└─────┘║
    ║┌─────┐┌─────┐║        ║┌─────┐┌─────┐║
    ║│  4  ││  6  │║        ║│  5  ││  7  │║
    ║│ 12  ││ 14  │║        ║│ 13  ││ 15  │║
    ║└─────┘└─────┘║        ║└─────┘└─────┘║
    ╚══════════════╝        ╚══════════════╝
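
A minimal sketch of how a picture like this could be built automatically instead of by hand, assuming the two sysfs files shown above are present for each CPU. The helper names (`parse_cpulist`, `group_topology`, `read_siblings`) are made up for this illustration and are not part of any existing tool.

```python
from collections import defaultdict
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")

def parse_cpulist(text):
    """Expand a kernel CPU list such as "0,8" or "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def group_topology(siblings):
    """Group CPUs by package and by physical core.

    siblings maps a CPU id to a pair of strings:
    (contents of core_siblings_list, contents of thread_siblings_list).
    Returns a list of packages; each package is a sorted list of cores;
    each core is a tuple of the CPU ids that share it.
    """
    packages = defaultdict(set)
    for core_text, thread_text in siblings.values():
        package = tuple(parse_cpulist(core_text))
        core = tuple(parse_cpulist(thread_text))
        packages[package].add(core)
    return [sorted(cores) for _, cores in sorted(packages.items())]

def read_siblings(root=CPU_ROOT):
    """Read both sibling lists for every CPU that exposes a topology directory."""
    result = {}
    for topo in root.glob("cpu[0-9]*/topology"):
        cpu = int(topo.parent.name[3:])  # "cpu12" -> 12
        result[cpu] = ((topo / "core_siblings_list").read_text(),
                       (topo / "thread_siblings_list").read_text())
    return result
```

Calling `group_topology(read_siblings())` on the machine above should give two packages of four cores each, with the HT pairs grouped together. CPUs that are currently offline may not show up, since they may not expose a topology directory.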

    So now it's just a matter of doing echo 0 or 1 > /sys/devices/system/cpu/cpuN/online to enable/disable the cores.

    I have a quick hackish script to do this second part, here:
    Code:
    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-

    ## cpucontrol.py version 0.12
    ##      Quick hack to enable/disable cores on a linux machine
    ##      by Ivo Anjo <knuckles@gmail.com>

    import sys

    prefix = '/sys/devices/system/cpu/'

    def read_first_line(name):
        # first line of a file under prefix, e.g. "0-15"
        with open(prefix + name) as f:
            return f.readline().strip()

    def get_max_cpu_id():
        # "present" looks like "0-15", or just "0" on a single-cpu machine
        return int(read_first_line("present").split("-")[-1])

    def change_cpu_state(cpu, state):
        # needs root; state is 0 (offline) or 1 (online)
        with open(prefix + "cpu" + str(cpu) + "/online", "w") as f:
            f.write(str(state))

    if len(sys.argv) != 2:
        print("syntax: " + sys.argv[0] + " comma-separated list of cpus to put online")
        print("\t\texample: 1,2,3")
        print("\t\texample: all")
        print("\t\tnote: cpu0 cannot be disabled")
        sys.exit(1)

    # disable all cpus (cpu0 is skipped, it stays online)
    for i in range(1, get_max_cpu_id() + 1):
        change_cpu_state(i, 0)

    # hack to support 'all'
    if sys.argv[1].strip().lower() == 'all':
        # enable all
        for i in range(1, get_max_cpu_id() + 1):
            change_cpu_state(i, 1)
    else:
        # enable the ones requested
        for i in sys.argv[1].split(','):
            if int(i) != 0:
                change_cpu_state(int(i), 1)

    print("Online:", read_first_line("online"))
    print("Present:", read_first_line("present"))
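
As a small companion to the script, here is a sketch of how the argument for a "hyper-threading off" run could be derived from the HT pairs, by keeping only the lowest-numbered thread of each physical core. The function names are hypothetical, and the pairs are the ones from the example machine in this post.

```python
def one_thread_per_core(cores):
    """Pick the lowest CPU id from each physical core.

    cores is an iterable of CPU-id groups, one group per core,
    e.g. [(0, 8), (2, 10)]. Returns the chosen ids, sorted.
    """
    return sorted({min(core) for core in cores})

def as_cpucontrol_arg(cpus):
    """Format CPU ids as the comma-separated list the script expects."""
    return ",".join(str(c) for c in cpus)

# HT pairs of the example dual quad-core Xeon described above
example_cores = [(0, 8), (2, 10), (4, 12), (6, 14),
                 (1, 9), (3, 11), (5, 13), (7, 15)]

print(as_cpucontrol_arg(one_thread_per_core(example_cores)))  # prints 0,1,2,3,4,5,6,7
```

Passing that string to the script would leave one thread online per core; the 0 in the list is ignored by the script, since cpu0 is never taken offline.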

  4. #4
    Join Date
    Mar 2010
    Posts
    30

    I think one of the more interesting conclusions is that combining hyper-threading and virtualisation isn't always a good thing.

    But what I find weirdest is that going from 4 to 6 cores causes slowdowns!

    I would think that KVM has the option to limit the number of cores given to a virtual machine, right? If so, I would be interested to see what happens if you leave the host at 6 cores (no hyperthreading) and scale the number of cores given to the guest from 1 to 6, and then run the host with 6 cores/12 threads (hyperthreading on) with the guest scaled and tested from 1 to 12 cores.

    I do wonder how much the different generations of hyperthreading hurt or help performance, P4 HT, Atom 330 HT, Atom D510 HT, Early i7 HT (eg 920), and later i7 HT (eg the 960 you have and maybe a 860 as well).

  5. #5
    Join Date
    Sep 2008
    Posts
    989

    Interesting tests. However, in the preponderance of the tests, enabling 6 or 12 cores does result in a performance increase, or at least, the lack of a decrease. TTSIOD, NAS LU.A, and GraphicsMagick are the only tests that clearly show that enabling more than 4 cores in the guest results in a marked performance decrease.

    This may mean that added overhead due to the number of cores is limiting the effectiveness of the core count scalability in the guest, but even a slight performance increase or flatline does not convince me to disable some of my cores or HT in the guest.

    It can, and probably will get better from here, as new hardware support for virtualization is introduced over the years, and as the software hypervisors get smarter. But even today it is apparent to me that running at least KVM in a server workload environment with this many SMP threads is not a problem, unless you constantly run benchmarks and/or GraphicsMagick as your daily workload.

  6. #6
    Join Date
    Sep 2008
    Posts
    989

    Also noteworthy is the observation that the tests where the guest shows a performance decrease with >4 cores are the same tests that show a logarithmic flattening or even a slight decrease in performance on the host. If the host's cores aren't scaling well with that benchmark, then maybe the fault lies with the benchmark itself, and not with the virtualization platform.
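
One rough way to check this from raw results is to compute speedup and parallel efficiency for host and guest separately, plus the guest/host ratio at each core count. The sketch below assumes each result is a run time in seconds (lower is better); the function names are made up, and the numbers are purely illustrative, not taken from the article.

```python
def speedup(times):
    """times maps core count -> run time in seconds.
    Returns core count -> speedup relative to the smallest core count measured."""
    base = times[min(times)]
    return {n: base / t for n, t in sorted(times.items())}

def efficiency(times):
    """Speedup divided by the relative increase in core count (1.0 = perfect scaling)."""
    base_n = min(times)
    return {n: s * base_n / n for n, s in speedup(times).items()}

def guest_overhead(host_times, guest_times):
    """Guest time divided by host time at each shared core count (1.0 = no overhead)."""
    return {n: guest_times[n] / host_times[n]
            for n in sorted(host_times) if n in guest_times}

# Hypothetical numbers, for illustration only
host = {1: 100.0, 2: 50.0, 4: 25.0}
guest = {1: 100.0, 2: 60.0, 4: 50.0}

print("host efficiency: ", efficiency(host))
print("guest efficiency:", efficiency(guest))
print("guest overhead:  ", guest_overhead(host, guest))
```

If the host's efficiency also drops at the same core counts, the benchmark itself is the more likely suspect; if only the guest's drops, the overhead ratio shows where virtualization starts to hurt.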

  7. #7
    Join Date
    Sep 2010
    Posts
    56

    Originally posted by allquixotic:
    Interesting tests. However, in the preponderance of the tests, enabling 6 or 12 cores does result in a performance increase, or at least, the lack of a decrease. TTSIOD, NAS LU.A, and GraphicsMagick are the only tests that clearly show that enabling more than 4 cores in the guest results in a marked performance decrease.

    This may mean that added overhead due to the number of cores is limiting the effectiveness of the core count scalability in the guest, but even a slight performance increase or flatline does not convince me to disable some of my cores or HT in the guest.

    It can, and probably will get better from here, as new hardware support for virtualization is introduced over the years, and as the software hypervisors get smarter. But even today it is apparent to me that running at least KVM in a server workload environment with this many SMP threads is not a problem, unless you constantly run benchmarks and/or GraphicsMagick as your daily workload.
    Those performance decreases are synthetic, as you pointed out. However, as I said previously, they result from a bottleneck somewhere, and some real-world usage could be impacted by it. Being able to benchmark it and work out where the bottleneck lies could greatly help the developers. As Phoronix has the infrastructure, I think it would be really nice for them to do so. At the same time, it could show how to optimize your server (e.g. does faster RAM result in better virtualization performance?). It would be a win-win situation. Anyhow, they are free to do what they want; I'm only suggesting.

  8. #8
    Join Date
    Jun 2006
    Posts
    311

    Originally posted by werfu:
    Those performance decreases are synthetic, as you pointed out. However, as I said previously, they result from a bottleneck somewhere, and some real-world usage could be impacted by it. Being able to benchmark it and work out where the bottleneck lies could greatly help the developers. As Phoronix has the infrastructure, I think it would be really nice for them to do so. At the same time, it could show how to optimize your server (e.g. does faster RAM result in better virtualization performance?). It would be a win-win situation. Anyhow, they are free to do what they want; I'm only suggesting.
    As usual in these cases, if people have constructive ways of quantifying the issues, we are open to them. Realistically, in the case of the virtualization projects, the companies behind them most likely have more hardware than is necessary.

    I liked the results: the historic statement that virtualization doesn't work with multiple CPUs has now been reduced to "for some workloads it collapses at some point". There are probably half a dozen more targeted tests that could be run (with/without HT, per package with/without HT, etc.).

  9. #9
    Join Date
    Apr 2009
    Posts
    36

    Hi,

    I'm running 4x12-core Opteron KVM guests at work (true cores, not hyperthreading), so I can run tests for you if you want!

  10. #10
    Join Date
    Dec 2010
    Posts
    2

    Hmmm... According to my tests some time ago, Apache scaled very well inside a KVM guest on an HP DL 380 (2 CPUs, 4 cores, HT enabled):

    http://www.tauceti.net/kvm-benchmark.../composite.xml

    Here are my complete benchmarks (text is in German, but just click the links for the other benchmarks):

    http://www.tauceti.net/roller/cetixx...ozone_graphics
