Announcement

**BlackStar** · 28 March 2012, 09:04 AM

Originally posted by Ansla

If anybody is interested in the sources in order to replicate the results (maybe get clang working with c++0x?), I will post them in another comment; this one is getting too big already.

Please do. A pastebin link would be perfect.

**Ansla** · 28 March 2012, 10:15 AM

Originally posted by ciplogic

Java slow?

I wrote similar code in Java and C++ (I will do C# if needed).

As you noticed, the Java code solves the same computation, using an Integer value (which boxes the primitive type int), in 114 ms, against 20685 ms for VS 2010 Release-mode C++. Changing the line:

Well, this is my code that generated those results:

smart_pointers.cpp

Code:

#include <vector>
#include <memory>
#include <boost/smart_ptr.hpp>

#ifndef POINTER
#define POINTER Item*
//#define POINTER std::shared_ptr<Item>
//#define POINTER boost::shared_ptr<Item>
#endif

struct Item {
        int x;
        int y;
};

std::vector<POINTER> items;

__attribute__ ((noinline)) void ProcessItem(POINTER item) {}

int main() {
        for(int i = 0; i < 10; i++) {
                POINTER x(new Item);
                items.push_back(x);
        }
        for(int i = 0; i < 100000000; i++)
                for(auto it = items.begin(); it != items.end(); ++it)
                        ProcessItem(*it);
        return 0;
}

smart_pointers.cs

Code:

using System.Collections.Generic;

class Item {
        int x;
        int y;
};

class Dummy {
        static List<Item> items = new List<Item>();

        static void ProcessItem(Item item) {
        }

        static int Main(string[] args) {
                for(int i = 0; i < 10; i++) {
                        items.Add(new Item());
                }
                for(int i = 0; i < 100000000; i++)
                        foreach(Item item in items)
                                ProcessItem(item);
                return 0;
        }
}

smart_pointers.java

Code:

import java.util.*;

class Item {
        int x;
        int y;
};

class Dummy {
        static Vector<Item> items = new Vector<Item>();

        static void ProcessItem(Item item) {
        }

        public static void main(String[] args) {
                for(int i = 0; i < 10; i++) {
                        items.add(new Item());
                }
                for(int i = 0; i < 100000000; i++)
                        for(Item item : items)
                                ProcessItem(item);
        }
}

Makefile

Code:

CXXFLAGS=-std=c++0x -O2
#clang chokes on STL headers with c++0x
CLANG_EXTRA_FLAGS=-std=c++98
LDFLAGS=-lboost_system

all: execute

gccregular: smart_pointers.cpp
        g++ -o $@ $(CXXFLAGS) $(LDFLAGS) $^

gccsmart: smart_pointers.cpp
        g++ -o $@ -DPOINTER=std::shared_ptr\<Item\> $(CXXFLAGS) $(LDFLAGS) $^

gccboost: smart_pointers.cpp
        g++ -o $@ -DPOINTER=boost::shared_ptr\<Item\> $(CXXFLAGS) $(LDFLAGS) $^

clangregular: smart_pointers.cpp
        clang++ -o $@ $(CXXFLAGS) $(CLANG_EXTRA_FLAGS) $(LDFLAGS) $^

clangsmart: smart_pointers.cpp
        clang++ -o $@ -DPOINTER=std::shared_ptr\<Item\> $(CXXFLAGS) $(CLANG_EXTRA_FLAGS) $(LDFLAGS) $^

clangboost: smart_pointers.cpp
        clang++ -o $@ -DPOINTER=boost::shared_ptr\<Item\> $(CXXFLAGS) $(CLANG_EXTRA_FLAGS) $(LDFLAGS) $^

smart_pointers.exe: smart_pointers.cs
        mcs $^

Dummy.class: smart_pointers.java
        javac $^

gcj: smart_pointers.java
        gcj -o $@ --main=Dummy $^

execute: gccregular gccsmart gccboost clangregular clangboost smart_pointers.exe Dummy.class gcj
        echo -e \\ngcc with regular pointers
        time ./gccregular
        echo -e \\ngcc with boehm-gc
        LD_PRELOAD=/usr/lib/libgc.so time ./gccregular
        echo -e \\ngcc with native smart pointers
        time ./gccsmart
        echo -e \\ngcc with boost smart pointers
        time ./gccboost
        echo -e \\nclang with regular pointers
        time ./clangregular
        echo -e \\nclang with boehm-gc
        LD_PRELOAD=/usr/lib/libgc.so time ./clangregular
        #echo -e \\nclang with native smart pointers
        #time ./clangsmart
        echo -e \\nclang with boost smart pointers
        time ./clangboost
        echo -e \\nMono:
        time mono smart_pointers.exe
        echo -e \\nJava:
        time java Dummy
        echo -e \\nGcj:
        time ./gcj

clean:
        rm -f gccregular gccsmart gccboost clangregular clangsmart clangboost gcj *.exe *.class

The main difference I see is that you got rid of the vector in the test you performed, so I suppose it's iterating over the vector that kills Java performance in my case.

**ciplogic** · 28 March 2012, 10:40 AM

Originally posted by Ansla

smart_pointers.java

Code:

import java.util.*;

class Item {
        int x;
        int y;
};

class Dummy {
        static Vector<Item> items = new Vector<Item>();

        static void ProcessItem(Item item) {
        }

        public static void main(String[] args) {
                for(int i = 0; i < 10; i++) {
                        items.add(new Item());
                }
                for(int i = 0; i < 100000000; i++)
                        for(Item item : items)
                                ProcessItem(item);
        }
}

The main difference I see is that you got rid of the vector in the test you performed, so I suppose it's iterating over the vector that kills Java performance in my case.

Yes, I think the same. Foreach loops in Java and C# obtain an iterator. Java also performs type checks (because generics are implemented with type erasure), so at every step it checks whether the Object stored in the items collection really is an Item. It is as if you made an array of void* in C++, did a checked cast (in the spirit of dynamic_cast) from void* to Item* at every loop step, and then passed that pointer to ProcessItem. The cast may be the most expensive operation in your loop, not the work the loop itself does. C# generics do not lose type information, so C# does not need to check the iterator's result at every step; this check is not part of C# generics or C++ templates.
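To make the analogy concrete, here is a minimal C++ sketch (an illustration of the idea, not what the JVM literally does): elements are stored through a common base class, playing the role of Java's Object after type erasure, and each one is cast back to Item with a checked dynamic_cast before being passed on. The names Object, CountItems and the processed counter are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Plays the role of java.lang.Object: the only type the "erased" container knows.
struct Object {
    virtual ~Object() = default;
};

struct Item : Object {
    int x = 0;
    int y = 0;
};

// Hypothetical stand-in for ProcessItem: counts how many real Items it received.
static int processed = 0;
void ProcessItem(Item* item) {
    if (item != nullptr) ++processed;
}

// Iterates the erased container, doing a runtime-checked cast at every step,
// similar in spirit to the check a Java foreach over a generic collection needs.
int CountItems(const std::vector<std::unique_ptr<Object>>& items) {
    processed = 0;
    for (const auto& obj : items) {
        Item* item = dynamic_cast<Item*>(obj.get());  // runtime type check
        ProcessItem(item);  // receives nullptr if the element is not an Item
    }
    return processed;
}
```

The point of the sketch is only that the cast sits inside the hot loop; with a plain Item* array the check disappears because the static type is already known.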

I changed the program slightly to iterate over an array of Item, as follows:

Code:

package run;
(...)
        static Item[] items = new Item[10];
(...)
                for(int i = 0; i < 10; i++) {
                        items[i] = new Item();
                }
(...)

And the output is:

Code:

Time: 727 ms

On my machine the original time was:

Code:

Time: 55430 ms

Also, if you play around, whenever you get 0 ms it means you hit a compiler optimization (on either the Java or the C++ side). In my original code, if you removed the +2 (in the Process method), the Java optimizer would optimize the call away and you would get times of around 4 to 10 ms, but I wanted to be as objective as possible.
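On the C++ side, one common way to stop the optimizer from simply discarding a benchmark's work is to make the loop compute a result that is then written to a volatile object. A minimal sketch, assuming a hypothetical Work function and g_sink variable; note that volatile only forces the final store, so a compiler may still compute the same value in a cleverer way.

```cpp
#include <cassert>

struct Item {
    int x;
    int y;
};

// Storing to a volatile object is an observable side effect, so the compiler
// must still produce the loop's result instead of dropping the loop entirely.
volatile long long g_sink = 0;

// Hypothetical benchmark body: real work (x + y + 2) for every item, every iteration.
long long Work(const Item* items, int count, int iterations) {
    long long total = 0;
    for (int i = 0; i < iterations; i++)
        for (int j = 0; j < count; j++)
            total += items[j].x + items[j].y + 2;
    g_sink = total;  // publish the result
    return total;
}
```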

**Ansla** · 28 March 2012, 11:03 AM

Originally posted by ciplogic

Yes, I think the same. Foreach loops in Java and C# obtain an iterator. Java also performs type checks (because generics are implemented with type erasure), so at every step it checks whether the Object stored in the items collection really is an Item. It is as if you made an array of void* in C++, did a checked cast (in the spirit of dynamic_cast) from void* to Item* at every loop step, and then passed that pointer to ProcessItem. The cast may be the most expensive operation in your loop, not the work the loop itself does.

I changed the program slightly to iterate over an array of Item, as follows:

Code:

package run;
(...)
        static Item[] items = new Item[10];
(...)
                for(int i = 0; i < 10; i++) {
                        items[i] = new Item();
                }
(...)

And the output is:

Code:

Time: 727 ms

I replaced Vector with ArrayList and it reduced the time to 3.17 seconds for IcedTea and 37.86 seconds for gcj. So it's just Vector in Java that's slow as hell.

Originally posted by ciplogic

Also, if you play around, whenever you get 0 ms it means you hit a compiler optimization (on either the Java or the C++ side). In my original code, if you removed the +2 (in the Process method), the Java optimizer would optimize the call away and you would get times of around 4 to 10 ms, but I wanted to be as objective as possible.

That's why I used __attribute__ ((noinline)) in the C++ code. Do you know if there is an equivalent for Java/C#? Or are they just too high level to allow such details to be tweaked?
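Within C++ itself, the GCC attribute is not understood by Visual C++ (the compiler behind the VS 2010 timings quoted above), which spells the same hint __declspec(noinline). A common approach is a small macro; the name NOINLINE and the call counter below are hypothetical, and the sketch gives the function a visible side effect because an optimizer may still reason about a call to an empty function.

```cpp
#include <cassert>

// Hypothetical portable wrapper around the compiler-specific noinline hints.
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

struct Item {
    int x;
    int y;
};

static int calls = 0;

// Kept out of line by the hint; the counter makes each call observable.
NOINLINE void ProcessItem(Item* item) {
    if (item != nullptr) ++calls;
}
```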

**ciplogic** · 28 March 2012, 11:12 AM

Originally posted by Ansla

I replaced Vector with ArrayList and it reduced the time to 3.17 seconds for IcedTea and 37.86 seconds for gcj. So it's just Vector in Java that's slow as hell.

That's why I used __attribute__ ((noinline)) in the C++ code. Do you know if there is an equivalent for Java/C#? Or are they just too high level to allow such details to be tweaked?

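Yes: in .NET and Mono you can mark the method with [MethodImpl(MethodImplOptions.NoInlining)]; the related inlining hints in .NET 4.5/the latest Mono are a fairly new development. As for Java, I don't think there is anything like this at the language level.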

Also, Vector is a synchronized (thread-safe) counterpart of std::vector, whereas ArrayList is not synchronized. The link I gave you also states that plain arrays are the fastest construct (there is no cast check there).
Java supports escape analysis, but it may have to be enabled with a flag; when the optimization applies, it can remove the locking on a Vector that never escapes, so the Vector effectively behaves like an ArrayList. To enable it, add the following to Java's arguments: -server -XX:+DoEscapeAnalysis
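For C++ readers, the Vector/ArrayList difference is roughly the difference between a container that takes a lock on every access and a plain std::vector. A minimal sketch, assuming a hypothetical SynchronizedVector wrapper that mimics Vector's per-method locking with std::mutex; it shows the shape of the cost, not how the JVM actually implements its locks.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

struct Item {
    int x;
    int y;
};

// Hypothetical analogue of java.util.Vector: every accessor locks a mutex,
// even when only one thread ever touches the container.
class SynchronizedVector {
public:
    void Add(const Item& item) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(item);
    }
    Item Get(std::size_t i) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_[i];
    }
    std::size_t Size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }
private:
    mutable std::mutex mutex_;
    std::vector<Item> items_;
};

// Pays a lock/unlock on every Size() and Get() call.
int SumSynchronized(const SynchronizedVector& v) {
    int sum = 0;
    for (std::size_t i = 0; i < v.Size(); i++) sum += v.Get(i).x;
    return sum;
}

// Analogue of ArrayList or a plain array: no locking at all.
int SumPlain(const std::vector<Item>& v) {
    int sum = 0;
    for (const Item& item : v) sum += item.x;
    return sum;
}
```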

**ciplogic** · 28 March 2012, 12:24 PM

Making the same change:

Code:

(...)
        static Item[] items = new Item[10];
(...)
                for(int i = 0; i < 10; i++) {
                        items[i] = new Item();
                }
(...)

in C# improves performance by around 2.5x.
If you don't want to change two places and want your program to keep storing a List<Item>, the "hot" loop can be rewritten as follows:

Code:

        var itemArray = items.ToArray();
        for (var i = 0; i < 10000000; i++)
            foreach (var item in itemArray)
                ProcessItem(item);

And you will get: 6411 ms.

If you invert the loops (making the iteration over the array the outer loop):

Code:

        var itemArray = items.ToArray();
        foreach (var item in itemArray)
            for (var i = 0; i < 10000000; i++)
                ProcessItem(item);

And you will get: 3308 ms.

For reference, your original C# code took 14040 ms.
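The two loop orders above make exactly the same number of ProcessItem calls; what changes is how often the iteration over the collection is restarted. A C++ sketch of the same interchange, with a hypothetical global call counter, makes that explicit (it does not reproduce the C# timings).

```cpp
#include <cassert>
#include <vector>

struct Item {
    int x;
    int y;
};

static long long calls = 0;
void ProcessItem(const Item*) { ++calls; }

// Collection in the inner loop: the iteration over items restarts `iterations` times.
void CollectionInner(const std::vector<Item>& items, int iterations) {
    for (int i = 0; i < iterations; i++)
        for (const Item& item : items)
            ProcessItem(&item);
}

// Loops interchanged: the collection is walked once, the counting loop is innermost.
void CollectionOuter(const std::vector<Item>& items, int iterations) {
    for (const Item& item : items)
        for (int i = 0; i < iterations; i++)
            ProcessItem(&item);
}
```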

**Ansla** · 28 March 2012, 01:03 PM

Originally posted by ciplogic

Making the same change:

in C# improves performance by around 2.5x.
If you don't want to change two places and want your program to keep storing a List<Item>, the "hot" loop can be rewritten as follows:

Code:

        var itemArray = items.ToArray();
        for (var i = 0; i < 10000000; i++)
            foreach (var item in itemArray)
                ProcessItem(item);

And you will get: 6411 ms.

If you invert the loops (making the iteration over the array the outer loop):

Code:

        var itemArray = items.ToArray();
        foreach (var item in itemArray)
            for (var i = 0; i < 10000000; i++)
                ProcessItem(item);

And you will get: 3308 ms.

For reference, your original C# code took 14040 ms.

Since iterating over a collection has such a big impact in VM-based languages, the better way to measure strictly the cost of copying a smart pointer is code with no collections involved. I only included a collection in my tests because that was the use case you suggested initially.

Anyway, smart pointers are no different than any other class that a programmer used to a VM language might be tempted to pass by value (or return by value) instead of by const reference when writing C++ code for the first time. The worst case is when that class is a collection; imagine code like this:

Code:

std::vector<Item> GetItems() {
    return m_vector;
}

where sizeof(Item) == 100 and m_vector.size() == 100000. That's copying 10 MB of data around, plus anything else Item's copy constructor might do, on every call to a simple getter that may be called thousands and thousands of times. Now that's overhead!
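The usual fix is to return a const reference, so callers copy only when they really need their own copy. A minimal sketch, assuming a hypothetical Item whose copy constructor counts its calls (so the difference is observable) and a hypothetical Store class holding m_vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical Item of exactly 100 bytes that counts how often it is copied.
struct Item {
    static int copies;
    char payload[100] = {};
    Item() = default;
    Item(const Item& other) {
        ++copies;
        for (int i = 0; i < 100; i++) payload[i] = other.payload[i];
    }
    Item& operator=(const Item&) = default;
};
int Item::copies = 0;

class Store {
public:
    explicit Store(int n) : m_vector(n) {}
    // Copies every element on each call.
    std::vector<Item> GetItemsCopy() const { return m_vector; }
    // No copy: the caller reads the member directly and cannot modify it.
    const std::vector<Item>& GetItems() const { return m_vector; }
private:
    std::vector<Item> m_vector;
};
```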

**ciplogic** · 28 March 2012, 01:15 PM

Originally posted by Ansla

Since iterating over a collection has such a big impact in VM-based languages, the better way to measure strictly the cost of copying a smart pointer is code with no collections involved. I only included a collection in my tests because that was the use case you suggested initially.

Anyway, smart pointers are no different than any other class that a programmer used to a VM language might be tempted to pass by value (or return by value) instead of by const reference when writing C++ code for the first time. The worst case is when that class is a collection; imagine code like this:

Code:

std::vector<Item> GetItems() {
    return m_vector;
}

where sizeof(Item) == 100 and m_vector.size() == 100000. That's copying 10 MB of data around, plus anything else Item's copy constructor might do, on every call to a simple getter that may be called thousands and thousands of times. Now that's overhead!

You understood my answer, and I agree. I mean, when people judge Java/Mono as slow, they don't do it with data. I don't want to prove to you that C# is faster than C++. Also, I had a (somewhat) bad experience with the performance implications of boost's smart pointers, which I think should be considered. References in Java/C# are in fact plain pointers (objects carry an extra header word, a bit like a vtable pointer, so there is memory overhead but no performance overhead), whereas smart pointers have a performance overhead: they "need" to increment/decrement the reference count every time you pass a pointer to an external component, while with a Java or C# object you don't need to. Smart pointers also require extra thought (you need weak references to break circular dependencies).
Also, as you noticed, you made your code inefficient by mistake (I'm not blaming you for it) by using a slow construct of Java (and C#), as many programmers do, and so reached the wrong conclusion. C# can be used to make snappy applications (or at least snappy enough ones), and if an application is not fast enough, in many cases it is not the fault of C# or Java. Sometimes it is, but in most cases, as you wrote, people do not know the performance implications of their constructs.
Being high level, C# brings some features that C++ could simulate too but that are harder to get right natively (like dependency injection, parallel loops, or runtime code generation executed via Reflection; in C++ you would invoke an external compiler, build a DLL and load it with LoadLibrary/dlopen), and C# should get credit for them. People programming only in "older" languages will miss these abstractions. How useful is it to have a JIT? For 95% of applications it should not matter, but for things like compiling shaders with LLVM (as OS X does), it is nice to have.
I think you would use Mono if its performance were in the range of GCC's (when you use a lot of classes/pointers, instead of using shared_ptr and putting a const & everywhere the compiler doesn't complain, which is tedious at the least) and you knew that Mono or .NET was installed on the target machine, wouldn't you?
Note: about my supposedly suggesting that you should use a collection: please quote me, as I don't know where I said that.
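The per-call reference-count traffic mentioned here is easy to observe with std::shared_ptr::use_count(): a by-value parameter is an extra owner for the duration of the call, while a const reference is not. A minimal sketch; the observed variable and the two Process functions are hypothetical names used only to record what the callee sees.

```cpp
#include <cassert>
#include <memory>

struct Item {
    int x;
    int y;
};

static long observed = 0;  // hypothetical: the use_count seen inside the callee

// By value: the copy increments the (thread-safe) reference count on entry,
// and destroying the parameter decrements it again on exit.
void ProcessByValue(std::shared_ptr<Item> item) { observed = item.use_count(); }

// By const reference: no copy, so no reference-count update at all.
void ProcessByRef(const std::shared_ptr<Item>& item) { observed = item.use_count(); }
```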
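The circular-dependency remark in this post refers to a standard shared_ptr pitfall: two objects that own each other through shared_ptr never reach a count of zero, so neither is freed. A minimal sketch with hypothetical Parent/Child types, where the back link is a std::weak_ptr so the cycle does not keep the pair alive.

```cpp
#include <cassert>
#include <memory>

struct Child;

struct Parent {
    std::shared_ptr<Child> child;  // owning link
};

struct Child {
    std::weak_ptr<Parent> parent;  // non-owning back link: breaks the cycle
};

// Builds a parent/child pair and returns only a weak handle to the parent,
// so the caller can check whether it was destroyed when its last owner went away.
std::weak_ptr<Parent> MakePair() {
    auto parent = std::make_shared<Parent>();
    parent->child = std::make_shared<Child>();
    parent->child->parent = parent;
    return parent;  // the local shared_ptr is the parent's only owner
}
```

If Child::parent were a std::shared_ptr instead, the returned handle would not be expired and both objects would leak.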

**Ansla** · 28 March 2012, 04:11 PM

Originally posted by ciplogic

You understood my answer, and I agree. I mean, when people judge Java/Mono as slow, they don't do it with data. I don't want to prove to you that C# is faster than C++. Also, I had a (somewhat) bad experience with the performance implications of boost's smart pointers, which I think should be considered. References in Java/C# are in fact plain pointers (objects carry an extra header word, a bit like a vtable pointer, so there is memory overhead but no performance overhead), whereas smart pointers have a performance overhead: they "need" to increment/decrement the reference count every time you pass a pointer to an external component, while with a Java or C# object you don't need to. Smart pointers also require extra thought (you need weak references to break circular dependencies).

shared_ptr is the most complex smart pointer, and so probably the one with the highest overhead; I have used it only on rare occasions. Most of the time the algorithm is simple enough that it just has to know whether it passed ownership to some other component or has to free the object itself, and in those cases auto_ptr/unique_ptr are better suited.
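For the simple ownership cases described here, std::unique_ptr (the C++11 replacement for the deprecated auto_ptr) makes the transfer explicit with std::move and has no reference count to maintain. A minimal sketch; Consumer, TakeOwnership and ReadX are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Item {
    int x;
    int y;
};

// Hypothetical component that takes ownership of an Item.
class Consumer {
public:
    void TakeOwnership(std::unique_ptr<Item> item) { owned_ = std::move(item); }
    bool Owns() const { return owned_ != nullptr; }
private:
    std::unique_ptr<Item> owned_;
};

// Code that only looks at the object takes a plain reference, not ownership.
int ReadX(const Item& item) { return item.x; }
```

Usage: after `consumer.TakeOwnership(std::move(item))`, the caller's `item` is null and the Item is freed when the Consumer is destroyed.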

Originally posted by ciplogic

Also, as you noticed, you made your code inefficient by mistake (I'm not blaming you for it) by using a slow construct of Java (and C#), as many programmers do, and so reached the wrong conclusion. C# can be used to make snappy applications (or at least snappy enough ones), and if an application is not fast enough, in many cases it is not the fault of C# or Java. Sometimes it is, but in most cases, as you wrote, people do not know the performance implications of their constructs.

Yeah, I had to switch from C++ to Java and then back to C++ over the past years, and the transition wasn't exactly smooth in either direction. And now I also have to write some C# code from time to time for my job, which doesn't always end well either.

Originally posted by ciplogic

Being high level, C# brings some features that C++ could simulate too but that are harder to get right natively (like dependency injection, parallel loops, or runtime code generation executed via Reflection; in C++ you would invoke an external compiler, build a DLL and load it with LoadLibrary/dlopen), and C# should get credit for them. People programming only in "older" languages will miss these abstractions. How useful is it to have a JIT? For 95% of applications it should not matter, but for things like compiling shaders with LLVM (as OS X does), it is nice to have.
I think you would use Mono if its performance were in the range of GCC's (when you use a lot of classes/pointers, instead of using shared_ptr and putting a const & everywhere the compiler doesn't complain, which is tedious at the least) and you knew that Mono or .NET was installed on the target machine, wouldn't you?

Many of the features above also allow great abuses/security risks and were used by viruses long before C# or even Java appeared... And none of them is truly impossible to achieve with native code, just trickier or plain different. The true advantage of VMs, I think, is ABI compatibility. With C# or Java there are hardly any ways to break the ABI without also breaking the API, and the API is pretty easy to keep backwards compatible. With C/C++ there are plenty of ways to break the ABI alone, like simply adding a new member to a class/struct. And this is not just a problem caused by big bad corporations shipping binary-only programs: no rolling distribution likes to rebuild and push updates for half its packages to users just because some widely used library broke its ABI. Sure, it's not impossible to keep a stable ABI over a long period of time in C++; you just have to follow 27 easy steps like those presented in the KDE policy for ABI compatibility.
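One of the main techniques in KDE-style binary-compatibility guidelines is the d-pointer (pimpl) idiom: the public class holds only a pointer to a private implementation, so adding data members changes the private struct but not the size or layout of the public class. A minimal sketch; Widget and WidgetPrivate are hypothetical names, and in a real library WidgetPrivate would be defined only in the .cpp file, not in the public header.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Public header part: users only see an opaque pointer, so sizeof(Widget)
// does not depend on what WidgetPrivate contains.
class WidgetPrivate;

class Widget {
public:
    Widget();
    ~Widget();  // defined where WidgetPrivate is a complete type
    void SetTitle(const std::string& title);
    std::string Title() const;
private:
    std::unique_ptr<WidgetPrivate> d;
};

// Implementation part (normally in the .cpp): members can be added here in a
// later release without changing Widget's layout.
class WidgetPrivate {
public:
    std::string title;
    int flags = 0;  // e.g. a member added in a later version
};

Widget::Widget() : d(new WidgetPrivate) {}
Widget::~Widget() = default;
void Widget::SetTitle(const std::string& title) { d->title = title; }
std::string Widget::Title() const { return d->title; }
```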

Originally posted by ciplogic

Note: about my supposedly suggesting that you should use a collection: please quote me, as I don't know where I said that.

You didn't exactly suggest I should use it, but my experiment started from the following code you posted and I tried to keep as close to it as possible:

Originally posted by ciplogic

I know that this code can take multiple forms, and I know that it can be changed to use a const reference, but the idea remains the same (if inside the loop one smart_ptr is assigned to another).

Code:

vector<smart_ptr<Item>> items;
for(auto it = items.begin(); it!=items.end(); ++it)
{
    ProcessItem(*it); //increment and decrement of reference
}

I wanted to see just how much overhead copying the smart pointer could have, as I never encountered this problem in a real project, and I assumed it would be negligible compared to the rest of the stuff going on in a real application, but I wanted to make sure. I still think the smart pointer is the least of your worries here, unless that code happens to be a really hot path, and even doing a postfix increment on the iterator could do more damage if the compiler doesn't optimize the temporary iterator away.
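The postfix-increment remark is about the temporary copy: for a class-type iterator, it++ must return the old value, so it copies the iterator, while ++it does not. For trivial iterators the unused copy is usually optimized away; the sketch below uses a hypothetical CountingIterator whose copy constructor counts its calls, which makes the copy observable.

```cpp
#include <cassert>

// Hypothetical iterator over an int range that counts how often it is copied.
struct CountingIterator {
    static int copies;
    int pos;

    explicit CountingIterator(int p) : pos(p) {}
    CountingIterator(const CountingIterator& other) : pos(other.pos) { ++copies; }
    CountingIterator& operator=(const CountingIterator&) = default;

    CountingIterator& operator++() { ++pos; return *this; }  // prefix: no copy
    CountingIterator operator++(int) {                      // postfix: copies the old value
        CountingIterator old(*this);
        ++pos;
        return old;
    }
    bool operator!=(const CountingIterator& other) const { return pos != other.pos; }
};
int CountingIterator::copies = 0;

int CountCopiesPrefix(int n) {
    CountingIterator::copies = 0;
    for (CountingIterator it(0), end(n); it != end; ++it) {}
    return CountingIterator::copies;
}

int CountCopiesPostfix(int n) {
    CountingIterator::copies = 0;
    for (CountingIterator it(0), end(n); it != end; it++) {}
    return CountingIterator::copies;
}
```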

**ciplogic** · 28 March 2012, 04:32 PM

Originally posted by Ansla

shared_ptr is the most complex smart pointer, and so probably the one with the highest overhead; I have used it only on rare occasions. Most of the time the algorithm is simple enough that it just has to know whether it passed ownership to some other component or has to free the object itself, and in those cases auto_ptr/unique_ptr are better suited.
(...)
You didn't exactly suggest I should use it, but my experiment started from the following code you posted and I tried to keep as close to it as possible:
(...)
I wanted to see just how much overhead copying the smart pointer could have, as I never encountered this problem in a real project, and I assumed it would be negligible compared to the rest of the stuff going on in a real application, but I wanted to make sure. I still think the smart pointer is the least of your worries here, unless that code happens to be a really hot path, and even doing a postfix increment on the iterator could do more damage if the compiler doesn't optimize the temporary iterator away.

Yes, you're right. I wanted to show that code like that, a loop creating some boost smart pointers inside it, was enough to be visible in the profiler; it came from a real application (an aviation CWP). It happened to be in a tight loop because another system pumped data to you via an ORB (and the developer had to create local objects out of that data). This is why the profiler showed the memory allocator first and the reference counter second (later the code was profiled and some of the slow cases were fixed with const &). The system was built from components, and the projects with the fewest failures were the ones written in Ada (!), followed by Java, then C++. C++ was somewhat faster on slower machines, and Java was faster on faster machines (maybe because it was parallelized, or because HotSpot simply worked better on higher-end machines).
So Java was a bit better overall (at least once you count the better development experience). The C++ codebase (if it matters) was heavily template-based, so it was "performance driven", but very hard to debug (template errors filling a whole page were not uncommon, just because the developer forgot a colon).
Later I worked in the C# world, and I found C# to be a bit slower (slower than both Java and C++) but fast enough, very easy to integrate with C codebases, and friendlier from a C++ standpoint (I was a C++ developer for 4-5 years). Insisting that Mono is slow (someone may quote me on that), whether about the GC or the generated code, misses the point in my view: what people who attack it overlook is the productivity it brings. And I know where C# can get slow (invisible boxing, heavy string use, using System.Collections.Generic everywhere, an area where the STL is not so fantastic either). Yet on both Linux (using Mono's profiling options) and Windows (using SharpDevelop) you have a very easy way to profile (which works better than the C++ equivalent, Valgrind/KCacheGrind), so you can find what is really "too slow" and optimize it.
I think KDE may not need Mono (I think Qt is a close replacement; QObject + moc + QML is fantastic in my opinion), but the GNOME platform is not so fortunate. Vala helps a little, but as a streamlined workflow, MonoDevelop + Gtk# + C# is in my opinion (after maybe Python) the best way to develop with GTK+. People who shun Mono shun an option that developers may really use.

Mono 2.11 Release Brings Many Changes