Merging In The GNU D Language Compiler To GCC


  • phoronix
    started a topic Merging In The GNU D Language Compiler To GCC

    Phoronix: Merging In The GNU D Language Compiler To GCC

    Nearly one year ago I wrote about Digital Mars wanting to merge the GNU D Compiler into GCC. Finally it looks like merging the compiler for the D programming language is nearing a point of reality...

    http://www.phoronix.com/vr.php?view=OTk2NA

  • ciplogic
    replied
    Originally posted by movieman View Post
    Not returning pointers to things allocated on the stack is easier then working around garbage collector crud. In fact, I'd imagine it's not hard for a compiler to warn you any time you do so.
You surely noticed that the compiler has no information about who owns an object's life-cycle. I do agree that some classes of problems can be solved in many nice ways, but mine was a real case where you can make mistakes, and it is real code. It was also meant to reflect the point that faster stack operations can bring other risks. In general, the absence of a GC means you simply cannot tell whether a pointer is valid or not (beyond checking for NULL), because the object it points to may already have been deleted.

    Originally posted by movieman View Post
    BTW, another reason why garbage collection is a pain: after you gave your text editor the 1GB that it needs to run happily with garbage collection without long pauses, then it gets swapped out, then you open it up again and type something, it has to swap most of that memory back in so it can do a garbage collection.
You did it wrong by giving 1 GB of memory to your text editor; big heaps are meant for big, long-running applications. Anyway, did you notice that swapping can happen to C++ applications too? Did I miss something about OS architectures? In fact there are cases that are independent of the memory-allocation strategies of the different runtimes; for example, Android will drop applications (when multitasking) based on memory usage.
In the end there are applications written in C++ that do less and use more memory than ones that use a GC. Loading the same projects (I tried a 1000-class project), SharpDevelop consistently uses less memory than Visual C# Express (120 MB versus 140 MB) while matching it almost 1:1 in functionality, and IntelliJ IDEA is a fairly lean IDE. I am not claiming that counting every byte won't make C applications leaner, but people do not write code that way anymore.
Also, as discussed earlier, GC algorithms differ: there are desktop GCs that use small pauses to increase responsiveness. If you use Ubuntu and happen to run Banshee, I did not notice it swap. If you have used an Ubuntu or Fedora installer, though, you probably noticed the killer pauses of the GC as the live ISO ran out of RAM (pun intended!).
In fact there are so many GC-based technologies in widespread use (Flash, Mono/.NET/Java, JavaScript, Python, Ruby on Rails), despite the received wisdom that GCs are too clumsy, that I still wonder why some people irrationally defend the C++ language mechanics when other mechanisms obviously work too. In the end the "managed" world is much bigger than a memory-allocator abstraction. Most people today use SQL, and no one seems to attack it even though it has problems of its own: a poorly optimized query may take 10 seconds instead of 10 ms; the enterprise databases (Oracle, SQL Server, MySQL, and so on) are optimized for throughput and therefore behave quite differently from SQLite or Firebird, yet applications use both; a database server may use huge amounts of memory that the application's creator does not control; and a query may touch memory that has been swapped out (which reminds me of the GC discussions).
My observation: pick the best tool for the problem at hand, judged not only in raw-performance terms but also by how well the problem is really solved, maintenance costs, skill set, and so on.

  • movieman
    replied
    Not returning pointers to things allocated on the stack is easier then working around garbage collector crud. In fact, I'd imagine it's not hard for a compiler to warn you any time you do so.
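The compiler-warning idea above is real: here is a minimal sketch (function names are mine, not from the thread) of the pattern GCC flags with -Wreturn-local-addr, plus the usual fix:

```cpp
#include <cassert>
#include <string>

// Returning the address of a local is exactly what GCC/Clang warn about
// (-Wreturn-local-addr in GCC), as suggested above:
// const char *broken() {
//     char buf[] = "on the stack";
//     return buf;   // warning: address of local variable 'buf' returned
// }

// The usual fix: return an owning object, so the data is copied out
// before the stack frame disappears.
std::string fixed() {
    char buf[] = "on the stack";
    return std::string(buf);
}
```

With the buggy variant uncommented, a plain `g++ -Wall` build reports the warning at compile time, so this class of mistake never has to reach runtime.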

    BTW, another reason why garbage collection is a pain: after you gave your text editor the 1GB that it needs to run happily with garbage collection without long pauses, then it gets swapped out, then you open it up again and type something, it has to swap most of that memory back in so it can do a garbage collection.

  • ciplogic
    replied
A typical case where stack operations may break your code (and where a GC would help)

I work with strings, and I ran into one such case (I found it when the crash occurred, if that counts as fun) in Qt code like this:
Code:
const char *cStr = myQString.toAscii().data(); // dangling: the temporary QByteArray dies at the end of the statement
I do understand it was my mistake, and so on, but sometimes a GC makes the code much cleaner: you don't have to dig into the internals to see that a value lives on the stack and that you must copy it so you don't lose the data when it is deallocated on going out of scope.
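For what it's worth, the same lifetime bug and its fix can be sketched without Qt, using std::string in place of QString/QByteArray (the toAscii() below is a stand-in of mine, not the real Qt API):

```cpp
#include <cstring>
#include <string>

// Stand-in for myQString.toAscii(): returns a temporary owning object.
std::string toAscii() { return "hello"; }

const char *pointerDemo() {
    // Bug (same shape as the Qt line above): the temporary returned by
    // toAscii() is destroyed at the end of the full expression, so the
    // pointer would dangle.
    // const char *cStr = toAscii().c_str();   // DON'T: dangling

    // Fix: bind the temporary to a named object that outlives the pointer
    // (static here only so the demo can safely return it).
    static std::string owned = toAscii();
    return owned.c_str();                      // valid: 'owned' persists
}
```

The fix is one line: name the owning object so its lifetime covers every use of the raw pointer.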

  • kayosiii
    replied
    Originally posted by bnolsen View Post
    And the biggest point about GC that people seem to ignore....it doesn't scale very well to highly efficient multithreaded applications. The GC just gets in the way, bottlenecking execution and stalling threads out. You can do thread local storage at that point but then why would you want a GC attached to each and every thread?
D does default to thread-local storage: http://www.informit.com/articles/article.aspx?p=1609144
The current garbage collector could be implemented better, though, and you don't have to use the GC for your heavy lifting anyway.

Originally posted by bnolsen View Post
IMHO "d" was designed when "garbage collection" was all the rage. Other complaints: the number of keywords in the language is staggering even compared to many mature languages current in widespread use.

When "D" was announced I was excited. c++ is pretty long in the tooth, compilation is slow, c++ compilers all implemented the spec differently. I was hoping "D" would be a better c++, cleaning up the stupidity. Instead they introduced questionable concepts themselves, loading up the language with somewhat questionable features based on "fads" from the 90's and early 00's.
As far as I can tell, the majority of the extra keywords come from merging preprocessing and template metaprogramming into the core language.
I am not exactly sure which 90s/early-00s fads you are talking about:
Imperative and functional programming have been popular for a lot longer than that.
Concurrent programming? I have a hard time considering that a fad, given the way hardware is going.
Object-oriented programming (maybe, but what is your rationale there)?
Template metaprogramming (perhaps, but then why is C++ adding many of the same features in C++11)?
Unit testing (dropping that would get rid of one keyword).
Contract programming (dropping that would get rid of another keyword, and three more if you remove features from other parts of the language).

I can understand that a smaller, simpler language (perhaps Go or Vala?) would be more appealing. I just don't quite get your specific complaints.

  • bnolsen
    replied
    And the biggest point about GC that people seem to ignore....it doesn't scale very well to highly efficient multithreaded applications. The GC just gets in the way, bottlenecking execution and stalling threads out. You can do thread local storage at that point but then why would you want a GC attached to each and every thread?

    IMHO "d" was designed when "garbage collection" was all the rage. Other complaints: the number of keywords in the language is staggering even compared to many mature languages current in widespread use.

    When "D" was announced I was excited. c++ is pretty long in the tooth, compilation is slow, c++ compilers all implemented the spec differently. I was hoping "D" would be a better c++, cleaning up the stupidity. Instead they introduced questionable concepts themselves, loading up the language with somewhat questionable features based on "fads" from the 90's and early 00's.

  • ciplogic
    replied
    Originally posted by mirv View Post
    (...) I personally think that garbage collector or no, a programmer should be aware of what they're doing with memory. Know if you've left something for later deallocation, know if you need things to stay around, know what your code is doing. Garbage collectors should not be relied upon to clean up bugs - that's just lazy and poor programming; the bugs shouldn't be there in the first place.
The GC changes the code as such. The difference between GC and non-GC code is mostly this: a GC guarantees that every reference points either to an allocated memory block or to null. Simply dropping references in a natural way (by no longer using a variable), or explicitly setting them to null, is enough for the memory to be reclaimed.
GC has proved to have good parts and drawbacks. The biggest drawbacks, I think, are that the GC process can be slow, unpredictable, and memory-bandwidth dependent. The biggest benefits show up in code whose internals you don't know, for example big web frameworks. When you build a desktop application, the code can become messy (I am not talking about you or me) because of multiple programming styles, broken paradigms, and multiple contributors, and that kind of C++ code base can be painful to maintain. For this reason people try other paradigms that remove the small leak here and the buffer overflow there. Some may use an SQLite database, or a shell script for a specific task; if that shell script leaks memory, it basically does not matter, because when the process stops it gives the memory back.
In the end, not all GCs are equal. Most GCs are optimized for enterprise use, meaning throughput: if the pauses can be absorbed (in most cases with a load balancer), what matters is removing the most objects in the shortest time.
Another class is D's collector and, in general, the Boehm-style ones: C-based collectors that scan the stack and are conservative. They are the slowest in their class, but they keep the program running safely.
The latest class are the "cycle collectors", used mostly by some JavaScript engines and by ActionScript. A reference-counting algorithm runs by default, and when the GC is finally triggered it simply looks for dangling cycles. Since reference counting is mostly predictable, pauses occur rarely, if ever, and are smaller; the price is a performance hit when iterating over collections, and the need for tight coupling with the compiler.
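A minimal sketch of why a cycle detector is needed on top of plain reference counting, using std::shared_ptr as the closest standard C++ analogue (the example is illustrative, not from the post):

```cpp
#include <memory>

struct Node {
    std::shared_ptr<Node> next;   // strong (counted) reference
};

// Builds two nodes that point at each other, drops the named owners,
// and reports how many strong references still keep the first node
// alive. Pure reference counting never reclaims this cycle; a cycle
// collector exists precisely to find and free it.
long leakedUseCount() {
    std::weak_ptr<Node> probe;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->next = a;              // cycle: a <-> b
        probe = a;
    }
    return probe.use_count();     // still 1: b->next keeps 'a' alive
}
```

After the scope ends, both nodes are unreachable from the program yet their counts never drop to zero; in C++ the standard fix is std::weak_ptr for back-references, while the cycle collectors described above find such garbage automatically.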
A perfect GC is hard to achieve, and a lot of tuning is needed to make it smooth. Is it bad for a server when a GC runs in a cluster for a minute? In some ways yes, but not to a great extent: it is acceptable for a node to be blocked for some time just doing GC if its tasks are migrated to another node. Is it bad when the GC runs in a Mono application like Pinta? Mostly you will not notice it, because the extra-big allocations are made in a non-moving collector, so whether a picture is referenced or not, the GC leaves it where it was placed in memory. As for the small pauses that may appear on Undo/Redo, most users will not notice the extra 0.1 seconds when the GC triggers.
What about JavaScript/ActionScript games? There are JavaScript and even Java games that work just fine (look up Tanki). If you compare throughput, theirs will be smaller than that of either C or Java allocators.
In the end, I think the question is this: if someone can build a well-behaved application that meets our expectations, may it use std::string, or must it use char* because that is faster? Must it use raw arrays, or the STL? I think the answer is in the developers' hands, not in the complaining users'. The same goes for GCs: use them, but where appropriate. Focus on making all of us (mostly) happy, and don't swallow every rumor going around!

  • mirv
    replied
C++ is meant to be flexible and powerful. It is not meant to hold the programmer's hand. Many seem to forget that, so I thought I'd mention it again.
    On the subject of memory management and garbage collection within C++, however, I direct people to:

    http://www2.research.att.com/~bs/bs_...age-collection

    I personally think that garbage collector or no, a programmer should be aware of what they're doing with memory. Know if you've left something for later deallocation, know if you need things to stay around, know what your code is doing. Garbage collectors should not be relied upon to clean up bugs - that's just lazy and poor programming; the bugs shouldn't be there in the first place.

  • ciplogic
    replied
    Originally posted by movieman View Post
    C++ doesn't wait until your program has used a gigabyte of RAM before it clears things out.
As there are many C/C++/"native" memory-allocator algorithms, there are obviously many GC algorithms as well. In fact, many of the things said here describe how GCs behaved ten years ago, plus abstraction costs that people programming in GC-based languages have learned to account for.
Typical misconceptions written here:
- "C++ doesn't wait till it has used a gigabyte of RAM before it clears things out": Java (and .NET) don't wait either. The default collectors are generational, meaning that when a "mini-heap" of, say, 4 MB fills up, a collection occurs and most short-lived objects are removed. This is important to understand: if you create fairly short-lived objects, or receive such objects from a framework, use them and then drop them, they are released really soon. Nothing trickier than that. In C/C++, when a framework hands you a reference to an object, you sometimes don't have enough information to delete it yourself; COM solved this somewhat with reference counting, and some people use smart pointers for the same purpose.
- "Compacting is an extremely EXPENSIVE operation as it means MOVING memory blocks around in the heap." That is exactly why, by default, compaction is done only for small objects! It has been explained, for instance, how collectors handle the large-object heap (which does not move). Compaction certainly has a cost, but it is not that big.
    - "Not only does fragmentation rarely become a problem in C++ unless you have a poor design -- e.g. repeatedly calling new and delete rather than allocating temporary variables on the stack -- but the C library is smart enough to give memory back to the OS when it can (e.g. allocating large blocks with mmap rather than on the heap)."
Poor design happens in a lot of cases, and all the GC languages in question (D, Java, the .NET ones) do have "objects on the stack", with the note that in Java those do not have full object semantics (they are primitive types, structs and the like). In fact C++ also gives stack and heap variables different semantics, and you still have to do the management yourself. As for giving memory back: that happens in the Java and .NET worlds too, and in D as well (which uses Boehm-style algorithms, if I remember correctly). You can configure a fixed heap that does not resize up and down, but by default the memory is certainly given back.
    - "Only if you don't allocate those objects on the stack; I'd guess at least 90% of all the allocations in C++ projects I've worked on are on the stack, where they can't cause fragmentation. The exceptions are primarily strings, lists and maps, which we don't use much."
This is in the same category: you can allocate on the stack in Java too. It can be annoying, since Java has no C#-like "struct" semantics, but it can work really well: every time you want a coordinate from a Point class, create an int getX() and an int getY() method and write the logic using those as if they were a full point. You don't need to heap-allocate everything just because it is Java (or C#).
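The reference-counting escape hatch mentioned in the first point (COM, smart pointers) can be sketched with std::shared_ptr; this is a generic illustration of mine, not code from the thread:

```cpp
#include <memory>

// A framework can hand out shared_ptr references without the caller
// ever needing to know who is responsible for delete: the object is
// freed when the last owner releases it, COM-style.
struct Resource {
    int payload = 42;
};

int observeOwnership() {
    auto fromFramework = std::make_shared<Resource>();  // owner #1
    auto keptByCaller  = fromFramework;                 // owner #2
    // Two owners now share the object; use_count reflects that, and
    // the Resource is destroyed when both go out of scope.
    return static_cast<int>(fromFramework.use_count());
}
```

This is the deterministic middle ground between manual delete and a tracing GC: lifetimes are still explicit in the types, but no single caller has to know who frees the object.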

    And the conclusion:
    - "I read an amusing thread on the web a while back where some web developer had allocated about 20GB of RAM to their Java server app because they couldn't handle the overhead of garbage collection. Which worked great until it actually filled up, and then their system froze for several minutes while it cleared out all the accumulated crap. That's the kind of excitement that garbage collection offers you, and when it happens you suddenly have to go back and rewrite significant amounts of your code to work around it. "
Garbage collection has a cost; for small heaps it is too small to matter, and certainly as data sizes grow, GC time increases. Also, that account is some years old; things may have changed since, and people keep experimenting and improving.
    The G1 GC algorithm in Java can have fairly good times: (source: http://stackoverflow.com/questions/2...-in-production) "I've found that it is very good at keeping within the pause target you give it most of the time. The default appears to be a 100ms (0.1 second) pause, and I've been telling it to do half that (-XX:MaxGCPauseMillis=50). However, once it gets really low on memory, it panics and does a full stop-the-world garbage collection. With 65GB, that takes between 30 seconds and 2 minutes. "
So a "stop the world" pass can take two minutes to free all the garbage. I don't want to be smug, but at these heap sizes a C memory allocator will be slow to serve the next malloc too. In the end, why don't people write web applications in C/C++, or at least not that often? If everything is just great in C++, if the performance is simply amazing and the memory manager makes things finish even faster, can any C++ web developer write me a proper response?

  • kayosiii
    replied
This gives some insight into the design differences between Java and D:
    http://www.digitalmars.com/d/2.0/safed.html
And this video gives you a bit of background on implementing web servers in D vs. Java, including memory management (it's long):
    http://video.google.com/videoplay?do...65350602541568
