January 09, 2014
On 09.01.2014 at 11:36, Tobias Pankrath wrote:
> On Thursday, 9 January 2014 at 10:14:08 UTC, Benjamin Thaut wrote:
>> If requested I can make a list with all language features / decisions
>> so far that prevent the implementation of a state of the art GC.
>
> At least I am interested in your observations.

OK, I will put together a list. But as I'm currently swamped with end-of-semester work, you shouldn't expect it within the next 3 weeks. I will post it on my blog (www.benjamin-thaut.de) and in the "D.announce" newsgroup.

Kind Regards
Benjamin Thaut
January 09, 2014
On Thursday, 9 January 2014 at 17:15:46 UTC, Walter Bright wrote:
> How does that work when you pass it "hello"? allocated with malloc()? basically any data that has mixed ancestry?

Why would you do that? You would have to overload cat then.

> Note that your code doesn't always have control over this - you may have written a library intended to be used by others, or you may be calling a library written by others.

The typical C (and old C++) way has been to roll your own to get what you want and use only very focused libraries (like zlib, fft etc.), or to use one big framework that defines all its own stuff in an efficient and uniform manner with its own systems (Qt etc.).

But it becomes tedious when using more than one framework.

> That doesn't work if you're passing strings with mixed ancestry.

Well, you have to decide if you want to roll your own, use a framework or use the old C way.

The point is more: you can make your own and make it C-compatible, and reasonably efficient.

Usually there are different representations that are more or less efficient or convenient depending on what you want to do, even for strings. For instance, you can have a high-speed ASCII MSB string representation that is 64-bit aligned, 0-terminated (padded to the 8-byte boundary), and that sorts correctly when compared as 64-bit unsigned integers.
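
A minimal sketch of such a representation in C, assuming short ASCII strings and reading "MSB" as "first character in the most significant byte". The names `packed_str`, `pack_ascii` and `packed_cmp` and the fixed two-word capacity are hypothetical, chosen only for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PACKED_WORDS 2  /* hypothetical capacity: up to 15 chars plus padding */

typedef struct {
    uint64_t w[PACKED_WORDS];  /* naturally 64-bit aligned */
} packed_str;

/* Pack an ASCII string so that the first character lands in the most
   significant byte of the first word. Unused bytes stay 0, which serves
   as both terminator and padding to the 8-byte boundary. */
static packed_str pack_ascii(const char *s)
{
    packed_str p = {{0}};
    size_t n = strlen(s);
    assert(n < PACKED_WORDS * 8);  /* keep at least one 0 byte */
    for (size_t i = 0; i < n; i++) {
        unsigned shift = 56u - 8u * (unsigned)(i % 8);
        p.w[i / 8] |= (uint64_t)(unsigned char)s[i] << shift;
    }
    return p;
}

/* Compare word by word as unsigned integers; the result has the same
   sign as a byte-wise comparison. Returns <0, 0 or >0. */
static int packed_cmp(const packed_str *a, const packed_str *b)
{
    for (size_t i = 0; i < PACKED_WORDS; i++) {
        if (a->w[i] != b->w[i])
            return a->w[i] < b->w[i] ? -1 : 1;
    }
    return 0;
}
```

With this layout "apple" sorts before "apricot", and a string sorts before any longer string it is a prefix of, because the padding bytes are 0.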
January 09, 2014
On Thursday, 9 January 2014 at 17:17:53 UTC, Walter Bright wrote:
> GC doesn't even make those techniques harder.
>
> I can't see any merit to the idea that GC makes for excessive allocation.

People do what they are accustomed to and what is easy. Library writers are more likely to do allocation for you if they can forget about ownership.

I am more likely to use several single-object "new" calls in C++, and more likely to do a "shared malloc" in C. C++ supports RAII; C doesn't. A "shared malloc" is a cheap version of RAII.
January 09, 2014
On 1/9/2014 10:18 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
> On Thursday, 9 January 2014 at 17:15:46 UTC, Walter Bright wrote:
>> How does that work when you pass it "hello"? allocated with malloc()?
>> basically any data that has mixed ancestry?
>
> Why would you do that? You would have to overload cat then.

So you agree that it won't work.

BTW, it happens all the time when dealing with strings. For example, dealing with filenames, file extensions, and paths. Components can come from the command line, string literals, malloc, slices, etc., all mixed up together.

Overloading doesn't work because a string literal and a string allocated by something else have the same type.


>> That doesn't work if you're passing strings with mixed ancestry.
>
> Well, you have to decide if you want to roll your own, use a framework or use
> the old C way.
>
> The point is more: you can make your own and make it C-compatible, and
> reasonably efficient.

My point is you can't avoid making the extra copies without GC in any reasonable way.
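
A sketch in C of the situation described above. The helper names `cat_copy` and `make_path` are hypothetical. Because every component arrives as a plain `const char *`, the concatenation routine cannot tell a literal from a malloc'ed block or from a pointer into the middle of another string, so the only safe choice is to return a fresh copy every time:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Always returns a fresh malloc'ed buffer, even when one side is empty:
   returning a or b directly would leave the caller unable to know
   whether the result may be passed to free(). */
static char *cat_copy(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);
    char *r = malloc(la + lb + 1);
    if (r == NULL)
        return NULL;
    memcpy(r, a, la);
    memcpy(r + la, b, lb + 1);  /* copies b's terminating 0 as well */
    return r;                   /* caller owns the result */
}

/* Components with mixed ancestry: dir might come from argv, name from
   malloc, ext from a literal. All three have the same type. */
static char *make_path(const char *dir, const char *name, const char *ext)
{
    char *tmp = cat_copy(dir, name);
    if (tmp == NULL)
        return NULL;
    char *full = cat_copy(tmp, ext);
    free(tmp);  /* the intermediate copy is the extra work in question */
    return full;
}
```

Even `cat_copy("", s)` has to allocate here, since handing back `s` itself would blur who owns the result.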

January 09, 2014
On Wednesday, 8 January 2014 at 19:17:08 UTC, H. S. Teoh wrote:
> On Wed, Jan 08, 2014 at 11:35:19AM +0000, Atila Neves wrote:
>> http://atilanevesoncode.wordpress.com/2014/01/08/adding-java-and-c-to-the-mqtt-benchmarks-or-how-i-learned-to-stop-worrying-and-love-the-garbage-collector/
>
> I have to say, this is also my experience with C++ after I learnt D.
> Writing C++ is just so painful, so time-consuming, and so not rewarding
> for the amount of effort you put into it, that I just can't bring myself
> to write C++ anymore when I have the choice. And manual memory
> management is a big part of that time sink. Which is why I believe that
> a lot of the GC-phobia among the C/C++ folk is misplaced.  I can
> sympathise, though, because coming from a C/C++ background myself, I was
> highly skeptical of GC'd languages, and didn't find it to be a
> particularly appealing aspect of D when I first started learning it.
>
> But as I learned D, I eventually got used to having the GC around, and
> discovered that not only did it reduce the number of memory bugs
> dramatically, it also increased my productivity dramatically: I never
> realized just how much time and effort it took to write code with manual
> memory management: you constantly have to think about how exactly you're
> going to be storing your objects, who it's going to get passed to, how to
> decide who's responsible for freeing it, what's the best strategy for
> deciding who allocates and who frees. These considerations permeate every
> aspect of your code, because you need to know whether to pass/return an
> object* to someone, and whether this pointer implies transfer of ownership
> or not, since that determines who's responsible to free it, etc. Even with
> C++'s smart pointers, you still have to decide which one to use, and what
> pitfalls are associated with them (beware of cycles with refcounted
> pointers, passing auto_ptr to somebody might invalidate it after they
> return, etc.). It's like income tax: on just about every line of code you
> write, you have to pay the "memory management tax" of extra mental overhead
> and time spent fixing pointer bugs in order to not get the IRS (Invalid
> Reference Segfault :P) knocking on your shell prompt.

This is what initially drew me to D from C++.  Having a GC is a
huge productivity gain.

> Manual memory management is a LOT of effort, and to be quite honest, unless you're writing an AAA 3D game engine, you don't *need* that last 5% performance improvement that manual memory management *might* give you. That is, if you get it right. Which most C/C++ coders don't.

The other common case is server apps, since unpredictable delays
can be quite undesirable as well.  Java seems to mostly get
around this by having very mature and capable GCs despite having
a standard library that wants you to churn through memory like
pies at an eating contest.  The best you can do with D so far is
mostly to just not allocate whenever possible, by slicing strings
and such, since scanning can still be costly.  I think there's
still some work to do here, despite loving the GC as a general
feature.
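
The "don't allocate, slice instead" approach can be sketched in C terms. D has slices built in; the `slice` struct and `file_ext` helper below are hypothetical stand-ins that only illustrate the idea that a slice is a pointer plus a length into memory that already exists:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *ptr;  /* points into memory owned by someone else */
    size_t len;       /* length is carried explicitly, no 0 terminator */
} slice;

/* Return the extension of path (without the dot) as a view into path.
   Nothing is allocated or copied. An empty slice means no extension. */
static slice file_ext(const char *path)
{
    assert(path != NULL);
    size_t n = strlen(path);
    for (size_t i = n; i > 0; i--) {
        char c = path[i - 1];
        if (c == '/')
            break;  /* reached the directory part: no extension */
        if (c == '.') {
            slice s = { path + i, n - i };
            return s;
        }
    }
    slice none = { path + n, 0 };
    return none;
}
```

In C the slice is only valid for as long as the underlying buffer lives, and nothing checks that.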
January 09, 2014
On Thursday, 9 January 2014 at 18:34:58 UTC, Walter Bright wrote:
> On 1/9/2014 10:18 AM, "Ola Fosheim Grøstad"
>> Why would you do that? You would have to overload cat then.
>
> So you agree that it won't work.

It will work for string literals or for malloc'ed strings, but not for both using the same function unless you start to depend on the data sections used for literals (memory range testing). Which is a dirty tool-dependent hack.

> Overloading doesn't work because a string literal and a string allocated by something else have the same type.

Not if you return your own type that has the same structure? You return a struct containing a variable-sized array of char, and overload on that?

But I see your point regarding literal/malloc: const char* and char* are a shady area, since you can basically get anything cast to const char*.
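
A rough sketch of what such a struct could look like in C, using a C99 flexible array member. The names `hstr`, `hstr_new` and `hstr_cat` are made up for illustration, and since plain C has no overloading, the distinct type only lets the compiler reject a bare `const char *`:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap string: a distinct type, so it cannot be confused
   with a plain const char * such as a literal. */
typedef struct {
    size_t len;
    char data[];  /* flexible array member, kept 0-terminated */
} hstr;

/* Copy len bytes from src into a newly malloc'ed hstr. */
static hstr *hstr_new(const char *src, size_t len)
{
    assert(src != NULL);
    hstr *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->len = len;
    memcpy(s->data, src, len);
    s->data[len] = '\0';
    return s;
}

/* Accepts only hstr, so both arguments are known to be malloc'ed. */
static hstr *hstr_cat(const hstr *a, const hstr *b)
{
    assert(a != NULL && b != NULL);
    hstr *r = malloc(sizeof *r + a->len + b->len + 1);
    if (r == NULL)
        return NULL;
    r->len = a->len + b->len;
    memcpy(r->data, a->data, a->len);
    memcpy(r->data + a->len, b->data, b->len + 1);  /* includes the 0 */
    return r;
}
```

A literal still has to be copied in first, e.g. `hstr_new("hello", 5)`, and `s->data` remains an ordinary `char *` that can be handed to any C function.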
January 09, 2014
On 09.01.2014 at 19:34, Walter Bright wrote:
> On 1/9/2014 10:18 AM, "Ola Fosheim Grøstad"
> <ola.fosheim.grostad+dlang@gmail.com> wrote:
>> On Thursday, 9 January 2014 at 17:15:46 UTC, Walter Bright wrote:
>>> How does that work when you pass it "hello"? allocated with malloc()?
>>> basically any data that has mixed ancestry?
>>
>> Why would you do that? You would have to overload cat then.
>
> So you agree that it won't work.
>
> BTW, it happens all the time when dealing with strings. For example,
> dealing with filenames, file extensions, and paths. Components can come
> from the command line, string literals, malloc, slices, etc., all mixed
> up together.
>
> Overloading doesn't work because a string literal and a string allocated
> by something else have the same type.
>
>
>>> That doesn't work if you're passing strings with mixed ancestry.
>>
>> Well, you have to decide if you want to roll your own, use a framework
>> or use
>> the old C way.
>>
>> The point is more: you can make your own and make it C-compatible, and
>> reasonably efficient.
>
> My point is you can't avoid making the extra copies without GC in any
> reasonable way.
>

Every time I see such discussions, I am reminded of when I started coding in the mid-80s, and of the heresy of using languages like Pascal and C dialects on microcomputers instead of coding everything in Assembly or Forth.

:)

--
Paulo
January 09, 2014
On Thu, Jan 09, 2014 at 07:08:42PM +0000, digitalmars-d-bounces@puremagic.com wrote:
> On Thursday, 9 January 2014 at 18:34:58 UTC, Walter Bright wrote:
> >On 1/9/2014 10:18 AM, "Ola Fosheim Grøstad"
> >>Why would you do that? You would have to overload cat then.
> >
> >So you agree that it won't work.
> 
> It will work for string literals or for malloc'ed strings, but not for both using the same function unless you start to depend on the data sections used for literals (memory range testing). Which is a dirty tool-dependent hack.
> 
> >Overloading doesn't work because a string literal and a string allocated by something else have the same type.
> 
> Not if you return your own type that has the same structure? You return a struct containing a variable-sized array of char, and overload on that?
> 
> But I see your point regarding literal/malloc: const char* and char* are a shady area, since you can basically get anything cast to const char*.

And since it is C, people expect to pass char* and const char* around. So most likely what will happen is that if there's any way at all to get a char* or const char* out of your opaque struct, they will do it, and then pass it to strcat, strlen, and who knows what else. You can't really stop this except by convention, because the language doesn't enforce the encapsulation, and making it truly opaque (via void* with PIMPL) will require an extra layer of indirection and make it unusable with commonly-expected C APIs like printf.

But we all know what happens with programming by convention when the team grows bigger -- old people who know the Right Way of doing things leave, and new people come in ignorant of how things are Supposed To Be, falling back to const char*, so the code quickly degenerates into a horrible mess of mixed conventions and memory leaks / pointer bugs everywhere. Then you start strdup'ing everything Just In Case. Which was Walter's original point.


T

-- 
By understanding a machine-oriented language, the programmer will tend to use a much more efficient method; it is much closer to reality. -- D. Knuth
January 09, 2014
On Thu, Jan 09, 2014 at 08:16:12PM +0100, Paulo Pinto wrote:
> On 09.01.2014 at 19:34, Walter Bright wrote:
> >On 1/9/2014 10:18 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
> >>On Thursday, 9 January 2014 at 17:15:46 UTC, Walter Bright wrote:
> >>>How does that work when you pass it "hello"? allocated with malloc()?  basically any data that has mixed ancestry?
> >>
> >>Why would you do that? You would have to overload cat then.
> >
> >So you agree that it won't work.
> >
> >BTW, it happens all the time when dealing with strings. For example, dealing with filenames, file extensions, and paths. Components can come from the command line, string literals, malloc, slices, etc., all mixed up together.
> >
> >Overloading doesn't work because a string literal and a string allocated by something else have the same type.
> >
> >
> >>>That doesn't work if you're passing strings with mixed ancestry.
> >>
> >>Well, you have to decide if you want to roll your own, use a framework or use the old C way.
> >>
> >>The point is more: you can make your own and make it C-compatible, and reasonably efficient.
> >
> >My point is you can't avoid making the extra copies without GC in any reasonable way.
> >
> 
> Every time I see such discussions, I am reminded of when I started coding in the mid-80s, and of the heresy of using languages like Pascal and C dialects on microcomputers instead of coding everything in Assembly or Forth.
> 
> :)
[...]

Ah, the good ole 80's. I remember I was strongly pro-assembly in those days. Back then compiler / interpreter technology was still rather young, and the little that I saw of it didn't leave a good impression, so I regarded all high-level languages with suspicion. :) Especially languages that sported "nice" string operators, since back then many language implementations had rather naïve string implementations, which were really slow and inefficient.


T

-- 
Always remember that you are unique. Just like everybody else. -- despair.com
January 09, 2014
On Thu, Jan 09, 2014 at 07:01:59PM +0000, Sean Kelly wrote:
> On Wednesday, 8 January 2014 at 19:17:08 UTC, H. S. Teoh wrote:
[...]
> >Manual memory management is a LOT of effort, and to be quite honest, unless you're writing an AAA 3D game engine, you don't *need* that last 5% performance improvement that manual memory management *might* give you. That is, if you get it right. Which most C/C++ coders don't.
> 
> The other common case is server apps, since unpredictable delays can be quite undesirable as well.  Java seems to mostly get around this by having very mature and capable GCs despite having a standard library that wants you to churn through memory like pies at an eating contest.  The best you can do with D so far is mostly to just not allocate whenever possible, by slicing strings and such, since scanning can still be costly.  I think there's still some work to do here, despite loving the GC as a general feature.

I think we all agree that D's GC in its current state needs a lot of improvement. While I have come to accept GCs as a good thing, that doesn't mean that D's current GC is *that* good. Yet. I wish I had the know-how (and the time!) to improve D's GC, because if D can get a GC that's on par with Java's, then D can totally beat Java flat, since the existence of value types greatly reduces the memory pressure on the GC, so the GC will have much less work to do compared to an equivalent Java program.

OTOH, even with D's suboptimal GC, I'm already seeing great productivity gains at only a low cost, so that's a big thumbs up for GCs. And being able to call malloc from D (which you can't do in Java) means you can still do manual memory management in critical code sections when you need to squeeze out some extra performance.


T

-- 
Turning your clock 15 minutes ahead won't cure lateness---you're just making time go faster!