January 07, 2013
On Monday, 7 January 2013 at 17:19:25 UTC, Jonathan M Davis wrote:
> I don't think that any of the documentation or D's developers have ever
> claimed that you could use the full language without the GC. Quite the
> opposite in fact. There are a number of language features that require the GC
> - including AAs, array concatenation, and closures.

True, there is some documentation describing that certain features require the use of the GC, although I would say that the documentation needs to be made a lot clearer on this point. For example, in the AA section there's no mention that the GC is required.

What you are saying is that while the GC is considered optional, it is not really optional given the language as a whole, only a (I assume large) subset of the language will work without the GC. In other words, the GC is partly optional.

I think we can do a lot better to make it more clear that the GC is not 100% optional, and also indicate clearly what features will not work without one.

> You _can_ program in D
> without the GC, but you lose features, and there's no way around that. It may
> be the case that some features currently require the GC when they shouldn't,
> but there are definitely features that _must_ have the GC and _cannot_ be
> implemented otherwise (e.g. array concatenation and closures).

Is this a hard fact, or can there be a way to make it work? For example what about the custom allocator idea?

From a marketing POV, if the language can be made 100% free of the GC it would at least not be a deterrent to those who cannot accept having to use one. From a technical POV, there are definitely many situations where not using a GC is desirable.

--rt
January 07, 2013
Rob T:

> What you are saying is that while the GC is considered optional, it is not really optional given the language as a whole, only a (I assume large) subset of the language will work without the GC. In other words, the GC is partly optional.

Technical users get angry when they uncover some marketing lies in technical documentation. It's much better to tell them the truth since the beginning.

Bye,
bearophile
January 07, 2013
On Mon, Jan 07, 2013 at 11:26:02PM +0100, Rob T wrote:
> On Monday, 7 January 2013 at 17:19:25 UTC, Jonathan M Davis wrote:
[...]
> >You _can_ program in D without the GC, but you lose features, and there's no way around that. It may be the case that some features currently require the GC when they shouldn't, but there are definitely features that _must_ have the GC and _cannot_ be implemented otherwise (e.g. array concatenation and closures).
> 
> Is this a hard fact, or can there be a way to make it work? For example what about the custom allocator idea?

Some features of D were *designed* with a GC in mind. As Jonathan has already said, array slicing, concatenation, etc., pretty much *require* a GC. I don't see how else you could implement code like this:

	int[] f(int[] arr) {
		assert(arr.length >= 4);
		return arr[2..4];
	}

	int[] g(int[] arr) {
		assert(arr.length >= 2);
		return arr[0..2];
	}

	int[] h(int[] arr) {
		assert(arr.length >= 3);
		if (arr[0] > 5)
			return arr[1..3];
		else
			return arr[2..3] ~ 6;
	}

	void main() {
		int[] arr = [1,2,3,4,5,6,7,8];
		auto a1 = f(arr[1..5]);
		auto a2 = g(arr[3..$]);
		auto a3 = h(arr[0..6]);
		a2 ~= 123;

		// Exercise for the reader: write manual deallocation
		// for this code.
	}

Yes, this code *can* be rewritten to use manual allocation, but it will be a major pain in the neck (not to mention likely to be inefficient, due to the required overhead of tracking where each array slice went and whether a reallocation was needed and what must be freed at the end).

Not to mention that h() makes it impossible to do static analysis in the compiler to keep track of what's going on (it will reallocate the array or not depending on runtime data, for example). So you're pretty much screwed if you don't have a GC.

To make it possible to do without the GC at the language level, you'd have to basically cripple most of the main selling points of D arrays, so that they become nothing more than C arrays with fancy syntax. Along with all the nasty caveats that made C arrays (esp. strings) so painful to work with. In particular, h() would require manual re-implementation and major API change (it needs to somehow return a flag of some sort to indicate whether or not the input array was reallocated), along with all code that calls it (check for the flag, then decide based on where a whole bunch of other pointers are pointing whether the input array needs to be deallocated, etc., all the usual daily routine of a C programmer's painful life). This cannot be feasibly automated, which means it can't be done by the compiler, which means using D doesn't really give you any advantage here, and therefore you might as well just write it in straight C to begin with.
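
For illustration, here is a minimal sketch (not from the original post) of why h() is hard to track statically. It reuses h() from the example above and adds a hypothetical helper, `aliases`, that checks whether one slice lies inside another's memory. Which branch runs, and therefore whether the result shares memory with the caller's array, is only known at runtime:

```d
// h() as in the example above: returns either a view or a new array.
int[] h(int[] arr) {
    assert(arr.length >= 3);
    if (arr[0] > 5)
        return arr[1 .. 3];      // a view into the caller's memory
    else
        return arr[2 .. 3] ~ 6;  // binary `~` builds a newly allocated array
}

// Hypothetical helper (an assumption for this sketch):
// true if `part` lies entirely within the memory of `whole`.
bool aliases(const(int)[] part, const(int)[] whole) {
    return part.ptr >= whole.ptr
        && part.ptr + part.length <= whole.ptr + whole.length;
}

void main() {
    int[] small = [1, 2, 3, 4];
    int[] large = [9, 2, 3, 4];

    auto r1 = h(small);  // takes the concatenation branch
    auto r2 = h(large);  // takes the slicing branch

    assert(r1 == [3, 6] && !aliases(r1, small)); // fresh memory
    assert(r2 == [2, 3] && aliases(r2, large));  // shares memory with `large`
}
```

With manual management, the caller would have to know which of the two cases happened before deciding what to free; with the GC, neither caller nor callee has to care.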


> From a marketing POV, if the language can be made 100% free of the GC it would at least not be a deterrent to those who cannot accept having to use one. From a technical POV, there are definitely many situations where not using a GC is desirable.
[...]

I think much of the aversion to GCs is misplaced. I used to be very averse to GCs as well, so I totally understand where you're coming from. I used to believe that GCs are for lazy programmers who can't be bothered to think through their code and how to manage memory properly, and that therefore GCs encourage sloppy coding. But then, after having used D extensively for my personal projects, I discovered to my surprise that having a GC actually *improved* the quality of my code -- it's much more readable because I don't have to keep fiddling with pointers and ownership (or worse, reference counts), and I can actually focus on how to make the algorithms better. Not to mention that the countless frustrating hours spent chasing pointer bugs and memory leaks are all gone -- 'cos I don't have to use pointers directly anymore.

As for performance, I have not noticed any significant performance problems with using a GC in my D code. Now I know that there are cases when the intermittent pause of the GC's mark-n-sweep cycle may not be acceptable, but I suspect that 90% of applications don't even need to care about this. Most applications won't even have any noticeable pauses.

The most prominent case where this *does* matter is in game engines, which must squeeze out every last drop of performance from the hardware, no matter what. But then, when you're coding a game engine, you aren't writing general application code per se; you're engineering a highly-polished and meticulously-tuned codebase where all data structures are already carefully controlled and mapped out -- IOW, you wouldn't be using GC-dependent features of D in this code anyway. So it shouldn't even be a problem.

The problem case comes when you have to interface this highly-optimized core with application-level code, like in-game scripting or what-not. I see a lot of advantages in separating out the scripting engine into a separate process from the high-performance video/whatever-handling code, so you can have the GC merrily doing its thing in the scripting engine (targeted for script writers, level designers, who aren't into doing pointer arithmetic in order to get the highest polygon rates from the video hardware), without affecting the GC-independent core at all. So you get the best of both worlds.

Crippling the language to cater to the 10% crowd who want to squeeze every last drop of performance from the hardware is the wrong approach IMO.


T

-- 
"Life is all a great joke, but only the brave ever get the point." -- Kenneth Rexroth
January 08, 2013
Yes I can see in your example why removing the GC fully will be difficult to deal with.

I am not actually against the use of the GC, I was only wondering if it could be fully removed. I too did not at first agree with the GC concept, thinking the same things you mention. I still have to consider performance issues caused by the GC, but the advantage is that I can do things that before I would not even bother attempting because the cost was too high. The way I program has changed for the better, there's no doubt about it.

So if the GC cannot be removed fully, then there's no point trying to fully remove it, and performance issues have to be solved through improving the GC implementation, and also with better selective manual control methods.

As for the claims made that D's GC is "optional", that message is coming from various sources one encounters when reading about D for the first time.

For example:
http://www.drdobbs.com/tools/new-native-languages/232901643
"D has grown to embrace a wide range of features — optional memory management (garbage collection), ..."

Sure, you can "optionally" disable the GC, but doing so means certain fundamental parts of the language will no longer be usable. Calling the GC "optional" without that caveat leads to the misconception that it is fully optional and that everything can be made to work as before.

I know D's documentation is *not* claiming that the GC is optional; you get that impression from reading external sources instead. However, it may be a good idea to counter the possible misconception in the FAQ.

Improved documentation will also help those who want to do selective manual memory management. As it is, I cannot say for certain what parts of the language require the use of the GC, because the specification either leaves this information out or does not state it clearly enough.

--rt
January 08, 2013
On Tue, 8 Jan 2013, Rob T wrote:

> I am not actually against the use of the GC, I was only wondering if it could be fully removed. I too did not at first agree with the GC concept, thinking the same things you mention. I still have to consider performance issues caused by the GC, but the advantage is that I can do things that before I would not even bother attempting because the cost was too high. The way I program has changed for the better, there's no doubt about it.

There are some issues that can rightfully be termed "caused by the GC", but most of the performance issues are probably better labeled "egregious use of short-lived allocations", which cost performance regardless of how memory is managed. The key difference being that in manual management the impact is spread out and in periodic garbage collection it's batched up.

My primary point being, blaming the GC when it's the application style that generates enough garbage to result in wanting to blame the GC for the performance cost is misplaced blame.
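
A minimal sketch of the allocation-style point above, using two hypothetical functions written for illustration only. Both compute the same result; the first creates a short-lived array on every iteration, the second allocates once and reuses the buffer, so it generates far less garbage regardless of how memory is managed:

```d
// Allocates a fresh temporary array on every round (lots of short-lived garbage).
int sumSquaresAllocating(int rounds, int n) {
    int total = 0;
    foreach (r; 0 .. rounds) {
        auto tmp = new int[](n);                  // one allocation per round
        foreach (i, ref x; tmp) x = cast(int)(i * i);
        foreach (x; tmp) total += x;
    }
    return total;
}

// Allocates once and reuses the same buffer for every round.
int sumSquaresReusing(int rounds, int n) {
    int total = 0;
    auto tmp = new int[](n);                      // single allocation
    foreach (r; 0 .. rounds) {
        foreach (i, ref x; tmp) x = cast(int)(i * i);
        foreach (x; tmp) total += x;
    }
    return total;
}

void main() {
    // Same answer either way; only the allocation pattern differs.
    assert(sumSquaresAllocating(3, 4) == sumSquaresReusing(3, 4));
}
```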

My 2 cents,
Brad
January 08, 2013
On Tuesday, 8 January 2013 at 02:06:02 UTC, Brad Roberts wrote:
> On Tue, 8 Jan 2013, Rob T wrote:
>
>> I am not actually against the use of the GC, I was only wondering if it could
>> be fully removed. I too did not at first agree with the GC concept, thinking
>> the same things you mention. I still have to consider performance issues
>> caused by the GC, but the advantage is that I can do things that before I
>> would not even bother attempting because the cost was too high. The way I
>> program has changed for the better, there's no doubt about it.
>
> There are some issues that can rightfully be termed "caused by the GC", but
> most of the performance issues are probably better labeled "egregious use
> of short-lived allocations", which cost performance regardless of how
> memory is managed. The key difference being that in manual management the
> impact is spread out and in periodic garbage collection it's batched up.
>
> My primary point being, blaming the GC when it's the application style
> that generates enough garbage to result in wanting to blame the GC for the
> performance cost is misplaced blame.
>
> My 2 cents,
> Brad

You'll also find out that D's GC is kind of slow, but this is an implementation issue more than a conceptual problem with the GC.
January 08, 2013
On Tue, Jan 08, 2013 at 02:57:31AM +0100, Rob T wrote: [...]
> So if the GC cannot be removed fully, then there's no point trying to fully remove it, and performance issues have to be solved through improving the GC implementation, and also with better selective manual control methods.

I know people *have* tried to use D without GC-dependent features; it would be great if this information could be collected in one place and put into the official docs. That way, people who are writing game engines or real-time code know what to do, and the other 90% of coders can just continue using D as before.


> As for the claims made that D's GC is "optional", that message is coming from various sources one encounters when reading about D for the first time.
> 
> For example:
> http://www.drdobbs.com/tools/new-native-languages/232901643
> "D has grown to embrace a wide range of features — optional memory
> management (garbage collection), ..."
> 
> Sure, you can "optionally" disable the GC, but doing so means certain fundamental parts of the language will no longer be usable. Calling the GC "optional" without that caveat leads to the misconception that it is fully optional and that everything can be made to work as before.

Does Dr. Dobbs allow revisions to previously published articles? If not, the best we can do is to update our own docs to address this issue.


> I know D's documentation is *not* claiming that the GC is optional; you get that impression from reading external sources instead. However, it may be a good idea to counter the possible misconception in the FAQ.

Yeah that's a good idea.


> Improved documentation will also help those who want to do selective manual memory management. As it is, I cannot say for certain what parts of the language require the use of the GC, because the specification either leaves this information out or does not state it clearly enough.
[...]

I don't know if I know them all, but certainly the following are GC-dependent:

- Slicing/appending arrays (which includes a number of string
  operations), .dup, .idup;
- Delegates & anything requiring access to local variables after the
  containing scope has exited;
- Built-in AA's;
- Classes (though I believe it's possible to manually manage memory for
  classes via Phobos' emplace), including exceptions (IIRC);
- std.container (IIRC Andrei was supposed to work on an allocator model
  for it so that it's usable without a GC)
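
As a rough sketch of the manual class management mentioned in the list above, the following places a class instance in malloc'd memory with `std.conv.emplace`. The `Point` class and the `sumViaMalloc` function are hypothetical names chosen for illustration, and the sketch assumes the object holds no references to GC-managed memory (otherwise the GC would need to be told about the block):

```d
import core.stdc.stdlib : malloc, free;
import std.conv : emplace;

// Hypothetical example class; it holds only plain values, no GC references.
class Point {
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
    int sum() { return x + y; }
}

// Constructs a Point in malloc'd memory, uses it, then destroys and frees it.
int sumViaMalloc(int x, int y) {
    enum size = __traits(classInstanceSize, Point);
    void* raw = malloc(size);
    assert(raw !is null, "malloc failed");
    scope (exit) free(raw);          // runs last: release the raw memory

    Point p = emplace!Point(raw[0 .. size], x, y);
    scope (exit) destroy(p);         // runs first: finalize the object

    return p.sum();
}

void main() {
    assert(sumViaMalloc(3, 4) == 7);
}
```

The cost is exactly what the thread describes: lifetime and ownership become the programmer's job again.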

AFAIK, the range-related code in Phobos has been under scrutiny to contain no hidden allocations (hence the use of structs instead of classes for various range constructs). So unless there are bugs, std.range and std.algorithm should be safe to use without involving the GC.

Static arrays are GC-free, and so are array literals (I *think*) as long as you don't do any memory-related operation on them like appending or .dup'ing. So strings should still be somewhat usable, though quite limited. I don't know if std.format (including writefln & friends) invokes the GC -- I think it does, under the hood. So writefln may not be usable, or maybe it's just certain format strings that can't be used, and if you're careful you may be able to pull it off without touching the GC.
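
A small sketch of the static-array case, under the assumption that a fixed-size stack buffer is enough for the job. The function name `sumFirst` is hypothetical, and C's `printf` is used for output only to sidestep the open question about writefln:

```d
import core.stdc.stdio : printf;

// Fills the first n slots of a fixed-size stack buffer and sums them.
int sumFirst(int n) {
    int[8] buf;                          // static array, lives on the stack
    assert(n >= 0 && n <= buf.length);
    foreach (i; 0 .. n)
        buf[i] = i + 1;

    int total = 0;
    foreach (x; buf[0 .. n])             // slicing a static array makes a view, not a copy
        total += x;
    return total;
}

void main() {
    printf("%d\n", sumFirst(4));         // C stdio instead of writefln
}
```

The slice must not outlive `buf`; returning `buf[0 .. n]` from the function would hand back a dangling view.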

AA literals are NOT safe, though -- anything to do with built-in AA's will involve the GC. (I have an idea that may make AA literals usable without runtime allocation -- but CTFE is still somewhat limited right now so my implementation doesn't quite work yet.)

But yeah, it would be nice if the official docs can indicate which features are GC-dependent.


T

-- 
Latin's a dead language, as dead as can be; it killed off all the Romans, and now it's killing me! -- Schoolboy
January 08, 2013
On Tuesday, 8 January 2013 at 02:06:02 UTC, Brad Roberts wrote:
> There are some issues that can rightfully be termed "caused by the GC", but
> most of the performance issues are probably better labeled "egregious use
> of short-lived allocations", which cost performance regardless of how
> memory is managed. The key difference being that in manual management the
> impact is spread out and in periodic garbage collection it's batched up.
>
> My primary point being, blaming the GC when it's the application style
> that generates enough garbage to result in wanting to blame the GC for the
> performance cost is misplaced blame.
>
> My 2 cents,
> Brad

There's more to it than just jerkiness caused by batching. The GC will do collection runs at inappropriate times, and that can cause slowdowns well in excess of an otherwise identical application with manual memory management. For example, I've seen a 3x performance penalty caused by the GC doing collection runs at the wrong times. The fix required manually disabling the GC during certain points and re-enabling it afterwards.

Inserting the 2 or 3 lines of extra code that fixed the 3x performance penalty was a lot easier than performing full manual management, but it means that you cannot sit back and expect the GC to always do the right thing.
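
For readers wondering what such a fix can look like, here is a minimal sketch using `core.memory.GC`. It is not the code from the application described above; `buildTable` is a hypothetical function, and it assumes the hot section allocates little enough that postponing collection is acceptable:

```d
import core.memory : GC;

// Builds a table while automatic collection runs are suppressed.
int[] buildTable(size_t n) {
    GC.disable();                 // no automatic collections from here on
    scope (exit) GC.enable();     // re-enable even if an exception is thrown

    int[] table;
    foreach (i; 0 .. n)
        table ~= cast(int) i;     // still allocates from the GC heap
    return table;
}

void main() {
    auto t = buildTable(1000);
    assert(t.length == 1000 && t[999] == 999);

    GC.collect();                 // optionally collect at a moment of our choosing
}
```

Note that disabling the GC only postpones collection; allocations still come from the GC heap, and the runtime may still collect if it runs out of memory.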

--rt

January 08, 2013
On Monday, 7 January 2013 at 23:13:13 UTC, H. S. Teoh wrote:
> ...
>
> Crippling the language to cater to the 10% crowd who want to squeeze
> every last drop of performance from the hardware is the wrong approach
> IMO.
>
>
> T

Agreed.

Having used GC languages for the last decade, I think the cases where manual memory management is really required are very few.

Even if one is forced to do manual memory management over GC, it is still better to have the GC around than do everything manually.

But this is based on my experience doing business applications, desktop and server side or services/daemons.

Others' experience may vary.

--
Paulo
January 08, 2013
On Monday, 7 January 2013 at 17:19:25 UTC, Jonathan M Davis wrote:
> On Monday, January 07, 2013 17:55:35 Rob T wrote:
>> On Monday, 7 January 2013 at 16:12:22 UTC, mist wrote:
>> > How is D manual memory management any worse than plain C one?
>> > Plenty of language features depend on GC but stuff that is left
>> > can hardly be named "a lousy excuse". It lacks some convenience
>> > and guidelines based on practical experience but it is already
>> > as capable as some of wide-spread solutions for systems
>> > programming (C). In fact I'd be much more afraid of runtime
>> > issues when doing system stuff than GC ones.
>> 
>> I think the point being made was that built in language features
>> should not be dependent on the need for a GC because it means
>> that you cannot fully use the language without a GC present and
>> active. We can perhaps excuse the std library, but certainly not
>> the language itself, because the claim is made that D's GC is
>> fully optional.
>
> I don't think that any of the documentation or D's developers have ever
> claimed that you could use the full language without the GC. Quite the
> opposite in fact. There are a number of language features that require the GC
> - including AAs, array concatenation, and closures. You _can_ program in D
> without the GC, but you lose features, and there's no way around that. It may
> be the case that some features currently require the GC when they shouldn't,
> but there are definitely features that _must_ have the GC and _cannot_ be
> implemented otherwise (e.g. array concatenation and closures). So, if you want
> to ditch the GC completely, it comes at a cost, and AFAIK no one around here
> is saying otherwise. You _can_ do it though if you really want to.
>
> In general however, the best approach if you want to minimize GC involvement
> is to generally use manual memory management and minimize your usage of
> features that require the GC rather than try and get rid of it entirely,
> because going the extra mile to remove its use completely generally just isn't
> worth it. Kith-Sa posted some good advice on this just the other day, and he's
> written a game engine in D:
>
> http://forum.dlang.org/post/vbsajlgotanuhmmpnspf@forum.dlang.org
>
> - Jonathan M Davis

Just speaking as a bystander, but I believe it is becoming apparent that a good guide to using D without the GC is required. We have a growing number of users who could be useful converts, doing things like writing game engines in D. Giving them some general help with approaches, and warnings about what does and doesn't require the GC, would greatly smooth the process. Sadly I lack the talent to write such a guide.