September 14, 2016
On Wednesday, 14 September 2016 at 06:33:59 UTC, Shachar Shemesh wrote:
> On 14/09/16 09:05, Walter Bright wrote:
>> On 9/13/2016 10:38 PM, Shachar Shemesh wrote:
>>> But if you do want to allow it, then my original problem comes back.
>>> You have to
>>> scan the malloced memory because you are not sure where that memory might
>>> contain pointers to GC managed memory.
>>
>> If mallocing for types that are statically known at compile time, it
>> should be knowable if they need scanning or not.
>>
>
> I believe you are under the assumption that structs will not be GC allocated. I don't think it is a good assumption to make. Even if it is, however:
>
> struct SomeStruct {
>   string something;
> }
>
> Please let me know if scanning this struct for GC pointers is necessary or not. Also, even if this is knowable, I'm not sure how you are suggesting we manage it.
>
> The only practical approach I see is that if any RC managed memory might contain pointers to GC managed memory, then all RC managed memory needs to be scanned (unless you think the compiler can help out with that).
>
> Shachar

In D code that I have read where people use RC types, they have different names making it quite clear - e.g. RCString.

If you're worrying about GC, presumably you have a decent-sized problem anyway (and I admire the ambition of Weka in this respect). So it seems to me that yes, there is a tax on writing code that uses the language in a way that only a minority of D users do. The generation after the pioneers, even if they dodge the arrows that hit the prior generation, still has much harder work than subsequent generations will. But on the other hand it is a package deal, and the initial costs amortise.

How would you end up with a GC-allocated struct by mistake (presuming you think it through first) at the size you are at? 200k lines and 30 people is a lot, but it's also not Windows scale. And if you did, and it mattered, wouldn't you pick it up quickly with GC profiling?
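For what it's worth, assuming a druntime recent enough to support runtime GC options, that kind of profiling is one command-line switch away - it prints collection counts and pause times when the program exits:

    ./app --DRT-gcopt=profile:1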




September 14, 2016
On Wed, Sep 14, 2016 at 05:19:45AM -0700, Jonathan M Davis via Digitalmars-d wrote:
> On Tuesday, September 13, 2016 16:13:05 H. S. Teoh via Digitalmars-d wrote:
> > On Tue, Sep 13, 2016 at 03:19:54PM -0700, Jonathan M Davis via Digitalmars-d wrote: [...]
> >
> > > But none of the code that's marked @nogc can throw an exception unless you're either dealing with pre-allocated exceptions (in which case, they're less informative),
> >
> > I don't see why pre-allocated exceptions would be less informative. You can always modify the exception object before throwing it, after all.  In fact, I've always wondered about the feasibility of a @nogc exception handling system where the exception is emplaced onto a fixed static buffer, so that no allocation (except at the start of the program) is actually necessary. Of course, chained exceptions throw(!) a monkey wrench into the works, but assuming we forego chained exceptions, wouldn't this work around the problem of being unable to allocate exceptions in @nogc code? (Albeit with its own limitations, obviously.  But it would be better than being unable to use exceptions at all in @nogc code.)
> 
> As Walter points out, it's a problem if exceptions are ever saved (which some code does need to do). The fact that you lose chaining as you pointed out is also a problem.

Honestly, I've never actually run across a real-life case where chained exceptions matter. Most of the code (that I work with, anyway) involves simply throwing an exception when some problem occurs, and the catch block simply prints the error message and aborts the current operation. I agree that chained exceptions are theoretically cool and everything, but I haven't actually found myself needing them.


> You also have problems because the file, line, and message of the exception either aren't going to be specific to when that exception is thrown, or you have to set them all before throwing, in which case you have issues if/when the exception is reused.

Clearly, if a static buffer is to be used for emplacing the exception, you'll have to sacrifice some things.  But I'm just saying that the limitations aren't as onerous as it may seem at first, and in fact covers a lot of common use cases.
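For instance (a minimal sketch; prepared and _exc are names invented here, not an existing API): the object is allocated once at startup, and @nogc code only mutates Throwable's public msg/file/line fields before throwing it:

    // Allocated once, outside @nogc code, e.g. in a module constructor.
    private Exception _exc;
    static this() { _exc = new Exception(null); }

    // @nogc code reuses the preallocated object, updating its fields.
    Exception prepared(string msg,
                       string file = __FILE__,
                       size_t line = __LINE__) @nogc nothrow
    {
        _exc.msg  = msg;
        _exc.file = file;
        _exc.line = line;
        return _exc;
    }

    void mayFail(int x) @nogc
    {
        if (x < 0)
            throw prepared("x must be non-negative");
    }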


> And regardless of all of that, the fact that string is used for the message throws a wrench in things for @nogc, since that's going to require GC allocation unless you cast something to string, and then you have serious problems if you need to mutate it later, since you'll end up violating the compiler guarantees for immutable.

Most uses of exceptions in code that I've seen involve setting a static string as the message. This does not require GC allocation.

Granted, it's also relatively common to make the exception message more verbose / informative, e.g. using format() to embed specific details about the problem besides the static message string. My own code uses this idiom quite often. This would have to be sacrificed, or some other workaround found, of course.


[...]
> > There's nothing about the 'throw' keyword that requires GC allocation.  It's just that `throw new Exception(...)` has become a standard incantation. The exception object itself can, for example, be emplaced onto a static buffer as I propose above.
> 
> Yes, there are ways to work around allocating an exception with new right before throwing it, but that's really how things are designed to work, and there are serious problems with any attempt to work around it.  At minimum, it makes throwing exceptions a lot more of a pain than it is when using the GC [...]

I disagree. There is nothing about 'throw' that requires the use of 'new'.  A further development of the emplacement idea is to pre-initialize a region allocator specifically for throwing exceptions, then you can write:

	throw makeException("Error occurred", ...);

where makeException is a global function that allocates the exception using the region allocator (which is initialized at startup). The catch block then deallocates the region and re-initializes it.  This is not that much more painful than writing:

	throw new Exception("Error occurred", ...);

This is just a quick-n-dirty example, of course. In an actual implementation you'd templatize makeException() so that you can create different exception types, e.g.:

	throw make!UserDefinedException("...", ...);

Filename, line numbers, etc., can be easily accommodated the same way they're currently handled in exception ctors.
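To make this concrete, here is a minimal sketch of what make could look like (the fixed-size region, its size, and the reset protocol are assumptions for illustration; a real implementation would need bounds checking and alignment care):

    import std.conv : emplace;

    private align(16) ubyte[16 * 1024] _excRegion; // set aside at startup
    private size_t _excOffset;

    // Carve an exception out of the region -- no GC allocation involved.
    T make(T, Args...)(auto ref Args args) if (is(T : Throwable))
    {
        // round up to 16 bytes so the next emplacement stays aligned
        enum size_t size = (__traits(classInstanceSize, T) + 15) & ~cast(size_t) 15;
        auto mem = _excRegion[_excOffset .. _excOffset + size];
        _excOffset += size;
        return emplace!T(mem, args);
    }

    // Usage:
    //     throw make!Exception("Error occurred");
    //
    // The catch block that ends the failed operation resets _excOffset
    // to 0, reclaiming every exception thrown since in one step.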


> But the fact that the workarounds either require that you don't have unique, independent exceptions or that you know that you need to manually free the exception after catching it is a serious problem. And that's without even taking the exception chaining into account.
[...]

Using a preinitialized region allocator, we no longer have such limitations.


T

-- 
GEEK = Gatherer of Extremely Enlightening Knowledge
September 14, 2016
On Wednesday, September 14, 2016 07:43:29 H. S. Teoh via Digitalmars-d wrote:
> > But the fact that the workarounds either require that you don't have unique, independent exceptions or that you know that you need to manually free the exception after catching it is a serious problem. And that's without even taking the exception chaining into account.
>
> [...]
>
> Using a preinitialized region allocator, we no longer have such limitations.

And how would you deal with the fact that the catching code is generally going to assume that it has a GC-allocated exception? It's not going to do anything to free the exception when it's done with it, and it could keep the exception around for an indeterminate amount of time. So, having the code that manages the allocator later assume that the exception was freed would be unsafe. I don't see how it's going to work in the general case to have an exception class which is anything other than GC-allocated - not as long as it can't be wrapped in a struct that knows how to deal with freeing its memory when it's done. Because you either end up with an exception that gets leaked, because the catching code doesn't know that it needs to do something to free it, or you run the risk of it being freed prematurely when the code that manages its memory assumes that the code that caught it didn't keep it.

Obviously, if you're talking about a smaller application that isn't sharing code with anything, you have more control over what's going on, and you can afford to make assumptions about what is happening with exceptions and are thus more likely to get away with stuff that won't work in the general case. But libraries definitely have to care, and larger applications are going to tend to have to care, because if there's enough code, it simply isn't going to work to assume that it all behaves in a way that differs from how exceptions normally work. Someone else is going to come onto the project and write perfectly normal D code that then has nasty bugs when an exception gets thrown, because other code within that large application was doing something with exceptions that didn't work with normal D code (like allocating them with a different allocator that won't allow you to hold onto the exception for an arbitrary amount of time without it being mutated out from under you or even outright freed).

- Jonathan M Davis

September 14, 2016
On Wednesday, 14 September 2016 at 07:55:24 UTC, John Colvin wrote:
> On Tuesday, 13 September 2016 at 22:28:09 UTC, deadalnix wrote:
>> On Tuesday, 13 September 2016 at 22:19:54 UTC, Jonathan M Davis wrote:
>>> The big problem with exceptions being allocated by the GC isn't really the GC but @nogc.
>>
>> No, the problem IS @nogc. Allocating with the GC is absolutely not a problem if you deallocate properly. What is a problem is when you leak (i.e. when the ownership is transferred to the GC). If you don't leak, the GC does not kick in.
>
> Can you explain a bit more here? Do you mean in practice (i.e. in the current implementation) or in theory?

My point is that if you have lifetime information for some data (and it looks like this is where we want to go with things like DIP25 and DIP1000, but let's not get lost in the specifics of these proposals now), you know it is going to end up being freed without the GC having to run a collection cycle.

Therefore, you know you'll not end up having to rely on the GC as long as you can track lifetimes, even if you allocate with it.

Now that this is established, it follows that disallowing GC allocation in @nogc code is needlessly restrictive and promotes unsafe patterns (like allocating with malloc and then registering the range with the GC, which is both slower than allocating on the GC directly and more error prone).

A more sensible approach is to allow GC allocation in @nogc code but disallow the cases where the allocation's lifetime cannot be tracked. Note that most exceptions do not fall into that untrackable category.

For instance, when you do
    throw new Exception("tagada");

The compiler can deduce that ownership of the exception is transferred to the runtime, which will transfer it back to the catch block that receives it. Depending on what that catch block does, it may or may not be @nogc, but there is no reason for the throw itself not to be allowed in @nogc code.

However, if you have something like
    throw e;

With e a reference to an Exception whose lifetime cannot be tracked, it makes sense to disallow the throw in @nogc code.
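A sketch of the two cases under such a rule (note this is the behaviour being proposed, not what the compiler accepts today):

    void f() @nogc
    {
        // OK under the proposed rule: ownership of the freshly
        // allocated exception moves straight to the runtime, so its
        // lifetime is fully tracked.
        throw new Exception("tagada");
    }

    Exception savedException; // stored reference, lifetime unknown

    void g() @nogc
    {
        // Disallowed under the proposed rule: the reference comes from
        // storage whose lifetime the compiler cannot track.
        throw savedException;
    }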

TL;DR: the problem is not 'new', the problem is the rhs of assignment operations.
September 14, 2016
On Wednesday, 14 September 2016 at 13:28:45 UTC, finalpatch wrote:
> On Tuesday, 13 September 2016 at 18:24:26 UTC, deadalnix wrote:
>> No you don't, as how often the GC kicks in depends on the rate at which you produce garbage, which is going to be very low with a hybrid approach.
>
> This is simply not true.
>
> Assume in a pure GC program the GC heap can grow up to X Mb before a collection cycle happens, which has to scan X Mb of memory.
>

No, it only has to scan the live set. If we assume we are ready to accept a 2X overhead in the GC heap of that program, the collection cycle needs to scan X/2 Mb of memory. We are off to a bad start here.

> Now let's say we have a hybrid program that uses 0.5X Mb of RCed memory and 0.5X Mb of GC memory so the total memory consumption is still X Mb. When the GC heap reaches 0.5X Mb, it has to scan both RC and GC memory.
>

Your assumption that there are two heaps is bogus: your two programs have different live sets (taking X = 1Mb, A has a 500kb live set and B a 750kb one). In addition, why the fuck is your RC system only able to reclaim 50% of the garbage emitted? Even with such stupid numbers, you end up with program A able to manage 500kb with 100% overhead, and program B able to manage 750kb with 33% overhead, which completely proves my point: the hybrid approach is far superior.

Now let's set up a more appropriate model of how things work in the real world. Assume we have a program that emits 1Mb of garbage per second, has a live set of 1Mb, and can accept a 2X memory overhead for the GC.

With the pure GC approach, we emit 1Mb of garbage per second on top of our live set of 1Mb, so we need one collection cycle per second. Each such cycle has to scan the live set, namely 1Mb of data.

With the hybrid approach, we still emit 1Mb of garbage per second, but the RC system can reclaim 90% of it. We end up with a rate of garbage for the GC to collect of 100kb per second. If we allow the same memory overhead, we end up with a collection cycle every 10s. The live set still has the same size, so the GC still has to scan 1Mb of data. We have therefore effectively divided by 10 the resources that need to be devoted to the GC.
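(In general, under these assumptions, the collection period is the allowed heap slack divided by the rate of garbage reaching the GC: 1Mb / (1Mb/s) = 1s for the pure GC program, versus 1Mb / (100kb/s) = 10s for the hybrid one.)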

> It's quite obvious that the time t it takes for program 1 to produce X Mb of garbage is the same as for program 2 to produce 0.5X Mb of garbage, and after time t, both programs have to scan X Mb of memory.  However program 2 also has to pay the cost of reference counting on top of that.
>

Rule of thumb: when someone starts with "it's obvious that", you can be sure that 99% of the time what follows is confirmation bias rather than anything cogent. I think we've established that this is the case here.

September 15, 2016
On 14/09/16 16:49, Laeeth Isharc wrote:

> In D code that I have read where people use RC types, they have different
> names making it quite clear - e.g. RCString.

The suggestion that the compiler make code generation decisions based on type names is not one I would gladly see happen.

>
> If you're worrying about GC, presumably you have a decent-sized problem
> anyway (and I admire the ambition of Weka in this respect).

Not everything a Weka employee says on this forum is about what Weka is doing.

> How would you end up with a GC-allocated struct by mistake (presuming
> you think it through first) at the size you are at? 200k lines and 30
> people is a lot, but it's also not Windows scale. And if you did,
> and it mattered, wouldn't you pick it up quickly with GC profiling?

We didn't end up with a struct that was allocated by mistake. In fact, I do not consider what we're doing to be a hybrid approach. More of a "use GC only when the language leaves us no other choice" approach, which is far from being the same.

With the hybrid approach, getting there is far from difficult.

struct SomeRCNonGCDataStructure {
   // reference counted internally; never meant to live on the GC heap
}

...

class SomethingUseful {
   private SomeRCNonGCDataStructure dataStructure; // embedded in a GC allocation
}

Unless you suggest that people implement each algorithm twice, having structs on the heap may be hard to avoid.

Shachar
September 15, 2016
On Wednesday, 14 September 2016 at 14:43:29 UTC, H. S. Teoh wrote:
> Honestly, I've never actually run across a real-life case where chained exceptions matter. Most of the code (that I work with, anyway) involve simply throwing an exception when some problem occurs, and the catch block simply prints the error message and aborts the current operation.

See e.g. https://github.com/dlang/druntime/blob/master/src/core/thread.d#L783
September 15, 2016
On Tuesday, 13 September 2016 at 17:59:52 UTC, Andrei Alexandrescu wrote:
> So do you agree you were wrong in positing 2x as the rule of thumb?

I didn't look into how he connects perceived application performance with collection frequency.
September 15, 2016
On Monday, 12 September 2016 at 22:57:23 UTC, Andrei Alexandrescu wrote:
> [snip] If it is, that provides more impetus for reference counting for D by the following logic: (a) it is likely that in the future more code will run on portable, battery-powered systems; (b) battery power does not follow a Moore trajectory, so at this point in history demand for battery lifetime is elastic.
>
>
> Andrei

I'm keen on reference counting because of embedded and bare-metal programming, so I'm super keen on D having a short-reference/long-garbage standard. I guess we already do this to some extent with scope etc.

Battery life can last for years in some of the new embedded devices, but most languages targeting the M0 etc. are really archaic. I have seen a Rust port somewhere... but if I'm honest I only really care for D, so I really want LDC -> ARM M0 etc... :D
September 16, 2016
On Wed, 14 Sep 2016 13:37:03 +0000, Laeeth Isharc wrote:

> On Wednesday, 14 September 2016 at 13:28:45 UTC, finalpatch wrote:
>> On Tuesday, 13 September 2016 at 18:24:26 UTC, deadalnix wrote:
>>> No you don't, as how often the GC kicks in depends on the rate at which you produce garbage, which is going to be very low with a hybrid approach.
>>
>> This is simply not true.
>>
>> Assume in a pure GC program the GC heap can grow up to X Mb before a collection cycle happens, which has to scan X Mb of memory.
>>
>> Now let's say we have a hybrid program that uses 0.5X Mb of RCed memory and 0.5X Mb of GC memory so the total memory consumption is still X Mb. When the GC heap reaches 0.5X Mb, it has to scan both RC and GC memory.
> 
> Could you elaborate?

You can store a pointer to a GC-owned memory block inside an RCed object, just like how you can store a pointer to a GC-owned memory block on the stack.

There are three ways to handle this:

* Keep a pointer to the GCed object inside GCed memory.
* Tell the GC to pin the object, preventing it from being collected.
* Have the GC scan RCed memory as well as GC-owned memory.
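The third option is what D's runtime already supports via core.memory.GC.addRange/removeRange. A minimal sketch of an RCed container whose malloc'd payload may point into the GC heap (RCArray is a name made up for illustration, and the reference-count bookkeeping itself is elided):

    import core.memory : GC;
    import core.stdc.stdlib : free, malloc;

    struct RCArray
    {
        private void*[] payload;

        @disable this(this); // no copying; a real RC type would count instead

        this(size_t n)
        {
            auto p = cast(void**) malloc(n * (void*).sizeof);
            payload = p[0 .. n];
            // Register the block so the collector scans it and keeps
            // alive anything it points to in the GC heap.
            GC.addRange(payload.ptr, payload.length * (void*).sizeof);
        }

        ~this()
        {
            if (payload.ptr !is null)
            {
                GC.removeRange(payload.ptr);
                free(payload.ptr);
            }
        }
    }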