September 29, 2009
Walter Bright:

>No, it is done with one indirection.<

If even Andrei, a quite intelligent person who has written big books on C++, can be wrong about such a basic thing, then I think there's a problem.

It would be good to create an HTML page that explains how some basic things in D are implemented in the front-end. Such a page could also contain box-and-arrow diagrams that show how structures and memory are organized for each of those data structures.

Such an HTML page would be useful both for ordinary programmers who want to understand what's under the hood and for people who may want to fix or modify the front-end.

Bye,
bearophile
September 29, 2009
Don:

> Maybe if D had better flow analysis, the demand for non-nullable references wouldn't be so great.

I know a fairly good C# programmer who agrees with you; he says that thanks to the flow analysis the C# compiler performs, the need for non-nullable references is not so strong.


> (Neither is a pure subset of the other: flow analysis works for all variables, while non-nullable references catch more complex logic errors. But there is a very significant overlap.)

I like how you can see things a little more clearly than other people (like me).
Flow analysis helps for all variables, but it's limited in scope. Non-nullable references are a program-wide contract: their effect extends to called functions, and they help avoid null tests inside those functions too.
Flow analysis is probably the more important of the two features, but I think having both is better; they can work in synergy.
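In code terms (hypothetical syntax along the lines proposed in this thread, not actual D):

  // The contract extends into called functions: the callee needs no
  // defensive test, because every call site must prove non-nullness.
  void draw(Shape s) {   // non-nullable by default under the proposal
      s.render();        // no "if (s is null)" check needed here
  }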

Bye,
bearophile
September 29, 2009
Derek Parnell wrote:
> On Mon, 28 Sep 2009 19:27:03 -0500, Andrei Alexandrescu wrote:
> 
>> language_fan wrote:
>>>   int min;
>>>
>>>   foreach(int value; list)
>>>     if (value < min) min = value;
>>>
>>> Oops, you forgot to define a flag variable or initialize to int.min
>> You mean int.max :o).
> 
>   if (list.length == 0)
>      throw( some exception); // An empty or null list has no minimum
>   int min = list[0];
>   foreach(int value; list[1..$])
>     if (value < min) min = value;
> 
> 
> I'm still surprised by Walter's stance.
> 
> For the purposes of this discussion...
> * Null only applies to the memory address portion of reference types and
> not to value types. The discussion is not about non-nullable value types.
> * There are two types of reference types:
>   (1) Those that can be initialized on declaration because the coder knows
> what to initialize them to; a.k.a. non-nullable. If the coder does not know
> what to initialize them to at declaration time, then either the design is
> wrong, the coder doesn't understand the algorithm or application, or it is
> truly a complex run-time decision.
>   (2) Those that aren't in set (1); a.k.a. nullable.
> * The standard declaration should imply non-nullable. And if not
> initialized the compiler should complain. This encourages protection, but
> does not guarantee it, of course.
> * To declare a nullable type, use a special syntax to denote that the coder
> is deliberately choosing to declare a nullable reference.
> * The compiler will prevent non-nullable types being simply set to null. As
> D is a systems language too, there will be rare cases that need to subvert
> this compiler protection, so there will need to be a method to explicitly
> set a non-nullable type to a null. The point is that such a method should
> be a visible warning beacon to maintenance coders.
> 
> Priority should be given to coders that prefer safe coding. If a coder, for
> whatever reason, chooses to use nullable references or initialize
> non-nullable reference to rubbish data, then the responsibility is on them
> to ensure safe applications. Safe coding practices should not be penalized.
> 
> The C/C++ programming language is inherently "unsafe" in this regard, and
> that is not news to anyone. The D programming language does not have to
> follow this paradigm.

But it doesn't have to follow the paranoid safety paradigm either. I wouldn't like having two reference types and casting between them when they're essentially the same, with one having a single value out of 4 billion possibilities that can't be set. It seems like a waste to me, especially since 3 billion of those possibilities will result in the same segfault crash as the one you're trying to make illegal with non-null types.

> I'm still not ready to use D for anything, but I watch it in hope.

I'm already using D quite a lot, and I don't find null vs non-null references all that meaningful. Like Walter said, you can just write your own non-null invariant.

Here's a very, very simple wrapper; it took 10 seconds to write:

struct NonNull(C) if (is(C == class)) {
	C payload;	// the wrapped, never-null class reference
	invariant() { assert(payload !is null); }	// checked around every public member call
	C opDot() { return payload; }	// forward member access to the wrapped reference
}
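Usage is just as simple (a minimal sketch; Widget is a hypothetical class):

	void process(NonNull!Widget w) {
		w.update();	// forwarded through opDot; the invariant has
				// already asserted the reference is not null
	}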

C++ has all sorts of pointer wrappers like this one; you don't see a smart-pointer feature in the C++ language itself for the simple reason that the library versions are widely used and safe. In fact, leaving the semantics of these pointers up to libraries allows any project to write its own custom ones, and quite a lot do.

It should be the same for D. I believe it's better to implement flow analysis and let the compiler warn you of uninitialized variables (which would catch most null dereferences, the rest being caught by NonNull!Object fields). The compiler could also provide better tools to build smart wrapper types on (like forcing initialization or preventing void initialization, heck, even providing a tuple of valid initializers) and let libraries write their own.
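For instance, a way to disable default construction would be enough to let a wrapper force initialization at the declaration site (a sketch; the @disable syntax here is hypothetical, and Widget again stands in for any class):

	struct NonNull(C) if (is(C == class)) {
		C payload;
		@disable this();	// no default construction: a value must be supplied
		this(C c) { assert(c !is null); payload = c; }
		C opDot() { return payload; }
	}

	//auto a = NonNull!Widget();		// error: default construction disabled
	auto b = NonNull!Widget(new Widget);	// forced to initialize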

Jeremie
September 29, 2009
bearophile wrote:
> Walter Bright:
> 
>> No, it is done with one indirection.<
> 
> If even Andrei, a quite intelligent person who has written big books on C++, can be wrong about such a basic thing, then I think there's a problem.
> 
> It would be good to create an HTML page that explains how some basic things in D are implemented in the front-end. Such a page could also contain box-and-arrow diagrams that show how structures and memory are organized for each of those data structures.
> 
> Such an HTML page would be useful both for ordinary programmers who want to understand what's under the hood and for people who may want to fix or modify the front-end.
> 
> Bye,
> bearophile

I agree; the ABI documentation on digitalmars.com is far from complete, and I had to learn a lot of it through trial and error. What was especially confusing was the interface reference vs. the interface info vs. the interface's classinfo vs. the referenced object, so I wrote an internal wrapper struct to make most of the casts go away:

struct Interface {
	// An interface reference points `offset` bytes into the object,
	// so stepping back by `offset` recovers the class instance.
	Object object() const {
		return cast(Object)(cast(void*)&this - interfaceinfo.offset);
	}

	// The interface reference is a pointer to a vtbl pointer, and the
	// vtbl's first entry is the InterfaceInfo for this interface.
	immutable(InterfaceInfo)* interfaceinfo() const {
		return **cast(immutable(InterfaceInfo)***)&this;
	}

	// The .classinfo of the interface itself (not of the containing class).
	immutable(ClassInfo) classinfo() const {
		return interfaceinfo.classinfo;
	}
}

immutable struct InterfaceInfo {
	ClassInfo	classinfo;	// .classinfo of the interface itself
	void*[]		vtbl;		// vtbl for this interface's methods
	ptrdiff_t	offset;		// offset of the interface within the object
}
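For example (a hypothetical sketch assuming the layout above, with I and C standing in for any interface and implementing class):

	interface I {}
	class C : I {}

	I i = new C;
	auto w = cast(Interface*)cast(void*)i;	// reinterpret the interface reference
	Object o = w.object();			// back to the class instance
	assert(o is cast(Object)i);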

These two made implementing D internals a whole lot easier! I think only InterfaceInfo is in druntime (and it's confusingly named Interface in there).
September 29, 2009
Yigal Chripun wrote:
> On 29/09/2009 00:31, Nick Sabalausky wrote:
>> "Yigal Chripun"<yigal100@gmail.com>  wrote in message
>> news:h9r37i$tgl$1@digitalmars.com...
>>>
>>>>
>>>> These aren't just marginal performance gains, they can easily be up to
>>>> 15-30% improvements, sometimes 50% and more. If this is too complex or
>>>> the risk is too high for you then don't use a systems language :)
>>>
>>> your approach makes sense if you are implementing, say, a calculator.
>>> It doesn't scale to larger projects. Even C++ has overhead compared to
>>> assembly, yet you are writing performance-critical code in C++, right?
>>>
>>
>> It's *most* important on larger projects, because it's only on big systems
>> where small inefficiencies actually add up to a large performance drain.
>>
>> Try writing a competitive real-time graphics renderer or physics simulator
>> (especially for a game console where you're severely limited in your choice
>> of compiler - if you even have a choice), or something like Pixar's renderer
>> without *ever* diving into asm, or at least low-level "unsafe" code. And
>> when it inevitably hits some missing optimization in the compiler and runs
>> like shit, try explaining to the dev lead why it's better to beg the
>> compiler vender to add the optimization you want and wait around hoping they
>> finally do so, instead of just throwing in that inner optimization in the
>> meantime.
>>
>> You can still leave the safe/portable version in there for platforms for
>> which you haven't provided a hand-optimization. And unless you didn't know
>> what you were doing, that inner optimization will still be small and highly
>> isolated. And since it's so small and isolated, not only can you still throw
>> in tests for it, but it's not as hard as you might think to verify
>> correctness. And if/when your compiler finally does get the optimization you
>> want, you can just rip out the hand-optimization and revert back to that
>> "safe/portable" version that you had still left in anyway as a fallback.
>>
>>
> 
> I think you took my post to an extreme; I actually do agree with the above description.
> 
> what you just said was basically:
> 1. write portable/safe version
> 2. profile to find bottlenecks that the tools can't optimize and optimize those only while still keeping the portable version.
> 
> My objection was to what I feel was Jeremie's description of writing code from the get-go in a low-level, hand-optimized way instead of what you described in your own words:

That wasn't what I said. I don't hand-optimize everything at a low level; I do profiling first. Only a few parts *known* to me to require optimization (e.g. matrix multiplication) are written in SSE from the beginning, with a high-level fallback; there just happen to be a lot of them :)

What I argued against was your view that today's software is too big and complex to bother optimizing.
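For illustration, the pattern looks roughly like this (a hypothetical sketch, not code from my project; the inline-asm body is illustrative only):

	// Dot product of two 4-float vectors: hand-optimized SSE path
	// with a portable fallback kept alongside it.
	float dot4(const(float)* a, const(float)* b) {
		version (D_InlineAsm_X86) {
			float result;
			asm {
				mov EAX, a;
				mov ECX, b;
				movups XMM0, [EAX];	// load a[0..4]
				movups XMM1, [ECX];	// load b[0..4]
				mulps XMM0, XMM1;	// elementwise products
				movhlps XMM1, XMM0;	// bring the high pair down
				addps XMM0, XMM1;
				movaps XMM1, XMM0;
				shufps XMM1, XMM1, 1;	// element 1 into slot 0
				addss XMM0, XMM1;	// horizontal sum complete
				movss result, XMM0;
			}
			return result;
		} else {
			// Portable version, also the reference for testing.
			float result = 0;
			foreach (i; 0 .. 4)
				result += a[i] * b[i];
			return result;
		}
	}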

>> And unless you didn't know
>> what you were doing, that inner optimization will still be small and highly
>> isolated.
September 29, 2009
bearophile wrote:
> Don:
> 
>> Maybe if D had better flow analysis, the demand for non-nullable references wouldn't be so great.
> 
> I know a fairly good C# programmer who agrees with you; he says that thanks to the flow analysis the C# compiler performs, the need for non-nullable references is not so strong.

Which is what I said half a dozen times in this thread :)
Disclaimer: I have only read about C#, I haven't coded in it.

>> (Neither is a pure subset of the other: flow analysis works for all variables, while non-nullable references catch more complex logic errors. But there is a very significant overlap.)
> 
> I like how you can see things a little more clearly than other people (like me).
> Flow analysis helps for all variables, but it's limited in scope. Non-nullable references are a program-wide contract: their effect extends to called functions, and they help avoid null tests inside those functions too.
> Flow analysis is probably the more important of the two features, but I think having both is better; they can work in synergy.
> 
> Bye,
> bearophile

Flow analysis must be implemented by the compiler; non-null references can be enforced by a runtime wrapper (much like smart pointers enforce addref and release calls in C++; you don't see a smart pointer feature being moved into the language spec even if half the C++ community would drool over the idea).

The best thing about flow analysis is that we could take away the whole default-initializer idea, since it was made to make uninitialized-variable errors easy to pinpoint in the first place, not as a convenience to turn "int a = 0;" into "int a;".

Besides, DMD must have some basic flow analysis already, since it notices when a code path does not return; it just needs to be extended to cover uninitialized variables.
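A sketch of the kind of diagnostic this extension would enable (the error message is hypothetical):

	int f(bool cond) {
		int a;		// no default initializer needed under flow analysis
		if (cond)
			a = 1;
		return a;	// error: 'a' may be used before being assigned
	}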
September 29, 2009
Jeremie Pelletier:

> Flow analysis must be implemented by the compiler; non-null references can be enforced by a runtime wrapper

The point of non-null references lies precisely in their compile-time enforced constraints.


> Besides, DMD must have some basic flow analysis already, since it notices when a code path does not return; it just needs to be extended to cover uninitialized variables.

You have probably missed those threads, but flow analysis in D was discussed a lot in the past. I don't think Walter wants to implement it. If you help implement it, showing that it can be done, he may change his mind.

Bye,
bearophile
September 29, 2009
bearophile wrote:
> Jeremie Pelletier:
> 
>> Flow analysis must be implemented by the compiler; non-null references can be enforced by a runtime wrapper
> 
> The point of non-null references lies precisely in their compile-time enforced constraints.
> 
> 
>> Besides, DMD must have some basic flow analysis already, since it notices when a code path does not return; it just needs to be extended to cover uninitialized variables.
> 
> You have probably missed those threads, but flow analysis in D was discussed a lot in the past. I don't think Walter wants to implement it. If you help implement it, showing that it can be done, he may change his mind.
> 
> Bye,
> bearophile

I'll try to hack at it in a few weeks when I get some free time. It's definitely high on my D wishlist.

Jeremie
September 29, 2009
On Mon, 28 Sep 2009 21:43:20 -0400, Jesse Phillips <jessekphillips@gmail.com> wrote:

> On Mon, 28 Sep 2009 16:01:10 -0400, Steven Schveighoffer wrote:
>
>> On Mon, 28 Sep 2009 15:35:07 -0400, Jesse Phillips
>> <jesse.k.phillips+d@gmail.com> wrote:
>>
>>> language_fan Wrote:
>>>
>>>> Have you ever used functional languages? When you develop in Haskell
>>>> or SML, how often you feel there is a good change something will be
>>>> initialized to the wrong value? Can you show some statistics that show
>>>> how unsafe this practice is?
>>>
>>> So isn't that the question? Does/can "default" (by human or machine)
>>> initialization create an incorrect state? If it does, do we continue to
>>> work as if nothing was wrong or crash? I don't know how often the
>>> initialization would be incorrect, but I don't think Walter is
>>> concerned with its frequency, but with the fact that it is possible.
>>
>> It creates an invalid, non-compiling program.
>
> No it doesn't, I'm not referring to null as the invalid state.
>
> float a;
>
> In this program it is invalid for 'a' to equal zero. If the compiler
> complains that it is not initialized, the programmer could fulfill the
> requirements.

I am not arguing for floats (or any value types) to be required to be initialized.

>
> float a = 0;
>
> Hopefully the programmer knows that it shouldn't be 0, but a correction
> like this is still possible, the compiler won't complain and the program
> won't crash. Depending on what 'a' is controlling this could be very bad.
>
> I'm really not arguing either way, I'm trying to make it clear since no
> one seems to be getting Walters positions.

I get his arguments, but I think they are based on a non-analogous situation: his experience with compilers or corporate rules requiring what you were saying -- initializing all variables.  We don't want that; we just want the developer to clarify "this variable is initialized" or "this variable is ok to be uninitialized".

> BTW, what is it with people writing
>
> SomeObject foo;
>
> If they believe the compiler should enforce explicit initialization? If
> you think an object should always be initialized at declaration, don't
> write a statement that only declares it, and don't set a reference to null.

It's more complicated than that.  For example, you *have* to write this for objects that are a part of aggregates:

class SomeOtherObject
{
  SomeObject foo; // can't initialize here, because you need to use the heap, and the compiler only allows CTFE initialization.

  this()
  {
     foo = new SomeObject(); // here is where the initialization sits.
  }
}

This is ok, but what if the initialization is buried, or you add another variable to a large class and forget to add the initializer to the constructor?

And there *are* cases where you *don't* want to initialize, that should also be possible:

SomeObject? foo;

If this weren't part of the proposal, I'd agree with Walter 100%. But it gives the lazy programmer an easy way to fall back to the current behavior (easier than building some dummy object), so given the lazy nature of said programmer, they are more likely to do this than to assign a dummy value.
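Putting the pieces together, the proposal amounts to something like this (hypothetical syntax, not actual D):

  SomeObject a = new SomeObject(); // default: non-nullable, must be initialized
  SomeObject? b;                   // nullable only by explicit opt-in, defaults to null

  void use(SomeObject o) {}        // may assume o is never null

  use(a);          // fine
  use(b);          // error: b could be null
  if (b !is null)
     use(b);       // ok after the null test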

-Steve
September 29, 2009
== Quote from Jeremie Pelletier (jeremiep@gmail.com)'s article
> Andrei Alexandrescu wrote:
> > Jeremie Pelletier wrote:
> >>> Is this Linux specific? what about other *nix systems, like BSD and solaris?
> >>
> >> Signal handlers are standard on most *nix platforms since they're part of the POSIX C standard libraries; maybe some platforms will require special handling, but nothing impossible to do.
> >
> > Let me write a message on behalf of Sean Kelly. He wrote this to Walter and myself this morning; I suggested he post it himself, but he is probably away from email for a short while. Hopefully the community will find a solution to the issue he's raising. Let me post this:
> >
> > ===================
> > Sean Kelly wrote:
> >
> > There's one minor problem with his code.  It's not safe to throw an exception from a signal handler.  Here's a quote from the POSIX spec at opengroup.org:
> >
> > "In order to prevent errors arising from interrupting non-reentrant
> > function calls, applications should protect calls to these functions
> > either by blocking the appropriate signals or through the use of some
> > programmatic semaphore (see semget() , sem_init() , sem_open() , and so
> > on). Note in particular that even the "safe" functions may modify errno;
> > the signal-catching function, if not executing as an independent thread,
> > may want to save and restore its value. Naturally, the same principles
> > apply to the reentrancy of application routines and asynchronous data
> > access. Note that longjmp() and siglongjmp() are not in the list of
> > reentrant functions. This is because the code executing after longjmp()
> > and siglongjmp() can call any unsafe functions with the same danger as
> > calling those unsafe functions directly from the signal handler.
> > Applications that use longjmp() and siglongjmp() from within signal
> > handlers require rigorous protection in order to be portable."
> >
> > If this were an acceptable approach it would have been in druntime ages
> > ago :-)
> > ===================
>
> Yes, but the segfault signal handler is not meant to let you design code that can live with these exceptions; it's just a feature to allow segfaults to be sent to the crash handler to get a backtrace dump. Even on Windows, while you can recover from access violations, it's generally a bad idea to allow bugs to be turned into features.

I don't think it's fair to compare Windows to Unix here because, as far as
I know, Windows (i.e. Win32, etc.) was built with exceptions in mind (thanks to
SEH), while Unix was not.  So while the Windows kernel may theoretically be fine
with an exception being thrown from within kernel code, this isn't true of Unix.

It's true that as long as only Errors are thrown (and thus that the app intends
to terminate), things aren't as bad as they could be.  Worst case, some mutex
in libc is left locked or in some weird state and code executed during stack
unwinding or when trying to report the error causes the app to hang instead
of terminate.  And this risk is somewhat mitigated because I'd expect most
of these errors to occur within user code anyway.

One thing I'm not entirely sure about is whether the signal handler will always
have a valid, C-style call stack tracing back into user code.  These errors are
triggered by hardware, and I really don't know what kind of tricks are common
at that level of OS code.  longjmp() doesn't have this problem because it doesn't
care about the call stack--it just swaps some registers and executes a JMP.  I
don't suppose anyone here knows more about the feasibility of throwing
exceptions from signal handlers at all?  I'll ask around some OS groups and
see what people say.
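
For reference, the technique being debated looks roughly like this (a sketch only; whether
the throw is actually safe is exactly the open question above):

	import core.sys.posix.signal;

	extern(C) void segfaultHandler(int sig) {
		// Throwing here relies on the interrupted frame having a call
		// stack the unwinder can walk; POSIX guarantees nothing of the sort.
		throw new Error("segmentation fault");
	}

	void installHandler() {
		sigaction_t sa;
		sa.sa_handler = &segfaultHandler;
		sigaction(SIGSEGV, &sa, null);
	}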