Thread overview
Suggestion: class object init value change & new compiler warning message
Aug 23, 2006
Kristian
Aug 23, 2006
xs0
Aug 23, 2006
Chad J
Aug 24, 2006
Kristian
August 23, 2006
Suppose we have:

struct Stru {...}

class Obj {
    this() {...}
    this(int val) {...}

    bool f() {...}
}

void func() {
    Stru stru;  //valid structure, not NULL as 'obj3'
    Obj obj1 = new Obj;
    Obj obj2 = new Obj(10);
    Obj obj3;  //"obj3 = NULL;"
}

There will be a lot of "object = new SomeClass" in code around the globe, as in the example. I think it's a much more common situation than initializing an object with NULL. Hence objects should be initialized with new objects by default. That is, 'func()' would become:

void func() {
    Stru stru;
    Obj obj1;      //"obj1 = new Obj;"
    Obj obj2(10);  //"obj2 = new Obj(10);"
    Obj obj3 = NULL;
}

This reduces redundant code: a class name is no longer written twice per object.

The amount of typing required is also smaller, so there are fewer possible typos. (It's a pain in the #!%%# to write looogn class names over and over again.)

(This suggestion won't break old code.)


The compiler could also optimize the initialization values: if a value is assigned to an object before it's used, the object is initialized with NULL instead of a new object.

void func() {
    Stru stru;
    Obj obj1;      //"obj1 = new Obj;"
    Obj obj2(10);  //"obj2 = new Obj(10);"
    Obj obj3;      //"obj3 = NULL;"

    if(obj1.f()) {...}  //used without initialization

    obj3 = new Obj;  //assigned before used
}

That would be the optimal situation, of course: clean and short syntax.


But can the compiler know which init value to use?
The most basic check could work as follows:

1) Get the first occurrence of an object.
2) If it's on the left side of an assignment statement, it's inited with NULL.
3) Otherwise it's inited with a new object.
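
A minimal sketch of steps 1-3, assuming the compiler has already reduced a function body to the ordered list of places where one variable occurs. The names `Occurrence`, `InitValue` and `chooseInit` are hypothetical and exist only for this illustration; they are not part of any D compiler.

```d
// Hypothetical model: how a variable appears at one place in the source.
enum Occurrence { assignedTo, used }

// Hypothetical outcome of the heuristic.
enum InitValue { nullRef, newObject }

// Steps 1-3: look only at the first occurrence in source order.
InitValue chooseInit(const Occurrence[] occurrences)
{
    if (occurrences.length > 0 && occurrences[0] == Occurrence.assignedTo)
        return InitValue.nullRef;   // step 2: assigned first, so null is enough
    return InitValue.newObject;     // step 3: used first (or no information)
}

void main()
{
    // 'obj3' above: assigned before it is used.
    assert(chooseInit([Occurrence.assignedTo, Occurrence.used]) == InitValue.nullRef);
    // 'obj1' above: used right away.
    assert(chooseInit([Occurrence.used]) == InitValue.newObject);
}
```

The sketch deliberately ignores control flow; it only sees source order, which is exactly the limitation of the basic check.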

Even this simple solution gets a very large proportion of the cases right (IMHO). (If not, your coding style is weird. ;) )

You can make the checking smarter by taking pointers/references into account.

For example, MS Visual Studio works as follows:

void func() {
    int a;
    int *ptr;

    ptr = &a;   //'ptr' is the same thing as 'a' now
    *ptr = 1;   //this is equivalent to "a = 1;"
    a++;        //so, 'a' is not used here without initialization; no warning message
}

I think this kind of checking will get all the cases right (see below though).

If I am wrong here, it should not be a problem. If the compiler initializes a variable with a new object instead of NULL (which it must do whenever it's not sure), it's normally a very minor inefficiency, because it happens very seldom. If necessary (writing some complex code, eh?), you could force the right init value to be used:

    Obj obj = NULL;



And here comes the suggestion #2:

If a variable is used without initialization, the compiler should warn about it.



Of course, a variable can be set inside another function. For example:

class Obj {...}

void setObj(out Obj o) {
    o = new Obj;
}

void func() {
    Obj obj;  //"obj = new Obj;"

    setObj(obj);
}

Note that the compiler should not initialize 'obj' with NULL here, even though it would be possible in this particular situation. If an init value depended on how functions handle their parameters, it would be a mess.
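
For reference, a small sketch of what D's `out` storage class does with the argument: an `out` parameter is reset to the default value of its type (null for a class reference) when the function is entered. The helper name `wasNullOnEntry` is made up for this illustration.

```d
class Obj {}

// 'out' resets the argument to its default value on entry,
// before the function body runs.
bool wasNullOnEntry(out Obj o)
{
    bool sawNull = (o is null);
    o = new Obj;
    return sawNull;
}

void main()
{
    Obj obj = new Obj;            // constructed up front, as the proposal would do
    Obj original = obj;
    assert(wasNullOnEntry(obj));  // the callee never sees the first object
    assert(obj !is original);     // 'obj' now refers to the callee's object
}
```

So in the 'setObj()' case, an object constructed at the declaration would be replaced without ever being used.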


It's 'interesting' to notice that Visual Studio does not warn in the following cases:

void func2(int val) {
    ...
}
void func3(int &val) {  //in D: "void func3(inout int val)"
    ...
}
void func() {
    int a;

    func2(a);  //no warning even if 'a' is used without initialization!
    func3(a);  //no warning!
}

In addition, VS assumes that 'a' is set in 'func3()' because it _could_ modify it. For example:

void func() {
    int a;

    func3(a);

    a++;  //no warning!
}

All of these cases should cause a warning message.
That is:

void func() {
    int a;

    func2(a);  //warning
    func3(a);  //warning
    a++;       //warning
}

Of course, only one warning message per variable is enough.
August 23, 2006
Kristian wrote:
> Let we have:
> 
> struct Stru {...}
> 
> class Obj {
>     this() {...}
>     this(int val) {...}
> 
>     bool f() {...}
> }
> 
> void func() {
>     Stru stru;  //valid structure, not NULL as 'obj3'
>     Obj obj1 = new Obj;
>     Obj obj2 = new Obj(10);
>     Obj obj3;  //"obj3 = NULL;"
> }
> 
> There will be lot of "object = new SomeClass" in code around the globe, as in the example.

That's actually good, as one can easily see that a heap allocation is being done.

> I think it's a much more common situation than initializing an object with NULL. Hence objects should be initialized with new objects by default. That is, 'func()' would become:
> 
> void func() {
>     Stru stru;
>     Obj obj1;      //"obj1 = new Obj;"
>     Obj obj2(10);  //"obj2 = new Obj(10);"
>     Obj obj3 = NULL;
> }
> 
> This reduces redundant code: a class name is not anymore used twice per object.

> The amount of typing required is also smaller; fewer possible typos. (It's pain in the #!%%# to write looogn class names over and over again.)

auto obj = new LooognClassName(..);

> (This suggestion won't break old code.)

It could - if the constructor does something, it will be done more often than before (and just wildly allocating objects is not a very good idea either, as it's slow even if mem allocation is all that happens). And, there's the obvious:

Processor p;
while (auto d = readData()) {
   if (p is null) // construct p only if needed
      p = new Processor();
   p.process(d);
}
if (p) { // will always execute under your proposal
   ...
}
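
A runnable variant of the loop above, as a sketch only: `readData()` is replaced by a plain array, and the `constructed` and `processed` counters are additions made purely to make the behavior observable.

```d
class Processor
{
    static int constructed;   // counts constructor calls (illustration only)
    int processed;

    this() { ++constructed; }
    void process(int d) { ++processed; }
}

// Stand-in for the loop above: 'data' replaces readData().
Processor run(const int[] data)
{
    Processor p;                 // null under the current D rules
    foreach (d; data)
    {
        if (p is null)           // construct p only if needed
            p = new Processor();
        p.process(d);
    }
    return p;
}

void main()
{
    int[] noData;
    assert(run(noData) is null);          // no input: nothing was allocated
    assert(Processor.constructed == 0);

    auto p = run([1, 2, 3]);
    assert(p !is null && p.processed == 3);
    assert(Processor.constructed == 1);   // constructed exactly once
}
```

With implicit construction at the declaration, the first call would return a non-null object and the constructor would run even though there was nothing to process.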

> Compiler could also optimize the initialization values: if a value is assigned to an object before it's used, then it's initialized with NULL instead of a new object.
> [snip]
> But can the compiler know which init value to use?
> The very basic checking is done, of course, as follows:
> 
> 1) Get the first occurance of an object.
> 2) If it's on the left side of an assignment statement, it's inited with NULL.
> 3) Otherwise it's inited with a new object.

If only it were that simple; in reality, it's quite impossible to determine where an object will be first used (if at all). Which occurrence is first in the following code?

Foo foo;

if (..) {
   foo = ..;
} else {
   foo = ..;
}

> And here comes the suggestion #2:
> 
> If a variable is used without initialization, the compiler should warn about it.

This is practically impossible to get right (and it's really annoying when the compiler gets it wrong; trust me, I use Java, which tries to do it and is wrong far too often).
Instead, variables in D get initialized to useless values (or 0 for integers), so it's hard to miss usage of an uninitialized value (albeit only at runtime).
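
A short sketch of those default values, using only built-in types plus a throwaway class `Foo` (a name chosen for this illustration):

```d
import std.math : isNaN;

class Foo {}

void main()
{
    int i;        // integers start at 0
    double d;     // floating point starts as NaN
    char c;       // char starts at 0xFF, an invalid UTF-8 code unit
    Foo f;        // class references start as null
    int* p;       // pointers start as null

    assert(i == 0);
    assert(isNaN(d));
    assert(c == 0xFF);
    assert(f is null);
    assert(p is null);
}
```

NaN propagates through arithmetic and a null reference faults on member access, which is why a forgotten initialization tends to show up quickly at runtime.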


xs0
August 23, 2006
I disagree with this.

It allows an object with an invalid value to pass as valid while your program continues executing.  This makes it difficult to check for uninitialized values at runtime.  Checking for uninitialized values at compile time is quite difficult because people can write arbitrarily complex code, as xs0 pointed out.

Also, I usually want my references to start off in a known-invalid state and not allocate unnecessary objects.  If this were used, I would have to write '= null;' very, very frequently.  I suppose it's arguable that explicitly initializing things to null is a better practice.  I wonder how much current D code is actually written with explicit null initialization though.

In a way I wish structs were initialized to null, though I wonder how that would work with the whole value type idea.  Currently I feel that the wisest thing to do is exactly what D does now - distinguish between value and reference types, and give each their unique properties.
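
A sketch of the value/reference distinction as D has it today, reusing the `Stru` and `Obj` names from the original post with an added field `x` (an assumption made only so there is something to compare):

```d
struct Stru { int x; }
class Obj { int x; }

void main()
{
    Stru s;                  // value type: usable at once, fields default-initialized
    assert(s.x == 0);
    Stru t = s;              // copying a struct copies the data
    t.x = 5;
    assert(s.x == 0);        // the original is unaffected

    Obj o;                   // reference type: starts as null
    assert(o is null);
    o = new Obj;
    Obj same = o;            // copying a reference shares the object
    same.x = 5;
    assert(o.x == 5);

    Stru* ps;                // a pointer to a struct can be null
    assert(ps is null);
}
```

A struct variable has no "null" state of its own; the closest equivalent is a null pointer to a struct, as in the last two lines.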

August 24, 2006
xs0 and Chad had good points earlier in this thread. In short, they convinced me that the current D style of initializing class objects to null is better than the one I proposed.

Though I am surprised that a Java compiler gets the "variable being used without initialization" warnings wrong so often... With a C++ compiler there have been no incorrect warnings (missing warnings, yes). But objects are not reference variables in C++, so that must be it.


I came up with another suggestion concerning this issue. It can be read under "Suggestion: shortcut for 'new X'".