Thread overview
Documentation bugs
Mar 08, 2006
Walter Bright
Mar 09, 2006
Thomas Kuehne
Mar 10, 2006
Thomas Kuehne
Strange char[] behaviour, Re: Documentation bugs
Re: Strange char[] behaviour
Mar 20, 2006
Thomas Kuehne
March 08, 2006
1. The following anchor does not work. There is one # too much.

http://www.digitalmars.com/d/faq.html#real

2. There's no mention about class templates
(e.g. class foo(type) { ... }) on

http://www.digitalmars.com/d/class.html

3. There aren't any examples of classes that inherit from super classes / interfaces with protection attributes. In fact I don't even know how and why they should work (maybe the way C++ uses them?). E.g.

 class foo : private bar { ... }


Walter, please don't go the c++ way here. That would make the current interfaces totally useless. E.g.

 interface foo { void fn(); }
 class bar : private foo { private void fn() { ..implementation.. } }
 foo a = new bar();
 a.fn();

would become illegal.

4. One other question - what are the hard limits of DMD? For example, how many objects of a class can be instantiated at most? We have made a suffix tree -algorithm that generates a huge amount of objects. I think the compiler introduces some limits, since the algorithm segfaults only after about >300 identical successful runs - the amount of successful identical runs before segfault decreases when the tree depth increases. It segfaults on the following code:

class Node {
  Node[char] children;
  char[] label;

  void fn(char[] s) {
    assert(s !is null);
    assert(s.length > 0);
    assert(s[0] in children);
    auto a = children[s[0]];
    assert(a !is null);
    char[] l = a.label;  // segfault exactly here
    assert(l != null);
    ...
  }
}

The segfault shouldn't be possible since all objects all guaranteed to be properly allocated from the heap. We don't use any explicit memory management. I can post the whole source if needs be.

-- 
Jari-Matti
March 08, 2006
"Jari-Matti Mäkelä" <jmjmak@utu.fi.invalid> wrote in message news:dul6sh$30eu$1@digitaldaemon.com...
> 4. One other question - what are the hard limits of DMD?

There aren't any, other than identifier length limits.

> For example,
> how many objects of a class can be instantiated at most?

Limited by memory the os can provide.

> We have made a
> suffix tree -algorithm that generates a huge amount of objects. I think
> the compiler introduces some limits, since the algorithm segfaults only
> after about >300 identical successful runs - the amount of successful
> identical runs before segfault decreases when the tree depth increases.
> It segfaults on the following code:
>
> class Node {
>  Node[char] children;
>  char[] label;
>
>  void fn(char[] s) {
>    assert(s !is null);
>    assert(s.length > 0);
>    assert(s[0] in children);
>    auto a = children[s[0]];
>    assert(a !is null);
>    char[] l = a.label;  // segfault exactly here
>    assert(l != null);
>    ...
>  }
> }
>
> The segfault shouldn't be possible since all objects all guaranteed to be properly allocated from the heap. We don't use any explicit memory management. I can post the whole source if needs be.

I'd start adding in more asserts to track it down.


March 09, 2006
Jari-Matti Mäkelä schrieb am 2006-03-08:

[snip]

> 4. One other question - what are the hard limits of DMD? For example, how many objects of a class can be instantiated at most? We have made a suffix tree -algorithm that generates a huge amount of objects. I think the compiler introduces some limits, since the algorithm segfaults only after about >300 identical successful runs - the amount of successful identical runs before segfault decreases when the tree depth increases. It segfaults on the following code:
>
> class Node {
>   Node[char] children;
>   char[] label;
>
>   void fn(char[] s) {
>     assert(s !is null);
>     assert(s.length > 0);
>     assert(s[0] in children);
>     auto a = children[s[0]];
>     assert(a !is null);
>     char[] l = a.label;  // segfault exactly here
>     assert(l != null);
>     ...
>   }
> }
>
> The segfault shouldn't be possible since all objects all guaranteed to be properly allocated from the heap. We don't use any explicit memory management. I can post the whole source if needs be.

Are you sure you aren't hitting the memory limit of your system?

If so, it's a bug reported about one year ago:
(no exection is thrown if the system is out of memory)
http://dstress.kuehne.cn/run/OutOfMemory_03.d

Thomas


March 09, 2006
Thomas Kuehne wrote:
> Jari-Matti Mýkelý schrieb am 2006-03-08:
> 
> [snip]
> 
>>> 4. One other question - what are the hard limits of DMD? For example, how many objects of a class can be instantiated at most? We have made a suffix tree -algorithm that generates a huge amount of objects. I think the compiler introduces some limits, since the algorithm segfaults only after about >300 identical successful runs - the amount of successful identical runs before segfault decreases when the tree depth increases. It segfaults on the following code:
>>>
>>> class Node {
>>>   Node[char] children;
>>>   char[] label;
>>>
>>>   void fn(char[] s) {
>>>     assert(s !is null);
>>>     assert(s.length > 0);
>>>     assert(s[0] in children);
>>>     auto a = children[s[0]];
>>>     assert(a !is null);
>>>     char[] l = a.label;  // segfault exactly here
>>>     assert(l != null);
>>>     ...
>>>   }
>>> }
>>>
>>> The segfault shouldn't be possible since all objects all guaranteed to be properly allocated from the heap. We don't use any explicit memory management. I can post the whole source if needs be.
> 
> Are you sure you aren't hitting the memory limit of your system?

No, I think I'm not. The program is consuming < 50 MB of memory and I have > 2 GB of physical RAM + swap. It takes a while to fill all memory (especially the swap partition), but this algorithm segfaults already in 1-2 seconds. I have another D program that is trying to fill up the memory as fast as possible with small linked lists. It's only able to occupy ~300 MB in 2 seconds.

I'm not sure, but I think the problem is in the slicing code (most
probably), in associative arrays or in the code that creates new
instances of classes. I'm working on a minimal test case. It's ~200 LOC now.

-- 
Jari-Matti
March 10, 2006
Jari-Matti Mäkelä wrote:
> Thomas Kuehne wrote:
>> Jari-Matti Mýkelý schrieb am 2006-03-08:
>>
>> [snip]
>>
>>>> 4. One other question - what are the hard limits of DMD? For example, how many objects of a class can be instantiated at most? We have made a suffix tree -algorithm that generates a huge amount of objects. I think the compiler introduces some limits, since the algorithm segfaults only after about >300 identical successful runs - the amount of successful identical runs before segfault decreases when the tree depth increases. It segfaults on the following code:
>>>>
>>>> class Node {
>>>>   Node[char] children;
>>>>   char[] label;
>>>>
>>>>   void fn(char[] s) {
>>>>     assert(s !is null);
>>>>     assert(s.length > 0);
>>>>     assert(s[0] in children);
>>>>     auto a = children[s[0]];
>>>>     assert(a !is null);
>>>>     char[] l = a.label;  // segfault exactly here
>>>>     assert(l != null);
>>>>     ...
>>>>   }
>>>> }
>>>>
>>>> The segfault shouldn't be possible since all objects all guaranteed to be properly allocated from the heap. We don't use any explicit memory management. I can post the whole source if needs be.
>> Are you sure you aren't hitting the memory limit of your system?
> 
> No, I think I'm not. The program is consuming < 50 MB of memory and I have > 2 GB of physical RAM + swap. It takes a while to fill all memory (especially the swap partition), but this algorithm segfaults already in 1-2 seconds. I have another D program that is trying to fill up the memory as fast as possible with small linked lists. It's only able to occupy ~300 MB in 2 seconds.
> 
> I'm not sure, but I think the problem is in the slicing code (most
> probably), in associative arrays or in the code that creates new
> instances of classes. I'm working on a minimal test case. It's ~200 LOC now.
> 
Well, one thing I found out is that the algorithm segfaults exactly when the input strings for the suffix tree are long enough (>180 chars) or the algorithm is repeated several hundreds of times. It works perfectly with shorter input strings. If I try to writefln the segfaulting node contents, the runtime reports that the node labels are invalid unicode - this is very strange since the input strings consist of characters a-f, which are represented using 8-bit values and should remain valid throughout the slicing processes.

Ok, then I tried to change the data type from char[] to dchar[] and now I'm not encountering any problems anymore. Maybe there are some problems with the implementation of slices on the char-type? One other thing I noticed was that this segfaulted sometimes too:

char[] teststring = "testing";
doAlgorithm(teststring);
writefln("%s", teststring); // segfault here.

the algorithm doesn't write to the string, change its length nor does it free/delete it in any way. Just plain slicing and reference passing. Very weird.

-- 
Jari-Matti
March 10, 2006
Jari-Matti Mäkelä schrieb am 2006-03-10:
> Jari-Matti Mäkelä wrote:
>> Thomas Kuehne wrote:
>>> Jari-Matti Mýkelý schrieb am 2006-03-08:
>>>
>>> [snip]
>>>
>>>>> 4. One other question - what are the hard limits of DMD? For example, how many objects of a class can be instantiated at most? We have made a suffix tree -algorithm that generates a huge amount of objects. I think the compiler introduces some limits, since the algorithm segfaults only after about >300 identical successful runs - the amount of successful identical runs before segfault decreases when the tree depth increases. It segfaults on the following code:
>>>>>
>>>>> class Node {
>>>>>   Node[char] children;
>>>>>   char[] label;
>>>>>
>>>>>   void fn(char[] s) {
>>>>>     assert(s !is null);
>>>>>     assert(s.length > 0);
>>>>>     assert(s[0] in children);
>>>>>     auto a = children[s[0]];
>>>>>     assert(a !is null);
>>>>>     char[] l = a.label;  // segfault exactly here
>>>>>     assert(l != null);
>>>>>     ...
>>>>>   }
>>>>> }
>>>>>
>>>>> The segfault shouldn't be possible since all objects all guaranteed to be properly allocated from the heap. We don't use any explicit memory management. I can post the whole source if needs be.
>>> Are you sure you aren't hitting the memory limit of your system?
>> 
>> No, I think I'm not. The program is consuming < 50 MB of memory and I have > 2 GB of physical RAM + swap. It takes a while to fill all memory (especially the swap partition), but this algorithm segfaults already in 1-2 seconds. I have another D program that is trying to fill up the memory as fast as possible with small linked lists. It's only able to occupy ~300 MB in 2 seconds.
>> 
>> I'm not sure, but I think the problem is in the slicing code (most
>> probably), in associative arrays or in the code that creates new
>> instances of classes. I'm working on a minimal test case. It's ~200 LOC now.
>> 
> Well, one thing I found out is that the algorithm segfaults exactly when the input strings for the suffix tree are long enough (>180 chars) or the algorithm is repeated several hundreds of times. It works perfectly with shorter input strings. If I try to writefln the segfaulting node contents, the runtime reports that the node labels are invalid unicode - this is very strange since the input strings consist of characters a-f, which are represented using 8-bit values and should remain valid throughout the slicing processes.
>
> Ok, then I tried to change the data type from char[] to dchar[] and now I'm not encountering any problems anymore. Maybe there are some problems with the implementation of slices on the char-type? One other thing I noticed was that this segfaulted sometimes too:
>
> char[] teststring = "testing";
> doAlgorithm(teststring);
> writefln("%s", teststring); // segfault here.
>
> the algorithm doesn't write to the string, change its length nor does it free/delete it in any way. Just plain slicing and reference passing. Very weird.

Please replace
> doAlgorithm(teststring);
with
> doAlgorithm(teststring.dup);

I know the behaviour shouldn't change as "the algorithm doesn't write to the string", but ...

Thomas


March 10, 2006
Thomas Kuehne wrote:
> Jari-Matti Mýkelý schrieb am 2006-03-10:
>>> char[] teststring = "testing";
>>> doAlgorithm(teststring);
>>> writefln("%s", teststring); // segfault here.
>>>
>>> the algorithm doesn't write to the string, change its length nor does it free/delete it in any way. Just plain slicing and reference passing. Very weird.
> 
> Please replace
>>> doAlgorithm(teststring);
> with
>>> doAlgorithm(teststring.dup);
> 
> I know the behaviour shouldn't change as "the algorithm doesn't write to the string", but ...

I tried that one too. Didn't help in any way. The only assignment-operations I have are the following:

  class Node {
    char[] label;
    this(char[] foo) { label = foo.dup; }
    void shorten(int c) { label = label[c..$].dup; }
  }

Somehow I though that the CoW prevents the misuse of strings. The dchar-version works perfectly without .dup-cloning on any arbitrary long input. The char-version doesn't work, if too many iterations/nodes are created, but works perfectly when run once with short input strings. The maximum input string length is then a constant. I guess it might be valuable to end up with a short test case here. I'll try my best to create one.

-- 
Jari-Matti
March 14, 2006
Jari-Matti Mäkelä wrote:
> Somehow I though that the CoW prevents the misuse of strings. The dchar-version works perfectly without .dup-cloning on any arbitrary long input. The char-version doesn't work, if too many iterations/nodes are created, but works perfectly when run once with short input strings. The maximum input string length is then a constant. I guess it might be valuable to end up with a short test case here. I'll try my best to create one.

Actually the dchar-version doesn't work perfectly either. I tried to use
the code coverage (dmd -cov) and it "failed": (as you can see, it
shouldn't be even possible to run those commented lines.)

       |import std.stdio, suffixnode, abstractsearcher;
       |
       |class SuffixTreeSearcher(T) : AbstractSearcher!(T) {
       |    T[] catenatedStrings;
       |    T[] sentinels;
       |    alias SuffixNode!(T) GenericNode;
       |    GenericNode root;
    130|    this(T[][] strings, T[] sentinels ...)
       |    {
      1|        this.sentinels = sentinels;
      1|        catenatedStrings = strings[0]~sentinels[0]~
strings[1]~sentinels[1];
       |    }
       |    public T[] searchLCS() {
      1|        root = new GenericNode();
    206|        for (int i = 0; i < catenatedStrings.length; ++i) {
    185|            root.addSuffix(catenatedStrings[i..$]);
       |        }
     84|        return null;
    388|        //return root.mark(sentinels);
       |    }
     83|
       |    char[] name() {
      1|        return "Suffix tree";
       |    }
       |}
suffixtreesearcher.d is 100% covered

I works as it should only when I comment the addSuffix-line:

       |import std.stdio, suffixnode, abstractsearcher;
       |
       |class SuffixTreeSearcher(T) : AbstractSearcher!(T) {
       |    T[] catenatedStrings;
       |    T[] sentinels;
       |    alias SuffixNode!(T) GenericNode;
       |    GenericNode root;
       |    this(T[][] strings, T[] sentinels ...)
       |    {
      1|        this.sentinels = sentinels;
      1|        catenatedStrings = strings[0]~sentinels[0]~
strings[1]~sentinels[1];
       |    }
       |    public T[] searchLCS() {
      1|        root = new GenericNode();
    206|        for (int i = 0; i < catenatedStrings.length; ++i) {
       |//            root.addSuffix(catenatedStrings[i..$]);
       |        }
      1|        return null;
       |        //return root.mark(sentinels);
       |    }
       |
       |    char[] name() {
      1|        return "Suffix tree";
       |    }
       |}
suffixtreesearcher.d is 100% covered

Well, the only think I can think of is that the algorithm truly corrupts the runtime code by using array slicing. I also noticed that although it works fine without -cov, running it >=2 times segfaults with -cov.

-- 
Jari-Matti
March 20, 2006
Jari-Matti Mäkelä schrieb am 2006-03-14:
> Jari-Matti Mäkelä wrote:
>> Somehow I though that the CoW prevents the misuse of strings. The dchar-version works perfectly without .dup-cloning on any arbitrary long input. The char-version doesn't work, if too many iterations/nodes are created, but works perfectly when run once with short input strings. The maximum input string length is then a constant. I guess it might be valuable to end up with a short test case here. I'll try my best to create one.
>
> Actually the dchar-version doesn't work perfectly either. I tried to use
> the code coverage (dmd -cov) and it "failed": (as you can see, it
> shouldn't be even possible to run those commented lines.)
>
>        |import std.stdio, suffixnode, abstractsearcher;
>        |
>        |class SuffixTreeSearcher(T) : AbstractSearcher!(T) {
>        |    T[] catenatedStrings;
>        |    T[] sentinels;
>        |    alias SuffixNode!(T) GenericNode;
>        |    GenericNode root;
>     130|    this(T[][] strings, T[] sentinels ...)
>        |    {
>       1|        this.sentinels = sentinels;
>       1|        catenatedStrings = strings[0]~sentinels[0]~
> strings[1]~sentinels[1];
>        |    }
>        |    public T[] searchLCS() {
>       1|        root = new GenericNode();
>     206|        for (int i = 0; i < catenatedStrings.length; ++i) {
>     185|            root.addSuffix(catenatedStrings[i..$]);
>        |        }
>      84|        return null;
>     388|        //return root.mark(sentinels);
>        |    }
>      83|
>        |    char[] name() {
>       1|        return "Suffix tree";
>        |    }
>        |}
> suffixtreesearcher.d is 100% covered
>
> I works as it should only when I comment the addSuffix-line:
>
>        |import std.stdio, suffixnode, abstractsearcher;
>        |
>        |class SuffixTreeSearcher(T) : AbstractSearcher!(T) {
>        |    T[] catenatedStrings;
>        |    T[] sentinels;
>        |    alias SuffixNode!(T) GenericNode;
>        |    GenericNode root;
>        |    this(T[][] strings, T[] sentinels ...)
>        |    {
>       1|        this.sentinels = sentinels;
>       1|        catenatedStrings = strings[0]~sentinels[0]~
> strings[1]~sentinels[1];
>        |    }
>        |    public T[] searchLCS() {
>       1|        root = new GenericNode();
>     206|        for (int i = 0; i < catenatedStrings.length; ++i) {
>        |//            root.addSuffix(catenatedStrings[i..$]);
>        |        }
>       1|        return null;
>        |        //return root.mark(sentinels);
>        |    }
>        |
>        |    char[] name() {
>       1|        return "Suffix tree";
>        |    }
>        |}
> suffixtreesearcher.d is 100% covered
>
> Well, the only think I can think of is that the algorithm truly corrupts the runtime code by using array slicing. I also noticed that although it works fine without -cov, running it >=2 times segfaults with -cov.

The code above isn't complete thus makes testing it a bit difficult. As a start, let's try to find out the real coverage numbers:

replace

>  for (int i = 0; i < catenatedStrings.length; ++i) {
>      root.addSuffix(catenatedStrings[i..$]);
>  }

with something like

>  writefln("LOOP_PRE");
>  for (int i = 0; writefln("LOOP_CHECK"), i < catenatedStrings.length; ++i, writefln("LOOP_INC")) {
>      writefln("LOOP");
>      root.addSuffix(catenatedStrings[i..$]);
>  }
>  writefln("LOOP_POST");

That isn't the kind of code I usually want to touch, but since the RT (and maybe the line numbers in the debug info) might be corrupted...

Thomas


March 21, 2006
Thomas Kuehne wrote:
> The code above isn't complete thus makes testing it a bit difficult. As a start, let's try to find out the real coverage numbers:
> 
> replace
> 
>>>  for (int i = 0; i < catenatedStrings.length; ++i) {
>>>      root.addSuffix(catenatedStrings[i..$]);
>>>  }
> 
> with something like
> 
>>>  writefln("LOOP_PRE");
>>>  for (int i = 0; writefln("LOOP_CHECK"), i < catenatedStrings.length; ++i, writefln("LOOP_INC")) {
>>>      writefln("LOOP");
>>>      root.addSuffix(catenatedStrings[i..$]);
>>>  }
>>>  writefln("LOOP_POST");
> 
> That isn't the kind of code I usually want to touch, but since the RT (and maybe the line numbers in the debug info) might be corrupted...

I'm really grateful that you want to help here. Ok, here are the results:

First I commented the original loop and copy-pasted it beneath the original:

/++
  for (int i = 0; i < catenatedStrings.length; ++i) {
      root.addSuffix(catenatedStrings[i..$]);
  }
++/

  for (int i = 0; i < catenatedStrings.length; ++i) {
      root.addSuffix(catenatedStrings[i..$]);
  }

Guess what, the runtime code segfaults! Then I removed one line from the commented part:

/++
  for (int i = 0; i < catenatedStrings.length; ++i) {
      root.addSuffix(catenatedStrings[i..$]);
++/

  for (int i = 0; i < catenatedStrings.length; ++i) {
      root.addSuffix(catenatedStrings[i..$]);
  }

Still segfaults. Then yet another line:

/++
  for (int i = 0; i < catenatedStrings.length; ++i) {
++/

  for (int i = 0; i < catenatedStrings.length; ++i) {
      root.addSuffix(catenatedStrings[i..$]);
  }

Now it "works"! (But the coverage info is still wrong)

Then I tried the writefln-method:

<snip>
       |    public T[] searchLCS() {
      2|        root = new GenericNode();
       |
      2|        writefln("Pre loop");
    652|        for (int i = 0; writefln("Loop check"), i <
catenatedStrings.length; ++i, writefln("Loop inc")) {
    324|                root.addSuffix(catenatedStrings[i..$]);
    312|        }
      2|        writefln("Post loop");
    312|
   1438|        return root.searchLCS(sentinels);
       |    }
    312|
<snip>

I wasn't able to split the for-statement fully, but here is the best non-segfaulting version:

<snip>
       |    public T[] searchLCS() {
      2|        root = new GenericNode();
       |
      2|        writefln("Pre loop");
    328|        for (int i = 0; writefln("Loop check"), i <
catenatedStrings.length;
    324|        ++i,
    334|        writefln("Loop inc")) {
    324|                root.addSuffix(catenatedStrings[i..$]);
    334|        }
   1550|        writefln("Post loop");
       |
    336|        return root.searchLCS(sentinels);
       |    }
<snip>

I haven't got much spare time to hunt this bug down, but I'll try to create a "minimal" test case this week. Maybe you can check it out then? There are at least two cases here, the one segfaults without code coverage on (char[]-strings) and the other one pops out when coverage is turned on. I've tested these with DMD 0.147, 0.148 & 0.149, but now I'm going to upgrade to 0.150.

-- 
Jari-Matti