[dmd-internals] Rare and pernicious bug in string append
March 16, 2010
This bug ruined a couple of workdays for me. (I'm using dmd 2.042 beta.) I'd appreciate it very much if people who know the innards of string append could look into it at their earliest convenience. Currently the safe version is twice as slow as the fast (buggy) version, so I'm looking at 8 hours instead of 4 hours to complete an experiment against 5.75 million HTML files.

The bug is exceedingly rare. It occurs only once every few thousand HTML files. The failing file occurs after 28,000 files have been processed successfully.

The code may be further simplified, but not a lot. This is apparently a low-level bug because small changes in the input or the code make the bug manifest differently or not at all.

To reproduce: copy untag.d and data.html to an empty directory. Then compile untag:

$ dmd untag

To run untag without the bug, run:

./untag --bug=0

To run it with bug #1 related to string ~=, run:

./untag --bug=1

You will see:

Invalid UTF sequence: 255

To run it with bug #2 related to string ~, run:

./untag --bug=2

You will see:

Invalid UTF sequence: 252

The three modes of the program should have identical semantics. Byte values 255 and 252 are not present in the input file.
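
The claim that 255 and 252 are absent from the input can be checked independently of untag. The following is a minimal sketch, not part of untag.d: the helper names countByte and isValidUtf8 are hypothetical, and it assumes data.html sits in the current directory. Neither 0xFF nor 0xFC is a legal byte in current UTF-8, so if the file contains neither, the values in the error messages must have been introduced after the file was read.

```d
import std.file : read;
import std.stdio : writefln;
import std.utf : validate, UTFException;

/// Hypothetical helper: count how many times `value` occurs in `data`.
size_t countByte(const(ubyte)[] data, ubyte value)
{
    size_t n = 0;
    foreach (b; data)
        if (b == value) ++n;
    return n;
}

/// Hypothetical helper: true if `data` decodes cleanly as UTF-8.
bool isValidUtf8(const(ubyte)[] data)
{
    try
    {
        // std.utf.validate throws UTFException on the first bad sequence.
        validate(cast(const(char)[]) data);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Read raw bytes so that no decoding happens before the check.
    auto data = cast(const(ubyte)[]) read("data.html");
    writefln("byte 255: %s occurrence(s)", countByte(data, 255));
    writefln("byte 252: %s occurrence(s)", countByte(data, 252));
    writefln("valid UTF-8: %s", isValidUtf8(data));
}
```

If both counts are zero while --bug=1 and --bug=2 still report invalid sequences, that would be consistent with the corruption arising during concatenation rather than in the input.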


Andrei

March 16, 2010
Damn. Sorry.

Andrei

On 03/16/2010 10:14 AM, Steve Schveighoffer wrote:
> files?
>
> -Steve
-------------- next part --------------
A non-text attachment was scrubbed...
Name: untag.d
Type: text/x-dsrc
Size: 4530 bytes
Desc: not available
URL: <http://lists.puremagic.com/pipermail/dmd-internals/attachments/20100316/b04438bd/attachment-0001.d>
March 16, 2010
You are using some new function "startsWithConsume" that isn't yet in phobos:

untag.d(68): Error: undefined identifier startsWithConsume
untag.d(68): Error: function expected before (), not startsWithConsume of type int
untag.d(69): Error: undefined identifier startsWithConsume
untag.d(69): Error: function expected before (), not startsWithConsume of type int

-Steve



March 16, 2010
Pasted below.

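import std.functional : binaryFun;
import std.range; // empty, front, popFront for strings and other ranges

/**
If $(D startsWith(r1, r2)), consume the corresponding elements off $(D
r1) and return $(D true). Otherwise, leave $(D r1) unchanged and
return $(D false).
 */
bool startsWithConsume(alias pred = "a == b", R1, R2)(ref R1 r1, R2 r2)
if (is(typeof(binaryFun!pred(r1.front, r2.front))))
{
     auto r = r1; // .save();
     // Advance both ranges while their front elements match.
     while (!r2.empty && !r.empty && binaryFun!pred(r.front, r2.front))
     {
         r.popFront();
         r2.popFront();
     }
     // Leftover elements in r2 mean the prefix did not fully match.
     if (!r2.empty) return false;
     // Full match: commit the consumed position back to the caller.
     // (Plain statements replace the comma expression, whose result
     // newer compilers no longer allow to be used.)
     r1 = r;
     return true;
}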

unittest
{
     auto s1 = "Hello world";
     assert(!startsWithConsume(s1, "Ha"));
     assert(s1 == "Hello world");
     assert(startsWithConsume(s1, "Hell") && s1 == "o world");
}


Andrei
