DustMite, a D test case minimization tool
May 20, 2011
Vladimir Panteleev
May 20, 2011
Inspired by Tigris Delta and the "Want to help DMD bugfixing? Write a simple utility" thread from digitalmars.D.learn. I hope the DMD development team will find this useful.

Advantages over Tigris delta:

* Easy to use (takes only two arguments, no need to fiddle with levels)
* Readable output (comments and indentation are preserved)
* Native support for multiple files (accepts a path to an entire directory for input)
* Written for D
* Written in D
* Not written in Perl
* Can recognize constructs such as try/catch, function invariants (in/out/body)
* Only 440 lines of source code

If you've never used delta: this is a tool which attempts to shrink files by deleting fragments iteratively, as long as the file satisfies a user-specified condition (for example, a specific error message when passed through the compiler).

Usage:

1. Formulate a condition command, which should exit with a status code of 0 when DustMite is on the right track, and anything else otherwise.
   Example: dmd test.d 2>&1 | grep -qF "Assertion failed"
2. Place all the files that dustmite is to minimize in a new directory.
3. If you'd like to test your condition command, don't forget to clean up temporary files afterwards.
4. Run: dustmite path/to/directory test-command
5. After a while, dustmite will finish working and create path/to/directory.reduced
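
The condition from step 1 can also be kept in a small script. A minimal sketch, assuming a POSIX shell; the function name is_interesting and the COMPILER variable are illustrative and not part of DustMite:

```shell
#!/bin/sh
# Hypothetical helper for a condition script. COMPILER defaults to dmd but
# can be overridden, e.g. to try the script against a stub compiler.
COMPILER="${COMPILER:-dmd}"

is_interesting() {
    # $1: source file to compile
    # $2: fixed string that must appear in the compiler output
    # The pipeline's status is grep's: 0 if the message is present,
    # non-zero otherwise.
    "$COMPILER" "$1" 2>&1 | grep -qF "$2"
}
```

A script that ends with a call such as is_interesting test.d "Assertion failed" exits with 0 only while the message is still produced, which is what step 1 asks of the condition command.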

I've tested it with a self-induced "bug" in std.datetime, and it seems to work great. If you find that it breaks on something, let me know.

https://github.com/CyberShadow/DustMite

-- 
Best regards,
 Vladimir                            mailto:vladimir@thecybershadow.net
May 22, 2011
On May 21, 11 06:01, Vladimir Panteleev wrote:
> Inspired by Tigris Delta and the "Want to help DMD bugfixing? Write a
> simple utility" thread from digitalmars.D.learn. I hope the DMD
> development team will find this useful.
>
> Advantages over Tigris delta:
>
> * Easy to use (takes only two arguments, no need to fiddle with levels)
> * Readable output (comments and indentation are preserved)
> * Native support for multiple files (accepts a path to an entire
> directory for input)
> * Written for D
> * Written in D
> * Not written in Perl
> * Can recognize constructs such as try/catch, function invariants
> (in/out/body)
> * Only 440 lines of source code
>
> If you've never used delta: this is a tool which attempts to shrink
> files by deleting fragments iteratively, as long as the file satisfies a
> user-specified condition (for example, a specific error message when
> passed through the compiler).
>
> Usage:
>
> 1. Formulate a condition command, which should exit with a status code
> of 0 when DustMite is on the right track, and anything else otherwise.
> Example: dmd test.d 2>&1 | grep -qF "Assertion failed"
> 2. Place all the files that dustmite is to minimize in a new directory.
> 3. If you'd like to test your condition command, don't forget to clean
> up temporary files afterwards.
> 4. Run: dustmite path/to/directory test-command
> 5. After a while, dustmite will finish working and create
> path/to/directory.reduced
>
> I've tested it with a self-induced "bug" in std.datetime, it seems to
> work great. If you find that it breaks on something, let me know.
>
> https://github.com/CyberShadow/DustMite
>

Nice tool! I tried to use it to reduce bug 6044, but encountered 2 problems:

1. DustMite will load _all_ files, including the _binary_ ones, which
   are seldom valid UTF-8, and that causes a UtfException to
   be thrown from 'save.dump' because 'e.header' contains those invalid
   characters. (BTW, Andrei, is it really necessary to include the whole
   invalid string in the exception?!)

2. For 6044, DustMite has overdone it. It has reduced the code to an
   obviously invalid program

      void main() {
          alias Maybe A;
      // ok
          A.Impl!int u;       // error
      }

   but I guess it can't be avoided, since its error message is exactly
   the same as the correct one I reported.
May 22, 2011
On Sun, 22 May 2011 11:56:33 +0300, KennyTM~ <kennytm@gmail.com> wrote:

> Nice tool! I tried to use it to reduce bug 6044, but encountered 2 problems:
>
> 1. DustMite will load _all_ files, including the _binary_ ones, which
>     are seldom valid UTF-8, and that causes a UtfException to
>     be thrown from 'save.dump' because 'e.header' contains those invalid
>     characters. (BTW, Andrei, is it really necessary to include the whole
>     invalid string in the exception?!)

The real question here is why appender would validate UTF when appending a string to a string. Validation makes every append at least linear in the length of the appended string, however cheap the GC allocation itself COULD be, so for large strings it might be slower than appending to an array. The following comment is in Phobos, but I don't understand it:

        // note, we disable this branch for appending one type of char to
        // another because we can't trust the length portion.

The tool should have been able to handle binary files (it only attempts to reduce them by completely removing them), but I never tested this functionality.
Anyway, I've made it use ubyte[] for the appender type, so there won't be any problems now.

> 2. For 6044, DustMite has overdone it. It has reduced the code to an
>     obviously invalid program
>
>        void main() {
>            alias Maybe A;
>        // ok
>            A.Impl!int u;       // error
>        }
>
>     but I guess it can't be avoided, since its error message is exactly
>     the same as the correct one I reported.

DustMite is only as smart as the test command you specify. You could formulate your test command to check if the source code still includes whatever bits should not be removed.
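
One way to do that is sketched below, assuming a POSIX shell. The function name still_relevant, the COMPILER variable, and the idea of requiring a single fixed fragment are illustrative choices, not DustMite features:

```shell
#!/bin/sh
# Hypothetical helper for a stricter condition script. COMPILER defaults to
# dmd but can be overridden.
COMPILER="${COMPILER:-dmd}"

still_relevant() {
    # $1: source file to compile
    # $2: source fragment that must not be reduced away
    # $3: fixed string that must appear in the compiler output
    # Reject the candidate outright if the required fragment is gone.
    grep -qF -- "$2" "$1" || return 1
    # Otherwise accept it only if the expected message is still produced.
    "$COMPILER" "$1" 2>&1 | grep -qF -- "$3"
}
```

With a check like this, a reduction that deletes the required fragment is rejected even if the compiler still prints the same error message. Which fragment to require is up to whoever writes the test.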

Generally, though, DustMite is most useful for reducing large programs you don't want to reduce by hand, especially when the error message is cryptic or gives no indication of the real location (as with some templated functions in Phobos), or when the bug is so fragile that removing seemingly unrelated code makes the problem vanish.

-- 
Best regards,
 Vladimir                            mailto:vladimir@thecybershadow.net
May 22, 2011
On Sun, 22 May 2011 09:40:19 -0400, Vladimir Panteleev <vladimir@thecybershadow.net> wrote:

> On Sun, 22 May 2011 11:56:33 +0300, KennyTM~ <kennytm@gmail.com> wrote:
>
>> Nice tool! I tried to use it to reduce bug 6044, but encountered 2 problems:
>>
>> 1. DustMite will load _all_ files, including the _binary_ ones, which
>>     are seldom valid UTF-8, and that causes a UtfException to
>>     be thrown from 'save.dump' because 'e.header' contains those invalid
>>     characters. (BTW, Andrei, is it really necessary to include the whole
>>     invalid string in the exception?!)
>
> The real question here is why appender would validate UTF when appending a string to a string. Validation makes every append at least linear in the length of the appended string, however cheap the GC allocation itself COULD be, so for large strings it might be slower than appending to an array. The following comment is in Phobos, but I don't understand it:
>
>          // note, we disable this branch for appending one type of char to
>          // another because we can't trust the length portion.

Essentially, this comment is about how you have to decode and then re-encode any time you change the character type, i.e. the fact that 1 dchar != 1 wchar != 1 char. So a 5-dchar string might require 20 chars to represent. As for performance, using appender is never slower than ~=, as it uses essentially the same code. Furthermore, you actually cannot make appender use linear allocation, even when you are doing a transcoding operation, as it always grows by max(needed, newCapacity()), which gives it a roughly exponential growth rate. Also, if you're concerned about appender performance, I'd recommend using the patch from Issue 5813.
May 22, 2011
On Mon, 23 May 2011 02:15:49 +0300, Robert Jacques <sandford@jhu.edu> wrote:

>  As for performance, using appender is never slower than ~=, as it uses essentially the same code.

I don't think using ~= when appending a string to a string will validate the UTF. Will it?

-- 
Best regards,
 Vladimir                            mailto:vladimir@thecybershadow.net
May 23, 2011
On Sun, 22 May 2011 19:30:58 -0400, Vladimir Panteleev <vladimir@thecybershadow.net> wrote:

> On Mon, 23 May 2011 02:15:49 +0300, Robert Jacques <sandford@jhu.edu> wrote:
>
>>  As for performance, using appender is never slower than ~=, as it uses essentially the same code.
>
> I don't think using ~= when appending a string to a string will validate the UTF. Will it?
>

For string ~= string, appender calls string[] = string, which does a memcopy, iirc.
May 23, 2011
On Mon, 23 May 2011 04:14:32 +0300, Robert Jacques <sandford@jhu.edu> wrote:

> On Sun, 22 May 2011 19:30:58 -0400, Vladimir Panteleev <vladimir@thecybershadow.net> wrote:
>
>> On Mon, 23 May 2011 02:15:49 +0300, Robert Jacques <sandford@jhu.edu> wrote:
>>
>>>  As for performance, using appender is never slower than ~=, as it uses essentially the same code.
>>
>> I don't think using ~= when appending a string to a string will validate the UTF. Will it?
>>
>
> For string ~= string, appender calls string[] = string, which does a memcopy, iirc.

Right, so my complexity rant was BS, but appender will still validate UTF on every append, unlike ~=. Isn't that a bug?

-- 
Best regards,
 Vladimir                            mailto:vladimir@thecybershadow.net
May 23, 2011
On Sun, 22 May 2011 21:39:55 -0400, Vladimir Panteleev <vladimir@thecybershadow.net> wrote:
> On Mon, 23 May 2011 04:14:32 +0300, Robert Jacques <sandford@jhu.edu> wrote:
>
>> On Sun, 22 May 2011 19:30:58 -0400, Vladimir Panteleev <vladimir@thecybershadow.net> wrote:
>>
>>> On Mon, 23 May 2011 02:15:49 +0300, Robert Jacques <sandford@jhu.edu> wrote:
>>>
>>>>  As for performance, using appender is never slower than ~=, as it uses essentially the same code.
>>>
>>> I don't think using ~= when appending a string to a string will validate the UTF. Will it?
>>>
>>
>> For string ~= string, appender calls string[] = string, which does a memcopy, iirc.
>
> Right, so my complexity rant was BS, but appender will still validate UTF on every append, unlike ~=. Isn't that a bug?
>

Appender doesn't validate UTF when the character widths are the same.
For example,

    string test = "\&lt;" ~ "\&gt;" ~ "\&Alpha;" ~ "\&Beta;" ~ "\&Gamma;"~ "\&spades;" ~ "\&diams;"~ "\U0001D11E";
    Appender!string app;
    foreach(i;0..test.length-1) {
        app.put(test[i..$]);
    }

This runs fine, even though at times test[i..$] is an invalid string, because test and the appender both use the same string type. However, if you change the Appender to a wstring one, then encoding and decoding occur, and those routines always validate. Hence, if app is an Appender!wstring, it will throw a UTF validation error.
May 23, 2011
On Mon, 23 May 2011 05:21:04 +0300, Robert Jacques <sandford@jhu.edu> wrote:

> Appender doesn't validate UTF when the character widths are the same.

Yes it does. That's what I've been trying to explain this entire subthread.

import std.array;

void main()
{
	string invalid = "\xAA\xBB\xCC\xDD\xEE\xFF";
	
	string str; str ~= invalid; // OK

	auto app = appender!string(); app.put(str); // throws
}


-- 
Best regards,
 Vladimir                            mailto:vladimir@thecybershadow.net
May 23, 2011
On Sun, 22 May 2011 22:31:51 -0400, Vladimir Panteleev <vladimir@thecybershadow.net> wrote:

> On Mon, 23 May 2011 05:21:04 +0300, Robert Jacques <sandford@jhu.edu> wrote:
>
>> Appender doesn't validate UTF when the character widths are the same.
>
> Yes it does. That's what I've been trying to explain this entire subthread.
>
> import std.array;
>
> void main()
> {
> 	string invalid = "\xAA\xBB\xCC\xDD\xEE\xFF";
> 	
> 	string str; str ~= invalid; // OK
>
> 	auto app = appender!string(); app.put(str); // throws
> }
>
>

Well, all I can say is that it doesn't throw on my install (Windows, DMD 2.052) with either the patched or the unpatched appender implementation. What version are you using?