February 02, 2012
On 01/02/2012 19:35, dsimcha wrote:
> I'd be very
> interested if you could make a small, self-contained test program to use
> as a benchmark.
>


The 'test' is just

/////////////////
import std.xml2;

void main()
{
    string xmlPath = r"test.xml";

    auto document = DocumentBuilder.LoadFile(xmlPath, false, false);
}

/////////////////

It's xmlp that does all the work (and takes all the time).


I'll see about generating a simple test file, but basically:

50000 top level nodes
each one has 6 child nodes
each node has a single attribute, and the child nodes each have a short text value.


Parsing the file with DMD 2.057 takes ~25 seconds

Parsing the file with DMD 2.058(Git) takes ~6.1 seconds

Parsing the file with DMD 2.058, with the GC disabled during the LoadFile call, takes ~2.2 seconds.


For comparison, MSXML6 takes 1.6 seconds to load the same file.
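
For reference, a minimal sketch of how the GC-disabled run above can be reproduced, assuming it simply wraps the LoadFile call in core.memory's GC.disable()/GC.enable():

/////////////////
import core.memory : GC;
import std.xml2;

void main()
{
    string xmlPath = r"test.xml";

    // Suspend collections for the duration of the load, re-enable afterwards.
    GC.disable();
    scope(exit) GC.enable();

    auto document = DocumentBuilder.LoadFile(xmlPath, false, false);
}
/////////////////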
February 02, 2012
On 02.02.2012 at 01:41, Richard Webb <webby@beardmouse.org.uk> wrote:

> Parsing the file with DMD 2.057 takes ~25 seconds
>
> Parsing the file with DMD 2.058(Git) takes ~6.1 seconds
>
> Parsing the file with DMD 2.058, with the GC disabled during the LoadFile call, takes ~2.2 seconds.
>
>
> For comparison, MSXML6 takes 1.6 seconds to load the same file.

Speaking of which, why not also compare the memory consumption (peak working set size)? Memory vs. CPU is the typical trade-off, so it might be interesting from that angle, but I also wonder what the overhead of GC-managed memory is - assuming that MSXML6 uses only reference counting. And if it is a whole lot more (say more than 50% extra), what methods could be applied that 'waste' less space.

This API function should get the job done: GetProcessMemoryInfo
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683219(v=vs.85).aspx
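
For example, a rough sketch in D (the PROCESS_MEMORY_COUNTERS struct and the prototype are declared by hand here rather than taken from a binding; link against psapi.lib, and reportPeakWorkingSet is just an illustrative name):

/////////////////
import core.sys.windows.windows;
import std.stdio;

// Layout of PROCESS_MEMORY_COUNTERS from psapi.h (SIZE_T maps to size_t).
struct PROCESS_MEMORY_COUNTERS
{
    DWORD  cb;
    DWORD  PageFaultCount;
    size_t PeakWorkingSetSize;
    size_t WorkingSetSize;
    size_t QuotaPeakPagedPoolUsage;
    size_t QuotaPagedPoolUsage;
    size_t QuotaPeakNonPagedPoolUsage;
    size_t QuotaNonPagedPoolUsage;
    size_t PagefileUsage;
    size_t PeakPagefileUsage;
}

// Hand-written prototype for the psapi function linked above.
extern(Windows) BOOL GetProcessMemoryInfo(HANDLE, PROCESS_MEMORY_COUNTERS*, DWORD);

void reportPeakWorkingSet()
{
    PROCESS_MEMORY_COUNTERS pmc;
    pmc.cb = cast(DWORD) PROCESS_MEMORY_COUNTERS.sizeof;
    if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, pmc.cb))
        writefln("peak working set: %s MiB", pmc.PeakWorkingSetSize / (1024 * 1024));
}
/////////////////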
February 02, 2012
Richard Webb:

> Parsing the file with DMD 2.057 takes ~25 seconds
> 
> Parsing the file with DMD 2.058(Git) takes ~6.1 seconds
> 
> Parsing the file with DMD 2.058, with the GC disabled during the LoadFile call, takes ~2.2 seconds.
> 
> 
> For comparison, MSXML6 takes 1.6 seconds to load the same file.

Not long ago the Python devs added a heuristic to the Python GC (which is a reference counter plus a cycle breaker): it "switches off" if it detects that the program is allocating many items in a short time. Is it possible to add something similar to the D GC?

Bye,
bearophile
February 02, 2012
On Thursday, 2 February 2012 at 01:27:44 UTC, bearophile wrote:
> Richard Webb:
>
>> Parsing the file with DMD 2.057 takes ~25 seconds
>> 
>> Parsing the file with DMD 2.058(Git) takes ~6.1 seconds
>> 
>> Parsing the file with DMD 2.058, with the GC disabled during the LoadFile call, takes ~2.2 seconds.
>> 
>> 
>> For comparison, MSXML6 takes 1.6 seconds to load the same file.
>
> Not long ago the Python devs added a heuristic to the Python GC (which is a reference counter plus a cycle breaker): it "switches off" if it detects that the program is allocating many items in a short time. Is it possible to add something similar to the D GC?
>
> Bye,
> bearophile

I actually tried to add something like this a while back but I couldn't find a heuristic that worked reasonably well.  The idea was just to create a timeout where the GC can't run for x milliseconds after it just ran.
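
As a standalone illustration of that timeout idea (just a sketch - in the real experiment the check would live inside the GC's collection routine, and the type name and the 50 ms value here are made up):

/////////////////
import core.time : Duration, MonoTime, msecs;

// Once a collection has run, refuse to run another one until a fixed
// timeout has elapsed.
struct CollectionThrottle
{
    Duration timeout = msecs(50);   // the "x milliseconds"
    MonoTime lastCollection;

    // Returns true if a collection may proceed now, and records the time.
    bool shouldCollect()
    {
        auto now = MonoTime.currTime;
        if (lastCollection != MonoTime.init && now - lastCollection < timeout)
            return false;           // collected too recently; skip this one
        lastCollection = now;
        return true;
    }
}
/////////////////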
February 02, 2012
On Wed, 01 Feb 2012 18:40:20 -0600, dsimcha <dsimcha@yahoo.com> wrote:
> On Wednesday, 1 February 2012 at 23:43:24 UTC, H. S. Teoh wrote:
>> Out of curiosity, is there a way to optimize for the "many small
>> allocations" case? E.g., if a function allocates, as temporary storage,
>> a tree with a large number of nodes, which becomes garbage when it
>> returns. Perhaps a way to sweep the entire space used by the tree in
>> one go?
>>
>> Not sure if such a thing is possible.
>>
>>
>> T
>
> My RegionAllocator is probably the best thing for this if the
> lifetime is deterministic as you describe.  I rewrote the Tree1
> benchmark using RegionAllocator a while back just for comparison.
>   D Tree1 + RegionAllocator had comparable speed to a Java version
> of Tree1 run under HotSpot.  (About 6 seconds on my box vs. in
> the low 30s for Tree1 with the 2.058 GC.)
>
> If all the objects are going to die at the same time but not at a
> deterministic time, you could just allocate a big block from the
> GC and place class instances in it using emplace().
>

An XML parser would probably want some kind of stack segment growth schedule, which, IIRC, isn't supported by RegionAllocator.
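
For reference, the 'big block plus emplace()' suggestion quoted above could look roughly like this sketch (Node and allocateNodes are made-up names, not from any library):

/////////////////
import std.conv : emplace;

// Stand-in for whatever the parser would allocate.
class Node
{
    int id;
    this(int id) { this.id = id; }
}

// Carve count Node instances out of one GC allocation instead of making
// count separate ones; they all become garbage together once the block
// is no longer referenced.
Node[] allocateNodes(size_t count)
{
    // Round each slot up so every instance stays suitably aligned.
    enum size_t alignment = 16;
    enum size_t slotSize =
        (__traits(classInstanceSize, Node) + alignment - 1) & ~(alignment - 1);

    auto block = new void[slotSize * count];   // one allocation for everything
    auto nodes = new Node[count];

    foreach (i; 0 .. count)
        nodes[i] = emplace!Node(block[i * slotSize .. (i + 1) * slotSize], cast(int) i);

    return nodes;
}
/////////////////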
February 02, 2012
Wait a minute, since when do we even have a std.xml2?  I've never heard of it and it's not in the Phobos source tree (I just checked).

On Thursday, 2 February 2012 at 00:41:31 UTC, Richard Webb wrote:
> On 01/02/2012 19:35, dsimcha wrote:
>> I'd be very
>> interested if you could make a small, self-contained test program to use
>> as a benchmark.
>>
>
>
> The 'test' is just
>
> /////////////////
> import std.xml2;
>
> void main()
> {
>    string xmlPath = r"test.xml";
>
>    auto document = DocumentBuilder.LoadFile(xmlPath, false, false);
> }
>
> /////////////////
>
> It's xmlp that does all the work (and takes all the time).
>
>
> I'll see about generating a simple test file, but basically:
>
> 50000 top level nodes
> each one has 6 child nodes
> each node has a single attribute, and the child nodes each have a short text value.
>
>
> Parsing the file with DMD 2.057 takes ~25 seconds
>
> Parsing the file with DMD 2.058(Git) takes ~6.1 seconds
>
> Parsing the file with DMD 2.058, with the GC disabled during the LoadFile call, takes ~2.2 seconds.
>
>
> For comparison, MSXML6 takes 1.6 seconds to load the same file.


February 02, 2012
It looks like someone wrote it a year ago, but it was never added to Phobos: http://www.digitalmars.com/d/archives/digitalmars/D/announce/std.xml2_candidate_19804.html

On Thursday, 2 February 2012 at 04:39:11 UTC, dsimcha wrote:
> Wait a minute, since when do we even have a std.xml2?  I've never heard of it and it's not in the Phobos source tree (I just checked).
>
> On Thursday, 2 February 2012 at 00:41:31 UTC, Richard Webb wrote:
>> On 01/02/2012 19:35, dsimcha wrote:
>>> I'd be very
>>> interested if you could make a small, self-contained test program to use
>>> as a benchmark.
>>>
>>
>>
>> The 'test' is just
>>
>> /////////////////
>> import std.xml2;
>>
>> void main()
>> {
>>  string xmlPath = r"test.xml";
>>
>>  auto document = DocumentBuilder.LoadFile(xmlPath, false, false);
>> }
>>
>> /////////////////
>>
>> It's xmlp that does all the work (and takes all the time).
>>
>>
>> I'll see about generating a simple test file, but basically:
>>
>> 50000 top level nodes
>> each one has 6 child nodes
>> each node has a single attribute, and the child nodes each have a short text value.
>>
>>
>> Parsing the file with DMD 2.057 takes ~25 seconds
>>
>> Parsing the file with DMD 2.058(Git) takes ~6.1 seconds
>>
>> Parsing the file with DMD 2.058, with the GC disabled during the LoadFile call, takes ~2.2 seconds.
>>
>>
>> For comparison, MSXML6 takes 1.6 seconds to load the same file.

February 02, 2012
On 02/02/2012 04:53, a wrote:
> It looks like someone wrote it a year ago, but it was never added to
> phobos:
> http://www.digitalmars.com/d/archives/digitalmars/D/announce/std.xml2_candidate_19804.html
> .


That's the one -> http://www.dsource.org/projects/xmlp/
(its modules are under the std. package).

February 02, 2012
On Thursday, 2 February 2012 at 04:38:49 UTC, Robert Jacques wrote:
> An XML parser would probably want some kind of stack segment growth schedule, which, IIRC isn't supported by RegionAllocator.

I had considered putting that in RegionAllocator but I was skeptical of the benefit, at least assuming we're targeting PCs and not embedded devices.   The default segment size is 4MB.  Trying to make the initial size any smaller won't save much memory.  Four megabytes is also big enough that new segments would be allocated so infrequently that the cost would be negligible.  I concluded that the added complexity wasn't justified.
February 02, 2012
Out of interest, I just tried loading the same file with std.xml, and the performance there is pretty similar in each version, possibly slightly slower in 2.058 (~21-22 seconds in each case).

Disabling the GC during the load gets it down to ~9 seconds, though Task Manager reports a peak memory usage of almost 600 megabytes in that case!

It looks like most of the time here is spent in Gcxmark, whereas with xmlp it was in Gcxfullcollect (and fullcollect is the one that is faster in 2.058).
The profiler suggests that more time is being spent in Gcxmark than before - is that actually the case?


I'll try to have a go with Tango when I get some more time.