September 06, 2011
On 09/06/2011 09:36 PM, notna wrote:
> Sorry upfront, I didn't read this whole thread, so maybe I'm missing or
> mixing something...
>
> How about a D binding for http://www.xmlsoft.org/ ?
>
> In other words, taking the "curl or sqlite3 path", something like
> /etc/c/xml2

That is about 4 times slower than the Tango XML parser:

http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-updated-graphs-with-rapidxml/


>
> On 06.09.2011 19:54, Walter Bright wrote:
>> On 9/6/2011 7:51 AM, Andrei Alexandrescu wrote:
>>> Let's leave the likes of std.xml and std.json in peace, then pick a
>>> naming convention for the new ones and create whole new modules
>>> replacing them.
>>
>> std.xml2
>>
>> will do fine.
>

September 06, 2011
Mafi Wrote:

> > Along these same lines I'm wondering why not simply call this new module
> > std.io rather than use the existing name std.stdio?
> >   It'd avoid the code breaking issue and help reflect that this new
> > module isn't based around C's stdio FILE (at least that's what I
> > gather).  Also, the code is written from scratch so that's another
> > reason for why I don't think it should have the same name.  The only
> > reason I can think of is if it provided significant improvements over
> > the existing std.stdio without causing massive breakage.
> >
> > Regards,
> > Brad Anderson
> 
> I think this is a good idea. I think std.io sounds and feels much better.
> 
> Mafi

I think this is a terrific suggestion.

Paul
September 06, 2011
On 06.09.2011 at 22:28, Timon Gehr <timon.gehr@gmx.ch> wrote:

> On 09/06/2011 09:36 PM, notna wrote:
>> Sorry upfront, I didn't read this whole thread, so maybe I'm missing or
>> mixing something...
>>
>> How about a D binding for http://www.xmlsoft.org/ ?
>>
>> In other words, taking the "curl or sqlite3 path", something like
>> /etc/c/xml2
>
> That is about 4 times slower than the Tango XML parser:
>
> http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-updated-graphs-with-rapidxml/

You are so right, Timon. How deep is the trench between Phobos and Tango devs? Tango's XML parser should really make it into Phobos.
September 06, 2011
On Tuesday, September 06, 2011 22:28:05 Timon Gehr wrote:
> On 09/06/2011 09:36 PM, notna wrote:
> > Sorry upfront, I didn't read this whole thread, so maybe I'm missing or mixing something...
> > 
> > How about a D binding for http://www.xmlsoft.org/ ?
> > 
> > In other words, taking the "curl or sqlite3 path", something like /etc/c/xml2
> 
> That is about 4 times slower than the Tango XML parser:

Yeah. Thanks to array slicing, parsing is actually one of the areas that D libraries should be able to generally beat C/C++ libraries in terms of speed.

That being said, creating bindings and wrappers for existing libraries is often a great way to increase Phobos' functionality without reinventing the wheel. But there are definitely cases where redoing something in D would actually be much better. It all depends on what you're trying to do and what libraries already exist in C or C++.

- Jonathan M Davis
September 06, 2011
On Tuesday, September 06, 2011 23:51:48 Marco Leise wrote:
> On 06.09.2011 at 22:28, Timon Gehr <timon.gehr@gmx.ch> wrote:
> > On 09/06/2011 09:36 PM, notna wrote:
> >> Sorry upfront, I didn't read this whole thread, so maybe I'm missing or mixing something...
> >> 
> >> How about a D binding for http://www.xmlsoft.org/ ?
> >> 
> >> In other words, taking the "curl or sqlite3 path", something like /etc/c/xml2
> > 
> > That is about 4 times slower than the Tango XML parser:
> > 
> > http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-updated-graphs-with-rapidxml/
> You are so right, Timon. How deep is the trench between Phobos and Tango devs? Tango's XML parser should really make it into Phobos.

A new std.xml is already in the works. It'll be range-based, unlike the Tango parser. But there's no reason why Phobos shouldn't be able to have a similarly fast XML parser. As I understand it, the primary reason the current std.xml is slow is that it uses delegates quite a bit, but I haven't used it myself, so I don't know all of the details.

- Jonathan M Davis
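
To make "range-based" concrete, here is a minimal sketch of a pull-style scanner written as a D input range. It is only an illustration under stated assumptions: `ElementNames` is a hypothetical name, not the API of the planned std.xml or of Tango, and the scanning is deliberately naive (it does not handle a `<` inside attribute values, CDATA, and so on). The point is the shape: the caller pulls items with `empty`/`front`/`popFront`, each item is a slice of the input, and the result composes with std.algorithm without any delegate callbacks.

```d
import std.range.primitives : isInputRange;
import std.string : indexOf;

/// Hypothetical pull-style scanner: an input range over the names of the
/// opening tags found in `input`. Each name is a slice of `input`.
struct ElementNames
{
    private string rest;    // unread part of the input
    private string current; // name most recently found
    private bool done;

    this(string input)
    {
        rest = input;
        popFront(); // prime the first element
    }

    @property bool empty() const { return done; }
    @property string front() const { return current; }

    void popFront()
    {
        while (true)
        {
            auto lt = rest.indexOf('<');
            if (lt < 0) { done = true; return; }
            rest = rest[lt + 1 .. $];
            // Skip closing tags, comments/doctype, and processing instructions.
            if (rest.length == 0 || rest[0] == '/' || rest[0] == '!' || rest[0] == '?')
                continue;
            size_t i = 0;
            while (i < rest.length && rest[i] != ' ' && rest[i] != '>' && rest[i] != '/')
                ++i;
            current = rest[0 .. i]; // slice, no copy
            rest = rest[i .. $];
            return;
        }
    }
}

static assert(isInputRange!ElementNames);

void main()
{
    import std.algorithm.comparison : equal;
    import std.algorithm.iteration : filter;

    assert(ElementNames(`<a><b x="1"/><c></c></a>`).equal(["a", "b", "c"]));

    // Because it is a range, it composes with generic algorithms.
    auto noB = ElementNames(`<a><b/><c/></a>`).filter!(n => n != "b");
    assert(noB.equal(["a", "c"]));
}
```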
September 06, 2011
On Sep 6, 2011, at 2:51 PM, Marco Leise wrote:

> On 06.09.2011 at 22:28, Timon Gehr <timon.gehr@gmx.ch> wrote:
> 
>> On 09/06/2011 09:36 PM, notna wrote:
>>> Sorry upfront, I didn't read this whole thread, so maybe I'm missing or mixing something...
>>> 
>>> How about a D binding for http://www.xmlsoft.org/ ?
>>> 
>>> In other words, taking the "curl or sqlite3 path", something like /etc/c/xml2
>> 
>> That is about 4 times slower than the Tango XML parser:
>> 
>> http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-updated-graphs-with-rapidxml/
> 
> You are so right, Timon. How deep is the trench between Phobos and Tango devs? Tango's XML parser should really make it into Phobos.

That will never happen.  Though on a positive note, a major reason the Tango parser is so fast is that there's no copying or translation of the underlying data.  Attributes are passed to the user as-is via a slice of the input range.  Most parsers in other languages simply don't work this way.
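
As a rough illustration of what "passed as-is via a slice" means, here is a minimal D sketch. `attributeValue` is a hypothetical helper written for this example, not Tango's API, and it is deliberately naive (it only looks at the first occurrence of the name). What it shows is that the returned value is a pointer-plus-length view into the original buffer, with no allocation, and that entities are left undecoded until the caller asks for them.

```d
import std.array : replace;
import std.string : indexOf;

/// Hypothetical helper (not Tango's API): returns the value of `name="..."`
/// as a slice of `input`, or null if it is not found.
/// Naive: only the first occurrence of `name` is considered.
string attributeValue(string input, string name)
{
    auto pos = input.indexOf(name);
    if (pos < 0) return null;
    auto rest = input[pos + name.length .. $];
    if (rest.length < 2 || rest[0 .. 2] != `="`) return null;
    rest = rest[2 .. $];
    auto end = rest.indexOf('"');
    if (end < 0) return null;
    return rest[0 .. end]; // slice into `input`, nothing is copied
}

void main()
{
    string doc = `<item id="42" label="a &amp; b"/>`;
    auto v = attributeValue(doc, "label");

    // The raw text comes back untouched; the entity is not decoded.
    assert(v == "a &amp; b");

    // The slice points into the original buffer.
    assert(v.ptr >= doc.ptr && v.ptr + v.length <= doc.ptr + doc.length);

    // Decoding is a separate step the caller pays for only when needed
    // (this one allocates a new string).
    assert(v.replace("&amp;", "&") == "a & b");
}
```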
September 06, 2011
Paul D. Anderson:

> I think this is a terrific suggestion.

I suggested std.io some time ago, but someone doesn't like it: http://d.puremagic.com/issues/show_bug.cgi?id=4718

Bye,
bearophile
September 06, 2011
On Tuesday, September 06, 2011 18:48:24 bearophile wrote:
> Paul D. Anderson:
> > I think this is a terrific suggestion.
> 
> I suggested std.io some time ago, but someone doesn't like it: http://d.puremagic.com/issues/show_bug.cgi?id=4718

It's not enough of an improvement to rename std.stdio to std.io just to rename it. However, if Steven's ultimate changes are different enough that a separate module is needed for a clean migration path, and those changes do get accepted into Phobos, then naming the new module std.io makes good sense.

- Jonathan M Davis
September 07, 2011
On 07.09.2011 at 00:23, Sean Kelly <sean@invisibleduck.org> wrote:

> On Sep 6, 2011, at 2:51 PM, Marco Leise wrote:
>
>> On 06.09.2011 at 22:28, Timon Gehr <timon.gehr@gmx.ch> wrote:
>>
>>> On 09/06/2011 09:36 PM, notna wrote:
>>>> Sorry upfront, I didn't read this whole thread, so maybe I'm missing or
>>>> mixing something...
>>>>
>>>> How about a D binding for http://www.xmlsoft.org/ ?
>>>>
>>>> In other words, taking the "curl or sqlite3 path", something like
>>>> /etc/c/xml2
>>>
>>> That is about 4 times slower than the Tango XML parser:
>>>
>>> http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-updated-graphs-with-rapidxml/
>>
>> You are so right, Timon. How deep is the trench between Phobos and Tango devs? Tango's XML parser should really make it into Phobos.
>
> That will never happen.  Though on a positive note, a major reason the Tango parser is so fast is that there's no copying or translation of the underlying data.  Attributes are passed to the user as-is via a slice of the input range.  Most parsers in other languages simply don't work this way.

So in the benchmark, whitespace is not collapsed and entities like &amp; are not converted?
September 07, 2011
On Sep 6, 2011, at 6:49 PM, Marco Leise wrote:

> On 07.09.2011 at 00:23, Sean Kelly <sean@invisibleduck.org> wrote:
> 
>> On Sep 6, 2011, at 2:51 PM, Marco Leise wrote:
>> 
>>> On 06.09.2011 at 22:28, Timon Gehr <timon.gehr@gmx.ch> wrote:
>>> 
>>>> On 09/06/2011 09:36 PM, notna wrote:
>>>>> Sorry upfront, I didn't read this whole thread, so maybe I'm missing or mixing something...
>>>>> 
>>>>> How about a D binding for http://www.xmlsoft.org/ ?
>>>>> 
>>>>> In other words, taking the "curl or sqlite3 path", something like /etc/c/xml2
>>>> 
>>>> That is about 4 times slower than the Tango XML parser:
>>>> 
>>>> http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-updated-graphs-with-rapidxml/
>>> 
>>> You are so right, Timon. How deep is the trench between Phobos and Tango devs? Tango's XML parser should really make it into Phobos.
>> 
>> That will never happen.  Though on a positive note, a major reason the Tango parser is so fast is that there's no copying or translation of the underlying data.  Attributes are passed to the user as-is via a slice of the input range.  Most parsers in other languages simply don't work this way.
> 
> So in the benchmark, whitespace is not collapsed and entities like &amp; are not converted?

I don't believe so.  That's expected to be done by the user if he cares about decoding the field.  Compare this to the Xerces (Apache) XML parser that passes in all attributes as wide chars regardless of the input format and you can see why parsing XML in D can be so fast: passing values via array slicing and having Unicode as the native character format.  If the input text is UTF-8 you use XmlParser!char, if it's UTF-16 you use XmlParser!wchar, etc.  I'm actually surprised that more C/C++ parsers don't work this way.
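
To illustrate the `XmlParser!char` / `XmlParser!wchar` idea, here is a minimal sketch of a scanner templated on the code unit type. `TagNameScanner` is a hypothetical stand-in made up for this example, not Tango's `XmlParser`; the only thing it is meant to show is that the same code can be instantiated for UTF-8 or UTF-16 input and hands back slices in the input's own encoding, with no widening or transcoding step.

```d
/// Hypothetical stand-in for a parser templated on the code unit type.
struct TagNameScanner(Ch)
{
    const(Ch)[] input;

    /// Returns the name of the first element as a slice of `input`,
    /// or null if there is no '<' in the input.
    const(Ch)[] firstElementName()
    {
        size_t i = 0;
        while (i < input.length && input[i] != '<')
            ++i;
        if (i == input.length) return null;
        auto start = ++i;
        while (i < input.length && input[i] != ' ' && input[i] != '>' && input[i] != '/')
            ++i;
        return input[start .. i]; // same encoding as the input, no conversion
    }
}

void main()
{
    auto narrow = TagNameScanner!char("<root a='1'/>");   // UTF-8 input
    auto wide   = TagNameScanner!wchar("<root a='1'/>"w); // UTF-16 input

    assert(narrow.firstElementName() == "root");
    assert(wide.firstElementName() == "root"w);

    // The result type follows the input's code unit type.
    static assert(is(typeof(narrow.firstElementName()) == const(char)[]));
    static assert(is(typeof(wide.firstElementName()) == const(wchar)[]));
}
```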