std.xml2 candidate

Dec 11, 2010

Michael Rynn

Dec 12, 2010

Andrei Alexandrescu


Availability of Updated xml parser for D2, organised very presumptively as std.xml2.

Downloadable with SVN:
svn co http://svn.dsource.org/projects/xmlp/trunk/std (release 20).

It includes:
A conventional DOM of linked nodes -- std.xmlp.linkdom
A core parser which emits parsed items -- std.xmlp.coreparse
A validating parser including DOCTYPE validation -- std.xmlp.domparse

Performance seems not too bad. There are more lines of code, but it does the same work as std.xml in about 65% of the time. The well-formedness check is done during the parse, so there is no need to do a separate check. It takes string inputs or file inputs in various encodings. The DOMErrorHandler DOM interface is included in the validating parser for the linkdom. The parsers and DOM have a straightforward interface.

There is also a very nearly compatible version of the DOM used in std.xml -- std.xmlp.arraydom. The arraydom DocumentParser is also faster than std.xml, as it uses std.xmlp.coreparse.

It's not complete or final, nor much reviewed. The layout and interfaces seem to be OK. I expect it's already more useful than std.xml.

Michael Rynn.

On 12/11/10 7:15 PM, Andrei Alexandrescu wrote:
> On 12/11/10 7:23 AM, Michael Rynn wrote:
>>
>> Availability of Updated xml parser for D2,
>> organised very presumptively as std.xml2
> [snip]
>
> Great! Do you plan to submit this to Phobos?

One more thing - with XML parsers, I think Tango has definitely set the performance bar where it belongs. Any proposal for Phobos would need to meet it.

Andrei

Andrei Alexandrescu wrote:
> On 12/11/10 7:15 PM, Andrei Alexandrescu wrote:
>> On 12/11/10 7:23 AM, Michael Rynn wrote:
>>>
>>> Availability of Updated xml parser for D2,
>>> organised very presumptively as std.xml2
>> [snip]
>>
>> Great! Do you plan to submit this to Phobos?
>
> One more thing - with XML parsers, I think Tango has definitely set the performance bar where it belongs. Any proposal for Phobos would need to meet it.
>
> Andrei

That is considerable. A quick benchmark suggests that a lot of work is needed. If you take into account that tango's xml parser does less validation and that it is up to par with the fastest C++ parsers out there, I suggest lowering the bar a little bit at first. For example, outperforming libxml2.

> If you take into account that tango's xml parser does less validation and
> that it is up to par with the fastest C++ parsers out there, I suggest
> lowering the bar a little bit at first. For example, outperforming libxml2.

There is no reason D code should perform worse than C++ if you are not using some high-level constructs. When it comes to strings/slicing/templates, you might actually get a performance boost compared to C++. The C++ parser mentioned here (RapidXML) depends heavily on these.

so wrote:
>> If you take into account that tango's xml parser does less validation and that it is up to par with the fastest C++ parsers out there, I suggest lowering the bar a little bit at first. For example, outperforming libxml2.
>
> There is no reason D code should perform worse than C++ if you are not
> using some high-level constructs.
> When it comes to strings/slicing/templates, you might actually get a
> performance boost compared to C++.
> The C++ parser mentioned here (RapidXML) depends heavily on these.
>

I know, and tango's parser is proof of that. But it can take a lot of work getting to that level. Right now we have an xml library that a lot of people don't want to use, that has bugs, and that performs 60 times worse than tango's. Imho it's better to include it if performance is merely acceptable and see if it is possible to improve from there on.

> Imho it's better to include it if performance is merely acceptable and see
> if it is possible to improve from there on.

On that I absolutely agree. People have this misconception that D should perform worse than said languages, so I had to state the obvious :)

December 12, 2010

Re: std.xml2 candidate

Posted by Eric Desbiens
in reply to Michael Rynn


Hello,

It's great to see interest in replacing std.xml. I am also working on a replacement for std.xml; maybe we can collaborate on this and not duplicate effort. We should choose one of our codebases and develop a strong alternative from there.

I propose my codebase for the following 2 reasons:

1. It performs better and scales better with file size. Here's a quick benchmark for DOM parsing on my computer. I don't know how well it performs compared to Tango.

=== XMLP ===
XMLP 1Mb  Parsing time: 0.548 s
XMLP 11Mb Parsing time: 29.570 s
=== My Alternative* ===
Alt 1Mb  Parsing time: 0.134 s
Alt 11Mb Parsing time: 1.225 s

*This is using the XML 1.1 compliant parser.
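
One way to read the scaling claim from these figures: going from the 1Mb file to the 11Mb file multiplies the input by roughly 11, so a parser that scales linearly should take roughly 11 times longer. The sketch below is only an illustration in C++. The names `Run`, `slowdown`, and `scaling_exponent` are hypothetical, and it assumes the two files are exactly 1 and 11 units in size, which the post does not state. It computes the slowdown between the two runs and the exponent k in time ~ size^k that the two data points imply.

```cpp
#include <cmath>

// Two timings for one parser, in seconds: small file first, large file second.
struct Run {
    double small_s;
    double large_s;
};

// Assumption: the files are exactly 1 and 11 units in size.
constexpr double kSizeRatio = 11.0 / 1.0;

// How many times longer the large file took than the small one.
inline double slowdown(const Run& r) {
    return r.large_s / r.small_s;
}

// Exponent k in time ~ size^k implied by the two data points.
// k close to 1 means roughly linear scaling; k well above 1 is superlinear.
inline double scaling_exponent(const Run& r) {
    return std::log(slowdown(r)) / std::log(kSizeRatio);
}
```

With the timings quoted above, `slowdown(Run{0.548, 29.570})` is about 54 and `slowdown(Run{0.134, 1.225})` is about 9.1, which gives exponents of roughly 1.66 and 0.92. Two data points per parser are too few for firm conclusions, but they are consistent with the alternative scaling close to linearly on this input.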

2. It is more flexible.
All parsers are templated, and you can choose the degree of conformance, whether namespaces are used, the type of entity decoding, and whether parsing of document fragments is supported. It also parses any type of range whose element type is some sort of character.

Your library is more complete, though. It supports a SAX-like interface, has a validating parser, and tries to be compatible with std.xml (which I'm not sure is needed). It also normalizes attributes, which mine does not. On compliance, I think the two libraries are on the same level.

Feel free to talk about your code and show where it is better than mine, and say whether you think it would be better to build on your code instead of mine. Probably a mix of both libraries will make a better base. I think that if we collaborate on this, we will make a great library.

Code can be downloaded from: https://github.com/olace/experimental

Check exp/xml.d
