August 04, 2009 Re: The XML module in Phobos
Posted in reply to Benji Smith

On Sun, 02 Aug 2009 00:25:20 -0400, Benji Smith <dlanguage@benjismith.net> wrote:

> An interface using D ranges for the parser?

First you get the parser. There is a lot of prior art, especially Java-based open source; for instance, see http://www.xmlpull.org/history/index.html.

Existing XML parsers vary and trade off various features: high-level interface flexibility vs. speed vs. memory vs. validation against schemas and DTDs vs. random access vs. single pass. There are different flavours of XML parsers. They are like sets of spanners and tools of varying shapes and capacities, matched to the job at hand.

All of them have to parse through the XML (until completion or until the search criteria are satisfied) and handle XML validity checking, UTF conversion and entity translation. Then there is namespace support, and now two versions of XPath documented by the W3C, mixed up with XQuery.

It would be nice to have well-defined interfaces for DOM, SAX and PULL parsers which share some of the base parsing code. The DOM can be partial, as node sets returned from an XPath query. It's nice how the Phobos parser can make a full DOM or just the bits required.

The current Tango and Phobos parsers are interesting in having their own special D personality and features, and are reasonably self-contained. It's so nice to have choice, and it would be nice to have XML parsers with varying features that are usable with both standard libraries and with both D1 and D2, and that have adequate documentation. But achieving just one version would be a start.

I would hope for a few different core parser interfaces in different modules, so that their versioned features could be compared. So the idea of one standard XML parser is a bit limiting. There is still a need to keep supporting and enhancing the existing std.xml, even if a more compelling candidate emerges, whether to replace it or to sit alongside it.

Who is using what for XML parsing in native D now?
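A minimal sketch of the "shared base parsing code" idea, written in Python rather than D purely to keep it short: one pull tokenizer (the hypothetical `tokens` generator) feeds both a SAX-style push function (`sax_parse`) and a small DOM builder (`build_dom`). It assumes a toy subset of XML (element tags and text only; no attributes, entities, namespaces or validation), and none of these names come from std.xml or Tango.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Union

# Hypothetical token types for a toy subset of XML: element tags and text
# only (no attributes, comments, CDATA, entities or namespaces).
@dataclass
class StartTag:
    name: str

@dataclass
class EndTag:
    name: str

@dataclass
class Text:
    content: str

Token = Union[StartTag, EndTag, Text]

# Matches <name>, </name> and <name/>; anything else is passed through as text.
_TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*(/?)>")

def tokens(xml: str) -> Iterator[Token]:
    """Shared base layer, exposed as a pull parser: the caller asks for
    one token at a time. No well-formedness checking is done here."""
    pos = 0
    for m in _TAG.finditer(xml):
        if m.start() > pos:
            yield Text(xml[pos:m.start()])
        closing, name, self_closing = m.groups()
        if closing:
            yield EndTag(name)
        else:
            yield StartTag(name)
            if self_closing:  # <name/> is reported as a start/end pair
                yield EndTag(name)
        pos = m.end()
    if pos < len(xml):
        yield Text(xml[pos:])

def sax_parse(xml: str,
              on_start: Callable[[str], None],
              on_end: Callable[[str], None],
              on_text: Callable[[str], None]) -> None:
    """SAX-style push interface layered on the same tokenizer."""
    for tok in tokens(xml):
        if isinstance(tok, StartTag):
            on_start(tok.name)
        elif isinstance(tok, EndTag):
            on_end(tok.name)
        else:
            on_text(tok.content)

@dataclass
class Element:
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)

def build_dom(xml: str) -> Element:
    """DOM builder on the same tokenizer; it also checks tag nesting."""
    root = Element("#document")
    stack = [root]
    for tok in tokens(xml):
        if isinstance(tok, StartTag):
            element = Element(tok.name)
            stack[-1].children.append(element)
            stack.append(element)
        elif isinstance(tok, EndTag):
            if len(stack) == 1 or stack[-1].name != tok.name:
                raise ValueError(f"unexpected </{tok.name}>")
            stack.pop()
        else:
            stack[-1].children.append(tok.content)
    if len(stack) > 1:
        raise ValueError(f"unclosed <{stack[-1].name}>")
    return root
```

For instance, `build_dom("<a><b>hi</b><c/></a>")` returns a `#document` element whose single child `a` holds `b` (containing the text `hi`) and an empty `c`. Because the push and DOM layers are just thin loops over the pull layer, they cannot drift apart in what input they accept; a real parser would put entity translation and UTF handling in that shared layer too.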
August 04, 2009 Re: The XML module in Phobos
Posted in reply to Michael Rynn

On 2009-08-04 10:01:51 -0400, Michael Rynn <michaelrynn@optushome.com.au> said:

> It would be nice to have well defined interfaces for DOM, SAX and
> PULL parsers which share some of the base parsing code. The DOM can be
> partial, as node sets returned from XPath query. Nice how the phobos
> parser can make a full DOM or just the bits required.

Exactly what I've been working on:

Tokenizer part: http://michelf.com/docs/d/mfr/xmltok.html
DOM part: http://michelf.com/docs/d/mfr/xml.html

Note that it's still a work in progress. Here are some things I'd like to do:

tokenizer: add specialized exception classes to better report various problems; add better checks for valid characters (should be optional); better support for ranges (currently only string, because I rely on "a.before(b)" to avoid dynamic allocation); also add support for the internal subset in the doctype (but that's low priority).

Writer: replace it with a simple template function and a toString function defined for each token type? Or a writeTo function (to avoid creating an intermediary string)?

XMLForwardRange: allow a template parameter specifying the token types you want to see, skipping all others. This could be done by passing a custom Algebraic type instead of the provided one that can contain all tokens.

DOM classes: it's mostly experimental for now.

There's no SAX yet, although it should be trivial to add over the existing callback tokenizer.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
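The XMLForwardRange filtering idea and the "SAX over the callback tokenizer" remark can be illustrated with a short sketch, again in Python rather than D and not based on the actual mfr API. `only` is a hypothetical lazy filter that skips unwanted token types (a rough stand-in for a template parameter on the range), and `sax_callback` hypothetically adapts a SAX-style `Handler` to a single per-token callback. The token classes and the callback signature are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Type

# Hypothetical token types standing in for a tokenizer's own; Comment is
# there so the filter has something to skip.
@dataclass
class StartTag:
    name: str

@dataclass
class EndTag:
    name: str

@dataclass
class Text:
    content: str

@dataclass
class Comment:
    content: str

def only(tokens: Iterable[object], *wanted: Type) -> Iterator[object]:
    """Lazily yield only tokens whose type is in `wanted`, skipping the rest.

    A loose analogue of a forward range parameterised on token types.
    """
    for tok in tokens:
        if isinstance(tok, wanted):
            yield tok

class Handler:
    """Minimal SAX-style handler; subclasses override what they need."""
    def start_element(self, name: str) -> None:
        pass

    def end_element(self, name: str) -> None:
        pass

    def characters(self, text: str) -> None:
        pass

def sax_callback(handler: Handler) -> Callable[[object], None]:
    """Adapt a SAX handler to a single per-token callback, the shape a
    callback tokenizer is assumed to invoke."""
    def on_token(tok: object) -> None:
        if isinstance(tok, StartTag):
            handler.start_element(tok.name)
        elif isinstance(tok, EndTag):
            handler.end_element(tok.name)
        elif isinstance(tok, Text):
            handler.characters(tok.content)
        # Comments (and any other token types) are ignored.
    return on_token

# Usage with a hand-written token stream standing in for real tokenizer output.
stream = [StartTag("p"), Comment(" note "), Text("hello"), EndTag("p")]
tags_only = list(only(stream, StartTag, EndTag))

class TextCollector(Handler):
    def __init__(self) -> None:
        self.text = []

    def characters(self, text: str) -> None:
        self.text.append(text)

collector = TextCollector()
callback = sax_callback(collector)
for tok in stream:  # a callback tokenizer would make these calls itself
    callback(tok)
```

In D the set of wanted token types would be fixed at compile time, as described above, which would let the range's element type be a narrower Algebraic; in this sketch it is only a runtime tuple of classes.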
Copyright © 1999-2021 by the D Language Foundation