July 28, 2009 Re: The XML module in Phobos | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Adam D. Ruppe | On 2009-07-28 11:38:36 -0400, "Adam D. Ruppe" <destructionator@gmail.com> said:
> On Tue, Jul 28, 2009 at 12:23:50PM -0300, Ary Borenszweig wrote:
>> But *why* use or make another one when the Tango one is already excellent? :(
>
> Copyright.
That, and because there's some fun in doing it. Anyway, this is just practice before writing an HTML5 parser. ;-)
--
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
| |||
July 28, 2009 Re: The XML module in Phobos | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Adam D. Ruppe | Tue, 28 Jul 2009 11:38:36 -0400, Adam D. Ruppe thusly wrote:
> On Tue, Jul 28, 2009 at 12:23:50PM -0300, Ary Borenszweig wrote:
>> But *why* use or make another one when the Tango one is already excellent? :(
>
> Copyright.
There are most likely several issues that prevent the reuse of that code. First, the indentation, module boundaries, and naming conventions may differ (tabs vs spaces, 4 vs 8 spaces, camelCase vs foo_bar etc.).
Next, does it use the slow object-oriented approach like the rest of Tango (and unlike Phobos, which uses a very lightweight procedural model)? Are there any benchmark results that show the approach Tango uses is any good, i.e. more performant than the ones for Java and C++ (even with larger XML documents)? If it is, then the idea can be copied to Phobos as well.
Finally, the copyright is a problem unless it is handed over to Digital Mars. Otherwise it might get troublesome to sell D later for commercial use when Phobos becomes the standard library for D 2.0.
| |||
July 28, 2009 Re: The XML module in Phobos | ||||
|---|---|---|---|---|
| ||||
Posted in reply to language_fan | language_fan wrote:
> Tue, 28 Jul 2009 11:38:36 -0400, Adam D. Ruppe thusly wrote:
>
>> On Tue, Jul 28, 2009 at 12:23:50PM -0300, Ary Borenszweig wrote:
>>> But *why* use or make another one when the Tango one is already excellent? :(
>> Copyright.
>
> There are most likely several issues that prevent the reuse of that code. First, the indentation, module boundaries, and naming conventions may differ (tabs vs spaces, 4 vs 8 spaces, camelCase vs foo_bar etc.).
>
> Next, does it use the slow object-oriented approach like the rest of Tango (and unlike Phobos, which uses a very lightweight procedural model)? Are there any benchmark results that show the approach Tango uses is any good, i.e. more performant than the ones for Java and C++ (even with larger XML documents)? If it is, then the idea can be copied to Phobos as well.
Yes, there are:
http://dotnot.org/blog/archives/2008/02/
And you can see they are pretty good. The object-oriented approach is not a problem.
| |||
July 28, 2009 Re: The XML module in Phobos | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Daniel Keep | Daniel Keep Wrote:
> There is already a high-performance one in Tango. There must be some way to avoid duplicating effort.
Isn't it high-performance at the cost of not conforming to the DOM specification?
| |||
July 28, 2009 Re: The XML module in Phobos | ||||
|---|---|---|---|---|
| ||||
Posted in reply to language_fan | language_fan wrote:
> Tue, 28 Jul 2009 11:38:36 -0400, Adam D. Ruppe thusly wrote:
>
>> On Tue, Jul 28, 2009 at 12:23:50PM -0300, Ary Borenszweig wrote:
>>> But *why* use or make another one when the Tango one is already excellent? :(
>>
>> Copyright.
>
> There are most likely several issues that prevent the reuse of that code. First, the indentation, module boundaries, and naming conventions may differ (tabs vs spaces, 4 vs 8 spaces, camelCase vs foo_bar etc.).
Tango's naming conventions are quite similar to the style guidelines that Walter Bright has written, probably closer than Phobos's. As for formatting, you know, there are tools for that, and Descent even has the best formatter ever.
> Next, does it use the slow object-oriented approach like the rest of Tango (and unlike Phobos, which uses a very lightweight procedural model)? Are there any benchmark results that show the approach Tango uses is any good, i.e. more performant than the ones for Java and C++ (even with larger XML documents)? If it is, then the idea can be copied to Phobos as well.
Object-oriented does not mean slow. Tango's XML library outperforms the fastest C++ libraries; here are some benchmarks:
http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-updated-graphs-with-rapidxml/
> Finally, the copyright is a problem unless it is handed over to Digital Mars. Otherwise it might get troublesome to sell D later for commercial use when Phobos becomes the standard library for D 2.0.
I don't think (and hope not) that Walter Bright & co. will sell the standard library commercially, if that's even possible with the current copyright owners. All that is needed is a license Walter Bright can live with, such as the Boost one. Seems like an excellent opportunity for leveraging open source, no?
| |||
July 28, 2009 Re: The XML module in Phobos | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Ary Borenszweig | On 2009-07-28 12:03:47 -0400, Ary Borenszweig <ary@esperanto.org.ar> said:
> language_fan wrote:
>> Tue, 28 Jul 2009 11:38:36 -0400, Adam D. Ruppe thusly wrote:
>>
>> Are there any benchmark results that show the approach Tango uses is any good, i.e. more performant than the ones for Java and C++ (even with larger XML documents)? If it is, then the idea can be copied to Phobos as well.
>
> Yes, there are:
>
> http://dotnot.org/blog/archives/2008/02/
>
> And you can see they are pretty good. The object-oriented approach is not a problem.
That's true, Tango's parser is simple and well done, and it's using final (thus non-virtual) functions. It being object-oriented only has a negligible impact when you instantiate the parser.
I'm not writing my own parser because of any flaw in the Tango parser. I'm aiming at providing some features not found in Tango (like optional checking for well-formedness) without compromising on performance when you don't need them (templates are good for that). I'll also try to outperform Tango with callback parsing, but I expect it can only be done by a tiny margin, if at all.
--
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
| |||
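To illustrate the remark that templates let a check be optional without costing anything when it is switched off, here is a minimal sketch. It is not Michel Fortin's parser and not Tango's API: `countElements`, its `checked` parameter, and the toy tag scanning are hypothetical names invented for this example. The scanner understands only bare tags such as `<a>` and `</a>`, with no attributes, comments, or empty-element tags.

```d
import std.exception : assertThrown, enforce;
import std.string : indexOf;

// Hypothetical toy scanner. With checked == false the nesting checks are
// not compiled into the function at all; with checked == true they are.
size_t countElements(bool checked)(string xml)
{
    size_t count;
    static if (checked)
        string[] stack; // open-tag stack, declared only in the checking variant

    while (true)
    {
        auto lt = xml.indexOf('<');
        if (lt < 0)
            break;
        xml = xml[lt + 1 .. $];
        auto gt = xml.indexOf('>');
        if (gt < 0)
            break; // unterminated tag: this toy simply stops scanning
        auto name = xml[0 .. gt];
        xml = xml[gt + 1 .. $];

        if (name.length > 0 && name[0] == '/')
        {
            static if (checked)
            {
                enforce(stack.length > 0 && stack[$ - 1] == name[1 .. $],
                        "mismatched end tag: " ~ name);
                stack = stack[0 .. $ - 1];
            }
        }
        else
        {
            ++count;
            static if (checked)
                stack ~= name;
        }
    }

    static if (checked)
        enforce(stack.length == 0, "unclosed element");
    return count;
}

void main()
{
    // The unchecked variant accepts badly nested input without complaint.
    assert(countElements!false("<a><b></a>") == 2);
    // The checked variant accepts well-nested input...
    assert(countElements!true("<a><b></b></a>") == 2);
    // ...and throws on the badly nested one.
    assertThrown(countElements!true("<a><b></a>"));
}
```

Because `checked` is a compile-time parameter, each instantiation is a separate function, so callers who pick `countElements!false` run code that contains no trace of the stack or the `enforce` calls.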
July 30, 2009 Re: The XML module in Phobos | ||||
|---|---|---|---|---|
| ||||
Posted in reply to llee | On Mon, 27 Jul 2009 20:15:46 -0400, llee <llee@jhsph.edu> wrote:
>The std.xml module contains several bugs that need to be fixed. The most important one is that the parser fails to parse empty elements (i.e. elements that use the <tag name="value" /> format). I'd like to report this bug to the module's maintainer, but I don't know who to contact. (This is an old bug - it's been around for at least a year and I'm surprised that it has not been fixed).
I did look at the code for the xml module, and posted a suggested bug fix to the empty elements problem. I do not have access rights to update the source repository, and at the time was too busy for this.
Now that my position has recently been made redundant, I would like to be considered for improving the std xml module, or at least find out what I would have to do to be up to scratch for this.
There are other possibilities of course, if you want a quick and ready xml parser.
I had little trouble in compiling a static library version of the Expat 2.01 (probably oldish now), and linking D code to this.
I also made an attempt at creating D interfaces to the libxml windows DLLs, because when I run CodeBlocks in debug mode, with ddbg, there is a crash if another DLL is linked in via an import module.
If not the xml module, then perhaps some lesser D library project to get warmed up on.
I will have some time on my hands now, but perhaps not as much as I might want to think, because my post-redundancy workshops and resume preparation are telling me that finding a job is a full-time occupation.
In any case I could do something with std.xml (for D 2.0).
Pity I haven't seen any jobs yet offering for D language programmers.
| |||
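For readers unfamiliar with the empty-element form discussed in the post above, here is a minimal sketch of how such a tag can be told apart from a start tag or an end tag. It is not code from std.xml and not the fix that was posted: `isEmptyElementTag` is a hypothetical helper that assumes it receives the complete tag text, angle brackets included.

```d
import std.algorithm : endsWith, startsWith;
import std.string : strip;

// Hypothetical helper, not taken from std.xml.
// `tag` is the full tag text including the angle brackets.
bool isEmptyElementTag(string tag)
{
    tag = tag.strip;
    return tag.length >= 4          // the shortest form is "<a/>"
        && tag.startsWith("<")
        && !tag.startsWith("</")    // end tags are never empty elements
        && tag.endsWith("/>");
}

void main()
{
    assert(isEmptyElementTag(`<tag name="value" />`));
    assert(isEmptyElementTag("<br/>"));
    assert(!isEmptyElementTag("<tag>"));
    assert(!isEmptyElementTag("</tag>"));
}
```

A parser that recognizes this form would normally report the tag as a start tag immediately followed by a matching end tag, so the rest of the code does not need a special case.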
July 30, 2009 Re: The XML module in Phobos | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Michael Rynn | On Thu, 30 Jul 2009 18:03:39 +1000, Michael Rynn <michaelrynn@optushome.com.au> wrote:
Corrections:
>I had little trouble in compiling a static library version of the Expat 2.01
Whoops, I used an import library to the LibExpat.dll.
>Pity I haven't seen any jobs yet offering for D language programmers.
Any jobs in Sydney, Australia for D language programmers?
| |||
July 30, 2009 Re: The XML module in Phobos | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Michael Rynn | Michael Rynn wrote:
> On Mon, 27 Jul 2009 20:15:46 -0400, llee <llee@jhsph.edu> wrote:
>
>> The std.xml module contains several bugs that need to be fixed. The most important one is that the parser fails to parse empty elements (i.e. elements that use the <tag name="value" /> format). I'd like to report this bug to the module's maintainer, but I don't know who to contact. (This is an old bug - it's been around for at least a year and I'm surprised that it has not been fixed).
>
> I did look at the code for the xml module, and posted a suggested bug
> fix to the empty elements problem. I do not have access rights to
> update the source repository, and at the time was too busy for this.
>
> Now that my position has recently been made redundant, I would like
> to be considered for improving the std xml module, or at least find
> out what I would have to do to be up to scratch for this.
>
> There are other possibilities of course, if you want a quick and ready
> xml parser.
>
> I had little trouble in compiling a static library version of the
> Expat 2.01 (probably oldish now), and linking D code to this.
>
> I also made an attempt at creating D interfaces to the libxml windows
> DLLs, because when I run CodeBlocks in debug mode, with ddbg, there
> is a crash if another DLL is linked in via an import module.
>
> If not the xml module, then perhaps some lesser D library project to
> get warmed up on.
>
> I will have some time on my hands now, but perhaps not as much as I
> might want to think, because my post-redundancy workshops and resume
> preparation are telling me that finding a job is a full-time occupation.
>
> In any case I could do something with std.xml (for D 2.0).
>
> Pity I haven't seen any jobs yet offering for D language programmers.
It would be great if you could contribute to Phobos. Two things I hope for from any replacement: (a) it works with ranges and ideally outputs ranges, and (b) it uses alias functions instead of delegates if necessary.
Best of luck with your job search. As one looking for a job myself, yeah, it's a lot of work.
Andrei
| |||
July 31, 2009 Re: The XML module in Phobos | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | > Michael Rynn wrote:
>> I did look at the code for the xml module, and posted a suggested bug
>> fix to the empty elements problem. I do not have access rights to
>> update the source repository, and at the time was too busy for this.
Andrei Alexandrescu wrote:
> It would be great if you could contribute to Phobos. Two things I hope for from any replacement: (a) it works with ranges and ideally outputs ranges, and (b) it uses alias functions instead of delegates if necessary.
Interesting. Most XML parsers either produce a "Document" object, or they just execute SAX callbacks. If an XML parser returned a range object, how would you use it?
Usually, I use something like XPath to extract information from an XML doc. Something like this:
    auto doc = parser.parse(xml);
    auto nodes = doc.select("/root//whatever[0][@id]");
I can see how you might do depth-first or breadth-first traversal of the DOM tree, or in-order traversal of the SAX events, with a range. But that's not how most people use XML. Are there other range tricks up your sleeve that would support a DOM or XPath kind of model?
--benji
| |||
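As one possible answer to the question of how a range-returning parser might be used, here is a minimal sketch of the in-order event traversal mentioned above. Nothing in it comes from std.xml or Tango: `Kind`, `Event`, `Events`, and `textOf` are hypothetical names, and the toy tokenizer handles only bare `<a>` and `</a>` tags and plain text. It covers only simple filtering, not the XPath style of selection.

```d
import std.algorithm : filter, map;
import std.array : array;
import std.string : indexOf;

// Hypothetical event type; not part of std.xml or Tango.
enum Kind { open, close, text }

struct Event
{
    Kind kind;
    string value; // tag name for open/close, character data for text
}

// A toy input range over markup events. No attributes, comments,
// entities, or empty-element tags are handled.
struct Events
{
    private string rest;
    private Event current;
    private bool done;

    this(string xml)
    {
        rest = xml;
        popFront(); // prime the first event
    }

    @property bool empty() const { return done; }
    @property Event front() { return current; }

    void popFront()
    {
        if (rest.length == 0)
        {
            done = true;
            return;
        }
        if (rest[0] == '<')
        {
            auto end = rest.indexOf('>');
            if (end < 0) // unterminated tag: stop here
            {
                done = true;
                return;
            }
            auto isClose = end > 1 && rest[1] == '/';
            current = Event(isClose ? Kind.close : Kind.open,
                            rest[(isClose ? 2 : 1) .. end]);
            rest = rest[end + 1 .. $];
        }
        else
        {
            auto next = rest.indexOf('<');
            auto stop = next < 0 ? rest.length : cast(size_t) next;
            current = Event(Kind.text, rest[0 .. stop]);
            rest = rest[stop .. $];
        }
    }
}

// Collects the character data of a document using ordinary range algorithms.
string[] textOf(string xml)
{
    return Events(xml)
        .filter!(e => e.kind == Kind.text)
        .map!(e => e.value)
        .array;
}

void main()
{
    assert(textOf("<root><item>a</item><item>b</item></root>") == ["a", "b"]);
}
```

The point of the design is that the event stream composes with `filter`, `map`, and similar algorithms and is produced lazily, so a caller can stop early without tokenizing the rest of the document.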
Copyright © 1999-2021 by the D Language Foundation