Thread overview
XML D2.x parsing &
Jul 20, 2009
Jesse Phillips
Jul 22, 2009
Stewart Gordon
Jul 22, 2009
Jesse Phillips
Jul 22, 2009
Brad Roberts
Jul 22, 2009
Brad Roberts
Jul 22, 2009
Ary Borenszweig
July 20, 2009
According to the documentation, having &amp;amp; in a tag will be turned to &amp;

http://digitalmars.com/d/2.0/phobos/std_xml.html#text

I observe that this is not the case. And if an attribute contains &amp;amp; it is turned into &amp;amp;amp;. What is the best way to receive the same output for both? The code that follows outputs

Attr: What &amp;amp;amp; Up
Elem: What &amp;amp; Up



*testfile.xml:*

<?xml version="1.0" encoding="utf-8"?>
<Tests>
	<Test thing="What &amp; Up">What &amp; Up</Test>
</Tests>


*test.d:*

import std.file;
import std.stdio;
import std.xml;

void main() {
	auto file = "testfile.xml";

	auto s = cast(string)std.file.read(file);

	auto xml = new DocumentParser(s);

	xml.onStartTag["Test"] = (ElementParser xml) {
		writeln("Attr: ", xml.tag.attr["thing"]);
	};

	xml.onEndTag["Test"] = (in Element e) {
		writeln("Elem: ", e.text);
	};
	xml.parse();
}
July 22, 2009
Jesse Phillips wrote:
> According to the documentation having &amp; in a tag will be turned to &
> 
> http://digitalmars.com/d/2.0/phobos/std_xml.html#text
> 
> I observe that this is not the case. And if an attribute contains &amp; it is turned into &amp;amp; What is the best way to receive the same output for both. The code that follows outputs
> 
> Attr: What &amp;amp; Up
> Elem: What &amp; Up
> 
> 
> 
> *testfile.xml:*
> 
> <?xml version="1.0" encoding="utf-8"?>
> <Tests>
> 	<Test thing="What &amp; Up">What &amp; Up</Test>
> </Tests>

Clearly std.xml is buggy.  Correct behaviour would be

Attr: What & Up
Elem: What & Up

The best place for bug reports is

http://d.puremagic.com/issues/

Stewart.
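
Until the bug is fixed, one stopgap for getting the same output from both places is to decode the strings by hand. What follows is only a minimal sketch, not std.xml's own decoding: decodeEntities is a hypothetical helper written for this example, and it handles just the five predefined XML entities (no numeric character references). It assumes the parser returns exactly the two strings reported above, so the element text needs one pass and the attribute value needs two.

```d
import std.array : replace;
import std.stdio : writeln;

// Hypothetical helper, not part of std.xml: decodes the five
// predefined XML entities exactly once.
string decodeEntities(string s)
{
    return s.replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&quot;", "\"")
            .replace("&apos;", "'")
            .replace("&amp;", "&"); // last, so "&amp;lt;" becomes "&lt;", not "<"
}

void main()
{
    // Strings copied from the output reported earlier in the thread.
    string elem = "What &amp; Up";
    string attr = "What &amp;amp; Up";

    writeln("Elem: ", decodeEntities(elem));                 // one pass
    writeln("Attr: ", decodeEntities(decodeEntities(attr))); // two passes
}
```

The number of passes is tied to the buggy behaviour, so a workaround like this would have to be removed once the parser decodes correctly; otherwise text that legitimately contains a literal &amp;amp; would be decoded too far.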
July 22, 2009
On Wed, 22 Jul 2009 01:37:38 +0100, Stewart Gordon wrote:

> Clearly std.xml is buggy.  Correct behaviour would be
> 
> Attr: What & Up
> Elem: What & Up
> 
> The best place for bug reports is
> 
> http://d.puremagic.com/issues/
> 
> Stewart.

http://d.puremagic.com/issues/show_bug.cgi?id=3200
http://d.puremagic.com/issues/show_bug.cgi?id=3201
July 22, 2009
Jesse Phillips wrote:
> http://d.puremagic.com/issues/show_bug.cgi?id=3200
> http://d.puremagic.com/issues/show_bug.cgi?id=3201

The xml parsing code in D2 could use some love and care.  It was originally written by Janice who seems to have dropped off the face of the planet.  It's little more than a first draft with serious performance problems and several important bugs.

Anyone want to volunteer to invest some time in improving it?

Later,
Brad
July 22, 2009
On Tue, Jul 21, 2009 at 11:53 PM, Brad Roberts<braddr@puremagic.com> wrote:
> The xml parsing code in D2 could use some love and care.  It was originally written by Janice who seems to have dropped off the face of the planet.  It's little more than a first draft with serious performance problems and several important bugs.
>
> Anyone want to volunteer to invest some time in improving it?

I don't mean to shoot down the idea?  But Tango already has three XML parsers which are, like, the fastest.  Ever.

http://dotnot.org/blog/archives/2008/03/04/xml-benchmarks-updated-graphs/

I'm just saying, it'd seem like pointless duplication of effort with such parsers _already available_.  If it could be relicensed, I'd say that's the best route.
July 22, 2009
Jarrett Billingsley wrote:
> I don't mean to shoot down the idea?  But Tango already has three XML parsers which are, like, the fastest.  Ever.
> 
> http://dotnot.org/blog/archives/2008/03/04/xml-benchmarks-updated-graphs/
> 
> I'm just saying, it'd seem like pointless duplication of effort with such parsers _already available_.  If it could be relicensed, I'd say that's the best route.

Relicensed and separable from the rest of Tango.  It's been way too long since I looked at that code in Tango to recall any of its details.

Basically I agree with you on this one. :)
July 22, 2009
Brad Roberts wrote:
> Relicensed and separable from the rest of Tango.  It's been way too long since I
> looked at that code in Tango to recall any of its details.
> 
> Basically I agree with you on this one. :)

Can't Phobos just disappear? :-(

Like... it must be the first standard library in the world that's developed by 3-5 people.