May 06, 2015
On 2015-05-05 16:04, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:

> In my opinion it is rather difficult to build a good API without also
> using the API in an application in parallel. So it would be a good
> strategy to build a specific DOM along with writing the XML
> infrastructure, like SVG/HTML.

Agree.

> Also, some parsers, like RapidXML only support a subset of XML. So they
> cannot be used for comparisons.

The Tango parser has some limitations as well. In some places it sacrificed correctness for speed. There's a comment claiming the parser might read past the end of the input if it's not well formed.

-- 
/Jacob Carlborg
May 06, 2015
An old friend of mine who was intimate with the Microsoft XML parsers was fond of saying, particularly with respect to XML parsers, that if you hadn't finished implementing and testing error handling and negative tests (i.e., malformed documents), your positive benchmarks were fairly meaningless.  A whole lot of work goes into that 'second half' of things, and it can quickly cost performance.

I didn't dive into the specific details, and don't recall them now, as this was years ago.

The (over-)generalization from there is an old adage: it's easy to write an incorrect program.

On 5/5/2015 11:33 PM, Jacob Carlborg via Digitalmars-d wrote:
> On 2015-05-05 16:04, "Ola Fosheim Grøstad"
> <ola.fosheim.grostad+dlang@gmail.com> wrote:
>
>> In my opinion it is rather difficult to build a good API without also
>> using the API in an application in parallel. So it would be a good
>> strategy to build a specific DOM along with writing the XML
>> infrastructure, like SVG/HTML.
>
> Agree.
>
>> Also, some parsers, like RapidXML only support a subset of XML. So they
>> cannot be used for comparisons.
>
> The Tango parser has some limitations as well. In some places it
> sacrificed correctness for speed. There's a comment claiming the parser
> might read past the input if it's not well formed.
>
May 06, 2015
On 06/05/2015 07:31, Jacob Carlborg wrote:
> On 2015-05-06 01:38, Walter Bright wrote:
>
>> I haven't read the Tango source code, but the performance of its XML
>> was supposedly because it did not use the GC, it used slices.
>
> That's only true for the pull parser (not sure about the SAX parser).
> The DOM parser needs to allocate the nodes, but if I recall correctly
> those are allocated in a free list. Not sure which parser was used in
> the test.
>

The direct comparisons were with the DOM parsers (I was playing with a D port of some C++ code at work at the time, and that is DOM based).

xmlp has alternate parsers (event-driven etc.) which were faster in some simple tests I did, but I don't recall if I did a direct comparison with Tango there.
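The slice-based approach attributed to Tango above (returning views into the input buffer rather than allocating copies for every token) is language-agnostic. Here is a minimal illustrative sketch in Python; the names and the tiny XML subset handled are mine, not Tango's:

```python
import re

# Toy pull parser: yields (kind, text) tuples, where each text is taken
# directly from the input string rather than being rebuilt token by token.
# Handles only a tiny XML subset -- just enough to show the slicing idea.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w:.-]*)[^>]*?(/?)>|([^<]+)")

def pull(xml):
    for m in TOKEN.finditer(xml):
        closing, name, selfclose, text = m.groups()
        if text is not None:
            if text.strip():          # skip whitespace-only text nodes
                yield ("text", text)
        elif closing:
            yield ("end", name)
        else:
            yield ("start", name)
            if selfclose:             # <tag/> opens and closes at once
                yield ("end", name)

tokens = list(pull("<root><item>hi</item></root>"))
```

Note that nothing here validates the input; as discussed above, a parser this permissive would happily misbehave on malformed documents, which is exactly the "second half" of the work.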
May 09, 2015
On Monday, 4 May 2015 at 18:50:43 UTC, Marco Leise wrote:
> On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:
>
>> My request: just skip it.  XML is a horrible waste of space for a standard, better D doesn't support it well, anything to discourage its use.  I'd rather see you spend your time on something worthwhile.  If data formats are your thing, you could help get Ludwig's JSON stuff in, or better yet, enable some nice binary data format.
>
> You two are terrible at motivating people. "Better D doesn't
> support it well" and "JSON is superior through-and-through" is
> overly dismissive. To me it sounds like someone saying replace
> C++ with JavaScript, because C++ is a horrible standard and
> JavaScript is so much superior.  Honestly.

You seem to have missed the point of my post, which was to discourage him from working on an XML module for phobos.  As for "motivating" him, I suggested better alternatives.  And I never said JSON was great, but it's certainly _much_ more readable than XML, which is one of the basic goals of a text format.

> Remember that while JSON is simpler, XML is not just a
> structured container for bool, Number and String data. It
> comes with many official side kicks covering a broad range of
> use cases:
>
> XPath:
>  * allows you to use XML files like a textual database
>  * complex enough to allow for almost any imaginable query
>  * many tools emerged to test XPath expressions against XML documents
>  * also powers XSLT
>    (http://www.liquid-technologies.com/xpath-tutorial.aspx)
>
> XSL (Extensible Stylesheet Language) and
> XSLT (XSL Transformations):
>  * written as XML documents
>  * standard way to transform XML from one structure into another
>  * convert or "compile" data to XHTML or SVG for display in a browser
>  * output to XSL-FO
>
> XSL-FO (XSL formatting objects):
>  * written as XSL
>  * type-setting for XML; an XSL-FO processor is similar to a LaTeX processor
>  * reads an XML document (a "Format" document) and outputs to a PDF, RTF or similar format
>
> XML Schema Definition (XSD):
>  * written as XML
>  * linked in by an XML file
>  * defines structure and validates content to some extent
>  * can set constraints on how often an element can occur in a list
>  * can validate data type of values (length, regex, positive, etc.)
>  * database like unique IDs and references

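The XPath points in the list above are easy to demonstrate: most languages expose at least a subset of XPath. For instance, Python's standard-library ElementTree supports simple path expressions and attribute predicates (the document below is made up for illustration):

```python
import xml.etree.ElementTree as ET

# A small document to query; contents are purely illustrative.
doc = ET.fromstring(
    "<library>"
    "  <book id='1'><title>Dune</title></book>"
    "  <book id='2'><title>Neuromancer</title></book>"
    "</library>")

# Find every <title> anywhere in the tree, in document order.
titles = [t.text for t in doc.findall(".//title")]

# Select a <book> by attribute value, then descend to its <title>.
second = doc.find(".//book[@id='2']/title").text
```

Full XPath (functions, axes, arbitrary predicates) needs a dedicated engine, but even this subset shows the "textual database" flavor described above.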
These are all incredibly dumb ideas.  I don't deny that many people may use these things, but then people use hammers for all kinds of things they shouldn't use them for too. :)

> I think XML is the most eat-your-own-dog-food language ever
> and nicely covers a wide range of use cases.

The problem is you're still eating dog food. ;)

> In any case there
> are many XML based file formats that we might want to parse.
> Amongst them SVG, OpenDocument (Open/LibreOffics), RSS feeds,
> several US Offices, XMP and other meta data formats.

Sure, and if he has any real need for any of those, who are we to stop him?  But if he's just looking for some way to contribute, there are better ways.

On Monday, 4 May 2015 at 20:44:42 UTC, Jonathan M Davis wrote:
> Also true. Many of us just don't find enough time to work on D, and we don't seem to do a good job of encouraging larger contributions to Phobos, so newcomers don't tend to contribute like that. And there's so much to do all around that the big stuff just falls by the wayside, and it really shouldn't.

This is why I keep asking Walter and Andrei for a list of "big stuff" on the wiki (they don't have to be big, just important) so that newcomers know where help is most needed.  Of course, it doesn't have to be them, it could be any member of the D core team, though whatever the BDFLs push for would have a bit more weight.
May 09, 2015
On Saturday, 9 May 2015 at 10:28:53 UTC, Joakim wrote:
> On Monday, 4 May 2015 at 18:50:43 UTC, Marco Leise wrote:
>> On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:
>>
[clip]
>
>> Remember that while JSON is simpler, XML is not just a
>> structured container for bool, Number and String data. It
>> comes with many official side kicks covering a broad range of
>> use cases:
>>
>> XPath:
>> * allows you to use XML files like a textual database
>> * complex enough to allow for almost any imaginable query
>> * many tools emerged to test XPath expressions against XML documents
>> * also powers XSLT
>>   (http://www.liquid-technologies.com/xpath-tutorial.aspx)
>>
>> XSL (Extensible Stylesheet Language) and
>> XSLT (XSL Transformations):
>> * written as XML documents
>> * standard way to transform XML from one structure into another
>> * convert or "compile" data to XHTML or SVG for display in a browser
>> * output to XSL-FO
>>
>> XSL-FO (XSL formatting objects):
>> * written as XSL
>> * type-setting for XML; an XSL-FO processor is similar to a LaTeX processor
>> * reads an XML document (a "Format" document) and outputs to a PDF, RTF or similar format
>>
>> XML Schema Definition (XSD):
>> * written as XML
>> * linked in by an XML file
>> * defines structure and validates content to some extent
>> * can set constraints on how often an element can occur in a list
>> * can validate data type of values (length, regex, positive, etc.)
>> * database like unique IDs and references
>
> These are all incredibly dumb ideas.  I don't deny that many people may use these things, but then people use hammers for all kinds of things they shouldn't use them for too. :)
>
>> I think XML is the most eat-your-own-dog-food language ever
>> and nicely covers a wide range of use cases.
>
> The problem is you're still eating dog food. ;)

I have to agree with Joakim on this.  Having spent much of this past
week trying to get XML generated by gSOAP (project has some legacy
code) to work with JAXB (Java) has reinforced my dislike for XML.

I've used things like XPath and XSLT in the past, so I can appreciate
their power, but think the 'jobs' they perform would be better supported
elsewhere (i.e., language-specific XML frameworks).

In trying to pass data between applications I just want a simple way
of packaging up the data and ideally making serialization/deserialization
easy for me.  At some point the programmer working on these needs
to understand and validate the data anyway.  Sure you can use DTD/XML Schema to
handle the validation part, but it is just easier to deal with that
within your own code, without having to learn a 'whole new language' that
is likely harder to grok than the tools you would have at your disposal
in your language of choice.

Having said all that, as much as I share Joakim's sentiment that I wish
XML would just go away, there is a lot of it out there, and I think having good support in Phobos is very valuable, so I thank Robert for his efforts.

Craig

May 10, 2015
Am Sat, 09 May 2015 10:28:52 +0000
schrieb "Joakim" <dlang@joakim.fea.st>:

> On Monday, 4 May 2015 at 18:50:43 UTC, Marco Leise wrote:
>
> > You two are terrible at motivating people. "Better D doesn't
> > support it well" and "JSON is superior through-and-through" is
> > overly dismissive.
> > [...]
> You seem to have missed the point of my post, which was to discourage him from working on an XML module for phobos.  As for "motivating" him, I suggested better alternatives.  And I never said JSON was great, but it's certainly _much_ more readable than XML, which is one of the basic goals of a text format.

Well, I was mostly answering to w0rp here. JSON is both
readable and easy to parse, no question.

> > Remember that while JSON is simpler, XML is not just a structured container for bool, Number and String data. It comes with many official side kicks covering a broad range of use cases:
> >
> > XPath:
> > [...]
> > XSL and XSLT
> > [...]
> > XSL-FO (XSL formatting objects):
> > [...]
> > XML Schema Definition (XSD):
> > [...]
> These are all incredibly dumb ideas.  I don't deny that many people may use these things, but then people use hammers for all kinds of things they shouldn't use them for too. :)

:) One can't really answer this one. But with many hundreds of
published data exchange formats built on XML, it can't have been
too shabby after all.
And sometimes small things matter, like being able to add comments
along with the "payload". JSON doesn't have that.
Or knowing that both sender and receiver will validate the XML the
same way through XSD. So if it doesn't blow up on your end, it will
pass validation on the other end, too.


Am Sat, 09 May 2015 13:04:57 +0000
schrieb "Craig Dillabaugh" <craig.dillabaugh@gmail.com>:

> I have to agree with Joakim on this.  Having spent much of this past
> week trying to get XML generated by gSOAP (project has some legacy
> code) to work with JAXB (Java) has reinforced my dislike for XML.
> 
> I've used things like XPath and XSLT in the past, so I can appreciate
> their power, but think the 'jobs' they perform would be better
> supported elsewhere (i.e., language-specific XML frameworks).
> 
> In trying to pass data between applications I just want a simple way
> of packaging up the data and ideally making serialization/deserialization
> easy for me.  At some point the programmer working on these needs
> to understand and validate the data anyway.  Sure you can use DTD/XML
> Schema to handle the validation part, but it is just easier to deal
> with that within your own code, without having to learn a 'whole new
> language' that is likely harder to grok than the tools you would have
> at your disposal in your language of choice.

You see, the thing is that XSD is _not_ a whole new language,
it is written in XML as well, probably specifically to make it
so. Try to switch the perspective: With XSD (if it is
sufficient for your validation needs) _one_ person needs to
learn and write it and other programmers (inside or outside
the company) just use the XML library of choice to handle
validation via that schema. Once the schema is loaded it is
usually no more than doc.validate();
(There are also good GUI tools to assist in writing XSD.)
What you propose on the other hand is that everyone involved
in the data exchange writes their own validation code in their
language of choice, with either no access to existing sources
or functionality that doesn't translate to their language!

-- 
Marco

May 10, 2015
On Sunday, 10 May 2015 at 07:01:58 UTC, Marco Leise wrote:
> Am Sat, 09 May 2015 10:28:52 +0000
> schrieb "Joakim" <dlang@joakim.fea.st>:
>
>> On Monday, 4 May 2015 at 18:50:43 UTC, Marco Leise wrote:
>> > Remember that while JSON is simpler, XML is not just a
>> > structured container for bool, Number and String data. It
>> > comes with many official side kicks covering a broad range of
>> > use cases:
>> >
>> > XPath:
>> > [...]
>> > XSL and XSLT
>> > [...]
>> > XSL-FO (XSL formatting objects):
>> > [...]
>> > XML Schema Definition (XSD):
>> > [...]
>> These are all incredibly dumb ideas.  I don't deny that many people may use these things, but then people use hammers for all kinds of things they shouldn't use them for too. :)
>
> :) One can't really answer this one. But with many hundreds of
> published data exchange formats built on XML, it can't have been
> too shabby all along.

It's worse than shabby, it's a horrible, horrible choice.  Not just for data formats, but for _anything_.  XML should not be used.

> And sometimes small things matter, like being able to add comments
> along with the "payload". JSON doesn't have that.
> Or knowing that both sender and receiver will validate the XML the
> same way through XSD. So if it doesn't blow up on your end, it will
> pass validation on the other end, too.

One can do all these things with better formats than either XML or JSON.

But why do we often end up dealing with these two?  Familiarity, that is the only reason.  XML seems familiar to anybody who's written some HTML, and JSON became familiar to web developers initially.  Starting from those two large niches, they've expanded out to become the two most popular data interchange formats, despite XML being a horrible mess and JSON being too simple for many uses.

I'd like to see a move back to binary formats, which is why I mentioned that to Robert.  D would be an ideal language in which to show the superiority of binary to text formats, given its emphasis on efficiency.  Many devs have learned the wrong lessons from past closed binary formats, when open binary formats wouldn't have many of those deficiencies.

There have been some interesting moves back to open binary formats/protocols in recent years, like Hessian (http://hessian.caucho.com/), Thrift (https://thrift.apache.org/), MessagePack (http://msgpack.org/), and Cap'n Proto (from the protobufs guy after he left google - https://capnproto.org/).  I'd rather see phobos support these, which are the future, rather than flash-in-the-pan text formats like XML or JSON.
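The size argument for open binary formats is concrete: a fixed, documented wire layout can encode a record in a fraction of the bytes its text form needs. A sketch (the two-field record layout here is made up for illustration, not taken from any of the formats named above):

```python
import json
import struct

# A made-up record type: (id: uint32, temperature: float32), little-endian.
record = {"id": 123456, "temperature": 21.5}

# Open binary encoding: layout is fully specified, so anyone can decode it.
binary = struct.pack("<If", record["id"], record["temperature"])  # 8 bytes

# The equivalent JSON text form carries field names and ASCII digits.
text = json.dumps(record)

# Round-trip the binary form to show nothing is lost.
rid, temp = struct.unpack("<If", binary)
```

Real formats like MessagePack or Cap'n Proto add self-description, schemas, and versioning on top of this basic idea, which is what the old closed formats lacked.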
May 10, 2015
On Sunday, 10 May 2015 at 08:54:09 UTC, Joakim wrote:
> It's worse than shabby, it's a horrible, horrible choice.  Not just for data formats, but for _anything_.  XML should not be used.

I feel the same way about XML, and I also think that having strong internal aesthetic responses is often necessary to achieve excellence in engineering.

> But why do we often end up dealing with these two?  Familiarity, that is the only reason.  XML seems familiar to anybody who's written some HTML, and JSON became familiar to web developers initially.  Starting from those two large niches, they've expanded out to become the two most popular data interchange formats, despite XML being a horrible mess and JSON being too simple for many uses.

Sometimes you get to pick, but often not.  I can hardly tell the UK Debt Management Office to give up XML and switch to msgpack structs (well, I can, but I am not sure they would listen).  So at the moment, for some data series, I use a Python library via PyD to convert XML files to JSON.  But it would be nice to do it all in D.

I am not sure XML is going away very soon, since new protocols keep being created using it.  (The most recent one I heard of allows hedge funds to provide full transparency of their portfolios to end investors; not necessarily something that will achieve what people think it will, but one in tune with the times.)


Laeeth.
May 10, 2015
On Sunday, 10 May 2015 at 07:01:58 UTC, Marco Leise wrote:
> Well, I was mostly answering to w0rp here. JSON is both
> readable and easy to parse, no question.

JSON is just JavaScript literals with some silly constraints. As crappy a format as it gets. Even pure Lisp would have been better. And much more powerful!

> :) One can't really answer this one. But with many hundreds of
> published data exchange formats built on XML, it can't have been
> too shabby all along.
> And sometimes small things matter, like being able to add comments
> along with the "payload".

XML is actually great for what it is: eXtensible. It means you can build forward-compatible formats and annotate existing formats with metadata without breaking existing (compliant) applications. It also means you can data-mine files without knowing the full format.
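That forward-compatibility claim can be shown concretely (in Python here, for illustration; the document and names are made up): a consumer that only reads the elements it knows about keeps working unchanged when a newer producer adds annotations.

```python
import xml.etree.ElementTree as ET

# "Version 2" document: a newer producer added a namespaced meta:rating
# attribute and an <audit> element that a version-1 consumer never knew about.
v2 = ("<item xmlns:meta='urn:example:meta' meta:rating='5'>"
      "<name>widget</name><audit by='alice'/></item>")

# The version-1 consumer only looks for <name>; the unknown attribute and
# element are simply ignored, so the old code keeps working.
name = ET.fromstring(v2).find("name").text
```

A compliant consumer that skips unrecognized elements and attributes is exactly what lets XML vocabularies grow without breaking deployed readers.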

> Or knowing that both sender and receiver will validate the XML the
> same way through XSD.

Right, or build a database/archival service that is generic.

XML is not going away until there is something better, and that won't happen anytime soon. It is also one of the few formats for which I actually need a library and _good_ DOM support. (JSON can be done in an afternoon, so I don't care if it is supported or not...)
May 11, 2015
Can we please not turn this thread into an XML vs JSON flamewar?

XML is one of the most popular data formats (for better or for worse), so a parser would be a good addition to the standard library.