May 11, 2015
On Monday, 11 May 2015 at 15:20:12 UTC, Alex Parrill wrote:
> Can we please not turn this thread into an XML vs JSON flamewar?

This is not a flamewar. JSON is ad hoc and I use it a lot, but it isn't actually suitable as a file and archival exchange format. It is important that people understand what the point of XML is in order to build something useful.

Full XML support and tooling is very valuable for typed GC-backed batch processing. That means namespaces, entities, XQuery equivalents, DOMs, etc.

A library-backed tooling pipeline would be a valuable asset for D. The value is not in _reading_ or _writing_ XML. The value is all about providing a framework for structured grammar/namespace-based _processing_ and _transforms_.
May 12, 2015
On Sunday, 10 May 2015 at 08:54:09 UTC, Joakim wrote:
> One can do all these things with better formats than either XML or JSON.

Hypothetically, yes, though formats better than XML don't exist. I personally find XML perfectly readable.
February 18, 2016
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek wrote:
> std.xml has been considered not up to specs for nearly 3 years now. Time to build a successor. I currently plan the following features for it:
>
> - SAX and DOM parser
> - in-situ / slicing parsing when possible (forward range?)
> - compile time switch (CTS) for lazy attribute parsing
> - CTS for encoding (ubyte(ASCII), char(utf8), ... )
> - CTS for input validating
> - performance
>
> Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2
>
> Please post your feature requests, and please keep the posts DRY and on topic.

I'm looking for a status update.  DUB doesn't seem to have many options posted.  I was thinking about starting a SAXParser implementation.
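
For readers unfamiliar with the SAX/DOM distinction in the feature list, here is a minimal sketch of the two styles. It uses Python's standard library purely as an illustration; it is not the std.xml2 API, and the document and the `ItemCollector` handler are made up for the example.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, only for illustration.
DOC = b"<root><item id='1'>a</item><item id='2'>b</item></root>"


class ItemCollector(xml.sax.ContentHandler):
    """SAX style: the parser pushes events at us; no tree is kept."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        # Called once per opening tag, in document order.
        if name == "item":
            self.ids.append(attrs.getValue("id"))


handler = ItemCollector()
xml.sax.parseString(DOC, handler)
sax_ids = handler.ids

# DOM style: the whole document is parsed into a tree first, then queried.
dom = minidom.parseString(DOC)
dom_ids = [e.getAttribute("id") for e in dom.getElementsByTagName("item")]

assert sax_ids == dom_ids
```

The SAX variant never holds more than the current event, which is why it pairs naturally with slicing or range-based input; the DOM variant trades memory for random access.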
February 18, 2016
On Thursday, 18 February 2016 at 04:34:13 UTC, Alex Vincent wrote:
> I'm looking for a status update.  DUB doesn't seem to have many options posted.  I was thinking about starting a SAXParser implementation.

I'm working on it, but recently I had to do some major restructuring of the code.
Currently I'm trying to get this merged https://github.com/D-Programming-Language/phobos/pull/3880 because I had some problems with the encoding of test files. XML has a lot of corner cases; it just takes time.

If you want to work on some XML stuff, please join me. It is probably more productive working together than creating two competing implementations.
February 18, 2016
On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner Schadek wrote:
> If you want to work on some XML stuff, please join me. It is probably more productive working together than creating two competing implementations.

also I would like to see this https://github.com/D-Programming-Language/phobos/pull/2995 go in first to be able to accurately measure and compare performance
February 18, 2016
On 02/18/2016 05:49 AM, Robert burner Schadek wrote:
> On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner Schadek wrote:
>> If you want to work on some XML stuff, please join me. It is probably more
>> productive working together than creating two competing implementations.
>
> also I would like to see this
> https://github.com/D-Programming-Language/phobos/pull/2995 go in first
> to be able to accurately measure and compare performance

Would the measuring be possible with 2995 as a dub package? -- Andrei
February 18, 2016
On Thursday, 18 February 2016 at 12:30:29 UTC, Andrei Alexandrescu wrote:
>> also I would like to see this
>> https://github.com/D-Programming-Language/phobos/pull/2995 go in first
>> to be able to accurately measure and compare performance
>
> Would the measuring be possible with 2995 as a dub package? -- Andrei

yes, after I have synced the dub package to the PR
February 18, 2016
While working on a new XML implementation I came across "control characters (CC)". [1]
When trying to validate/convert a UTF string, these lead to exceptions because they are not valid UTF characters.
Unfortunately, some of these characters are allowed to appear in valid XML 1.* documents.

I currently see two options for how to go about it:

1. Do not allow CCs that do not work with existing functionality.
1.Pros
  * easy
1.Cons
  * the resulting xml implementation will not be xml 1.* complete

2. Add special cases to the existing functionality to handle CCs that are allowed in 1.0.
2.Pros
  * the resulting xml implementation will be xml 1.* complete
2.Cons
  * will make utf de/encoding slower as I would need to add additional logic

Any other ideas, feedback?




[1] https://en.wikipedia.org/wiki/C0_and_C1_control_codes

February 18, 2016
On Thursday, 18 February 2016 at 15:56:58 UTC, Robert burner Schadek wrote:
> When trying to validate/convert a UTF string, these lead to exceptions because they are not valid UTF characters.

That means the user didn't encode them properly...

Which one specifically are you thinking of? I'm pretty sure all those control characters have a spot in the Unicode space and can be properly encoded as UTF-8 (though I think even if they are properly encoded, some of them are illegal in XML anyway).

If they appear in another form, it is invalid and/or needs a charset conversion, which should be specified in the XML document itself.
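
To make the distinction concrete, here is a small sketch, assuming the character in question is the C1 control U+0080. It is written in Python for brevity and says nothing about how std.utf behaves; it only shows the byte-level difference between a properly encoded control code and a stray byte.

```python
# Properly encoded, U+0080 is a two-byte UTF-8 sequence.
encoded = "\u0080".encode("utf-8")
assert encoded == b"\xc2\x80"
assert encoded.decode("utf-8") == "\u0080"

# A lone 0x80 byte is a continuation byte with no lead byte, so strict
# UTF-8 decoding rejects it.
raw = b"\x80"
try:
    raw.decode("utf-8")
    raw_is_valid_utf8 = True
except UnicodeDecodeError:
    raw_is_valid_utf8 = False
assert not raw_is_valid_utf8

# Under a single-byte charset the same byte is fine: ISO-8859-1 maps it
# straight to U+0080. This is the "needs a charset conversion" case.
assert raw.decode("latin-1") == "\u0080"
```

So whether 0x80 is an error depends on the declared encoding of the document, not on the code point itself.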
February 18, 2016
For instance, quite often I find <80> in tests that are supposed to be valid XML 1.0. They are invalid XML 1.1, though.
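
A rough sketch of why the two versions disagree, assuming <80> stands for the code point U+0080 appearing literally in the document. The ranges are the `Char` production of XML 1.0 and the `Char` and `RestrictedChar` productions of XML 1.1; the function names are made up for this example, and Python is used only for illustration.

```python
def is_xml10_char(cp: int) -> bool:
    """Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_restricted(cp: int) -> bool:
    """RestrictedChar production of XML 1.1."""
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


def is_xml11_literal_char(cp: int) -> bool:
    """Allowed literally in XML 1.1: in Char, but not a RestrictedChar.

    Restricted characters may still appear as character references.
    """
    in_char = (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )
    return in_char and not is_xml11_restricted(cp)


# U+0080 falls inside 0x20..0xD7FF, so XML 1.0 accepts it literally,
# while XML 1.1 lists it as restricted.
assert is_xml10_char(0x80)
assert not is_xml11_literal_char(0x80)

# U+0085 (NEL) is deliberately left out of RestrictedChar in XML 1.1.
assert is_xml11_literal_char(0x85)

# U+0001 is the reverse case: not a Char in 1.0, restricted in 1.1.
assert not is_xml10_char(0x1)
```

A validating parser with a compile-time switch for the XML version could select between two such predicates without touching the UTF decoding path.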