February 18, 2016 Re: std.xml2 (collecting features) control character | ||||
---|---|---|---|---|
| ||||
Posted in reply to Robert burner Schadek | On Thursday, 18 February 2016 at 16:41:52 UTC, Robert burner Schadek wrote:
> for instance, quite often I find <80> in tests that are supposed to be valid xml 1.0. they are invalid xml 1.1 though
What char encoding does the document declare itself as?
|
February 18, 2016 Re: std.xml2 (collecting features) control character | ||||
---|---|---|---|---|
| ||||
Posted in reply to Adam D. Ruppe | On Thursday, 18 February 2016 at 16:47:35 UTC, Adam D. Ruppe wrote:
> On Thursday, 18 February 2016 at 16:41:52 UTC, Robert burner Schadek wrote:
>> for instance, quite often I find <80> in tests that are supposed to be valid xml 1.0. they are invalid xml 1.1 though
>
> What char encoding does the document declare itself as?
It does not, it has no prolog and therefore no EncodingInfo.
unix file says it is a utf8 encoded file, but no BOM is present.
|
February 18, 2016 Re: std.xml2 (collecting features) control character | ||||
---|---|---|---|---|
| ||||
Posted in reply to Robert burner Schadek | On Thursday, 18 February 2016 at 16:54:10 UTC, Robert burner Schadek wrote:
> unix file says it is a utf8 encoded file, but no BOM is present.
the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"
|
February 18, 2016 Re: std.xml2 (collecting features) control character | ||||
---|---|---|---|---|
| ||||
Posted in reply to Robert burner Schadek | On Thursday, 18 February 2016 at 16:54:10 UTC, Robert burner Schadek wrote:
> It does not, it has no prolog and therefore no EncodingInfo.
In that case, it needs to be valid UTF-8 or valid UTF-16 and it is a fatal error if there's any invalid bytes:
https://www.w3.org/TR/REC-xml/#charencoding
==
It is a fatal error if an XML entity is determined (via default, encoding declaration, or higher-level protocol) to be in a certain encoding but contains byte sequences that are not legal in that encoding. Specifically, it is a fatal error if an entity encoded in UTF-8 contains any ill-formed code unit sequences, as defined in section 3.9 of Unicode [Unicode].
Unless an encoding is determined by a higher-level protocol, it is also a fatal error if an XML entity contains no encoding declaration and its content is not legal UTF-8 or UTF-16.
==
|
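The rule quoted above can be sketched in a few lines of Python. This is a simplified illustration, not how std.xml2 or any particular parser implements it. The helper name `decode_undeclared_entity` is hypothetical, and the sketch assumes that a UTF-16 entity always starts with a byte order mark and that everything else is UTF-8:

```python
import codecs


def decode_undeclared_entity(data: bytes) -> str:
    """Decode an XML entity that carries no encoding declaration.

    Hypothetical helper: a UTF-16 BOM selects UTF-16, anything else is
    treated as UTF-8. A decoding failure is reported as a fatal error
    (here: ValueError).
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        encoding = "utf-16"     # this codec reads and strips the BOM
    else:
        encoding = "utf-8-sig"  # strips an optional UTF-8 BOM
    try:
        return data.decode(encoding, errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"fatal error: not legal {encoding}: {exc}") from exc


# The byte stream from the thread decodes cleanly.
ok = decode_undeclared_entity(bytes.fromhex("3C666F6F3EC2803C2F666F6F3E"))

# A lone 0x80 byte is not legal UTF-8, so this is rejected.
try:
    decode_undeclared_entity(b"<foo>\x80</foo>")
    rejected = False
except ValueError:
    rejected = True
```

Under these assumptions, the sequence C2 80 is accepted, while a bare 80 byte would be the kind of ill-formed input the spec treats as a fatal error.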
February 18, 2016 Re: std.xml2 (collecting features) control character | ||||
---|---|---|---|---|
| ||||
Posted in reply to Robert burner Schadek | On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner Schadek wrote:
>> unix file says it is a utf8 encoded file, but no BOM is present.
>
> the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"
Gah, I should have read this before replying... well, that does appear to be valid utf-8.... why is it throwing an exception then?
I'm pretty sure that byte stream *is* actually well-formed xml 1.0 and should pass utf validation as well as the XML well-formedness check.
|
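Why U+0080 can be fine in XML 1.0 yet not in XML 1.1 comes down to the character productions of the two specifications. The sketch below, with hypothetical function names, encodes the `Char` production of XML 1.0 and the `RestrictedChar` production of XML 1.1 as range checks:

```python
def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_restricted(cp: int) -> bool:
    """True if the code point matches the XML 1.1 RestrictedChar production.

    Restricted characters may appear in an XML 1.1 document only as
    character references, not as literal characters.
    """
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


cp = 0x80  # the character found in the test file
print(is_xml10_char(cp), is_xml11_restricted(cp))
```

A literal U+0080 falls inside the XML 1.0 `Char` range and inside the XML 1.1 `RestrictedChar` range, which matches the observation that the test file is acceptable as XML 1.0 but not as XML 1.1.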
February 18, 2016 Re: std.xml2 (collecting features) | ||||
---|---|---|---|---|
| ||||
Posted in reply to Robert burner Schadek | On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner Schadek wrote:
> If you want to work on some XML stuff, please join me. It is probably more productive working together than creating two competing implementations.
Oh, I absolutely agree, independent implementation is a bad thing. (Someone should rename DRY as "don't repeat yourself or others"... but DRYOO sounds weird.)
Where's your repo?
|
February 18, 2016 Re: std.xml2 (collecting features) control character | ||||
---|---|---|---|---|
| ||||
Posted in reply to Adam D. Ruppe | On Thursday, 18 February 2016 at 17:26:30 UTC, Adam D. Ruppe wrote:
> On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner Schadek wrote:
>>> unix file says it is a utf8 encoded file, but no BOM is present.
>>
>> the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"
>
> Gah, I should have read this before replying... well, that does appear to be valid utf-8.... why is it throwing an exception then?
>
> I'm pretty sure that byte stream *is* actually well-formed xml 1.0 and should pass utf validation as well as the XML well-formedness check.
Regarding control characters: If you give me a complete sample file, I can run it through Mozilla's UTF stream conversion and/or XML parsing code (via either SAX or DOMParser) to tell you how that reacts as a reference. Mozilla supports XML 1.0, but not 1.1.
|
February 18, 2016 Re: std.xml2 (collecting features) control character | ||||
---|---|---|---|---|
| ||||
Posted in reply to Alex Vincent | On Thursday, 18 February 2016 at 18:28:10 UTC, Alex Vincent wrote:
> Regarding control characters: If you give me a complete sample file, I can run it through Mozilla's UTF stream conversion and/or XML parsing code (via either SAX or DOMParser) to tell you how that reacts as a reference. Mozilla supports XML 1.0, but not 1.1.
Thank you for making the effort.
https://github.com/burner/std.xml2/blob/master/tests/eduni/xml-1.1/out/010.xml
|
February 19, 2016 Re: std.xml2 (collecting features) | ||||
---|---|---|---|---|
| ||||
Posted in reply to Robert burner Schadek | On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner Schadek wrote:
> On Thursday, 18 February 2016 at 04:34:13 UTC, Alex Vincent wrote:
>> I'm looking for a status update. DUB doesn't seem to have many options posted. I was thinking about starting a SAXParser implementation.
>
> I'm working on it, but recently I had to do some major restructuring of the code.
> Currently I'm trying to get this merged https://github.com/D-Programming-Language/phobos/pull/3880 because I had some problems with the encoding of test files. XML has a lot of corner cases, it just takes time.
>
> If you want to work on some XML stuff, please join me. It is probably more productive working together than creating two competing implementations.
Would you be interested in mentoring a student for the Google Summer of Code to do work on std.xml?
|
February 19, 2016 Re: std.xml2 (collecting features) | ||||
---|---|---|---|---|
| ||||
Posted in reply to Craig Dillabaugh | On Friday, 19 February 2016 at 04:02:02 UTC, Craig Dillabaugh wrote:
> Would you be interested in mentoring a student for the Google Summer of Code to do work on std.xml?
Yes, why not!
|