January 15, 2020
On Friday, 10 January 2020 at 14:59:23 UTC, berni44 wrote:
> Please vote: should std.xml be deprecated and moved to undeaD?
>
> See https://forum.dlang.org/thread/fnbsikficjsubxrukkae@forum.dlang.org for some recent discussion about this.

No. Not until a replacement has been available in Phobos for several years, has been debugged, and is considered a first-class replacement.

This was the promise:
"This module is considered out-dated and not up to Phobos' current standards. It will remain until we have a suitable replacement, but be aware that it will not remain long term."

There should be a high bar associated with the removal of a module from the standard library.
Not finding an xml library in the standard library will not help the language with new adopters.
Finding an existing xml library deprecated will not help the language with existing users.
Going to dub and finding multiple xml libraries prominently promoted, some of which are school projects, also doesn't help.
Finding that the ones that look reasonable only work for a few compiler versions back, on one compiler that is only available on a few platforms, also doesn't help.

Fixing the one that is there, or merging a parallel set of code into std.xml that meets current Phobos standards, are the only ways for D to come out looking like something a professional can recommend.
January 15, 2020
On Saturday, 11 January 2020 at 16:44:03 UTC, Ernesto Castellotti wrote:
>
> I believe that removing it will only leave the std without xml, I don't like the idea of moving too many things out of the std.
>

If you are actively looking for an XML parsing module, you'll end up avoiding std.xml anyway. The last two times I've done this, trying to work with std.xml was just a waste of time, and I ended up with dom.d, which is one of the few maintained AND practical parsing libraries out there.

Removing it will make people more productive: you don't lose time evaluating it as an alternative.
January 17, 2020
On Wednesday, 15 January 2020 at 15:17:09 UTC, Guillaume Piolat wrote:
> On Saturday, 11 January 2020 at 16:44:03 UTC, Ernesto Castellotti wrote:
>>
>> I believe that removing it will only leave the std without xml, I don't like the idea of moving too many things out of the std.
>>
>
> If you are actively looking for an XML parsing module, you'll end up avoiding std.xml anyway. The last two times I've done this, trying to work with std.xml was just a waste of time, and I ended up with dom.d, which is one of the few maintained AND practical parsing libraries out there.
>
> Removing it will make people more productive: you don't lose time evaluating it as an alternative.

I spent a couple of days removing std.xml from most of my code.

I have now incorporated a version of dxml by Jonathan M Davis. To get it working I had to remove the use of static foreach, because gdc 8 (packaged with a distribution released in late 2019) doesn't support it.
Most of the unit tests are commented out, and several other modifications were required to make it work, both to the dxml code and to the standard library.

It uses range-based code much more heavily and doesn't allocate much. I suppose this is the 'Phobos standards' that std.xml lacks.

It works well, and resulted in large performance increases.

It would be great if something like dxml was in the standard library.

January 17, 2020
On Saturday, 11 January 2020 at 21:10:06 UTC, Dennis wrote:
> Has anyone made a write-up detailing what is exactly wrong with std.xml (and maybe certain other Phobos modules too)? I have heard some general complaints about bad APIs and outdated idioms but other than that I am out of the loop.

I would like to answer this question, but I can't. The reason is simple: I never used std.xml and just tried once to hunt down a bug.

So you may wonder why I started this anyway. Well, I noticed that the question of removing some modules from Phobos comes up over and over again. So it's certainly not a good idea to keep the current state, and I'd like to push this change a little bit.

On Monday, 13 January 2020 at 11:09:15 UTC, rikki cattermole wrote:

> dxml is the closest and the author isn't keen to go down that right now.

It would be nice to know why.


All in all, from the discussion up to now, there seem to be two feasible ways to proceed:

a) remove std.xml
b) replace std.xml by dxml

To decide between these two, it would be good to have an answer to both questions (why is std.xml considered bad, and what are the obstacles to moving dxml into Phobos).
January 17, 2020
On Friday, 17 January 2020 at 10:10:19 UTC, berni44 wrote:

> All in all, from the discussion up to now, there seem to be two feasible ways to proceed:
>
> a) remove std.xml
Yes.

> b) replace std.xml by dxml
Why? dxml is available, why should it be part of phobos?
Several people argued that xml should not be in a standard library, and the author doesn't want it to be there. Maybe replace std.xml with a hint where to find a better solution, but that's it.

January 17, 2020
On 1/17/20 2:01 AM, Alex Burton wrote:

> It works well, and resulted in large performance increases.
> 
> It would be great if something like dxml was in the standard library.
> 

I think the biggest stumbling block was something like schema validation. I can't remember the exact details but Jonathan did not want to include it because it's a security concern. Something in Phobos shouldn't ignore a large part of the standard.

-Steve
January 17, 2020
On Fri, Jan 17, 2020 at 09:50:52AM -0500, Steven Schveighoffer via Digitalmars-d wrote:
> On 1/17/20 2:01 AM, Alex Burton wrote:
> 
> > It works well, and resulted in large performance increases.
> > 
> > It would be great if something like dxml was in the standard library.
> > 
> 
> I think the biggest stumbling block was something like schema validation. I can't remember the exact details but Jonathan did not want to include it because it's a security concern. Something in Phobos shouldn't ignore a large part of the standard.
[...]

No, I don't think it was because of security, it was more because of performance, because the current implementation of dxml uses slicing extensively to avoid needless copying of data. But to validate a schema according to spec, esp. some of the more obscure (and convoluted) corners of the spec, you'd need to pre-parse the whole thing and allocate a bunch of stuff before you can run the validation.

The other stumbling block is entity support, which again has some rarely-used corner cases in the spec where they can recursively expand to arbitrarily large content (IIRC it may even involve network access or at least local filesystem access[*]) that may entirely change the meaning of subsequent characters (and resulting parse tree). This would make the current slices-based API impossible, which kinda undermines dxml's entire underlying premise.

([*] Yeah, the XML spec is IMNSHO the epitome of design by committee producing an insanely-overengineered over-complex system, most features of which normal people never use or are even aware of.)

The ironic thing is that the cases that dxml *does* support are the only cases that 99% of XML users would ever actually need. Yet there's that annoying 1% of obscure and insanely-complex corner in the spec that *some* people out there actually expect to work, which prevents us from saying that dxml implements the entire XML spec.  And Phobos being the epitome of perfectionism, this means dxml will likely never make it in. Or if it does, it's almost guaranteed that *somebody* will barge in and complain loudly about how std.dxml doesn't *actually* implement the XML spec.


T

-- 
Designer clothes: how to cover less by paying more.
January 17, 2020
On Friday, January 17, 2020 11:54:21 AM MST H. S. Teoh via Digitalmars-d wrote:
> On Fri, Jan 17, 2020 at 09:50:52AM -0500, Steven Schveighoffer via Digitalmars-d wrote:
> > On 1/17/20 2:01 AM, Alex Burton wrote:
> > > It works well, and resulted in large performance increases.
> > >
> > > It would be great if something like dxml was in the standard library.
> >
> > I think the biggest stumbling block was something like schema validation. I can't remember the exact details but Jonathan did not want to include it because it's a security concern. Something in Phobos shouldn't ignore a large part of the standard.
>
> [...]
>
> No, I don't think it was because of security, it was more because of performance, because the current implementation of dxml uses slicing extensively to avoid needless copying of data. But to validate a schema according to spec, esp. some of the more obscure (and convoluted) corners of the spec, you'd need to pre-parse the whole thing and allocate a bunch of stuff before you can run the validation.
>
> The other stumbling block is entity support, which again has some rarely-used corner cases in the spec where they can recursively expand to arbitrarily large content (IIRC it may even involve network access or at least local filesystem access[*]) that may entirely change the meaning of subsequent characters (and resulting parse tree). This would make the current slices-based API impossible, which kinda undermines dxml's entire underlying premise.
>
> ([*] Yeah, the XML spec is IMNSHO the epitome of design by committee producing an insanely-overengineered over-complex system, most features of which normal people never use or are even aware of.)

Basically, you're both right. Security was part of the problem, but it wasn't the core problem. Honestly, everything involved with DOCTYPE was a terrible idea, and if the people who thought it up haven't come to that conclusion in the interim, they should do some serious soul searching. E.g. how on earth did anyone think that it was a good idea for a document to tell an application what constituted a valid document? That's total nonsense. It's up to the application to determine whether its input is valid, whereas DOCTYPE basically makes it the input's job to tell the application whether the input is valid. How did anyone think that that was a good idea? And adding what is essentially a #include and macro system to a document format? How on earth is _that_ a good idea? The DOCTYPE section adds a ton of complexity to the XML spec and any XML parser that would attempt to fully implement it, and it's the sort of thing that no one should have anything to do with unless they have no choice (which unfortunately is probably the case for some people).

Both the security concern and the chief reason that dxml does not support the DOCTYPE section beyond parsing past it have to do with entity references. The DOCTYPE section can not only define entity references, but it can point to other documents which then have to be parsed in order to find the definitions of entity references. Those entity references then get replaced with more or less arbitrary chunks of XML based on their definitions. Basically, it's the equivalent of #including files to access macros that are #defined in those files. The fact that you have to worry about going and parsing another document makes it impossible to simply parse an arbitrary XML document. Suddenly, the parser has to care about where the XML file is on disk (so that it can correctly follow any file paths), and it potentially has to download documents from the internet (since arbitrary URLs can be provided) - which is a big security concern. And regardless of whether the entity references are defined in the current document or a separate document, the fact that they can insert more or less arbitrary XML destroys your ability to have the output simply be slices of the input.
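To make the slicing point concrete, here is a minimal sketch in Python (not D, and not dxml's actual code). The `expand_entities` and `element_text` helpers, the `ENTITIES` table, and the sample documents are all hypothetical; the table merely stands in for definitions that a DOCTYPE section would supply. The sketch only shows that text without entity references can be handed back as a piece of the input, while expanded text cannot.

```python
import re

# Hypothetical stand-in for entity definitions that a DOCTYPE section would supply.
ENTITIES = {"who": "World", "greeting": "Hello, &who;!"}

_REF = re.compile(r"&(\w+);")


def expand_entities(text, entities, depth=0):
    """Recursively replace &name; references with their definitions."""
    if depth > 10:  # guard against runaway recursive definitions
        raise ValueError("entity expansion too deep")

    def substitute(match):
        return expand_entities(entities[match.group(1)], entities, depth + 1)

    return _REF.sub(substitute, text)


def element_text(doc):
    """Return the raw text between the first '>' and the last '<'."""
    start = doc.index(">") + 1
    end = doc.rindex("<")
    return doc[start:end]


plain_doc = "<d>Hello, World!</d>"
entity_doc = "<d>&greeting;</d>"

plain_text = element_text(plain_doc)
expanded_text = expand_entities(element_text(entity_doc), ENTITIES)

# Without entities, the result is literally a piece of the input.
plain_is_slice = plain_text in plain_doc
# With entities, the result contains characters that never appeared in the input.
expanded_is_slice = expanded_text in entity_doc
```

Both documents yield the same text, but only the first result can be found inside its input; the second had to be built from content defined elsewhere, which is what rules out a slice-only API.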

One of the core design goals of dxml was that the output type be either the same as the input type or that it be a TakeExactly of the input type. That way, if you give it a string, you get strings back. It's very efficient that way, and it's way more user friendly. I did not want wrappers coming out the other side, because that would pretty much inevitably result in additional memory allocations occurring just to get strings again. And if you're potentially inserting arbitrary text into the middle of your string, you can't just return a slice. So, while I had originally tried to support the DOCTYPE section in spite of thinking that it's a terrible, terrible idea that it even exists, once I figured out that I couldn't do it while returning slices, I dropped all of my code that was trying to deal with the DOCTYPE section, and I will never make dxml support it. It would be making the parser far worse for the common use case just to support the rare use case (or what certainly _should_ be a rare use case).
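As a rough analogy for the "same type in, same type out" goal, the following Python sketch returns the first tag name as a slice of whatever it was given. The `first_tag_name` function is hypothetical and is not part of dxml. Note one important difference from D: a Python slice of a `str` or `bytes` object copies the data, whereas a D slice of a string refers to the original memory, so this illustrates only the type-preserving part of the design, not the allocation savings.

```python
def first_tag_name(doc):
    """Return the name of the first start tag, as a slice of `doc`.

    Accepts str or bytes; the result has the same type as the input.
    """
    if isinstance(doc, str):
        lt, stops = "<", "> /"
    else:
        lt, stops = b"<", b"> /"

    start = doc.index(lt) + 1
    end = start
    # Advance until the tag name ends (at '>', a space, or '/').
    while end < len(doc) and doc[end:end + 1] not in stops:
        end += 1
    return doc[start:end]


name_from_str = first_tag_name("<root attr='1'>text</root>")
name_from_bytes = first_tag_name(b"<root attr='1'>text</root>")
```

No wrapper type comes out the other side: a `str` input gives a `str` name and a `bytes` input gives a `bytes` name.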

> The ironic thing is that the cases that dxml *does* support are the only cases that 99% of XML users would ever actually need. Yet there's that annoying 1% of obscure and insanely-complex corner in the spec that *some* people out there actually expect to work, which prevents us from saying that dxml implements the entire XML spec.  And Phobos being the epitome of perfectionism, this means dxml will likely never make it in. Or if it does, it's almost guaranteed that *somebody* will barge in and complain loudly about how std.dxml doesn't *actually* implement the XML spec.

Yeah. std.xml doesn't support the DOCTYPE stuff either, but I'm sure that if I went through the Phobos review process with dxml, there would be some people screaming that if it's in the standard library, it must support the entirety of the spec. If some poor soul wants to actually implement a parser in D that does that, then all the more power to them, but the result is bound to be worse than dxml for those of us who want to parse XML documents that don't use DOCTYPE-specific features - which is almost certainly most of us.

- Jonathan M Davis



January 17, 2020
> On Monday, 13 January 2020 at 11:09:15 UTC, rikki cattermole wrote:
> > dxml is the closest and the author isn't keen to go down that right now.
>
> It would be nice to know why.

Honestly, the only reason that I would consider putting dxml through the Phobos review process to get it into Phobos would be because we already have an XML parser in there that we want to get rid of, and if we keep it in there until we have a replacement, then we'll need a replacement eventually. However, I really don't think that Phobos is the right place for parsers for document formats. I come from a C++ background, and that strongly colors my view on what belongs in a standard library. XML is a relatively common format, but it's not the sort of thing that your average application is going to be parsing. So, I don't think that a parser for it belongs in the standard library. Personally, I'd much rather that we just move std.xml to undead and leave XML parsing to libraries on code.dlang.org.

However, even if I did want to get dxml into Phobos, there are further improvements that I would like to make to it first (e.g. implementing support for some of the enhancement requests that I've gotten), and I need to find the time to do that. Either way, I certainly don't have the time or energy to push dxml through the Phobos review process right now.

Going through the Phobos review process would result in a lot of bikeshedding over issues like DOCTYPE support and whether the parser should be fully @nogc. I have no interest in doing either, but I'd have to spend a bunch of time arguing about it to get it into Phobos.

Also, once the parser is in Phobos, I lose control of it. Right now, I can make whatever changes I feel make sense. Obviously, I don't want to break existing code that uses it, but I don't have to argue with other people every time that I want to make a change to the code. To get it into Phobos, I'd have to do a lot of arguing to get it in, and then I'd have to do more arguing every time I wanted to make a change. I'd much rather avoid all of that pain - especially since I don't think that an XML parser belongs in a standard library in the first place. If someone wants to use dxml, it's easily fetched using dub, and if someone doesn't want to use dub, it's trivial enough to download the code and integrate it into their project however they feel like it.

- Jonathan M Davis



January 17, 2020
On Fri, Jan 17, 2020 at 02:43:13PM -0700, Jonathan M Davis via Digitalmars-d wrote: [...]
> Honestly, everything involved with DOCTYPE was a terrible idea,
[...]
> adding what is essentially a #include and macro system to a document format?
[...]
> Basically, it's the equivalent of #including files to access macros that are #defined in those files.
[...]
> and it potentially has to download documents from the internet (since
> arbitrary URLs can be provided)
[...]

Wow. Just when I thought my opinion of C preprocessor macros couldn't get any lower, here we have an example of a system that's essentially equal to a preprocessing system where you can:

- #include files from *arbitrary network addresses*, not just the local
  filesystem;
- expand arbitrary macros defined therein into the file being parsed and
  have it potentially *completely alter the parse tree*.

Words fail to describe my ...incredulity... at this ...incredible... design.

It reminds me of this quote:

	"No, John.  I want formats that are actually useful, rather than
	over-featured megaliths that address all questions by piling on
	ridiculous internal links in forms which are hideously
	over-complex." -- Simon St. Laurent on xml-dev

I used to be skeptical of XML once. Now that skepticism has acquired a significant dose of disgust as well.


T

-- 
What do you get if you drop a piano down a mineshaft? A flat minor.