February 12, 2018
On Monday, 12 February 2018 at 14:54:48 UTC, rikki cattermole wrote:
> Just because something is 'good enough' does not make it 'good enough' for Phobos. In the words of Andrei "Good enough is not good enough", we need to aim higher to show what we actually can do.

About 5 years ago (I think, I actually have the link on my other computer but it is 2,000 miles away right now), Andrei said something along the lines of "without the review process, we get junk like std.json".

Ironically, that same review process may be why we still have such "junk". (actually personally, I don't hate std.json).

If std.xml is really so bad and has been for so long, surely we ought to take an opportunity to change that, even if the change isn't perfect.
February 12, 2018
On 12/02/2018 3:08 PM, Adam D. Ruppe wrote:
> On Monday, 12 February 2018 at 14:54:48 UTC, rikki cattermole wrote:
>> Just because something is 'good enough' does not make it 'good enough' for Phobos. In the words of Andrei "Good enough is not good enough", we need to aim higher to show what we actually can do.
> 
> About 5 years ago (I think, I actually have the link on my other computer but it is 2,000 miles away right now), Andrei said something along the lines of "without the review process, we get junk like std.json".
> 
> Ironically, that same review process may be why we still have such "junk". (actually personally, I don't hate std.json).
> 
> If std.xml is really so bad and has been for so long, surely we ought to take an opportunity to change that, even if the change isn't perfect.

It depends.

The implementation does not need to be perfect or full fledged to go into experimental.

But if at the start of the review process it is already well known that the public API would require a complete change to accommodate the intended goal it is unacceptable.

Take std.experimental.allocators as an example. It currently is going through a massive API change, but when it first got PR'd, did we know that we should be RC'ing allocators? No of course not, otherwise we'd have done it.

At this point in time I cannot say that dxml in good faith serves to represent the XML specification for the D community in full. This is unfortunately not about bike shedding.

It is one thing to bike shed features, but when scope does not match the intended goal, we have got to be careful about what goes into Phobos.

All J.M.D. has to do to change this, is make the API match the spec (as close as possible, without writing another parser) and separate out the implementation into a different and very clear module (probably a sub package) which states clearly that it is a subset with the full grammar listed that it supports.

That way everybody is clear and we can later on get a full implementation as part of taking it out of experimental :)
February 12, 2018
On Monday, 12 February 2018 at 14:04:38 UTC, Jonathan M Davis wrote:

> However, if folks as a whole think that Phobos' xml parser needs to support the DTD section to be acceptable, then dxml won't replace std.xml, because dxml is not going to implement DTD support. DTD support fundamentally does not fit in with dxml's design.

Can't you simply give it a name other than std.xml that indicates it doesn't do everything related to xml? It doesn't make sense to not put it into Phobos because of the name, and that should be an easy problem to solve.
February 12, 2018
On Monday, 12 February 2018 at 15:43:59 UTC, bachmeier wrote:
> On Monday, 12 February 2018 at 14:04:38 UTC, Jonathan M Davis wrote:
>
>> However, if folks as a whole think that Phobos' xml parser needs to support the DTD section to be acceptable, then dxml won't replace std.xml, because dxml is not going to implement DTD support. DTD support fundamentally does not fit in with dxml's design.
>
> Can't you simply give it a name other than std.xml that indicates it doesn't do everything related to xml? It doesn't make sense to not put it into Phobos because of the name, and that should be an easy problem to solve.

Hit send too fast. std.xml.base would be reasonable.
February 12, 2018
On Monday, February 12, 2018 15:26:24 rikki cattermole via Digitalmars-d- announce wrote:
> All J.M.D. has to do to change this, is make the API match the spec (as close as possible, without writing another parser) and separate out the implementation into a different and very clear module (probably a sub package) which states clearly that it is a subset with the full grammar listed that it supports.

That literally cannot be done. dxml returns slices (or takeExactly's) of the original input. For it to do otherwise would harm performance and usability, but in order to implement full DTD support, it's impossible to return slices of the original input in the general case, because you have to be able to mutate the data whenever entity references get involved. If the API were entirely string-based, then whether the implementation returned slices or newly allocated strings could be an implementation detail, but as soon as you're dealing with arbitrary ranges of characters, that doesn't work. At that point, you're forced to either return strings for everything (which means allocating for any ranges that aren't strings) or to return a lazy range of characters and thus can't return the original type. And that means that if you pass it a string, you're stuck with a lazy range out the other end instead of a string, and to get a string again, you have to allocate, whereas with what I have now, the parser does almost no allocations, and as long as the input type supports slicing, you get exactly the same type out the other end, which is a huge usabality improvement IMHO.

So, you can't have DTD support with the kind of API that dxml has, and changing the API to something that could work with DTD support would harm the parser for all of the cases where DTD support is unnecessary.

Even if I were going to implement full DTD support, I would do it with another parser, not change the parser that dxml already has. And if dxml ends up in Phobos with the parser that it has, that doesn't prevent another parser from being added for the DTD case later if someone actually decides to put in the time and effort to do it. Either way, for any XML document that doesn't need DTD support, the way that dxml does things is more efficient and user-friendly than one that had DTD support would be, much as that obviously doesn't cut it for those documents that do need DTD support.

In any case, I'm going to finish implementing dxml without any kind of DTD support and then see how things go as far as the Phobos review process goes. If dxml gets rejected, because the majority of folks think that we're better off with std.xml (or no xml parser at all in Phobos) than one that doesn't have DTD support, then oh well. That sucks, but anyone who wants dxml can then use it as a 3rd party library. I think that the D community would be worse off because of that, but it's not ultimately my decision to make, and either way, I have the parser that I need.

- Jonathan M Davis

February 12, 2018
On Monday, February 12, 2018 15:45:50 bachmeier via Digitalmars-d-announce wrote:
> On Monday, 12 February 2018 at 15:43:59 UTC, bachmeier wrote:
> > On Monday, 12 February 2018 at 14:04:38 UTC, Jonathan M Davis
> >
> > wrote:
> >> However, if folks as a whole think that Phobos' xml parser needs to support the DTD section to be acceptable, then dxml won't replace std.xml, because dxml is not going to implement DTD support. DTD support fundamentally does not fit in with dxml's design.
> >
> > Can't you simply give it a name other than std.xml that indicates it doesn't do everything related to xml? It doesn't make sense to not put it into Phobos because of the name, and that should be an easy problem to solve.
>
> Hit send too fast. std.xml.base would be reasonable.

I have no interest in bikeshedding the name right now or even really arguing about Phobos inclusion (I've already said more in this thread about that than I probably should have). That can be left up to the review process, which already tends to be nasty enough that it wouldn't surprise me at all if dxml doesn't get accepted. The only reason that I have any plans to try for Phobos inclusion with dxml is because std.xml needs to be replaced. If Phobos didn't have an XML parser already, I don't expect that I'd bother, since I don't think that it's all that important that a standard library have an XML parser. I just think that it's important that it not have have a bad one. In general, I think that XML is the sort of thing that's perfectly fine as a 3rd party solution.

- Jonathan M Davis

February 12, 2018
On Mon, Feb 12, 2018 at 07:04:38AM -0700, Jonathan M Davis via Digitalmars-d-announce wrote: [...]
> However, if folks as a whole think that Phobos' xml parser needs to support the DTD section to be acceptable, then dxml won't replace std.xml, because dxml is not going to implement DTD support. DTD support fundamentally does not fit in with dxml's design.

Actually, thinking about this, I'm wondering if a combination of preprocessing and/or postprocessing might make it possible to implement DTD support without needing to rewrite the guts of dxml. AIUI, dxml does parse the DTD section correctly, i.e., as an XML directive, but only doesn't look into its internal details. So one way to implement DTD support might be:

- Write an auxiliary parser that's basically a wrapper around dxml,
  forwarding XML events to the caller, except:
- If a DTD event is encountered, eagerly parse it, store DTD
  declarations internally for future reference.
- If there's a DTD that has been seen, perform on-the-fly validation as
  XML events are forwarded.
- In PCDATA sections, if there are entity references to the DTD, expand
  them, possibly inserting more XML events into the stream based on
  what's defined in the DTD. (This may need to reuse some dxml internals
  to parse XML snippets that might be contained in an entity definition,
  for example.)


[...]
> However, std.xml does not support the DTD section, and glancing over it, it doesn't look like it even handles skipping the DTD section properly (it doesn't handle the fact that '>' can appear within quoted sections within the DTD). So, dxml is not worse than std.xml in that regard, and we wouldn't lose any functionality by having dxml replace std.xml. It just wouldn't necessarily do as much as some folks might like.
[...]

If std.xml currently does not support DTDs, then I say dxml is definitely a Phobos candidate.  At the very least, it does not make the current situation worse.  Rejecting dxml because it doesn't support DTDs is basically letting the perfect be the enemy of the good, which is something this community has been plagued with for far too long.  What's worse: a std.dxml that doesn't support DTDs, or a std.xml with fundamental problems that continue to plague us for the next decade while nobody else steps up to implement a suitable replacement?


T

-- 
Ph.D. = Permanent head Damage
February 12, 2018
On 12/02/2018 3:50 PM, Jonathan M Davis wrote:
> In any case, I'm going to finish implementing dxml without any kind of DTD
> support and then see how things go as far as the Phobos review process goes.
> If dxml gets rejected, because the majority of folks think that we're better
> off with std.xml (or no xml parser at all in Phobos) than one that doesn't
> have DTD support, then oh well. That sucks, but anyone who wants dxml can
> then use it as a 3rd party library. I think that the D community would be
> worse off because of that, but it's not ultimately my decision to make, and
> either way, I have the parser that I need.

We are definitely not better off with just std.xml currently.

The problem comes from the word currently. By going into Phobos even if experimental, its going to be around for a while in some form or another. So we need to invest a decent amount of time into not creating more problems for new users expecting the world and not getting it.

If somebody (say a student?) were to write up a proper API and use dxml as a basis for a simpler parser, now that could be a worth while project and definitely could go into Phobos.

I may even consider doing it at some point in the future.
February 12, 2018
On 12/02/2018 3:59 PM, H. S. Teoh wrote:
> If std.xml currently does not support DTDs, then I say dxml is
> definitely a Phobos candidate.  At the very least, it does not make the
> current situation worse.  Rejecting dxml because it doesn't support DTDs
> is basically letting the perfect be the enemy of the good, which is
> something this community has been plagued with for far too long.  What's
> worse: a std.dxml that doesn't support DTDs, or a std.xml with
> fundamental problems that continue to plague us for the next decade
> while nobody else steps up to implement a suitable replacement?

dxml 7.5k LOC
std.xml 3k LOC

dxml would make the situation a lot worse.
February 12, 2018
On Monday, 12 February 2018 at 16:15:54 UTC, rikki cattermole wrote:

> dxml 7.5k LOC
> std.xml 3k LOC
>
> dxml would make the situation a lot worse.

How could it possibly make the situation any worse than it is now? Atm, nobody will ever use std.xml, because it is sub-standard and has no future.

As others have already mentioned: a DTD parser can still be added at a later point. It's like not moving into newly built house, because the winter garden is not yet finished (and you live in Florida :)