Thread overview
July 30, 2012 [dmd-internals] named character entities, the spec, and dmd
This is the D spec page on named character entities:

http://dlang.org/entity.html

From what I can tell, it's essentially (if not exactly) what HTML 4 lists:

http://www.htmlhelp.com/reference/html40/entities/
http://www.w3.org/TR/html4/sgml/entities.html

However, what dmd seems to be using is essentially what HTML 5 lists:

http://www.w3.org/TR/html5/named-character-references.html

though it appears to have taken its list from here:

http://www.w3.org/2003/entities/2007/w3centities-f.ent

The problem (aside from the fact that the D spec and dmd don't match) is that we appear to be dealing with a moving target here, since HTML is a moving target. Should the D spec and dmd target a specific version of HTML? If so, can that list change later (e.g. moving from HTML 4 to 5, or from 5 to 6 whenever 6 comes along)? And if it's HTML 5, HTML 5 itself isn't finalized yet, so _that_'s potentially a moving target. Or should the D spec just make its own list and stick with it (in which case it would presumably match one of the HTML specs initially but might not in the long run)? I assume that we _don't_ want to let the implementation define whatever entities it feels like, even with the caveat that they're supposed to come from one of the HTML specs.

From what I can tell, two names were redefined in HTML 5 (&lang; and &rang;), but other than that, the change is purely additive. I ran into this because I'm working on a lexer for D, and I created a unit test to check all of the named entities I had against what dmd did, and those two didn't match. Looking at entity.c in dmd, it's worse than that: far more entities are defined there than in the spec. Regardless, it's clearly a problem for any implementation that follows the spec and is expected to match dmd, especially if the spec is a moving target in this case.
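The check described above (comparing one list of named entities against another) amounts to a table diff. The following is a minimal Python sketch, not Jonathan's lexer test and not dmd's entity.c; HTML4_EXCERPT, HTML5_EXCERPT and compare_entity_tables are hypothetical names, and each table holds only a few entries. The &lang;/&rang; values follow the HTML 4 (U+2329/U+232A) and HTML 5 (U+27E8/U+27E9) definitions, and &apos; stands in for the entities HTML 5 adds.

```python
# Hypothetical excerpts of two named-entity tables (name -> code points).
# Real tables contain hundreds (HTML 4) or thousands (HTML 5) of entries.
HTML4_EXCERPT = {
    "amp": (0x0026,),
    "lt": (0x003C,),
    "copy": (0x00A9,),
    "lang": (0x2329,),  # HTML 4: LEFT-POINTING ANGLE BRACKET
    "rang": (0x232A,),  # HTML 4: RIGHT-POINTING ANGLE BRACKET
}

HTML5_EXCERPT = {
    "amp": (0x0026,),
    "lt": (0x003C,),
    "copy": (0x00A9,),
    "apos": (0x0027,),  # not in HTML 4, added in HTML 5
    "lang": (0x27E8,),  # HTML 5: MATHEMATICAL LEFT ANGLE BRACKET
    "rang": (0x27E9,),  # HTML 5: MATHEMATICAL RIGHT ANGLE BRACKET
}


def compare_entity_tables(old, new):
    """Return (redefined, added, removed) entity names, each sorted."""
    redefined = sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    return redefined, added, removed


if __name__ == "__main__":
    redefined, added, removed = compare_entity_tables(HTML4_EXCERPT, HTML5_EXCERPT)
    print("redefined:", redefined)
    print("added:", added)
    print("removed:", removed)
```

On these excerpts the diff reports lang and rang as redefined, apos as added and nothing removed; a real test would fill both tables from the full lists being compared.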
What _I_ would be tempted to go for is to have the D spec specifically state that it supports the list of named entities that HTML 5 does, giving a link to the current HTML 5 spec, and then update that link and dmd (and the changelog) whenever the HTML 5 spec changes, and then leave it at the final version of HTML 5 once that comes around. That way, we don't have to list every single entity in the spec ourselves, and it's clearly defined what we currently support. It _does_ present a slightly moving target for the moment, but I suspect that the named character entities aren't changing much in the HTML 5 spec, and since dmd is _already_ supporting them, I'm not sure that it's reasonable to say that we're supporting HTML 4 (which is what the spec currently seems to match, though it doesn't say so).

Thoughts? I'm perfectly willing to go and create whatever pull requests are necessary for dmd and d-programming-language.org to fix this, but we need a decision of some kind on how we want to proceed.

- Jonathan M Davis

_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals
July 30, 2012 Re: [dmd-internals] named character entities, the spec, and dmd
Posted in reply to Jonathan M Davis

On Monday, July 30, 2012 22:51:58 Jonathan M Davis wrote:
> Thoughts? I'm perfectly willing to go and create whatever pull requests are necessary for dmd and d-programming-language.org to fix this, but we need a decision of some kind on how we want to proceed.

By the way, it does look like some named entities in HTML 5 are two code points long rather than one, and dmd does not support those (and I don't know how it could, given that a named entity can be used in a character literal as well as a string literal), so we'll need to say that we don't support those regardless. That's easy enough to do if we go with the approach of saying in the D spec that we follow the HTML 5 spec: you just say that we support the HTML 5 named character entities that are a single code point. But it _does_ mean that we have to say more than just that we follow the HTML 5 spec for named character entities.

- Jonathan M Davis
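One way a lexer could enforce the single-code-point restriction discussed here is to resolve each entity to its code points and reject multi-code-point entities only inside character literals. The Python sketch below is hypothetical (ENTITIES, LexError and resolve_entity are illustrative names, not anything in dmd); &fjlig; is an HTML 5 entity that expands to the two code points U+0066 U+006A ("fj").

```python
# Hypothetical entity table for a D lexer: name -> code points (HTML 5 values).
ENTITIES = {
    "amp": (0x0026,),
    "lang": (0x27E8,),
    "fjlig": (0x0066, 0x006A),  # two code points: "fj"
}


class LexError(Exception):
    """Raised for an entity the lexer cannot accept in this context."""


def resolve_entity(name: str, in_char_literal: bool) -> str:
    """Expand &name; to its text, allowing multi-code-point entities
    only outside character literals."""
    code_points = ENTITIES.get(name)
    if code_points is None:
        raise LexError(f"unknown named character entity &{name};")
    if in_char_literal and len(code_points) != 1:
        raise LexError(
            f"&{name}; expands to {len(code_points)} code points "
            "and cannot appear in a character literal"
        )
    return "".join(chr(cp) for cp in code_points)


if __name__ == "__main__":
    print(resolve_entity("fjlig", in_char_literal=False))  # string literal: ok
    try:
        resolve_entity("fjlig", in_char_literal=True)      # char literal: rejected
    except LexError as err:
        print("error:", err)
```

The split mirrors the distinction drawn in this message: a string literal can hold any number of code points, while a character literal holds exactly one.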
July 30, 2012 Re: [dmd-internals] named character entities, the spec, and dmd
Posted in reply to Jonathan M Davis

On 7/30/2012 11:02 PM, Jonathan M Davis wrote:
> On Monday, July 30, 2012 22:51:58 Jonathan M Davis wrote:
>> Thoughts? I'm perfectly willing to go and create whatever pull requests are
>> necessary for dmd and d-programming-language.org to fix this, but we need a
>> decision of some kind on how we want to proceed.
> By the way, it does look like some named entities in HTML 5 are two code
> points in length rather than just one, and dmd does not support those (and I
> don't know how it could given that a named entity can be used in a character
> literal as well as a string literal), so we'll need to say that we don't
> support those regardless. But that's easy enough to do if we went with the
> approach of saying in the D spec that we followed the HTML 5 spec - you just
> say that we support the HTML 5 list of named character entities which are a
> single code point. But it _does_ mean that we have to say more than that we
> follow the HTML 5 spec for named character entities.
>

Simply say that we support the HTML 5 spec. We can get the two code point ones to work.
July 30, 2012 Re: [dmd-internals] named character entities, the spec, and dmd
Posted in reply to Walter Bright

On Monday, July 30, 2012 23:37:50 Walter Bright wrote:
> Simply say that we support the HTML 5 spec.

Okay.

> We can get the two code point ones to work.

How? A dchar is a single code point. We could theoretically make them work in string literals (though that does complicate things a bit), but I don't see how that would be possible with character literals.

- Jonathan M Davis
July 31, 2012 Re: [dmd-internals] named character entities, the spec, and dmd
Posted in reply to Jonathan M Davis

On 31 July 2012 08:43, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> On Monday, July 30, 2012 23:37:50 Walter Bright wrote:
>> Simply say that we support the HTML 5 spec.
>
> Okay.
>
>> We can get the two code point ones to work.
>
> How? A dchar is a single code point. We could theoretically make them work in string literals (though that does complicate things a bit), but I don't see how that would be possible with character literals.

It just doesn't fit into a dchar. We already have that situation: char x = 'ä'; doesn't compile. It's OK.
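The analogy above can be made concrete by counting code units. D's char holds one UTF-8 code unit, wchar one UTF-16 code unit and dchar one UTF-32 code unit (a single code point). The Python function below is a hypothetical helper that models this; it is not part of dmd. 'ä' (U+00E4) needs two UTF-8 code units, so it doesn't fit in a char, just as the two-code-point "fj" produced by &fjlig; doesn't fit in a dchar.

```python
def smallest_d_char_type(text: str):
    """Return the narrowest D character type ('char', 'wchar' or 'dchar')
    whose single code unit can hold `text`, or None if none can."""
    if len(text.encode("utf-8")) == 1:      # exactly one UTF-8 code unit
        return "char"
    if len(text.encode("utf-16-le")) == 2:  # exactly one UTF-16 code unit
        return "wchar"
    if len(text.encode("utf-32-le")) == 4:  # exactly one code point
        return "dchar"
    return None                             # needs more than one code point


if __name__ == "__main__":
    for sample in ["a", "\u00e4", "\U0001F600", "fj"]:
        code_points = " ".join(f"U+{ord(c):04X}" for c in sample)
        print(code_points, smallest_d_char_type(sample))
```

For these samples it reports char for "a", wchar for "ä", dchar for U+1F600 (which needs a UTF-16 surrogate pair) and None for "fj", the case no single D character type can represent.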
Copyright © 1999-2021 by the D Language Foundation