August 31, 2020 Newline character set in the D lexer - NEL
Would there be any benefit from the following suggestion? Add the character Unicode NEL U+0085 into the set of EndOfLine characters in the lexer?
Cecil Ward.
August 31, 2020 Re: Newline character set in the D lexer - NEL
Posted in reply to Cecil Ward | On Monday, 31 August 2020 at 01:49:06 UTC, Cecil Ward wrote:
> Would there be any benefit from the following suggestion? Add the character Unicode NEL U+0085 into the set of EndOfLine characters in the lexer?
>
> Cecil Ward.
I personally think we should have these definitions:
/*            NUL    EM     SUB */
EndOfFile = { 0x00 | 0x19 | 0x1A | PhysicalEndOfFile };
/*            LF     FF     CR     CR   LF     NEL    LSEP     PSEP */
EndOfLine = { 0x0A | 0x0C | 0x0D | 0x0D 0x0A | 0x85 | 0x2028 | 0x2029 | EndOfFile };
/*             HT     VT     SP     NBSP   NQSP     MQSP     ENSP     EMSP     3/MSP */
WhiteSpace = { 0x09 | 0x0B | 0x20 | 0xA0 | 0x2000 | 0x2001 | 0x2002 | 0x2003 | 0x2004
/*             4/MSP    6/MSP    FSP      PSP      THSP     HSP      ZWSP     NNBSP */
             | 0x2005 | 0x2006 | 0x2007 | 0x2008 | 0x2009 | 0x200A | 0x200B | 0x202F
/*             MMSP     WJ       IDSP     ZWNBSP */
             | 0x205F | 0x2060 | 0x3000 | 0xFEFF | EndOfLine };
The definition of D source files misses quite a lot of them :-(
EM = end of medium (what, if not this, should end a file?!)
NEL = next line
LSEP = line separator
PSEP = paragraph separator
NBSP = non-breaking space
NQSP = ENSP = N-wide space
MQSP = EMSP = M-wide space
3/MSP = 1/3 M-wide space (three of them together are as wide as an M)
4/MSP = 1/4 M-wide space
6/MSP = 1/6 M-wide space
FSP = figure space
PSP = punctuation space
THSP = thin space
HSP = hair space
ZWSP = zero-width space
NNBSP = narrow non-breaking space
MMSP = medium mathematical space
WJ = word joiner (zero-width character that prevents a line break at its position)
IDSP = ideographic space (same width as a Chinese character)
ZWNBSP = zero-width non-breaking space
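As a rough illustration of the definitions proposed above, here is a minimal Python sketch (Python rather than D, and not taken from any compiler). The code points are copied from the lists in this post; the names `END_OF_FILE`, `END_OF_LINE`, `WHITE_SPACE` and `classify` are made up for the example. The proposal nests the sets (WhiteSpace includes EndOfLine, which includes EndOfFile), so the sketch reports the most specific category. The two-character CR LF sequence is left out because it needs a scanner rather than a per-character lookup.

```python
# Code points copied from the definitions proposed in this post.
# All names below are hypothetical, chosen for this sketch only.
END_OF_FILE = {0x00, 0x19, 0x1A}
END_OF_LINE = {0x0A, 0x0C, 0x0D, 0x85, 0x2028, 0x2029}
WHITE_SPACE = {
    0x09, 0x0B, 0x20, 0xA0,
    0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
    0x2007, 0x2008, 0x2009, 0x200A, 0x200B, 0x202F,
    0x205F, 0x2060, 0x3000, 0xFEFF,
}


def classify(ch: str) -> str:
    """Return the most specific proposed category for a single character."""
    cp = ord(ch)
    if cp in END_OF_FILE:
        return "EndOfFile"
    if cp in END_OF_LINE:
        return "EndOfLine"
    if cp in WHITE_SPACE:
        return "WhiteSpace"
    return "Other"


# Try a few characters: NEL, LSEP, ideographic space, EM, and a letter.
for sample in ["\x85", "\u2028", "\u3000", "\x19", "x"]:
    print(f"U+{ord(sample):04X} -> {classify(sample)}")
```

A real lexer would more likely branch on the first UTF-8 byte than use set lookups; the sets are only meant to make the proposed grouping easy to read.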
August 31, 2020 Re: Newline character set in the D lexer - NEL
Posted in reply to Cecil Ward | On Monday, 31 August 2020 at 01:49:06 UTC, Cecil Ward wrote:
> Would there be any benefit from the following suggestion? Add the character Unicode NEL U+0085 into the set of EndOfLine characters in the lexer?
>
> Cecil Ward.
Pardon me, but why bother when ASCII already gives us all we need for spaces and newlines, with fast decoding (< 80h)?
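To put numbers on the "< 80h" remark, here is a small Python sketch showing how the line terminators discussed in this thread encode in UTF-8. The helper names `utf8_hex` and `is_single_byte` are invented for the example. LF and CR stay single bytes below 80h, whereas NEL needs two bytes and LSEP/PSEP need three, so a lexer has to leave its single-byte fast path to recognise them.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of one character as spaced hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


def is_single_byte(ch: str) -> bool:
    """True when the character is ASCII, i.e. one UTF-8 byte below 80h."""
    return ord(ch) < 0x80


CANDIDATES = [
    ("LF", "\n"),
    ("CR", "\r"),
    ("NEL", "\x85"),
    ("LSEP", "\u2028"),
    ("PSEP", "\u2029"),
]

for name, ch in CANDIDATES:
    path = "fast path" if is_single_byte(ch) else "multi-byte"
    print(f"{name:5} U+{ord(ch):04X}  {utf8_hex(ch):9} {path}")
```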
August 31, 2020 Re: Newline character set in the D lexer - NEL
Posted in reply to Nils Lankila | On Monday, 31 August 2020 at 09:39:12 UTC, Nils Lankila wrote:
> On Monday, 31 August 2020 at 01:49:06 UTC, Cecil Ward wrote:
>> Would there be any benefit from the following suggestion? Add the character Unicode NEL U+0085 into the set of EndOfLine characters in the lexer?
>>
>> Cecil Ward.
>
> Pardon me, but why bother when ASCII already gives us all we need for spaces and newlines, with fast decoding (< 80h)?
D already recognizes some non-ASCII characters as spaces and line separators [1], so the decision to "bother" has already been made.
[1] https://dlang.org/spec/lex.html#character_set
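To show what the suggestion would change in practice, here is a hedged Python sketch of a line counter. It assumes the current EndOfLine set is CR, LF, CR LF, U+2028 and U+2029, which is my reading of the spec page linked in [1]; the names `CURRENT_EOL`, `WITH_NEL` and `count_lines` are hypothetical. With NEL added, a source file that uses U+0085 as a terminator gets the line count (and therefore the error-message line numbers) its author would expect.

```python
# Assumed current set of line terminators (see the spec link [1]).
# "\r\n" comes first so that a CR LF pair is counted once, not twice.
CURRENT_EOL = ("\r\n", "\r", "\n", "\u2028", "\u2029")
# The same set with NEL (U+0085) added, as suggested in this thread.
WITH_NEL = CURRENT_EOL + ("\x85",)


def count_lines(source: str, terminators: tuple) -> int:
    """Count lines in source, treating each terminator as a line break."""
    lines = 1
    i = 0
    while i < len(source):
        for term in terminators:
            if source.startswith(term, i):
                lines += 1
                i += len(term)
                break
        else:
            # No terminator starts here; move on to the next character.
            i += 1
    return lines


sample = "int a;\r\nint b;\x85int c;\u2028int d;"
print("current set:", count_lines(sample, CURRENT_EOL))
print("with NEL:   ", count_lines(sample, WITH_NEL))
```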
September 04, 2020 Re: Newline character set in the D lexer - NEL
Posted in reply to Dominikus Dittes Scherkl | I agree with Dominikus.
Note to earlier poster: NEL was used, and just possibly may still be used, by IBM mainframe users; XML 1.1 understands NEL, iirc;
see https://www.w3.org/TR/newline/ and
https://www.w3.org/International/questions/qa-controls
September 04, 2020 Re: Newline character set in the D lexer - NEL
Posted in reply to Cecil Ward | On Friday, 4 September 2020 at 00:48:59 UTC, Cecil Ward wrote:
> I agree with Dominikus
>
> Note to earlier poster: NEL was used and just possibly may still be used by IBM mainframe users; XML 1.1 understands NEL iirc;
>
> see https://www.w3.org/TR/newline/ and
>
> https://www.w3.org/International/questions/qa-controls
Given the lack of answers, I would suggest going ahead with a PR, or at least opening an issue. Lexing is not a big deal, but if nobody cares this will never be done.
September 08, 2020 Re: Newline character set in the D lexer - NEL
Posted in reply to NilsLankila | On Friday, 4 September 2020 at 05:28:47 UTC, NilsLankila wrote:
>
> Given the lack of answers, I would suggest going ahead with a PR, or at least opening an issue. Lexing is not a big deal, but if nobody cares this will never be done.
Agreed, Nils. Mind you, someone cared enough to include U+2028 and U+2029 in the lexer spec.
I have no idea how to initiate a "PR". Perhaps someone could help me with this?
September 08, 2020 Re: Newline character set in the D lexer - NEL
Posted in reply to Cecil Ward | PR = "Pull Request".
The easy way is to fork the project on GitHub, clone your fork, make your changes, and push them back. The changes could go in ~master on your own fork, or ideally in a separate branch.
Then, on GitHub, go to the original project and start a new pull request. It should automagically detect that you've made changes (again, ideally in a branch of your fork) and offer to open a pull request with your changes against ~master (or whatever is set as the default branch for the project).
James
On 9/8/20 2:42 AM, Cecil Ward wrote:
> On Friday, 4 September 2020 at 05:28:47 UTC, NilsLankila wrote:
>>
>> Given the lack of answers, I would suggest going ahead with a PR, or at least opening an issue. Lexing is not a big deal, but if nobody cares this will never be done.
>
> Agreed, Nils. Mind you, someone cared enough to include U+2028 and U+2029 in the lexer spec.
>
> I have no idea how to initiate a "PR". Perhaps someone could help me with this?