use «chevrons» to represent string literal

They are not part of ASCII, they are part of an "extended ASCII" ISO/IEC 8859-1 aka Latin-1.

They do not fit in a single byte.

C2 AB

https://symbl.cc/en/00AB/

For us to introduce a new string syntax, it would need to do something that the existing ones cannot reasonably do.
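As a quick way to check the byte values mentioned above, here is a minimal sketch in Python (chosen only for brevity; it is not part of the original post, and the variable names are illustrative). It encodes « (U+00AB) as UTF-8 and as Latin-1 and prints the resulting bytes.

```python
# Illustrative check of the bytes quoted above; variable names are made up.
chevron = "\u00ab"  # «, LEFT-POINTING DOUBLE ANGLE QUOTATION MARK

utf8_bytes = chevron.encode("utf-8")      # two bytes in UTF-8
latin1_bytes = chevron.encode("latin-1")  # one byte in ISO/IEC 8859-1

print(utf8_bytes.hex(" ").upper())    # C2 AB
print(latin1_bytes.hex(" ").upper())  # AB
print(chevron.isascii())              # False
```

The same character therefore takes one byte in Latin-1 but two bytes in UTF-8, which is the point being made about source files.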

On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) Andrew Cattermole wrote:
> They are not part of ASCII, they are part of an "extended ASCII" ISO/IEC 8859-1 aka Latin-1.
>
> They do not fit in a single byte.
>
> C2 AB
>
> https://symbl.cc/en/00AB/
>
> For us to introduce a new string syntax, it would need to do something that the existing ones cannot reasonably do.

Extended ASCII uses 8 bits, giving 256 distinct characters.

On 17/01/2025 10:34 AM, barbosso wrote:
> On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> They are not part of ASCII, they are part of an "extended ASCII" ISO/IEC 8859-1 aka Latin-1.
>>
>> They do not fit in a single byte.
>>
>> C2 AB
>>
>> https://symbl.cc/en/00AB/
>>
>> For us to introduce a new string syntax, it would need to do something that the existing ones cannot reasonably do.
>
> The extended ASCII has 8 bits, 256 distinguish characters

D files are encoded as UTF-8.

Therefore it does not support extended ASCII.

On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) Andrew Cattermole wrote:
> On 17/01/2025 10:34 AM, barbosso wrote:
>> On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>> They are not part of ASCII, they are part of an "extended ASCII" ISO/IEC 8859-1 aka Latin-1.
>>>
>>> They do not fit in a single byte.
>>>
>>> C2 AB
>>>
>>> https://symbl.cc/en/00AB/
>>>
>>> For us to introduce a new string syntax, it would need to do something that the existing ones cannot reasonably do.
>>
>> The extended ASCII has 8 bits, 256 distinguish characters
>
> D files are encoded as UTF-8.
>
> Therefore it does not support extended ASCII.

Do you understand what you wrote?

On 17/01/2025 10:43 AM, barbosso wrote:
> On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> On 17/01/2025 10:34 AM, barbosso wrote:
>>> On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>>> They are not part of ASCII, they are part of an "extended ASCII" ISO/IEC 8859-1 aka Latin-1.
>>>>
>>>> They do not fit in a single byte.
>>>>
>>>> C2 AB
>>>>
>>>> https://symbl.cc/en/00AB/
>>>>
>>>> For us to introduce a new string syntax, it would need to do something that the existing ones cannot reasonably do.
>>>
>>> The extended ASCII has 8 bits, 256 distinguish characters
>>
>> D files are encoded as UTF-8.
>>
>> Therefore it does not support extended ASCII.
>
> Do you understand what you wrote?

Yes.

Extended ASCII is both a character set and an encoding.

The character set is supported as part of Unicode. The encoding is not supported, because we use UTF-8, which conflicts with it over the meaning of the 8th bit: in UTF-8 a byte with that bit set is part of a multi-byte sequence.
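The distinction between character set and encoding can be illustrated with a small Python sketch (not from the thread; `decodes_as_utf8` is a hypothetical helper written for this example). The Latin-1 byte for « has its high bit set, so on its own it is not well-formed UTF-8, even though the character itself exists in Unicode.

```python
# Sketch: the Latin-1 byte for « is not valid UTF-8 on its own.
latin1_chevron = "\u00ab".encode("latin-1")  # b"\xab"
utf8_chevron = "\u00ab".encode("utf-8")      # b"\xc2\xab"


def decodes_as_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(latin1_chevron))  # False
print(decodes_as_utf8(utf8_chevron))    # True
print(f"{0xAB:08b}")                    # 10101011
```

The bit pattern `10xxxxxx` marks a UTF-8 continuation byte, so a lone `AB` cannot start a character in a UTF-8 source file.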

On Thursday, 16 January 2025 at 21:45:39 UTC, Richard (Rikki) Andrew Cattermole wrote:
> On 17/01/2025 10:43 AM, barbosso wrote:
>> On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>> On 17/01/2025 10:34 AM, barbosso wrote:
>>>> On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>>>> [...]
>>>>
>>>> The extended ASCII has 8 bits, 256 distinguish characters
>>>
>>> D files are encoded as UTF-8.
>>>
>>> Therefore it does not support extended ASCII.
>>
>> Do you understand what you wrote?
>
> Yes.
>
> Extended ASCII is both a character set and an encoding.
>
> The character set is supported as part of Unicode, the encoding is not supported as we use UTF-8 which conflicts on the 8th bit for the first byte in the code unit.

Now I see.
UTF-8 uses 1 byte for each of the 128 ASCII characters
and 2 or more bytes for other characters (2 for «chevrons»).
So, what's the problem?
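For reference, a short Python sketch of how many bytes UTF-8 uses per character (the sample characters are chosen here for illustration and do not come from the thread). ASCII takes 1 byte, « takes 2, and other characters can take 3 or 4.

```python
# Sample characters picked for illustration: a, «, €, 😀
samples = ["a", "\u00ab", "\u20ac", "\U0001F600"]

# Number of UTF-8 bytes needed to encode each sample character.
lengths = [len(ch.encode("utf-8")) for ch in samples]

print(lengths)  # [1, 2, 3, 4]
```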

On Thursday, 16 January 2025 at 22:03:25 UTC, barbosso wrote:
> On Thursday, 16 January 2025 at 21:45:39 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> On 17/01/2025 10:43 AM, barbosso wrote:
>>> On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>>> [...]
>>>
>>> Do you understand what you wrote?
>>
>> Yes.
>>
>> Extended ASCII is both a character set and an encoding.
>>
>> The character set is supported as part of Unicode, the encoding is not supported as we use UTF-8 which conflicts on the 8th bit for the first byte in the code unit.
>
>
> now I see.
> UTF-8 use 1 byte to represent 128 characters ASCII
> and 2 bytes for other characters (including «chevrons»).
> So, what's the problem?

GCC and Clang can compile identifiers with Unicode symbols.

January 17

Re: use «chevrons» to represent string literal

Posted by Richard (Rikki) Andrew Cattermole
in reply to barbosso

On 17/01/2025 11:16 AM, barbosso wrote:
> On Thursday, 16 January 2025 at 22:03:25 UTC, barbosso wrote:
>> On Thursday, 16 January 2025 at 21:45:39 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>> On 17/01/2025 10:43 AM, barbosso wrote:
>>>> On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>>>> [...]
>>>>
>>>> Do you understand what you wrote?
>>>
>>> Yes.
>>>
>>> Extended ASCII is both a character set and an encoding.
>>>
>>> The character set is supported as part of Unicode, the encoding is not supported as we use UTF-8 which conflicts on the 8th bit for the first byte in the code unit.
>>
>>
>> now I see.
>> UTF-8 use 1 byte to represent 128 characters ASCII
>> and 2 bytes for other characters (including «chevrons»).
>> So, what's the problem?
> 
> GCC and Clang can compile identifiers with Unicode symbols.

I know, I implemented D's UAX31 identifiers.

Better to have the right terminology for this.

However, the current stance is that we possibly have too many string types already. So far you have proposed new delimiters but no new behavior, which would be required for it to be added.
