Cataclysmic decision re: String literals

Walter wrote:
> Two ways:
>
> r"string" \" r"more"
> "string\"more"

Both are totally awful. One would run into many " characters when outputting any kind of markup or source. 'string' had solved that problem quite neatly.

-i.

>Walter wrote:
>> Two ways:
>>
>> r"string" \" r"more"
>> "string\"more"

The second idea is wrong. In a WYSIWYG string, backslash means backslash, not meta-escape. Raw strings should have no meta-escapes. That's one reason I suggested r"string"more"r with a tail delimiter. It alleviates some of the problem.

The only way to deal with the whole problem (e.g. now you want to embed "r in the string) is to use a smarter (less greedy) lexer with (possibly, but not ideally) semantic feedback from the parser. That gets messy. I think "r is a reasonable compromise in that it permits embedded quote marks, the most common need. Beyond that point, one should realize that static strings are just a small subset of any program, and a little manual divide-and-conquer is not that hard for the really hairy ones. There aren't that many to begin with. If a lot of them are staring at you, then at that point you should use a Python script to output proper D code, and automate the labor of implementing divide-and-conquer as per Walter's idea #1.

Mark
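Mark's suggestion of letting a script do the divide-and-conquer can be sketched in a few lines of Python. This is a minimal sketch, assuming Walter's idea #1 works as written above: a bare \" is accepted as its own literal and D concatenates adjacent literals. The function name `to_raw_pieces` is hypothetical.

```python
def to_raw_pieces(text: str) -> str:
    r"""Spell `text` as juxtaposed D raw strings (hypothetical helper).

    A raw r"..." string cannot contain a double quote, so the text is split
    on '"' and each quote is emitted as a separate escaped \" literal,
    relying on D to concatenate adjacent string literals.
    """
    parts = text.split('"')
    out = []
    for i, part in enumerate(parts):
        if i > 0:
            out.append('\\"')              # the quote that separated the pieces
        if part:
            out.append('r"' + part + '"')  # backslashes stay literal in r"..."
    return " ".join(out) if out else 'r""'


print(to_raw_pieces('string"more'))  # r"string" \" r"more"
```

The output for 'string"more' is exactly Walter's first form; empty pieces are skipped so that a leading or trailing quote does not produce a stray r"".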

>...would have had me talking to God on the big white telephone (I'm learning Australian, and that means "puking"),

I think that is "calling God on the big white phone". I'm American, and we used that expression back in the 1970s.

Mark Evans wrote:
>>Walter wrote:
>>
>>>Two ways:
>>>
>>>r"string" \" r"more"
>>>"string\"more"
>
>
> The second idea is wrong. In a WYSIWYG string backslash means backslash, not
> meta-escape. Raw strings should have no meta-escapes. That's one reason I
> suggested r"string"more"r with a tail delimiter. It alleviates some of the
> problem.

Which is why he didn't USE a raw string. I think using a character terminator is a good recipe for confusing lexing. I think about the consequences of the symbols I write. Characters, no.

This should be solved in the text editor. Hit Control-Quote to enter a quote, hit Control-Quote to exit one, and handle control characters automatically. The only real solution at the language level is to put a count indicator before the string, which is then read as raw UTF-8; everything else is just an inferior simulacrum. But the IDE can do it correctly.

Burton Radons wrote:
> This should be solved in the text editor. Hit Control-Quote to enter a quote, hit Control-Quote to exit one, and handle control characters automatically. The only real solution at the language level is to put in a count indicator before the string which is then read as raw UTF-8; everything else is just an inferior simulacrum. But the IDE can do it correctly.

Ack, ambiguous. I mean that if you put in a control character (an escape, an out-of-range value) while in an enforced quote, the IDE will simply show the string as it is really meant, not with escapes. It could use any number of schemes for indicating a string; something that would change when the caret is within the string. When saving, this would be unpacked into escaped data. The IDE would also be careful about rendering characters like NUL, which should be drawn with a special symbol. \n would be transformed into a special symbol (typed using Control-Enter?), while a newline would be rendered normally. I think it would work. I'll play with it in dedit.
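Burton's count-indicator idea can be sketched as a lexer that never scans for a terminator at all. The concrete syntax here (a decimal byte count, a colon, then that many raw UTF-8 bytes) and the name `lex_counted` are assumptions for illustration; the post specifies neither.

```python
from typing import Tuple


def lex_counted(src: bytes, pos: int = 0) -> Tuple[str, int]:
    """Lex a hypothetical count-prefixed literal such as b'11:say "hi" \\n'.

    Returns the decoded contents and the index just past the literal.
    Nothing inside the counted bytes is special: quotes, backslashes and
    newlines are taken as-is.
    """
    colon = src.index(b":", pos)      # raises ValueError if there is no colon
    digits = src[pos:colon]
    if not digits.isdigit():
        raise ValueError("expected a decimal byte count")
    start = colon + 1
    end = start + int(digits)         # int() accepts ASCII digits given as bytes
    if end > len(src):
        raise ValueError("literal runs past the end of the input")
    return src[start:end].decode("utf-8"), end


print(lex_counted(b'11:say "hi" \\n;'))  # ('say "hi" \\n', 14)
```

Counting bytes rather than characters lets the lexer find the end of the literal without decoding UTF-8 first; the cost is that the count has to be kept in sync whenever the text changes, which is exactly the bookkeeping Burton wants the IDE to do.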

Oh, I see now, it was a regular string wherein meta-escape is allowed. Thanks, Burton.

Still, that leaves open how you put " into a raw string. Maybe the lexer will be smart enough to search for the outermost feasible close-quote token.

>I think using a character terminator is a good recipe for confusing lexing. I think about the consequence of the symbols I write. Characters, no.

I don't follow this statement. If it means "I'm a sloppy thinker when I see characters, so please no characters as terminator tokens," then why are they valid as initiator tokens? And why not complain about 1.23L in D? I can buy the argument both ways, but not for one end only. Considering that perspective, the initiator r" should presumably become raw" instead. Now we're on the path to XML...

What might align with good D taste and simpler lexing is:

r["string"more"]
x["ABCD FFFF 0000"]

The important thing to me is not how characters as tokens feel, but what buys maximum code readability and minimum forbidden embeddings for the minimum keystrokes and lexing hassle.

Mark
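Both of Mark's closing-delimiter proposals, r"string"more"r from earlier in the thread and r["string"more"] here, reduce to the same lexing rule: scan forward to the first occurrence of a two-character closer. A minimal sketch follows; the function name `lex_raw` is an illustrative assumption. It also shows the limit Mark concedes: the closer itself can never appear inside the literal.

```python
from typing import Tuple


def lex_raw(src: str, pos: int, opener: str, closer: str) -> Tuple[str, int]:
    """Lex a raw literal that starts with `opener` and ends at the first `closer`.

    No escapes are interpreted, so a backslash is just a backslash. The
    contents can hold a lone '"' but can never hold `closer` itself.
    """
    if not src.startswith(opener, pos):
        raise ValueError("no raw literal at this position")
    start = pos + len(opener)
    end = src.find(closer, start)     # leftmost (least greedy) match closes it
    if end == -1:
        raise ValueError("unterminated raw literal")
    return src[start:end], end + len(closer)


print(lex_raw('r"string"more"r;', 0, 'r"', '"r'))    # ('string"more', 15)
print(lex_raw('r["string"more"];', 0, 'r["', '"]'))  # ('string"more', 16)
```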

r["string"] conflicts with syntax for associative arrays.

To be honest, I really don't give a damn about raw strings. If you want a string in D, run it through a teeny program that escapes it properly and paste it in. The IDE can assist with this. As far as the language goes, I see it as a non-issue, one already solved perfectly well in C.

Sean

"Mark Evans" <Mark_member@pathlink.com> wrote in message news:bg9lu5$2gpt$1@digitaldaemon.com...
> What might align with good D taste and simpler lexing is,
>
> r["string"more"]
> x["ABCD FFFF 0000"]
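Sean's "teeny program" could look like the following Python sketch, which turns arbitrary text into a double-quoted C-style literal; the name `to_c_literal` is hypothetical. Other control characters are written as three-digit octal escapes because C's \x escape consumes every hex digit that follows it, while an octal escape stops after three digits. Non-ASCII characters are passed through unchanged, on the assumption that the source file is UTF-8.

```python
def to_c_literal(text: str) -> str:
    """Escape arbitrary text as a double-quoted C-style string literal."""
    named = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t", "\r": "\\r"}
    out = []
    for ch in text:
        if ch in named:
            out.append(named[ch])
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("\\%03o" % ord(ch))   # e.g. NUL -> \000
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


print(to_c_literal('say "hi"\n'))  # "say \"hi\"\n"
```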

Mark Evans:
>Different string types concatenate, too:
>myVar = x"0123" r"string"; --> myVar = '\0\1\2\3string';

What should x"0123" be? A byte array like [0x01, 0x23] or like [0x0, 0x1, 0x2, 0x3]? This seems strange to me. It's not that intuitive.

-Dario
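Dario's question can be made concrete. The sketch below implements both readings of x"0123" in Python; the thread does not settle which one is meant, and the function names are hypothetical. Whitespace is stripped first so that spaced-out bodies like x"ABCD FFFF 0000" are allowed.

```python
def hex_pairs(body: str) -> bytes:
    """Reading 1: every two hex digits form one byte (x"0123" -> 01 23)."""
    return bytes.fromhex("".join(body.split()))  # odd digit count raises ValueError


def hex_nibbles(body: str) -> bytes:
    """Reading 2: each hex digit is its own byte (x"0123" -> 00 01 02 03)."""
    return bytes(int(d, 16) for d in "".join(body.split()))


print(hex_pairs("0123"))                # b'\x01#'
print(hex_nibbles("0123") + b"string")  # b'\x00\x01\x02\x03string'
```

Only the second reading reproduces Mark's '\0\1\2\3string' example; the first matches how hex dumps are usually read, and it rejects an odd number of digits.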

"Mark T" <Mark_member@pathlink.com> wrote in message news:bg9g0a$2b7v$1@digitaldaemon.com...
> >...would have had me talking to God on the big white telephone (I'm learning Australian, and that means "puking"),
>
> I think that is "calling God on the big white phone"
> I'm American and we used that expression back in the 1970's.

Hmm. I always heard it as "praying to the porcelain gods."

July 31, 2003

Re: Cataclysmic decision re: String literals

Posted by Walter in reply to Mark Evans

Some good ideas here.

Some nits:
1) D already concatenates strings that are juxtaposed.
2) Embedded Unicode in strings will be fully supported in the next release;
no special prefix is needed.

"Mark Evans" <Mark_member@pathlink.com> wrote in message news:bg96tv$21cf$1@digitaldaemon.com...
> Walter wrote,
>
> >1) It sticks to the C character set.
> >2) No problems with different fonts.
> >3) Establishes a precedent for new types of special strings.
> >4) Easy to tokenize.
> >5) There's precedent experience with it in other languages, such as Python.
>
> 6) Permits qualifiers such as n (null), hN (length header of size N bytes),
> and pN (pad to next Nth byte). These fine-tuning controls could become important without C's single-quote 'abcd' construct. Here are some C language translations.
>
> D proposed              ANSI C
>
> r"string"          --> 'string'
> rn"string"         --> 'string\0'
> rh2"string"        --> '\0\6string'
> rh4"string"        --> '\0\0\0\6string'
> rh7"string"        --> '\0\0\0\0\0\0\6string'
> rh4n"string"       --> '\0\0\0\6string\0'
> rp4"string"        --> 'string\0\0'
> rnp4"string"       --> 'string\0\0'
>
> Python also has u for Unicode, which I would simply copy like r.
>
> Maybe going over the top here, I suggest that all of these have command line default settings so that the meaning of r can be set once and forgotten.  The - symbol could be used to override in the source code for special cases.
>
> rn- means turn off null even if defaulted 'on'
> rh- means turn off header
> rp- means turn off padding
> rn-h-p- means turn off all
>
> The b and x strings would address a serious need in embedded work and could
> benefit from the header and padding qualifiers.
>
> Strings should concatenate by simple juxtaposition.  That behavior enables embedded comments:
>
> // comment
> myVar = x"ABCD 0000"
> // another comment
> x"FFFF BCDA"
> // a final comment
> ;
>
> means
>
> myVar = x"ABCD0000FFFFBCDA";
>
> Word alignment issues would be decided after concatenation, not before. Strings
> concatenate bit by bit.
>
> Mark
>
>
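The qualifier table in Mark's message can be checked with a small encoder. This is a sketch under assumptions the post leaves open: the hN header is a big-endian count of the text bytes only (excluding the n terminator and any padding), the terminator comes before padding, and pN pads the whole result to a multiple of N bytes. The command-line defaults and the - override syntax are not modeled, and `encode_literal` is a hypothetical name.

```python
def encode_literal(text: str, null: bool = False, header: int = 0, pad: int = 0) -> bytes:
    """Encode `text` as the proposed r-string qualifiers would (sketch).

    null   -- 'n':  append one NUL byte after the text
    header -- 'hN': prefix an N-byte big-endian length of the text bytes
    pad    -- 'pN': append NUL bytes until the total length is a multiple of N
    """
    body = text.encode("utf-8")
    out = len(body).to_bytes(header, "big") if header else b""
    out += body
    if null:
        out += b"\0"
    if pad:
        out += b"\0" * (-len(out) % pad)  # 0 extra bytes if already aligned
    return out


print(encode_literal("string", header=2))          # b'\x00\x06string'
print(encode_literal("string", null=True, pad=4))  # b'string\x00\x00'
```

Under these assumptions every row of the table is reproduced; rnp4 and rp4 give the same bytes because the terminator simply takes the place of one padding byte.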
