November 23, 2005
On Tue, 22 Nov 2005 15:01:11 -0800, Kris <fu@bar.com> wrote:
> "Regan Heath" <regan@netwin.co.nz> wrote...
>> On Mon, 21 Nov 2005 17:35:26 -0800, Kris <fu@bar.com> wrote:
>>> The minor concern I have with
>>> this aspect is that the literal content does not play a role, whereas it
>>> does with char literals (such as '?', '\x0001', and '\X00000001').
>>
>> But that makes sense, right? Character literals i.e. '\X00000001' will
>> only _fit_ in certain types, the same is not true for string literals
>> which will always _fit_ in all 3 even if the way they end up being
>> represented is not exactly what you've typed (or is that the problem?)
>>
>> If this were to change would it make this an error:
>>
>> foo(wchar[] foo) {}
>> foo("\U00000040");
>>
>>> No big
>>> deal there, although perhaps it's food for another topic?
>>
>> Here seems like as good a place as any.
>
>
> Oh, that minor concern was in regard to consistency here also.

I realise that. I'm just trying to explore whether they _should_ behave the same or not: are they both apples, or are they apples and oranges? I agree things should behave consistently, provided it makes sense for them to do so.

> I have no quibble with the character type being implied by content

I didn't think you did. My example above is a string literal, not a character literal. If the string literal type was implied by content would my example above be an error?

"\U00000040" is a dchar-sized character in a string literal. "abc \U00000040 def" could also be used.

foo requires a wchar[]. If the type of the literal is taken to be dchar[], based on its contents, then it does not match wchar[] and you need the 'w' suffix or similar to resolve it.

It seems the real question is: what did the programmer intend? Did they intend for the character to be represented exactly as they typed it? In this case, if it were passed exactly as written it would become 2 wchar code units; did they want that? Or did they simply want the equivalent character in the resulting encoding?

I think the latter is more likely. The former can create illegal UTF sequences.

What do you think?
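The difference between the two interpretations can be sketched concretely. The snippet below is a minimal illustration in Python (used here only because the encodings themselves are language-neutral); it assumes, as D does, that wchar[] means UTF-16 code units, and picks an arbitrary non-BMP code point to show how "pass the value exactly as written" differs from transcoding:

```python
# Illustration: raw reinterpretation of a dchar as two wchar halves
# versus proper transcoding to UTF-16.
import struct

cp = 0x1D800  # a valid code point outside the BMP (chosen for illustration)

# Proper transcoding to UTF-16: a valid surrogate pair (0xD836, 0xDC00).
transcoded = chr(cp).encode("utf-16-be")

# Raw reinterpretation: the two 16-bit halves of the 32-bit value.
# The low half, 0xD800, is an unpaired surrogate -- an illegal UTF-16 sequence.
raw = struct.pack(">2H", (cp >> 16) & 0xFFFF, cp & 0xFFFF)

transcoded.decode("utf-16-be")  # round-trips fine
try:
    raw.decode("utf-16-be")
except UnicodeDecodeError:
    print("raw halves are not valid UTF-16")
```

This is the "illegal UTF sequences" point above: splitting the 32-bit value can leave a lone surrogate in the 16-bit stream, whereas transcoding always yields well-formed code units.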

The facts:
> 1) The type for literal chars is implied by their content ('?', '\u0001',
> '\U00000001')
>
> 2) The type of a numeric literal is implied by the content (0xFF,
> 0xFFFFFFFF, 1.234)
>
> 3) The type for literal strings is not influenced at all by the content.

#1 and #2 make sense to me: some char/int literals do not fit in the smaller types.
#3 has no such problem.

The question remains: why should #3 be consistent with #1 and #2? Is it similar enough, or is it in fact different?
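The asymmetry between #1/#2 and #3 can be shown directly. Below is a hedged sketch in Python (D's char[]/wchar[]/dchar[] correspond to UTF-8/UTF-16/UTF-32): any single code point can be *encoded* into any of the three string encodings, so a string literal always fits, while a single fixed-width code unit cannot hold every code point:

```python
# Any code point fits in all three string encodings -- it just takes more
# code units in the narrower ones.
s = "\U00010400"  # one code point outside the BMP (arbitrary example)

print(len(s.encode("utf-8")))            # 4 code units of char  (UTF-8)
print(len(s.encode("utf-16-be")) // 2)   # 2 code units of wchar (surrogate pair)
print(len(s.encode("utf-32-be")) // 4)   # 1 code unit of dchar  (UTF-32)

# But a *single-code-unit* character literal cannot fit in an 8- or 16-bit
# type: 0x10400 exceeds both 0xFF and 0xFFFF.
```

This is why content must drive the type for character literals but need not for string literals.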

> Further; both #2 & #3 have suffixes to cement the type, but #1 does not (as far as I'm aware).

I'm not aware of any either.

Regan
November 23, 2005
"Regan Heath" <regan@netwin.co.nz> wrote
> On Tue, 22 Nov 2005 15:01:11 -0800, Kris <fu@bar.com> wrote:
>> "Regan Heath" <regan@netwin.co.nz> wrote...
>>> On Mon, 21 Nov 2005 17:35:26 -0800, Kris <fu@bar.com> wrote:
>>>> The minor concern I have with
>>>> this aspect is that the literal content does not play a role, whereas
>>>> it
>>>> does with char literals (such as '?', '\x0001', and '\X00000001').
>>>
>>> But that makes sense, right? Character literals i.e. '\X00000001' will only _fit_ in certain types, the same is not true for string literals which will always _fit_ in all 3 even if the way they end up being represented is not exactly what you've typed (or is that the problem?)
>>>
>>> If this were to change would it make this an error:
>>>
>>> foo(wchar[] foo) {}
>>> foo("\U00000040");
>>>
>>>> No big
>>>> deal there, although perhaps it's food for another topic?
>>>
>>> Here seems like as good a place as any.
>>
>>
>> Oh, that minor concern was in regard to consistency here also.
>
> I realise that. I'm just trying to explore whether they _should_ behave the same, or not, are they both apples or are they apples and oranges. I agree things should behave consistently, provided it makes sense for them to do so.
>
>> I have no quibble with the character type being implied by content
>
> I didn't think you did. My example above is a string literal, not a character literal. If the string literal type was implied by content would my example above be an error?

To clarify: I'm already making the assumption that the compiler changes to eliminate the uncommitted aspect of argument literals. That presupposes the "default" type will be char[] (like auto literals).

This is a further, and probably minor, question as to whether it might be useful (and consistent) for the "default" type to be implied by the literal content. Suffix 'typing' and compile-time transcoding are still present and available. I'm not at all sure it would be terribly useful, given that the literal will potentially be transcoded at compile-time anyway.

[snip]

> I think the latter is more likely. The former can create illegal UTF sequences.
>
> What do you think?

I think I'd be perfectly content once argument-literals lose their uncommitted status, and thus behave like auto literals <g>



November 24, 2005
On Wed, 23 Nov 2005 13:58:20 -0800, Kris <fu@bar.com> wrote:
> "Regan Heath" <regan@netwin.co.nz> wrote
>> On Tue, 22 Nov 2005 15:01:11 -0800, Kris <fu@bar.com> wrote:
>>> "Regan Heath" <regan@netwin.co.nz> wrote...
>>>> On Mon, 21 Nov 2005 17:35:26 -0800, Kris <fu@bar.com> wrote:
>>>>> The minor concern I have with
>>>>> this aspect is that the literal content does not play a role, whereas
>>>>> it
>>>>> does with char literals (such as '?', '\x0001', and '\X00000001').
>>>>
>>>> But that makes sense, right? Character literals i.e. '\X00000001' will
>>>> only _fit_ in certain types, the same is not true for string literals
>>>> which will always _fit_ in all 3 even if the way they end up being
>>>> represented is not exactly what you've typed (or is that the problem?)
>>>>
>>>> If this were to change would it make this an error:
>>>>
>>>> foo(wchar[] foo) {}
>>>> foo("\U00000040");
>>>>
>>>>> No big
>>>>> deal there, although perhaps it's food for another topic?
>>>>
>>>> Here seems like as good a place as any.
>>>
>>>
>>> Oh, that minor concern was in regard to consistency here also.
>>
>> I realise that. I'm just trying to explore whether they _should_ behave
>> the same, or not, are they both apples or are they apples and oranges. I
>> agree things should behave consistently, provided it makes sense for them
>> to do so.
>>
>>> I have no quibble with the character type being implied by content
>>
>> I didn't think you did. My example above is a string literal, not a
>> character literal. If the string literal type was implied by content would
>> my example above be an error?
>
> To clarify: I'm already making the assumption that the compiler changes to eliminate the uncommitted aspect of argument literals. That presupposes the "default" type will be char[] (like auto literals).

Same.

> This is a further, and probably minor, question as to whether it might be
> useful (and consistent) that "default" type be implied by the literal
> content.

Yes, that is what I thought we were doing, questioning whether it would be useful. My current feeling is that it's not, but we'll see...

> Suffix 'typing' and compile-time transcoding are still present and available.

Yep.

> I'm not at all sure it would be terribly useful, given that the
> literal will potentially be transcoded at compile-time anyway.

Like in my first example:

foo(wchar[] foo) {}
foo("\U00000040");

the string containing the dchar content would in fact be transcoded to wchar at compile time to match the one available overload.

So, when wouldn't it be transcoded at compile time? All I can think of is "auto", e.g.

auto test = "abc \U00000040 def";

So, if this is the only case where the string contents make a difference, I would call that inconsistent, and would instead opt for using the string literal suffix to specify an encoding where required, e.g.

auto test = "abc \U00000040 def"d;

Then the statement "all string literals default to char[] unless the required encoding can be determined at compile time" would be true.
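The point that the suffix only selects an encoding, not the content, can be sketched as follows (Python again, purely as an illustration; D's c/w/d suffixes pick char[]/wchar[]/dchar[], i.e. UTF-8/UTF-16/UTF-32). The dchar escape \U00000040 is simply '@', so transcoding the example literal into any of the three encodings preserves the identical text:

```python
# The encoding is a representation choice; the literal's content is invariant.
s = "abc \U00000040 def"

assert s == "abc @ def"
# In UTF-8 (char[]) the dchar escape still occupies a single code unit...
assert len(s.encode("utf-8")) == len(s)
# ...and all three encodings decode back to the identical string.
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    assert s.encode(enc).decode(enc) == s
```

Which supports defaulting to char[] and letting the suffix (or the call site's parameter type) drive any transcoding.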

Regan
November 24, 2005
"Regan Heath" <regan@netwin.co.nz> wrote in message news [snip]
> Then the statement "all string literals default to char[] unless the required encoding can be determined at compile time" would be true.

That would be great. Now, will this truly come to pass?

<g>

