Thread overview
might be a bug in the DMD FrontEnd
Mar 30, 2007
Davidl
Mar 30, 2007
Davidl
Mar 30, 2007
Davidl
Mar 30, 2007
Daniel Keep
Mar 30, 2007
Deewiant
Mar 30, 2007
Daniel Keep
Mar 30, 2007
Deewiant
Mar 30, 2007
Daniel Keep
Mar 30, 2007
Deewiant
Mar 30, 2007
Dan
Re: might be a bug in the DMD FrontEnd [OT]
Mar 30, 2007
Pragma
Apr 02, 2007
Don Clugston
March 30, 2007
I don't see what prevents the following from compiling:
import std.stdio;
char[] ctfe()
{
	wchar[] k=cast(wchar[])"int i;/*asdfasf"~cast(wchar[])("adf"~cast(char)100~cast(char)192~cast(char)250)~cast(wchar[])"dsafj*/ int j;";
	char[] jimmy = cast(char[])(k[0..k.length-1]);
	int i;
	for(i=0;i<jimmy.length;)
	{
		if (jimmy[i]==0)
			jimmy=jimmy[0..i]~jimmy[i+1..jimmy.length];
		else
			i++;
	}
	return jimmy;
	
}
void main()
{
	mixin(ctfe);
	char[] k=ctfe;
        printf ("%s",k.ptr);
}

Having looked at the frontend, I think there might be a bug in slicing wchar[] at compile time:
ilwr and iupr aren't taken care of for the wchar[] and dchar[] cases.

Regards,
David Leon
March 30, 2007
Sorry, I didn't read the frontend carefully enough; it does handle the different cases.
But I still don't get why my little function couldn't be evaluated at compile time.
March 30, 2007
Err, in constfold.c, function Cat declares and assigns Type t;
but t is never used?
	t = es2->type;
	es->type = type;
Either es->type = t; or an assertion might be preferable?
March 30, 2007
Davidl wrote:
> I don't see what prevents the following from compiling:

I can see a few things...

> import std.stdio;
> char[] ctfe()
> {
>     wchar[] k=cast(wchar[])"int

What's with the casting? http://www.digitalmars.com/d/lex.html#StringLiteral -- see the part on "Postfix" characters.

I'm fairly certain that cast(wchar[]) doesn't do what you *think* it's doing.  cast(wchar[]) casts an array of chars into an array of wchars... note that I *did not* say "converts" -- UTF-8 and UTF-16 are very different encodings, so you can't just cast between them and expect it to make any sense.

It would be like casting a double pointer to a ushort pointer -- meaningless.
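To make the cast-versus-convert distinction concrete, here is a minimal sketch in present-day D2 (not the D1 of this thread; `std.conv.to` and `wstring` are assumptions about the reader's compiler). It deliberately uses a runtime `char[]` rather than a literal, so the example is about an ordinary array cast.

```d
import std.conv : to;
import std.stdio : writeln;

void main()
{
    // A mutable runtime copy, so this is a plain array cast, not a literal.
    char[] utf8 = "abcd".dup;            // 4 UTF-8 code units (4 bytes)

    // cast: the same 4 bytes are reinterpreted as 2 UTF-16 code units.
    wchar[] reinterpreted = cast(wchar[]) utf8;

    // to!wstring: every character is decoded and re-encoded as UTF-16.
    wstring converted = to!wstring(utf8);

    writeln(reinterpreted.length);   // 2
    writeln(converted.length);       // 4
    assert(converted == "abcd"w);
}
```

The lengths alone show that the cast did not produce the text "abcd" in UTF-16; it produced two meaningless code units built from pairs of bytes.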

Also, I can't work out why you would want to embed a comment in a mixin string, but it's not especially problematic :P

> i;/*asdfasf"~cast(wchar[])("adf"~cast(char)100~cast(char)192~cast(char)250)~cast(wchar[])"dsafj*/

cast(char)192, whilst technically valid, is really nasty.  For starters, '192' isn't a valid character by itself in UTF-8, which means it can't be printed.  Not to mention the potential byte-order problems.  We have Unicode escape sequences for a reason.  Again, see the section on string literals, but basically, we have "\x12" for ASCII characters, "\u1234" for wide characters, and "\U12345678" for really wide characters.
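A small D2 sketch of the escape-sequence route, assuming the intended character was U+00C0 ('À', which is byte 192 in Latin-1). Written as `\u00C0`, the compiler encodes it properly (two bytes, C3 80, in UTF-8; one code unit in UTF-16), whereas a lone byte from `cast(char)192` is rejected by `std.utf.validate`.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

void main()
{
    // U+00C0 via a Unicode escape: the compiler emits valid UTF-8.
    string s8 = "\u00C0";
    assert(s8.length == 2);                               // two UTF-8 code units
    assert(cast(ubyte) s8[0] == 0xC3 && cast(ubyte) s8[1] == 0x80);
    validate(s8);                                         // does not throw

    // The same code point in a UTF-16 literal is a single code unit.
    wstring s16 = "\u00C0"w;
    assert(s16.length == 1);

    // A lone 0xC0 byte, as produced by cast(char)192, is not valid UTF-8.
    char[] raw = ['a', cast(char) 192];
    assertThrown!UTFException(validate(raw));
}
```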

And, again, that cast doesn't make any sense.

> int j;";
>     char[] jimmy = cast(char[])(k[0..k.length-1]);

Dear lord, why?!  You just spent half your time casting it to a wchar array, and now you're casting it back?!  char[] is perfectly capable of storing Unicode text, if that's what you're worried about.

Also, you're cutting off the last character of the string, which means you're losing that last ";", which means your mixin isn't valid, and will cause compilation to fail.

>     int i;
>     for(i=0;i<jimmy.length;)
>     {
>         if (jimmy[i]==0)
>             jimmy=jimmy[0..i]~jimmy[i+1..jimmy.length];
>         else
>             i++;
>     }

The only reason I can come up with as to why you're doing the above is because all that casting above generates a string with null characters in it... which it wouldn't if you didn't use all the casting.
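Where those nulls come from can be shown in a couple of lines of D2: viewing UTF-16 text that holds only ASCII as raw bytes gives one zero byte per character. The byte order depends on the platform, hence the `version` guards in this sketch.

```d
import std.stdio : writeln;

void main()
{
    wchar[] w = "ab"w.dup;            // two UTF-16 code units: 0x0061, 0x0062
    char[] bytes = cast(char[]) w;    // reinterpret, don't convert: 4 bytes

    assert(bytes.length == 4);
    version (LittleEndian)
        assert(bytes == "a\0b\0");    // low byte first, then a zero high byte
    version (BigEndian)
        assert(bytes == "\0a\0b");

    writeln(bytes.length);            // 4
}
```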

>     return jimmy;
> 
> }
> void main()
> {
>     mixin(ctfe);
>     char[] k=ctfe;
>         printf ("%s",k.ptr);

Please don't use printf, at least not without passing the string through toStringz.  writefln works perfectly fine.  I mean, you even imported std.stdio...
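If printf really is wanted, note that a D string is not guaranteed to be null-terminated (a slice usually isn't), so `.ptr` alone is unsafe. A D2 sketch of two safe options; `core.stdc.stdio` is where the C binding lives in present-day druntime.

```d
import core.stdc.stdio : printf;
import std.string : toStringz;

void main()
{
    // A slice of a longer string: nothing guarantees a '\0' after "hello".
    string s = "hello, world"[0 .. 5];

    // 1. Make a null-terminated copy for %s.
    printf("%s\n", toStringz(s));

    // 2. Or pass the length explicitly with %.*s; no terminator is needed.
    printf("%.*s\n", cast(int) s.length, s.ptr);
}
```

Both calls print `hello`; without either step, `printf("%s", s.ptr)` would keep reading into `, world`.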

Ok, let's try rewriting this...

import std.stdio;

wchar[] ctfe()
{
    wchar[] k = "int i;/*asdfasf"w~("adf"w~cast(wchar)'\x64'~cast(wchar)'\u1234')~"dsafj*/ int j;";
    return k;
}

void main()
{
    mixin(ctfe());
    wchar[] k = ctfe();
    writefln("%s", k);
}

The above works perfectly.  Heck, we could get rid of those cast(wchars) by just using wchar strings: "\x64\u1234"w.
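For readers on present-day D2, roughly the same idea might look like this. String literals are immutable there, so the function returns `string`, and the non-ASCII character (U+1234, inside the comment) comes from an escape rather than a cast. The function name `ctfe` is kept from the thread; everything else is an illustrative sketch.

```d
import std.stdio : writeln;

// Evaluated at compile time because its result is passed to mixin().
string ctfe()
{
    return "int i; /* asdfasf" ~ "adf\x64\u1234" ~ "dsafj */ int j;";
}

void main()
{
    mixin(ctfe());   // declares i and j
    i = 1;
    j = 2;
    writeln(i + j);  // 3
}
```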

> }
> 
> Having looked at the frontend, I think there might be a bug in slicing
> wchar[] at compile time:
> ilwr and iupr aren't taken care of for the wchar[] and dchar[] cases.
> 
> Regards,
> David Leon

It could well be there's a bug in the frontend.  But this is kind of like setting your house on fire, and then pointing out there's a burn-mark on the wall. :P

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
March 30, 2007
Daniel Keep wrote:
> Davidl wrote:
>> I don't see what prevents the following from compiling:
> 
> I can see a few things...

<...>

> cast(char)192, whilst technically valid, is really nasty.  For starters, '192' isn't a valid character by itself in UTF-8, which means it can't be printed.

<...>

> Please don't use printf, at least not without passing the string through toStringz.  writefln works perfectly fine.

Perhaps he's using printf because he wants to output the byte 192 without getting an "Error: 4invalid UTF-8 sequence". I've found that, currently, in both Phobos and Tango, the C library is the best way of outputting a character whilst letting the user worry about whether he can see it properly in his locale or not.

The point about toStringz still stands, though.
March 30, 2007

Deewiant wrote:
> Daniel Keep wrote:
>> Davidl wrote:
>>> I don't see what prevents the following from compiling:
>> I can see a few things...
> 
> <...>
> 
>> cast(char)192, whilst technically valid, is really nasty.  For starters, '192' isn't a valid character by itself in UTF-8, which means it can't be printed.
> 
> <...>
> 
>> Please don't use printf, at least not without passing the string through toStringz.  writefln works perfectly fine.
> 
> Perhaps he's using printf because he wants to output the byte 192 without getting an "Error: 4invalid UTF-8 sequence". I've found that, currently, in both Phobos and Tango, the C library is the best way of outputting a character whilst letting the user worry about whether he can see it properly in his locale or not.
> 
> The point about toStringz still stands, though.

True; I hadn't considered that.  The first thing I thought was that he didn't know about Unicode literals, and was trying to manually encode the character in UTF-16.

That said, I personally think that if you need to use printf because writefln is barfing on your string, then that's a bug in your program. char[] is UTF-8: if you're not storing UTF-8, you should be using ubyte[], not char[].
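One way to follow that rule, sketched in D2: keep bytes of unknown encoding as `ubyte[]` and only hand out a `char[]` view once the data has been checked. `asUtf8` is a hypothetical helper name, not a Phobos function.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

/// Hypothetical helper: view raw bytes as UTF-8 text, but only if they are valid.
const(char)[] asUtf8(const(ubyte)[] bytes)
{
    auto text = cast(const(char)[]) bytes;  // reinterpret; no copy, no conversion
    validate(text);                         // throws UTFException on bad UTF-8
    return text;
}

void main()
{
    const(ubyte)[] utf8Bytes = [0x41, 0xC3, 0x80];  // "A" + U+00C0 in UTF-8
    const(ubyte)[] latin1Bytes = [0x41, 0xC0];      // the same text in ISO-8859-1

    assert(asUtf8(utf8Bytes) == "A\u00C0");
    assertThrown!UTFException(asUtf8(latin1Bytes));
}
```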

Incidentally, since D source must be either ASCII or some variant of UTF, cast(char)192 isn't a valid character *anyway*, unless it's part of a multibyte code-point, at which point the argument for outputting it literally falls apart since he's using it in a mixin :P

Also, I just realised that the "you can't cast arrays of chars around" is something I should add to my text in D article...

	-- Daniel

March 30, 2007
Daniel Keep wrote:
> That said, I personally think that if you need to use printf because writefln is barfing on your string, then that's a bug in your program. char[] is UTF-8: if you're not storing UTF-8, you should be using ubyte[], not char[].

I agree. However, both Phobos and Tango use char[] for all their string-processing functions _which also work on non-UTF-8_. This means that to call such a function you need to do, for instance, "std.string.strip(*cast(char[])iso-8859-1-string.ptr);" which gets ugly very quickly.

I hoped that Tango would use ubyte[] in the C standard library, at least, but no. I understand why not (standard; most people use only char[] and don't want to do the cast from char[]* to ubyte[]* as above; good for ASCII anyway), and so I don't complain, but it's still something I'd like.

Perhaps D needs a way to allow implicit conversion:

finally(ubyte[] is char[]) {
	char[] foo(ubyte[] myString) {
		return std.string.strip(myString.dup);
	}
}

<g>
March 30, 2007

Deewiant wrote:
> Daniel Keep wrote:
>> That said, I personally think that if you need to use printf because writefln is barfing on your string, then that's a bug in your program. char[] is UTF-8: if you're not storing UTF-8, you should be using ubyte[], not char[].
> 
> I agree. However, both Phobos and Tango use char[] for all their string-processing functions _which also work on non-UTF-8_. This means that to call such a function you need to do, for instance, "std.string.strip(*cast(char[])iso-8859-1-string.ptr);" which gets ugly very quickly.
> 
> I hoped that Tango would use ubyte[] in the C standard library, at least, but no. I understand why not (standard; most people use only char[] and don't want to do the cast from char[]* to ubyte[]* as above; good for ASCII anyway), and so I don't complain, but it's still something I'd like.
> 
> Perhaps D needs a way to allow implicit conversion:
> 
> finally(ubyte[] is char[]) {
> 	char[] foo(ubyte[] myString) {
> 		return std.string.strip(myString.dup);
> 	}
> }
> 
> <g>

> foreach( dchar c ; some_string )
> {
>     // ...
> }

Would *not* work correctly with the above if your string contains anything outside of the ASCII range.  Yes, the functions might work with non-UTF-8 codepages, but that's more a side-effect of how they are implemented.
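A D2 sketch of why: `foreach (dchar c; s)` decodes `s` as UTF-8, so a Latin-1 byte such as 0xC0 is not read as 'À'; decoding fails and the loop throws (in current druntime this is a `core.exception.UnicodeException`, which `assertThrown` catches as an `Exception`). `countCodePoints` is an illustrative name.

```d
import std.exception : assertThrown;

/// Illustrative helper: count code points by letting foreach decode UTF-8.
size_t countCodePoints(const(char)[] s)
{
    size_t n;
    foreach (dchar c; s)   // each iteration decodes one UTF-8 sequence
        ++n;
    return n;
}

void main()
{
    assert(countCodePoints("A\u00C0") == 2);   // 3 bytes, 2 code points

    char[] latin1 = ['A', cast(char) 0xC0];     // ISO-8859-1 text, not UTF-8
    assertThrown(countCodePoints(latin1));      // decoding fails
}
```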

I think what Phobos really needs is a character encoding conversion library, even if it's just a paper-thin binding to iconv or something.
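For what it's worth, present-day Phobos ships a small conversion module, `std.encoding`, covering ASCII, ISO-8859-1/2, a few Windows code pages and the UTF encodings (it is not a general iconv binding). A sketch, assuming the input is already known to be Latin-1:

```d
import std.encoding : Latin1String, transcode;

void main()
{
    // "A" followed by U+00C0 as ISO-8859-1 bytes: 0x41, 0xC0.
    immutable(ubyte)[] raw = [0x41, 0xC0];

    // Tell the type system what the bytes are, then convert to UTF-8.
    auto latin1 = cast(Latin1String) raw;
    string utf8;
    transcode(latin1, utf8);

    assert(utf8 == "A\u00C0");   // U+00C0 is now the two bytes C3 80
    assert(utf8.length == 3);
}
```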

	-- Daniel

[1] I hope I've got the right term; I'm liable to get my head chewed off if I'm wrong :P

March 30, 2007
Daniel Keep wrote:
>> foreach( dchar c ; some_string )
>> {
>>     // ...
>> }
> 
> Would *not* work correctly with the above if your string contains anything outside of the ASCII range.  Yes, the functions might work with non-UTF-8 codepages, but that's more a side-effect of how they are implemented.

True. But it's hard to implement (some of) them _without_ supporting non-UTF-8. <g> And there's always the C standard library.

> I think what Phobos really needs is a character encoding conversion library, even if it's just a paper-thin binding to iconv or something.
> 

The problem is that you often aren't told the encoding, and have to work with just bytes. You can guess (and I'm sure some pretty smart heuristics have been developed for this), but it's not perfect, and you still need ASCII whitespace stripping to work, regardless of the encoding.*

* Okay, so if the 0-127 range isn't ASCII, it won't work, but that's practically nonexistent these days (at least on the platforms DMD supports).
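That last point can be sketched in D2 without pretending the data is `char[]`: strip ASCII whitespace straight from a `ubyte[]`, leaving bytes of 0x80 and above untouched whatever encoding they belong to. `stripAsciiWhite` is a hypothetical helper built on `std.algorithm.mutation.strip` and `std.ascii.isWhite`.

```d
import std.algorithm.mutation : strip;
import std.ascii : isWhite;

/// Hypothetical helper: trim ASCII whitespace from bytes of unknown encoding.
const(ubyte)[] stripAsciiWhite(const(ubyte)[] bytes)
{
    // isWhite is true only for ASCII space, \t, \n, \v, \f and \r,
    // so high bytes (Latin-1 letters, UTF-8 continuation bytes, ...) are kept.
    return bytes.strip!(b => isWhite(cast(dchar) b));
}

void main()
{
    const(ubyte)[] raw = [0x20, 0x09, 0x41, 0xC0, 0x0A];  // " \tA\xC0\n" as raw bytes
    immutable ubyte[] expected = [0x41, 0xC0];
    assert(stripAsciiWhite(raw) == expected);
}
```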
March 30, 2007
> int getRandomNumber()
> {
>     return 4; // chosen by fair dice roll.
>               // guaranteed to be random.
> }

^^^ Priceless.  : D

That reminds me of the ol' "find x" / "here it is ----> x" picture going around.