March 30, 2007
might be a bug in the DMD FrontEnd
i don't see what prevent the following from compiling:
import std.stdio;
char[] ctfe()
{
	wchar[] k=cast(wchar[])"int  
i;/*asdfasf"~cast(wchar[])("adf"~cast(char)100~cast(char)192~cast(char)250)~cast(wchar[])"dsafj*/  
int j;";
	char[] jimmy = cast(char[])(k[0..k.length-1]);
	int i;
	for(i=0;i<jimmy.length;)
	{
		if (jimmy[i]==0)
			jimmy=jimmy[0..i]~jimmy[i+1..jimmy.length];
		else
			i++;
	}
	return jimmy;
	
}
void main()
{
	mixin(ctfe);
	char[] k=ctfe;
        printf ("%s",k.ptr);
}

and by viewing the frontend, i think there might be a bug of slicing  
wchar[] in compile time.
ilwr, iupr ain't taken care of for wchar[] and dchar[] case

Regards,
David Leon
March 30, 2007
Re: might be a bug in the DMD FrontEnd
Sorry, I didn't read the frontend carefully enough; it does handle the
different cases.
But I still don't get why my little function couldn't be evaluated at
compile time.
March 30, 2007
Re: might be a bug in the DMD FrontEnd
Err, in constfold.c:
the cat function declares and assigns Type t, but t is never used afterwards?
	t = es2->type;
	es->type = type;
Should that be es->type = t; or would an assertion be preferable?
March 30, 2007
Re: might be a bug in the DMD FrontEnd
Davidl wrote:
> i don't see what prevent the following from compiling:

I can see a few things...

> import std.stdio;
> char[] ctfe()
> {
>     wchar[] k=cast(wchar[])"int

What's with the casting?
http://www.digitalmars.com/d/lex.html#StringLiteral -- see the part on
"Postfix" characters.

I'm fairly certain that cast(wchar[]) doesn't do what you *think* it's
doing.  cast(wchar[]) casts an array of chars into an array of wchars...
note that I *did not* say "converts" -- UTF-8 and UTF-16 are very
different encodings, so you can't just cast between them and expect it
to make any sense.

It would be like casting a double pointer to a ushort pointer --
meaningless.
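
To make the distinction concrete, here is a minimal sketch in present-day
D2 syntax (not code from this thread; the two helper names are made up
for illustration). It assumes the text lives in an array variable rather
than a literal: casting the array only relabels the bytes as 16-bit
units, while std.utf.toUTF16 decodes the text and re-encodes it.

```d
import std.utf : toUTF16;

// Hypothetical helper: view the same bytes as 16-bit units (no conversion).
// The byte count must be even, or the array cast fails at run time.
size_t reinterpretedLength(const(char)[] utf8)
{
    auto units = cast(const(wchar)[]) utf8; // every two bytes become one wchar
    return units.length;
}

// Hypothetical helper: decode the UTF-8 and re-encode it as UTF-16.
size_t convertedLength(const(char)[] utf8)
{
    return toUTF16(utf8).length;
}

void main()
{
    const(char)[] text = "h\u00E9llo";      // 6 UTF-8 code units; the accented e takes two
    assert(text.length == 6);
    assert(reinterpretedLength(text) == 3); // 6 bytes squeezed into 3 meaningless wchars
    assert(convertedLength(text) == 5);     // 5 characters, one UTF-16 unit each
}
```

The reinterpreted array has the wrong length and the wrong contents; only
the converted one is still the same text.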

Also, I can't work out why you would want to embed a comment in a mixin
string, but it's not especially problematic :P

> i;/*asdfasf"~cast(wchar[])("adf"~cast(char)100~cast(char)192~cast(char)250)~cast(wchar[])"dsafj*/

cast(char)192, whilst technically valid, is really nasty.  For starters,
'192' isn't a valid character by itself in UTF-8, which means it can't
be printed.  Not to mention the potential byte-order problems.  We have
Unicode escape sequences for a reason.  Again, see the section on string
literals, but basically, we have "\x12" for ASCII characters, "\u1234"
for wide characters, and "\U12345678" for really wide characters.
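
As a rough illustration of those escapes and of the c/w/d postfix
characters (a sketch in D2 syntax, with code points picked arbitrarily
for the example):

```d
void main()
{
    // \x takes two hex digits; 0x64 is the letter 'd'.
    assert("\x64" == "d");

    // \u takes four hex digits. The postfix picks the encoding:
    // c = UTF-8 (char), w = UTF-16 (wchar), d = UTF-32 (dchar).
    assert("\u1234"c.length == 3); // three UTF-8 code units
    assert("\u1234"w.length == 1); // one UTF-16 code unit

    // \U takes eight hex digits, for code points beyond U+FFFF.
    assert("\U0001F600"w.length == 2); // a surrogate pair in UTF-16
    assert("\U0001F600"d.length == 1); // one UTF-32 code unit
}
```

The compiler does the encoding, so there is no manual byte juggling.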

And, again, that cast doesn't make any sense.

> int j;";
>     char[] jimmy = cast(char[])(k[0..k.length-1]);

Dear lord, why?!  You just spent half your time casting it to a wchar
array, and now you're casting it back?!  char[] is perfectly capable of
storing Unicode text, if that's what you're worried about.

Also, you're cutting off the last character of the string, which means
you're losing that last ";", which means your mixin isn't valid, and
will cause compilation to fail.

>     int i;
>     for(i=0;i<jimmy.length;)
>     {
>         if (jimmy[i]==0)
>             jimmy=jimmy[0..i]~jimmy[i+1..jimmy.length];
>         else
>             i++;
>     }

The only reason I can come up with as to why you're doing the above is
because all that casting above generates a string with null characters
in it... which it wouldn't if you didn't use all the casting.

>     return jimmy;
>     
> }
> void main()
> {
>     mixin(ctfe);
>     char[] k=ctfe;
>         printf ("%s",k.ptr);

Please don't use printf, at least not without passing the string through
toStringz.  writefln works perfectly fine.  I mean, you even imported
std.stdio...

Ok, let's try rewriting this...

import std.stdio;

wchar[] ctfe()
{
   wchar[] k = "int
i;/*asdfasf"w~("adf"w~cast(wchar)'\x64'~cast(wchar)'\u1234')~"dsafj*/
int j;";
   return k;
}

void main()
{
   mixin(ctfe());
   wchar[] k = ctfe();
   writefln("%s", k);
}

The above works perfectly.  Heck, we could get rid of those cast(wchar)s
by just using a wchar string: "\x64\u1234"w.

> }
> 
> and by viewing the frontend, i think there might be a bug of slicing
> wchar[] in compile time.
> ilwr, iupr ain't taken care of for wchar[] and dchar[] case
> 
> Regards,
> David Leon

It could well be there's a bug in the frontend.  But this is kind of
like setting your house on fire, and then pointing out there's a
burn-mark on the wall. :P

	-- Daniel

-- 
int getRandomNumber()
{
   return 4; // chosen by fair dice roll.
             // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
March 30, 2007
Re: might be a bug in the DMD FrontEnd
Daniel Keep wrote:
> Davidl wrote:
>> i don't see what prevent the following from compiling:
> 
> I can see a few things...

<...>

> cast(char)192, whilst technically valid, is really nasty.  For starters,
> '192' isn't a valid character by itself in UTF-8, which means it can't
> be printed.

<...>

> Please don't use printf, at least not without passing the string through
> toStringz.  writefln works perfectly fine.

Perhaps he's using printf because he wants to output the byte 192 without
getting an "Error: 4invalid UTF-8 sequence". I've found that, currently, in both
Phobos and Tango, the C library is the best way of outputting a character whilst
letting the user worry about whether he can see it properly in his locale or not.

The point about toStringz still stands, though.
March 30, 2007
Re: might be a bug in the DMD FrontEnd
Deewiant wrote:
> Daniel Keep wrote:
>> Davidl wrote:
>>> i don't see what prevent the following from compiling:
>> I can see a few things...
> 
> <...>
> 
>> cast(char)192, whilst technically valid, is really nasty.  For starters,
>> '192' isn't a valid character by itself in UTF-8, which means it can't
>> be printed.
> 
> <...>
> 
>> Please don't use printf, at least not without passing the string through
>> toStringz.  writefln works perfectly fine.
> 
> Perhaps he's using printf because he wants to output the byte 192 without
> getting an "Error: 4invalid UTF-8 sequence". I've found that, currently, in both
> Phobos and Tango, the C library is the best way of outputting a character whilst
> letting the user worry about whether he can see it properly in his locale or not.
> 
> The point about toStringz still stands, though.

True; I hadn't considered that.  The first thing I thought was that he
didn't know about Unicode literals, and was trying to manually encode
the character in UTF-16.

That said, I personally think that if you need to use printf because
writefln is barfing on your string, then that's a bug in your program.
char[] is UTF-8: if you're not storing UTF-8, you should be using
ubyte[], not char[].

Incidentally, since D source must be either ASCII or some variant of
UTF, cast(char)192 isn't a valid character *anyway*, unless it's part of
a multibyte code-point, at which point the argument for outputting it
literally falls apart since he's using it in a mixin :P

Also, I just realised that the "you can't cast arrays of chars around"
point is something I should add to my text in D article...

	-- Daniel

March 30, 2007
Re: might be a bug in the DMD FrontEnd
Daniel Keep wrote:
> That said, I personally think that if you need to use printf because
> writefln is barfing on your string, then that's a bug in your program.
> char[] is UTF-8: if you're not storing UTF-8, you should be using
> ubyte[], not char[].

I agree. However, both Phobos and Tango use char[] for all their
string-processing functions _which also work on non-UTF-8_. This means that to
call such a function you need to do, for instance,
"std.string.strip(*cast(char[])iso-8859-1-string.ptr);" which gets ugly very
quickly.

I hoped that Tango would use ubyte[] in the C standard library, at least, but
no. I understand why not (standard; most people use only char[] and don't want
to do the cast from char[]* to ubyte[]* as above; good for ASCII anyway), and so
I don't complain, but it's still something I'd like.

Perhaps D needs a way to allow implicit conversion:

finally(ubyte[] is char[]) {
	char[] foo(ubyte[] myString) {
		return std.string.strip(myString.dup);
	}
}

<g>
March 30, 2007
Re: might be a bug in the DMD FrontEnd
Deewiant wrote:
> Daniel Keep wrote:
>> That said, I personally think that if you need to use printf because
>> writefln is barfing on your string, then that's a bug in your program.
>> char[] is UTF-8: if you're not storing UTF-8, you should be using
>> ubyte[], not char[].
> 
> I agree. However, both Phobos and Tango use char[] for all their
> string-processing functions _which also work on non-UTF-8_. This means that to
> call such a function you need to do, for instance,
> "std.string.strip(*cast(char[])iso-8859-1-string.ptr);" which gets ugly very
> quickly.
> 
> I hoped that Tango would use ubyte[] in the C standard library, at least, but
> no. I understand why not (standard; most people use only char[] and don't want
> to do the cast from char[]* to ubyte[]* as above; good for ASCII anyway), and so
> I don't complain, but it's still something I'd like.
> 
> Perhaps D needs a way to allow implicit conversion:
> 
> finally(ubyte[] is char[]) {
> 	char[] foo(ubyte[] myString) {
> 		return std.string.strip(myString.dup);
> 	}
> }
> 
> <g>

> foreach( dchar c ; some_string )
> {
>     // ...
> }

Would *not* work correctly with the above if your string contains
anything outside of the ASCII range.  Yes, the functions might work with
non-UTF-8 codepages, but that's more a side-effect of how they are
implemented.
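
A small sketch of what that foreach does (D2 syntax; the helper names
are hypothetical). Iterating a char[] with a dchar loop variable decodes
the array as UTF-8, so bytes that are not valid UTF-8 make the loop
throw instead of yielding characters:

```d
// Counts raw UTF-8 code units (bytes).
size_t countCodeUnits(const(char)[] s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}

// Counts decoded code points; the dchar loop variable triggers UTF-8 decoding.
size_t countCodePoints(const(char)[] s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

// Returns false if decoding the array as UTF-8 throws.
bool decodesCleanly(const(char)[] s)
{
    try
    {
        foreach (dchar c; s) {}
        return true;
    }
    catch (Exception)
    {
        return false;
    }
}

void main()
{
    assert(countCodeUnits("h\u00E9") == 3);
    assert(countCodePoints("h\u00E9") == 2);

    char[] latin1 = [cast(char) 0xC0]; // a lone ISO-8859-1 byte, not valid UTF-8
    assert(!decodesCleanly(latin1));
}
```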

I think what Phobos really needs is a character encoding conversion
library, even if it's just a paper-thin binding to iconv or something.
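
For the one easy case, ISO-8859-1, a conversion can be sketched by hand,
since each Latin-1 byte value equals its Unicode code point. This is
only an illustration in D2 syntax with a made-up function name, not a
substitute for a real conversion library such as an iconv binding:

```d
// Hypothetical helper: converts ISO-8859-1 bytes to UTF-8 text.
char[] latin1ToUtf8(const(ubyte)[] latin1)
{
    char[] result;
    foreach (b; latin1)
        result ~= cast(dchar) b; // appending a dchar encodes it as UTF-8
    return result;
}

void main()
{
    ubyte[] raw = [0x41, 0xC0];    // 'A' followed by byte 192
    char[] text = latin1ToUtf8(raw);
    assert(text == "A\u00C0");     // U+00C0, capital A with grave accent
    assert(text.length == 3);      // byte 192 became two UTF-8 code units
}
```

Keeping the raw input as ubyte[] and only producing char[] after the
conversion matches the "char[] is UTF-8" rule discussed earlier in the
thread.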

	-- Daniel

[1] I hope I've got the right term; I'm liable to get my head chewed off
if I'm wrong :P

March 30, 2007
Re: might be a bug in the DMD FrontEnd
Daniel Keep wrote:
>> foreach( dchar c ; some_string )
>> {
>>     // ...
>> }
> 
> Would *not* work correctly with the above if your string contains
> anything outside of the ASCII range.  Yes, the functions might work with
> non-UTF-8 codepages, but that's more a side-effect of how they are
> implemented.

True. But it's hard to implement (some of) them _without_ supporting non-UTF-8.
<g> And there's always the C standard library.

> I think what Phobos really needs is a character encoding conversion
> library, even if it's just a paper-thin binding to iconv or something.
> 

The problem is that you often aren't told the encoding, and have to work with
just bytes. You can guess (and I'm sure some pretty smart heuristics have been
developed for this), but it's not perfect, and you still need ASCII whitespace
stripping to work, regardless of the encoding.*

* Okay, so if the 0-127 range isn't ASCII, it won't work, but that's practically
nonexistent these days (at least on the platforms DMD supports).
March 30, 2007
Re: might be a bug in the DMD FrontEnd
> int getRandomNumber()
> {
>     return 4; // chosen by fair dice roll.
>               // guaranteed to be random.
> }

^^^ Priceless.  : D

That reminds me of the ol' "find x" "here it is ----> x" picture going around.