July 18, 2010 Re: pu$�le | ||||
---|---|---|---|---|
| ||||
Posted in reply to strtr | On Saturday 17 July 2010 23:01:28 strtr wrote:
> >
> > Cheated? I thought that you were trying to figure out why the code wasn't
> > doing what you expected it to be doing. So, of course I ran it.
> > Though, it's more likely that I have an x86 emulator in my brain which can
> > run dmd than that I have a D emulator in my brain if I figured this out in
> > my head, since I gave you the exact error message that dmd does.
> > - Jonathan M Davis
>
> I don't find it more likely that you have an x86 emulator in your brain
> which then ran dmd to compile some code.
> I might even think that almost impossible ;P
> If you knew the compiler well enough you might be capable of giving that
> error message with only the extra knowledge of where your files reside and
> your version and OS info.
Well, since both are pretty much impossible, I think that it's a moot point. I can believe that someone would know the compiler well enough to know what it was going to do in most situations and that they would have some idea as to what the error message would be, but if you want them to be at all precise, that just takes too much detail for anyone to remember. If they could do that, they'd be an insanely good programmer.
- Jonathan M Davis
|
July 18, 2010 Re: pu$�le | ||||
---|---|---|---|---|
| ||||
Posted in reply to strtr | On Saturday 17 July 2010 22:52:21 strtr wrote:
>
> I think I'll start subject tagging my posts: [D1/D2]
> std.stdio in D1 doesn't mention a write function and feeding the writef
> function an illegal UTF string will result in a UTF exception.
> With this information, what do you think the output should be?
Well, I certainly think that throwing an exception for bad UTF-8 values makes sense, though D2's docs for writef say nothing about exceptions, and on my machine, running Linux, they just fail to print anything. Throwing an exception would likely have been better.
In any case, I would have expected it to increment stash by 2 on the first loop because $ would be valid and would hit both scope(exit) and scope(success). After that... That continue makes me awfully nervous. You'd expect the scope statements to be run in reverse order, with continue first and then stash--. However, running that continue statement would have to skip the other scope statements...
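For reference, here is a minimal sketch of the order in which scope guards run, assuming D2 semantics. The function name trace and the log array are made up for illustration and are not taken from the puzzle. Guards run in reverse order of declaration when the scope is left; scope(success) only runs on a normal exit, and scope(failure) only runs while an exception is propagating:

```d
import std.stdio;

// Records the order in which the body and the scope guards execute.
string[] trace(bool fail)
{
    string[] log;
    try
    {
        scope(exit)    log ~= "exit";     // always runs
        scope(success) log ~= "success";  // only on normal exit
        scope(failure) log ~= "failure";  // only when an exception escapes
        log ~= "body";
        if (fail)
            throw new Exception("boom");
    }
    catch (Exception e)
    {
        // Reached only after the guards of the inner scope have run.
        log ~= "caught";
    }
    return log;
}

void main()
{
    assert(trace(false) == ["body", "success", "exit"]);
    assert(trace(true)  == ["body", "failure", "exit", "caught"]);
    writeln(trace(false));
    writeln(trace(true));
}
```

Note that the guards here only append to an array; none of them tries to leave the scope with continue, break, or return, which is the questionable part of the puzzle.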
I think that we'll have to lower the body of that foreach loop to have any clue what's going on here. It should come out to something like this, I would think:
const char[] coins = `$€`;

void main()
{
    writef(`I made `);
    int stash = 0;
    scope(exit) writefln(stash,`.`);
    scope(failure) stash--;

    foreach(coin;coins)
    {
        try
        {
            try
            {
                try
                {
                    try
                    {
                        writef(coin);
                    }
                    catch
                    {
                        continue;
                        throw;
                    }
                }
                catch
                {
                    stash--;
                    throw;
                }

                stash++;
            }
            catch
            {
                throw;
            }
        }
        finally
        {
            stash++;
        }
    }
}
That being the case, the exception from writef() will always get eaten by the continue because the throw that rethrows the exception would never occur. Normally, code like that should result in a compilation error, but it might not given that it's the compiler creating the try-catch block. My guess is that this is a bug in dmd. It makes no sense to me to allow any kind of goto, break, or continue statements in a scope statement's body.
Regardless, that continue would mean that the first stash++ would be skipped, but the second would still happen because it's in a finally block. That means that the 3 UTF-8 code units which make up the euro symbol (each of them invalid on its own) would each increment stash once. So, with the 2 from the $, the overall result would then be 5.
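To check that arithmetic, here is a rough sketch of the same control flow in legal D2. It is not the original puzzle code: emit is a hypothetical stand-in for writef that throws on any non-ASCII code unit (matching the D1 behavior described above), countStash is an invented name, and the rethrowing handlers are left out because the continue turns them into dead code anyway.

```d
import std.stdio;

// Hypothetical stand-in for D1's writef: rejects any lone non-ASCII code unit.
void emit(char c)
{
    if (c > 0x7F)
        throw new Exception("invalid UTF-8 sequence");
}

int countStash(string coins)
{
    int stash = 0;
    foreach (char coin; coins)   // iterate over UTF-8 code units, not code points
    {
        try
        {
            try
            {
                emit(coin);
            }
            catch (Exception e)
            {
                continue;        // swallows the exception; nothing is rethrown
            }
            stash++;             // scope(success) part: skipped by the continue
        }
        finally
        {
            stash++;             // scope(exit) part: runs on every iteration
        }
    }
    return stash;
}

void main()
{
    // '$' contributes 2, each of the 3 code units of the euro sign contributes 1.
    assert(countStash("$\u20AC") == 5);
    writeln(countStash("$\u20AC"));
}
```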
It's possible that I lowered those scope statements incorrectly, but it looks to me like that's what the code should be doing. Regardless, continue in a scope statement should be an error.
- Jonathan M Davis
|
July 18, 2010 Re: pu$�le | ||||
---|---|---|---|---|
| ||||
On Sunday 18 July 2010 00:38:38 Jonathan M Davis wrote:
> It's possible that I lowered those scope statements incorrectly, but it looks to me like that's what the code should be doing. Regardless, continue in a scope statement should be an error.
>
> - Jonathan M Davis
Hmm. Well, it seems that throw by itself is not legal D. You have to do something like
catch(Exception e)
{
    throw e;
}
But in any case,
catch(Exception e)
{
    continue;
    throw e;
}
compiles just fine. It seems to me that it shouldn't, though, since throw e; is then an unreachable statement. In any case, I'll file a bug report on this.
- Jonathan M Davis
|
July 18, 2010 Re: pu$�le | ||||
---|---|---|---|---|
| ||||
On Sunday 18 July 2010 00:46:36 Jonathan M Davis wrote:
> I'll file a bug report
>
> - Jonathan M Davis
Wait. That's not the problem. Or at least, that's not the problem that needs to be reported. The problem is that we're not compiling with -w. If you compile with -w, then statements such as
scope(failure) continue;
won't compile: the compiler reports an unreachable statement, and with -w that is flagged as an error, so the program fails to compile. So, I filed a bug report on the fact that such warnings aren't reported without -w (though the code would still compile then, since they'd be warnings rather than errors): http://d.puremagic.com/issues/show_bug.cgi?id=4482
Regardless, what you're trying to do is clearly an error, and compiling with -w will show that.
- Jonathan M Davis
|
July 18, 2010 Re: pu$�le | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | Jonathan M Davis:
> You should pretty much never deal with each individual char or wchar in a string or wstring. Do the conversion to dchar or dstring if you want to access individual characters. You can also use std.utf.stride() to iterate over to the next code unit which starts a code point, but you're still going to have to make sure that you convert it to a dchar to process it properly. Otherwise, only ASCII characters will work right (since they fit in a single code unit). Fortunately, foreach takes care of all this for us if we specify the element type as dchar.
I am starting to think that for safety the foreach on a string has to yield dchars by default, and to yield chars only on request:
foreach(c; "hello") => dchars
foreach(char c; "hello") => chars
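A small sketch of the difference as the language stands today, assuming D2 and using "$\u20AC" (a dollar sign plus a euro sign) as a sample string. The helper names countUnits and countPoints are only illustrative:

```d
import std.stdio;

// Counts UTF-8 code units: foreach with char does no decoding.
size_t countUnits(string s)
{
    size_t n = 0;
    foreach (char c; s)
        ++n;
    return n;
}

// Counts code points: foreach with dchar decodes the UTF-8 on the fly.
size_t countPoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string coins = "$\u20AC";       // '$' is 1 code unit, the euro sign is 3
    assert(countUnits(coins) == 4);
    assert(countPoints(coins) == 2);
    writeln(countUnits(coins), " code units, ", countPoints(coins), " code points");
}
```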
Bye,
bearophile
|
July 18, 2010 Re: pu$�le | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Sunday 18 July 2010 04:13:03 bearophile wrote:
> I am starting to think that for safety the foreach on a string has to yield
> dchars by default, and to yield chars only on request:
> foreach(c; "hello") => dchars
> foreach(char c; "hello") => chars
>
> Bye,
> bearophile
That's probably a good idea, though for people to write safe string code in the general case, they're really going to have to understand the differences between char, wchar, and dchar as well as what that means for their code. It's just way too easy to shoot yourself in the foot once you start trying to manipulate single characters, and I don't think that there's really a way to fix that unless you forced dchar for everything, which definitely isn't the D way to do things (though IIRC, that's essentially what Java did).
Still, this particular case might be better off defaulting to dchar since dchar is already handled specially in foreach anyhow. My only real problem with that is the fact that while dchar is handled specially, it's done with a conversion, and making foreach over a string default to dchar instead of char breaks how foreach works normally. It seems to me that a warning would be a better idea. If they really want char, they can specify char, but the warning would make them aware of the issue so that they'd specify the correct type (be it char or dchar or whatever) rather than leaving it blank. That way, foreach retains its normal semantics, and the problem is still averted.
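As a rough illustration of why the three character types matter, here is a sketch (assuming D2 string literal suffixes) that stores the same text as string, wstring, and dstring and compares the lengths. The sample characters are arbitrary choices for the example:

```d
import std.stdio;

void main()
{
    // The same two characters ('$' and the euro sign) in each string type.
    string  s = "$\u20AC";   // UTF-8:  1 + 3 code units
    wstring w = "$\u20AC"w;  // UTF-16: 1 + 1 code units
    dstring d = "$\u20AC"d;  // UTF-32: 1 + 1 code units

    assert(s.length == 4);
    assert(w.length == 2);
    assert(d.length == 2);

    // A character outside the Basic Multilingual Plane (U+1D11E, a musical
    // clef) needs two code units even in UTF-16, so only the dstring has
    // a length equal to the number of characters.
    assert("\U0001D11E".length == 4);
    assert("\U0001D11E"w.length == 2);
    assert("\U0001D11E"d.length == 1);

    writeln(s.length, " ", w.length, " ", d.length);
}
```

In other words, .length and indexing count code units in all three types; only with dchar do code units and code points coincide.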
- Jonathan M Davis
|
July 18, 2010 Re: pu$�le | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | == Quote from Jonathan M Davis (jmdavisprog@gmail.com)'s article
> That's probably a good idea, though for people to write safe string code in the
> general case, they're really going to have to understand the differences between
> char, wchar, and dchar as well as what that means for their code. It's just way
> too easy to shoot yourself in the foot once you start trying to manipulate
> single characters, and I don't think that there's really a way to fix that unless
> you forced dchar for everything, which definitely isn't the D way to do things
> (though IIRC, that's essentially what Java did). Still, this particular case
> might be better off defaulting to dchar since dchar is already handled specially
> in foreach anyhow. My only real problem with that is the fact that while dchar
> is handled specially, it's done with a conversion, and making foreach over a
> string default to dchar instead of char breaks how foreach works normally. It
> seems to me more like a warning would be a better idea. If they really want
> char, they can specify char, but the warning would warn them so that they'd be
> aware of the issue and specify the correct type (be it char or dchar or
> whatever) rather than leaving it blank. That way, foreach retains its normal
> semantics, and the problem is still averted.
> - Jonathan M Davis
I agree with the warning. A good warning would get people to read up on UTF.
And if you really want to have char you'll need to cast:
foreach(cast(char)c; chars)
|
July 18, 2010 Re: pu$�le | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | == Quote from Jonathan M Davis (jmdavisprog@gmail.com)'s article
> On Sunday 18 July 2010 00:46:36 Jonathan M Davis wrote:
> > I'll file a bug report
> >
> > - Jonathan M Davis
> Wait. That's not the problem. Or at least, that's not the problem that needs to
> be reported. The problem is that we're not compiling with -w. If you compile
> with -w, then statements such as
> scope(failure) continue;
> won't compile: the compiler reports an unreachable statement, and with -w
> that is flagged as an error, so the program fails to compile. So, I
> filed a bug report on the fact that such warnings aren't reported without -w
> (though the code would still compile then, since they'd be warnings rather than errors):
> http://d.puremagic.com/issues/show_bug.cgi?id=4482
> Regardless, what you're trying to do is clearly an error, and compiling with -w
> will show that.
> - Jonathan M Davis
This should be upped to an error, as -w only shows it as unreachable (without a line
number :( ).
I don't think unreachable code is an error. I often have unreachable code when
debugging.
default:
    assert(false); // temp
    ...
    break;
|
July 18, 2010 Re: pu$�le | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | == Quote from Jonathan M Davis (jmdavisprog@gmail.com)'s article
> Well, since both are pretty much impossible, I think that it's a moot point. I
> can believe that someone would know the compiler well enough to know what it was
> going to do in most situations and that they would have some idea as to what the
> error message would be, but if you want them to be at all precise, that just
> takes too much detail for anyone to remember. If they could do that, they'd be
> an insanely good programmer.
> - Jonathan M Davis
The error only needed to be good enough for me to believe it to be generated by a
Linux compiler ;)
I can probably give you satisfying errors for my program. Sure, it is only a
fraction of dmd but then again, I'm only a mediocre programmer.
|
July 18, 2010 Re: pu$ᅵle | ||||
---|---|---|---|---|
| ||||
Posted in reply to strtr | On Sunday 18 July 2010 06:16:09 strtr wrote:
> I agree with the warning. A good warning would get people to read up on
> UTF. And if you really want to have char you'll need to cast:
> foreach(cast(char)c; chars)
Actually, the cast would be totally unnecessary. Putting
foreach(char c; chars)
would be enough. Forcing a cast would change how foreach normally works. I'm not even sure that you can legally put a cast there like that. What we'd want to disallow would be
foreach(c; chars)
As long as the programmer puts the element type, we can assume that they know what they're doing. But warning in cases where they don't put it would catch a large number of errors in iterating over strings and wstrings.
In any case, I filed a bug report for it: http://d.puremagic.com/issues/show_bug.cgi?id=4483
|