April 11, 2008 [Issue 1985] New: import expression should return ubyte[] not string
http://d.puremagic.com/issues/show_bug.cgi?id=1985

Summary: import expression should return ubyte[] not string
Product: D
Version: 1.028
Platform: PC
OS/Version: Windows
Status: NEW
Severity: normal
Priority: P2
Component: DMD
AssignedTo: bugzilla@digitalmars.com
ReportedBy: wbaxter@gmail.com

The compiler does not convert the encoding of the imported files to utf8, so it should not pretend that it knows the contents of the file are utf8. In fact, probably one of the most practically useful applications of the import expression is to import binary files, which it is impossible for the compiler to put in utf8 format.

So the conclusion is that import("foo.dat") should evaluate to ubyte[], not char[]. It can be cast to char[] if the developer happens to know that the file is, in fact, text. Currently the situation is reversed -- the data must be cast to ubyte[] if the developer knows it is not, in fact, utf8 text.
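
[Editor's sketch, not part of the original report.] The cast described above might look roughly like this in D1-style code. It assumes a binary file named foo.dat (the name used in the report) in a directory given to the compiler with the -J switch; the variable name `raw` is purely illustrative.

```d
// Assumed build command: dmd -J. example.d  (foo.dat in the current directory)
void main()
{
    // At the time of the report the import expression is typed as a string
    // (char[] in D1), so binary data must be cast explicitly to bytes.
    auto raw = cast(ubyte[]) import("foo.dat");

    // Under the proposal the cast would go the other way: the expression
    // would already be ubyte[], and would be cast to char[] only when the
    // developer knows the file is UTF-8 text.
}
```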

April 11, 2008 Re: [Issue 1985] New: import expression should return ubyte[] not string
Posted in reply to d-bugmail
On 11/04/2008, d-bugmail@puremagic.com <d-bugmail@puremagic.com> wrote:
>
> import expression should return ubyte[] not string
Shouldn't it return invariant(ubyte)[] ?

April 11, 2008 Re: [Issue 1985] New: import expression should return ubyte[] not string
Posted in reply to Janice Caron
Janice Caron wrote:
> On 11/04/2008, d-bugmail@puremagic.com <d-bugmail@puremagic.com> wrote:
>> import expression should return ubyte[] not string
>
> Shouldn't it return invariant(ubyte)[] ?
In D2 probably so, but I filed the bug against D1.028.
--bb

April 11, 2008 Re: [Issue 1985] New: import expression should return ubyte[] not string
Posted in reply to Bill Baxter
On 11/04/2008, Bill Baxter <dnewsgroup@billbaxter.com> wrote:
> > Shouldn't it return invariant(ubyte)[] ?
>
> In D2 probably so, but I filed the bug against D1.028.
Makes sense.
I suppose in D2 it depends on the answer to the following question. If I write
auto a = import("filename");
auto b = import("filename");
do we get two separate copies, or do a and b both point to the same memory? If the latter, it should definitely be invariant(ubyte)[] in D2, though as you say, ubyte[] in D1.

April 11, 2008 [Issue 1985] import expression should return ubyte[] not string
Posted in reply to d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=1985

------- Comment #3 from aarti@interia.pl 2008-04-11 03:24 -------
Yes, I also think that implying that imported file is char[] is not best decision.

Maybe even imported type should be void[], so that it must be explicitly casted to proper type.

April 11, 2008 Re: [Issue 1985] import expression should return ubyte[] not string
Posted in reply to d-bugmail
On 11/04/2008, d-bugmail@puremagic.com <d-bugmail@puremagic.com> wrote:
> Maybe even imported type should be void[], so that it must be explicitly casted
> to proper type.
No, it should be ubyte. The reason is that void arrays can contain pointers, and ubyte arrays can't. A void array means that the garbage collector has to scan it, looking for anything that looks like it might be an address, and if it finds such a collection of bits by accident, then something will be marked as "in use", that actually isn't.
If the array came from a file, it can't very well have meaningful pointers into RAM, so I agree with the original poster that it should be ubyte[] for D1, and I would add invariant(ubyte)[] for D2.

April 11, 2008 [Issue 1985] import expression should return ubyte[] not string
Posted in reply to d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=1985

------- Comment #5 from aarti@interia.pl 2008-04-11 06:16 -------
Answer to comment #4

This isn't (direct) argument against my proposal. It would be if GC scanning void[] is fundamentally "good thing". But in fact I don't think that in this case it is such kind of design decision. (There were already proposals to change this behavior.)

But in case of reading external files, the problem is that compiler just *don't know* format of imported file. So IMHO the best thing to do is to reflect this situation in language, and force user to cast content of file to real type.

In case there will stay default cast to some type in import it is really difficult to justify which default behavior is better. I agree with Bill that in most GUI application it would be better to have imported array of bytes. But you can also think about DB framework in which external file is used to define schema of database (see: U++ framework written in C++). In this case some text format is much more natural...

April 11, 2008 [Issue 1985] import expression should return ubyte[] not string
Posted in reply to d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=1985

------- Comment #6 from wbaxter@gmail.com 2008-04-11 07:36 -------
(In reply to comment #5)
> Answer to comment #4
>
> This isn't (direct) argument against my proposal. It would be if GC scanning void[] is fundamentally "good thing". But in fact I don't think that in this case it is such kind of design decision. (There were already proposals to change this behavior.)
>
> But in case of reading external files, the problem is that compiler just *don't know* format of imported file. So IMHO the best thing to do is to reflect this situation in language, and force user to cast content of file to real type.
>
> In case there will stay default cast to some type in import it is really difficult to justify which default behavior is better. I agree with Bill that in most GUI application it would be better to have imported array of bytes. But you can also think about DB framework in which external file is used to define schema of database (see: U++ framework written in C++). In this case some text format is much more natural...
>

Your argument is right on, but ubyte[] *is* the type that means "I don't know what the heck this data is, but it's just data, not pointers". void[] means "I don't know what the heck this data is, it might even be full of pointers". I don't see any way that file that is on disk at *compile time* could contain pointers that are relevant to the program later on at *run time*. So ubyte[] is the proper type.

April 11, 2008 Re: [Issue 1985] import expression should return ubyte[] not string
Posted in reply to d-bugmail
> Your argument is right on, but ubyte[] *is* the type that means "I don't know
> what the heck this data is, but it's just data, not pointers". void[] means "I
> don't know what the heck this data is, it might even be full of pointers". I don't see any way that file that is on disk at *compile time* could contain
> pointers that are relevant to the program later on at *run time*. So ubyte[]
> is the proper type.
I think I understand your way of thinking. But the problem here is different. With void[] you can not do anything, so you have to cast. With ubyte[] you can use data as they are, because they have in fact specific type. But what if file contains array of int's? In such a case you are in exactly same situation as with char[]. Compiler chooses one type to which it casts by default, exactly like currently with char[]. And this choice can be wrong. If it is good or wrong depends only on application. In GUI ubyte[] is more appropriate (e.g. importing icon), but in DB framework string is better (importing db schema).
Having written this all I slowly get to conclusion that current situation is not so bad :-) char[] would be usually much more appropriate for lower layers in application than ubyte[] (mostly for loading and compiling domain languages). Alternative with void[] is IMHO theoretically better, but practically you will have even more casts in you code... Tough decision... :-)

April 11, 2008 Re: [Issue 1985] import expression should return ubyte[] not string
Posted in reply to Aarti_pl
Aarti_pl wrote:
>> Your argument is right on, but ubyte[] *is* the type that means "I don't know
>> what the heck this data is, but it's just data, not pointers". void[] means "I
>> don't know what the heck this data is, it might even be full of pointers". I don't see any way that file that is on disk at *compile time* could contain
>> pointers that are relevant to the program later on at *run time*. So ubyte[]
>> is the proper type.
>
> I think I understand your way of thinking. But the problem here is different. With void[] you can not do anything, so you have to cast. With ubyte[] you can use data as they are, because they have in fact specific type. But what if file contains array of int's? In such a case you are in exactly same situation as with char[]. Compiler chooses one type to which it casts by default, exactly like currently with char[]. And this choice can be wrong. If it is good or wrong depends only on application. In GUI ubyte[] is more appropriate (e.g. importing icon), but in DB framework string is better (importing db schema).
>
> Having written this all I slowly get to conclusion that current situation is not so bad :-) char[] would be usually much more appropriate for lower layers in application than ubyte[] (mostly for loading and compiling domain languages). Alternative with void[] is IMHO theoretically better, but practically you will have even more casts in you code... Tough decision... :-)
foreach( c; import("data")){
    // do something
}
- cannot work with void[]
- will always work in a reasonable way with ubyte[]
- might throw UnicodeException with char[]