December 20, 2003 Re: Unicode discussion
Posted in reply to Hauke Duden
Hauke Duden wrote:
> Another solution would be if there was some way to write global conversion functions that are called to do implicit conversions between different types. Such functions could also be useful in many other circumstances, so that might be an idea to think about.
Just to clarify: I meant this in the context of creating a string interface instance from a string constant, not to convert between different string objects (which wouldn't make much sense).
E.g.
interface string
{
...
}
class MyString implements string
{
...
}
void print(string msg)
{
...
}
Without an implicit conversion we'd have to write:
print(new MyString("Hello World"));
With an implicit conversion that'd look like this:
string opConvert(char[] s)
{
return new MyString(s);
}
print("Hello World");
[The last line would translate to print(opConvert("Hello World")) ]
Hauke
December 20, 2003 Re: Unicode discussion
Posted in reply to Rupert Millard
The problem with the operator* or operator~ syntax is it is ambiguous. It's also not greppable.
"Rupert Millard" <rupertamillard@hotmail.DELETE.THIS.com> wrote in message news:brvr60$2il5$1@digitaldaemon.com...
> I agree with you, but we just have to grin and bear it, unless / until Walter changes his mind. I suppose I could have commented my code better though. Hopefully as I become more experienced, I will be a better judge of these things.
>
> "Sean L. Palmer" <palmer.sean@verizon.net> wrote in message news:brvlj9$29qh$1@digitaldaemon.com...
> > Cool beans! Thanks, Rupert!
> >
> > This brings up a point. The main reason that I do not like opAssign/opAdd syntax for operator overloading is that it is not self-documenting that opSlice corresponds to a[x..y] or that opAdd corresponds to a + b or that opCatAssign corresponds to a ~= b. This information either has to be present in a comment or you have to go look it up. Yeah, D gurus will have it memorized, but I'd rather there be just one "name" for the function, and it should be the same both in the definition and at the point of call.
> >
> > Sean
> >
> > "Rupert Millard" <rupertamillard@hotmail.DELETE.THIS.com> wrote in message news:brvghd$21n8$2@digitaldaemon.com...
> > > There has been a lot of talk about doing things, but very little has actually happened. Consequently, I have made a string interface and two rough and ready string classes for UTF-8 and UTF-32, which are attached to this message.
> > >
> > > Currently they only do a few things, one of which is to provide a consistent interface for character manipulation. The UTF-8 class also provides direct access to the bytes for when the user can do things more efficiently with these. They can also be appended to each other. In addition, each provides a constructor taking the other one as a parameter.
> > >
> > > Please bear in mind that I am only an amateur programmer, who knows very little about Unicode and has no experience of programming in the real world. Nevertheless, I can appreciate some of the issues here and I hope that these classes can be the foundation of something more useful.
> > >
> > > From,
> > >
> > > Rupert
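Rupert's attached classes are not reproduced in this archive. As a rough illustration of the design he describes (a common interface for character access, a UTF-8 class whose raw bytes stay accessible, a UTF-32 class, appending, and constructors that accept the other class), here is a minimal Python sketch. All class and method names are hypothetical and are not taken from Rupert's code.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class UnicodeString(ABC):
    """Hypothetical common interface: every implementation exposes code points."""

    @abstractmethod
    def code_points(self) -> list[int]:
        """Return the string as a list of Unicode code points."""

    def __len__(self) -> int:
        # Length in characters (code points), whatever the storage encoding.
        return len(self.code_points())

    def __getitem__(self, index: int) -> str:
        # Consistent character access across encodings.
        return chr(self.code_points()[index])

    def __str__(self) -> str:
        return "".join(chr(cp) for cp in self.code_points())


class Utf8String(UnicodeString):
    def __init__(self, source: str | UnicodeString = "") -> None:
        # Accepts a plain str or any UnicodeString, e.g. a Utf32String.
        self.data = str(source).encode("utf-8")  # raw bytes stay directly accessible

    def code_points(self) -> list[int]:
        return [ord(c) for c in self.data.decode("utf-8")]

    def __add__(self, other: UnicodeString) -> Utf8String:
        return Utf8String(str(self) + str(other))


class Utf32String(UnicodeString):
    def __init__(self, source: str | UnicodeString = "") -> None:
        # One 32-bit unit per code point, so indexing is direct.
        self.units = [ord(c) for c in str(source)]

    def code_points(self) -> list[int]:
        return list(self.units)

    def __add__(self, other: UnicodeString) -> Utf32String:
        return Utf32String(str(self) + str(other))


s8 = Utf8String("Grüße")
s32 = Utf32String(s8)                    # construct one kind from the other
print(len(s8.data), len(s8), len(s32))  # 7 5 5
print(str(s8 + s32))                     # GrüßeGrüße
```

The UTF-8 object stores seven bytes for five characters, yet both classes report the same length and index by character, which is the "consistent interface" idea in a few lines.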
December 20, 2003 Re: Unicode discussion
Posted in reply to Walter
If you say it's ambiguous, I'll take your word for it, and if you think being greppable is important, I'm also happy to accept that. My personal opinions are not all that strong; it's only a minor inconvenience to have to check the overload function names.
More importantly, what do you think of my request for more opSlice overloads?
From,
Rupert
"Walter" <walter@digitalmars.com> wrote in message news:bs08b8$527$2@digitaldaemon.com...
> The problem with the operator* or operator~ syntax is it is ambiguous. It's also not greppable.
>
> "Rupert Millard" <rupertamillard@hotmail.DELETE.THIS.com> wrote in message news:brvr60$2il5$1@digitaldaemon.com...
> > I agree with you, but we just have to grin and bear it, unless / until Walter changes his mind. I suppose I could have commented my code better though. Hopefully as I become more experienced, I will be a better judge of these things.
> >
> > "Sean L. Palmer" <palmer.sean@verizon.net> wrote in message news:brvlj9$29qh$1@digitaldaemon.com...
> > > Cool beans! Thanks, Rupert!
> > >
> > > This brings up a point. The main reason that I do not like opAssign/opAdd syntax for operator overloading is that it is not self-documenting that opSlice corresponds to a[x..y] or that opAdd corresponds to a + b or that opCatAssign corresponds to a ~= b. This information either has to be present in a comment or you have to go look it up. Yeah, D gurus will have it memorized, but I'd rather there be just one "name" for the function, and it should be the same both in the definition and at the point of call.
> > >
> > > Sean
December 20, 2003 Re: Unicode discussion
Posted in reply to Walter
On Thu, 18 Dec 2003 16:05:47 -0800, "Walter" <walter@digitalmars.com> wrote:
>
> "Sean L. Palmer" <palmer.sean@verizon.net> wrote in message news:brssrg$135p$1@digitaldaemon.com...
> > So you're saying that char[] means UTF-8, and wchar[] means UTF-16, and dchar[] means UTF-32?
>
> Yes. Exactly.
>
> > Unfortunately then a char won't hold a single Unicode character,
>
> Correct. But a dchar will.
>
A char is defined as a UTF-8 character but does not have enough storage to hold one!?
ubyte[4] declares storage for 4 ubytes, but char[4]
The D manual describes a char as being a UTF-8 char AND being 8 bits?
Can't a single UTF-8 character require multiple bytes for representation?
A datatype is some storage and a set of operations that can be done on that storage. In what way are char and ubyte different datatypes?
An array of a datatype is an indexable set of elements of that type. (Isn't it?)
Given
char foo[4];
does foo[2] not represent the third char in foo !!??
I would think that the datatype char would be a UTF-8 character, with no indication of the amount of storage it used. The compiler would be free to represent it internally however it chose. Indexing should work (perhaps inefficiently)
D's datatypes seem to be of two different varieties; names for units of memory
and names for abstract types. Some (ubyte) describe a fixed amount of physical
storage, while others (ifloat?) describe an abstract datatype whose physical structure
is hidden (or at least irrelevant)
Which is char?
Karl Bochert
December 20, 2003 Re: Unicode discussion
Posted in reply to Walter
It would be greppable if it were required that there be no space between the operator and the symbol. (If you use a regexp, you can get around this.) There should be some other way to embed the symbol into the identifier, if it's causing too many lexer problems.
Sean
"Walter" <walter@digitalmars.com> wrote in message news:bs08b8$527$2@digitaldaemon.com...
> The problem with the operator* or operator~ syntax is it is ambiguous. It's also not greppable.
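Sean's regexp remark can be illustrated with a small Python sketch. The declaration strings are made-up examples and `find` is a hypothetical helper: a literal search for `operator~` misses a declaration written with a space, a pattern allowing optional whitespace finds both forms, and a named overload such as `opCat` needs no special handling.

```python
from __future__ import annotations

import re

# Made-up declarations: C++-style operator syntax with and without a space,
# plus a D-style named overload.
DECLS = [
    "String operator~(String rhs);",
    "String operator ~ (String rhs);",
    "String opCat(String rhs);",
]


def find(pattern: str, lines: list[str]) -> list[str]:
    """Return the lines matching a regular expression (a tiny grep)."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


exact = find(re.escape("operator~"), DECLS)  # literal text: misses the spaced form
loose = find(r"\boperator\s*~", DECLS)       # optional whitespace: finds both forms
named = find(r"\bopCat\b", DECLS)            # word boundary also skips opCatAssign

print(len(exact), len(loose), len(named))    # 1 2 1
```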
December 21, 2003 Re: Unicode discussion
Posted in reply to Karl Bochert
On Sat, 20 Dec 2003 19:33:59 +0000, Karl Bochert wrote:
> D's datatypes seem to be of two different varieties; names for units of memory
> and names for abstract types. Some (ubyte) describe a fixed amount of physical
> storage, while others (ifloat?) describe an abstract datatype whose physical structure
> is hidden (or at least irrelevant)
> Which is char?
It's a fixed memory type. Look at it as an ubyte, but with some special guarantees (upheld by convention).
By your own question you have pointed out that the name "char" is not very good. But I really should stop pointing this out, or I'll be banned before I even get started with providing any actual value to the project. :-)
Regards
Elias Mårtenson
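Elias's point that char's guarantees are upheld only by convention can be made concrete: nothing in an 8-bit storage type stops a program from writing a byte that breaks UTF-8, so validity has to be checked explicitly. A minimal Python sketch, using a `bytearray` as a stand-in for a `char[]` buffer; `is_well_formed_utf8` is a hypothetical helper built on Python's strict UTF-8 decoder.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """True if data is well-formed UTF-8 according to Python's strict decoder."""
    try:
        data.decode("utf-8")  # rejects truncated, overlong and surrogate sequences
    except UnicodeDecodeError:
        return False
    return True


buf = bytearray("é".encode("utf-8"))    # b'\xc3\xa9': one character, two bytes
print(is_well_formed_utf8(bytes(buf)))  # True

buf[1] = 0x41                           # storing any byte value is always allowed...
print(is_well_formed_utf8(bytes(buf)))  # False: ...but 0xC3 0x41 is not valid UTF-8
```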
December 21, 2003 Re: Unicode discussion
Posted in reply to Rupert Millard
"Rupert Millard" <rupertamillard@hotmail.DELETE.THIS.com> wrote in message news:bs1d9b$2033$1@digitaldaemon.com...
> More importantly, what do you think of my request for more opSlice overloads?
I haven't got that far yet!
December 21, 2003 Re: Unicode discussion
Posted in reply to Karl Bochert
"Karl Bochert" <kbochert@copper.net> wrote in message news:1103_1071948839@bose...
> A char is defined as a UTF-8 character but does not have enough storage to hold one!?
Right.
> The D manual describes a char as being a UTF-8 char AND being 8 bits?
Yes.
> Can't a single UTF-8 character require multiple bytes for representation?
No.
> A datatype is some storage and a set of operations that can be done on that storage.
> In what way are char and ubyte different datatypes?
Only how they are overloaded, and how string literals are handled.
> An array of a datatype is an indexable set of elements of that type. (Isn't it?)
> Given
> char foo[4];
>
> does foo[2] not represent the third char in foo !!??
If it makes more sense, it is the third byte in foo.
> I would think that the datatype char would be a UTF-8 character, with no indication of
> the amount of storage it used. The compiler would be free to represent it internally
> however it chose. Indexing should work (perhaps inefficiently)
That would be a higher level view of it, and I suggest a wrapper class around it can provide this.
> D's datatypes seem to be of two different varieties; names for units of memory
> and names for abstract types. Some (ubyte) describe a fixed amount of physical
> storage, while others (ifloat?) describe an abstract datatype whose physical structure
> is hidden (or at least irrelevant)
> Which is char?
char is a fixed 8 bits of storage.
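The wrapper class Walter suggests can be sketched roughly as follows. This is a minimal Python illustration, assuming the text is held as UTF-8 bytes; Python's `bytes` plays the role of a D `char[]` here, since indexing it yields a single byte. The class name `Utf8View` is hypothetical and is not Rupert's class or any D library type.

```python
from __future__ import annotations


class Utf8View:
    """Hypothetical wrapper giving character (code point) indexing over UTF-8 bytes.

    Each lookup scans from the start of the buffer, so indexing is O(n):
    the "perhaps inefficiently" trade-off mentioned in the thread.
    """

    def __init__(self, data: bytes) -> None:
        data.decode("utf-8")  # raises UnicodeDecodeError unless well-formed
        self._data = bytes(data)

    def _starts(self) -> list[int]:
        # Offsets of bytes that begin a code point, i.e. that are not
        # continuation bytes of the form 10xxxxxx.
        return [i for i, b in enumerate(self._data) if (b & 0xC0) != 0x80]

    def __len__(self) -> int:
        return len(self._starts())

    def __getitem__(self, index: int) -> str:
        starts = self._starts()
        index = range(len(starts))[index]  # normalizes negatives, raises IndexError
        end = starts[index + 1] if index + 1 < len(starts) else len(self._data)
        return self._data[starts[index]:end].decode("utf-8")


raw = "naïve".encode("utf-8")  # 5 characters stored in 6 bytes
print(raw[2])                  # 195 (0xC3): the third *byte*, like foo[2] on a char[]
view = Utf8View(raw)
print(len(view), view[2])      # 5 ï
```

Caching the start offsets would make indexing O(1) after one scan, at the cost of extra memory; storing UTF-32 instead gives direct indexing outright.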
December 21, 2003 Re: Unicode discussion
Posted in reply to Walter
"Walter" <walter@digitalmars.com> wrote in message news:bs3pmm$2m0v$2@digitaldaemon.com...
>
> "Karl Bochert" <kbochert@copper.net> wrote in message news:1103_1071948839@bose...
> > A char is defined as a UTF-8 character but does not have enough storage to hold one!?
>
> Right.
>
> > The D manual describes a char as being a UTF-8 char AND being 8 bits?
>
> Yes.
>
> > Can't a single UTF-8 character require multiple bytes for representation?
>
> No.
???
A Unicode character can take up to 6 bytes when encoded in UTF-8. That is what the poster meant to ask, I think.
Roald
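Roald's figure matches the original definition of UTF-8 (RFC 2279), which allowed sequences of up to six bytes for 31-bit values; RFC 3629, published in November 2003, restricted UTF-8 to code points up to U+10FFFF, so a well-formed sequence is one to four bytes. Either way, a single character may need several 8-bit chars. A minimal Python sketch of the RFC 3629 encoding; `utf8_encode` is a hypothetical helper, cross-checked against Python's built-in codec.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 under RFC 3629 (1 to 4 bytes)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:      # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for cp in (0x41, 0xE9, 0x20AC, 0x1D11E):       # A, e-acute, euro sign, G clef
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")  # cross-check with Python's codec
    print(f"U+{cp:04X}: {len(encoded)} byte(s) {encoded.hex()}")
# U+0041: 1 byte(s) 41
# U+00E9: 2 byte(s) c3a9
# U+20AC: 3 byte(s) e282ac
# U+1D11E: 4 byte(s) f09d849e
```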
December 21, 2003 Re: Unicode discussion
Posted in reply to Walter
> > I would think that the datatype char would be a UTF-8 character, with no indication of
> > the amount of storage it used. The compiler would be free to represent it internally
> > however it chose. Indexing should work (perhaps inefficiently)
>
> That would be a higher level view of it, and I suggest a wrapper class around it can provide this.
On Friday 19th, I posted a class that provides this functionality to this thread. You can see the message here: http://www.digitalmars.com/drn-bin/wwwnews?D/20619
As for the attached file, it does not appear to be accessible to users of the web service, so I have placed it on the wiki at: http://www.wikiservice.at/wiki4d/wiki.cgi?StringClasses
Rupert
Copyright © 1999-2021 by the D Language Foundation