November 17, 2006
Marcin Kuszczak wrote:
> I think one thing that is missing from Phobos right now is a string class
> which encapsulates UTF-8/UTF-16/UTF-32 handling and the issues connected
> with UTF-8 strings, e.g.:
> 
>         char[] foo = "hög";
>         assert(foo.length == 3); // Sorry UTF-8, this is == 4
>         assert(foo[1] == 'ö');   // Not a chance!
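For reference, the quoted behaviour can be checked with a small sketch in present-day D. It assumes a current DMD/Phobos and a source file saved as UTF-8, and uses `string` (the modern immutable char array) rather than the 2006-era `char[]`:

```d
import std.utf : count, decode;

void main()
{
    string foo = "hög";          // stored as UTF-8 code units
    assert(foo.length == 4);     // 'h', 0xC3, 0xB6, 'g'
    assert(count(foo) == 3);     // but only three code points
    assert(foo[1] == 0xC3);      // indexing gives the first byte of 'ö', not 'ö'

    size_t i = 1;
    dchar c = decode(foo, i);    // decodes the whole multi-byte sequence
    assert(c == 'ö' && i == 3);  // and advances the index past it
}
```

`length` and indexing work on code units, so getting at whole characters means decoding (or converting to a `dchar` array).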

My Windows version of dmd doesn't even start parsing the source file if a non-ASCII character (a byte above 127) is present, even inside a comment!

If I create a.d (where ö is stored as the single byte 0xF6):
void main()
 {
  //hög
 }

And then run:
$ dmd a

The result is:
a.d(3): invalid UTF-8 sequence

I think it would be nice to have some way to tell the parser which encoding the source file uses. It could be a command-line parameter
-encoding:ANSI|UTF-8|etc.
or a declaration on the first line of the file, like
//!Encoding: ANSI
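The error follows from the bytes in the file: in Latin-1 (and in Windows-1252, the usual Western "ANSI" code page) ö is the single byte 0xF6, which can never stand alone in UTF-8, where ö is the two bytes 0xC3 0xB6. A minimal sketch of the conversion such an encoding switch would have to perform, assuming ISO-8859-1 input (Windows-1252 differs in the 0x80-0x9F range, so it is only an approximation there). `latin1ToUtf8` and `isValidUtf8` are hypothetical helper names; `std.utf.validate` and `std.conv.to` are real Phobos functions:

```d
import std.conv : to;
import std.utf : validate, UTFException;

/// Hypothetical helper: each ISO-8859-1 byte is the code point of the same value.
string latin1ToUtf8(const(ubyte)[] bytes)
{
    dchar[] chars;
    foreach (b; bytes)
        chars ~= cast(dchar) b;
    return chars.to!string;   // re-encode the code points as UTF-8
}

/// Hypothetical helper: true if the bytes are well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    const(ubyte)[] comment = [0x2F, 0x2F, 0x68, 0xF6, 0x67]; // "//hög" saved as Latin-1
    assert(!isValidUtf8(comment));        // the kind of input dmd rejects
    string fixed = latin1ToUtf8(comment);
    assert(fixed == "//hög");
    assert(isValidUtf8(cast(const(ubyte)[]) fixed));
}
```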

Regards,
Nahon
November 17, 2006
Aarti_pl wrote:

> Making char 4 bytes in size, and having a string class which can optimize for different encodings, would let us get rid of all the magic... :-)

I think using dchar and the proposed "dstring" struct* would work?
(not that I think it becomes less magic if you hide it behind a curtain)

We need D's "char" to remain a 1-byte type for good, or this won't work:
extern (C) int printf(char *, ...);
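The interop point can be seen in a few lines of present-day D. This is a sketch assuming current druntime, whose `core.stdc.stdio.printf` is declared with a `const char*` format rather than the plain `char*` above: `char` is one byte, so a D string can be handed to C once `toStringz` guarantees the terminating NUL.

```d
import core.stdc.stdio : printf;
import std.string : toStringz;

static assert(char.sizeof == 1);  // UTF-8 code unit, same size as C's char
static assert(wchar.sizeof == 2); // UTF-16 code unit
static assert(dchar.sizeof == 4); // UTF-32 code unit

void main()
{
    string name = "hög";
    // %s expects a NUL-terminated char*; toStringz supplies one.
    printf("%s has %d bytes\n", name.toStringz, cast(int) name.length);
}
```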

--anders

* http://www.dprogramming.com/dstring.php
November 17, 2006
Craig Black wrote:
> Very good.  I never tried it but for some reason always thought it could not be done.  Will Linux be getting this capability say, within a year or so?

Yes.

> Also, is there a utility in phobos to load a DLL at run-time?

Since D can call C functions, you can do it exactly the same way you do it in C for Windows, calling the same functions.
November 17, 2006
"Walter Bright" <newshound@digitalmars.com> wrote in message news:ejl0st$2rd9$1@digitaldaemon.com...
> Craig Black wrote:
>> Very good.  I never tried it but for some reason always thought it could not be done.  Will Linux be getting this capability say, within a year or so?
>
> Yes.

Outstanding!

>> Also, is there a utility in phobos to load a DLL at run-time?
>
> Since D can call C functions, you can do it exactly the same way you do it in C for Windows, calling the same functions.

This is acceptable.  However, it would be nice if there were a high-level cross-platform way to do it.  For example, Qt provides the QLibrary class, which offers the same interface on both Windows and Linux.

-Craig


November 17, 2006
Nahon wrote:
> Marcin Kuszczak wrote:
> 
>> I think one thing that is missing from Phobos right now is a string class
>> which encapsulates UTF-8/UTF-16/UTF-32 handling and the issues connected
>> with UTF-8 strings, e.g.:
>>
>>         char[] foo = "hög";
>>         assert(foo.length == 3); // Sorry UTF-8, this is == 4
>>         assert(foo[1] == 'ö');   // Not a chance!
> 
> 
> My Windows version of dmd doesn't even start parsing the source file if a non-ASCII character (a byte above 127) is present, even inside a comment!
> 
> If I create a.d (where ö is stored as the single byte 0xF6):
> void main()
>  {
>   //hög
>  }
> 
> And then run:
> $ dmd a
> 
> The result is:
> a.d(3): invalid UTF-8 sequence
> 
> I think it would be nice to have some way to tell the parser which encoding the source file uses. It could be a command-line parameter
> -encoding:ANSI|UTF-8|etc.
> or a declaration on the first line of the file, like
> //!Encoding: ANSI
> 
> Regards,
> Nahon


http://www.digitalmars.com/d/lex.html
Look for the section on the BOM (byte order mark).

However, I don't know how to put in a BOM.
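The lexical spec linked above has the compiler work out the source encoding from the first bytes of the file. A rough sketch of the BOM part of that check, using hypothetical names (`SourceEncoding`, `detectBom`) and ignoring the no-BOM case; note that the four-byte UTF-32LE mark has to be tested before the UTF-16LE one, since both start with FF FE:

```d
import std.stdio : writeln;

enum SourceEncoding { none, utf8, utf16be, utf16le, utf32be, utf32le }

/// Hypothetical helper: identify a byte order mark at the start of a file.
SourceEncoding detectBom(const(ubyte)[] data)
{
    bool startsWith(const(ubyte)[] prefix)
    {
        return data.length >= prefix.length && data[0 .. prefix.length] == prefix;
    }

    // Longest marks first: UTF-32LE (FF FE 00 00) begins like UTF-16LE (FF FE).
    if (startsWith([0x00, 0x00, 0xFE, 0xFF])) return SourceEncoding.utf32be;
    if (startsWith([0xFF, 0xFE, 0x00, 0x00])) return SourceEncoding.utf32le;
    if (startsWith([0xEF, 0xBB, 0xBF]))       return SourceEncoding.utf8;
    if (startsWith([0xFE, 0xFF]))             return SourceEncoding.utf16be;
    if (startsWith([0xFF, 0xFE]))             return SourceEncoding.utf16le;
    return SourceEncoding.none;
}

void main()
{
    const(ubyte)[] header = [0xEF, 0xBB, 0xBF, 0x76, 0x6F]; // UTF-8 BOM, then "vo"
    writeln(detectBom(header)); // prints "utf8"
}
```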
November 17, 2006

Craig Black wrote:
> ...
>>> Also, is there a utility in phobos to load a DLL at run-time?
>> Since D can call C functions, you can do it exactly the same way you do it in C for Windows, calling the same functions.
> 
> This is acceptable.  However, it would be nice if there were a high-level cross-platform way to do it.  For example, Qt provides the QLibrary class, which offers the same interface on both Windows and Linux.
> 
> -Craig

I believe there used to be one, but it was taken out due to licensing concerns.  Perhaps it's time to rewrite it?
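A rewrite might start as small as the sketch below, which wraps `LoadLibraryA`/`GetProcAddress`/`FreeLibrary` on Windows and `dlopen`/`dlsym`/`dlclose` on POSIX behind one interface. The `SharedLib` name and its methods are hypothetical (this is not the removed Phobos module or any Phobos API), and the demo in `main` assumes a glibc Linux system where the C math library is `libm.so.6`:

```d
import std.exception : enforce;
import std.string : toStringz;

version (Windows)
    import core.sys.windows.windows : HMODULE, LoadLibraryA, GetProcAddress, FreeLibrary;
else version (Posix)
    import core.sys.posix.dlfcn : dlopen, dlsym, dlclose, RTLD_NOW;
else
    static assert(0, "unsupported platform");

/// Hypothetical QLibrary-style wrapper; not part of Phobos.
struct SharedLib
{
    private void* handle;

    @disable this(this); // one owner per handle, so it is released exactly once

    this(string path)
    {
        version (Windows)
            handle = cast(void*) LoadLibraryA(path.toStringz);
        else
            handle = dlopen(path.toStringz, RTLD_NOW);
        enforce(handle !is null, "cannot load " ~ path);
    }

    ~this()
    {
        if (handle is null)
            return;
        version (Windows)
            FreeLibrary(cast(HMODULE) handle);
        else
            dlclose(handle);
    }

    /// Returns the address of an exported symbol, or throws if it is missing.
    void* symbol(string name)
    {
        void* p;
        version (Windows)
            p = cast(void*) GetProcAddress(cast(HMODULE) handle, name.toStringz);
        else
            p = dlsym(handle, name.toStringz);
        enforce(p !is null, "symbol not found: " ~ name);
        return p;
    }
}

void main()
{
    // Assumption: glibc Linux, where the C math library is "libm.so.6".
    auto lib = SharedLib("libm.so.6");
    alias CosFn = extern (C) double function(double);
    auto cosine = cast(CosFn) lib.symbol("cos");
    assert(cosine(0.0) == 1.0);
}
```

The library file name is still platform-specific; only the loading calls are hidden behind the struct.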

	-- Daniel

-- 
Unlike Knuth, I have neither proven nor tried the above; it may not even make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
November 17, 2006

BCS wrote:
> ...
> However, I don't know how to put in a BOM.

You can use Notepad to do it.  Yes, the crappy old Notepad that comes with Windows.  When you go File -> Save As, make sure to set the Encoding option to UTF-8 (or one of the Unicode options).

I'm still very annoyed that Notepad has better Unicode support than GVim >_<.

<2¢>

Also, I think this whole discussion is highlighting a misunderstanding of how strings work in D.  Some people seem to look at D's string support and think "Oh, it looks just like a scripting language, so <X> should work the same; what the?!  It doesn't?!  Must be broken!"  They don't seem to understand *why* we have char, wchar and dchar.  I think it's time we had an article, either in a D manual (do we even, strictly speaking, HAVE a manual for D?[1]) or somewhere on the website, so we can say:

  "No, it's not broken; it's just different.  Go here and all shall
become clear."

</2¢>

	-- Daniel

November 17, 2006
BCS wrote:
> However, I don't know how to put in a BOM.

One way is with Notepad: use "Save As" and pick UTF-8 as the encoding.
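The mark can also be added programmatically. A sketch in present-day D, assuming the file contents are already UTF-8 (prepending the mark to Latin-1 bytes would not make them valid UTF-8). `addUtf8Bom` is a hypothetical helper, while `std.file.read`, `std.file.write` and `std.algorithm.searching.startsWith` are real Phobos functions:

```d
import std.algorithm.searching : startsWith;
import std.file : read, write;

immutable ubyte[3] utf8Bom = [0xEF, 0xBB, 0xBF];

/// Hypothetical helper: prepend the UTF-8 BOM unless it is already present.
const(ubyte)[] addUtf8Bom(const(ubyte)[] data)
{
    if (data.startsWith(utf8Bom[]))
        return data;
    return utf8Bom[] ~ data;
}

void main(string[] args)
{
    // Usage: pass one or more file names; each file is rewritten in place.
    foreach (path; args[1 .. $])
        write(path, addUtf8Bom(cast(const(ubyte)[]) read(path)));
}
```

Checking for an existing mark first makes the helper safe to run twice on the same file.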
November 18, 2006
Daniel Keep wrote:
> Also, I think this whole discussion is highlighting a misunderstanding
> of how strings work in D.  Some people seem to be looking at D's string
> support and thinking "Oh, it looks just like a scripting language, so
> <X> should work the same; what the?!  It doesn't?!  Must be broken!"
> They don't seem to understand *why* we have char, wchar and dchar.  I
> think it's time we had an article either in a D manual (do we even,
> strictly speaking, HAVE a manual for D?[1]) or somewhere on the website
> so we can say:
> 
>   "No, it's not broken; it's just different.  Go here and all shall
> become clear."

Yes, I think you're right. Once one has a good handle on what UTF-8 is (and UTF-16 and UCS-4), it all makes sense. D provides several different ways of looking at characters (and strings) and none of them are quite like C++ (which essentially has no support for international characters) or like scripting languages (which hide all the details, making them inefficient).
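To make the "several different ways of looking at characters" concrete, here is a small sketch in present-day D showing the same text as UTF-8, UTF-16 and UTF-32 code units. U+1D11E (musical G clef) is chosen only because it lies outside the Basic Multilingual Plane, so it needs a surrogate pair in UTF-16. The last part shows `foreach` with a `dchar` loop variable decoding a UTF-8 string into code points:

```d
import std.conv : to;

void main()
{
    string  s8  = "h\U0001D11E";   // char[]:  UTF-8 code units
    wstring s16 = s8.to!wstring;   // wchar[]: UTF-16 code units
    dstring s32 = s8.to!dstring;   // dchar[]: one unit per code point

    assert(s8.length  == 5);  // 1 + 4 bytes
    assert(s16.length == 3);  // 1 + a surrogate pair
    assert(s32.length == 2);  // 2 code points

    // A dchar loop variable makes foreach decode UTF-8 on the fly.
    size_t codePoints;
    foreach (dchar c; s8)
        ++codePoints;
    assert(codePoints == 2);
}
```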

I've thought more than once about writing an article about it, but got distracted by other things.
November 18, 2006
Walter Bright wrote:
> Daniel Keep wrote:
>> Also, I think this whole discussion is highlighting a misunderstanding
>> of how strings work in D.  Some people seem to be looking at D's string
>> support and thinking "Oh, it looks just like a scripting language, so
>> <X> should work the same; what the?!  It doesn't?!  Must be broken!"
>> They don't seem to understand *why* we have char, wchar and dchar.  I
>> think it's time we had an article either in a D manual (do we even,
>> strictly speaking, HAVE a manual for D?[1]) or somewhere on the website
>> so we can say:
>>
>>   "No, it's not broken; it's just different.  Go here and all shall
>> become clear."
> 
> Yes, I think you're right. Once one has a good handle on what UTF-8 is (and UTF-16 and UCS-4), it all makes sense. D provides several different ways of looking at characters (and strings) and none of them are quite like C++ (which essentially has no support for international characters) or like scripting languages (which hide all the details, making them inefficient).

> I've thought more than once about writing an article about it, but got distracted by other things.

I would like to try using dchar[] as my standard string type; however, it doesn't seem to be supported as well by the compiler and library as char[] is.  For instance, std.string has basically nothing for dchar[]s.

And there doesn't seem to be a dchar string literal syntax, at least not one I could find.  The StringLiterals section linked from the expressions page doesn't exist.
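For comparison, in present-day D a string literal takes a `c`, `w` or `d` postfix to select `string`, `wstring` or `dstring`, and functions such as `std.string.indexOf` and `std.uni.toUpper` are templated over all three. A brief sketch, assuming a current compiler and Phobos rather than the 2006 ones:

```d
import std.string : indexOf;
import std.uni : toUpper;

void main()
{
    dstring s = "hög"d;              // 'd' postfix: UTF-32 literal
    assert(s.length == 3);           // one element per code point
    assert(s[1] == 'ö');             // indexing yields whole characters
    assert(s.indexOf('g') == 2);
    assert(s.toUpper() == "HÖG"d);
}
```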

--bb