June 04, 2005
Re: char, wchar and dchar should be supported equally
Derek Parnell wrote:

> The current Phobos routines are heavily biased to char[]. Also, the use of
> templates is not always the best solution because there are some
> optimizations available, depending on the UTF encoding format used.

Not that anyone cares, but templates also have severe problems
on other D platforms such as with the GDC compiler on Mac OS X...

It's getting better, but it's like "the early days of C++" or so.

--anders
June 04, 2005
Re: char, wchar and dchar should be supported equally
Hasan Aljudy wrote:

> I think that toString or any std function that takes a string and 
> processes it, should always take dchar and return dchar.

That's like saying that booleans should always be represented
with "int", and I'm afraid it won't fly around here since we're
obsessed with the size of variables more than processing time :-)

Conversion is a real problem, but at least you can do:
   char[] str; foreach(dchar c; str) { ... }
Plus some ASCII shortcuts, when the high bit isn't set.
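
For readers who want to try it, here is a minimal sketch of both ideas, written for a current D compiler rather than the 2005 one. The helper names isAsciiUnit and countCodePoints are made up for illustration and are not Phobos functions; the only language feature relied on is that foreach with a dchar loop variable decodes a char[] one code point at a time.

```d
/// Hypothetical helper (not in Phobos): true when the UTF-8 code unit
/// has its high bit clear, i.e. it is a complete ASCII character.
bool isAsciiUnit(char c)
{
    return (c & 0x80) == 0;
}

/// Hypothetical helper: counts code points by letting foreach decode
/// the UTF-8 data into one dchar per iteration.
size_t countCodePoints(const(char)[] str)
{
    size_t n = 0;
    foreach (dchar c; str)
        ++n;
    return n;
}

void main()
{
    string name = "Björklund";
    assert(name.length == 10);          // 10 UTF-8 code units ('ö' takes two)
    assert(countCodePoints(name) == 9); // 9 code points
    assert(isAsciiUnit(name[0]));       // 'B'
    assert(!isAsciiUnit(name[2]));      // first code unit of 'ö'
}
```

The high-bit test only says that a single code unit stands on its own; anything else still has to go through decoding, which is what the foreach form does for you.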


Much more on http://prowiki.org/wiki4d/wiki.cgi?CharsAndStrs
(and several other pages on the Wiki4D, like Derek's RFE:
 "FeatureRequestList/ImplicitConversionBetweenUTF")

--anders

PS. You probably meant to say "dchar[]", and not dchar ?
June 04, 2005
Re: char, wchar and dchar should be supported equally
Anders F Björklund wrote:
> Hasan Aljudy wrote:
> 
>> I think that toString or any std function that takes a string and 
>> processes it, should always take dchar and return dchar.
> 
> 
> That's like saying that booleans should always be represented
> with "int", and I'm afraid it won't fly around here since we're
> obsessed with the size of variables more than processing time :-)
> 

No, it's not like representing booleans with ints .. it's actually like 
saying ints should always be represented by doubles.

booleans are not numbers, there is no reason to represent them as 
numbers, and no one should ever store numbers in booleans.

But char, wchar, and dchar are all characters, just with different 
storage space.

I don't really think anybody cares about size, most people who care 
would care most about performance (processing time).

imagine if all std functions used short instead of int ;) that could be 
a serious problem.

> Conversion is a real problem, but at least you can do:
>    char[] str; foreach(dchar c; str) { ... }
> Plus some ASCII shortcuts, when the high bit isn't set.
> 

I don't like having to read the unicode specs to be able to deal with 
simple things like char. Your "ASCII shortcuts" would be low-level stuff 
dealing with how char and dchar are represented in memory.

C'mon people, D is a high level language.
June 04, 2005
Re: char, wchar and dchar should be supported equally
It would be great to resolve this ongoing concern. However, you might
consider trying the ICU project for all your unicode needs ~ it's what Java
uses under the covers:
http://www-306.ibm.com/software/globalization/icu/index.jsp

There's a D interface available over here, along with a well-rounded String
class: http://dsource.org/forums/viewtopic.php?t=148

- Kris

"Hasan Aljudy" <hasan.aljudy@gmail.com> wrote in message
news:d7t8tc$b40$1@digitaldaemon.com...
> Anders F Björklund wrote:
> > Hasan Aljudy wrote:
> >
> >> I think that toString or any std function that takes a string and
> >> processes it, should always take dchar and return dchar.
> >
> >
> > That's like saying that booleans should always be represented
> > with "int", and I'm afraid it won't fly around here since we're
> > obsessed with the size of variables more than processing time :-)
> >
>
> No, it's not like representing booleans with ints .. it's actually like
> saying ints should always be represented by doubles.
>
> booleans are not numbers, there is no reason to represent them as
> numbers, and no one should ever store numbers in booleans.
>
> But char, wchar, and dchar are all characters, just with different
> storage space.
>
> I don't really think anybody cares about size, most people who care
> would care most about performance (processing time).
>
> imagine if all std functions used short instead of int ;) that could be
> a serious problem.
>
> > Conversion is a real problem, but at least you can do:
> >    char[] str; foreach(dchar c; str) { ... }
> > Plus some ASCII shortcuts, when the high bit isn't set.
> >
>
> I don't like having to read the unicode specs to be able to deal with
> simple things like char. Your "ASCII shortcuts" would be low-level stuff
> dealing with how char and dchar are represented in memory.
>
> C'mon people, D is a high level language.
June 04, 2005
Re: char, wchar and dchar should be supported equally
> I don't like having to read the unicode specs to be able to deal with  
> simple things like char. Your "ASCII shortcuts" would be low-level stuff  
> dealing with how char and dchar are represented in memory.
>
> C'mon people, D is a high level language.

Maybe there should be isascii(char) somewhere :)
Would be inlined and self documenting.
June 05, 2005
Re: char, wchar and dchar should be supported equally
Vathix wrote:

>> I don't like having to read the unicode specs to be able to deal with  
>> simple things like char. Your "ASCII shortcuts" would be low-level 
>> stuff  dealing with how char and dchar are represented in memory.
>>
>> C'mon people, D is a high level language.
> 
> Maybe there should be isascii(char) somewhere :)
> Would be inlined and self documenting.

I suggested that enhancement last year, but it wasn't popular...

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/2154

Or maybe it just got lost in this crippled "bug reporting system" ?

--anders
June 05, 2005
Re: char, wchar and dchar should be supported equally
On Sun, 05 Jun 2005 09:25:09 +0200, Anders F Björklund wrote:

> Vathix wrote:
> 
>>> I don't like having to read the unicode specs to be able to deal with  
>>> simple things like char. Your "ASCII shortcuts" would be low-level 
>>> stuff  dealing with how char and dchar are represented in memory.
>>>
>>> C'mon people, D is a high level language.
>> 
>> Maybe there should be isascii(char) somewhere :)
>> Would be inlined and self documenting.
> 
> I suggested that enhancement last year, but it wasn't popular...
> 
> http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/2154
> 
> Or maybe it just got lost in this crippled "bug reporting system" ?

You mean like this ...
//---------------------------
//  --- isASCII --
// Returns true if the supplied argument is an ASCII character.
//
// Parameters:
//      (1)   -- char -- The character to test.
//   (return) -- bool -- 'true' if the character is ASCII otherwise false.
//---------------------------
bool isASCII(char c)
out(result)
{
   assert(result == (UTF8stride[c] == 1));
}
body{
   return (cast(uint)c <= 127U ? true : false);
}
unittest
{
  assert(isASCII('a') == true);
  assert(isASCII('~') == true);
  assert(isASCII('\xFF') == false);
  assert(isASCII('\x80') == false);
  assert(isASCII('\x00') == true);
  assert(isASCII(cast(char) -1) == false);
}
//---------------------------



-- 
Derek Parnell
Melbourne, Australia
5/06/2005 7:13:16 PM
June 05, 2005
Re: char, wchar and dchar should be supported equally
Derek Parnell wrote:

> You mean like this ...
> //---------------------------
> //  --- isASCII --
> // Returns true if the supplied argument is an ASCII character.
> //
> // Parameters:
> //      (1)   -- char -- The character to test.
> //   (return) -- bool -- 'true' if the character is ASCII otherwise false.
> //---------------------------

Is that the "Natural Docs" format ?

I think I prefer Doxygen, myself:
/// Is the supplied code unit an ASCII character ?
/// @param c    The UTF-8 code unit to test.
/// @return     'true' if the character is ASCII

> bool isASCII(char c)
> out(result)
> {
>     assert(result == (UTF8stride[c] == 1));
> }
> body{
>     return (cast(uint)c <= 127U ? true : false);
> }

But surely this workaround shouldn't be needed ?

If a "bool" function can't return a comparison,
then there's something severely broken somewhere...

--anders
June 05, 2005
Re: char, wchar and dchar should be supported equally
On Sun, 05 Jun 2005 12:09:47 +0200, Anders F Björklund wrote:

> Derek Parnell wrote:
> 
>> You mean like this ...
>> //---------------------------
>> //  --- isASCII --
>> // Returns true if the supplied argument is an ASCII character.
>> //
>> // Parameters:
>> //      (1)   -- char -- The character to test.
>> //   (return) -- bool -- 'true' if the character is ASCII otherwise false.
>> //---------------------------
> 
> Is that the "Natural Docs" format ?
>
Dunno. What's that ? I just made this up on the spot.

> I think I prefer Doxygen, myself:
> /// Is the supplied code unit an ASCII character ?
> /// @param c    The UTF-8 code unit to test.
> /// @return     'true' if the character is ASCII

Good on ya.

>> bool isASCII(char c)
>> out(result)
>> {
>>     assert(result == (UTF8stride[c] == 1));
>> }
>> body{
>>     return (cast(uint)c <= 127U ? true : false);
>> }
> 
> But surely this workaround shouldn't be needed ?
> 
> If a "bool" function can't return a comparison,
> then there's something severely broken somewhere...

I make a distinction between the machine code that is generated by a
compiler and the source code that is read by a human.

Yes, the compiler is able to work out that a bool is returned from a
comparison, but by writing it out explicitly, we also get a clear and
unambiguous statement of intent by the coder. We get the same machine code
generated and now it's human readable too.

In other words, it is self-documenting and does not rely on the
sophistication of the compiler. 

-- 
Derek Parnell
Melbourne, Australia
5/06/2005 8:39:19 PM
June 05, 2005
Re: char, wchar and dchar should be supported equally
Derek Parnell wrote:

>>Is that the "Natural Docs" format ?
> 
> Dunno. What's that ? I just made this up on the spot.

http://www.naturaldocs.org/

Whatever style is used, it should be parsable ?

> Yes, the compiler is able to work out that a bool is returned from a
> comparison, but by writing it out explicitly, we also get a clear and
> unambiguous statement of intent by the coder. We get the same machine code
> generated and now it's human readable too.

Ah, OK, then it wasn't a compiler bug <phew>.
Just a matter of opinion on readability... :-)

Like: "a < b" versus "(a < b) ? true : false"
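
A small sketch of the two spellings side by side, assuming a current D compiler. The function names lessPlain and lessExplicit are hypothetical and exist only for this comparison; the check confirms that both return the same value, and says nothing about which one reads better.

```d
/// Hypothetical function: returns the comparison directly.
bool lessPlain(int a, int b)
{
    return a < b; // the comparison already has type bool
}

/// Hypothetical function: spells out the result with a ternary.
bool lessExplicit(int a, int b)
{
    return (a < b) ? true : false;
}

void main()
{
    // Compare both forms over a small range of inputs.
    foreach (a; -2 .. 3)
        foreach (b; -2 .. 3)
            assert(lessPlain(a, b) == lessExplicit(a, b));
}
```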

--anders