Unicode character module (unichar) - D Programming Language Discussion Forum

June 04, 2004

Unicode character module (unichar)

Posted by Hauke Duden

Attachments:

unichar.zip

As promised in another thread, here's the unichar module that I've written. It provides basic Unicode character property functions (like charIsDigit, charToLower, etc.).

It is documented in doxygen style and the compiled docs are included in the zip file.

Let me know what you think!


Hauke

June 04, 2004

Re: Unicode character module (unichar)

Posted by Walter
in reply to Hauke Duden

"Hauke Duden" <H.NS.Duden@gmx.net> wrote in message news:c9qlqe$1me1$1@digitaldaemon.com...
> As promised in another thread, here's the unichar module that I've written. It provides basic Unicode character property functions (like charIsDigit, charToLower, etc).
>
> It is documented in doxygen style and the compiled docs are included in the zip file.
>
> Let me know what you think!

Great! Some quick comments:

o Can the enum be changed to enum CHARCATEGORY, and then replace CHARCATEGORY_LETTER, etc., with CHARCATEGORY.LETTER?

o change inout in the foreach in charToTitle to nothing.

o "Descimal" should be "Decimal"

o no need for 'char' prefix on functions, the module name should suffice.
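A minimal sketch of the enum change suggested in the first point, written in present-day D syntax. The member list is a hypothetical subset; the actual categories defined by the unichar module are not shown in this thread.

```d
// Hypothetical subset of Unicode general categories, named in the
// style suggested above: the enum name itself acts as the prefix.
enum CHARCATEGORY
{
    LETTER,
    DIGIT,
    SEPARATOR,
}

// Members are reached through the enum name (CHARCATEGORY.LETTER)
// instead of a decorated global constant (CHARCATEGORY_LETTER).
bool isLetterCategory(CHARCATEGORY cat)
{
    return cat == CHARCATEGORY.LETTER;
}
```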

The 2 MB of RAM at runtime is a little costly, so I think it should remain a separate package from std.ctype.

June 04, 2004

Re: Unicode character module (unichar)

Posted by Hauke Duden
in reply to Walter

Walter wrote:
>>Let me know what you think!
> 
> 
> Great! Some quick comments:
> 
> o Can the enum be changed to enum CHARCATEGORY, and then replace
> CHARCATEGORY_LETTER, etc., to CHARCATEGORY.LETTER?

Yes. I guess that's just a C++-ism I got used to.

> o change inout in the foreach in charToTitle to nothing.

Hmmm. I don't know that much about the inner workings of foreach but won't that create a copy of the referenced element?

> o "Descimal" should be "Decimal"

Whoops ;).

> o no need for 'char' prefix on functions, the module name should suffice.

As I said in another post, I'm reluctant to change this. Mostly because I want the functions to look different from the ctype ones but also because of D's overloading issue.

> The 2Mb ram at runtime is a little costly, so I think it should remain a
> separate package from std.ctype.

I agree.

Hauke

June 04, 2004

Re: Unicode character module (unichar)

Posted by Walter
in reply to Hauke Duden

"Hauke Duden" <H.NS.Duden@gmx.net> wrote in message news:c9r0ri$26gi$1@digitaldaemon.com...
> > o change inout in the foreach in charToTitle to nothing.
> Hmmm. I don't know that much about the inner workings of foreach but won't that create a copy of the referenced element?

Yes.
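To make this exchange concrete: a by-value foreach variable is a copy, which is harmless when the loop only reads the elements (presumably the case in charToTitle, given the suggestion), and a reference is only needed to write back. In present-day D the `inout` foreach qualifier discussed here is spelled `ref`. The helper names below are hypothetical.

```d
// By value: x is a fresh copy of each element, so writing to it
// leaves the array untouched. Fine for read-only loops.
void doubleEachCopy(int[] xs)
{
    foreach (x; xs)
        x *= 2;          // modifies the copy only
}

// By reference (inout in 2004-era D, ref today): x aliases the
// element, so the write is visible in the array.
void doubleEachRef(int[] xs)
{
    foreach (ref x; xs)
        x *= 2;
}
```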

> > o no need for 'char' prefix on functions, the module name should suffice.
>
> As I said in another post, I'm reluctant to change this. Mostly because I want the functions to look different from the ctype ones but also because of D's overloading issue.

I just don't understand what the D overloading issue is. D has much tighter control over overloading than C++ has, overloads from one module aren't going to be mistaken for another one if both are imported. One reason for the package/module system in D is to pitch the C-ism of decorating names with a pseudo-package name into the ash heap of history <g>.

June 05, 2004

Re: Unicode character module (unichar)

Posted by Hauke Duden
in reply to Walter

Walter wrote:
>>>o no need for 'char' prefix on functions, the module name should suffice.
> 
>>As I said in another post, I'm reluctant to change this. Mostly because
>>I want the functions to look different from the ctype ones but also
>>because of D's overloading issue.
> 
> 
> I just don't understand what the D overloading issue is. D has much tighter
> control over overloading than C++ has, overloads from one module aren't
> going to be mistaken for another one if both are imported. One reason for
> the package/module system in D is to pitch the C-ism of decorating names
> with a pseudo-package name into the ash heap of history <g>.

Here's an example of what I mean:

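// unichar.d
module unichar;
bool isSeparator(dchar chr);

// funkyMenu.d
module funkyMenu;
class MenuItem {}  // stand-in; MenuItem's real definition is not shown
bool isSeparator(MenuItem item);

// myApp.d
module myApp;
import unichar;
import funkyMenu;

void foo(MenuItem item)
{
	if(isSeparator(item))
	{
		// ...
	}
}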

This will cause a compiler error because D stops looking for more overloads as soon as it finds unichar.isSeparator and never finds funkyMenu.isSeparator. And to make matters worse, the error message will not even tell you that there is some kind of conflict. No, the compiler will tell you that there is no isSeparator(MenuItem) even though there most certainly is.

In C++ there'd be no such problem because the call is not actually ambiguous! It is perfectly clear that the MenuItem version is the one that should be called.

That's what I mean and that's the reason why I don't want to define any global functions with names that may also occur in other contexts. Otherwise there may be weird effects for the library's user like working code failing to compile once another import is added - even though there is no ambiguity.

Hauke

June 05, 2004

Re: Unicode character module (unichar)

Posted by Walter
in reply to Hauke Duden

"Hauke Duden" <H.NS.Duden@gmx.net> wrote in message news:c9t8un$2jfc$1@digitaldaemon.com...
> > I just don't understand what the D overloading issue is. D has much tighter
> > control over overloading than C++ has, overloads from one module aren't going to be mistaken for another one if both are imported. One reason for
> > the package/module system in D is to pitch the C-ism of decorating names with a pseudo-package name into the ash heap of history <g>.
>
> Here's an example of what I mean:
>
> module unichar:
> bool isSeparator(dchar chr);
>
> module funkyMenu:
> bool isSeparator(MenuItem item);
>
>
> module myApp:
>
> import unichar;
> import funkyMenu;
>
>
> void foo(MenuItem item)
> {
> if(isSeparator(item))
> ....
> }
>
>
> This will cause a compiler error because D stops looking for more overloads as soon as it finds unichar.isSeparator and never finds funkyMenu.isSeparator.

No, that isn't what happens. What happens is that isSeparator appears in multiple modules, and the compiler doesn't know which one to use, so issues an error. You'll find the same error if you use a dchar argument for isSeparator.

Next, overloading does NOT happen across modules. Overloading happens AFTER the symbol lookup. Only functions in the same scope are overloadable.
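A small sketch of the rule stated above: when both overloads are declared in one module, ordinary overload resolution picks between them by argument type. MenuItem and the simplified separator test are hypothetical stand-ins, not the unichar implementation.

```d
// Hypothetical stand-in for the MenuItem type from the example.
class MenuItem
{
    bool separator;
    this(bool separator) { this.separator = separator; }
}

// Both functions live in the same module scope, so they form one
// overload set and each call is resolved by the argument's type.
bool isSeparator(dchar chr)
{
    // Simplified: only SPACE, LINE SEPARATOR and PARAGRAPH SEPARATOR.
    return chr == ' ' || chr == '\u2028' || chr == '\u2029';
}

bool isSeparator(MenuItem item)
{
    return item.separator;
}
```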

> And to make matters worse, the error message will
> not even tell you that there is some kind of conflict. No, the compiler
> will tell you that there is no isSeparator(MenuItem) even though there
> most certainly is.

The error message I get is:
    unichar.d(2): function isSeparator conflicts with funkyMenu.isSeparator at funkyMenu.d(3)

> In C++ there'd be no such problem because the call is not actually ambiguous! It is perfectly clear that the MenuItem version is the one that should be called.
>
> That's what I mean and that's the reason why I don't want to define any global functions with names that may also occur in other contexts. Otherwise there may be weird effects for the library's user like working code failing to compile once another import is added - even though there is no ambiguity.

There is an ambiguity, and the compiler issues an error for it. The reason it behaves this way is to avoid the C++ global namespace pollution problem, where two completely unrelated functions in two unrelated source files happen to have the same name and inadvertently overload against each other, causing some very strange errors. This doesn't happen in D: if you want two names in different modules to overload against each other, a specific action is required to make it happen (an alias declaration). It will NOT happen by default; you'll get the "conflicts" error above.

Next, instead of the C++ 'fix' for this problem by adding a pseudo-package
name to each global symbol, in D you can just use the module name for it,
i.e.:
    unichar.isSeparator()
    funkyMenu.isSeparator()

which is better than the C++ unichar_isSeparator(), isn't it?
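The same idea with two real modules from today's Phobos, std.ascii and std.uni, which both define toLower. With both imported, an unqualified toLower call may well be rejected as ambiguous, while writing the module name at the call site selects one explicitly, with no prefix baked into the function name. The wrapper names are hypothetical.

```d
import std.ascii;
import std.uni;

// Both modules export a function named toLower. Qualifying the call
// with the module name makes the choice explicit at the call site.
dchar lowerAsciiOnly(dchar c)
{
    return std.ascii.toLower(c);   // changes only 'A'..'Z'
}

dchar lowerUnicode(dchar c)
{
    return std.uni.toLower(c);     // uses the Unicode case mappings
}
```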

June 05, 2004

Re: Unicode character module (unichar)

Posted by Sean Kelly
in reply to Hauke Duden

Hauke Duden wrote:
>
> In C++ there'd be no such problem because the call is not actually ambiguous! It is perfectly clear that the MenuItem version is the one that should be called.
> 
> That's what I mean and that's the reason why I don't want to define any global functions with names that may also occur in other contexts. Otherwise there may be weird effects for the library's user like working code failing to compile once another import is added - even though there is no ambiguity.

I prefer to think of modules as C++ namespaces.  And in C++ I rarely import symbols with a "using" declaration, but rather fully qualify them: std::cout, etc.  So why not the same thing here?  unichar.toLower, etc.  Or come up with a shorter module name if that one is too long.

One thing I haven't tried... is it possible to import a package and still be required to provide module names when referring to symbols stored within each module in that package?  That would be ideal.
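Present-day D addresses this per module rather than per package: a static import makes a module's symbols reachable only through the fully qualified name, and a renamed import gives the module a short local name that must prefix every use. These import forms may postdate the compiler discussed in this thread. The wrapper names are hypothetical.

```d
// static import: std.uni's symbols are NOT injected into this scope;
// they must be written as std.uni.name.
static import std.uni;

// Renamed import: std.ascii is reachable through the short name.
import ascii = std.ascii;

// Unicode-aware: true for code points in the number categories
// (Nd, Nl, No).
bool isNumberUnicode(dchar c)
{
    return std.uni.isNumber(c);
}

// ASCII-only: true just for '0'..'9'.
bool isDigitAscii(dchar c)
{
    return ascii.isDigit(c);
}
```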

Sean

June 05, 2004

Re: Unicode character module (unichar)

Posted by Hauke Duden
in reply to Walter

I have updated the unichar module incorporating (most of) Walter's suggestions and also written a utype module as a drop-in replacement for ctype.

Available here:

http://www.hazardarea.com/unichar.zip

Hauke

June 05, 2004

Re: Unicode character module (unichar)

Posted by Hauke Duden
in reply to Walter

Walter wrote:
> Next, overloading does NOT happen across modules. Overloading happens AFTER
> the symbol lookup. Only functions in the same scope are overloadable.
> 
> 
>>And to make matters worse, the error message will
>>not even tell you that there is some kind of conflict. No, the compiler
>>will tell you that there is no isSeparator(MenuItem) even though there
>>most certainly is.
> 
> 
> The error message I get is:
> unichar.d(2): function isSeparator conflicts with funkyMenu.isSeparator at
> funkyMenu.d(3)

My apologies. I now get the same error.

I distinctly remember getting a much more misleading error when I experimented with overloads some time ago, though. Was there a related compiler error in earlier DMD versions?

> There is an ambiguity, and the compiler issues an error for it. The reason
> it behaves this way is to avoid the C++ global namespace pollution problem,
> where two completely unrelated functions in two unrelated source files
> happen to have the same name, and inadvertently overload against each other
> causing some very strange errors.

What kind of strange errors are these? It seems to me that overloads with different argument types are unproblematic. You can sometimes have ambiguous calls, for example if one function's parameter type is a base class of the other function's parameter type, but that would simply cause a compiler error. Nothing that I would call "strange".

Hauke

June 05, 2004

Re: Unicode character module (unichar)

Posted by Walter
in reply to Hauke Duden

"Hauke Duden" <H.NS.Duden@gmx.net> wrote in message news:c9thhq$2urf$1@digitaldaemon.com...
> I distinctly remember getting a much more misleading error when I experimented with overloads some time ago, though. Was there a related compiler error in earlier DMD versions?

That's possible. I don't remember.

> > There is an ambiguity, and the compiler issues an error for it. The reason
> > it behaves this way is to avoid the C++ global namespace pollution problem,
> > where two completely unrelated functions in two unrelated source files happen to have the same name, and inadvertently overload against each other
> > causing some very strange errors.
> What kind of strange errors are these? It seems to me that overloads with different argument types are unproblematic. You can sometimes have ambiguous calls, for example if one function's parameter type is a base class of the other function's parameter type, but that would simply cause a compiler error. Nothing that I would call "strange".

Suppose, in file 'a.h', you have:
    void output(int);
    void output(long);
which sends its argument to stdout. You download 'b.h' off the net, which
has:
    void output(char);
buried in it somewhere which writes its argument out to the serial port.
Now,
    #include "a.h"
    output('c');
and all is fine. Now,
    #include "a.h"
    #include "b.h"
    output('c');
and your program breaks at runtime, possibly in invisible ways.

In D, this would break in an obvious manner at compile time. Much more reliable.
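Within a single scope D still resolves overloads by best match, so the retargeting in the example above can be reproduced deliberately; what D rules out is a function from an unrelated module silently joining the set. In this sketch two hypothetical structs stand in for "a.h alone" and "a.h plus b.h", strings replace the actual I/O, and output(long) is left out for brevity.

```d
// Hypothetical stand-ins for the two header configurations.
struct OnlyA              // plays: #include "a.h"
{
    static string output(int x) { return "stdout"; }
}

struct AWithB             // plays: #include "a.h" + #include "b.h"
{
    static string output(int x)  { return "stdout"; }
    static string output(char c) { return "serial port"; }
}
```

The call text output('c') is identical in both cases, but once the exact-match char overload is in the same set, it wins over the int conversion.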

Copyright © 1999-2021 by the D Language Foundation