March 30, 2003
Mark Evans <Mark_member@pathlink.com> writes:
> Unicode with its variable-byte-length encodings makes disentangling strings from arrays more urgent still.  D should promote strings to primitive type status with dedicated constructs.  Icon had it right a long time ago.  D is struggling with Unicode because the confused C model is not amenable to Unicode.  As a primitive type, Unicode string complexities would vanish under the hood.  I'm not holding my breath, but if you ask me, that is how to do strings right.  (Those in love with C strings could still declare arrays of char.)

One thing D needs for working Unicode strings is a decent foreach (or iterator) construct, which I suppose is "coming soon". Doing C-like

for (int i = 0; i < string.length; ++i)
  process(string[i]);

is out of the question if the internal representation is, for example, UTF-8.
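
To make the indexing problem concrete, here is a minimal sketch. It assumes a present-day D compiler (not the 2003 compiler discussed in this thread), where foreach with a dchar loop variable decodes UTF-8 one code point at a time. The helper name countCodePoints is made up for this illustration.

```d
import std.stdio;

// Hypothetical helper: counts code points by letting foreach decode UTF-8.
size_t countCodePoints(const(char)[] s)
{
    size_t n = 0;
    foreach (dchar c; s)   // one iteration per decoded code point
        ++n;
    return n;
}

void main()
{
    string s = "na\u00EFve";      // U+00EF occupies two UTF-8 code units
    writeln(s.length);            // counts UTF-8 code units (bytes)
    writeln(countCodePoints(s));  // counts code points
}
```

Indexing with s[i] walks the code units, so the two counts differ as soon as the text leaves ASCII.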

Implementing String as a class would not be impossible either, at least if the assignment operator could be overloaded.

Of course, the best option would probably be to have a string concept built into the language.  As of now, the array type seems to have gathered a lot of the functionality that would normally be part of a string class (how often do you concatenate arrays other than strings, for example?).

> http://www.toolsofcomputing.com/IconHandbook/ http://unicon.sourceforge.net/index.html

While Icon is said to be adept at string processing, it's unfortunate that it doesn't support Unicode either:

---
B3. Is there a Unicode version of Icon?

No. Icon is defined in terms of 8-bit characters, and changing this presents several design challenges that would likely break existing programs.
---

-Antti
March 30, 2003
>While Icon is said to be adept at string processing, it's unfortunate
>that it doesn't support Unicode either:
>-Antti

Icon is recognized worldwide as the king of string processing languages.  Its development ceased before Unicode came into favor.  Incidentally, if you know of any language that supports native Unicode strings I am all ears.

One of the Unicon testimonials has it right: "Other languages have minimal data structures. Most of our programming is in C. Quite often, I need a list of objects. In C, it is (as you know) a royal pain to declare a structure with a pointer to itself, and malloc them, free them, and walk the chain. Why can't a language just have a 'list' datatype, and be done with it? Why can't a language provide the constructs we all need, instead of providing nearly-assembly-language constructs and letting us develop the rest ourselves?"

Mark


March 30, 2003
Depends what you mean by Unicode?

Java and C# (and Verifiable Balderdash) all use 16-bit Unicode (UCS-2/UTF-16) as their native type.

I'm not aware of any programming language - XML is not a programming language, all you soap suds! - that works with UTF-8 (or 7). Maybe that's what you meant?

"Mark Evans" <Mark_member@pathlink.com> wrote in message news:b65see$qsr$1@digitaldaemon.com...
>
> >While Icon is said to be adept at string processing, it's unfortunate
> >that it doesn't support Unicode either:
> >-Antti
>
> Icon is recognized worldwide as the king of string processing languages.  Its development ceased before Unicode came into favor.  Incidentally, if you know of any language that supports native Unicode strings I am all ears.
>
> One of the Unicon testimonials has it right: "Other languages have minimal data structures. Most of our programming is in C. Quite often, I need a list of objects. In C, it is (as you know) a royal pain to declare a structure with a pointer to itself, and malloc them, free them, and walk the chain. Why can't a language just have a 'list' datatype, and be done with it? Why can't a language provide the constructs we all need, instead of providing nearly-assembly-language constructs and letting us develop the rest ourselves?"
>
> Mark


March 30, 2003
Matthew Wilson says...
>Java and C# ... all use Unicode ... as their native type.

And some languages have seen Unicode retrofits. http://www.reportlab.com/i18n/python_unicode_tutorial.html http://rf.net/~james/perli18n.html#Q4

To clarify the remark, I was considering languages that offer Unicode strings as primitives (not merely characters) and are fast string processors (in the C speed range).  Maybe C# fits the bill.  Python and Java are not 'fast' and Java's String is not a primitive anyway.

Other languages aside, the point is that D needs a Unicode string primitive.

Mark

March 30, 2003
Mark Evans <Mark_member@pathlink.com> wrote in news:b60d0o$2sf7$1@digitaldaemon.com:

> Farmer says...
>>In D, there are many ways to express concepts, e.g. functions, classes, D-structs, templates. I believe that non-zero based arrays are not really required to express concepts in a way that is suitable for the problems programmers have to solve.
> 
> Functional languages owe much of their fabulous productivity to array (list) handling capabilities.  An array can hold virtually anything, not just numbers. It can have more than one dimension to associate objects on different axes en masse.  C++ folks unfamiliar with such paradigms know little of what they're missing, so I understand these counteroffers, but there is no substitute.  The ability to pick apart, rearrange, index, map across, thread, and otherwise sling arrays around - and morph them into new forms - is a truly expressive and compact way to write tons of code with performance results comparable to C and even better, depending on your C programmer and his available time for optimizing nested inner loops and chasing down off-by-one errors.
> 
> Mark
> 

You are right. I don't know about functional programming, but I know that
doing complex work with arrays is a pain in C++, C#, Java or D (not to
mention C or Pascal).
D arrays really shine when used for system level programming tasks, like
getting memory from the GC, copying a memory block or working with a rather
fixed set of objects/values.
Non-zero based arrays for such tasks have few benefits, but pose the risk
of harder-to-maintain code: some/many people will use different array bases,
in the same language, for the same project, for similar concepts.

I think that D arrays had better stay a low-level, implementation-determined feature. Adding more features to them would further increase the confusion about them. But separate features in the D language and/or Phobos that enable programmers to work with arrays (lists) in a productive, safe and reasonably fast manner could be a worthwhile addition to D.


Farmer.
March 31, 2003
"Mark Evans" <Mark_member@pathlink.com> wrote in message news:b660it$tn4$1@digitaldaemon.com...
> Other languages aside, the point is that D needs a Unicode string primitive.

It does already. In D, a char[] is really a utf-8 array.


March 31, 2003
Walter wrote:
> "Mark Evans" <Mark_member@pathlink.com> wrote in message news:b660it$tn4$1@digitaldaemon.com...
>
>>Other languages aside, the point is that D needs a Unicode string primitive.
>
> It does already. In D, a char[] is really a utf-8 array.

Er, no...

    void main ()
    {
        char[] foo;

        foo = "\uFF00";
    }

"cannot implicitly convert wchar[1] to char[]".  Putting in an explicit cast results in a foo with length 1 and value 0.
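
For reference, U+FF00 needs three code units in UTF-8 (EF BC 80), so a correct conversion cannot yield a one-element char[]. The following is only a minimal sketch, assuming a present-day D compiler (not the 2003 one used above), in which a literal assigned to a string is stored as UTF-8. The helper utf8Bytes is hypothetical and exists only for this illustration.

```d
import std.stdio;

// Hypothetical helper: views the UTF-8 code units of a string as raw bytes.
immutable(ubyte)[] utf8Bytes(string s)
{
    return cast(immutable(ubyte)[]) s;
}

void main()
{
    string foo = "\uFF00";           // one code point, stored as UTF-8
    writeln(foo.length);             // number of UTF-8 code units
    foreach (ubyte b; utf8Bytes(foo))
        writef("%02X ", b);          // dump each code unit in hex
    writeln();
}
```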

March 31, 2003
Ok, I'll fix it. -Walter

"Burton Radons" <loth@users.sourceforge.net> wrote in message news:b6a7vb$voq$1@digitaldaemon.com...
> Walter wrote:
> > "Mark Evans" <Mark_member@pathlink.com> wrote in message news:b660it$tn4$1@digitaldaemon.com...
> >
> >>Other languages aside, the point is that D needs a Unicode string primitive.
> >
> > It does already. In D, a char[] is really a utf-8 array.
>
> Er, no...
>
>      void main ()
>      {
>          char[] foo;
>
>          foo = "\uFF00";
>      }
>
> "cannot implicitly convert wchar[1] to char[]".  Putting in an explicit cast results in a foo with length 1 and value 0.
>

