Java String vs wchar[] Was: Re: inner classes
May 31, 2005
Andrew Fedoniouk
>> Are you going to have string constants castable to String, BTW? Or any other class? That would be nice...

> Walter asks:

> What advantage does java.lang.String have? Why does string need to be a class?

A string does not need to be a class.
It is nice to be able to declare methods for it, though,
at least for the sake of a Java-to-D conversion tool or the like.

java.lang.String class has a) methods b) String owns buffer - it controls
buffer.

In D is possible:
int[char[]] map;
char[] s = "something";
map[s] = 1;
s[0] = '?'; // I have no idea what result will be. sure not good.

And you can bump into such problem quite easily in D. I personally did many times. And too hard to find source sometimes.

In Java such collision is not possible in principle: String is final and
immutable.
Java strings are more greedy (I would say too greedy) but more robust. In D I tried
to create
something like String, but declarations like str = new String("real string");
or, with structs, str = String("mmmm");
are just tedious and aesthetically disastrous.

For people coming from Java, such D strings will just be a permanent source of errors.

To prevent collisions, the Mango library (a nice one!)
uses two versions of its classes, e.g. Dictionary/MutableDictionary.
A bit of overkill, imho, but it works.

Ideally it should be possible in D to reproduce
at least std::string. (I am not even talking about a copy-on-write version yet.)
I tried four times and have not yet found a reliable
solution. I am pretty sure it is impossible to implement the same
abstraction in D with the same
overhead.
A struct was a good candidate for such a wrapper, but it has no copy constructor. A class needs
allocation.
Only dup is left so far. But the solution with dup is even worse than in Java. See:

class Url
{
   char[] _hostname;
   ...
   char[] hostname() { return _hostname.dup; } // Doh!
}

if( url.hostname == "terrainformatica.com" )
// 32 bytes of memory gone, just to compare it!
  ....



From many points of view, the ideal would be a solution with const:

class Url {
  char[] _hostname;

  const char[] hostname() { return _hostname; } // Yep! This is exactly what we need.

}

I think it would be enough to be able to declare variables of simple
types, arrays and pointers, as const.
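
A minimal sketch of what such a read-only getter could look like, assuming a compiler that supports the const(char)[] type constructor, which later versions of D added and which the D discussed in this thread does not have. The constructor is an illustrative addition, not part of the original class.

```d
class Url
{
    private char[] _hostname;

    // Illustrative constructor (an assumption): the object takes its own
    // copy, so it is the sole owner of the buffer.
    this(const(char)[] hostname)
    {
        _hostname = hostname.dup;
    }

    // Read-only view: no copy is made, and callers cannot write
    // through the returned slice.
    const(char)[] hostname() const
    {
        return _hostname;
    }
}
```

With this shape, url.hostname == "terrainformatica.com" compares in place without a dup, while an attempt such as url.hostname[0] = '?' is rejected at compile time.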

Generally speaking, const does not imply better assembler code. But const helps to build optimal, fast systems where the GC takes 1% of the time and not 20%.

Andrew.


May 31, 2005
On Mon, 30 May 2005 22:50:29 -0700, Andrew Fedoniouk wrote:


[snip]

> In D is possible:
> int[char[]] map;
> char[] s = "something";
> map[s] = 1;
> s[0] = '?'; // I have no idea what result will be. sure not good.
> 
> And you can bump into such problem quite easily in D. I personally did many times. And too hard to find source sometimes.

I'm sure you already know this, but for the benefit of others, you can avoid this trap by coding ...

 int[char[]] map;
 char[] s = "something";
 map[s.dup] = 1; // NB: .dup call.
 s[0] = '?'; // Does not mess up the index to map.

-- 
Derek
Melbourne, Australia
31/05/2005 4:05:26 PM
May 31, 2005
"Derek Parnell" <derek@psych.ward> wrote in message news:1p5feg14mh412.1x6qgemuugouf$.dlg@40tude.net...
> On Mon, 30 May 2005 22:50:29 -0700, Andrew Fedoniouk wrote:
>
>
> [snip]
>
>> In D is possible:
>> int[char[]] map;
>> char[] s = "something";
>> map[s] = 1;
>> s[0] = '?'; // I have no idea what result will be. sure not good.
>>
>> And you can bump into such problem quite easily in D. I personally did many times. And too hard to find source sometimes.
>
> I'm sure you already know this, but for the benefit of others, you can avoid this trap by coding ...
>
> int[char[]] map;
> char[] s = "something";
> map[s.dup] = 1; // NB: .dup call.
> s[0] = '?'; // Does not mess up the index to map.

Thanks, Derek.
But shall I put your recommendation into the comments of every function returning
a string?
Don't store, don't modify, etc.?
If you put my string into your map, always dup it, etc.

This is not what I would consider a technically
correct solution.

Andrew.


May 31, 2005
"Andrew Fedoniouk" <news@terrainformatica.com> wrote in message news:d7gtvf$qs0$1@digitaldaemon.com...
> java.lang.String class has a) methods b) String owns buffer - it controls
> buffer.
>
> In D is possible:
> int[char[]] map;
> char[] s = "something";
> map[s] = 1;
> s[0] = '?'; // I have no idea what result will be. sure not good.
>
> And you can bump into such problem quite easily in D. I personally did many times. And too hard to find source sometimes.
>
> In Java such collision is not possible in principle: String is final and immutable.

A number of languages use the immutable string idiom, and its corollary "always implicitly copy the string when writing to it". They all share another common characteristic - they're slow, and they're slow in a manner that is *not fixable*. And they're not just slower by a factor, many algorithms run *exponentially* slower because of the copying.

D must be fast, and the only way to be fast with strings (and arrays) is to not have the language implicitly copy them, but to allow the programmer the flexibility to copy or not copy. To know when to copy, use the Copy On Write principle (COW). That is, if you're not *sure* you've got the only copy of a string, .dup it before modifying it.

So why isn't that just as bad as the languages that implicitly copy on write? The answer is that often, you know that you are the sole owner, such as:

    char[] s = new char[10];
    for (int i = 0; i < 10; i++)
        s[i] = 'c';

Those other languages are doomed to make 10 copies of s. The D programmer needs to make 0 copies.

As to your example above, when you pass a reference to a string to an associative array, then you aren't the sole owner of that string anymore. Don't change it. .dup it.
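
A minimal sketch of the copy-on-write convention described above, written against a current D compiler. The function name replaceAs is hypothetical and only serves as illustration: the input is treated as possibly shared, so it is only read, and a private copy is taken at the first character that actually has to change.

```d
// Returns `s` with every 'a' replaced by 'b'.
// Copy-on-write by convention: the caller's buffer is never written to.
// If nothing needs to change, the same slice is handed back and no copy is made.
char[] replaceAs(char[] s)
{
    char[] result = s;   // shares the caller's buffer for now
    bool owned = false;  // becomes true once we hold a private copy

    foreach (i, c; s)
    {
        if (c == 'a')
        {
            if (!owned)
            {
                result = s.dup; // first write: take a private copy
                owned = true;
            }
            result[i] = 'b';
        }
    }
    return result;
}
```

A caller that is not the sole owner of a string can pass it in safely: at most one copy is made per call, and none when the string contains nothing to replace.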


May 31, 2005
"Walter" <newshound@digitalmars.com> wrote in message news:d7h4rf$1345$1@digitaldaemon.com...
>
> "Andrew Fedoniouk" <news@terrainformatica.com> wrote in message news:d7gtvf$qs0$1@digitaldaemon.com...
>> java.lang.String class has a) methods b) String owns buffer - it controls
>> buffer.
>>
>> In D is possible:
>> int[char[]] map;
>> char[] s = "something";
>> map[s] = 1;
>> s[0] = '?'; // I have no idea what result will be. sure not good.
>>
>> And you can bump into such problem quite easily in D. I personally did many times. And too hard to find source sometimes.
>>
>> In Java such collision is not possible in principle: String is final and immutable.
>
> A number of languages use the immutable string idiom, and its corollary "always implicitly copy the string when writing to it". They all share another common characteristic - they're slow, and they're slow in a manner that is *not fixable*. And they're not just slower by a factor, many algorithms run *exponentially* slower because of the copying.
>
> D must be fast, and the only way to be fast with strings (and arrays) is to not have the language implicitly copy them, but to allow the programmer the flexibility to copy or not copy. To know when to copy, use the Copy On Write principle (COW). That is, if you're not *sure* you've got the only copy of a string, .dup it before modifying it.
>
> So why isn't that just as bad as the languages that implicitly copy on write? The answer is that often, you know that you are the sole owner, such as:
>
>    char[] s = new char[10];
>    for (i = 0; i < 10; i++)
>        s[i] = 'c';
>
> Those other languages are doomed to make 10 copies of s. The D programmer needs to make 0 copies.
>
> As to your example above, when you pass a reference to a string to an associative array, then you aren't the sole owner of that string anymore. Don't change it. .dup it.
>
>

Gotcha.

And what will be your advice then for:

class Url {
  char[] _hostname;
  char[] hostname() { return _hostname; }
}

_hostname should not be changeable, neither intentionally
nor accidentally.
The hostname access pattern is primarily read. But it could possibly be
passed to some third-party functions.

I am serious. I really want to know how to design it better.

I've made an ugly

struct string {
   wchar[] chars;
   bool      mutable;
}

But this does not work in 15% of cases.

I am remembering the good old days of C programming with these char[]s. Damned fast but not maintainable.

In C++ I have my own nice tool::string with reliable copy-on-write..... sigh.

Andrew.


May 31, 2005
How about an immutable final String for general stuff and a StringBuffer (or whatever) for performance needs?

ubau

In article <d7h4rf$1345$1@digitaldaemon.com>, Walter says...
>
>
>"Andrew Fedoniouk" <news@terrainformatica.com> wrote in message news:d7gtvf$qs0$1@digitaldaemon.com...
>> java.lang.String class has a) methods b) String owns buffer - it controls
>> buffer.
>>
>> In D is possible:
>> int[char[]] map;
>> char[] s = "something";
>> map[s] = 1;
>> s[0] = '?'; // I have no idea what result will be. sure not good.
>>
>> And you can bump into such problem quite easily in D. I personally did many times. And too hard to find source sometimes.
>>
>> In Java such collision is not possible in principle: String is final and immutable.
>
>A number of languages use the immutable string idiom, and its corollary "always implicitly copy the string when writing to it". They all share another common characteristic - they're slow, and they're slow in a manner that is *not fixable*. And they're not just slower by a factor, many algorithms run *exponentially* slower because of the copying.
>
>D must be fast, and the only way to be fast with strings (and arrays) is to not have the language implicitly copy them, but to allow the programmer the flexibility to copy or not copy. To know when to copy, use the Copy On Write principle (COW). That is, if you're not *sure* you've got the only copy of a string, .dup it before modifying it.
>
>So why isn't that just as bad as the languages that implicitly copy on write? The answer is that often, you know that you are the sole owner, such as:
>
>    char[] s = new char[10];
>    for (i = 0; i < 10; i++)
>        s[i] = 'c';
>
>Those other languages are doomed to make 10 copies of s. The D programmer needs to make 0 copies.
>
>As to your example above, when you pass a reference to a string to an associative array, then you aren't the sole owner of that string anymore. Don't change it. .dup it.
>
>


May 31, 2005
Andrew Fedoniouk wrote:

> And what will be your advice then for:
> 
> class Url {
>   char[] _hostname;
>   char[] hostname() { return _hostname; }
> }
> 
> _hostname should not be changeable nor intentionally
> nor accidentally.
> hostname access pattern is primarily read. But it could possibly be
> passed in some third party functions.
> 
> I am serious. I really want to know how to design it better.

Magic Eight Ball says:
               ___
              /   \
             /     \
            /  ASK  \
           /  AGAIN  \
          /   LATER   \
          \___________/

My own prediction is that we argue about it for a few months more,
and then Walter caves in and adds a "readonly" keyword to D... :-)

For the time being, I think returning the string and asking
others to be nice is better than using a Class or a struct ?

> I am remembering old good days of C programming with these char[]s.
> Damned fast but not maintainable.

That's where we are at now, I suppose.

I've already run into some things regarding string literals.
And that was even before any potential class library user...

Copy on Write is currently just a Gentlemen's Agreement.
And it needs the client using Url.hostname to play along.

--anders
May 31, 2005
U.Baumanis wrote:

> How about immutable final String for general stuff and StringBuffer (or
> whatever) for performance needs.

If you want some Java-like string classes, I hacked some stuff together:
http://www.algonet.se/~afb/d/dcaf/html/class_string.html
http://www.algonet.se/~afb/d/dcaf/html/class_string_buffer.html

That doesn't change the "readonly" (was: const) needs of the built-in
string types of D (code unit arrays) ? Just something of a workaround.
Kris has a much nicer wrapper (with ICU features) under the Mango Tree.

--anders
May 31, 2005
In article <d7hagf$194p$1@digitaldaemon.com>, Anders F Björklund says...
>
>U.Baumanis wrote:
>
>> How about immutable final String for general stuff and StringBuffer (or whatever) for performance needs.
>
>If you want some Java-like string classes, I hacked some stuff together: http://www.algonet.se/~afb/d/dcaf/html/class_string.html http://www.algonet.se/~afb/d/dcaf/html/class_string_buffer.html
>
>That doesn't change the "readonly" (was: const) needs of the built-in
>string types of D (code unit arrays) ? Just something of a workaround.
>Kris has a much nicer wrapper (with ICU features) under the Mango Tree.
>
>--anders

Thanks! It would be nice to have it in std.string.
Well, better somewhere than nowhere. :-)

--
ubau


May 31, 2005
Walter wrote:
> "Andrew Fedoniouk" <news@terrainformatica.com> wrote in message
> news:d7gtvf$qs0$1@digitaldaemon.com...
> 
>>java.lang.String class has a) methods b) String owns buffer - it controls
>>buffer.
>>
>>In D is possible:
>>int[char[]] map;
>>char[] s = "something";
>>map[s] = 1;
>>s[0] = '?'; // I have no idea what result will be. sure not good.
>>
>>And you can bump into such problem quite easily in D. I personally
>>did many times. And too hard to find source sometimes.
>>
>>In Java such collision is not possible in principle: String is final and
>>immutable.
> 
> 
> A number of languages use the immutable string idiom, and its corollary
> "always implicitly copy the string when writing to it". They all share
> another common characteristic - they're slow, and they're slow in a manner
> that is *not fixable*. And they're not just slower by a factor, many
> algorithms run *exponentially* slower because of the copying.
> 
> D must be fast, and the only way to be fast with strings (and arrays) is to
> not have the language implicitly copy them, but to allow the programmer the
> flexibility to copy or not copy. To know when to copy, use the Copy On Write
> principle (COW). That is, if you're not *sure* you've got the only copy of a
> string, .dup it before modifying it.
> 
> So why isn't that just as bad as the languages that implicitly copy on
> write? The answer is that often, you know that you are the sole owner, such
> as:
> 
>     char[] s = new char[10];
>     for (i = 0; i < 10; i++)
>         s[i] = 'c';

Maybe I'm a dummy, but I don't see in this example why those other languages must copy it 10 times. With the reference-counted string implementation in my C++ project, a copy would also be performed 0 times. And if more than 1 reference to the instance exists, only one copy operation will be performed. I see only one advantage in the current implementation of strings: there is no need to check or increment/decrement a reference counter, but string duplication is required instead.
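
A rough sketch of the reference-counted scheme described above, written in D for a current compiler rather than in C++. The name CowString, the chars accessor, and the copies counter are hypothetical and exist only to make the behaviour visible. It relies on struct postblit and destructors, which the D of this thread does not offer.

```d
struct CowString
{
    private char[] data;
    private int* refs;   // reference count shared by all copies of this string
    static int copies;   // counts copy-on-write duplications (illustration only)

    this(const(char)[] s)
    {
        data = s.dup;
        refs = new int;
        *refs = 1;
    }

    // Postblit: copying the struct shares the buffer and bumps the count.
    this(this)
    {
        if (refs !is null) ++*refs;
    }

    ~this()
    {
        if (refs !is null) --*refs;
    }

    // s[i] = c: duplicate only if someone else also holds the buffer.
    void opIndexAssign(char c, size_t i)
    {
        if (refs !is null && *refs > 1)
        {
            --*refs;          // detach from the shared buffer
            data = data.dup;  // the single copy
            refs = new int;
            *refs = 1;
            ++copies;
        }
        data[i] = c;
    }

    // Read-only view of the contents.
    const(char)[] chars() const
    {
        return data;
    }
}
```

Writing ten characters into a CowString that has a single owner performs no duplication. After auto t = s, the first write to t duplicates the buffer once and later writes to t reuse that private copy. The cost is the counter check on every write.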

> 
> Those other languages are doomed to make 10 copies of s. The D programmer
> needs to make 0 copies.
> 
> As to your example above, when you pass a reference to a string to an
> associative array, then you aren't the sole owner of that string anymore.
> Don't change it. .dup it.
> 
> 