May 31, 2005
Java String vs wchar[] Was: Re: inner classes
>> Are you going to have string constants castable to String, BTW?
>> Or any other class? That would be nice...

> Walter asks:

> What advantage does java.lang.String have? Why does string need to be a
> class?

string does not need to be a class.
It is nice to be able to declare methods for it though.
At least for the sake of Java-2-D tool or so.

java.lang.String class has a) methods b) String owns buffer - it controls
buffer.

In D is possible:
int[char[]] map;
char[] s = "something";
map[s] = 1;
s[0] = '?'; // I have no idea what result will be. sure not good.

And you can bump into such problem quite easily in D. I personally
did many times. And too hard to find source sometimes.

In Java such collision is not possible in principle: String is final and 
immutable.
Java strings are more (I would say too) greedy, but more robust. In D I tried
to create something like String, but declarations like
str = new String("real string");
or, with structs, str = String("mmmm");
are just boring and aesthetically disastrous.

For Java guys such D strings will be just a source of permanent errors.

To prevent collisions, the Mango library (nice one!)
uses two versions of classes, e.g. Dictionary/MutableDictionary -
a bit overkill, imho, but it works.

Ideally in D it should be possible to reproduce
at least std::string. (I am not even talking about a copy-on-write version.)
I tried four times and have not yet found a reliable
solution. I am pretty sure it is impossible to implement the same
abstraction in D with the same overhead.
A struct was a good candidate for such a wrapper, but there is no copy
constructor. A class needs an allocation.
Only dup so far. But the solution with dup is even worse than in Java. See:

class Url
{
  char[] _hostname;
  ...
  char[] hostname() { return _hostname.dup; } // Doh!
}

if( url.hostname == "terrainformatica.com" )
// the .dup costs 32 bytes of memory, just to compare it!
 ....



From many points of view, the ideal would be a solution with const:

class Url {
 char[] _hostname;

 const char[] hostname() { return _hostname; } // Yep! This is exactly what we need.

}

I think it would be enough to be able to declare variables of simple
types (arrays and pointers) as const.
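
A note for later readers: the const accessor wished for above can be sketched in D as it exists today. This is only a minimal sketch, assuming a current D2 compiler, where const(char)[] is a slice whose characters cannot be written through. That syntax did not exist when this post was written, and the constructor is added here only to make the example self-contained.

```d
class Url
{
    private char[] _hostname;

    // Assumption: the class takes its own private copy of the host name.
    this(const(char)[] host) { _hostname = host.dup; }

    // Returns a read-only view of the internal buffer: no copy is made,
    // and callers cannot write through the returned slice.
    const(char)[] hostname() const { return _hostname; }
}

void main()
{
    auto url = new Url("terrainformatica.com");
    const(char)[] h = url.hostname();

    assert(h == "terrainformatica.com"); // compared without any .dup

    // Writing through the view does not compile.
    static assert(!__traits(compiles, h[0] = '?'));
}
```

The caller can still take a mutable copy with .dup when one is really needed, so the cost of copying moves to the few places that write.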

Generally speaking, const does not imply better assembler code.
But const helps to build optimal and fast systems where the GC spends
1% of the time and not 20%.

Andrew.
May 31, 2005
Re: Java String vs wchar[] Was: Re: inner classes
On Mon, 30 May 2005 22:50:29 -0700, Andrew Fedoniouk wrote:


[snip]

> In D is possible:
> int[char[]] map;
> char[] s = "something";
> map[s] = 1;
> s[0] = '?'; // I have no idea what result will be. sure not good.
> 
> And you can bump into such problem quite easily in D. I personally
> did many times. And too hard to find source sometimes.

I'm sure you already know this, but for the benefit of others, you can
avoid this trap by coding ...

int[char[]] map;
char[] s = "something";
map[s.dup] = 1; // NB: .dup call.
s[0] = '?'; // Does not mess up the index to map.
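
The same trap and its fix can be sketched in present-day D. This assumes a current D2 compiler, which differs from the 2005 syntax quoted above: string is an immutable array there, and .idup makes the immutable copy that .dup provides by convention here. The helper name store is made up for illustration.

```d
import std.stdio : writeln;

// Stores a value under a private, immutable copy of the key, so later
// writes to the caller's buffer cannot disturb the table.
// `store` is a hypothetical helper, not a library function.
void store(ref int[string] map, const(char)[] key, int value)
{
    map[key.idup] = value; // .idup: copy into an immutable string
}

void main()
{
    int[string] map;
    char[] s = "something".dup; // mutable buffer owned by the caller

    store(map, s, 1);
    s[0] = '?';                 // changes only the caller's buffer

    if (("something" in map) !is null)
        writeln("key intact");
    writeln(s);
}
```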

-- 
Derek
Melbourne, Australia
31/05/2005 4:05:26 PM
May 31, 2005
Re: Java String vs wchar[] Was: Re: inner classes
"Derek Parnell" <derek@psych.ward> wrote in message 
news:1p5feg14mh412.1x6qgemuugouf$.dlg@40tude.net...
> On Mon, 30 May 2005 22:50:29 -0700, Andrew Fedoniouk wrote:
>
>
> [snip]
>
>> In D is possible:
>> int[char[]] map;
>> char[] s = "something";
>> map[s] = 1;
>> s[0] = '?'; // I have no idea what result will be. sure not good.
>>
>> And you can bump into such problem quite easily in D. I personally
>> did many times. And too hard to find source sometimes.
>
> I'm sure you already know this, but for the benefit of others, you can
> avoid this trap by coding ...
>
> int[char[]] map;
> char[] s = "something";
> map[s.dup] = 1; // NB: .dup call.
> s[0] = '?'; // Does not mess up the index to map.

Thanks, Derek.
But shall I put your recommendation into the comments of each function
returning a string?
Don't store, don't modify, etc.?
If you put my string into your map, always dup it, etc.

This is not what I consider a technically
correct solution.

Andrew.
May 31, 2005
Re: Java String vs wchar[] Was: Re: inner classes
"Andrew Fedoniouk" <news@terrainformatica.com> wrote in message
news:d7gtvf$qs0$1@digitaldaemon.com...
> java.lang.String class has a) methods b) String owns buffer - it controls
> buffer.
>
> In D is possible:
> int[char[]] map;
> char[] s = "something";
> map[s] = 1;
> s[0] = '?'; // I have no idea what result will be. sure not good.
>
> And you can bump into such problem quite easily in D. I personally
> did many times. And too hard to find source sometimes.
>
> In Java such collision is not possible in principle: String is final and
> immutable.

A number of languages use the immutable string idiom, and its corollary
"always implicitly copy the string when writing to it". They all share
another common characteristic - they're slow, and they're slow in a manner
that is *not fixable*. And they're not just slower by a factor, many
algorithms run *exponentially* slower because of the copying.

D must be fast, and the only way to be fast with strings (and arrays) is to
not have the language implicitly copy them, but to allow the programmer the
flexibility to copy or not copy. To know when to copy, use the Copy On Write
principle (COW). That is, if you're not *sure* you've got the only copy of a
string, .dup it before modifying it.

So why isn't that just as bad as the languages that implicitly copy on
write? The answer is that often, you know that you are the sole owner, such
as:

   char[] s = new char[10];
   for (int i = 0; i < 10; i++)
       s[i] = 'c';

Those other languages are doomed to make 10 copies of s. The D programmer
needs to make 0 copies.

As to your example above, when you pass a reference to a string to an
associative array, then you aren't the sole owner of that string anymore.
Don't change it. .dup it.
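
The convention described above can be sketched as follows. This is an illustration only, assuming a current D2 compiler; the function names capitalizeFirst and filled are made up for the example. The first function copies only on the path that writes, and the second writes in place because it is the sole owner of a buffer it has just allocated.

```d
import std.ascii : isLower, toUpper;
import std.stdio : writeln;

// Copy-on-write by convention: return the input untouched when no change
// is needed, and .dup only on the path that actually writes.
const(char)[] capitalizeFirst(const(char)[] s)
{
    if (s.length == 0 || !isLower(s[0]))
        return s;            // nothing to write, so no copy

    char[] copy = s.dup;     // not sure we own s: copy before writing
    copy[0] = cast(char) toUpper(copy[0]);
    return copy;
}

// Sole owner: the buffer was allocated right here, so every write
// happens in place and no copy is made.
char[] filled(size_t n, char c)
{
    char[] s = new char[n];
    foreach (i; 0 .. n)
        s[i] = c;
    return s;
}

void main()
{
    const(char)[] a = "Hello";
    assert(capitalizeFirst(a) is a);   // same slice back, no allocation
    writeln(capitalizeFirst("hello"));
    writeln(filled(10, 'c'));
}
```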
May 31, 2005
Re: Java String vs wchar[] Was: Re: inner classes
"Walter" <newshound@digitalmars.com> wrote in message 
news:d7h4rf$1345$1@digitaldaemon.com...
>
> "Andrew Fedoniouk" <news@terrainformatica.com> wrote in message
> news:d7gtvf$qs0$1@digitaldaemon.com...
>> java.lang.String class has a) methods b) String owns buffer - it controls
>> buffer.
>>
>> In D is possible:
>> int[char[]] map;
>> char[] s = "something";
>> map[s] = 1;
>> s[0] = '?'; // I have no idea what result will be. sure not good.
>>
>> And you can bump into such problem quite easily in D. I personally
>> did many times. And too hard to find source sometimes.
>>
>> In Java such collision is not possible in principle: String is final and
>> immutable.
>
> A number of languages use the immutable string idiom, and its corollary
> "always implicitly copy the string when writing to it". They all share
> another common characteristic - they're slow, and they're slow in a manner
> that is *not fixable*. And they're not just slower by a factor, many
> algorithms run *exponentially* slower because of the copying.
>
> D must be fast, and the only way to be fast with strings (and arrays) is 
> to
> not have the language implicitly copy them, but to allow the programmer 
> the
> flexibility to copy or not copy. To know when to copy, use the Copy On 
> Write
> principle (COW). That is, if you're not *sure* you've got the only copy of 
> a
> string, .dup it before modifying it.
>
> So why isn't that just as bad as the languages that implicitly copy on
> write? The answer is that often, you know that you are the sole owner, 
> such
> as:
>
>    char[] s = new char[10];
>    for (int i = 0; i < 10; i++)
>        s[i] = 'c';
>
> Those other languages are doomed to make 10 copies of s. The D programmer
> needs to make 0 copies.
>
> As to your example above, when you pass a reference to a string to an
> associative array, then you aren't the sole owner of that string anymore.
> Don't change it. .dup it.
>
>

Gotcha.

And what will be your advice then for:

class Url {
 char[] _hostname;
 char[] hostname() { return _hostname; }
}

_hostname should not be changeable, neither intentionally
nor accidentally.
The hostname access pattern is primarily read. But it could possibly be
passed to some third-party functions.

I am serious. I really want to know how to design it better.

I've made an ugly

struct string {
  wchar[] chars;
  bool      mutable;
}

But this does not work in 15% of cases.

I am remembering the good old days of C programming with these char[]s.
Damned fast but not maintainable.

In C++ I have my own nice tool::string with reliable
copy-on-write..... sigh.

Andrew.
May 31, 2005
Re: Java String vs wchar[] Was: Re: inner classes
How about an immutable final String for general stuff and StringBuffer (or
whatever) for performance needs?

ubau

In article <d7h4rf$1345$1@digitaldaemon.com>, Walter says...
>
>
>"Andrew Fedoniouk" <news@terrainformatica.com> wrote in message
>news:d7gtvf$qs0$1@digitaldaemon.com...
>> java.lang.String class has a) methods b) String owns buffer - it controls
>> buffer.
>>
>> In D is possible:
>> int[char[]] map;
>> char[] s = "something";
>> map[s] = 1;
>> s[0] = '?'; // I have no idea what result will be. sure not good.
>>
>> And you can bump into such problem quite easily in D. I personally
>> did many times. And too hard to find source sometimes.
>>
>> In Java such collision is not possible in principle: String is final and
>> immutable.
>
>A number of languages use the immutable string idiom, and its corollary
>"always implicitly copy the string when writing to it". They all share
>another common characteristic - they're slow, and they're slow in a manner
>that is *not fixable*. And they're not just slower by a factor, many
>algorithms run *exponentially* slower because of the copying.
>
>D must be fast, and the only way to be fast with strings (and arrays) is to
>not have the language implicitly copy them, but to allow the programmer the
>flexibility to copy or not copy. To know when to copy, use the Copy On Write
>principle (COW). That is, if you're not *sure* you've got the only copy of a
>string, .dup it before modifying it.
>
>So why isn't that just as bad as the languages that implicitly copy on
>write? The answer is that often, you know that you are the sole owner, such
>as:
>
>    char[] s = new char[10];
>    for (int i = 0; i < 10; i++)
>        s[i] = 'c';
>
>Those other languages are doomed to make 10 copies of s. The D programmer
>needs to make 0 copies.
>
>As to your example above, when you pass a reference to a string to an
>associative array, then you aren't the sole owner of that string anymore.
>Don't change it. .dup it.
>
>
May 31, 2005
Re: Java String vs wchar[] Was: Re: inner classes
Andrew Fedoniouk wrote:

> And what will be your advice then for:
> 
> class Url {
>   char[] _hostname;
>   char[] hostname() { return _hostname; }
> }
> 
> _hostname should not be changeable, neither intentionally
> nor accidentally.
> The hostname access pattern is primarily read. But it could possibly be
> passed to some third-party functions.
> 
> I am serious. I really want to know how to design it better.

Magic Eight Ball says:
               ___
              /   \
             /     \
            /  ASK  \
           /  AGAIN  \
          /   LATER   \
          \___________/

My own prediction is that we argue about it for a few months more,
and then Walter caves in and adds a "readonly" keyword to D... :-)

For the time being, I think returning the string and asking
others to be nice is better than using a Class or a struct ?

> I am remembering the good old days of C programming with these char[]s.
> Damned fast but not maintainable.

That's where we are at now, I suppose.

I've already run into some things regarding string literals.
And that was even before any potential class library user...

Copy on Write is currently just a Gentlemen's Agreement.
And it needs the client using Url.hostname to play along.

--anders
May 31, 2005
Re: Java String vs wchar[] Was: Re: inner classes
U.Baumanis wrote:

> How about an immutable final String for general stuff and StringBuffer (or
> whatever) for performance needs?

If you want some Java-like string classes, I hacked some stuff together:
http://www.algonet.se/~afb/d/dcaf/html/class_string.html
http://www.algonet.se/~afb/d/dcaf/html/class_string_buffer.html

That doesn't change the "readonly" (was: const) needs of the built-in
string types of D (code unit arrays) ? Just something of a workaround.
Kris has a much nicer wrapper (with ICU features) under the Mango Tree.

--anders
May 31, 2005
Re: Java String vs wchar[] Was: Re: inner classes
In article <d7hagf$194p$1@digitaldaemon.com>,
Anders F Björklund says...
>
>U.Baumanis wrote:
>
>> How about an immutable final String for general stuff and StringBuffer (or
>> whatever) for performance needs?
>
>If you want some Java-like string classes, I hacked some stuff together:
>http://www.algonet.se/~afb/d/dcaf/html/class_string.html
>http://www.algonet.se/~afb/d/dcaf/html/class_string_buffer.html
>
>That doesn't change the "readonly" (was: const) needs of the built-in
>string types of D (code unit arrays) ? Just something of a workaround.
>Kris has a much nicer wrapper (with ICU features) under the Mango Tree.
>
>--anders

Thanks! It would be nice to have it in std.string.
Well, better somewhere than nowhere. :-)

--
ubau
May 31, 2005
Re: Java String vs wchar[] Was: Re: inner classes
Walter wrote:
> "Andrew Fedoniouk" <news@terrainformatica.com> wrote in message
> news:d7gtvf$qs0$1@digitaldaemon.com...
> 
>>java.lang.String class has a) methods b) String owns buffer - it controls
>>buffer.
>>
>>In D is possible:
>>int[char[]] map;
>>char[] s = "something";
>>map[s] = 1;
>>s[0] = '?'; // I have no idea what result will be. sure not good.
>>
>>And you can bump into such problem quite easily in D. I personally
>>did many times. And too hard to find source sometimes.
>>
>>In Java such collision is not possible in principle: String is final and
>>immutable.
> 
> 
> A number of languages use the immutable string idiom, and its corollary
> "always implicitly copy the string when writing to it". They all share
> another common characteristic - they're slow, and they're slow in a manner
> that is *not fixable*. And they're not just slower by a factor, many
> algorithms run *exponentially* slower because of the copying.
> 
> D must be fast, and the only way to be fast with strings (and arrays) is to
> not have the language implicitly copy them, but to allow the programmer the
> flexibility to copy or not copy. To know when to copy, use the Copy On Write
> principle (COW). That is, if you're not *sure* you've got the only copy of a
> string, .dup it before modifying it.
> 
> So why isn't that just as bad as the languages that implicitly copy on
> write? The answer is that often, you know that you are the sole owner, such
> as:
> 
>     char[] s = new char[10];
>     for (int i = 0; i < 10; i++)
>         s[i] = 'c';

Maybe I'm a dummy, but I don't see in this example why those other
languages must copy it 10 times. With the reference-counted string
implementation in my C++ project, the copy would also be performed 0 times.
And if more than one reference to the instance exists, only one
copy operation will be performed. I see only one advantage in the current
implementation of strings: there is no need to check or increment/decrement
a reference counter, but string duplication is required instead.
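
To make the reference-counting argument concrete, here is a rough sketch of such a string, written in D rather than C++ and assuming a current D2 compiler. It is not the C++ implementation mentioned above; the type CowString, its members, and the copies counter are all invented for the illustration.

```d
struct CowString
{
    private char[] data;
    private int* refs;      // shared count of owners of `data`
    static int copies;      // counts copy-on-write duplications

    this(const(char)[] s)
    {
        data = s.dup;
        refs = new int;
        *refs = 1;
    }

    this(this)              // postblit: a struct copy shares the buffer
    {
        if (refs) ++*refs;
    }

    ~this()
    {
        if (refs) --*refs;
    }

    // Write one character, duplicating first only if the buffer is shared.
    void set(size_t i, char c)
    {
        if (refs !is null && *refs > 1)
        {
            --*refs;        // leave the shared buffer to the other owners
            data = data.dup;
            refs = new int;
            *refs = 1;
            ++copies;
        }
        data[i] = c;
    }

    const(char)[] view() const { return data; }
}

void main()
{
    auto a = CowString("aaaaaaaaaa");
    foreach (i; 0 .. 10)
        a.set(i, 'c');      // sole owner: ten writes in place
    assert(CowString.copies == 0);

    auto b = a;             // shares the buffer, no copy yet
    b.set(0, '?');          // shared: exactly one duplication
    assert(CowString.copies == 1);
    assert(a.view == "cccccccccc");
    assert(b.view == "?ccccccccc");
}
```

The price is the counter check on every write and the increment/decrement on every copy, which is the trade-off described in the paragraph above.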

> 
> Those other languages are doomed to make 10 copies of s. The D programmer
> needs to make 0 copies.
> 
> As to your example above, when you pass a reference to a string to an
> associative array, then you aren't the sole owner of that string anymore.
> Don't change it. .dup it.
> 
>