string concatenation idea.
June 15, 2004
The thread "string performance issues" by "Daniel Horn" got me thinking of an idea for a change to arrays.

Basically, concatenation can be slow, as it causes reallocations of the array. If you could pre-allocate the array then it wouldn't be as slow.

I tried this:

char[] p = "regan";

p.length = 10;
p ~= "fred";

and ended up with a string containing

'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

which was not what I was after :)

I remembered a thread on arrays requesting renaming the 'length' property to 'reserve' or something like that, and the idea for the addition of a reserve property that simply allocated memory to the array without changing its length came to me. If we could go:

char[] p = "regan";

p.reserve = 10;
p ~= "fred";

and end up with a string containing

'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

and a length of 9. Then we could do fast concatenation. Otherwise we're left writing a String class that achieves this by setting length and using memcpy. I thought a design goal of D was to avoid this.
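
A minimal sketch of the behaviour described above, assuming a current D2 compiler whose runtime provides `reserve` and `capacity` for dynamic arrays; `joinReserved` is a hypothetical helper name, not something from this thread:

```d
// Sketch only: relies on the `reserve` and `capacity` helpers that a
// current D2 runtime makes available for dynamic arrays (an assumption).
char[] joinReserved(const(char)[] head, const(char)[] tail, size_t room)
{
    char[] p = head.dup;              // mutable copy, length == head.length
    p.reserve(room);                  // request capacity; length is untouched
    assert(p.length == head.length);
    assert(p.capacity >= room);
    p ~= tail;                        // append after the existing contents
    return p;
}

void main()
{
    auto s = joinReserved("regan", "fred", 10);
    assert(s == "reganfred");
    assert(s.length == 9);
}
```

Here the length stays at 5 after the `reserve` call and becomes 9 after the append, which is the result asked for above.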

Thoughts?

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
June 15, 2004
It means adding another member to slices, and would complicate it in other ways also, since one would have to distinguish between slices that own and slices that view.

I prefer a language based approach, similar to that for Java, whereby the elements in a single concatenation statement are actually appended to a StringBuffer. This boosts string concatenation performance enormously.

I showed how to achieve this in a similar vein, and with similarly significant performance benefits, for C++ in my recent (June's CUJ) article "Fast, Non-intrusive String Concatenation". Walter was one of the reviewers, so he's au fait with the technique.

I suggest something similar can be done for D, by transcribing ~ sequences into calls to an underlying implementation class/API. It wouldn't be hard to do, and the restriction would just be the same for Java and C++ (using my fast_string_concatenator<>), in that it would only work for a single statement. Not that that's a particularly onerous restriction, of course.

The alternative is just to have a StringBuilder class, which doesn't seem too much of a burden either.
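
As a rough sketch of that builder-style alternative, assuming a current D2 standard library: `std.array.appender` plays the role of a StringBuilder here, and `concatAll` is a hypothetical helper name.

```d
import std.array : appender;

// Hypothetical helper: joins all parts through one growable buffer
// instead of reallocating on every ~ operation.
string concatAll(string[] parts)
{
    auto buf = appender!string();

    // Work out the final size first so the buffer is reserved once.
    size_t total = 0;
    foreach (part; parts)
        total += part.length;
    buf.reserve(total);

    foreach (part; parts)
        buf.put(part);                // appends into the reserved space

    return buf.data;
}

void main()
{
    assert(concatAll(["regan", "fred"]) == "reganfred");
}
```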
June 15, 2004
Regan Heath wrote:

> The thread "string performance issues" by "Daniel Horn" got me thinking of an idea for a change to arrays.
>
> Basically, concatenation can be slow, as it causes reallocations of the array. If you could pre-allocate the array then it wouldn't be as slow.
>
> I tried this:
>
> char[] p = "regan";
>
> p.length = 10;
> p ~= "fred";
>
> and ended up with a string containing
>
> 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'
>
> which was not what I was after :)
>
> I remembered a thread on arrays requesting renaming the 'length' property to 'reserve' or something like that, and the idea for the addition of a reserve property that simply allocated memory to the array without changing its length came to me. If we could go:
>
> char[] p = "regan";
>
> p.reserve = 10;
> p ~= "fred";
>
> and end up with a string containing
>
> 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'
>
> and a length of 9. Then we could do fast concatenation. Otherwise we're left writing a String class that achieves this by setting length and using memcpy. I thought a design goal of D was to avoid this.
>
> Thoughts?
>
I think this would work:
char[] p = "regan";
p.length = 10;
p.length = 5;
p ~= "fred";

as D doesn't clean up the memory straight away.

-- 
-Anderson: http://badmama.com.au/~anderson/
June 15, 2004
On Tue, 15 Jun 2004 12:31:48 +1000, Matthew <admin@stlsoft.dot.dot.dot.dot.org> wrote:
> It means adding another member to slices, and would complicate it in other ways
> also, since one would have to distinguish between slices that own and slices that
> view.

Why do you need to add another member to slices?
Don't we already have to distinguish between slices that own and slices that view?

> I prefer a language based approach, similar to that for Java, whereby the
> elements in a single concatenation statement are actually appended to a
> StringBuffer. This boosts string concatenation performance enormously.
>
> I showed how to achieve this in a similar vein, and with similarly significant
> performance benefits, for C++ in my recent (June's CUJ) article "Fast,
> Non-intrusive String Concatenation". Walter was one of the reviewers, so he's au
> fait with the technique.
>
> I suggest something similar can be done for D, by transcribing ~ sequences into
> calls to an underlying implementation class/API. It wouldn't be hard to do, and
> the restriction would just be the same for Java and C++ (using my
> fast_string_concatenator<>), in that it would only work for a single statement.
> Not that that's a particularly onerous restriction, of course.
>
> The alternative is just to have a StringBuilder class, which doesn't seem too
> much of a burden either.

I thought the point of adding strings (i.e. char[]) to D was to avoid having other String classes?

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
June 15, 2004
On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:

> I think this would work:
> char[] p = "regan";
> p.length = 10;
> p.length = 5;
> p ~= "fred";
> 
> as D doesn't clean up the memory straight away.

How about ...

 char[] p = "regan";
 p.length = 10;
 p[5..9] = "fred";

-- 
Derek
Melbourne, Australia
15/Jun/04 1:04:56 PM
June 15, 2004
Derek Parnell wrote:

> How about ...
>
>  char[] p = "regan";
>  p.length = 10;
>  p[5..9] = "fred";
>
I didn't think of that, neat.

-- 
-Anderson: http://badmama.com.au/~anderson/
June 15, 2004
On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek@psych.ward> wrote:
> How about ...
>
>  char[] p = "regan";
>  p.length = 10;
>  p[5..9] = "fred";

This works, but see the other thread "string performance issues" for an example of the real problem.

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
June 15, 2004
On Tue, 15 Jun 2004 11:15:16 +0800, J Anderson <REMOVEanderson@badmama.com.au> wrote:
> Derek Parnell wrote:
>
>> How about ...
>>
>> char[] p = "regan";
>> p.length = 10;
>> p[5..9] = "fred";
>>
> I didn't think of that, neat.

Yes, but not quite as neat as it could be; see the real problem in the thread "string performance issues" by "Daniel Horn".

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
June 15, 2004
Regan Heath wrote:

> On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek@psych.ward> wrote:
>
>> How about ...
>>
>>  char[] p = "regan";
>>  p.length = 10;
>>  p[5..9] = "fred";
>
>
> This works, but see the other thread "string performance issues" for an example of the real problem.
>
> Regan
>
Personally I'd use block allocation for a problem like that.  That's what you'd do in C.  Then afterwards you simply trim the array to the size you really want (or use length = block; length = 0; beforehand).

I don't think strings should be treated as a specific type of array.  Whatever applies to char arrays should also apply to every other type of array, except for the automatic conversion to zero-terminated arrays of course.

Adding a reserve property could increase the string overhead, unless all it did was:

template reserveT(T)
{
  void reserve(inout char [] array, uint length)
  {
      int oldlen = array.length;
      array.length = length;
      array.length = oldlen;
  }
}

alias reserveT!(char).reserve reserve;

Hey, what do you know - I just solved your problem *grin*.

Now you can write:

array.reserve(10);


-- 
-Anderson: http://badmama.com.au/~anderson/
June 15, 2004
"J Anderson" <REMOVEanderson@badmama.com.au> wrote in message news:calte9$fm0$1@digitaldaemon.com...
> Personally I'd use block allocation for a problem like that.  That's what you'd do in C.  Then afterwards you simply trim the array to the size you really want (or use length = block; length = 0; beforehand).
>
> I don't think strings should be treated as a specific type of array. Whatever applies to char arrays should also apply to every other type of array, except for the automatic conversion to zero-terminated arrays of course.
>
> Adding a reserve property could increase the string overhead, unless all it did was:
>
> template reserveT(T)
> {
>    void reserve(inout char [] array, uint length)

Did you mean to write:

    void reserve(inout T [] array, uint length)
:)

>    {
>        int oldlen = array.length;
>        array.length = length;
>        array.length = oldlen;
>    }
> }
>
> alias reserveT!(char).reserve reserve;
>
> Hey, what do you know - I just solved your problem *grin*.
>
> Now you can write:
>
> array.reserve(10);
>

Nice :)
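
For reference, a sketch of the helper with the `T []` correction folded in, assuming a current D2 compiler (so `ref` and `size_t` stand in for the `inout` and `uint` used above). The `assumeSafeAppend` call is an addition that does not appear in this thread: a D2 runtime would otherwise reallocate on the first append after the array has been shrunk. `reserveByResize` is a hypothetical name.

```d
// Hypothetical generic version of the grow-then-shrink trick.
void reserveByResize(T)(ref T[] array, size_t length)
{
    immutable oldlen = array.length;
    if (length <= oldlen)
        return;                   // already at least that long
    array.length = length;        // grow, allocating if necessary
    array.length = oldlen;        // shrink back; the memory block is kept
    array.assumeSafeAppend();     // D2 only: let later appends reuse the block
}

void main()
{
    char[] p = "regan".dup;
    reserveByResize(p, 10);
    assert(p.length == 5);        // length is unchanged by the helper
    p ~= "fred";
    assert(p == "reganfred");

    // The same helper works for any element type.
    int[] nums = [1, 2, 3];
    reserveByResize(nums, 8);
    nums ~= 4;
    assert(nums == [1, 2, 3, 4]);
}
```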


> --
> -Anderson: http://badmama.com.au/~anderson/

