June 15, 2004 string concatenation idea.
The thread "string performance issues" by "Daniel Horn" got me thinking of an idea for a change to arrays.
Basically, concatenation can be slow, as it causes reallocations of the array. If you could pre-allocate the array, then it wouldn't be as slow.
I tried this:
char[] p = "regan";
p.length = 10;
p ~= "fred";
and ended up with a string containing
'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'
which was not what I was after :)
I remembered a thread on arrays requesting renaming the 'length' property to 'reserve' or something like that, and the idea came to me of adding a reserve property that simply allocates memory to the array without changing its length. If we could go:
char[] p = "regan";
p.reserve = 10;
p ~= "fred";
and end up with a string containing
'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'
and a length of 9. Then we could do fast concatenation. Otherwise we're left writing a String class that achieves this by setting length and using memcpy. I thought a design goal of D was to avoid this.
Thoughts?
--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
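For comparison only: this thread predates D2, but present-day D2 runtimes provide a built-in `reserve` function and a `capacity` property for dynamic arrays, and `reserve` behaves much like the property proposed above, growing the allocation without touching `length`. The sketch below assumes a D2 compiler; `reserveThenAppend` is a hypothetical name used for illustration, and `.dup` is needed because D2 string literals are immutable.

```d
// Sketch, assuming a D2 compiler: `reserve` and `capacity` come from
// druntime's object module, which every D2 module imports implicitly.

/// Hypothetical helper: pre-allocates room for `room` chars, then appends.
char[] reserveThenAppend(size_t room)
{
    char[] p = "regan".dup;          // mutable GC copy of the literal
    immutable cap = p.reserve(room); // grows capacity only
    assert(cap >= room);
    assert(p.length == 5);           // length is unchanged

    auto block = p.ptr;
    p ~= "fred";                     // 9 <= capacity, so this appends in place
    assert(p.ptr == block);          // no reallocation happened
    return p;
}

unittest
{
    auto s = reserveThenAppend(10);
    assert(s == "reganfred");
    assert(s.length == 9);           // the length of 9 the proposal asked for
}
```

It can be run with `dmd -unittest -main -run file.d`.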

June 15, 2004 Re: string concatenation idea.
Posted in reply to Regan Heath

It means adding another member to slices, and would complicate it in other ways also, since one would have to distinguish between slices that own and slices that view.

I prefer a language-based approach, similar to that for Java, whereby the elements in a single concatenation statement are actually appended to a StringBuffer. This boosts string concatenation performance enormously.

I showed how to achieve this in a similar vein, and with similarly significant performance benefits, for C++ in my recent (June's CUJ) article "Fast, Non-intrusive String Concatenation". Walter was one of the reviewers, so he's au fait with the technique.

I suggest something similar can be done for D, by transcribing ~ sequences into calls to an underlying implementation class/API. It wouldn't be hard to do, and the restriction would be the same as for Java and C++ (using my fast_string_concatenator<>): it would only work within a single statement. Not that that's a particularly onerous restriction, of course.

The alternative is just to have a StringBuilder class, which doesn't seem too much of a burden either.

"Regan Heath" <regan@netwin.co.nz> wrote in message news:opr9l56hws5a2sq9@digitalmars.com...
> If we could go:
>
> char[] p = "regan";
>
> p.reserve = 10;
> p ~= "fred";
>
> and end up with a string containing
>
> 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'
>
> and a length of 9.
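The two alternatives Matthew describes can be sketched in present-day D2 (an assumption; none of this code comes from the thread or from his CUJ article). `concatOnce` is a hypothetical helper showing the general single-allocation idea behind treating a whole concatenation as one operation: sum the lengths, allocate once, then copy each piece into place. `buildWithAppender` uses Phobos's `std.array.Appender`, which fills the StringBuilder role.

```d
import std.array : appender;
import std.exception : assumeUnique;

/// Hypothetical helper: joins all pieces with exactly one allocation.
string concatOnce(string[] pieces...)
{
    size_t total = 0;
    foreach (piece; pieces)
        total += piece.length;        // first pass: measure

    auto buf = new char[total];       // the only allocation
    size_t pos = 0;
    foreach (piece; pieces)
    {
        buf[pos .. pos + piece.length] = piece[]; // second pass: copy
        pos += piece.length;
    }
    return assumeUnique(buf);         // buf is not referenced anywhere else
}

/// StringBuilder-style accumulation with Phobos's Appender.
string buildWithAppender(string[] pieces...)
{
    auto app = appender!string();
    foreach (piece; pieces)
        app.put(piece);               // amortised growth, like a StringBuffer
    return app.data;
}

unittest
{
    assert(concatOnce("regan", "fred") == "reganfred");
    assert(buildWithAppender("regan", "fred") == "reganfred");
    assert(concatOnce() == "");
}
```

The one-pass-to-measure, one-pass-to-copy shape is what makes the single-statement restriction natural: all the pieces must be known before the allocation happens.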

June 15, 2004 Re: string concatenation idea.
Posted in reply to Regan Heath

Regan Heath wrote:
> I tried this:
>
> char[] p = "regan";
>
> p.length = 10;
> p ~= "fred";
>
> and ended up with a string containing
>
> 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'
>
> which was not what I was after :)

I think this would work:

    char[] p = "regan";
    p.length = 10;
    p.length = 5;
    p ~= "fred";

as D doesn't clean up the memory straight away.

--
-Anderson: http://badmama.com.au/~anderson/
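One caveat for anyone trying this grow-then-shrink trick on present-day D2 (this reflects later runtime behaviour, not the 2004 compiler the posters used): the D2 runtime will not append in place over elements that another slice might still see. After `p.length = 5` the slice reports a capacity of 0, and `p ~= "fred"` copies to a new block. Calling `assumeSafeAppend` restores the in-place append. `appendsInPlace` is a hypothetical name, and the sketch assumes a D2 compiler.

```d
/// Hypothetical helper: runs the grow, shrink, append sequence and reports
/// whether the final append reused the original block.
bool appendsInPlace(bool markSafe)
{
    char[] p = "regan".dup;
    p.length = 10;               // grow: the block now holds at least 10 chars
    auto block = p.ptr;
    p.length = 5;                // shrink: same pointer...
    assert(p.capacity == 0);     // ...but chars 5..10 still count as in use
    if (markSafe)
        p.assumeSafeAppend();    // declare the tail free for reuse
    p ~= "fred";
    assert(p == "reganfred");
    return p.ptr == block;       // true only if nothing was reallocated
}

unittest
{
    assert(!appendsInPlace(false)); // plain append copies to a new block
    assert(appendsInPlace(true));   // after assumeSafeAppend it stays put
}
```

`assumeSafeAppend` is a promise by the caller that no other slice still uses the tail; if that promise is wrong, the append overwrites data another slice can see.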

June 15, 2004 Re: string concatenation idea.
Posted in reply to Matthew

On Tue, 15 Jun 2004 12:31:48 +1000, Matthew <admin@stlsoft.dot.dot.dot.dot.org> wrote:
> It means adding another member to slices, and would complicate it in other ways
> also, since one would have to distinguish between slices that own and slices that
> view.

Why do you need to add another member to slices? Don't we already have to distinguish between slices that own and slices that view?

> I prefer a language based approach, similar to that for Java, whereby the
> elements in a single concatenation statement are actually appended to a
> StringBuffer. This boosts string concatenation performance enormously.
>
> I showed how to achieve this in a similar vein, and with similarly significant
> performance benefits, for C++ in my recent (June's CUJ) article "Fast,
> Non-intrusive String Concatenation". Walter was one of the reviewers, so he's au
> fait with the technique.
>
> I suggest something similar can be done for D, by transcribing ~ sequences into
> calls to an underlying implementation class/API. It wouldn't be hard to do, and
> the restriction would just be the same for Java and C++ (using my
> fast_string_concatenator<>), in that it would only work for a single statement.
> Not that that's a particularly onerous restriction, of course.
>
> The alternative is just to have a StringBuilder class, which doesn't seem too
> much of a burden either.

I thought the point of adding strings (i.e. char[]) to D was to avoid having other String classes?

Regan
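On the question of an extra slice member: for what it's worth, present-day D2 ended up without one. A slice is still just a length and a pointer; the runtime instead records how much of each GC block is in use, so a slice that stops short of that point (a "view") reports `capacity == 0`, and appending to it copies rather than overwriting the owner's elements. A sketch assuming a D2 compiler; `viewAppendCopies` is a hypothetical name.

```d
// A dynamic array (slice) is two words: length and pointer.
static assert((int[]).sizeof == 2 * size_t.sizeof);

/// Hypothetical helper: appends to a sub-slice and checks the owner survives.
bool viewAppendCopies()
{
    int[] owner = new int[10];   // covers the whole in-use part of its block
    int[] view = owner[2 .. 5];  // stops short of the in-use end

    assert(owner.capacity >= 10);
    assert(view.capacity == 0);  // appending in place would clobber owner[5]

    view ~= 42;                  // so the runtime copies view to a new block
    return owner[5] == 0 && view == [0, 0, 0, 42];
}

unittest
{
    assert(viewAppendCopies());
}
```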

June 15, 2004 Re: string concatenation idea.
Posted in reply to J Anderson

On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:
> I think this would work:
> char[] p = "regan";
> p.length = 10;
> p.length = 5;
> p ~= "fred";
>
> as D doesn't clean up the memory straight away.

How about ...

    char[] p = "regan";
    p.length = 10;
    p[5..9] = "fred";

--
Derek
Melbourne, Australia
15/Jun/04 1:04:56 PM
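Derek's version can be checked on present-day D2 (an assumption; `.dup` is needed there because string literals are immutable). The slice assignment writes "fred" into place, but the array keeps the length it was given, so the caller still has to remember how much was written and trim afterwards: the kind of manual bookkeeping the original post hoped to avoid. The unused slot holds `char.init`, which in D is `0xFF` rather than `'\0'`. `presizeAndAssign` is a hypothetical name.

```d
/// Hypothetical helper reproducing Derek's snippet in D2.
char[] presizeAndAssign()
{
    char[] p = "regan".dup;
    p.length = 10;              // new slots are filled with char.init (0xFF)
    p[5 .. 9] = "fred";         // slice assignment: both sides are 4 chars long
    assert(p.length == 10);     // length is still 10
    assert(p[9] == char.init);  // the spare slot was never written
    return p[0 .. 9];           // trim to what was actually written
}

unittest
{
    assert(presizeAndAssign() == "reganfred");
}
```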

June 15, 2004 Re: string concatenation idea.
Posted in reply to Derek Parnell

Derek Parnell wrote:
>How about ...
>
> char[] p = "regan";
> p.length = 10;
> p[5..9] = "fred";
>
I didn't think of that, neat.

June 15, 2004 Re: string concatenation idea.
Posted in reply to Derek Parnell

On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek@psych.ward> wrote:
> How about ...
>
> char[] p = "regan";
> p.length = 10;
> p[5..9] = "fred";

This works, but see the other thread "string performance issues" for an example of the real problem.

Regan

June 15, 2004 Re: string concatenation idea.
Posted in reply to J Anderson

On Tue, 15 Jun 2004 11:15:16 +0800, J Anderson <REMOVEanderson@badmama.com.au> wrote:
> Derek Parnell wrote:
>> How about ...
>>
>> char[] p = "regan";
>> p.length = 10;
>> p[5..9] = "fred";
>>
> I didn't think of that, neat.

Yes, but not quite as neat as it could be; see the real problem in the thread "string performance issues" by "Daniel Horn".

Regan.

June 15, 2004 Re: string concatenation idea.
Posted in reply to Regan Heath

Regan Heath wrote:
> On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek@psych.ward> wrote:
>> How about ...
>>
>> char[] p = "regan";
>> p.length = 10;
>> p[5..9] = "fred";
>
> This works, but see the other thread "string performance issues" for an example of the real problem.
>
> Regan

Personally I'd use block allocation for a problem like that. That's what you'd do in C. Then afterwards you simply trim the array to the size you really want (or use length = block; length = 0; beforehand).

I don't think strings should be treated as a special type of array. Whatever applies to char arrays should also apply to every other type of array, except for the automatic conversion to zero-terminated arrays, of course.

Adding a reserve property could increase the string overhead, unless all it did was:

    template reserveT(T)
    {
        void reserve(inout char [] array, uint length)
        {
            int oldlen = array.length;
            array.length = length;
            array.length = oldlen;
        }
    }

    alias reserveT!(char).reserve reserve;

Hey, what do you know - I just solved your problem *grin*.

Now you can write:

    array.reserve(10);
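The block-allocation approach described above can be sketched as follows, assuming a D2 compiler. `blockAppend` is a hypothetical helper, not something from the thread: it keeps a separate count of chars written, grows the buffer a whole block at a time when it runs out of room, and leaves the final trim to the caller.

```d
/// Hypothetical helper: appends `piece` at buf[used .. $], growing `buf`
/// in multiples of `block` chars when it runs out of room.
void blockAppend(ref char[] buf, ref size_t used, const(char)[] piece,
                 size_t block = 64)
{
    immutable needed = used + piece.length;
    if (needed > buf.length)
        buf.length = (needed + block - 1) / block * block; // round up to a block multiple
    buf[used .. needed] = piece[];
    used = needed;
}

unittest
{
    char[] buf;
    size_t used;
    foreach (piece; ["regan", "fred"])
        blockAppend(buf, used, piece);
    assert(buf.length == 64);   // one block was enough
    buf = buf[0 .. used];       // trim afterwards
    assert(buf == "reganfred");
}
```

The block size trades memory for fewer reallocations; a doubling policy (as `Appender` uses) is the usual alternative when the final size is unknown.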

June 15, 2004 Re: string concatenation idea.
Posted in reply to J Anderson

"J Anderson" <REMOVEanderson@badmama.com.au> wrote in message news:calte9$fm0$1@digitaldaemon.com...
> Adding a reserve property could increase the string overhead, unless all it did was:
>
> template reserveT(T)
> {
>     void reserve(inout char [] array, uint length)

Did you mean to write:

    void reserve(inout T [] array, uint length)

:)

>     {
>         int oldlen = array.length;
>         array.length = length;
>         array.length = oldlen;
>     }
> }
>
> alias reserveT!(char).reserve reserve;
>
> Hey, what do you know - I just solved your problem *grin*.
>
> Now you can write:
>
> array.reserve(10);
>

Nice :)
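With the element type fixed as suggested above, the helper becomes generic. On present-day D2 it also needs `assumeSafeAppend` after shrinking, because the runtime otherwise treats the spare elements as still in use; D2's druntime additionally provides a built-in `reserve` that does this job directly. The sketch assumes a D2 compiler and uses the hypothetical name `reserveByResizing` to avoid clashing with the built-in; `ref` replaces D1's `inout` parameter storage class, and `size_t` replaces `uint`.

```d
/// Hypothetical generic helper: grows the allocation behind `array` to hold
/// at least `length` elements without changing array.length.
void reserveByResizing(T)(ref T[] array, size_t length)
{
    immutable oldlen = array.length;
    if (length <= oldlen)
        return;                 // already big enough
    array.length = length;      // grow (may move to a bigger block)
    array.length = oldlen;      // shrink back; the block keeps its size
    array.assumeSafeAppend();   // D2: let later appends reuse the spare slots
}

unittest
{
    int[] a = [1, 2, 3];
    reserveByResizing(a, 100);
    assert(a == [1, 2, 3]);
    auto block = a.ptr;
    foreach (i; 0 .. 97)
        a ~= i;                 // 97 appends, no reallocation expected
    assert(a.length == 100);
    assert(a.ptr == block);

    int[] b = [1, 2, 3];
    b.reserve(100);             // the druntime built-in, for comparison
    assert(b.capacity >= 100);
}
```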
Copyright © 1999-2021 by the D Language Foundation