string concatenation idea. (page 3)

In article <cambih$16cs$1@digitaldaemon.com>, Hauke Duden says...
>
>I don't think it means that. It has already been mentioned that
>
>s.reserve(x)
>
>is the same as
>
>s.length=x;
>s.length=oldLength;
>
>The compiler could simply rewrite it.

I think it's more like:

> if (x > s.length)
> {
>     uint t = s.length;
>     s.length = x;
>     s.length = t;
> }

I'm not convinced, though, that this is defined behavior. It seems possible to me that future (or even third-party) D-compilers would be at liberty to NOT make this equivalence, unless it were actually part of the spec.

Also, it seems to me that a really supercharged optimizer might decide that the above code leaves everything unchanged, and so could be removed altogether. (Optimizers don't always have the same notion as we coders of what is intended and what is merely a side-effect).

Finally, note that you can do:

> s = s[0..0];

instead of:

> s = null;

To set a string's length to zero without freeing the allocated memory. But again, not only is this not documented, but the documentation actually implies that this SHOULDN'T work. There is debate over whether this is a feature or a bug.

Arcane Jill

PS. It occurs to me that the statement:

char[] s = a ~ b ~ c ~ d ~ e ~ f ~ g ~ h;

would run a lot faster if it were rewritten as:

((((((((s ~= a) ~= b) ~= c) ~= d) ~= e) ~= f) ~= g) ~= h);

(Sorry about the brackets - couldn't remember the associativity order of ~=).
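For readers on a current compiler, the append-in-sequence idea from the PS can be sketched as below. This is only a sketch under assumptions: it relies on the built-in `reserve` array function that later D2 runtimes provide (it did not exist when this thread was written), and `joinAll` is a hypothetical helper name, not a Phobos function. It makes no claim about how either form performed on the 2004 compiler.

```d
// Hypothetical helper (not part of Phobos): reserves the total length once,
// then appends each piece in order.
char[] joinAll(const(char)[][] parts...)
{
    size_t total = 0;
    foreach (p; parts)
        total += p.length;

    char[] s;
    s.reserve(total);   // requests capacity only; s.length is still 0
    foreach (p; parts)
        s ~= p;         // appends into the reserved block
    return s;
}

void main()
{
    assert(joinAll("ab", "cd", "e") == "abcde");
}
```

Summing the lengths first means the appends in the loop should not need to grow the array, which is the effect the reserve proposal in this thread is after.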

Arcane Jill wrote:
> In article <cambih$16cs$1@digitaldaemon.com>, Hauke Duden says...
>
>>I don't think it means that. It has already been mentioned that
>>
>>s.reserve(x)
>>
>>is the same as
>>
>>s.length=x;
>>s.length=oldLength;
>>
>>The compiler could simply rewrite it.
>
> I think it's more like:
>
>>if (x > s.length)
>>{
>>    uint t = s.length;
>>    s.length = x;
>>    s.length = t;
>>}
>
> I'm not convinced, though, that this is defined behavior. It seems possible to
> me that future (or even third-party) D-compilers would be at liberty to NOT make
> this equivalence, unless it were actually part of the spec.

I agree that it should be that way. But Walter himself has recommended the length-juggling technique in this newsgroup.

I don't like it either. That's why I think there should be a reserve property. The compiler is the best choice for deciding how it should be implemented. If arrays can never shrink fine, it can be implemented by changing the length property. Otherwise the compiler has to figure out the best way, depending on the implementation.

If there is no reserve property then people will simply use the undocumented behaviour because it's the simplest way.

Hauke
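As a point of comparison, later D2 runtimes do expose something close to the reserve property argued for here. The following is a minimal sketch assuming such a runtime, using the built-in `reserve` function and `capacity` property; `appendsStayInPlace` is an illustrative name introduced for this example only.

```d
// Returns true when n appends after reserve(n) never moved the array.
// Assumes a D2 runtime with the built-in reserve/capacity array primitives.
bool appendsStayInPlace(size_t n)
{
    char[] s;
    s.reserve(n);             // request room for n elements
    assert(s.length == 0);    // the length is untouched, unlike s.length = n
    assert(s.capacity >= n);  // the runtime may round the request up

    const start = s.ptr;
    foreach (i; 0 .. n)
        s ~= 'x';             // stays within the reserved capacity
    return s.length == n && s.ptr is start;
}

void main()
{
    assert(appendsStayInPlace(1000));
}
```

Because the reservation is an explicit, documented operation, nothing here depends on how the implementation treats a shrinking `length`.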

June 16, 2004

Re: string concatenation idea.

Posted by J Anderson
in reply to Regan Heath


Regan Heath wrote:

> On Tue, 15 Jun 2004 15:12:59 +0800, J Anderson <REMOVEanderson@badmama.com.au> wrote:
>
>> Regan Heath wrote:
>>
>>> Yes, the poster was simply stating that it should be easier than that in D, after all that is the point of having a concatenation operator.
>>>
>>>> Then afterwards you simply trim the array to the size you really want (or use length = block; length = 0; beforehand).
>>>>
>>>> I don't think strings should be treated as a specific type of array.
>>>
>>>
>>>
>>> Neither do I, nor was I suggesting that. I want this property on all arrays, the example I gave used a char[] as that is what the original poster wanted it for.
>>>
>>>> Whatever applies to char array should also apply to every other type of array, except for the automatic conversion to zero terminate arrays of course.
>>>
>>>
>>>
>>> Agreed.
>>>
>>>> Adding a reserve property could increase the string overhead, unless all it did was:
>>>>
>>>> template reserveT(T)
>>>> {
>>>>    // Grow to the requested length, then restore the old length,
>>>>    // leaving the extra memory allocated.
>>>>    void reserve(inout T[] array, uint length)
>>>>    {
>>>>        uint oldlen = array.length;
>>>>        array.length = length;
>>>>        array.length = oldlen;
>>>>    }
>>>> }
>>>>
>>>> alias reserveT!(char).reserve reserve;
>>>>
>>>> Hey, what do you know - I just solved your problem *grin*.
>>>>
>>>> Now you can write:
>>>>
>>>> array.reserve(10);
>>>
>>>
>>>
>>> This should work. :)
>>>
>>> So why not add this to arrays then?
>>>
>>> Let me re-iterate what I am suggesting...
>>>
>>> I think all arrays should have a property which you can use to specify the # of elements worth of memory for it to grab right now, this should be different to the length property which specifies the number of elements in the array.
>>
>>
>> You can make your own class to do that.
>
>
> I know I can, I have.
>
>> The reason against that approach is that it requires more overhead (requiring both memory and CPU).
>
>
> I don't think so. For example, the current array behaviour if you write
>
> a.length = 1000;
> a.length = 1;
>
> is to allocate space for 1000 entries and not free it.
>
> Meaning the array already stores (or calculates) its size in memory in a separate place to the length.
>
> Meaning all the reserve property needs to do is the same thing the above 2 lines already do.
>
> With one possible exception: I am not sure when/if the array frees the unused memory, so it's possible that after those lines, sometime in the middle of 1000 or so concatenation ops, it decides to free the excess and I'd be back at square one (almost).

I already mentioned that way of doing it without overhead.  After some time though, that memory will be re-claimed, which I don't mind.

>> It's not too hard to write an array class that has a reserve (10 lines or so), but it would be much harder the other way round.
>>
>>> Example..
>>>
>>> char[] p;
>>> p.length = 10;
>>>
>>> the above adds 10 elements to the array, initializing them to the default value.
>>> This is the current behaviour.
>>>
>>> char[] p;
>>> p.reserve = 10;
>>>
>>> the above grabs enough memory for 10 elements, but does not add/initialize any.
>>
>>
>> Initialization is pretty cheap for arrays.  That is, initialization only takes a fraction of the time required to allocate the memory for the array.  So whatever time is saved on initialization, the user will most probably not notice it.
>
>
> Everything counts if you do it enough times. Why do something you do not need?

Because the more rules you have the more complex things become.

-- 
-Anderson: http://badmama.com.au/~anderson/

In article <caovi5$23qt$1@digitaldaemon.com>, Hauke Duden says...

>But Walter himself has recommended the length-juggling technique in this newsgroup.

I haven't heard Walter say that myself, but if he did, then he should know better. Nobody should EVER rely on undocumented behavior, and it would be extremely bad practice to encourage others so to do.

What Walter says on this newsgroup does not constitute the D standard. Only the published manual may make that claim. Every time the DMD compiler deviates from the documentation, it ceases to be standards-compliant. (Of course, Walter has the power to change the standard as well as the implementation, but he still has to actually DO it, not just talk about doing it).

There is nothing to stop anyone from writing a second D compiler tomorrow, and I do not wish to go back to the bad old days of BASIC, when every implementation of BASIC was entirely non-portable to any other because of various stupid little quirks and features unique to each.

If it were officially documented that decreasing the length of an array is guaranteed not to change the address of the string, and is further guaranteed not to release any of the memory within the array's former bounds, then, and only then, could this trick be expected to work on all present and future standards-compliant D compilers.

Arcane Jill

(PS. Walter - sorry if this sounds rude. That wasn't my intention, I just come across that way sometimes by accident).

Arcane Jill wrote:

>In article <caovi5$23qt$1@digitaldaemon.com>, Hauke Duden says...
>
>>But Walter himself has recommended the length-juggling technique in this newsgroup.
>
>I haven't heard Walter say that myself, but if he did, then he should know
>better. Nobody should EVER rely on undocumented behavior, and it would be
>extremely bad practice to encourage others so to do.
>
>What Walter says on this newsgroup does not constitute the D standard. Only the
>published manual may make that claim. Every time the DMD compiler deviates from
>the documentation, it ceases to be standards-compliant. (Of course, Walter has
>the power to change the standard as well as the implementation, but he still has
>to actually DO it, not just talk about doing it).
>
>There is nothing to stop anyone from writing a second D compiler tomorrow, and I
>do not wish to go back to the bad old days of BASIC, when every implementation
>of BASIC was entirely non-portable to any other because of various stupid little
>quirks and features unique to each.
>
>If it were officially documented that decreasing the length of an array is
>guaranteed not to change the address of the string, and is further guaranteed
>not to release any of the memory within the array's former bounds, then, and
>only then, could this trick be expected to work on all present and future
>standards-compliant D compilers.
>
>Arcane Jill
>
>(PS. Walter - sorry if this sounds rude. That wasn't my intention, I just come
>across that way sometimes by accident).

Good points. That is why I said reserve should go into phobos so vendors can optimise their own and users can also write their own versions.

-- 
-Anderson: http://badmama.com.au/~anderson/

Arcane Jill wrote:
> In article <cambih$16cs$1@digitaldaemon.com>, Hauke Duden says...
>>
>>I don't think it means that. It has already been mentioned that
>>
>>s.reserve(x)
>>
>>is the same as
>>
>>s.length=x;
>>s.length=oldLength;
>>
>>The compiler could simply rewrite it.
>
> I think it's more like:
>
>> if (x > s.length)
>> {
>>     uint t = s.length;
>>     s.length = x;
>>     s.length = t;
>> }
>
> I'm not convinced, though, that this is defined behavior. It seems possible to me that future (or even third-party) D-compilers would be at liberty to NOT make this equivalence, unless it were actually part of the spec.
>
> Also, it seems to me that a really supercharged optimizer might decide that the above code leaves everything unchanged, and so could be removed altogether. (Optimizers don't always have the same notion as we coders of what is intended and what is merely a side-effect).
>
> Finally, note that you can do:
>
>> s = s[0..0];
>
> instead of:
>
>> s = null;
>
> To set a string's length to zero without freeing the allocated memory. But again, not only is this not documented, but the documentation actually implies that this SHOULDN'T work. There is debate over whether this is a feature or a bug.

The length-juggling trick also has limitations: setting the length to 0 or from 0 will wipe out the data ptr. Look in src/internal/gc/gc.d the function _d_arraysetlength. For example

s.length = 10; // reserve some space
s.length = 0;  // sets data ptr to null
s.length = 10; // reserve some more space
s = s[0..0];   // preserves the data ptr
s.length = 5;  // allocates new data ptr

> Arcane Jill
>
>
> PS. It occurs to me that the statement:
>
> char[] s = a ~ b ~ c ~ d ~ e ~ f ~ g ~ h;
>
> would run a lot faster if it were rewritten as:
>
> ((((((((s ~= a) ~= b) ~= c) ~= d) ~= e) ~= f) ~= g) ~= h);
>
> (Sorry about the brackets - couldn't remember the associativity order of
> ~=).
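The worry that the trick depends on unspecified behaviour can be illustrated on a later D2 runtime, where shrinking an array keeps the memory block but, as far as I understand the runtime, no longer lets the shortened slice grow in place until the program says so explicitly. The sketch below assumes such a runtime and its built-in `capacity` and `assumeSafeAppend`; `capacityAroundShrink` is an illustrative name, and none of this describes the 2004 compiler discussed in the thread.

```d
// Grows an array to `big` elements, shrinks it to one element, and returns
// the capacity seen before and after calling assumeSafeAppend.
// Assumes a D2 runtime; the function name is illustrative only.
size_t[2] capacityAroundShrink(size_t big)
{
    int[] a;
    a.length = big;
    a.length = 1;              // length-juggling: the block itself is kept

    const before = a.capacity; // 0 means "cannot be appended to in place"
    a.assumeSafeAppend();      // promise that nothing else uses the tail
    const after = a.capacity;
    return [before, after];
}

void main()
{
    auto r = capacityAroundShrink(1000);
    assert(r[0] == 0);     // after the shrink, an append would reallocate
    assert(r[1] >= 1000);  // after assumeSafeAppend, the old room is usable
}
```

In other words, on such a runtime the two-assignment idiom alone does not reserve anything for later appends, which is the kind of implementation change the posts above warn about.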
