Thread overview
string performance issues
Jun 14, 2004
Daniel Horn
Jun 14, 2004
Ben Hinkle
Jun 14, 2004
Daniel Horn
Jun 14, 2004
Regan Heath
Jun 14, 2004
Daniel Horn
Jun 14, 2004
Regan Heath
Jun 15, 2004
Daniel Horn
Jun 15, 2004
Regan Heath
Jun 15, 2004
Regan Heath
Jun 15, 2004
Derek Parnell
Jun 15, 2004
Regan Heath
Jun 15, 2004
Daniel Horn
Re: string performance issues: a short thought
Jun 15, 2004
Daniel Horn
String contatenation/performance
Jun 15, 2004
Arcane Jill
Jun 15, 2004
Sean Kelly
Jun 15, 2004
Vathix
Jun 15, 2004
Matthew
Jun 15, 2004
Ben Hinkle
Jun 15, 2004
Walter
Jun 15, 2004
Ivan Senji
June 14, 2004
I'm writing a program which spits out an .obj file.
I'm doing

char[] out;
for (int i=0;i<mvert;i++)
   out~="v"~" "~ftoa(x[i])~" "~ftoa(y[i])~" "~ftoa(z[i])~"\n";

for (int i=0;i<mface;i++)
   out~="f"~" "~itoa(a[i])~" "~itoa(b[i])~" "~itoa(c[i])~"\n";

The performance is simply abysmal: writing a 50 meg file takes upwards of 2 minutes.

Is there any way to make this fast without sacrificing readability? Part of the problem seems to be the realloc per face (instead of intelligently doubling the RAM allocated), but I also suspect that allocating so many small strings with ftoa and itoa isn't helping.
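
For later readers, here is a minimal sketch of the same two loops written against an appending buffer that grows geometrically, so reallocation cost is amortized instead of paid per line. It assumes present-day Phobos (`std.array.appender` and `std.format.formattedWrite`), which was not available in this form when the thread was written. The name `buildObj` and the guess of 32 bytes per line are made up for illustration.

```d
import std.array : appender;
import std.format : formattedWrite;

// Hypothetical helper: builds the whole .obj text in memory.
string buildObj(const double[] x, const double[] y, const double[] z,
                const int[] a, const int[] b, const int[] c)
{
    auto sink = appender!string();
    // Rough guess of 32 bytes per line; only a hint, the appender still grows as needed.
    sink.reserve((x.length + a.length) * 32);

    // %s is resolved from the argument type, so changing double to real needs no edit here.
    foreach (i; 0 .. x.length)
        sink.formattedWrite("v %s %s %s\n", x[i], y[i], z[i]);
    foreach (i; 0 .. a.length)
        sink.formattedWrite("f %s %s %s\n", a[i], b[i], c[i]);

    return sink.data;
}
```

The formatted pieces are written straight into the sink, so no temporary string is allocated per number, which is the second cost suspected above.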
June 14, 2004
Instead of building one huge string in memory how about processing line by line:

 char[128] out; // 128 is max str len
 for (int i=0;i<mface;i++) {
    sprintf(out,"f %d %d %d\n",a[i],b[i],c[i]);
    ... do something with out ...
 }

If the end result is going to a file the temporary buffer might not even be needed - the printf can go right to the file.

-Ben
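
A sketch of the "go right to the file" variant, assuming present-day `std.stdio.File` rather than the C `FILE*` of the time. `writeFaces` is a hypothetical name, and the file is expected to be opened by the caller.

```d
import std.stdio : File;

// Hypothetical helper: streams face records directly to an open file,
// so no large string is ever built in memory.
void writeFaces(File f, const int[] a, const int[] b, const int[] c)
{
    foreach (i; 0 .. a.length)
        f.writefln("f %d %d %d", a[i], b[i], c[i]);
}
```

`File` buffers its output, so each `writefln` call does not turn into a separate system call.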


June 14, 2004
That's the code I was expecting to see and exactly the code I was wishing to avoid:
a) what if the type changes (double to real and suddenly string takes too much memory and buffer overruns)
b) not type safe (what if I say %d but pass in a float)
c) you still have to realloc every face
d) sprintf isn't part of D--it's a nasty hanging chad from C...
I'd like to see a clean solution in D entirely

June 14, 2004
What about...

f = fopen("file.txt","w");
if (!f) ..barf..

for (int i = 0; i < mvert; i++) {
	fprintf(f,"f %s %s %s\n",toString(x[i]),toString(y[i]),toString(z[i]));
}

fclose(f);

June 14, 2004
That uses a string and is not typesafe
and what if I want to send it over the net.

Basically I want to do stringops internally :-) and I want to do it the "D" way.

I'm debating whether there should be a struct String that does the resizing appropriately so appends work fast, or a class String. I'm leaning towards a struct, since it would be a wrapper around char[] with an extra length field. That way I could size it dynamically (each overrun multiplying the allocated length by some constant >= 1.5). Walter, would this be a good idea? Or is there some magic you can pull so that assigning .length or appending won't call realloc or some other slow function?

I.e. does a string always hold exactly .length (rounded up to some constant mallocable size), or does it really double the length when you overrun, to get decent amortized performance out of things?
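
The struct idea could be sketched roughly as follows: a `char[]` whose `.length` serves as the capacity, plus a separate count of how much is in use. `GrowBuffer`, `put` and `data` are hypothetical names, the starting size of 16 is arbitrary, and the growth factor is 2 here (any constant >= 1.5 would do).

```d
// Hypothetical wrapper: a char[] plus a separate "used" count.
struct GrowBuffer
{
    private char[] store; // allocated space; store.length is the capacity
    private size_t used;  // number of chars actually written

    // Append s, doubling the capacity whenever it runs out.
    void put(const(char)[] s)
    {
        immutable need = used + s.length;
        if (need > store.length)
        {
            size_t cap = store.length ? store.length : 16;
            while (cap < need)
                cap *= 2;
            store.length = cap; // may move the data, but only O(log n) times overall
        }
        store[used .. need] = s[];
        used = need;
    }

    // The text written so far, without the unused tail.
    const(char)[] data() const
    {
        return store[0 .. used];
    }
}
```

Used as `GrowBuffer b; b.put("v "); b.put("1.5"); b.put("\n");`, after which `b.data` holds the line. The appends stay readable and the number of reallocations grows only with the logarithm of the final size.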


June 14, 2004
On Mon, 14 Jun 2004 16:29:06 -0700, Daniel Horn <hellcatv@hotmail.com> wrote:
> That uses a string

Yep. So did your example? What am I missing? pls explain..

> and is not typesafe

Why not? Change x from float[] to double[] and it still works; change it to int[] and it still works. Sorry, correction: the fprintf should have had %.*s in it, e.g.

fprintf(f,"f %.*s %.*s %.*s\n",toString(x[i]),toString(y[i]),toString(z[i]));

> and what if I want to send it over the net.

The f on the start of the line says it's a float, you chop the string on spaces and parse accordingly. Isn't that what the f is there for?

> Basically I want to do stringops internally :-) and I want to do it the "D" way.

I'm not sure I understand what you mean..

You can set the length (if you know what you need) and you can assign to a slice i.e.

char[] test = "a guy named jones walked down the street";
char[] foo = "regan";

test[12..17] = foo[];

So assuming your values always have a set length you can set the length of the string, then assign to the appropriate slices the data.

> I'm debating whether there should be a struct String that did the resizing appropriately so appends work fast... or a class String (I'm leaning towards struct since it would be a wrapper around char[] with an xtra length field)
> that way I could dynamically size it appropriately (each overrun multiplying allocated length by some constant >= 1.5)  walter would this be a good idea? or is there some magic you can pull so that when you assign .length or append it won't call realloc or some other slow function.

If you set the length and then append, the new data lands after the end of the newly set length, not after the old contents, e.g.

char[] test = "regan";

test.length = 10;
test ~= "fred";
printf("%d:= ",test.length);
foreach(char c; test)
	printf("%02x ",c);

outputs

14:= 72 65 67 61 6e 00 00 00 00 00 66 72 65 64
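
As the output shows, setting .length changes the logical length, so the append lands after the padding. If the aim is only to pre-size the buffer, present-day D has a built-in `reserve` that sets capacity aside without touching `.length`. A sketch, assuming a current D runtime (this was not available in 2004); `buildLines` and its fixed record are made up for illustration.

```d
// Hypothetical helper: appends `lines` copies of a short record.
char[] buildLines(size_t lines)
{
    char[] text;
    text.reserve(lines * 8);   // capacity only; text.length is still 0

    foreach (i; 0 .. lines)
        text ~= "f 1 2 3\n";   // 8 chars each, sized to fit the reserved space
    return text;
}
```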

> i.e. does a string always hold exactly .length (rounded up to some constant mallocable size) or does it really double the length when you overrun to aggregate decent performance out of things
June 15, 2004
I don't know the length of my numbers, so slice assignment is painful. Basically I'd need some sort of ftoa and itoa functions that avoid allocating memory (returning statically sized structs and lengths), then assign the result to the slice, keeping track of the last length to find where the next one starts.

Then I need a separate counter to see how much I've allocated. It's just a mess, and this is exactly what D was supposed to avoid. This is the old-fashioned "C buffer safe" way, and I'm not happy with it or I'd still be using C.

And the bottom line is that I don't want to print to a file, I want to keep it in a string. The code for concat is so clear, but why is it so slow?

PS: make sure to use toStringz when using printf with %s.  toString is not guaranteed to zero terminate.
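
The two safe ways of handing a D string to a C format function can be sketched side by side. This assumes present-day D, where `%.*s` needs the length passed explicitly as an `int` followed by the pointer; `viaStringz` and `viaPrecision` are hypothetical names, and both simply assert that the caller's buffer is large enough.

```d
import core.stdc.stdio : snprintf;
import std.string : toStringz;

// Formats s with %s; toStringz returns a zero-terminated version of s.
char[] viaStringz(char[] buf, const(char)[] s)
{
    immutable n = snprintf(buf.ptr, buf.length, "%s", toStringz(s));
    assert(n >= 0 && n < buf.length, "buffer too small for this sketch");
    return buf[0 .. n];
}

// Formats s with %.*s; the length travels with the pointer, no terminator needed.
char[] viaPrecision(char[] buf, const(char)[] s)
{
    immutable n = snprintf(buf.ptr, buf.length, "%.*s", cast(int) s.length, s.ptr);
    assert(n >= 0 && n < buf.length, "buffer too small for this sketch");
    return buf[0 .. n];
}
```

The difference matters for slices: a slice of a longer string is followed by the rest of that string, not by a zero byte, so a plain `%s` on its pointer would print too much.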
June 15, 2004
In article <calcc2$2nq5$1@digitaldaemon.com>, Daniel Horn says...
>
>i.e. does a string always hold exactly .length (rounded up to some constant mallocable size) or does it really double the length when you overrun to aggregate decent performance out of things

I've been meaning to ask this exact question :)  How much memory do dynamic arrays allocate when they grow and do they ever reallocate when they shrink?


Sean


June 15, 2004
"Sean Kelly" <sean@f4.ca> wrote in message news:calekm$2r2h$1@digitaldaemon.com...
> In article <calcc2$2nq5$1@digitaldaemon.com>, Daniel Horn says...
> >
> >i.e. does a string always hold exactly .length (rounded up to some constant mallocable size) or does it really double the length when you overrun to aggregate decent performance out of things
>
> I've been meaning to ask this exact question :)  How much memory do dynamic
> arrays allocate when they grow and do they ever reallocate when they shrink?
>

Actual allocations are the smallest power of 2 that holds the requested size. I don't think they reallocate when shrinking because you could have sliced that memory to use somewhere else.
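
The shrinking half of that answer can be checked with a small sketch: reducing `.length` only changes the length field, and an existing slice into the dropped part stays valid. This is written against present-day D, and `shrinkKeepsBlock` is a hypothetical name. It says nothing about the power-of-2 growth claim, which depends on the runtime.

```d
// Returns true if shrinking an array left it in the same memory block
// and an earlier slice of the dropped part is still usable.
bool shrinkKeepsBlock()
{
    auto arr = new char[64];
    auto before = arr.ptr;
    auto tail = arr[32 .. 64];  // a slice still referring to the upper half

    arr.length = 32;            // shrink: only the length field changes
    tail[0] = 'x';              // the memory behind the slice is still valid

    return arr.ptr is before && tail.length == 32 && tail[0] == 'x';
}
```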


June 15, 2004
Daniel Horn wrote:

> 
> That's the code I was expecting to see and exactly the code I was wishing to avoid:

oh well. you could have warned me :-)

> a) what if the type changes (double to real and suddenly string takes
> too much memory and buffer overruns)

another (possibly more common) case is switching to a template where the
type isn't known. Casting is a way out:
 sprintf(buf,"f %g\n",cast(double)a[i]);
If casting is too ugly then sprintf probably isn't the way to go.
If overflow is a concern then snprintf is an option. Now that I think about
it, how about a D wrapper around the printf family that takes a dynamic
array as the candidate output buffer? If the string fits in the array, it
fills it and returns the slice holding the result; otherwise it allocates
a dynamic array and fills that. It would probably be a few lines of snprintf
and array allocation. The declaration would be
 char[] sprintf(char[], char*, ...)
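
A rough sketch of such a wrapper, assuming present-day D and its `core.stdc.stdio.snprintf` binding. It is named `formatInto` here (a made-up name) to avoid clashing with the C function, and it uses a template parameter list instead of C-style `...` so the arguments can be forwarded twice. It is no more type-safe than snprintf itself: the arguments still have to match the format specifiers.

```d
import core.stdc.stdio : snprintf;

// Hypothetical name; formats into buf if the result fits, else into a fresh array.
char[] formatInto(Args...)(char[] buf, const(char)* fmt, Args args)
{
    // snprintf reports the length it needs, even when buf is too small.
    immutable n = snprintf(buf.ptr, buf.length, fmt, args);
    assert(n >= 0, "snprintf reported an encoding error");

    if (n < buf.length)
        return buf[0 .. n];          // fits: return a slice of the caller's buffer

    auto big = new char[n + 1];      // +1 for the terminating zero snprintf writes
    snprintf(big.ptr, big.length, fmt, args);
    return big[0 .. n];
}
```

In the common case the caller's stack buffer is reused and nothing is allocated; only an oversized line pays for a heap allocation.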

> b) not type safe (what if I say %d but pass in a float)

yup. true. as above casting is an option if that is a concern.

> c) you still have to realloc every face

I'm not exactly sure what you mean here but I'm now guessing you really do want to catenate all the strings up into one huge 50meg string in memory. Preallocation could help here.

> d) sprintf isn't part of D--it's a nasty hanging chad from C... I'd like to see a clean solution in D entirely

It is a matter of personal preference. I use C functions whenever it makes sense since I know them well and users reading my code will know them well.
