March 03, 2004 performance in std.stream | ||||
---|---|---|---|---|
| ||||
Right now std.stream.readLine takes no inputs and returns a char[] of the line read. Each time it gets called it builds a string using ~=. Can there be an API that lets you pass in an "inout char[]" argument that only gets resized if needed? That would take a burden off the GC when reading files line by line.
That way the existing readLine could be reimplemented as:
    char[] readLine()
    {
        char[] result;
        readLine(result);
        return result;
    }
    void readLine(inout char[] result)
    {
        [fill and only reallocate result if needed]
    }
The same would go for readLineW.
-Ben
|
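For readers on present-day D, here is a minimal sketch of the same split. It is not the std.stream API discussed in this thread: it assumes std.stdio's `File.readln` overload that takes a `ref char[]` buffer (`inout` parameters are spelled `ref` today). The names `longestLine`, `readLine` and the scratch file name are hypothetical, chosen only for illustration.

```d
import std.stdio;

// Buffer-reusing form: File.readln fills `buf`, enlarging it only if
// necessary, and returns the number of characters read (0 at end of file).
size_t longestLine(string path)
{
    auto f = File(path, "r");
    char[] buf;          // one buffer for the whole loop
    size_t longest;
    while (f.readln(buf) > 0)
    {
        if (buf.length > longest)
            longest = buf.length;   // length includes the line terminator
    }
    return longest;
}

// Allocating convenience form layered on top, as in the proposal above.
char[] readLine(File f)
{
    char[] result;
    f.readln(result);
    return result;
}

void main()
{
    enum path = "readline_demo.txt"; // hypothetical scratch file
    {
        auto w = File(path, "w");
        w.writeln("first");
        w.writeln("a somewhat longer second line");
        w.writeln("third");
    } // w is closed when it goes out of scope

    write(readLine(File(path, "r")));   // prints the first line
    writeln(longestLine(path));
}
```

Because the buffer is reused, a caller that wants to keep a line past the next call has to copy it (for example with `.dup`); that is the trade-off of this style.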
March 09, 2004 Re: performance in std.stream | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ben Hinkle | You're right, the performance of std.stream isn't good. Want to design a new one? <g> |
March 10, 2004 Re: performance in std.stream | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter | In article <c2j7a9$2kc5$1@digitaldaemon.com>, Walter says...
>
>You're right, the performance of std.stream isn't good. Want to design a new one? <g>
>
>
.. but the stacking of chars onto the string for readLine isn't the problem.
Giving it a preallocated buffer and indexing them in does not significantly improve the speed.
OTOH the stream.d that Andy Friesen wrote a while back as a template "bolt-ins" example (#22877) is much quicker - I think because it does read's (instead of getc's trying to manage ungetc also). It is a neat start, but is incomplete.
|
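To make the "read's instead of getc's" idea concrete, here is a small sketch in present-day D that scans a file in fixed-size chunks instead of one character at a time. It is not Andy's stream.d and not a benchmark; it assumes std.stdio's `File.byChunk`, and `countNewlines`, the chunk size and the file name are hypothetical.

```d
import std.stdio;

// Counts '\n' bytes by reading the file in fixed-size chunks.
size_t countNewlines(string path, size_t chunkSize = 4096)
{
    auto f = File(path, "rb");
    size_t count;
    // byChunk hands out the same buffer again for every chunk,
    // so the loop does one read call per chunk, not per character.
    foreach (ubyte[] chunk; f.byChunk(chunkSize))
        foreach (b; chunk)
            if (b == '\n')
                ++count;
    return count;
}

void main()
{
    enum path = "chunk_demo.txt"; // hypothetical scratch file
    {
        auto w = File(path, "wb");
        w.rawWrite("one\ntwo\nthree\n");
    }
    writeln(countNewlines(path));
}
```

Whether this is actually faster than a per-character loop depends on how much buffering the underlying library already does, which is the open question in the post above.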
March 10, 2004 Re: performance in std.stream | ||||
---|---|---|---|---|
| ||||
Posted in reply to larry cowan | larry cowan wrote:
> In article <c2j7a9$2kc5$1@digitaldaemon.com>, Walter says...
>
>>You're right, the performance of std.stream isn't good. Want to design a new
>>one? <g>
>>
>>
> .. but the stacking of chars onto the string for readLine isn't the problem.
>
> Giving it a preallocated buffer and indexing them in does not significantly
> improve the speed.
>
> OTOH the stream.d that Andy Friesen wrote a while back as a template "bolt-ins"
> example (#22877) is much quicker - I think because it does read's (instead of
> getc's trying to manage ungetc also). It is a neat start, but is incomplete.
The biggest disadvantage to the bolt-in approach is that it's inherently dependent on templates. In C++ land this would translate to horrendous compile times; here in D land it means occasional linker weirdness unless you link stream.obj explicitly.
Also, what's missing? I tend to suffer from blindness to hypothetical needs other than my own. :)
-- andy
|
March 10, 2004 Re: performance in std.stream | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andy Friesen | In article <c2m5kb$21oj$1@digitaldaemon.com>, Andy Friesen says...
>
>larry cowan wrote:
>> In article <c2j7a9$2kc5$1@digitaldaemon.com>, Walter says...
>>
>>>You're right, the performance of std.stream isn't good. Want to design a new one? <g>
>>>
>>>
>> .. but the stacking of chars onto the string for readLine isn't the problem.
>>
>> Giving it a preallocated buffer and indexing them in does not significantly improve the speed.
>>
>> OTOH the stream.d that Andy Friesen wrote a while back as a template "bolt-ins" example (#22877) is much quicker - I think because it does read's (instead of getc's trying to manage ungetc also). It is a neat start, but is incomplete.
>
>The biggest disadvantage to the bolt-in approach is that it's inherently dependent on templates. In C++ land this would translate to horrendous compile times; here in D land it means occasional linker weirdness unless you link stream.obj explicitly.
>
>Also, what's missing? I tend to suffer from blindness to hypothetical needs other than my own. :)
>
> -- andy
The writeWChar() and writeDChar() methods won't compile.
It really needs overloaded this() in read and write to specify options for file open.
An open() would be nice for reopening after closing, but it's not really needed when you can just instantiate it again easily - why allow explicit close?
Could a read/write be built for this? Or is this what the R and W inheritance (this(File*...)) is to support? Haven't tried to do anything with this, but there are seek requirements for switching back and forth if a read/write open is used (could be automated, but only for the common cases).
|
March 10, 2004 Re: performance in std.stream | ||||
---|---|---|---|---|
| ||||
Posted in reply to larry cowan | larry cowan wrote:
> The writeWChar() and writeDChar() methods won't compile.
Crazy. I'll take a look.
> It really needs overloaded this() in read and write to specify options for file
> open.
What sort of options? Text/binary, append, and the like?
> An open() would be nice for reopening after closing, but it's not really needed
> when you can just instantiate it again easily - why allow explicit close?.
close() is necessary because D's garbage collector isn't deterministic. The destructor can clean up if you forget, but it's not a good idea to have an open file descriptor lying around like that.
> Could a read/write be built for this? Or is this what the R and W inheritance
> (this(File*...)) is to support? Haven't tried to do anything with this, but
> there are seek requirements for switching back and forth if a read/write open is
> used (could be automated, but only for the common cases).
I hadn't considered simultaneous read/write; I doubt it will work.
-- andy
|
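The point about explicit close() can be sketched in present-day D with a scope guard, which releases the handle at a known point instead of waiting for the collector. `DemoStream` is a hypothetical stand-in for a class-based stream, not the stream.d under discussion, and `scope(exit)` is a language feature that postdates this thread.

```d
import std.stdio;

// Hypothetical stand-in for a stream class that owns an OS file handle.
class DemoStream
{
    private File file;
    bool isOpen;

    this(string path, string mode)
    {
        file = File(path, mode);
        isOpen = true;
    }

    // Explicit, deterministic release; safe to call more than once.
    void close()
    {
        if (isOpen)
        {
            file.close();
            isOpen = false;
        }
    }
}

DemoStream writeGreeting(string path)
{
    auto s = new DemoStream(path, "w");
    scope(exit) s.close();   // runs when this scope ends, even on an exception
    s.file.writeln("hello");
    return s;                // the guard closes the file on the way out
}

void main()
{
    auto s = writeGreeting("close_demo.txt"); // hypothetical scratch file
    writeln(s.isOpen);   // the object is still alive, but the handle is already closed
}
```

The object itself stays around until the collector gets to it; only the file handle is released early, which is the part that matters for descriptor limits and for reopening the file.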
March 10, 2004 Re: performance in std.stream | ||||
---|---|---|---|---|
| ||||
Posted in reply to larry cowan | On Wed, 10 Mar 2004 04:12:42 +0000 (UTC), larry cowan <larry_member@pathlink.com> wrote:
>In article <c2j7a9$2kc5$1@digitaldaemon.com>, Walter says...
>>
>>You're right, the performance of std.stream isn't good. Want to design a new one? <g>
>>
>>
>.. but the stacking of chars onto the string for readLine isn't the problem.
>
>Giving it a preallocated buffer and indexing them in does not significantly improve the speed.
Yeah - there are probably bigger performance issues and I haven't actually tested any changes. But in general creating APIs that generate lots of garbage to collect is a performance problem waiting to happen.
>OTOH the stream.d that Andy Friesen wrote a while back as a template "bolt-ins" example (#22877) is much quicker - I think because it does read's (instead of getc's trying to manage ungetc also). It is a neat start, but is incomplete.
I wasn't aware of Andy's stream.d. I'm downloading "apropos" from http://ikagames.com/andy/d/ now since it looks like that is where I can find it.
-Ben
|
March 10, 2004 Re: performance in std.stream | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andy Friesen | In article <c2mcd0$2ebl$1@digitaldaemon.com>, Andy Friesen says...
>
>larry cowan wrote:
>
>> The writeWChar() and writeDChar() methods won't compile.
>
>Crazy. I'll take a look.
streams.d(529): function toString overloads char[](char c) and char[](creal r) both match argument list for toString
>
>> It really needs overloaded this() in read and write to specify options for file
>> open.
>
>What sort of options? Text/binary, append, and the like?
Yeah, the normal filemodes. And rw+ as indicated below... And rb and wb for windoughs.
>
>> An open() would be nice for reopening after closing, but it's not really needed when you can just instantiate it again easily - why allow explicit close?.
>
>close() is necessary because D's garbage collector isn't deterministic.
> The destructor can clean up if you forget, but it's not a good idea to
>have an open file descriptor lying around like that.
Ok, reasonable, but then you should be able to reopen it (not necessarily in the same mode).
>
>> Could a read/write be built for this? Or is this what the R and W inheritance
>> (this(File*...)) is to support? Haven't tried to do anything with this, but
>> there are seek requirements for switching back and forth if a read/write open is
>> used (could be automated, but only for the common cases).
>
>I hadn't considered simultaneous read/write; I doubt it will work.
>
Not simultaneous - just on the same open(). You switch back and forth using a seek (maybe to current point + 0L) to clear the pointer mode and then can issue an opposite mode command, e.g., read, seek, overwrite, seek, read, read, seek-back, overwrite, overwrite, ... . This is used to maintain a work file of current events, or a fixed-size-rec database file. You can do the same with two open file pointers (read & write to same file), but BSD Unix has supported this for over 20 years - and I think it came from even earlier at AT&T.
You could support smart mode-switching, but usually you want to move the pointer back to the start of the record you just read, or to eof, or to front of file (to update a status rec of some kind), so that's not really useful.
> -- andy
|
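The read, seek, overwrite pattern described above can be sketched in present-day D on top of std.stdio's `File`, which wraps a C stdio stream. In C stdio the update modes are spelled "r+", "w+" and "a+" (with an optional "b"), and a positioning call is required when switching from reading to writing on the same stream. `replaceRecord`, `recordSize` and the file name are hypothetical; this is an illustration under those assumptions, not part of std.stream.

```d
import std.stdio;
import std.file : readText;

enum recordSize = 8; // hypothetical fixed record length in bytes

// Reads record `index`, then overwrites it in place through the same handle.
// Returns the record as it was before the update.
char[] replaceRecord(string path, size_t index, const(char)[] replacement)
{
    assert(replacement.length == recordSize);
    auto f = File(path, "r+b");          // read and write, no truncation
    immutable long offset = cast(long)(index * recordSize);

    f.seek(offset);                      // position on the record
    auto old = new char[recordSize];
    auto got = f.rawRead(old);           // the read leaves the position after the record
    assert(got.length == recordSize);

    f.seek(offset);                      // seek back: switches the stream from reading to writing
    f.rawWrite(replacement);
    return old;
}

void main()
{
    enum path = "records_demo.bin"; // hypothetical scratch file
    {
        auto w = File(path, "wb");
        w.rawWrite("AAAAAAAABBBBBBBBCCCCCCCC");
    }
    auto old = replaceRecord(path, 1, "bbbbbbbb");
    writeln(old);             // the record before the update
    writeln(readText(path));  // the file after the in-place overwrite
}
```

Only the eight bytes of the chosen record are rewritten; the rest of the file is never copied, which is the property that matters for the fixed-size-record use case.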
March 10, 2004 Re: performance in std.stream | ||||
---|---|---|---|---|
| ||||
Posted in reply to larry cowan | larry cowan wrote:
> In article <c2mcd0$2ebl$1@digitaldaemon.com>, Andy Friesen says...
>
>>larry cowan wrote:
>>...
>>>Could a read/write be built for this? Or is this what the R and W inheritance
>>>(this(File*...)) is to support? Haven't tried to do anything with this, but
>>>there are seek requirements for switching back and forth if a read/write open is
>>>used (could be automated, but only for the common cases).
>>
>>I hadn't considered simultaneous read/write; I doubt it will work.
>
> Not simultaneous - just on the same open(). You switch back and forth using
> a seek ( maybe to current point + 0L ) to clear the pointer mode and then can
> issue an opposite mode command, e.g., read, seek, overwrite, seek, read, read,
> seek-back, overwrite, overwrite, ... . This is used to maintain a work file
> of current events, or a fixed-size-rec database file. You can do the same with
> two open file pointers (read & write to same file), but bsd unix has supported
> this for over 20 years - and I think it came from even earlier at AT&T. You
> could support smart mode-switching, but usually you want to move the pointer
> back to the start of the record you just read, or to eof, or to front of file
> (to update a status rec of some kind), so that's not really useful.
>
>> -- andy
Yes. Being able to do block-oriented read-write to the same file is very important. I'm not sure whether it should be a part of std.stream, or whether it should be in a separate package, but it's really quite important. Without that capability one can, to take an extreme example, end up copying an entire database for a simple one-character change.
|
March 10, 2004 Re: performance in std.stream | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ben Hinkle | I've been building a multi-layer IO package for DSC (D Servlet Container) that's fully buffered and generates zero garbage. The layers include:
- raw byte-array style I/O
- informal text-oriented Tokenizer I/O with text/binary conversion
- structured binary I/O (directly to/from variables) with optional endian flipping
- structured text-oriented input with data conversion
- structured class serialization/deserialization framework
It's quite flexible: for example, one can 'read' a line into a Token (using a LineTokenizer) and then map said Token into any of the other layers for further slicing. One can mix and match the different layers in pretty much any way that makes sense.
There is no copying of data unless requested by the app, and no memory allocation other than app-allocated I/O buffers. It does random-access where the underlying system supports it (not on a socket), and is intended to support memory-mapped buffers (for *huge* files). It uses lookahead instead of unget... for those cases that need such things.
I'm writing this stuff for a high-performance server (in D), so one of the primary goals is very low runtime overhead (i.e. zero memory allocation).
If anyone's interested, I'll try to get the first major I/O cut out the door by, say, the end of the month.
- Kris
"Ben Hinkle" <bhinkle4@juno.com> wrote in message news:c257jd$1ph8$1@digitaldaemon.com...
> Right now std.stream.readLine takes no inputs and returns a
> char[] of the line read. Each time it gets called it builds a string
> using ~=. Can there be an API that lets you pass in an
> "inout char[]" argument that only gets resized if needed. That
> would take a burden off the GC when reading files line by line.
>
> That way the existing readLine could be reimplemented as:
>
> char[] readLine()
> {
>     char[] result;
>     readLine(result);
>     return result;
> }
> void readLine(inout char[] result)
> {
>     [fill and only reallocate result if needed]
> }
>
> The same would go for readLineW.
> -Ben
>
>
|
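The "no copying unless requested" idea can be illustrated with D array slices: a line handed to the caller is just a view into the application's own buffer. The sketch below is not Kris's package or its Token/LineTokenizer API, which the post does not show; `LineTokens` and `countLines` are hypothetical names for a minimal input range that yields each line as a slice and allocates nothing itself.

```d
import std.stdio;

// Walks a caller-owned buffer and yields each line as a slice of it.
struct LineTokens
{
    const(char)[] rest;   // unread part of the caller's buffer

    bool empty() const { return rest.length == 0; }

    // The current line, without its terminator; a slice, not a copy.
    const(char)[] front() const
    {
        foreach (i, c; rest)
            if (c == '\n')
                return rest[0 .. i];
        return rest;
    }

    void popFront()
    {
        immutable n = front.length;
        // Skip the line and, if present, its terminator.
        rest = (n < rest.length) ? rest[n + 1 .. $] : rest[$ .. $];
    }
}

size_t countLines(const(char)[] buf)
{
    size_t n;
    foreach (line; LineTokens(buf))
    {
        // Every line points into the original buffer.
        assert(line.ptr >= buf.ptr && line.ptr <= buf.ptr + buf.length);
        ++n;
    }
    return n;
}

void main()
{
    char[] buf = "alpha\nbeta\ngamma".dup;   // app-allocated buffer
    foreach (line; LineTokens(buf))
        writeln(line);
    writeln(countLines(buf));
}
```

As with any slice-based design, a line is only valid while the underlying buffer is left unchanged; a caller that refills the buffer has to copy whatever it wants to keep.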
Copyright © 1999-2021 by the D Language Foundation