Streams, Readers, Writers and Filters (page 6)

In article <opr958zkem5a2sq9@digitalmars.com>, Regan Heath says...
>
>I don't think I understand what a Writer and a Reader are exactly..
>
>AFAICS you have a Source i.e. a file and a Sink i.e. a socket. The source and sink are not streams themselves, but you could and probably would wrap a stream interface around them. In fact designing the stream as a template would allow you to wrap it around anything that provided the requisite methods i.e. read() write() etc.
>
>Where does the Reader/Writer enter the picture? Are they simply filters (and thus streams) that convert from one data format to another? Or...
>
>> dchar-sequence to dchar-sequence are another kettle of fish altogether.
>
>By the same rationale as ubyte-sequence to ubyte-sequence aren't they a STREAM also?

Just to clarify, it is generally understood that in order to be considered a stream, the smallest unit you can read or write has to be ONE BYTE. A Reader is simply something which has a read() function, but for which the smallest unit it can read is ONE DCHAR. Similarly, a Writer is simply something which has a write() function, but for which the smallest unit it can write is one dchar. Readers/Writers are often not considered streams, for this reason.

There are a number of good reasons why this makes sense. If you write a dchar to a socket, for example, you would have to worry about the endianness of the thing. Should you send in machine byte order? Network byte order? And - just because you only write in four-byte chunks, that doesn't guarantee that what's at the other end of the socket won't read in one-byte chunks. In actual fact, if you squirt dchars into a stream, four bytes at a time, then this is generally considered to be an encoding in its own right. The encoding is called UTF-32LE if the bytes are in little-endian order, or UTF-32BE if the bytes are in big-endian order. So, by calling it a stream, you've artificially added a new layer of encoding.

In general, anything wider than a byte is not suitable for a stream (as such) because of byte-ordering issues. A Reader therefore converts a stream OF BYTES into dchars for some internal use. That is, once you've got your dchars, they stay that way, and are dealt with as such by your application. Conversely with Writer.

We could call all of these things streams and leave it at that, of course, but that doesn't change the underlying problem, which is that a file, or socket (etc.) has no knowledge of character encoding standards, and therefore, conceptually, cannot store characters - only bytes. To interpret those bytes as characters, you need to know the encoding standard. (There are heuristic algorithms which can take a good guess, of course, but that's beside the point).

Does that help?

Arcane Jill
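The byte-order point can be made concrete with a minimal sketch. It is written in present-day D (D2 with Phobos), not the 2004 library under discussion, and the helper names toUtf32LE and toUtf32BE are made up for illustration. It shows that one and the same dchar produces two different byte sequences depending on the order chosen, which is why pushing dchars onto a byte stream amounts to picking an encoding.

```d
import std.bitmanip : nativeToBigEndian, nativeToLittleEndian;

/// Hypothetical helper: one code point as UTF-32LE (least significant byte first).
ubyte[4] toUtf32LE(dchar c)
{
    return nativeToLittleEndian(cast(uint) c);
}

/// Hypothetical helper: one code point as UTF-32BE (most significant byte first).
ubyte[4] toUtf32BE(dchar c)
{
    return nativeToBigEndian(cast(uint) c);
}

void main()
{
    dchar euro = '\u20AC'; // U+20AC EURO SIGN

    ubyte[4] le = toUtf32LE(euro);
    ubyte[4] be = toUtf32BE(euro);

    ubyte[4] expectedLE = [0xAC, 0x20, 0x00, 0x00];
    ubyte[4] expectedBE = [0x00, 0x00, 0x20, 0xAC];

    // Same character, two different byte sequences on the wire.
    assert(le == expectedLE);
    assert(be == expectedBE);
    assert(le != be);
}
```

A reader at the other end that assumes the wrong order would see U+AC200000, which is not a valid code point at all.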

On Fri, 25 Jun 2004 23:02:14 -0400, Ben Hinkle wrote:

> Sam McCall wrote:
>
>> Ben Hinkle wrote:
>>
>>>>2) File should in any case be renamed FileStream
>>>
>>> but what else would a File be? ;-)
>>> Personally I like the analogy with stdio's FILE.
>>
>> Well, Java uses File to represent a path. I like the idea of
>> File f=new File("C:\\something.txt");
>> Tests:
>> f.exists()
>> f.isFile() // exists and is regular file
>> f.isDirectory() // exists and is directory
>> f.isDevice() // unix devices, (windows COM1 etc?)
>> f.canRead()
>> f.canWrite()
>> and then
>> f.open("r+") // returns a FileStream
>> f.open("a",false) // returns a RawFileStream,
>> //optional arguments win again
>> Sam
>
> I don't think this is one of Java's highlights, though. A File class shouldn't include directories. Users have a pretty good understanding of what the word "File" means and it doesn't include directories. I think the general consensus is that Java's File class should have been called something like Path.

I'm sure that this comes from Unix and other operating systems in which a directory is just a special sort of file - one that contains a list of files.

-- 
Derek
Melbourne, Australia
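The "call it Path" idea might look roughly like the sketch below. This is an assumption-laden illustration in present-day D: the Path struct and its member names are hypothetical, and only the std.file and std.path calls are real Phobos functions. The point is that such a type names a location and answers questions about it without holding an open handle, so it can describe a directory as comfortably as a regular file.

```d
static import std.file;          // fully qualified calls avoid clashing with the members below
import std.path : buildPath;

/// Hypothetical Path type: names a location, holds no open handle.
struct Path
{
    string name;

    bool exists() const      { return std.file.exists(name); }
    // isFile and isDir throw if the entry is missing, so check exists first.
    bool isFile() const      { return std.file.exists(name) && std.file.isFile(name); }
    bool isDirectory() const { return std.file.exists(name) && std.file.isDir(name); }
}

void main()
{
    auto dir = Path(std.file.tempDir());
    assert(dir.exists && dir.isDirectory && !dir.isFile);

    // Create a scratch file, query it, and clean up afterwards.
    auto file = Path(buildPath(dir.name, "path_sketch_demo.txt"));
    std.file.write(file.name, "hello");
    scope (exit) std.file.remove(file.name);

    assert(file.exists && file.isFile && !file.isDirectory);
}
```

Opening the path would then be a separate step that returns a stream, as in Sam's f.open("r+").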

> Fair enough, but the style guide says "meaningless type aliases should be avoided".

E.g.

alias int INT;

I should think

alias BufferedStream!(File) BufferedFile;

is less meaningless.

But is there any substantial reason for File not to be unbuffered, except having to type BufferedFile instead of File? It seems awkward to screw with the names in this way to save a couple of keystrokes, although I like short names.

June 27, 2004

Re: Streams, Readers, Writers and Filters

Posted by Regan Heath
in reply to Arcane Jill

On Sat, 26 Jun 2004 07:52:11 +0000 (UTC), Arcane Jill <Arcane_member@pathlink.com> wrote:
> In article <opr958zkem5a2sq9@digitalmars.com>, Regan Heath says...
>>
>> I don't think I understand what a Writer and a Reader are exactly..
>>
>> AFAICS you have a Source i.e. a file and a Sink i.e. a socket. The source
>> and sink are not streams themselves, but you could and probably would wrap
>> a stream interface around them. In fact designing the stream as a template
>> would allow you to wrap it around anything that provided the requisite
>> methods i.e. read() write() etc.
>>
>> Where does the Reader/Writer enter the picture? Are they simply filters
>> (and thus streams) that convert from one data format to another? Or...
>>
>>> dchar-sequence to dchar-sequence are another kettle of fish altogether.
>>
>> By the same rationale as ubyte-sequence to ubyte-sequence aren't they a
>> STREAM also?
>
> Just to clarify, it is generally understood that in order to be considered a
> stream, the smallest unit you can read or write has to be ONE BYTE. A Reader is
> simply something which has a read() function, but for which the smallest unit it
> can read is ONE DCHAR. Similarly, a Writer is simply something which has a
> write() function, but for which the smallest unit it can write is one dchar.
> Readers/Writers are often not considered streams, for this reason.
>
> There are a number of good reasons why this makes sense. If you write a dchar to
> a socket, for example, you would have to worry about the endianness of the
> thing. Should you send in machine byte order? Network byte order? And - just
> because you only write in four-byte chunks, that doesn't guarantee that what's
> at the other end of the socket won't read in one-byte chunks. In actual fact, if
> you squirt dchars into a stream, four bytes at a time, then this is generally
> considered to be an encoding in its own right. The encoding is called UTF-32LE
> if the bytes are in little-endian order, or UTF-32BE if the bytes are in
> big-endian order. So, by calling it a stream, you've artificially added a new
> layer of encoding.
>
> In general, anything wider than a byte is not suitable for a stream (as such)
> because of byte-ordering issues. A Reader therefore converts a stream OF BYTES
> into dchars for some internal use. That is, once you've got your dchars, they
> stay that way, and are dealt with as such by your application. Conversely with
> Writer.
>
> We could call all of these things streams and leave it at that, of course, but
> that doesn't change the underlying problem, which is that a file, or socket
> (etc.) has no knowledge of character encoding standards, and therefore,
> conceptually, cannot store characters - only bytes. To interpret those bytes as
> characters, you need to know the encoding standard. (There are heuristic
> algorithms which can take a good guess, of course, but that's beside the point).
>
> Does that help?

Yes indeed.

I don't think you need something called a 'Reader' or something called a 'Writer'; IMO they are both filters.

You said filters are streams. Well, these are filters but not streams, so I say most but not all filters are streams.

I think the simplest concepts which deal with this are the best to use, and IMO they are:

Stream - read() write() ubytes.
Filter - convert from one thing to another.

So you have Streams, and then filters applied to them. When you want to read Unicode text from a socket, you simply attach your Unicode filter to the socket stream. As long as the other end used the same Unicode filter to encode the data into the stream, yours will decode it correctly.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
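
The Stream-plus-Filter split described above could be sketched as follows. Everything named here (Stream, MemoryStream, Utf8Filter, put, get) is hypothetical and written in present-day D rather than the 2004 library; only encode, decode and stride are real std.utf functions. UTF-8 is an arbitrary choice of "unicode filter", and an in-memory buffer stands in for the socket so the sketch runs anywhere.

```d
import std.algorithm.comparison : min;
import std.utf : decode, encode, stride;

/// Hypothetical stream interface: the smallest unit is one ubyte.
interface Stream
{
    size_t read(ubyte[] buf);          // returns bytes read, 0 at end of data
    void write(const(ubyte)[] bytes);
}

/// In-memory stand-in for a socket or file.
class MemoryStream : Stream
{
    private ubyte[] data;
    private size_t pos;

    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, data.length - pos);
        buf[0 .. n] = data[pos .. pos + n];
        pos += n;
        return n;
    }

    void write(const(ubyte)[] bytes)
    {
        data ~= bytes;
    }
}

/// Hypothetical filter: dchars on one side, UTF-8 bytes on the stream side.
class Utf8Filter
{
    private Stream inner;

    this(Stream inner) { this.inner = inner; }

    /// Encode one code point and write its bytes to the stream.
    void put(dchar c)
    {
        char[4] buf;
        immutable len = encode(buf, c);
        inner.write(cast(const(ubyte)[]) buf[0 .. len]);
    }

    /// Read and decode one code point; returns false at end of data.
    bool get(out dchar c)
    {
        char[4] buf;
        if (inner.read(cast(ubyte[]) buf[0 .. 1]) == 0)
            return false;
        immutable len = stride(buf[], 0);   // sequence length, taken from the lead byte
        if (inner.read(cast(ubyte[]) buf[1 .. len]) != len - 1)
            throw new Exception("truncated UTF-8 sequence");
        size_t index = 0;
        c = decode(buf[0 .. len], index);
        return true;
    }
}

void main()
{
    auto wire = new MemoryStream;       // plays the part of the socket stream
    auto text = new Utf8Filter(wire);   // the filter attached to it

    foreach (dchar ch; "h\u00E9llo \u20AC"d)
        text.put(ch);

    dstring received;
    dchar c;
    while (text.get(c))
        received ~= c;

    // Both ends used the same filter, so the text survives the round trip.
    assert(received == "h\u00E9llo \u20AC"d);
}
```

Note that the filter itself never exposes read() or write() on ubytes, which matches the remark that it is a filter but not a stream.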

On Sat, 26 Jun 2004 18:03:46 +0200, Bent Rasmussen <exo@bent-rasmussen.info> wrote:
>> Fair enough, but the style guide says "meaningless type aliases should be
>> avoided".
>
> E.g.
>
> alias int INT;
>
> I should think
>
> alias BufferedStream!(File) BufferedFile;
>
> is less meaningless.

I agree. But why not:

> alias BufferedStream!(RawFile) File;

as in most cases people want a buffered file.
But.. you may want an unbuffered one.

> But is there any substantial reason for File not to be unbuffered, except
> having to type BufferedFile instead of File? It seems awkward to screw with
> the names in this way to save a couple of keystrokes, although I like short
> names.

"File" and "RawFile" short and to the point. :0)

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

> I agree. but why not.
>
> > alias BufferedStream!(RawFile) File;
>
> as in most cases people want a buffer file.
> But.. you may want an unbuffered one.

I follow the principle that if a' is an extension of a, then the name of a' signals that. I don't, in general, choose a name for a which is more complex than the name of a'.

Of course it is nice to have the common case have a short name, but it's hardly difficult to remember it once you've learned it, and a shorter name can perhaps be found.

A prominent exception is uint vs int, where the simple type has a name that expresses an exception to a less simple type, but either I'm used to it or it's just so discreet that it doesn't bother me. And if both types have non-composite names, then the whole "issue" disappears (e.g. natural, integer, real).

> "File" and "RawFile" short and to the point. :0)

It is. Don't mind me. :)

> Regan

On Sun, 27 Jun 2004 13:19:05 +0200, Bent Rasmussen <exo@bent-rasmussen.info> wrote:
>> I agree. but why not.
>>
>> > alias BufferedStream!(RawFile) File;
>>
>> as in most cases people want a buffer file.
>> But.. you may want an unbuffered one.
>
> I follow the principle that if a' is an extension of a, then the name of a'
> signals that. I don't, in general, choose a name for a which is more complex
> than the name of a'.

I agree, this is good logical progression.

> Of course it is nice to have the common case have a short name, but it's
> hardly difficult to remember it once you've learned it, and a shorter name
> can perhaps be found.

True. Because it's logical. You see BufferedFile, you guess File might exist.

> A prominent exception is uint vs int, where the simple type has a name that
> expresses an exception to a less simple type,

Which is the simple type? uint or int?

> but either I'm used to it or
> it's just so discreet that it doesn't bother me. And if both types have
> non-composite names, then the whole "issue" disappears (e.g. natural,
> integer, real).

Basically I think it's convenience vs logical progression.

>> "File" and "RawFile" short and to the point. :0)
>
> It is. Don't mind me. :)
>
>> Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
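
For reference, the naming question comes down to which type gets the alias. The sketch below is a hypothetical illustration in present-day D syntax (alias Name = Type; rather than the 2004 form used in the posts); RawFile and BufferedStream here are toy types invented for the example, not the library's real classes. It follows the "File" and "RawFile" scheme, where the short name is the buffered type.

```d
/// Hypothetical unbuffered file: every write goes straight to the "device".
struct RawFile
{
    ubyte[] device;      // stands in for the file on disk
    size_t writeCalls;   // counts how often the device is touched

    void write(const(ubyte)[] data)
    {
        device ~= data;
        ++writeCalls;
    }
}

/// Hypothetical buffering wrapper around anything that has write().
struct BufferedStream(S)
{
    S inner;
    ubyte[] pending;
    size_t capacity = 4096;

    void write(const(ubyte)[] data)
    {
        pending ~= data;
        if (pending.length >= capacity)
            flush();
    }

    void flush()
    {
        if (pending.length == 0)
            return;
        inner.write(pending);
        pending.length = 0;
    }
}

// The common case gets the short name. Under the alternative scheme the raw
// type would be called File and this alias would be BufferedFile.
alias File = BufferedStream!RawFile;

void main()
{
    File f;
    ubyte[3] chunk = [1, 2, 3];

    foreach (i; 0 .. 10)
        f.write(chunk[]);
    assert(f.inner.writeCalls == 0);    // 30 bytes are still sitting in the buffer

    f.flush();
    assert(f.inner.writeCalls == 1);    // one device write instead of ten
    assert(f.inner.device.length == 30);
}
```

Either way the two types are identical; only the reader's first guess about what "File" does changes.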
