June 25, 2004
Arcane Jill wrote:
> (1) Reader classes which convert ubytes from a stream (of known encoding) into
> dchars (Unicode). You'd need one Reader for each encoding standard.
> 
> (2) Writer classes which convert dchars (Unicode) into ubytes (of some known
> encoding) to be sent to a stream (again, one for each encoding standard)
> 
> (3) Filters, as described by you, which convert ubytes into more ubytes, and can
> do completely arbitrary things.
> 
What about (4), TextFilters, that convert dchars into dchars? This seems necessary for completeness, if nothing else ;-) But some things, like a pushback filter that supports "unreading", might be applicable at the byte level in some situations and at the character level in others.
Sam
June 25, 2004
Arcane Jill wrote:
> Oh - one other thing I forgot. I think we need functions like basename(),
> dirname(), pathinfo(), realpath() and so on, (stolen from PHP), and some
> function to append a pathname-component to a pathname. Of course, these things
> are dead easy to do with ordinary string manipulation ... IF you assume that the
> file separator is "/". But that won't work on a Mac.
Eh? Unless I misunderstand, it _will_ work on a Mac, but not on Windows... unless you mean classic Mac OS? I don't think there are any plans to port a D compiler to that.
Yes, these functions would be useful!
Sam
June 25, 2004
On Fri, 25 Jun 2004 07:08:05 +0000 (UTC), Arcane Jill <Arcane_member@pathlink.com> wrote:
> In article <opr94eenw35a2sq9@digitalmars.com>, Regan Heath says...
>
>> If we want to stream it [File], we pass it into the constructor of a Stream or
>> BufferedStream
>
> A File /IS/ a stream. How could it not be? Sorry, I just didn't understand you
> here.

That's my point. I don't think it should be a stream. fopen etc. is not a stream. We want something as a drop-in replacement for that; then we write a stream class, one that will take any class that supports read(), write() etc.

Perhaps my idea of streams is different to the norm?

>>> 11) I want the function available(), as Java has. A buffered stream
>>> always knows
>>> how much it's got left in its buffer, and I have no problem with an
>>> unbuffered
>>> stream returning zero.
>>
>> Isn't this true for a normal unbuffered file as well? At the point of
>> opening you know how big it is. It could grow, but until you reach that
>> initial size you know whether there is more or not.
>
> Ah - now it's I who was misunderstood. Allow me to clarify. available() must return
> a number which is less than or equal to the number of bytes which may be read
> from a stream ... and this is the important part ... WITHOUT BLOCKING.
> available() MUST return immediately, without causing a thread-switch. It must
> *NOT* return the number of bytes left in a file - unless all of them are already
> buffered.

ahh.. ok this would be part of the BufferedStream class. And it would simply return the # in the buffer. Simple, easy, fast, efficient. :)

> This is SO important in bits of code which MUST NOT WAIT.
>
> Arcane Jill
>
>



-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
June 25, 2004
On Fri, 25 Jun 2004 07:23:28 +0000 (UTC), Arcane Jill <Arcane_member@pathlink.com> wrote:
> In article <opr94etton5a2sq9@digitalmars.com>, Regan Heath says...
>
>> What do you think of my filters idea, as long as you can snap any number
>> of filters to streams and each other your data will be transcoded etc from
>> one end to the other, and back again in the other direction.
>
>
> Regan, your filters are /almost/ the same idea as mango's Readers/Writers. We're
> pretty much talking the same thing here, only by a different name.

I suspected as much.

> But there is nonetheless a very important difference between the two concepts,
> which you may have missed. This is that a character sequence is a sequence of
> 32-bit-wide dchars, whereas a traditional stream is a sequence of 8-bit-wide
> bytes.

Yep. ok.

> So, at some stage, you need a "filter" which converts from ubyte[] to
> dchar[].

Yep..

> Such filters do not chain, because the output from one will not be the
> same type as the input to the next.

Who said all filters have to have the same input/output types? You could have a ubyte[] to dchar[] filter, and another filter foo which went from dchar[] to dchar[] doing something to it, like uppercasing it all or whatever.
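
To make the typing point concrete, here is a rough sketch, written in Python purely for illustration. None of these names exist in std.stream or mango; they are made up for the example, and zlib stands in for a bzip-style filter. The idea is only that a chain works as long as each stage's output type matches the next stage's input type: bytes to bytes, then bytes to characters, then characters to characters.

```python
import zlib

def unzip_filter(data: bytes) -> bytes:
    """Byte filter: bytes in, bytes out (zlib as a stand-in for bzip)."""
    return zlib.decompress(data)

def utf8_reader(data: bytes) -> str:
    """Reader: bytes of a known encoding in, Unicode characters out."""
    return data.decode("utf-8")

def upper_filter(text: str) -> str:
    """Text filter: characters in, characters out."""
    return text.upper()

def chain(*stages):
    """Compose stages left to right; each output feeds the next input."""
    def run(value):
        for stage in stages:
            value = stage(value)
        return value
    return run

# bytes -> bytes -> characters -> characters
pipeline = chain(unzip_filter, utf8_reader, upper_filter)
compressed = zlib.compress("héllo".encode("utf-8"))
result = pipeline(compressed)
```

Putting the stages in the wrong order (a character stage feeding a byte stage) fails with a type error, which is the constraint being discussed: the types must line up at each joint, not be identical along the whole chain.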

> Now, you COULD insist that everything be
> done on an 8-bit stream (mandating UTF-8 as the format for actual characters),
> but there is an efficiency issue there. UTF-32 is always going to be faster to
> process than UTF-8.

I am not simply thinking of characters here; I am thinking in terms of an 8-bit stream of data, which may represent characters but may represent something else entirely, e.g. a bzip filter for a raw data stream: you plug it into your FileStream and hey presto, you have bzipped or bunzipped a file.

> Besides which - you don't NEED a chain of filters when transcoding in D, because
> one end WILL be Unicode, always.

So.. if I'm writing the binary representation of a structure out to disk which end is unicode?

> So I'd say the ideal situation would be:
>
> (1) Reader classes which convert ubytes from a stream (of known encoding) into
> dchars (Unicode). You'd need one Reader for each encoding standard.

Only if you're reading text.. surely?

> (2) Writer classes which convert dchars (Unicode) into ubytes (of some known
> encoding) to be sent to a stream (again, one for each encoding standard)

sure, if you're writing text.

> (3) Filters, as described by you, which convert ubytes into more ubytes, and can
> do completely arbitrary things.

Yep. Perfect.

> But I don't think your 8-bit-wide filters should be trying to handle dchars.
> That's a different job. I think the above would give you maximum flexibility,
> however, without losing any efficiency. What do you think?

I agree with pretty much all you say.

I think we're just thinking about it with different priorities in mind, perhaps due to your recent excursion into unicode? :oÞ

Regan.

June 25, 2004
On Fri, 25 Jun 2004 07:51:12 +0000 (UTC), Arcane Jill <Arcane_member@pathlink.com> wrote:
> In article <cbgjnv$105r$1@digitaldaemon.com>, Kris says...
>
>> May I enquire, Jill, as to why you need such functionality? I'm thinking at
>> the 50,000' level rather than the intimate details of some IO
>> implementation. It's always useful to understand the application.
>
> For example, consider a cryptographically secure random number stream. You'd
> want the ultra-secure version which always blocks until sufficient entropy is
> available - no problem there - but some folk would also want a non-blocking
> (less secure) version (like the difference between Unix's /dev/random and
> /dev/urandom). The non-blocking version would call available() on the entropy
> stream before trying to collect the entropy, in order to provide a guarantee of
> non-blocking. If bytes were available, it could read them, and be as secure as
> possible. If bytes were not available it could re-stir the existing entropy pool,
> and still return immediately. This sort of thing is absolutely crucial in
> crypto.
>
>
>
>> Secondly, if the IO were always buffered, and you had access to the content
>> thereof (plus the number of readable bytes), would that satisfy the
>> requirement?
>
> Not all streams which are able to deliver bytes on demand without waiting
> necessarily have a buffer. I have a proof-of-concept stream in my in-progress
> crypto random stuff which simply delivers bytes by calling rand(). Such a stream
> will never block, and its available() function could simply always return 2, or
> 128, or any other arbitrary number. It does not, however, have a buffer to
> return.
>
> Access to the contents of an internal buffer implies a certain implementation.
> This assumption may not always be correct, or relevant.
>
> available(), by itself, would be enough. Thereafter you could get the "buffer
> contents" with a straightforward read().


Let's assume for the sake of it that we're talking about a FileStream.. so.. you open it, then you call available(); it returns 0 as there is nothing in the buffer.. yet.. you (return immediately) and do something else.. at which point does the stream actually read something into its buffer?

To me it seems something has to tell it to read stuff into its buffer. Either:
  - it does this on open
  - you have another thread constantly polling it telling it to read
  - something else?

Regan.

June 25, 2004
On Fri, 25 Jun 2004 21:43:46 +1200, Sam McCall <tunah.d@tunah.net> wrote:

> Arcane Jill wrote:
>> Oh - one other thing I forgot. I think we need functions like basename(),
>> dirname(), pathinfo(), realpath() and so on, (stolen from PHP), and some
>> function to append a pathname-component to a pathname. Of course, these things
>> are dead easy to do with ordinary string manipulation ... IF you assume that the
>> file separator is "/". But that won't work on a Mac.
> Eh? Unless I misunderstand, it _will_ work on a Mac, but not on Windows... unless you mean classic Mac OS? I don't think there are any plans to port a D compiler to that.
> Yes, these functions would be useful!
> Sam

C's stdlib.h on windows has:

_fullpath - figures the absolute path of a given relative path
_makepath - creates a path name from components
_splitpath - break a path name into components

I don't believe they are ANSI (as indicated in the docs and by the _ on the front of the fn name)

I have written all the fns above before in C, typically using a #define SEP '/' or #define SEP '\\', plus functions to un-mix separators in any given path (you can't trust users to type anything right!)

I reckon they'd be dead simple in D. I might give them a go when I have some spare time.
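
As a rough idea of what the string-manipulation versions look like, here is a sketch in Python (used only for illustration, not D). The function names are hypothetical, loosely modelled on the PHP-style names mentioned above, and the separator is a parameter playing the role of the SEP define. It deliberately ignores drive letters, roots and symlinks, so it is nowhere near a realpath().

```python
def unmix_separators(path: str, sep: str = "/") -> str:
    """Rewrite every slash or backslash in path as the chosen separator."""
    return path.replace("\\", sep).replace("/", sep)

def append_component(path: str, name: str, sep: str = "/") -> str:
    """Append one component to a path without doubling the separator."""
    if not path:
        return name
    return path.rstrip("/\\") + sep + name.lstrip("/\\")

def basename(path: str) -> str:
    """Return the part after the last separator of either kind."""
    return unmix_separators(path).rsplit("/", 1)[-1]

def dirname(path: str) -> str:
    """Return the part before the last separator, normalised to '/'.

    Gives '' when there is no separator (and, in this sketch, for a
    file directly under the root as well).
    """
    return unmix_separators(path).rpartition("/")[0]

mixed = "C:\\docs/readme.txt"   # a user-typed path with both separators
windows_style = unmix_separators(mixed, "\\")
unix_style = unmix_separators(mixed)
```

The point of normalising first is that basename() and dirname() then only ever have to look for one separator.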

Regan.

June 25, 2004
In article <cbgm5b$13rn$1@digitaldaemon.com>, Kris says...
>
>"Arcane Jill"  wrote
>> Not all streams which are able to deliver bytes on demand without waiting
>> necessarily have a buffer. I have a proof-of-concept stream in my in-progress
>> crypto random stuff which simply delivers bytes by calling rand(). Such a stream
>> will never block, and its available() function could simply always return 2, or
>> 128, or any other arbitrary number. It does not, however, have a buffer to return.
>
>Right; poor phrasing on my part. In terms of D exposure, would something like an IAvailable interface suffice?

I don't know, because I don't know what IAvailable does. I suspect not, however.
Essentially, I need to write a non-blocking stream filter, which transforms one
(underlying) stream into another. Thus, you'd construct a Jill's Stream from a
std.stream - which guarantees that the underlying stream has read(), write(),
etc., (and available(), if such a function is added to stream).

Now, if, instead, Jill's Stream were to be constructed from an IAvailable, then
you'd guarantee that underlying - thing - had an available() function, but,
unless I've misunderstood, it would NOT have read() and write(), which kinda
makes it useless.

But I don't see the problem with simply adding available() to std.stream. It's a VERY easy function to add - in most cases it could be implemented as { return 0; }


>If so, what about the equivalent for
>writing? Is there a similar need to never perform a thread-switch?
>
>- Kris

I can't think of one. I think that the fact that Java has available() for reading, but not for writing, is a clue that this is kind of the desirable thing to do. (Not that Java always gets things right, of course, but in this case it would be spot on for my needs).

Arcane Jill




June 25, 2004
In article <opr95cp4td5a2sq9@digitalmars.com>, Regan Heath says...
>
>> A File /IS/ a stream. How could it not be? Sorry, I just didn't
>> understand you
>> here.
>
>That's my point. I don't think it should be a stream. fopen etc. is not a stream. We want something as a drop-in replacement for that; then we write a stream class, one that will take any class that supports read(), write() etc.

That's what most of us mean by "stream". A stream is simply something that does
read() and write().


>Perhaps my idea of streams is different to the norm?

Could be.



>ahh.. ok this would be part of the BufferedStream class. And it would simply return the # in the buffer. Simple, easy, fast, efficient. :)

Actually, it should return the number in the buffer PLUS the value returned by the underlying stream's available() function - because the buffered stream can get at least that many bytes from the underlying stream without blocking, and use those bytes to refill its own buffer. (That's assuming that the buffered stream is happy to get less than a bufferful at a time from the underlying stream).

And, the function should be present in unbuffered streams too, and in these cases it should return zero.
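
A minimal sketch of that rule, in Python for illustration only. The class names are hypothetical (this is not std.stream's API), and MemorySource is just a stand-in for an underlying stream whose bytes happen to be ready already.

```python
class Stream:
    """Hypothetical base stream: unbuffered, so available() is zero."""
    def available(self) -> int:
        return 0

    def read(self, n: int) -> bytes:
        raise NotImplementedError

class MemorySource(Stream):
    """Stand-in underlying stream whose bytes are already in memory."""
    def __init__(self, data: bytes):
        self._data = data

    def available(self) -> int:
        return len(self._data)

    def read(self, n: int) -> bytes:
        chunk, self._data = self._data[:n], self._data[n:]
        return chunk

class BufferedStream(Stream):
    """Buffers reads from a source stream, a few bytes at a time."""
    def __init__(self, source: Stream, size: int = 4):
        self._source = source
        self._size = size
        self._buffer = b""

    def available(self) -> int:
        # Bytes already buffered, plus whatever the source says it can
        # hand over without blocking.
        return len(self._buffer) + self._source.available()

    def read(self, n: int) -> bytes:
        if not self._buffer:
            self._buffer = self._source.read(self._size)
        chunk, self._buffer = self._buffer[:n], self._buffer[n:]
        return chunk

buffered = BufferedStream(MemorySource(b"abcdefgh"), size=4)
before = buffered.available()   # 0 buffered + 8 in the source
first = buffered.read(2)        # refills 4 bytes, hands back 2
after = buffered.available()    # 2 buffered + 4 in the source
```

Note that available() never touches the source's read(), so it cannot block; it only adds up numbers that are already known.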

But yes - simple, easy, fast and efficient. And useful.

Jill



June 25, 2004
In article <opr95c8xsi5a2sq9@digitalmars.com>, Regan Heath says...

>Let's assume for the sake of it that we're talking about a FileStream.. so.. you open it, then you call available(); it returns 0 as there is nothing in the buffer.. yet.. you (return immediately) and do something else.. at which point does the stream actually read something into its buffer?
>
>To me it seems something has to tell it to read stuff into its buffer. Either:
>   - it does this on open
>   - you have another thread constantly polling it telling it to read
>   - something else?
>

Correct. Think of an entropy stream like a pipe, or a socket. One process pulls stuff out. Another process pushes stuff in. You could run an entropy-gathering daemon in the background, if you chose.

Alternatively, a blocking entropy generator could, on blocking, instruct the user to "wiggle that mouse" to feed the entropy pool until there's enough there.
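
The shape of the non-blocking case can be sketched like this (Python, for illustration only). EntropyPool and its methods are hypothetical names, and the hashing scheme is a toy chosen to show the control flow: it is not a vetted design and should not be used for real crypto. What it shows is the reader checking available() first, using fresh bytes if some are there, and otherwise re-stirring the existing pool, returning immediately either way.

```python
import hashlib

class EntropyPool:
    """Toy pool: a gatherer feeds bytes in, a reader pulls bytes out."""
    def __init__(self, seed: bytes = b""):
        self._fresh = b""                       # entropy not yet consumed
        self._state = hashlib.sha256(seed).digest()

    def feed(self, data: bytes) -> None:
        """Called by the gatherer (a daemon, mouse wiggles, and so on)."""
        self._fresh += data

    def available(self) -> int:
        """Fresh bytes that can be consumed without waiting."""
        return len(self._fresh)

    def read_nonblocking(self, n: int) -> bytes:
        """Return n bytes (n <= 32) immediately, never waiting for entropy."""
        if not 0 <= n <= 32:
            raise ValueError("this sketch serves at most 32 bytes per call")
        take = min(n, self.available())
        fresh, self._fresh = self._fresh[:take], self._fresh[take:]
        # With fresh bytes this mixes them in; with none it simply
        # re-stirs the existing state. Neither path can block.
        self._state = hashlib.sha256(self._state + fresh).digest()
        return hashlib.sha256(b"out" + self._state).digest()[:n]

pool = EntropyPool(b"seed")
empty_before = pool.available()      # nothing gathered yet
a = pool.read_nonblocking(16)        # re-stir path
pool.feed(b"mouse")                  # the gatherer pushes something in
b = pool.read_nonblocking(16)        # fresh-entropy path
```

A blocking variant would differ only in waiting (or prompting the user) until available() reports enough fresh bytes before reading.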


Strong crypto is hard to get right. But I intend to try, and do better than most.

Jill


June 25, 2004
Arcane Jill wrote:

> In article <cbf0lp$1lvi$1@digitaldaemon.com>, Ben Hinkle says...
> 
>>I'd first like to see what Sean does with std.stream plus I tend to agree with Matthew that more discussion is needed before jumping too soon in any one direction. There's a lot of cool stuff in mango that I'd love to see somehow merged with std.stream if possible. Maybe I'll give a poke at porting Kris's tokenizers and endian stuff to std.stream just to see what it looks like.
> 
> I'd be quite happy if std.stream were to be improved. Here are some suggestions. You'll probably think that many of them are trivial, but each, in their own way, contributes a small amount of annoyance, and I'm sure these things could be easily got rid of.
> 
> 1) Since it is more normal to want buffered file access than non-buffered
> file access (in C, fopen() is called more often than open()), it makes
> sense that File should be buffered by default, and there should be a
> separate class, maybe called RawFileStream or something, for the
> unbuffered case.

Funny you should suggest RawFileStream because the first version of stream.d that I sent Walter had "RawFile" and "File" instead of "File" and "BufferedFile". I decided to go with File and BufferedFile for backwards compatibility and to avoid buffering stdin/out (unless the type of stdin/out was changed to Stream instead of File). Going back to buffering by default is probably better long-term.

> 2) File should in any case be renamed FileStream

but what else would a File be? ;-)
Personally I like the analogy with stdio's FILE.

> 3) FileMode.In and FileMode.Out should be renamed Filemode.IN and Filemode.OUT respectively.

why not FileMode? The dmd "style guide" page indicates FileMode would be preferred. The style guide also says all enums should be caps so IN/OUT seems right (though I tend to think we should move away from the historical baggage of C's preprocessor, since FileMode.In doesn't look to me like a variable or anything else besides a constant).

> 4) It should be possible to construct a File object in create mode, in one
> step. As in File f = new File(filename, FileMode.CREATE);

yup. or'able with In/Out.

> 5) In fact, all possible combinations of file opening supported by fopen() should be supported by File. It should be possible to assert that the file does or does not exist before opening it (atomically), to truncate or not truncate, to position the file pointer at the start or end of the file, to allow append-only access, etc.
>
> 6) The destructor should always close the file
>
> 7) EITHER Stream classes should be auto (likely to be an unpopular suggestion, I know), OR there should be an auto wrapper class that you can construct from a Stream, in order to guarantee that the file will be closed in the event of an exception (which could of course be thrown by ANY piece of code). Currently we have to either roll our own auto wrapper, or use a try/catch block.
> 
> 8) Documentation should be complete and accurate.
> 
> 9) There should be a FilterStream class, from which BufferedStream
> inherits, so that we can write our own stream filters. (Java does this.
> It's neat).
> 
> 10) Streams don't necessarily have to do transcoding (see - I learnt a new word), but nonetheless it should be POSSIBLE to construct them from a Reader/Writer in order to make such extensions possible in the future.
> 
> 11) I want the function available(), as Java has. A buffered stream always knows how much it's got left in its buffer, and I have no problem with an unbuffered stream returning zero.
> 
> 12) stdin, stdout and stderr should be globally available D streams.
> (Maybe they are already, but point (8) means there's a lot I don't know
> about existing capabilities)

All look very reasonable.

> 13) Streams should overload the << and >> operators. (Someone suggested
> using ~. That would be fine too).

I think Walter is hoping the typesafe varargs changes will remove an important motivation for adding << and >> (though there is the question of run-time vs compile-time safety). I have been playing around with making std.stream's printf typesafe and that's why I was trying to rebuild the phobos unittests.

> None of these is particularly difficult in and of itself, but together they add up to a frustrating gripe list. I'm fairly confident that if these flaws are fixed (along with any other gripes which others may mention in the course of this thread), most people will be pretty happy with the new improved std.stream.
> 
> 
> 
> 
> 
>>This will probably open up rat-holes, but two quick examples of things to discuss:
>>
>>1) in mango it looks like to open a file and read it you need to create a FileConduit and pass that to a Reader constructor. So you have to grok the difference between Conduits and Readers/Writers (and maybe Buffers? I notice IConduit has a createBuffer method so is it not buffered by default? I'm not sure). In std.stream you make one object and there is less to grok. The flexibility of mango is probably nice but it adds complexity. Each person has a different notion of where to draw the boundaries.
> 
> But there is logic behind it. Currently, D does no transcoding - that is, writeLine() will spit out raw UTF-8. Now that's fine if your output is going to a text file, but if it's going to a console, you're screwed. Now you COULD simplify this a bit by "automatically" encoding the output in the operating system default encoding - but that would just reverse the problem. Now, output to the console would be fine, but output destined to leave your machine and end up on someone else's machine (e.g. text file, socket, etc.) would also be similarly munged. UTF-8 is pretty much the best portable format, so ideally you only want to encode at the last minute, just before the stream hits the user.
> 
> 
> 
>>2) in mango to use object serialization/deserialization you register an instance of a class so that means at startup you basically have to instantiate one of every class that might want to be deserialized. Seems wasteful and it could affect class design to avoid having classes that have interdependencies.
> 
> I'm not convinced that serialization necessarily has anything to do with streams. You could serialize to a string, or an in-memory buffer. I guess that would be faster for small objects but disadvantageous for very large ones. In any case, you don't need to decide on a firm serialization policy in order to make streams feel nice. That can come later, once we're happy with the basics.
> 
> Arcane Jill