non-seekable streams and size() - D Programming Language Discussion Forum


Thread overview

non-seekable streams and size()
Apr 17, 2005 Ben Hinkle
Apr 17, 2005 Andrew Fedoniouk
Apr 17, 2005 Ben Hinkle
Apr 17, 2005 Andrew Fedoniouk
Apr 17, 2005 Ben Hinkle
Apr 17, 2005 Georg Wrede
Apr 17, 2005 Andrew Fedoniouk
Apr 17, 2005 Andrew Fedoniouk
Apr 17, 2005 Georg Wrede
Apr 18, 2005 Andrew Fedoniouk
Apr 17, 2005 Georg Wrede
Apr 18, 2005 Regan Heath
Apr 18, 2005 Andrew Fedoniouk
Apr 18, 2005 Regan Heath
Apr 18, 2005 Ben Hinkle
Apr 18, 2005 Andrew Fedoniouk
Apr 18, 2005 Ben Hinkle
Apr 18, 2005 Georg Wrede
Apr 18, 2005 Ben Hinkle
Apr 18, 2005 Ben Hinkle
Apr 18, 2005 Regan Heath
Apr 18, 2005 Georg Wrede
Apr 18, 2005 Regan Heath
Apr 18, 2005 Georg Wrede
Apr 18, 2005 Andrew Fedoniouk

April 17, 2005

non-seekable streams and size()

Posted by Ben Hinkle

What should size() of a non-seekable stream return or do? Currently it depends on the stream type: for a general stream it throws a SeekException and for a File on Windows it returns 0 (which is just what GetFileSize returns for non-seekable streams like pipes). I'm tempted to have it return ulong.max. Any objections?

While I'm at it I'm making eof testing more efficient for both seekable and non-seekable streams by using the convention that if readBlock returns 0 then the stream is at eof (and I'd like to document that). Technically that wasn't part of the existing readBlock's documentation but it's what happens in practice and it comes in handy with non-seekable streams.
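A minimal sketch of the two conventions Ben proposes, written in D. It assumes a non-seekable stream reports ulong.max from size() and that readBlock returning 0 means eof. SketchStream and readAll are hypothetical stand-ins for illustration, not the std.stream classes.

```d
import std.algorithm.comparison : min;

/// Hypothetical stand-in for a stream (NOT the std.stream class).
class SketchStream
{
    private const(ubyte)[] data; // bytes the stream will deliver
    private size_t pos;          // current read position
    immutable bool seekable;     // a pipe or socket would pass false

    this(const(ubyte)[] data, bool seekable)
    {
        this.data = data;
        this.seekable = seekable;
    }

    /// Copies up to buf.length bytes and returns how many were copied.
    /// Convention under discussion: a return value of 0 means eof.
    size_t readBlock(ubyte[] buf)
    {
        immutable n = min(buf.length, data.length - pos);
        buf[0 .. n] = data[pos .. pos + n];
        pos += n;
        return n;
    }

    /// Proposed convention: ulong.max means "size unknown".
    ulong size()
    {
        return seekable ? data.length : ulong.max;
    }
}

/// Drains a stream using only the "readBlock returned 0" eof test,
/// so it never calls size() and works for non-seekable streams.
ubyte[] readAll(SketchStream s)
{
    ubyte[] result;
    ubyte[4] chunk; // deliberately small, to force several reads
    while (true)
    {
        immutable n = s.readBlock(chunk[]);
        if (n == 0)
            break; // eof by convention
        result ~= chunk[0 .. n];
    }
    return result;
}
```

A caller still has to check for ulong.max before doing arithmetic with the result; the sentinel only avoids an exception.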

April 17, 2005

Re: non-seekable streams and size()

Posted by Andrew Fedoniouk
in reply to Ben Hinkle

Out of scope probably....

IMHO, a "seekable" stream is nonsense.

If a stream is seekable, then it is a vector.
In almost all cases such a stream could be represented
as char[], wchar[], etc. Memory-mapped files allow extending
this beyond heap memory to file access.

For text I/O it makes sense to support the simple idiom of formatting Writers and Readers.

class Writer { this(IPutChar outp){} uint writef(...) {}  }
class Reader { this(IGetChar inp){} uint readf(...) {}  }

I guess this is enough to implement stdin/stdout-style applications.

C++ <iostream> and co. are so universal, theoretical and generic
that they are almost never used in real life in pure form.
The << and >> operators sound good to a first-semester student
but are a nightmare when you try to output/input something
formatted for real life. And yet << and >> are the "poor C++ man's"
approach to handling arguments of mixed types.

Our old friends printf/writef and scanf/readf
are time-proven and really do work. In D,
where you have (it seems :-) access to the TypeInfo of the arguments,
writef/readf are just perfect: compact and powerful.
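Present-day D builds this family on variadic templates rather than runtime TypeInfo, but the idea is the same: the format string plus the argument types drive both formatting and parsing. A small sketch using std.format; the "name:count" layout and the helper names emit and parse are made up for illustration.

```d
import std.format : format, formattedRead;

/// Writes a value pair as text; the "name:count" layout is an assumption.
string emit(string name, int count)
{
    return format("%s:%d", name, count);
}

/// Parses the same layout back. formattedRead consumes `line` as it
/// goes and returns how many of the arguments it filled in.
uint parse(string line, out string name, out int count)
{
    return formattedRead(line, "%s:%d", name, count);
}
```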

Eh?

IMHO, IMHO and again IMHO.

Andrew.

"Ben Hinkle" <ben.hinkle@gmail.com> wrote in message news:d3u2i7$1vbi$1@digitaldaemon.com...
> What should size() of a non-seekable stream return or do? Currently it depends on the stream type: for a general stream it throws a SeekException and for a File on Windows it returns 0 (which is just what GetFileSize returns for non-seekable streams like pipes). I'm tempted to have it return ulong.max. Any objections?
>
> While I'm at it I'm making eof testing more efficient for both seekable and non-seekable streams by using the convention that if readBlock returns 0 then the stream is at eof (and I'd like to document that). Technically that wasn't part of the existing readBlock's documentation but it's what happens in practice and it comes in handy with non-seekable streams.
>

April 17, 2005

Re: non-seekable streams and size()

Posted by Georg Wrede
in reply to Ben Hinkle

size() implies seekability.

Someone calling size() on a non-seekable stream is making a programming error, IMHO. My suggestion: make it a non-quenchable error, one the program cannot simply catch and ignore.
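This suggestion maps naturally onto D's split between Exception (recoverable conditions) and Error (programming bugs that are not meant to be caught). A sketch, assuming a hypothetical GuardedStream and NotSeekableError; neither exists in Phobos.

```d
/// Hypothetical error type. Deriving from Error rather than Exception
/// marks the call as a programming bug that code should not handle.
class NotSeekableError : Error
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

/// Hypothetical stream fragment: size() is only defined when seekable.
class GuardedStream
{
    private ulong length;
    immutable bool seekable;

    this(ulong length, bool seekable)
    {
        this.length = length;
        this.seekable = seekable;
    }

    ulong size()
    {
        if (!seekable)
            throw new NotSeekableError("size() called on a non-seekable stream");
        return length;
    }
}
```

Exposing `seekable` lets well-behaved callers check first, so only genuinely buggy code ever reaches the Error.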


Ben Hinkle wrote:
> What should size() of a non-seekable stream return or do? Currently it depends on the stream type: for a general stream it throws a SeekException and for a File on Windows it returns 0 (which is just what GetFileSize returns for non-seekable streams like pipes). I'm tempted to have it return ulong.max. Any objections?
> 
> While I'm at it I'm making eof testing more efficient for both seekable and non-seekable streams by using the convention that if readBlock returns 0 then the stream is at eof (and I'd like to document that). Technically that wasn't part of the existing readBlock's documentation but it's what happens in practice and it comes in handy with non-seekable streams. 
> 
>

April 17, 2005

Re: non-seekable streams and size()

Posted by Ben Hinkle
in reply to Andrew Fedoniouk

"Andrew Fedoniouk" <news@terrainformatica.com> wrote in message news:d3u9cf$25du$1@digitaldaemon.com...
> Out of scope probably....
>
> IMHO, a "seekable" stream is nonsense.
>
> If a stream is seekable, then it is a vector.

Files on disk are seekable and they can be too large or too cumbersome to fit into memory.

> In almost all cases such a stream could be represented
> as char[], wchar[], etc. Memory-mapped files allow extending
> this beyond heap memory to file access.

The classic example is a large file of binary data organized into many chunks of the same size (i.e. a huge array of structs on disk). Random access to such data requires seeking. Is such a situation infrequent enough to be ignored? It's a reasonable question. Some APIs don't allow random access and instead have some streams support a mark/reset API.
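A sketch of that "huge array of structs on disk" case using std.stdio.File: record i lives at byte offset i * Record.sizeof, so reading it is one seek plus one rawRead. The Record layout and the helper names writeRecords/readRecord are assumptions for illustration.

```d
import std.exception : enforce;
import std.stdio : File;

/// Hypothetical fixed-size record; the field layout is an assumption.
struct Record
{
    int id;
    double value;
}

/// Writes records back to back, so record i starts at i * Record.sizeof.
void writeRecords(string path, const(Record)[] records)
{
    auto f = File(path, "wb");
    f.rawWrite(records);
} // f is closed when its last reference goes out of scope

/// Random access: seek straight to record `index` and read just that record.
Record readRecord(File f, size_t index)
{
    f.seek(cast(long)(index * Record.sizeof)); // origin defaults to SEEK_SET
    Record[1] buf;
    enforce(f.rawRead(buf[]).length == 1, "record index past end of file");
    return buf[0];
}
```

The file is only portable between machines with the same struct layout and byte order, which is one reason binary formats are often written field by field instead.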

> For text I/O it makes sense to support the simple idiom of formatting Writers and Readers.
>
> class Writer { this(IPutChar outp){} uint writef(...) {}  }
> class Reader { this(IGetChar inp){} uint readf(...) {}  }
>
> I guess this is enough to implement stdin/stdout-style applications.

std.stream has writef and scanf in the OutputStream and InputStream interfaces, implemented in Stream. Suggestions for improving InputStream and OutputStream are always welcome.

> C++ <iostream> and co. are so universal, theoretical and generic
> that they are almost never used in real life in pure form.
> The << and >> operators sound good to a first-semester student
> but are a nightmare when you try to output/input something
> formatted for real life. And yet << and >> are the "poor C++ man's"
> approach to handling arguments of mixed types.

It will probably be a while (if ever) before << and >> become part of std.stream.

> Our old friends printf/writef and scanf/readf
> are time-proven and really do work. In D,
> where you have (it seems :-) access to the TypeInfo of the arguments,
> writef/readf are just perfect: compact and powerful.

agreed.

> Eh?

?

> IMHO, IMHO and again IMHO.

no problem.

> Andrew.
>
> "Ben Hinkle" <ben.hinkle@gmail.com> wrote in message news:d3u2i7$1vbi$1@digitaldaemon.com...
>> What should size() of a non-seekable stream return or do? Currently it depends on the stream type: for a general stream it throws a SeekException and for a File on Windows it returns 0 (which is just what GetFileSize returns for non-seekable streams like pipes). I'm tempted to have it return ulong.max. Any objections?
>>
>> While I'm at it I'm making eof testing more efficient for both seekable and non-seekable streams by using the convention that if readBlock returns 0 then the stream is at eof (and I'd like to document that). Technically that wasn't part of the existing readBlock's documentation but it's what happens in practice and it comes in handy with non-seekable streams.
>>
>
>

April 17, 2005

Re: non-seekable streams and size()

Posted by Andrew Fedoniouk
in reply to Ben Hinkle

Hi, Ben, see below:

"Ben Hinkle" <ben.hinkle@gmail.com> wrote in message news:d3udbh$2945$1@digitaldaemon.com...
>
> "Andrew Fedoniouk" <news@terrainformatica.com> wrote in message news:d3u9cf$25du$1@digitaldaemon.com...
>> Out of scope probably....
>>
>> IMHO, a "seekable" stream is nonsense.
>>
>> If a stream is seekable, then it is a vector.
>
> Files on disk are seekable and they can be too large or too cumbersome to fit into memory.

Ummm... memory-mapped files (at least in Win32) are not loaded as a whole;
only the 4K pages you actually access are brought into memory.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dngenlib/html/msdn_manamemo.asp
So it is not an issue.
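A sketch of that point with Phobos' std.mmfile: map a file read-only and slice out just the bytes you need; the operating system pages in only what is touched. The helper name peek is hypothetical. Per the std.mmfile documentation, MmFile also has a constructor taking a `window` size for mapping a very large file piece by piece; it is not used here.

```d
import std.mmfile : MmFile;

/// Copies `count` bytes starting at `offset` out of a read-only mapping.
/// Only the pages covering that range need to be read from disk.
ubyte[] peek(string path, ulong offset, size_t count)
{
    auto mm = new MmFile(path);   // read-only mapping of the file
    scope (exit) destroy(mm);     // unmap now instead of waiting for the GC
    auto view = cast(const(ubyte)[]) mm[offset .. offset + count];
    return view.dup;              // copy out before the mapping goes away
}
```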

>
>> In almost all cases such a stream could be represented
>> as char[], wchar[], etc. Memory-mapped files allow extending
>> this beyond heap memory to file access.
>
> The classic example is a large file of binary data organized into many chunks of the same size (i.e. a huge array of structs on disk). Random access to such data requires seeking. Is such a situation infrequent enough to be ignored? It's a reasonable question. Some APIs don't allow random access and instead have some streams support a mark/reset API.

What is wrong with classic fread/fwrite in "rb"/"wb" modes? They just work.

>
>> For text I/O it makes sense to support the simple idiom of formatting Writers and Readers.
>>
>> class Writer { this(IPutChar outp){} uint writef(...) {}  }
>> class Reader { this(IGetChar inp){} uint readf(...) {}  }
>>
>> I guess this is enough to implement stdin/stdout-style applications.
>
> std.stream has writef and scanf in the OutputStream and InputStream interfaces, implemented in Stream. Suggestions for improving InputStream and OutputStream are always welcome.

Text I/O and binary I/O are, IMHO, too different to mix; it is better to keep them apart and use something like this:

class writer { this(IPutChar outp){} uint writef(...) {}  }
class reader { this(IGetChar inp){} uint readf(...) {}  }

class bin_writer { this(IPutByte outp){} uint write(...) {}  }
class bin_reader { this(IGetByte inp){} uint read(...) {}  }

The main difference between bin_writer/bin_reader and fread/fwrite is that they use a uniform binary format, the same on little- and big-endian machines.
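A sketch of the bin_writer idea, assuming the "uniform format" is little-endian (the post does not say which). It uses std.bitmanip for the byte-order conversion; IPutByte mirrors the interface from this post, while ArraySink, BinWriter and readUintLE are hypothetical names.

```d
import std.bitmanip : littleEndianToNative, nativeToLittleEndian;

/// Byte sink, as proposed in the post.
interface IPutByte
{
    bool store(ubyte b);
}

/// Hypothetical sink that collects bytes in memory.
class ArraySink : IPutByte
{
    ubyte[] data;
    bool store(ubyte b) { data ~= b; return true; }
}

/// Hypothetical binary writer: always emits integers little-endian,
/// whatever the host byte order is.
class BinWriter
{
    private IPutByte sink;
    this(IPutByte sink) { this.sink = sink; }

    bool write(uint value)
    {
        immutable bytes = nativeToLittleEndian(value); // ubyte[4], low byte first
        foreach (b; bytes)
            if (!sink.store(b))
                return false; // sink refused a byte
        return true;
    }
}

/// Inverse operation: decodes the first 4 bytes as a little-endian uint.
uint readUintLE(const(ubyte)[] bytes)
{
    ubyte[4] buf = bytes[0 .. 4];
    return littleEndianToNative!uint(buf);
}
```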

The text reader/writer should take care of encodings.

Various implementations of IGetChar/IPutChar and IGetByte/IPutByte are all we
need, for example:

      IGetByte File.byteSrc();
      IGetChar File.charSrc();
      IGetByte Socket.byteSrc();
      IGetChar Socket.charSrc();

      IGetByte byteSrc(ubyte[]);
      IGetChar charSrc(ubyte[]);

interface IGetChar
{
    bool fetch(out dchar c);
}
interface IGetByte
{
    bool fetch(out ubyte b);
}

interface IPutChar
{
    bool store(dchar c);
}
interface IPutByte
{
    bool store(ubyte b);
}
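One way the in-memory byteSrc/charSrc factories above might be implemented, as a sketch. The two "get" interfaces are copied from the post; ByteArraySource and Utf8CharSource are hypothetical names, and charSrc assumes the bytes are UTF-8, decoding them with std.utf.decode.

```d
import std.utf : decode;

interface IGetChar { bool fetch(out dchar c); }
interface IGetByte { bool fetch(out ubyte b); }

/// Hypothetical byte source over an array; fetch returns false at the end.
class ByteArraySource : IGetByte
{
    private const(ubyte)[] data;
    private size_t pos;
    this(const(ubyte)[] data) { this.data = data; }

    bool fetch(out ubyte b)
    {
        if (pos >= data.length)
            return false;
        b = data[pos++];
        return true;
    }
}

/// Hypothetical char source: decodes UTF-8 bytes into whole code points.
class Utf8CharSource : IGetChar
{
    private const(char)[] text;
    private size_t pos;
    this(const(ubyte)[] data) { text = cast(const(char)[]) data; }

    bool fetch(out dchar c)
    {
        if (pos >= text.length)
            return false;
        c = decode(text, pos); // advances pos; throws UTFException on bad UTF-8
        return true;
    }
}

IGetByte byteSrc(const(ubyte)[] data) { return new ByteArraySource(data); }
IGetChar charSrc(const(ubyte)[] data) { return new Utf8CharSource(data); }
```

Here the same three bytes give three fetches from byteSrc but only two from charSrc, which is exactly the text/binary distinction the post argues for.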

>
>> C++ <iostream> and co. are so universal, theoretical and generic
>> that they are almost never used in real life in pure form.
>> The << and >> operators sound good to a first-semester student
>> but are a nightmare when you try to output/input something
>> formatted for real life. And yet << and >> are the "poor C++ man's"
>> approach to handling arguments of mixed types.
>
> It will probably be a while (if ever) before << and >> become part of std.stream.

Please don't do that. If anyone needs this idiom (e.g. Mango), implementing opShl and opShr is just a matter of minutes, in the particular place that knows which format to use and exactly how to emit/inject things.
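In D2 the shift operators are overloaded through the opBinary template rather than D1's opShl/opShr. A minimal sketch of the idiom living in one small wrapper, outside the stream library; TextOut is a hypothetical name, and it simply formats each operand with "%s".

```d
import std.array : Appender;
import std.format : formattedWrite;

/// Hypothetical text sink offering C++-style `out << a << b` chaining.
struct TextOut
{
    Appender!string buf;

    /// Formats `value` with "%s" and returns this sink so calls can chain.
    ref TextOut opBinary(string op, T)(T value) return
        if (op == "<<")
    {
        buf.formattedWrite("%s", value);
        return this;
    }
}
```

Usage: with `TextOut t;`, the expression `t << "x = " << 42` leaves "x = 42" in `t.buf.data`.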

>
>> Our old friends printf/writef and scanf/readf
>> are time-proven and really do work. In D,
>> where you have (it seems :-) access to the TypeInfo of the arguments,
>> writef/readf are just perfect: compact and powerful.
>
> agreed.
>
>> Eh?
>
> ?

:) Nothing, eh?

April 17, 2005

Re: non-seekable streams and size()

Posted by Ben Hinkle
in reply to Andrew Fedoniouk

"Andrew Fedoniouk" <news@terrainformatica.com> wrote in message news:d3ugvl$2cd2$1@digitaldaemon.com...
> Hi, Ben, see below:
>
> "Ben Hinkle" <ben.hinkle@gmail.com> wrote in message news:d3udbh$2945$1@digitaldaemon.com...
>>
>> "Andrew Fedoniouk" <news@terrainformatica.com> wrote in message news:d3u9cf$25du$1@digitaldaemon.com...
>>> Out of scope probably....
>>>
>>> IMHO, a "seekable" stream is nonsense.
>>>
>>> If a stream is seekable, then it is a vector.
>>
>> Files on disk are seekable and they can be too large or too cumbersome to fit into memory.
>
> Ummm... memory-mapped files (at least in Win32) are not loaded as a whole;
> only the 4K pages you actually access are brought into memory.
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dngenlib/html/msdn_manamemo.asp
> So it is not an issue.

Well, true, you can map and unmap different parts of the file. I was lumping that into "cumbersome", but I suppose it isn't that bad.

>>
>>> In almost all cases such a stream could be represented
>>> as char[], wchar[], etc. Memory-mapped files allow extending
>>> this beyond heap memory to file access.
>>
>> The classic example is a large file of binary data organized into many chunks of the same size (i.e. a huge array of structs on disk). Random access to such data requires seeking. Is such a situation infrequent enough to be ignored? It's a reasonable question. Some APIs don't allow random access and instead have some streams support a mark/reset API.
>
> What is wrong with classic fread/fwrite in "rb"/"wb" modes? They just work.

seekable streams just work, too :-)

>>
>>> For text I/O it makes sense to support the simple idiom of formatting Writers and Readers.
>>>
>>> class Writer { this(IPutChar outp){} uint writef(...) {}  }
>>> class Reader { this(IGetChar inp){} uint readf(...) {}  }
>>>
>>> I guess this is enough to implement stdin/stdout-style applications.
>>
>> std.stream has writef and scanf in the OutputStream and InputStream interfaces, implemented in Stream. Suggestions for improving InputStream and OutputStream are always welcome.
>
> Text I/O and binary I/O are, IMHO, too different to mix; it is better to keep them apart and use something like this:
>
> class writer { this(IPutChar outp){} uint writef(...) {}  }
> class reader { this(IGetChar inp){} uint readf(...) {}  }
>
> class bin_writer { this(IPutByte outp){} uint write(...) {}  }
> class bin_reader { this(IGetByte inp){} uint read(...) {}  }
>
> The main difference between bin_writer/bin_reader and fread/fwrite is that they use a uniform binary format, the same on little- and big-endian machines.

EndianStream allows custom control of the binary data's endianness, and it covers the endianness of wchar strings, too.

> The text reader/writer should take care of encodings.

Since D is UTF-centric, so is std.stream, although it is missing the dchar functions. It would be nice if Phobos had some helpers for managing encodings, but that's a slightly messy area to get into.

> Various implementations of IGetChar/IPutChar and IGetByte/IPutByte are all we
> need, for example:
>
>      IGetByte File.byteSrc();
>      IGetChar File.charSrc();
>      IGetByte Socket.byteSrc();
>      IGetChar Socket.charSrc();
>
>      IGetByte byteSrc(ubyte[]);
>      IGetChar charSrc(ubyte[]);
>
> interface IGetChar
> {
>    bool fetch(out dchar c);
> }
> interface IGetByte
> {
>    bool fetch(out ubyte b);
> }
>
> interface IPutChar
> {
>    bool store(dchar c);
> }
> interface IPutByte
> {
>    bool store(ubyte b);
> }

That's a reasonable approach (assuming the rest of the API would be rich enough to do all the things std.stream does). I think Mango does something similar, though I can't remember for sure. I tend to like the simplicity of std.stream: you just get a File (or whatever) and use it. Plus there is enough overlap between text_read/bin_read/text_write/bin_write that personally I think it makes sense to lump everything together. If anything, std.stream is getting a tad large, so maybe some of the less common streams could go into a different module.

April 17, 2005

Re: non-seekable streams and size()

Posted by Georg Wrede
in reply to Ben Hinkle

Ben Hinkle wrote:
> "Andrew Fedoniouk" <news@terrainformatica.com> wrote:

>> In almost all cases such a stream could be represented as char[],
>> wchar[], etc. Memory-mapped files allow extending this beyond heap
>> memory to file access.
> 
> The classic example is a large file of binary data organized into
> many chunks of the same size (i.e. a huge array of structs on disk).
> Random access to such data requires seeking. Is such a situation
> infrequent enough to be ignored? It's a reasonable question. Some
> APIs don't allow random access and instead have some streams support
> a mark/reset API.

IMHO, the more things grow, the more things grow.

Hard disks will stay larger than memory, and therefore we cannot start relying on MM files only.

Seekability has "always" been one of the cornerstones of file handling. I'd (almost) go as far as saying that no serious RDBMS can be built without seekability. Since D is a "systems language", there's no way we can skip seekability.

(We all do want Oracle to be ported to D, don't we? :-) )

However, on any input whose total size you don't know, seeking is something you don't do. (And don't let the VB guy try it.)

April 17, 2005

Re: non-seekable streams and size()

Posted by Andrew Fedoniouk
in reply to Georg Wrede

> IMHO, the more things grow, the more things grow.
>
> Hard disks will stay larger than memory, and therefore we cannot start relying on MM files only.

Yes. Not only.

But...

Please read the rationale in Konstantin Knizhnik's FastDB: http://www.garret.ru/~knizhnik/fastdb/FastDB.htm

Andrew.

April 17, 2005

Re: non-seekable streams and size()

Posted by Andrew Fedoniouk
in reply to Andrew Fedoniouk

Sorry, this is the URL: http://www.garret.ru/~knizhnik/fastdb.html

April 17, 2005

Re: non-seekable streams and size()

Posted by Georg Wrede
in reply to Andrew Fedoniouk

Thanks for the link. I'll read that as soon as I have time. Looks promising for quite a few projects of mine!


Andrew Fedoniouk wrote:
> Sorry this is the URL
> http://www.garret.ru/~knizhnik/fastdb.html 
> 
>


Copyright © 1999-2021 by the D Language Foundation