January 06, 2014
06-Jan-2014 09:41, Jason White wrote:
> On Sunday, 5 January 2014 at 13:30:59 UTC, Dmitry Olshansky wrote:
>> In my view, text implies something like:
>>
>> void write(const(char)[]);
>> size_t read(char[]);
>>
>> And binary would be:
>>
>> void write(const(ubyte)[]);
>> size_t read(ubyte[]);
>>
>> Should not clash.
>
> Those would do the same thing for either text or binary data. When I say
> text writing, I guess I mean the serialization of any type to text (like
> what std.stdio.write does):
>
>      void write(T)(T value);         // Text writing
>      void write(const(ubyte)[] buf); // Binary writing
>
>      write([1, 2, 3]); // want to write "[1, 2, 3]"
>                        // but writes "\x01\x02\x03"
>
> This clashes. We need to be able to specify if we want to write/read a
> text representation or just the raw binary data. In the above case, the
> most specialized overload will be called.

Ok, now I see. In my view, though, serialization completely hides the raw stream write.

So:
struct SomeStream{
    void write(const(ubyte)[] data...);
}

struct Serializer(Stream){
    void write(T)(T value); //calls stream.write inside of it
private:
    Stream stream;
}
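A hedged sketch of how these two pieces might compose (ArraySink and the formatting choice are illustrative stand-ins, not a proposed API):

```d
import std.conv : to;

// Hypothetical raw sink standing in for SomeStream above.
struct ArraySink
{
    ubyte[] data;
    void write(const(ubyte)[] chunk...) { data ~= chunk; }
}

struct Serializer(Stream)
{
    this(Stream s) { stream = s; }

    // Text serialization: format the value, then push raw bytes below.
    void write(T)(T value)
    {
        stream.write(cast(const(ubyte)[]) value.to!string);
    }

private:
    Stream stream;
}
```

Here Serializer!ArraySink(...).write([1, 2, 3]) formats the value as the text "[1, 2, 3]" and only then hits the raw byte-oriented write, so the two overloads never clash.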

>> In-memory array IMHO better not pretend to be a stream. This kind of
>> wrapping goes in the wrong direction (losing capabilities). Instead
>> wrapping a stream and/or array as a buffer range proved to me to be
>> more natural (extending capabilities).
>
> Shouldn't buffers/arrays provide a stream interface in addition to
> buffer-specific operations?

I think it may be best not to. A buffer builds on top of an unbuffered stream. If there is a need to do large reads, the consumer may as well use the naked stream and not worry about the extra copying happening in the buffering layer.

I need to think on this, seeing as lookahead + seek could be labeled as a read even though it's not really one.

> I don't see why it would conflict with a
> range interface. As I understand it, ranges act on a single element at a
> time while streams act on multiple elements at a time. For ArrayBuffer
> in datapicked, a stream-style read is just lookahead(n) and cur += n.
> What capabilities are lost?

In short: lookahead is slicing, read would be copying.
For me the prime capability of an array is slicing, which is dirt cheap at O(1). The stream interface, on the other hand, is all about copying bytes into a user-provided array.

In this setting it means that if you want to wrap an array as a stream, it must follow the generic stream interface. The latter cannot and should not know about slicing and the like. Then, when it's wrapped in some adapter up the chain, it's no longer seen as an array (because the adapter is generic too and is made for streams). This is what I call capability loss.
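A minimal sketch of the difference (names are illustrative): the array-as-buffer view hands out O(1) slices, while the same array forced behind a generic stream interface can only copy into caller-provided storage.

```d
// Array viewed as a buffer range: lookahead is just slicing, O(1), no copy.
struct ArrayBuffer
{
    const(ubyte)[] data;
    size_t cur;

    const(ubyte)[] lookahead(size_t n)
    {
        return cur + n <= data.length ? data[cur .. cur + n] : null;
    }
    void skip(size_t n) { cur += n; }
}

// The same array behind a generic stream interface: read must copy, and any
// generic adapter layered on top no longer knows that slicing was possible.
struct ArrayStream
{
    const(ubyte)[] data;

    size_t read(ubyte[] dst)
    {
        auto n = dst.length < data.length ? dst.length : data.length;
        dst[0 .. n] = data[0 .. n]; // the copy that slicing would have avoided
        data = data[n .. $];
        return n;
    }
}
```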

> If buffers/arrays provide a stream interface, then they can be used by
> code that doesn't directly need the buffering capabilities but would
> still benefit from them.

See above - it would be better if the code were written for ranges, not streams. Then, e.g., slicing a buffer range on top of an array works just as cheaply as it does for the array itself, and zero copies are made (= performance).

>>> Currently, std.stdio has all three of
>>> those facets rolled into one.
>>
>> Locking though is a province of shared and may need a bit more thought.
>
> Locking of streams is something that I haven't explored too deeply yet.
> Streams that communicate with the OS certainly need locking as thread
> locality makes no difference there.

Actually these objects do just fine, since the OS does the locking (or ensures something equivalent). If your stream is TLS there is no need for extra locking at all (no interleaving of I/O calls is possible), regardless of its kind.

Shared instances would need locking, as two threads may request some operation, and since the OS locks only on a per-syscall basis, something cruel may happen in the code that deals with buffering etc.

-- 
Dmitry Olshansky
January 07, 2014
On Monday, 6 January 2014 at 10:26:27 UTC, Dmitry Olshansky wrote:
> Ok, now I see. In my eye though serialization completely hides raw stream write.
>
> So:
> struct SomeStream{
>     void write(const(ubyte)[] data...);
> }
>
> struct Serializer(Stream){
>     void write(T)(T value); //calls stream.write inside of it
> private:
>     Stream stream;
> }

I was thinking it should also have "alias stream this;", but maybe that's not the best thing to do for a serializer.

I concede, I've s/(read|write)Data/\1/g on

    https://github.com/jasonwhite/io/blob/master/src/io/file.d

and it now works on Windows with useful exception messages.

>> Shouldn't buffers/arrays provide a stream interface in addition to
>> buffer-specific operations?
>
> I think it may be best not to. Buffer builds on top of unbuffered stream. If there is a need to do large reads it may as well use naked stream and not worry about extra copying happening in the buffer layer.
>
> I need to think on this. Seeing as lookahead + seek could be labeled as read even though it's not.
>
>> I don't see why it would conflict with a
>> range interface. As I understand it, ranges act on a single element at a
>> time while streams act on multiple elements at a time. For ArrayBuffer
>> in datapicked, a stream-style read is just lookahead(n) and cur += n.
>> What capabilities are lost?
>
> In short - lookahead is slicing, read would be copying.
> For me prime capability of an array is slicing that is dirt cheap O(1). On the other hand stream interface is all about copying bytes to the user provided array.
>
> In this setting it means that if you want to wrap array as stream, then it must follow generic stream interface. The latter cannot and should not think of slicing and the like. Then while wrapping it in some adapter up the chain it's no longer seen as array (because adapter is generic too and is made for streams). This is what I call capability loss.
>
>> If buffers/arrays provide a stream interface, then they can be used by
>> code that doesn't directly need the buffering capabilities but would
>> still benefit from them.
>
> See above - it would be better if the code was written for ranges not streams. Then e.g. slicing of buffer range on top of array works just as cheap as it was for arrays. And zero copies are made (=performance).

Okay, I see. I'm just concerned about composability. I'll have to think more about how it's affected.

(BTW, you can probably simplify lookahead/lookbehind with look(ptrdiff_t n) where the sign of n indicates ahead/behind.)
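That suggestion might look like this (a sketch on top of a hypothetical array-backed buffer; positive n peeks ahead, negative n peeks behind, and an empty slice signals "not that much available"):

```d
struct ArrayBuffer
{
    const(ubyte)[] data;
    size_t cur;

    // Unified lookahead/lookbehind keyed on the sign of n.
    const(ubyte)[] look(ptrdiff_t n)
    {
        if (n >= 0)
            return cur + n <= data.length ? data[cur .. cur + n] : null;
        return cur >= cast(size_t) -n ? data[cur + n .. cur] : null;
    }
}
```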

> Actually these objects do just fine, since OS does the locking (or makes sure of something equivalent). If your stream is TLS there is no need for extra locking at all (no interleaving of I/O calls is possible) regardless of its kind.
>
> Shared instances would need locking as 2 threads may request some operation, and as OS locks only on per sys-call basis something cruel may happen in the code that deals with buffering etc.

Oh yeah, you're right.

As a side note: I would love to get a kick-ass I/O stream package into Phobos. It could replace std.stream as well as std.stdio. Stuff like serializers and lexers would be more robust and easier to write.
January 07, 2014
07-Jan-2014 11:59, Jason White wrote:
> On Monday, 6 January 2014 at 10:26:27 UTC, Dmitry Olshansky wrote:
>> Ok, now I see. In my eye though serialization completely hides raw
>> stream write.
>>
>> So:
>> struct SomeStream{
>>     void write(const(ubyte)[] data...);
>> }
>>
>> struct Serializer(Stream){
>>     void write(T)(T value); //calls stream.write inside of it
>> private:
>>     Stream stream;
>> }
>
> I was thinking it should also have "alias stream this;", but maybe
> that's not the best thing to do for a serializer.
>
> I concede, I've s/(read|write)Data/\1/g on
>
>      https://github.com/jasonwhite/io/blob/master/src/io/file.d
>
> and it now works on Windows with useful exception messages.

Cool, got to steal sysErrorString too! :)

>> Actually these objects do just fine, since OS does the locking (or
>> makes sure of something equivalent). If your stream is TLS there is no
>> need for extra locking at all (no interleaving of I/O calls is
>> possible) regardless of its kind.
>>
>> Shared instances would need locking as 2 threads may request some
>> operation, and as OS locks only on per sys-call basis something cruel
>> may happen in the code that deals with buffering etc.
>
> Oh yeah, you're right.
>
> As a side note: I would love to get a kick-ass I/O stream package into
> Phobos. It could replace std.stream as well as std.stdio.

Then our goals are aligned. Be sure to take a peek at (if you haven't already):
https://github.com/schveiguy/phobos/blob/new-io/std/io.d

I have my share of criticisms of it, but it's a nice piece of work that addresses many questions of I/O I've yet to consider.

> Stuff like
> serializers and lexers would be more robust and easier to write.

Indeed and even beyond that.

-- 
Dmitry Olshansky
January 09, 2014
My experimental lexer generator now uses the buffer range. https://github.com/Hackerpilot/Dscanner/tree/NewLexer/stdx

The problem that I have with it right now is that range.lookbehind(1).length != range.lookahead(1).length. This was confusing.
January 12, 2014
09-Jan-2014 13:23, Brian Schott wrote:
> My experimental lexer generator now uses the buffer range.
> https://github.com/Hackerpilot/Dscanner/tree/NewLexer/stdx
>

Cool!

A minor note:
https://github.com/Hackerpilot/Dscanner/blob/NewLexer/stdx/d/lexer.d#L487

lookahead(n) should always give a slice of length n, or an empty one, so you may as well test for length != 0.

In general you should avoid marking too often, as it carries a bit of a cost.

I'm currently in favor of my 2nd design, where marking is replaced by .save returning an independent view of the buffer, making Buffer a normal forward range that is cheap to copy.

https://github.com/blackwhale/datapicked/blob/fwd-buffer-range/dpick/buffer/

Sadly it segfaults with LDC so I can't quite assess its performance :(

> The problem that I have with it right now is that
> range.lookbehind(1).length != range.lookahead(1).length. This was
> confusing.

That indeed may look confusing at first, but keep in mind that range.front is in fact a lookahead of 1. Thus all algorithms that can work with a lookahead of 1 (LL(1), LALR(1), etc.) would easily work with any forward/input range (though any practical parser needs to slice or copy parts of the input aside).

-- 
Dmitry Olshansky
January 16, 2014
On Tue, 07 Jan 2014 05:04:07 -0500, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:

> 07-Jan-2014 11:59, Jason White wrote:
>> On Monday, 6 January 2014 at 10:26:27 UTC, Dmitry Olshansky wrote:
>>> Ok, now I see. In my eye though serialization completely hides raw
>>> stream write.
>>>
>>> So:
>>> struct SomeStream{
>>>     void write(const(ubyte)[] data...);
>>> }
>>>
>>> struct Serializer(Stream){
>>>     void write(T)(T value); //calls stream.write inside of it
>>> private:
>>>     Stream stream;
>>> }
>>
>> I was thinking it should also have "alias stream this;", but maybe
>> that's not the best thing to do for a serializer.
>>
>> I concede, I've s/(read|write)Data/\1/g on
>>
>>      https://github.com/jasonwhite/io/blob/master/src/io/file.d
>>
>> and it now works on Windows with useful exception messages.
>
> Cool, got to steal sysErrorString too! :)
>
>>> Actually these objects do just fine, since OS does the locking (or
>>> makes sure of something equivalent). If your stream is TLS there is no
>>> need for extra locking at all (no interleaving of I/O calls is
>>> possible) regardless of its kind.
>>>
>>> Shared instances would need locking as 2 threads may request some
>>> operation, and as OS locks only on per sys-call basis something cruel
>>> may happen in the code that deals with buffering etc.
>>
>> Oh yeah, you're right.
>>
>> As a side note: I would love to get a kick-ass I/O stream package into
>> Phobos. It could replace std.stream as well as std.stdio.
>
> Then our goals are aligned. Be sure to take a peek at (if you haven't already):
> https://github.com/schveiguy/phobos/blob/new-io/std/io.d

Yes, I'm gearing up to revisit that after a long D hiatus, and I came across this thread.

At this point, I really really like the ideas that you have in this. It solves an issue that I struggled with, and my solution was quite clunky.

I am thinking of this layout for streams/buffers:

1. Unbuffered stream used for raw i/o, based on a class hierarchy (which I have pretty much written)
2. Buffer like you have, based on a struct, with specific primitives. Its job is to collect data from the underlying stream and present it to consumers as a random-access buffer.
3. Filter that has access to transform the buffer data/copy it.
4. Ranges that use the buffer/filter to process/present the data.

The problem I struggled with is the presentation of UTF data of any format as char[], wchar[], or dchar[]. Two things need to happen. First, the data needs to be post-processed to perform any necessary byte swapping. Second, the data needs to be transcoded into the correct width.

In this way, you can process UTF data of any type (I even have code to detect the encoding and automatically process it), and then use it in a way that makes sense for your code.

My solution was to paste in a "processing" delegate into the class hierarchy of buffered streams that allowed one read/write access to the buffer. But it's clunky, and difficult to deal with in a generalized fashion.

But the idea of using a buffer in between the stream and the range, and possibly bolting together multiple transformations in a clean way, makes this problem easy to solve, and I think it is closer to the vision Andrei/Walter have.

I also like the idea of "pinning" the data instead of my mechanism of using a delegate (which was similar but not as general). It also has better opportunities for optimization.

Other ideas that came to me that buffer filters could represent:

* compression/decompression
* encryption

I am going to study your code some more and see how I can update my code to use it. I still need to maintain the std.stdio.File interface, and Walter is insistent that the initial state of stdout/err/in must be synchronous with C (which kind of sucks, but I have plans on how to make it not be so bad).

There is still a lot of work left to do, but I think one of the hard parts is done, namely dealing with UTF transcoding. The remaining sticky part is dealing with shared. But with structs, this should make things much easier.

One question, is there a reason a buffer type has to be a range at all? I can see where it's easy to make it a range, but I don't see higher-level code using the range primitives when dealing with chunks of a stream.

-Steve
January 16, 2014
16-Jan-2014 19:55, Steven Schveighoffer wrote:
> On Tue, 07 Jan 2014 05:04:07 -0500, Dmitry Olshansky
> <dmitry.olsh@gmail.com> wrote:
>> Then our goals are aligned. Be sure to take a peek at (if you haven't
>> already):
>> https://github.com/schveiguy/phobos/blob/new-io/std/io.d
>
> Yes, I'm gearing up to revisit that after a long D hiatus, and I came
> across this thread.
>
> At this point, I really really like the ideas that you have in this. It
> solves an issue that I struggled with, and my solution was quite clunky.
>
> I am thinking of this layout for streams/buffers:
>
> 1. Unbuffered stream used for raw i/o, based on a class hierarchy (which
> I have pretty much written)
> 2. Buffer like you have, based on a struct, with specific primitives.
> It's job is to collect data from the underlying stream, and present it
> to consumers as a random-access buffer.

The only interesting thing I'd add here is that some buffers may work without an underlying stream. The best examples are arrays and MM-files.

> 3. Filter that has access to transform the buffer data/copy it.
> 4. Ranges that use the buffer/filter to process/present the data.
>

Yes, yes and yes. I find it surprisingly good to see that our visions match. I was half-expecting you'd come along and destroy it all ;)

> The problem I struggled with is the presentation of UTF data of any
> format as char[] wchar[] or dchar[]. 2 things need to happen. First is
> that the data needs to be post-processed to perform any necessary byte
> swapping. The second is to transcode the data into the correct width.
>
> In this way, you can process UTF data of any type (I even have code to
> detect the encoding and automatically process it), and then use it in a
> way that makes sense for your code.
>
> My solution was to paste in a "processing" delegate into the class
> hierarchy of buffered streams that allowed one read/write access to the
> buffer. But it's clunky, and difficult to deal with in a generalized
> fashion.
>
> But the idea of using a buffer in between the stream and the range, and
> possibly bolting together multiple transformations in a clean way, makes
> this problem easy to solve, and I think it is closer to the vision
> Andrei/Walter have.

In essence a transcoding filter for UTF-16 would wrap a buffer of ubyte and itself present a buffer interface (but of wchar).

My own stuff currently deals only in ubyte, and the limited decoding is represented by a "decode" function that takes a buffer of ubyte and decodes UTF-8. I think typed buffers/filters are the way to go.
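A typed transcoding filter along those lines might look like this (a minimal sketch assuming a lookahead/skip-style buffer interface; it handles only the fixed two-octets-per-wchar case in native endianness, so real UTF-16 surrogate and byte-order handling is left out):

```d
// Wraps a buffer of ubyte and itself presents a buffer interface of wchar.
struct Utf16Filter(Buf)
{
    Buf source; // the underlying ubyte buffer

    // Lookahead in wchar units: each wchar is two octets in this sketch.
    const(wchar)[] lookahead(size_t n)
    {
        auto raw = source.lookahead(n * 2);
        return cast(const(wchar)[]) raw; // reinterpret, assuming native endianness
    }

    void skip(size_t n) { source.skip(n * 2); }
}
```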

>
> I also like the idea of "pinning" the data instead of my mechanism of
> using a delegate (which was similar but not as general). It also has
> better opportunities for optimization.
>
> Other ideas that came to me that buffer filters could represent:
>
> * compression/decompression
> * encryption
>
> I am going to study your code some more and see how I can update my code
> to use it. I still need to maintain the std.stdio.File interface, and
> Walter is insistent that the initial state of stdout/err/in must be
> synchronous with C (which kind of sucks, but I have plans on how to make
> it not be so bad).

I'm seriously not seeing how interfacing with the C runtime could be fast enough.

> There is still a lot of work left to do, but I think one of the hard
> parts is done, namely dealing with UTF transcoding. The remaining sticky
> part is dealing with shared. But with structs, this should make things
> much easier.

I'm thinking a generic locking wrapper is possible along the lines of:

shared Locked!(GenericBuffer!char) stdin; //usage

struct Locked(T){
shared:
private:
	T _this;
	Mutex mut;
public:
	//forwarded methods
}

The wrapper will introduce a lock, and implement every method of wrapped struct roughly like this:
mut.lock();
scope(exit) mut.unlock();
(cast(T*)_this).method(args);

I'm sure it could be pretty automatic.
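For instance, the forwarding could be generated with opDispatch (a sketch; it deliberately casts away shared on the forwarded call, which is the caveat with any such wrapper, and attribute/overload forwarding is omitted):

```d
import core.sync.mutex : Mutex;

struct Locked(T)
{
private:
    T _this;
    Mutex mut;

public:
    // Every method call on the shared wrapper is forwarded under the lock.
    auto opDispatch(string name, Args...)(Args args) shared
    {
        (cast(Mutex) mut).lock();
        scope(exit) (cast(Mutex) mut).unlock();
        // Cast away shared: safe only while the lock serializes all access.
        mixin("return (cast(T*) &_this)." ~ name ~ "(args);");
    }
}
```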

> One question, is there a reason a buffer type has to be a range at all?
> I can see where it's easy to make it a range, but I don't see
> higher-level code using the range primitives when dealing with chunks of
> a stream.

Lexers/parsers enjoy it, i.e. they work pretty much as ranges, especially when skipping spaces and the like. As I said, the main reason was: if it fits as a range, why not? After all, it makes one-pass processing of data trivial, as it rides on top of foreach:

foreach(octet; mybuffer)
{
	if(interesting(octet))
		do_cool_stuff();
}

Things like countUntil make perfect sense when called on a buffer (e.g. to find a matching sentinel).

-- 
Dmitry Olshansky
January 16, 2014
On Thu, 16 Jan 2014 13:44:08 -0500, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:

> 16-Jan-2014 19:55, Steven Schveighoffer wrote:
>> On Tue, 07 Jan 2014 05:04:07 -0500, Dmitry Olshansky
>> <dmitry.olsh@gmail.com> wrote:
>>> Then our goals are aligned. Be sure to take a peek at (if you haven't
>>> already):
>>> https://github.com/schveiguy/phobos/blob/new-io/std/io.d
>>
>> Yes, I'm gearing up to revisit that after a long D hiatus, and I came
>> across this thread.
>>
>> At this point, I really really like the ideas that you have in this. It
>> solves an issue that I struggled with, and my solution was quite clunky.
>>
>> I am thinking of this layout for streams/buffers:
>>
>> 1. Unbuffered stream used for raw i/o, based on a class hierarchy (which
>> I have pretty much written)
>> 2. Buffer like you have, based on a struct, with specific primitives.
>> It's job is to collect data from the underlying stream, and present it
>> to consumers as a random-access buffer.
>
> The only interesting thing I'd add here is that some buffer may work without underlying stream. Best examples are arrays and MM-files.

Yes, but I would stress that for convenience, the buffer should forward some of the stream primitives (such as seeking) in cases where seeking is possible, at least in the case of a buffer that wraps a stream.

That actually is another point that would have sucked with my class-based solution -- allocating a class to use an array as backing.

>
>> 3. Filter that has access to transform the buffer data/copy it.
>> 4. Ranges that use the buffer/filter to process/present the data.
>>
>
> Yes, yes and yes. I find it surprisingly good to see our vision seems to match. I was half-expecting you'd come along and destroy it all ;)

:) I've been preaching for a while that ranges don't make good streams, and that streams should be classes, but I hadn't considered splitting out the buffer. I think it's the right balance.

>
>> The problem I struggled with is the presentation of UTF data of any
>> format as char[] wchar[] or dchar[]. 2 things need to happen. First is
>> that the data needs to be post-processed to perform any necessary byte
>> swapping. The second is to transcode the data into the correct width.
>>
>> In this way, you can process UTF data of any type (I even have code to
>> detect the encoding and automatically process it), and then use it in a
>> way that makes sense for your code.
>>
>> My solution was to paste in a "processing" delegate into the class
>> hierarchy of buffered streams that allowed one read/write access to the
>> buffer. But it's clunky, and difficult to deal with in a generalized
>> fashion.
>>
>> But the idea of using a buffer in between the stream and the range, and
>> possibly bolting together multiple transformations in a clean way, makes
>> this problem easy to solve, and I think it is closer to the vision
>> Andrei/Walter have.
>
> In essence a transcoding filter for UTF-16 would wrap a buffer of ubyte and itself present a buffer interface (but of wchar).

My intended interface allows you to specify the desired type per read. Think of the case of stdin, where the clients will be varied and written by many different people, and its interface is decided by Phobos.

But a transcoding buffer may make some optimizations. For instance, reading a UTF32 file as utf-8 can re-use the same buffer, as no code unit uses more than 4 code points (did I get that right?).

>> I am going to study your code some more and see how I can update my code
>> to use it. I still need to maintain the std.stdio.File interface, and
>> Walter is insistent that the initial state of stdout/err/in must be
>> synchronous with C (which kind of sucks, but I have plans on how to make
>> it not be so bad).
>
> I seriously not seeing how interfacing with C runtime could be fast enough.

It's not. But an important stipulation in order for this to all be accepted is that it doesn't break existing code that expects things like printf and writef to interleave properly.

However, I think we can have an opt-in scheme, and there are certain cases where we can proactively switch to a D-buffer scheme. For example, if you get a ByLine range, it expects to exhaust the data from stream, and may not properly work with C printf.

The idea is that stdio.File can switch at runtime from FILE * to D streams as needed or directed.

>> There is still a lot of work left to do, but I think one of the hard
>> parts is done, namely dealing with UTF transcoding. The remaining sticky
>> part is dealing with shared. But with structs, this should make things
>> much easier.
>
> I'm thinking a generic locking wrapper is possible along the lines of:
>
> shared Locked!(GenericBuffer!char) stdin; //usage
>
> struct Locked(T){
> shared:
> private:
> 	T _this;
> 	Mutex mut;
> public:
> 	//forwarded methods
> }
>
> The wrapper will introduce a lock, and implement every method of wrapped struct roughly like this:
> mut.lock();
> scope(exit) mut.unlock();
> (cast(T*)_this).method(args);
>
> I'm sure it could be pretty automatic.

This would be a key addition for ANY type in order to properly work with shared. BUT, I don't see how it works safely generically because you necessarily have to cast away shared in order to call the methods. You would have to limit this to only working on types it was intended for.

I've been expecting to have to do something like this, but not looking forward to it :(

>> One question, is there a reason a buffer type has to be a range at all?
>> I can see where it's easy to make it a range, but I don't see
>> higher-level code using the range primitives when dealing with chunks of
>> a stream.
>
> Lexers/parsers enjoy it - i.e. they work pretty much as ranges especially when skipping spaces and the like. As I said the main reason was: if it fits as range why not? After all it makes one-pass processing of data trivial as it rides on top of foreach:
>
> foreach(octect; mybuffer)
> {
> 	if(intersting(octect))
> 		do_cool_stuff();
> }
>
> Things like countUntil make perfect sense when called on buffer (e.g. to find matching sentinel).
>

I think I misstated my question. What I am curious about is why a type must be a forward range to pass isBuffer. Of course, if it makes sense for a buffer type to also be a range, it can certainly implement that interface as well. But I don't know that I would need those primitives in all cases. I don't have any specific use case for having a buffer that doesn't implement a range interface, but I am hesitant to necessarily couple the buffer interface to ranges just because we can't think of a counter-case :)

-Steve
January 16, 2014
On 12/29/2013 2:02 PM, Dmitry Olshansky wrote:
> The BufferRange concept itself (for now called simply Buffer) is defined here:
> http://blackwhale.github.io/datapicked/dpick.buffer.traits.html

I am confused because there are 4 terms conflated here:

BufferRange
Buffer
InputStream
Stream
January 16, 2014
17-Jan-2014 00:00, Steven Schveighoffer wrote:
> On Thu, 16 Jan 2014 13:44:08 -0500, Dmitry Olshansky
> <dmitry.olsh@gmail.com> wrote:
>
>> 16-Jan-2014 19:55, Steven Schveighoffer wrote:
>>> On Tue, 07 Jan 2014 05:04:07 -0500, Dmitry Olshansky
>>> <dmitry.olsh@gmail.com> wrote:
[snip]

>> In essence a transcoding filter for UTF-16 would wrap a buffer of
>> ubyte and itself present a buffer interface (but of wchar).
>
> My intended interface allows you to specify the desired type per read.
> Think of the case of stdin, where the clients will be varied and written
> by many different people, and its interface is decided by Phobos.
>
> But a transcoding buffer may make some optimizations. For instance,
> reading a UTF32 file as utf-8 can re-use the same buffer, as no code
> unit uses more than 4 code points (did I get that right?).
>

The other way around :) It's up to 4 code units per 1 code point.

>>> I am going to study your code some more and see how I can update my code
>>> to use it. I still need to maintain the std.stdio.File interface, and
>>> Walter is insistent that the initial state of stdout/err/in must be
>>> synchronous with C (which kind of sucks, but I have plans on how to make
>>> it not be so bad).
>>
>> I seriously not seeing how interfacing with C runtime could be fast
>> enough.
>
> It's not. But an important stipulation in order for this to all be
> accepted is that it doesn't break existing code that expects things like
> printf and writef to interleave properly.
>
> However, I think we can have an opt-in scheme, and there are certain
> cases where we can proactively switch to a D-buffer scheme. For example,
> if you get a ByLine range, it expects to exhaust the data from stream,
> and may not properly work with C printf.
>
> The idea is that stdio.File can switch at runtime from FILE * to D
> streams as needed or directed.
>
>>> There is still a lot of work left to do, but I think one of the hard
>>> parts is done, namely dealing with UTF transcoding. The remaining sticky
>>> part is dealing with shared. But with structs, this should make things
>>> much easier.
>>
>> I'm thinking a generic locking wrapper is possible along the lines of:
>>
>> shared Locked!(GenericBuffer!char) stdin; //usage
>>
>> struct Locked(T){
>> shared:
>> private:
>>     T _this;
>>     Mutex mut;
>> public:
>>     //forwarded methods
>> }
>>
>> The wrapper will introduce a lock, and implement every method of
>> wrapped struct roughly like this:
>> mut.lock();
>> scope(exit) mut.unlock();
>> (cast(T*)_this).method(args);
>>
>> I'm sure it could be pretty automatic.
>
> This would be a key addition for ANY type in order to properly work with
> shared. BUT, I don't see how it works safely generically because you
> necessarily have to cast away shared in order to call the methods. You
> would have to limit this to only working on types it was intended for.

The requirement may be that it's pure, or should I say "well-contained". In other words, as long as it doesn't smuggle references somewhere else it should be fine.
That is not to say it's 100% fool-proof, nor do I think that essentially simulating a synchronized class is always a good thing to do...

> I've been expecting to have to do something like this, but not looking
> forward to it :(

>>> One question, is there a reason a buffer type has to be a range at all?
>>> I can see where it's easy to make it a range, but I don't see
>>> higher-level code using the range primitives when dealing with chunks of
>>> a stream.
>>
>> Lexers/parsers enjoy it - i.e. they work pretty much as ranges
>> especially when skipping spaces and the like. As I said the main
>> reason was: if it fits as range why not? After all it makes one-pass
>> processing of data trivial as it rides on top of foreach:
>>
>> foreach(octect; mybuffer)
>> {
>>     if(intersting(octect))
>>         do_cool_stuff();
>> }
>>
>> Things like countUntil make perfect sense when called on buffer (e.g.
>> to find matching sentinel).
>>
>
> I think I misstated my question. What I am curious about is why a type
> must be a forward range to pass isBuffer. Of course, if it makes sense
> for a buffer type to also be a range, it can certainly implement that
> interface as well. But I don't know that I would need those primitives
> in all cases. I don't have any specific use case for having a buffer
> that doesn't implement a range interface, but I am hesitant to
> necessarily couple the buffer interface to ranges just because we can't
> think of a counter-case :)

Convenient to work with does ring good to me. I simply see no need to reinvent std.algorithm on buffers, especially the ones that just scan left-to-right.
An example would be calculating a checksum of a stream (say the data comes from a pipe or socket, i.e. buffered). It's a trivial application of std.algorithm.reduce, and there's no need to reinvent that wheel IMHO.
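A sketch of that, using a plain array in place of the piped/buffered source and a naive additive checksum:

```d
import std.algorithm : reduce;

unittest
{
    // Stand-in for a buffer range fed from a pipe or socket.
    ubyte[] buffered = [1, 2, 3, 4];

    // One pass, zero copies: reduce walks the range element by element.
    auto checksum = reduce!((sum, octet) => sum + octet)(0u, buffered);
    assert(checksum == 10);
}
```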

-- 
Dmitry Olshansky