protocol for using InputRanges (page 14) - D Programming Language Discussion Forum


March 28, 2014

Re: protocol for using InputRanges

Posted by Walter Bright
in reply to Dmitry Olshansky


On 3/28/2014 9:48 AM, Dmitry Olshansky wrote:
> 28-Mar-2014 13:55, Walter Bright wrote:
>> On 3/28/2014 1:32 AM, Johannes Pfau wrote:
>>> Ranges have equivalents in other languages:
>>> iterators in c++,
>>> IEnumerator in c#,
>>> Iterator in java
>>>
>>> all these languages have special stream types for raw data. I don't
>>> think it's bad if we also have streams/ranges separate in D.
>>
>>
>> Do you see a point to be able to, in an algorithm, seamlessly swap a
>> socket with a string?
>>
>
> Certainly NOT a socket. There is no escaping the fact that there are specifics
> to unbuffered direct streams.
> What you mention only makes sense with buffering either implicit or (I'd prefer)
> explicit.

Yes, it does require a one element buffer. But seriously, does a one character buffer from a socket have a measurable impact on reading from a network? I'm an efficiency wonk as much or more than anyone, and this appears to me to be a false savings.
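
A minimal sketch of what a "one element buffer" could look like when a byte source is adapted to the input range protocol. The names `ByteSourceRange` and `readByte` are hypothetical, and the delegate stands in for a real socket read; the -1 end-of-stream convention is an assumption borrowed from C's getc. The sketch only illustrates the protocol (no read happens in the constructor, the element is fetched lazily) and says nothing about the cost of each underlying read.

```d
import std.range.primitives : isInputRange;

/// Hypothetical adapter: exposes a "read one byte" callback as an input range.
struct ByteSourceRange
{
    /// Assumed convention: returns the next byte, or -1 at end of stream.
    int delegate() readByte;

    private int cached;   // the one-element buffer
    private bool primed;  // true while `cached` holds an unconsumed read

    // Fetch the next element only when somebody asks for it.
    private void prime()
    {
        if (!primed)
        {
            cached = readByte();
            primed = true;
        }
    }

    @property bool empty() { prime(); return cached < 0; }

    @property ubyte front()
    {
        prime();
        assert(cached >= 0, "front called on an empty range");
        return cast(ubyte) cached;
    }

    void popFront()
    {
        prime();
        assert(cached >= 0, "popFront called on an empty range");
        primed = false;   // the next empty/front call triggers the next read
    }
}

static assert(isInputRange!ByteSourceRange);

void main()
{
    import std.algorithm.comparison : equal;

    ubyte[] data = [1, 2, 3];
    size_t pos;
    auto r = ByteSourceRange(() => pos < data.length ? cast(int) data[pos++] : -1);

    assert(pos == 0);            // nothing was read during construction
    assert(r.equal([1, 2, 3]));  // standard algorithms work on the adapter
}
```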

March 28, 2014

Re: protocol for using InputRanges

Posted by Dmitry Olshansky
in reply to Walter Bright


28-Mar-2014 21:07, Walter Bright wrote:
> On 3/28/2014 9:48 AM, Dmitry Olshansky wrote:
>> 28-Mar-2014 13:55, Walter Bright wrote:
>>> On 3/28/2014 1:32 AM, Johannes Pfau wrote:
>>>> Ranges have equivalents in other languages:
>>>> iterators in c++,
>>>> IEnumerator in c#,
>>>> Iterator in java
>>>>
>>>> all these languages have special stream types for raw data. I don't
>>>> think it's bad if we also have streams/ranges separate in D.
>>>
>>>
>>> Do you see a point to be able to, in an algorithm, seamlessly swap a
>>> socket with a string?
>>>
>>
>> Certainly NOT a socket. There is no escaping the fact that there are
>> specifics
>> to unbuffered direct streams.
>> What you mention only makes sense with buffering either implicit or
>> (I'd prefer)
>> explicit.
>
> Yes, it does require a one element buffer. But seriously, does a one
> character buffer from a socket have a measurable impact on reading from
> a network?

WAT? The overhead is in issuing system calls, so you'd want to make as few of them as possible. Reading byte by byte is an exemplar of idiocy in I/O code.

> I'm an efficiency wonk as much or more than anyone, and this
> appears to me to be a false savings.

Oh crap. This is very wrong. Do you often work with I/O and networking?

-- 
Dmitry Olshansky

March 28, 2014

Re: protocol for using InputRanges

Posted by w0rp
in reply to Johannes Pfau


On Friday, 28 March 2014 at 16:59:05 UTC, Johannes Pfau wrote:
> It 'works' with streams but it's way too slow. You don't want to read
> byte-per-byte. Of course you can always implement ranges on top of
> streams. Usually these will not provide byte-per-byte access but
> efficient higher level abstractions (byLine, byChunk, decodeText).
>
> The point is you can implement ranges on streams easily, but you can't
> use ranges as the generic primitive for raw data. What's the element
> type of a data range?
> ubyte - performance sucks
> ubyte[n], ubyte[] now you have a range of ranges, most algorithms won't
> work as expected (find, count, ...).
>
> (the call empty/don't call empty discussion is completely unrelated to
> this, btw. You can implement ranges on streams either way, but again,
> using ranges for raw data streams is not a good idea.)

I think a key is to offer something which gives you chunks at a time right at the top, and then use .joiner on that. I read files this way currently.

auto fileByteRange = File("something").byChunk(chunkSize).joiner;

I believe this to be a very good way to get good performance without losing the functionality of std.algorithm.
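
A small runnable sketch of the pattern above, assuming a throwaway file in the temp directory (the file name and contents are made up for illustration). It reads the file in 4-byte chunks, flattens the chunks with joiner, and hands the resulting byte range to a std.algorithm function.

```d
import std.algorithm.iteration : joiner;
import std.algorithm.searching : count;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File;

void main()
{
    // Hypothetical scratch file, created only for this demonstration.
    auto path = buildPath(tempDir, "bychunk_joiner_demo.txt");
    write(path, "alpha\nbeta\ngamma\n");
    scope (exit) remove(path);

    // byChunk yields ubyte[] slices; joiner flattens them into a range of ubyte.
    // The tiny chunk size just forces several chunks; real code would use a larger one.
    auto fileByteRange = File(path).byChunk(4).joiner;

    // Any algorithm that accepts an input range can consume the bytes.
    auto newlines = fileByteRange.count(cast(ubyte) '\n');
    assert(newlines == 3);
}
```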

March 28, 2014

Re: protocol for using InputRanges

Posted by Johannes Pfau
in reply to w0rp


On Fri, 28 Mar 2014 17:22:26 +0000,
"w0rp" <devw0rp@gmail.com> wrote:

> 
> I think a key is to offer something which gives you chunks at a time right at the top, and then use .joiner on that. I read files this way currently.
> 
> auto fileByteRange = File("something").byChunk(chunkSize).joiner;
> 

byChunk is implemented on top of the file rawRead API though, and
that's a stream API ;-)
As said before implementing ranges on top of streams is fine, but if you
want ranges to replace streams as the lowest level interface you'll
either suffer from performance issues or you'll have to extend the
range interface and effectively make it a stream interface. (For example
byChunk doesn't offer a way to provide a buffer. I'd expect a low level
API to offer this, but it'll complicate range API a lot. File.rawRead
on the other hand provides exactly that and you can implement byChunk
on top of rawRead easily. The other way round is not as easy).

BTW: Whether this code performs well depends, of course, on what you do with that fileByteRange range. For example, if you only read the complete file into a memory buffer, joiner would reduce performance significantly.

> I believe this to be a very good way to get good performance without losing the functionality of std.algorithm.

Yes, that's exactly how ranges/streams should interface, there's no
real problem for users.
stream.getSomeRange().rangeAPICalls....
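
To illustrate the layering described above, here is a rough sketch of a chunk range built on top of File.rawRead that lets the caller supply the buffer. `ChunksInto` is a hypothetical name, not a Phobos type, and the eager first read in the constructor is a simplification made for brevity.

```d
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File;

/// Hypothetical range: each element is the part of a caller-supplied
/// buffer that the most recent File.rawRead call filled.
struct ChunksInto
{
    private File file;
    private ubyte[] buffer;  // owned by the caller, reused for every chunk
    private ubyte[] chunk;   // slice of `buffer` filled by the last read

    this(File file, ubyte[] buffer)
    {
        this.file = file;
        this.buffer = buffer;
        popFront();  // eager first read, a simplification for this sketch
    }

    @property bool empty() const { return chunk.length == 0; }
    @property ubyte[] front() { return chunk; }

    // rawRead returns the filled prefix of the buffer, empty at end of file.
    void popFront() { chunk = file.rawRead(buffer); }
}

void main()
{
    // Hypothetical scratch file, created only for this demonstration.
    auto path = buildPath(tempDir, "rawread_chunks_demo.bin");
    write(path, "0123456789");
    scope (exit) remove(path);

    ubyte[4] storage;  // the caller decides where the data lands
    size_t total, chunks;
    foreach (c; ChunksInto(File(path), storage[]))
    {
        total += c.length;
        ++chunks;
    }
    assert(total == 10);
    assert(chunks == 3);  // 4 + 4 + 2 bytes
}
```

Each front is only valid until the next popFront, since the same buffer is overwritten on every read.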

March 28, 2014

Re: protocol for using InputRanges

Posted by Walter Bright
in reply to Dmitry Olshansky


On 3/28/2014 10:11 AM, Dmitry Olshansky wrote:
> WAT? The overhead is in issuing system calls, so you'd want to make as few of them
> as possible. Reading byte by byte is an exemplar of idiocy in I/O code.

That's why we have things like byLine().

March 28, 2014

Re: protocol for using InputRanges

Posted by QAston
in reply to Johannes Pfau


On Friday, 28 March 2014 at 08:34:08 UTC, Johannes Pfau wrote:
> On Thu, 27 Mar 2014 17:20:25 -0700,
> Walter Bright <newshound2@digitalmars.com> wrote:
>
>> On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote:
>> > On 3/27/14, 2:24 PM, Walter Bright wrote:
>> >> The range protocol is designed to work with streams.
>> >
>> > It's designed to work with containers.
>> 
>> I know we talked about streams when we designed it.
>> 
>> 
>> >> It's a giant fail
>> >> if they do not, or if you want to create a separate, non-range
>> >> universe to deal with streams.
>> >
>> > It's not a giant fail, we just need to adjust the notion.
>> 
>> Are you suggesting that ranges needn't support streams?
>> 
>> Note also that I suggested a way Steven could create an adapter with
>> the behavior he desired, yet still adhere to protocol. No notion
>> adjustments required.
>
> Ranges have equivalents in other languages:
> iterators in c++,
> IEnumerator in c#,
> Iterator in java
>
> all these languages have special stream types for raw data. I don't
> think it's bad if we also have streams/ranges separate in D.

There are stream iterators in C++:
http://www.cplusplus.com/reference/iterator/istream_iterator/

March 28, 2014

Re: protocol for using InputRanges

Posted by Dmitry Olshansky
in reply to Walter Bright


28-Mar-2014 22:29, Walter Bright wrote:
> On 3/28/2014 10:11 AM, Dmitry Olshansky wrote:
>> WAT? The overhead is in issuing system calls, so you'd want to make as
>> few of them
>> as possible. Reading byte by byte is an exemplar of idiocy in I/O code.
>
> That's why we have things like byLine().
>

Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW.

-- 
Dmitry Olshansky

March 28, 2014

Re: protocol for using InputRanges

Posted by Paolo Invernizzi
in reply to John Stahara


On Friday, 28 March 2014 at 16:30:36 UTC, John Stahara wrote:
> On Fri, 28 Mar 2014 16:23:11 +0000, Paolo Invernizzi wrote:
>
>> On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:
>>> On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi
>>> <paolo.invernizzi@no.address> wrote:
>>>> As far as we are concerned, everyone here is happy with the fact that empty
>>>> *must* be checked prior to front/popFront.
>>>
>>> This is actually not true.
>>>
>>> R
>> 
>> What I mean is that we don't care: we always respect the
>> sequence "empty > front > pop", and everybody here finds it natural.
>
>
> To clarify for Mr. Invernizzi: the "we" to which he refers is the group
> of people he works with, and /not/ the members of this newsgroup.
>
> --jjs

Thank you John, that's exactly right: I'm talking about my colleagues working with D.
-- Paolo
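
For reference, a short sketch of the "empty > front > pop" sequence mentioned above. `Tracing` is a hypothetical range that records which primitive is called, so the order used by a plain foreach loop becomes visible.

```d
/// Hypothetical range that logs every primitive call:
/// 'e' for empty, 'f' for front, 'p' for popFront.
struct Tracing
{
    int[] items;
    string* log;  // call trace shared with the caller

    @property bool empty() { *log ~= "e"; return items.length == 0; }
    @property int front() { *log ~= "f"; return items[0]; }
    void popFront() { *log ~= "p"; items = items[1 .. $]; }
}

void main()
{
    string log;
    int sum;

    // foreach over a range checks empty, reads front, then calls popFront.
    foreach (x; Tracing([10, 20], &log))
        sum += x;

    assert(sum == 30);
    assert(log == "efpefpe");  // the final 'e' is the check that ends the loop
}
```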

March 28, 2014

Re: protocol for using InputRanges

Posted by Walter Bright
in reply to Dmitry Olshansky


On 3/28/2014 11:40 AM, Dmitry Olshansky wrote:
> Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even
> though sys calls are amortized by C runtime, we have a function call per byte.
> No wonder it's SLOW.

How about a PR to fix it?

March 28, 2014

Re: protocol for using InputRanges

Posted by Tobias Müller
in reply to Walter Bright


On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote:
> On 3/27/2014 12:21 PM, Rainer Schuetze wrote:
>> This loop is intuitive. Not being allowed to call empty or front multiple times
>> or not at all is unintuitive. They should not be named as if they are properties
>> then.
>
> I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor.

Disclaimer: I'm a C++ programmer just lurking here, I've never
actually used D.

I find it very counter-intuitive that 'empty' is required before
front or popFront.
Since 'pump priming' in the constructor isn't wanted either, I'd
suggest the following protocol:

while (popFront())
{
    front;
}

popFront is then required to return !empty.
'empty' as a separate property getter can stay but is not
required for the protocol.

This way, it's clear that the work to fetch the next element is
always done in popFront.

Generally, I find dependencies between functions that require a
specific call sequence problematic. If they can be removed, they
should be.

With my proposed solution, there's still one minor dependency,
namely that front is not valid before the first popFront. This
could be solved by again combining the two, as proposed by
someone else in this thread.

Tobi
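
One way the proposed `while (popFront())` protocol could be layered over the existing primitives is sketched below. `Pull` and `pull` are hypothetical names, not part of Phobos, and the adapter simply assumes the wrapped range follows the usual empty/front/popFront rules.

```d
import std.range.primitives;

/// Hypothetical adapter: popFront advances and reports whether an element
/// is available, so callers never have to call `empty` themselves.
struct Pull(R) if (isInputRange!R)
{
    private R source;
    private bool started;  // false until the first popFront call

    /// Moves to the next element; returns false once the source is exhausted.
    bool popFront()
    {
        if (started)
        {
            if (source.empty)
                return false;  // already exhausted, stay exhausted
            source.popFront();
        }
        else
        {
            started = true;    // the first call only exposes the first element
        }
        return !source.empty;
    }

    /// Valid only after popFront has returned true.
    auto front() { return source.front; }
}

/// Hypothetical convenience constructor.
auto pull(R)(R r) if (isInputRange!R)
{
    return Pull!R(r);
}

void main()
{
    int[] seen;
    auto p = pull([1, 2, 3]);
    while (p.popFront())
        seen ~= p.front;

    assert(seen == [1, 2, 3]);
    assert(!p.popFront());  // further calls keep returning false
}
```

As noted in the post, front is still not valid before the first popFront call; the adapter does not remove that remaining dependency.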
