Thread overview: Unbuffered range interface
  Mar 26, 2014  Vladimir Panteleev
  Mar 26, 2014  Andrea Fontana
  Mar 27, 2014  Dmitry Olshansky
  Mar 28, 2014  QAston
  Mar 28, 2014  QAston
  Mar 30, 2014  Marco Leise
March 26, 2014
The recent discussion initiated by Walter points to a problem that has been known for a long time: ranges are well modeled by objects in memory (arrays, lists, other containers) but poorly by objects that need to load or construct elements on the fly.

Here are some scenarios that we need to accommodate:

1. Files and sockets

Fastest access is implemented at the OS level by means of read() that takes a user-provided buffer.

2. Compression, decompression, encryption, decryption

I think certain sizes would work better than others, but I'm not sure how that all goes. A good case study.

3. Of course character encoding, decoding, and transcoding.

What set of primitives would work best for all these scenarios and more?
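For reference, the user-provided-buffer pattern of scenario 1 is roughly what std.stdio's File.rawRead exposes today; a minimal sketch (buffer size and the drain() helper are arbitrary, for illustration only):

```d
import std.stdio : File;

// Drain a file through a fixed, caller-owned buffer and return the
// total number of bytes seen; this is the read()-into-a-user-buffer
// access pattern of scenario 1.
size_t drain(File f)
{
    ubyte[4096] buf;          // user-provided buffer, reused every call
    size_t total = 0;
    for (;;)
    {
        // rawRead fills as much of buf as it can and returns the slice
        // actually filled; an empty slice signals end of file.
        auto chunk = f.rawRead(buf[]);
        if (chunk.length == 0)
            break;
        total += chunk.length;
    }
    return total;
}
```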


Andrei
March 26, 2014
On Wednesday, 26 March 2014 at 00:55:31 UTC, Andrei Alexandrescu wrote:
> The recent discussion initiated by Walter points to a problem that has been known for a long time: ranges are well modeled by objects in memory (arrays, lists, other containers) but poorly by objects that need to load or construct elements on the fly.

If you want to support web servers, you should also consider scenarios where you retrieve ranges from a distributed database and the data arrive out of order. You can cut down on latency by processing map() etc. out of order.
March 26, 2014
On Wednesday, 26 March 2014 at 00:55:31 UTC, Andrei Alexandrescu wrote:
> What set of primitives would work best for all these scenarios and more?

Have you seen Dmitry Olshansky's BufferedRange?

http://forum.dlang.org/post/l9q66g$2he3$1@digitalmars.com
March 26, 2014
I hit a similar scenario writing my MongoDB (document-based DB) wrapper.


On Wednesday, 26 March 2014 at 00:55:31 UTC, Andrei Alexandrescu wrote:
> What set of primitives would work best for all these scenarios and more?

March 27, 2014
On 26-Mar-2014 04:55, Andrei Alexandrescu wrote:
> The recent discussion initiated by Walter points to a problem that has
> been known for a long time: ranges are well modeled by objects in
> memory (arrays, lists, other containers) but poorly by objects that
> need to load or construct elements on the fly.
>
> Here are some scenarios that we need to accommodate:
>
I've observed that ranges do mostly fine on top of a buffering primitive. Said primitive provides direct access to the underlying array.

For the cases listed below I've come to the conclusion that we'd need a generic sliding-window abstraction (over whatever the true source of data is). I call it a buffer, and a special range over said abstraction a buffer-range.

The key observation at the level of ranges is that we need something that looks a lot like a random-access range, has slicing, but doesn't have length. There must be a way to do queries like "are there X elements available?" and "please extend/move the underlying window to have at least Y elements available ahead".
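A minimal in-memory model of that primitive might look like this (the names window/extend/release are illustrative, not a proposed API; a real implementation would refill from a file or socket rather than slice a pre-existing array):

```d
// Sliding-window buffer over an in-memory source; the array stands
// in for whatever the true source of data is (file, socket, ...).
struct ArrayBuffer(T)
{
    const(T)[] source;   // the underlying stream of elements
    size_t lo, hi;       // current window is source[lo .. hi]

    // Direct access to the underlying array for the mapped window.
    const(T)[] window() const { return source[lo .. hi]; }

    // "Are there X elements available?"
    bool available(size_t n) const { return hi - lo >= n; }

    // "Please extend the window to have at least Y elements ahead";
    // returns false once the source cannot satisfy the request.
    bool extend(size_t n)
    {
        if (source.length - lo < n)
            return false;
        if (hi - lo < n)
            hi = lo + n;
        return true;
    }

    // Consume n elements so the window can slide forward.
    void release(size_t n)
    {
        assert(available(n));
        lo += n;
    }
}
```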

> 1. Files and sockets
>
> Fastest access is implemented at the OS level by means of read() that
> takes a user-provided buffer.

Ranges would need at least the buffering primitive below. It may or may not be backed by a file/socket descriptor underneath (a memory-mapped file, say, easily provides the buffer interface by re-mapping different views of the file).

> 2. Compression, decompression, encryption, decryption

This is much better addressed at the level of buffering itself, i.e. making an adapter/filter for a buffer.
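As a toy illustration of such an adapter (a trivial XOR "decryption" stands in for a real codec here; a real filter would transform block-sized chunks at a time rather than byte by byte):

```d
// A toy buffer filter: wraps an underlying byte source and exposes
// the same extend/release style surface, transforming bytes (here
// XOR with a key, standing in for decryption or decompression) as
// they are pulled into the window.
struct XorFilter
{
    const(ubyte)[] source;  // the wrapped, untransformed stream
    ubyte key;
    ubyte[] window;         // transformed bytes currently mapped
    size_t pos;             // read position in source

    // Grow the window until at least n transformed bytes are mapped.
    bool extend(size_t n)
    {
        while (window.length < n && pos < source.length)
            window ~= cast(ubyte)(source[pos++] ^ key);
        return window.length >= n;
    }

    // Consume n bytes from the front of the window.
    void release(size_t n) { window = window[n .. $]; }
}
```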

> I think certain sizes would work better than others, but I'm not sure
> how that all goes. A good case study.
>
> 3. Of course character encoding, decoding, and transcoding.

This also seems to be decently addressed as a filter for buffers. Depending on the trade-off between laziness and throughput, it may be postponed to the range level.

-- 
Dmitry Olshansky
March 27, 2014
On 26/03/14 01:55, Andrei Alexandrescu wrote:
> The recent discussion initiated by Walter points to a problem that has been
> known for a long time: ranges are well modeled by objects in memory (arrays,
> lists, other containers) but poorly by objects that need to load or construct
> elements on the fly.

Is it important to distinguish here between stuff that is loaded/constructed on the fly and may differ, versus stuff that is constructed on the fly but deterministically, so that it's _as if_ the value was actually stored?
March 28, 2014
> What set of primitives would work best for all these scenarios and more?

class Silver
{
    enum bullet = true;
}
March 28, 2014
On Friday, 28 March 2014 at 19:21:51 UTC, QAston wrote:
>> What set of primitives would work best for all these scenarios and more?
>
> class Silver
> {
>     enum bullet = true;
> }
Whoops, I should have posted a golden hammer, sorry.
March 30, 2014
I see it like Dmitry and have actually implemented what he
describes a while ago. It is a single-producer, single-consumer
circular buffer which offers a sliding window into the
content.
As is typical for a stream(!) it operates on variable blocks of
bytes, and producer and consumer never actually negotiate a
fixed block size. Instead the producer puts as many bytes into
the buffer as it thinks is a good trade-off between the number
of system calls and the delay before the consumer can start
reading. The consumer, on the other hand, asks for as many bytes
as it needs at minimum and blocks while that requirement
is unmet.
When enough bytes are available for consumption, the
consumer may also decide to map *all* of the currently
available data if that further improves processing performance.

Source: https://github.com/mleise/piped/blob/master/src/piped/circularbuffer.d

The public primitives (to come back to the topic) are the following, as exposed through a "get" and a "put" pointer into the buffer:

commit(<byte count or data type>)
  Tells the buffer that the sliding window can be moved by X
  bytes, after they have been processed (read from or written
  to).
mapAvailable()
  For the consumer: Returns a window spanning all of the
  buffer that the producer has committed already.
  For the producer: Returns all of the free memory in the
  buffer that can be written to.
mapAtLeast(<byte count>)
  Same as mapAvailable(), but blocks until a certain number of
  bytes are ready.
map(<byte count>)
  Same as mapAtLeast, but shortens the resulting slice to
  match the byte count it has actually been asked for:
  mapAtLeast(count)[0 .. count];
map(T)()
  Treats the start of the sliding window as data of type T and
  returns a pointer to it. (When enough bytes are available.)
finish()
  This basically tells the counterpart that we are done with
  the stream. For the producer this is like setting an EOF
  flag. No more data will be written. All queries for more
  will result in an exception.

In this buffer I replaced EOF checks with throwing exceptions
on attempts to read past the end of the stream. In my limited
use cases the expected length of the stream could always be
established on the fly, for example by reading file header
fields.
There are also primitives for reading and writing bit runs
mostly to accommodate compressed streams:

peekBits(<bit count>)
  Return ubyte, uint, ulong, ... of the next bits at the start
  of the sliding window, but don't remove them yet.
  This is required for some compression algorithms to decide on
  the next steps to take.
skipBits(<bit count>)
  For performance reasons, if you just peeked the bits and
  cached the result you can just skip them. It can also be
  useful for other cases where you have no use for some bits
  in the stream.
commitBits(<bit count>)
  Like commit() above, but counted in bits. The committed buffer
  bits become available to the counterpart.
readBit()
  Read a single bit. Optimized.
readBits(<bit count>)
  Works like peekBits() followed by skipBits().
skipBitsToNextByte()
  Very typical in compression. Skips the remainder of a byte
  until we are at an integral position again.
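The bit-run primitives can be modeled on a plain byte array; the toy reader below mirrors the peekBits/skipBits/readBits/skipBitsToNextByte semantics described above (LSB-first bit order is an assumption here; the actual library may pack bits differently):

```d
// Minimal bit reader mirroring the primitives above, LSB-first.
struct BitReader
{
    const(ubyte)[] data;
    size_t bitPos;

    // Return the next n bits without consuming them.
    uint peekBits(uint n)
    {
        uint result = 0;
        foreach (i; 0 .. n)
        {
            immutable p = bitPos + i;
            immutable bit = (data[p / 8] >> (p % 8)) & 1;
            result |= bit << i;
        }
        return result;
    }

    // Just move past n bits (e.g. after a cached peek).
    void skipBits(uint n) { bitPos += n; }

    // peekBits() followed by skipBits(), as described above.
    uint readBits(uint n)
    {
        immutable r = peekBits(n);
        skipBits(n);
        return r;
    }

    // Skip the remainder of the current byte, if any, so that we
    // are at an integral byte position again.
    void skipBitsToNextByte() { bitPos = (bitPos + 7) / 8 * 8; }
}
```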

-- 
Marco