October 14, 2010
On Wed, 13 Oct 2010 19:03:41 +0900, Daniel Murphy <yebblies at gmail.com> wrote:

> On Wed, Oct 13, 2010 at 4:47 PM, Shin Fujishiro <rsinfu at gmail.com> wrote:
>>
>> Input ranges can only support N-1 conversions in a sane way.  They can read as many items as needed from the 'front' of their underlying source ranges, but can only expose a single item.
>>
>> Similarly, output ranges are restricted to 1-N conversions.
>>
>> Yeah, I know you can work around the problem by caching several items inside a decorator range.  It's done in your code and it works nicely. :-) But I think it shows how ranges are unfit for filtering purposes.
>>
> I see that caching may be undesirable in some situations, but this adapter
> (and I assume most others) can be implemented perfectly well without it.
>  It's a flaw in implementation, not a limitation of ranges.

I'll wait for the no-caching implementation. I think a Range should not have
useless state.
In addition, the current implementation still seems more complex than
necessary.

>> I don't see much benefit in making filters decorator ranges in the first place.  You can implement them, but decorator ranges should be considered extensions to the core filters implemented in Masahiro's way.
>>
>> So, I believe that Masahiro's encode(src,sink) design wins.  His base64 filter has control over the number of bytes to process, and hence needs no extra caching.
>>
>> Of course, decorator ranges are useful in some situations, and we'll eventually need them.  But they should never supersede Masahiro's filters.
>>
>>
> I don't see any real differences between the lazy range design and the conversion function design, apart from the usual lazy vs eager factors of performance, memory consumption and interface simplicity.
>
> I tend to see the lazy solution as the primary solution, and the
> conversion
> function as an alternative implementation optimized for speed and/or
> usability.
> One similar example is std.string.split vs std.algorithm.splitter.

I don't think so. Range is a good concept, but it isn't everything.
I think usability is the most important factor for a module API.

> That being said, I think we do need both, as the conversion function
> should
> be more efficient and simpler to use for the most common case (buffer ->
> buffer).

I agree.

> I'd hate to have to use
>   copy(Base64.decode(inputbuffer), outputbuffer);
> over
>   Base64.decode(inputbuffer, outputbuffer);
> just as I'd never want to write
>   copy(repeat(5), buffer);
> over
>   fill(buffer, 5);

Base64's API is the following:

   encodeLength(length);
   encode(src, dst);
   encode(src);
   encoder(returns Encoder!(char[]) or Encoder!(char))
   decodeLength(length);
   decode(src, dst);
   decode(src);
   decoder(returns Decoder!(ubyte[]) or Decoder!(ubyte))
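
For reference, a rough usage sketch (illustrative only; the Base64 namespace
and the return types shown are my assumptions, not final signatures):

    ubyte[] data = [0x14, 0xfb, 0x9c, 0x03, 0xd9, 0x7e];

    // eager, buffer -> buffer
    auto buf = new char[](Base64.encodeLength(data.length));
    Base64.encode(data, buf);

    // eager, allocating
    char[] txt   = Base64.encode(data);
    ubyte[] back = Base64.decode(txt);

    // lazy, one element at a time
    foreach (c; Base64.encoder(data)) { /* one char per iteration */ }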

Do you see any problems?


Masahiro
October 14, 2010
I'm sorry, but I'll respond to only part of your message for now.  I'll post a reply to the rest later!

Daniel Murphy <yebblies at gmail.com> wrote:
> > Input ranges can only support N-1 conversions in a sane way.  They can read as many items as needed from the 'front' of their underlying source ranges, but can only expose a single item.
> >
> > Similarly, output ranges are restricted to 1-N conversions.
> >
> > Yeah, I know you can work around the problem by caching several items inside a decorator range.  It's done in your code and it works nicely. :-) But I think it shows how ranges are unfit for filtering purposes.
>
> I see that caching may be undesirable in some situations, but this adapter
> (and I assume most others) can be implemented perfectly well without it.
>  It's a flaw in implementation, not a limitation of ranges.

So... how?

To tell the truth, I've tried creating decorator ranges for character code conversion and zlib decompression, and both required internal caching. I think it's a limitation of the decorator design.

Perhaps I'm missing something?  Please tell us if you know a way to implement M:N decorators without caching!


Shin
October 13, 2010
On Oct 13, 2010, at 1:52 PM, Shin Fujishiro wrote:

> I'm sorry, but I'll respond to only part of your message for now.  I'll post a reply to the rest later!
> 
> Daniel Murphy <yebblies at gmail.com> wrote:
>>> Input ranges can only support N-1 conversions in a sane way.  They can read as many items as needed from the 'front' of their underlying source ranges, but can only expose a single item.
>>> 
>>> Similarly, output ranges are restricted to 1-N conversions.
>>> 
>>> Yeah, I know you can work around the problem by caching several items inside a decorator range.  It's done in your code and it works nicely. :-) But I think it shows how ranges are unfit for filtering purposes.
>> 
>> I see that caching may be undesirable in some situations, but this adapter (and I assume most others) can be implemented perfectly well without it. It's a flaw in implementation, not a limitation of ranges.
> 
> So... how?
> 
> To tell the truth, I've tried creating decorator ranges for character code conversion and zlib decompression, and both required internal caching. I think it's a limitation of the decorator design.

I'd thought so too.  zlib in particular has this issue--the quality of compression is directly related to the buffer size.
October 14, 2010
Daniel Murphy <yebblies at gmail.com> wrote:
> On Wed, Oct 13, 2010 at 4:47 PM, Shin Fujishiro <rsinfu at gmail.com> wrote:
> 
> > The biggest reason why I think so is ranges' inaptitude for filtering purposes.  M-N conversions, which happen in base64, character code conversion, etc., can't be supported by ranges without twisted hacks.  Most filters need to control how many items to read and write *by themselves*.
> >
> 
> I'm not sure what you mean about having control over the number of items processed.  Do you mean that because of caching more bytes can be encoded/decoded than are ever used?

No.  Uh... Let's forget about ranges for the next two paragraphs.

Consider decoding a base64 unit (4 chars) by hand.  You may want to
(a) pull four chars from a source, and decode them into three bytes.
Then, (b) you'll push the decoded bytes to a destination.  Done.

Conversion works naturally if (a) data can be pulled from a source and (b) converted data can be pushed to a destination.  If either of them can't be achieved, we need an extra cache to pool dangling data.  The "control" I mentioned refers to these two points.
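
For concreteness, that one unit done by hand (a minimal sketch; no padding
or error handling, and the indexOf lookup is for clarity, not speed):

import std.string : indexOf;

// decode one 4-char base64 unit into three bytes: pull (a), then push (b)
ubyte[3] decodeUnit(const char[4] unit)
{
    static immutable alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    uint acc;
    foreach (c; unit)                           // (a) pull four chars
        acc = (acc << 6) | cast(uint) alphabet.indexOf(c);
    ubyte[3] bytes = [cast(ubyte)(acc >> 16),   // (b) push three decoded bytes
                      cast(ubyte)(acc >>  8),
                      cast(ubyte) acc];
    return bytes;
}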

Then, can ranges support pull and push?  No, unfortunately.  They are restricted to pull *or* push semantics.  Decorator input ranges can pull data from source ranges, but can't push converted data anywhere.  Output ranges have similar inconveniences.

So, ranges are not the best fit for conversion drivers, IMO.  Ranges are at their best when used as sources and destinations.  We may support decorator ranges, but they should not be the main API.

Hey, I'm not dissing ranges nor your implementation. :-)  I'm just afraid of people making everything ranges in the first place!


Regards,
Shin
October 13, 2010
On 10/13/2010 04:48 PM, Shin Fujishiro wrote:
> No.  Uh... Let's forget about ranges for the next two paragraphs.
>
> Consider decoding a base64 unit (4 chars) by hand.  You may want to
> (a) pull four chars from a source, and decode them into three bytes.
> Then, (b) you'll push the decoded bytes to a destination.  Done.
>
> Conversion works naturally if (a) data can be pulled from a source and (b) converted data can be pushed to a destination.  If either of them can't be achieved, we need an extra cache to pool dangling data.  The "control" I mentioned refers to these two points.
>
> Then, can ranges support pull and push?  No, unfortunately.  They are restricted to pull *or* push semantics.  Decorator input ranges can pull data from source ranges, but can't push converted data anywhere.  Output ranges have similar inconveniences.
>
> So, ranges are not the best fit for conversion drivers, IMO.  Ranges are at their best when used as sources and destinations.  We may support decorator ranges, but they should not be the main API.
>
> Hey, I'm not dissing ranges nor your implementation. :-)  I'm just afraid of people making everything ranges in the first place!

This ties into our earlier discussion about streams, and the ongoing discussion on the newsgroup.

Shin, there's no single interface that satisfies all streaming needs. Some streams produce data at a variable rate. For those this is best:

void read(ref ubyte[] data);

So the client would have to pull data at unpredictable lengths and deal with it. Some streams need to hold internal buffers that are not under the user's control. For those a straight range interface exposing ubyte[] is enough:

@property ubyte[] front();

Some other ranges work best with a user-supplied buffer of a size also decided by the user:

size_t read(ubyte[] buffer);

Now let's talk about decorator streams, which must read from some stream and write to another; in particular, those M:N ranges that produce and consume data at different rates. Depending on M > N versus M < N _and_ on which of the APIs above is used, the M:N decorator would have to do its own buffering. I don't think there's a simple way out that satisfies everyone.
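
To see why, here is a toy sketch (not a proposed API): a 1:2 decorator over the user-supplied-buffer style that merely doubles every byte. Even this trivial expansion needs its own storage, because the caller's buffer may cut a converted chunk in half.

import std.algorithm : min;

struct Doubler(Source)   // Source provides: size_t read(ubyte[] buffer)
{
    Source source;
    ubyte[] pending;     // converted bytes that did not fit last time

    size_t read(ubyte[] buffer)
    {
        if (pending.length == 0)
        {
            // pull a raw chunk and convert it (here: duplicate each byte)
            ubyte[256] raw = void;
            const n = source.read(raw[]);
            auto converted = new ubyte[](2 * n);
            foreach (i, b; raw[0 .. n])
            {
                converted[2 * i]     = b;
                converted[2 * i + 1] = b;
            }
            pending = converted;
        }
        // hand out as much as fits; keep the surplus for the next call
        const take = min(pending.length, buffer.length);
        buffer[0 .. take] = pending[0 .. take];
        pending = pending[take .. $];
        return take;                 // 0 once the source is exhausted
    }
}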

About the ongoing Base64 discussion: I do see a few problems with the current interface, although none of them is major.

1. The template parameters '!' and '/' are not justified. They should be runtime parameters. Rule of thumb: use generic code when you stand to profit.

2. This function:

size_t encode(Range)(in ubyte[] source, Range range);

has one issue: it forces the input to be an array, although it could work with any input range of ubyte that has a length. Suggestion:

size_t encode(R1, R2)(R1 source, R2 target);

Constrain the template any way you need to keep the implementation efficient. Ideally you should get roughly the same performance with a ubyte[] as before.
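
One possible shape for that constraint (a sketch only; the body is elided and the hasLength requirement is just one choice):

import std.range;

size_t encode(R1, R2)(R1 source, R2 target)
    if (isInputRange!R1 && is(ElementType!R1 : ubyte) && hasLength!R1 &&
        isOutputRange!(R2, char))
{
    // ... same algorithm as the array version; a static if on
    // is(R1 : const(ubyte)[]) can keep the fast path for plain arrays
    return 0;   // placeholder so the sketch compiles
}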

3. The same discussion applies to decode. This is actually more important, because you might want to decode streams of dchar. This is how many streams will come through, even though they are technically ASCII.

I'm not saying we should use ranges everywhere, but if it doesn't really cost anything, accepting a range is better than an array.

Regarding Daniel's approach with char/byte level ranges through and through, in an ideal world I'd agree. But I fear that the implementation would not be as efficient. (I suggest you benchmark it against Masahiro's.) Also, practically, more often than not I'll want to work one chunk at a time, not one byte/char at a time.


Andrei
October 14, 2010
On Thu, 14 Oct 2010 09:45:38 +0900, Andrei Alexandrescu <andrei at erdani.com> wrote:

>
> About the ongoing Base64 discussion: I do see a few problems with the current interface, although none of them is major.
>
> 1. The template parameters '!' and '/' are not justified. They should be runtime parameters. Rule of thumb: use generic code when you stand to profit.

I don't understand this point.
Please tell me the merit of runtime parameters.
I can't imagine such a situation.

> 2. This function:
>
> size_t encode(Range)(in ubyte[] source, Range range);
>
> has one issue: it forces the input to be an array, although it could work with any input range of ubyte that has a length. Suggestion:
>
> size_t encode(R1, R2)(R1 source, R2 target);
>
> Constrain the template any way you need to keep the implementation efficient. Ideally you should get roughly the same performance with a ubyte[] as before.
>
> 3. The same discussion applies to decode. This is actually more important, because you might want to decode streams of dchar. This is how many streams will come through, even though they are technically ASCII.

http://lists.puremagic.com/pipermail/phobos/2010-October/002920.html

I already mentioned such encode / decode functions, but got no response,
so I suspended their implementation.
OK, I will implement them.
October 14, 2010
On Thu, Oct 14, 2010 at 4:59 AM, Masahiro Nakagawa <repeatedly at gmail.com> wrote:

> I'll wait for the no-caching implementation. I think a Range should not
> have useless state.
> In addition, the current implementation still seems more complex than
> necessary.
>
>
>
I thought you might say that.
http://yebblies.com/rangebase64nocache.d
These ranges only read the bits that they actually need off the input, only
converting one output item per popFront.  I don't consider this a better
solution.  I haven't adapted length to work with this version, but I will if
this implementation is actually going to be used.

 I'd welcome any ideas/suggestions on how to simplify the implementation of
either version, especially the Decoder length, which is considerably ugly.


> I don't think so. Range is a good concept, but it isn't everything.
> I think usability is the most important factor for a module API.
>
>
>
> Base64's API is the following:
>
>  encodeLength(length);
>  encode(src, dst);
>  encode(src);
>  encoder(returns Encoder!(char[]) or Encoder!(char))
>  decodeLength(length);
>  decode(src, dst);
>  decode(src);
>  decoder(returns Decoder!(ubyte[]) or Decoder!(ubyte))
>
> Do you see any problems?
>

Looks fine to me.
October 14, 2010
On Thu, Oct 14, 2010 at 6:52 AM, Shin Fujishiro <rsinfu at gmail.com> wrote:

> So... how?
>
> To tell the truth, I've tried creating decorator ranges for character code conversion and zlib decompression, and both required internal caching. I think it's a limitation of the decorator design.
>
> Perhaps I'm missing something?  Please tell us if you know a way to implement M:N decorators without caching!
>
>
By remembering where it's up to, and only popping items off the input when it's done with them.  Maybe this is just another form of caching?

e.g.
import std.range;   // front/popFront/empty for arrays

struct byByte
{
  ushort[] data;
  bool hi;   // true when the high byte of data.front is next
  this(ushort[] data) { this.data = data; }
  // expose the low byte of each ushort first, then the high byte
  @property ubyte front()
  {
    return cast(ubyte)(hi ? (data.front >> 8) : (data.front & 0xFF));
  }
  void popFront()
  {
    if (hi) data.popFront();   // done with both halves of this item
    hi = !hi;
  }
  @property bool empty() { return data.empty; }
}
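
For example (low byte first, matching the code above):

import std.algorithm : equal;

unittest
{
    ushort[] src = [0x1234, 0xABCD];
    assert(equal(byByte(src), [0x34, 0x12, 0xCD, 0xAB]));
}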

I do understand this is not a perfect solution, and that it does not apply in some cases.
October 14, 2010
On Thu, Oct 14, 2010 at 6:57 AM, Sean Kelly <sean at invisibleduck.org> wrote:
>
> >
> > To tell the truth, I've tried creating decorator ranges for character code conversion and zlib decompression, and both required internal caching. I think it's a limitation of the decorator design.
>
> I'd thought so too.  zlib in particular has this issue--the quality of compression is directly related to the buffer size.
>
Maybe the possibilities are more limited than I assumed.  Now that I think
about it, I see that using a position variable only works when the output is
in the same order as the input.
So you would be able to do RLE or split data into parts, but not a whole
bunch of other algorithms.
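
For what it's worth, here is what the position-variable idea looks like for run-length expansion (a sketch; counts are assumed non-zero and the (count, value) pairs well-formed):

import std.range;

// Expands flattened (count, value) pairs.  The only extra state is how many
// copies of the current run are still owed, so no item cache is needed, but
// only because the output order follows the input order.
struct RleExpand(R) if (isInputRange!R && is(ElementType!R : ubyte))
{
    private R input;          // count, value, count, value, ...
    private ubyte remaining;  // copies of the current value still owed

    this(R src)
    {
        input = src;
        if (!input.empty)
        {
            remaining = input.front;  // first run's count
            input.popFront();         // now positioned on its value
        }
    }

    @property bool empty()  { return remaining == 0; }
    @property ubyte front() { return input.front; }   // current run's value

    void popFront()
    {
        if (--remaining == 0)
        {
            input.popFront();             // finished this run's value
            if (!input.empty)
            {
                remaining = input.front;  // next count
                input.popFront();         // next value
            }
        }
    }
}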
October 14, 2010
On Thu, Oct 14, 2010 at 7:48 AM, Shin Fujishiro <rsinfu at gmail.com> wrote:

>
> Conversion works naturally if (a) data can be pulled from a source and (b) converted data can be pushed to a destination.  If either of them can't be achieved, we need an extra cache to pool dangling data.  The "control" I mentioned refers to these two points.
>
> Then, can ranges support pull and push?  No, unfortunately.  They are restricted to pull *or* push semantics.  Decorator input ranges can pull data from source ranges, but can't push converted data anywhere.  Output ranges have similar inconveniences.
>
>
Sorry for being slow, I think I understand what you mean now. :)

Ranges typically model one iteration of an algorithm, and this works best if each iteration reads N items and outputs 1.  Is this along the right lines? I understand that pushing converted data is part of the conversion process, but that is not the only way to do it.
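
A toy illustration of that shape, assuming a random-access source so each output item can be computed from a position (a trailing odd element is simply ignored):

struct PairSums
{
    int[] data;
    size_t i;
    @property bool empty() { return i + 1 >= data.length; }
    @property int front()  { return data[i] + data[i + 1]; }   // 2 in, 1 out
    void popFront()        { i += 2; }
}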

The conversion function approach works when you want to convert the data,
then use it.  I'd assume this is the common case.
useConvertedData( convert( originalData ) );
or
convert(originalData, buffer);

The range approach works better when you want to do something with your data as if it was the converted data.

find( converter(originalData), something );
auto n = calculateChecksum( converter(originalData) );

Both are forms of conversion and each is most useful in different situations.

Base64 encoding/decoding may be an exception, but you can describe the process of getting the next byte/char fairly easily.

Encode: get the next 6 bits from the input -> find the char code -> yield a char.
Decode: map chars to the values they represent -> get the next 8 bits -> yield a ubyte.

Converting CAN be defined in terms of getting the next piece of converted data, and this is the only method ranges properly support.
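
A small sketch of that "next piece" view for encoding (no padding handling; k must satisfy 6 * k < 8 * data.length):

immutable b64chars =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// the k-th output char depends on at most two input bytes, so it can be
// computed from the position alone
char nthBase64Char(const(ubyte)[] data, size_t k)
{
    immutable bit = k * 6;              // first input bit of this character
    uint acc = data[bit / 8] << 8;      // 16-bit window, high byte first
    if (bit / 8 + 1 < data.length)
        acc |= data[bit / 8 + 1];
    return b64chars[(acc >> (10 - bit % 8)) & 0x3F];
}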

> So, ranges are not the best fit for conversion drivers, IMO.  Ranges are at
> their best when used as sources and destinations.  We may support decorator ranges, but they should not be the main API.
>
>
I'm quite happy with that.  It does seem like conversion functions cover the most common case with better syntax and performance, with a loss of flexibility.  I don't mean to argue for shunning conversion functions in Phobos, just for providing equivalent ranges where we can.

What I meant earlier when I said that I think of ranges as the primary implementation, is that I feel ranges are a closer representation of the algorithm being used, where a conversion function represents the process of applying the algorithm to data.  I don't mean we should tell people not to use the conversion functions.


> Hey, I'm not dissing ranges nor your implementation. :-)  I'm just afraid of people making everything ranges in the first place!
>
>
Of course!
I want ranges everywhere they make sense, and would like conversion
functions where they make sense.  I just happen to believe both do in this
case!  I don't want to force the use of ranges anywhere they make code
inefficient, uglier or more complicated.

Daniel.