October 12, 2010
I added the following features:

- encode and decode functions take a buffer or an OutputRange
- Helper functions for calculating buffer size (a quick sketch of the arithmetic follows this list)
-- encodeLength and decodeLength, names taken from the std.base64 API
- Encoder and Decoder for the Range interface
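
For reference, the arithmetic behind such size helpers is just the 3-bytes-to-4-chars grouping of Base64. A minimal, self-contained sketch (standard Base64 arithmetic, not the code at the link below; the names are reused only for illustration):

size_t encodeLength(size_t sourceLength)
{
    // every 3 input bytes become 4 output characters, padded up to a full group
    return (sourceLength + 2) / 3 * 4;
}

size_t decodeLength(size_t sourceLength)
{
    // an upper bound: every 4 encoded characters yield at most 3 decoded bytes
    return sourceLength / 4 * 3;
}

unittest
{
    assert(encodeLength(0) == 0);
    assert(encodeLength(1) == 4);    // e.g. "Zg=="
    assert(encodeLength(3) == 4);
    assert(encodeLength(4) == 8);
    assert(decodeLength(8) == 6);
}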

Please check the code.

http://bitbucket.org/repeatedly/scrap/src/tip/base64.d


Masahiro

On Mon, 11 Oct 2010 13:02:59 +0900, Andrei Alexandrescu <andrei at erdani.com> wrote:

> If you define encoding and decoding on a range, there's no need to allocate for every pass through the loop. You reuse the buffer.
>
> @Sean: I doubt you'll see any performance improvement if you encode in-place vs. in a separate buffer.
>
>
> Andrei
>
> On 10/10/10 22:25 CDT, Masahiro Nakagawa wrote:
> >> I agree. Last night I thought about encode / decode with a buffer. With ranges, a memory allocation on every pass through the loop is bad...
>>
>>
>> Masahiro
>>
>> On Mon, 11 Oct 2010 11:45:08 +0900, Sean Kelly <sean at invisibleduck.org> wrote:
>>
>>> As others have said, I'd like this to use ranges instead if possible, and I'd like the option to supply a destination range as well. The majority of work I do with this sort of thing encodes in-place into existing buffers.
>>>
October 12, 2010
While this is range based, I had something different in mind. (Not these function names, just the basic idea of the signatures)

struct Base64Encoder(Range)
    if (isInputRange!Range && is(ElementType!Range == ubyte))
{
    // range methods
    @property char front() { ... }
    void popFront() { ... }
    @property size_t length() { ... }
    ...
}

Base64Encoder!(Range) Base64Encode(Range)(Range r)
    if (isInputRange!Range && is(ElementType!Range == ubyte))
{
    return Base64Encoder!Range(r);
}

This way encoding would convert an input range of ubyte to an input range of char, and decoding would convert Range!char to Range!ubyte.

This way you would be able to use it with std.algorithm, std.range etc.
When called with an array the range would be able to provide length and be
bidirectional.
This way there would be no allocations inside the range at all.

You could create and fill a buffer using
auto buffer = array(encode(data));
or fill an existing buffer using
copy(encode(data), buffer);

At this point I don't know if everything would be better off using char or dchar ranges.

@Andrei
Is it an improvement for the range to yield char[] instead of char?

@Sean
Would copy(decode(buffer), buffer) cover your use case sufficiently?
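
A minimal, self-contained sketch of this adaptor idea (this is not the implementation posted later in the thread; buffering one 4-character group per refill is just one assumed way to write it):

import std.range : isInputRange, ElementType;

struct Base64Encoder(Range)
    if (isInputRange!Range && is(ElementType!Range == ubyte))
{
    private Range src;
    private char[4] group;   // one encoded 4-character quantum
    private size_t pos;      // current position within the group
    private bool exhausted;

    private enum table =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    this(Range r)
    {
        src = r;
        refill();
    }

    // Pull up to 3 bytes from the source and encode them as one 4-char group.
    private void refill()
    {
        if (src.empty) { exhausted = true; return; }
        ubyte[3] b;
        size_t n;
        foreach (i; 0 .. 3)
        {
            if (src.empty) break;
            b[i] = src.front;
            src.popFront();
            ++n;
        }
        group[0] = table[b[0] >> 2];
        group[1] = table[((b[0] & 0x03) << 4) | (b[1] >> 4)];
        group[2] = n > 1 ? table[((b[1] & 0x0F) << 2) | (b[2] >> 6)] : '=';
        group[3] = n > 2 ? table[b[2] & 0x3F] : '=';
        pos = 0;
    }

    @property bool empty() const { return exhausted; }
    @property char front() const { return group[pos]; }

    void popFront()
    {
        if (++pos == 4)
            refill();
    }
}

Base64Encoder!Range base64Encode(Range)(Range r)
    if (isInputRange!Range && is(ElementType!Range == ubyte))
{
    return Base64Encoder!Range(r);
}

unittest
{
    import std.algorithm : equal;
    ubyte[] data = [0x66, 0x6F, 0x6F, 0x62];   // "foob"
    assert(equal(base64Encode(data), "Zm9vYg=="));
}

With a range like this, array(base64Encode(data)) and copy(base64Encode(data), buffer) work as described above; the internal group buffer is exactly the kind of caching discussed further down the thread.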
October 12, 2010
On Tue, 12 Oct 2010 14:31:57 +0900, Daniel Murphy <yebblies at gmail.com> wrote:

> While this is range based, I had something different in mind. (Not these function names, just the basic idea of the signatures)
>
> struct Base64Encoder(Range)
>     if (isInputRange!Range && is(ElementType!Range == ubyte))
> {
>     // range methods
>     @property char front() { ... }
>     void popFront() { ... }
>     @property size_t length() { ... }
>     ...
> }
>
> Base64Encoder!(Range) Base64Encode(Range)(Range r)
>     if (isInputRange!Range && is(ElementType!Range == ubyte))
> {
>     return Base64Encoder!Range(r);
> }
>
> This way encoding would convert an input range of ubyte to an input range of char, and decoding would convert Range!char to Range!ubyte.
>
> This way you would be able to use it with std.algorithm, std.range etc.
> When called with an array the range would be able to provide length and be bidirectional.
> This way there would be no allocations inside the range at all.

Yes, your range doesn't need an allocation.
However, users eventually need to store the result returned from the range into a buffer they allocate themselves.
How do they use the encoded / decoded result?

> You could create and fill a buffer using
> auto buffer = array(encode(data));
> or fill an existing buffer using
> copy(encode(data), buffer);

"copy(encode(data), buffer)" seems to be equivalent to 'encode(data, buffer)'.

Sorry, I can hardly see the merit of your proposal.
Base64 is simple (ubyte[] -encode-> char[], char[] -decode-> ubyte[]).
Your range seems to be an over-generalization for Base64...

P.S.
encode(InputRange, OutputRange) becomes a candidate for a new allocation-free function.
Of course, decode too. These are inspired by your range.
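
A rough sketch of the shape such an overload could take (assumed here for illustration; this is not the code in the linked base64.d):

import std.range : isInputRange, isOutputRange, ElementType, put;

void encode(Source, Sink)(Source source, Sink sink)
    if (isInputRange!Source && is(ElementType!Source == ubyte) &&
        isOutputRange!(Sink, char))
{
    enum table =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    while (!source.empty)
    {
        ubyte[3] b;
        size_t n;
        foreach (i; 0 .. 3)
        {
            if (source.empty) break;
            b[i] = source.front;
            source.popFront();
            ++n;
        }
        // the filter itself decides to consume 3 bytes and emit 4 characters
        put(sink, table[b[0] >> 2]);
        put(sink, table[((b[0] & 0x03) << 4) | (b[1] >> 4)]);
        put(sink, n > 1 ? table[((b[1] & 0x0F) << 2) | (b[2] >> 6)] : '=');
        put(sink, n > 2 ? table[b[2] & 0x3F] : '=');
    }
}

unittest
{
    ubyte[] data = [0x66, 0x6F, 0x6F];   // "foo"
    char[] buf = new char[4];
    encode(data, buf);                   // a char[] slice is itself an output range of char
    assert(buf == "Zm9v");
}

With a signature like this, encode(data, buffer) fills a caller-supplied buffer, and any other output range (an appender, a custom sink) can be passed instead.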


Masahiro
October 12, 2010
On Tue, Oct 12, 2010 at 7:18 PM, Masahiro Nakagawa <repeatedly at gmail.com>wrote:

> Yes, your range doesn't need an allocation.
> However, users eventually need to store the result returned from the range
> into a buffer they allocate themselves.
> How do they use the encoded / decoded result?
>
> "copy(encode(data), buffer)" seems to be equivalent to 'encode(data,
> buffer)'.
>
> Sorry, I can hardly see the merit of your proposal.
> Base64 is simple (ubyte[] -encode-> char[], char[] -decode-> ubyte[]).
> Your range seems to be an over-generalization for Base64...
>
> P.S.
> encode(InputRange, OutputRange) becomes a candidate for a new
> allocation-free function.
> Of course, decode too. These are inspired by your range.
>
>
> Masahiro
>

I guess it's the difference between a function that works on ranges and a range that adapts data.  I assumed the latter would be the best choice for linear encoding/decoding of a data stream.  I don't think there's a precedent to follow on this yet.
The main use case that won't work with this design is when the output is
done by a function, not another output range.

eg.
sendOverNetwork(base64Encode(lazyReadFromFile(Filename)));

I don't see how this can be done without allocation when the output method does not provide an output range.

The data can also be piped through other conversion or encoding ranges.
eg.
auto uuencoded = array(uuencode(base64Decode(inputdata)));

auto encdata = base64Encode("1,2,3,4");
foreach(n; map!(to!int)(splitter(base64Decode(encdata), ',')))
{
    // do something with n
}

I guess what I'm saying is that data doesn't always end up in an output
range, or in a newly allocated array.
Maybe this is way more general than is needed, but I see it fitting the
other range designs in phobos much more closely.

Daniel.
October 12, 2010
Hmm... OK.

Please send me your implementation.
I will merge your code into the base64 module as an Encoder whose front returns char (and a Decoder too).


Masahiro


On Tue, 12 Oct 2010 19:17:28 +0900, Daniel Murphy <yebblies at gmail.com> wrote:

> On Tue, Oct 12, 2010 at 7:18 PM, Masahiro Nakagawa <repeatedly at gmail.com>wrote:
>
>> Yes, your range doesn't need an allocation.
>> However, users eventually need to store the result returned from the
>> range into a buffer they allocate themselves.
>> How do they use the encoded / decoded result?
>>
>> "copy(encode(data), buffer)" seems to be equivalent to 'encode(data,
>> buffer)'.
>>
>> Sorry, I can hardly see the merit of your proposal.
>> Base64 is simple (ubyte[] -encode-> char[], char[] -decode-> ubyte[]).
>> Your range seems to be an over-generalization for Base64...
>>
>> P.S.
>> encode(InputRange, OutputRange) becomes a candidate for a new
>> allocation-free function.
>> Of course, decode too. These are inspired by your range.
>>
>>
>> Masahiro
>>
>
> I guess it's the difference between a function that works on ranges and a
> range that adapts data.  I assumed the latter would be the best choice for
> linear encoding/decoding of a data stream.  I don't think there's a
> precedent to follow on this yet.
> The main use case that won't work with this design is when the output is
> done by a function, not another output range.
>
> eg.
> sendOverNetwork(base64Encode(lazyReadFromFile(Filename)));
>
> I don't see how this can be done without allocation when the output
> method
> does not provide an output range.
>
> The data can also be piped through other conversion or encoding ranges.
> eg.
> auto uuencoded = array(uuencode(base64Decode(inputdata)));
>
> auto encdata = base64Encode("1,2,3,4");
> foreach(n; map!(to!int)(splitter(base64Decode(encdata), ',')))
> {
>     // do something with n
> }
>
> I guess what I'm saying is that data doesn't always end up in an output
> range, or in a newly allocated array.
> Maybe this is way more general than is needed, but I see it fitting the
> other range designs in phobos much more closely.
>
> Daniel.


-- 
/+
  + Masahiro Nakagawa (repeatedly at gmail.com)
  +/
October 12, 2010
On Oct 11, 2010, at 10:31 PM, Daniel Murphy wrote:

> While this is range based, I had something different in mind. (Not these function names, just the basic idea of the signatures)
> 
> struct Base64Encoder(Range) if (isInputRange!Range && is(ElementType!Range == ubyte))
> {
>     // range methods
>    @property char front() { ... }
>    void popFront() { ... }
>    @property size_t length() { ... }
>    ...
> };
> Base64Encoder!(Range) Base64Encode(Range)(Range r) if (isInputRange!Range && is(ElementType!Range == ubyte))
> {
>     return Base64Encoder!Range(r);
> }
> 
> This way encoding would convert an input range of ubyte to an input range of char, and decoding would convert Range!char to Range!ubyte.
> 
> This way you would be able to use it with std.algorithm, std.range etc.
> When called with an array the range would be able to provide length and be bidirectional.
> This way there would be no allocations inside the range at all.
> 
> You could create and fill a buffer using
> auto buffer = array(encode(data));
> or fill an existing buffer using
> copy(encode(data), buffer);
> 
> At this point I don't know if everything would be better off using char or dchar ranges.
> 
> @Andrei
> Is it an improvement for the range to yield char[] instead of char?
> 
> @Sean
> Would copy(decode(buffer), buffer) cover your use case sufficiently?

It would, and I really like the idea of a b64 adaptor instead of just a plain old conversion function.
October 13, 2010
Ok, here's a first attempt.

http://yebblies.com/rangebase64.d

I ran into a couple of issues along the way:

Taking strings as input doesn't work properly, as string and wstring are defined not to have a length (in the range sense).  I'm not sure whether to special-case them and assume they contain only ASCII characters or not.  At the moment it only accepts ubyte ranges.

This makes me think we _really_ need an AsciiString type in phobos, possibly with a matching assumeAscii function that checks the characters used when in debug mode.  Does anyone think this would be a good idea?

Decoder cannot provide length when padding is used without examining the end of the input.  Currently the decoder on a forward range scans to the end of the range, giving the constructor a complexity of O(n).  Is this acceptable?

Is there any point in making Encoder/Decoder bidirectional/random access?



As a side issue, would anyone else find a range that exposes the raw bytes of another range useful?

auto x = [0xFFEEDDCC, 0xBBAA9988];
assert(equal(rawBytes(x), [0xCC, 0xDD, 0xEE, 0xFF, 0x88, 0x99, 0xAA, 0xBB]));
This allows you to pass any range of POD types into Base64.encode or any
similar range.
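
A possible shape for such an adaptor (hypothetical; it walks each element's bytes in the machine's native byte order, which is what the little-endian example above assumes, and it only makes sense for plain-value element types):

import std.range : isInputRange, ElementType;

struct RawBytes(Range)
    if (isInputRange!Range)
{
    private Range src;
    private size_t index;   // byte index within the current element

    private alias T = ElementType!Range;

    @property bool empty() { return src.empty; }

    @property ubyte front()
    {
        T t = src.front;                 // copy the element, then view its bytes
        return (cast(ubyte*) &t)[index];
    }

    void popFront()
    {
        if (++index == T.sizeof)
        {
            index = 0;
            src.popFront();
        }
    }
}

auto rawBytes(Range)(Range r)
    if (isInputRange!Range)
{
    return RawBytes!Range(r);
}

unittest
{
    import std.algorithm : equal;
    auto x = [0xFFEEDDCC, 0xBBAA9988];
    assert(equal(rawBytes(x), [0xCC, 0xDD, 0xEE, 0xFF, 0x88, 0x99, 0xAA, 0xBB]));
}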

Daniel.
October 13, 2010
Daniel Murphy <yebblies at gmail.com> wrote:
> This way encoding would convert an input range of ubyte to an input range of char, and decoding would convert Range!char to Range!ubyte.
> 
> This way you would be able to use it with std.algorithm, std.range etc.
> When called with an array the range would be able to provide length and be
> bidirectional.
> This way there would be no allocations inside the range at all.
> 
> You could create and fill a buffer using
> auto buffer = array(encode(data));
> or fill an existing buffer using
> copy(encode(data), buffer);

I don't see much benefit in making filters decorator ranges in the first place.  You can implement them, but decorator ranges should be considered extensions to core filters implemented Masahiro's way.

The biggest reason why I think so is ranges' inaptitude for filtering purposes.  M-N conversions, which happen in base64, character code conversion, etc., can't be supported by ranges without twisted hacks.  Most filters need to control how many items to read and write *by themselves*.

Input ranges can only support N-1 conversions in a sane way.  They can read as many items as needed from the 'front' of their underlying source ranges, but can only expose a single item.

Similarly, output ranges are restricted to 1-N conversions.

Yeah, I know you can work around the problem by caching several items inside a decorator range.  It's done in your code and works pretty well. :-) But I think it shows how ranges are unfit for filtering purposes.

So, I believe that Masahiro's encode(src,sink) design wins.  His base64 filter has control over the number of bytes to process, and hence no need for extra caching.

Of course, decorator ranges are useful in some situations, and we'll eventually need them.  But they should never supersede Masahiro's filters.


Shin
October 13, 2010
Oops, please read 'M-N' as 'M:N' if you are confused.
October 13, 2010
On Wed, Oct 13, 2010 at 4:47 PM, Shin Fujishiro <rsinfu at gmail.com> wrote:

> The biggest reason why I think so is ranges' inaptitude for filtering purposes.  M-N conversions, which happen in base64, character code conversion, etc., can't be supported by ranges without twisted hacks.  Most filters need to control how many items to read and write *by themselves*.
>

I'm not sure what you mean about having control over the number of items processed.  Do you mean that, because of caching, more bytes can be encoded/decoded than are ever used?

>
> Input ranges can only support N-1 conversions in a sane way.  They can read as many items as needed from the 'front' of their underlying source ranges, but can only expose a single item.
>
> Similarly, output ranges are restricted to 1-N conversions.
>
> Yeah, I know you can work around the problem by caching several items inside a decorator range.  It's done in your code and works pretty well. :-) But I think it shows how ranges are unfit for filtering purposes.
>

I see that caching may be undesirable in some situations, but this adapter (and I assume most others) can be implemented perfectly well without it.  It's a flaw in the implementation, not a limitation of ranges.

When using an output range, I think there is an expectation that output has been completed after each call to put, which does prevent you from designing a range that only produces output every second call to put.  (I might be imagining this expectation; I haven't seen/written very much code using output ranges.)
When using forward ranges this problem doesn't exist, because they guarantee you are only consuming your own view of the data.
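
For concreteness, here is one assumed shape for an output-range decorator doing the 1-N direction (not from any posted code); it shows exactly the problem described above, because bytes are buffered across calls to put and a final flush is needed for the padded tail:

import std.range : isOutputRange, put;

struct Base64Sink(Sink)
    if (isOutputRange!(Sink, char))
{
    private Sink sink;
    private ubyte[3] pending;
    private size_t n;

    private enum table =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Buffer bytes until a full 3-byte group is available.
    void put(ubyte b)
    {
        pending[n++] = b;
        if (n == 3) emit();
    }

    // Emit one group; with fewer than 3 pending bytes this writes the padded tail.
    private void emit()
    {
        if (n == 0) return;
        .put(sink, table[pending[0] >> 2]);
        .put(sink, table[((pending[0] & 0x03) << 4) | (pending[1] >> 4)]);
        .put(sink, n > 1 ? table[((pending[1] & 0x0F) << 2) | (pending[2] >> 6)] : '=');
        .put(sink, n > 2 ? table[pending[2] & 0x3F] : '=');
        pending[] = 0;
        n = 0;
    }

    void flush() { emit(); }
}

unittest
{
    char[] buf = new char[8];
    auto enc = Base64Sink!(char[])(buf);
    ubyte[] data = [0x66, 0x6F, 0x6F, 0x62];   // "foob"
    foreach (b; data)
        enc.put(b);
    enc.flush();                               // without this, "Yg==" is never written
    assert(buf == "Zm9vYg==");
}

Because of the member put, std.range.put forwards to it, so the decorator is itself an output range of ubyte; the need for an explicit flush is the part that sits awkwardly with the put-means-done expectation.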

What other problems prevent ranges from modelling M:N filtering properly? (without twisted hacks of course)


> I don't see much benefit in making filters decorator ranges in the first place.  You can implement them, but decorator ranges should be considered extensions to core filters implemented Masahiro's way.
>
> So, I believe that Masahiro's encode(src,sink) design wins.  His base64 filter has control over the number of bytes to process, and hence no need for extra caching.
>
> Of course, decorator ranges are useful in some situations, and we'll eventually need them.  But they should never supersede Masahiro's filters.
>
>
I don't see any real differences between the lazy range design and the conversion function design, apart from the usual lazy vs eager factors of performance, memory consumption and interface simplicity.

I tend to see the lazy solution as the primary solution, and the conversion
function as an alternative implementation optimized for speed and/or
usability.
One similar example is std.string.split vs std.algorithm.splitter.

That being said, I think we do need both, as the conversion function should
be more efficient and simpler to use for the most common case (buffer ->
buffer).
I'd hate to have to use
  copy(Base64.decode(inputbuffer), outputbuffer);
over
  Base64.decode(inputbuffer, outputbuffer);
just as I'd never want to write
  copy(repeat(5), buffer);
over
  fill(buffer, 5);


So, what am I missing?  What does a conversion function design have to offer that a range can't do?

Thanks, Daniel.