January 03, 2013
04-Jan-2013 00:11, Era Scarecrow writes:
> On Thursday, 3 January 2013 at 15:48:50 UTC, Dmitry Olshansky wrote:
>> 1/3/2013 2:20 PM, Era Scarecrow wrote:
>>> Suddenly it won't work and slicing is only a range and can only be
>>> used in foreach.
>>
>> No surprise here. Basically container != range over it. It all flows
>> from there. Range doesn't have append nor it owns any data.
>> About foreach see below. Yes, the BitArray container won't pass
>> hasSlicing trait, but it shouldn't as it's meant for ranges. (so in a
>> sense my comment about trying to mimic built-in array applies)
>>
>> And slice can have opIndex and opSlice being a random access range of
>> bool.
>
>   Hmmm.. shouldn't try to make it as close to an actual array as
> possible. I feel like I'm missing something very very simple to making
> it all correct.
>

[snip]
>>
>> Again I don't think there is a need to append to a slice. Create an
>> array out of slice (or in fact any range of bool) then append.
>
>   I want to say you're wrong, I'm thinking heavily in the field where
> you would be doing compression or encryption where random access and
> appending could be needed at any step. I can see creating a simple table
> of the compressed sequences (Say huffman), and then appending the slices
> which then greatly simplifies the interface.

Appending a slice *to* a BitArray is perfectly fine, as is in fact any range of bool (or bit if you like). Any array-like or string-like container has to support appending a range of its element type (and insertion, for that matter). The table you mention is then an array of BitArray: each entry is sliced (getting a range) and these slices are appended to some other BitArray. The BitArrays in this table can also be appended to just fine (so you easily construct sequences out of others).

The key point is that *appending a range to a container* is fine (and idiomatic) and *slicing is O(1), fast, and non-allocating*.
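To make the point concrete, here is a minimal sketch of that container/range split. `BoolArray` and its members are hypothetical illustration names, not the actual BitArray code; it stores plain bools rather than packed words just to keep the idea visible:

```d
import std.range : isInputRange, ElementType;

// Hypothetical minimal container: the real BitArray would pack bits
// into size_t words, but the append/slice contract is the same.
struct BoolArray
{
    bool[] data;

    // Appending accepts *any* input range of bool,
    // not just another BoolArray or one of its slices.
    void opOpAssign(string op : "~", R)(R r)
        if (isInputRange!R && is(ElementType!R : bool))
    {
        foreach (b; r)
            data ~= b;
    }

    // Slicing is O(1) and non-allocating: it just returns a view.
    bool[] opSlice(size_t lo, size_t hi) { return data[lo .. hi]; }
}

void main()
{
    BoolArray table;
    table ~= [true, false, true, true]; // append a range of bool
    BoolArray other;
    other ~= table[1 .. 3];             // append a slice (a view) of another
    assert(other.data == [false, true]);
}
```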

>I can also see where if you
> append to a slice it automatically making a new block of memory would be
> useful as well, as you know you're going to start fresh.

Maybe, but I can see it being a nasty surprise in a tight loop.

> I can make a
> short test set of functions and post them if you like, doing ranges &
> slices otherwise might not allow such diversity (simplicity?) in the code.

It would be great to put them together, as we have the implementation details sorted out but are having trouble with usage patterns.

>> There is an extra rule however, it is the following. If the object in
>> question is not a range and doesn't have opApply but has opSlice()
>> then it slices it first, then again it checks the 2 ways on this slice
>> - as range or opApply.
>
>   I didn't know that; I know it attempted a slice but not that the slice
> has to be a range. Likely that needs to be emphasized in the documentation.

That's pretty much how std.container tries to work. There is little documentation on the subject though; the same note, IIRC, is in the TDPL book (highly recommend that one if you don't have it already).
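A small sketch of that foreach rule (`Bag` is a made-up illustration type): the aggregate is not a range and has no opApply, but its no-arg opSlice() lets foreach slice it first and iterate the result.

```d
// A container that is NOT a range itself:
// no front/popFront/empty, and no opApply.
struct Bag
{
    int[] items;

    // No-arg opSlice returns a range over the contents.
    int[] opSlice() { return items; }
}

void main()
{
    auto b = Bag([1, 2, 3]);

    // Bag is neither a range nor does it define opApply, but it has
    // opSlice(): foreach rewrites this to `foreach (x; b[])` and then
    // checks the slice the usual two ways (range or opApply).
    int sum;
    foreach (x; b)
        sum += x;
    assert(sum == 6);
}
```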

[snip]

>>> I'm aware of the container version, and overall if it can just
>>> replace BitArray as it is, then we might consider only worrying about
>>> a compact version, at which point it would become a template... Might
>>> start seeing BitArray!1024  in places (or BitString!1024 ?).
>>
>> I'm not sure if it will gain that much to have templated fixed size
>> bit array over one that just has small optimization. Could be a lot
>> but somehow I don't expect much, that needs some measurement.
>
>   I agree, but if it's going to be a value type that's heavy on small
> string optimization with as little referencing as possible, then it has
> to know what size to depend on, since resizing beyond the max may not be
> an option if we take that part out.

I meant that it still resizes up to any point, just that it dups the underlying array on copy (if not small; if small, the postblit does nothing). It's questionable to dup on copy, but the idea is that it won't happen often.

If you mean to make a type that doesn't resize beyond some maximum (so it's always "small"), then indeed having the fixed size as a template parameter makes more sense.
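A sketch of the dup-on-copy policy with small-size optimization (all names here are hypothetical, not the actual BitArray code): the postblit copies nothing extra in the small case and dups the heap array otherwise, so each copy owns its data.

```d
struct SmallBits
{
    enum smallWords = 2;
    size_t[smallWords] small; // inline storage for the small case
    size_t[] big;             // heap storage otherwise (null when small)

    bool isSmall() const { return big is null; }

    this(this) // postblit
    {
        if (!isSmall)
            big = big.dup; // value semantics: dup only the non-small case
    }
}

void main()
{
    SmallBits a;
    a.big = [size_t(1), 2, 3];
    auto b = a;            // triggers the postblit: b.big is a fresh copy
    b.big[0] = 99;
    assert(a.big[0] == 1); // a is unaffected
}
```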

-- 
Dmitry Olshansky
January 03, 2013
On Thursday, 3 January 2013 at 21:15:19 UTC, Dmitry Olshansky wrote:
> 04-Jan-2013 00:11, Era Scarecrow wrote:

> Appending a slice *to* BitArray is perfectly fine as in fact any range of bool (or bit if you like). Any array-like or string-like container has to support appending a range of element type (and insertion for that matter). The table you mention is then an array of BitArray that is sliced (getting a range) and these slices are appended to some other BitArray. BitArrays in this table could also be appended to just fine (so you construct sequences out of others easily).
>
> The key point is that *appending a range to container* is fine (and idiomatic) and *slicing is O(1), fast and non-allocating*.

 So the slice is a non-allocating view of some data. I can see adding more limitations for that then.

>> I can also see where if you append to a slice it automatically making a new block of memory would be useful as well, as you know you're going to start fresh.
>
> Maybe but I can see it being a nasty surprise in a tight loop.

 Only on the first iteration, unless you do a LOT of slicing & appending on non-compact blocks; but if you append to a slice of a normal array it will dup anyway (if it doesn't own the following memory), so the behavior doesn't really change.
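That's exactly what built-in slices do today; a quick demonstration:

```d
void main()
{
    int[] a = [1, 2, 3, 4];
    int[] s = a[0 .. 2]; // a slice over the front of a's block

    // s does not "own" the memory following it (a[2] is live there),
    // so the runtime cannot extend it in place: appending copies first
    // to avoid stomping on a.
    s ~= 99;
    assert(a == [1, 2, 3, 4]); // original untouched
    assert(s == [1, 2, 99]);   // s now points at fresh memory
}
```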

>> I can make a short test set of functions and post them if you like, doing ranges & slices otherwise might not allow such diversity (simplicity?) in the code.
>
> Would be great to put them together as we have implementation details sorted out but have trouble with usage patterns.

 K, I'll likely re-work my multi-level Huffman algorithm and an LZW compression, although the LZW wouldn't make use of the more exotic features. Got half the LZW written, but I likely won't get this done for a few days.

>>> There is an extra rule however, it is the following. If the object in question is not a range and doesn't have opApply but has opSlice() then it slices it first, then again it checks the 2 ways on this slice - as range or opApply.
>>
>> I didn't know that; I know it attempted a slice but not that the slice has to be a range. Likely that needs to be emphasized in the documentation.
>
> That's pretty much how std.container tries to work. There is little documentation on the subject though, same note IRC is in TDPL book (highly recommend that one if you don't have it already).

 Got a copy; in fact it's one of the first prints. I've likely read this book a half dozen times to get a full feel of the language (other than templates, which I learned elsewhere).

>> I agree, but if it's going to be a value type that's heavy on small string optimization with as little referencing as possible, then it has to know what size to depend on, since resizing beyond the Max may not be an option if we take that part out.
>
> I meant it's still resizing up to any point, just that it does dup's underlying array on copy (if not small, if small then postblit does nothing). It's questionable to dup on copy but the idea is that it won't be often.
>
> If you mean to make type that doesn't resize beyond some maximum (so it's always "small") then indeed having fixed size as template parameter makes more sense.

 That was the idea. A true value type with no hidden allocation. There are many cases for it, but I think having the BitArray which can allocate has a larger audience than a fixed size. Won't know for sure though, maybe it's the other way around.
January 06, 2013
On Thursday, 3 January 2013 at 21:45:24 UTC, Era Scarecrow wrote:
> K, I'll likely re-work my multi-level huffman algorithmn and a LZW compression, although the LZW wouldn't make use of the more exotic features. Got half the LZW written, but I likely won't get this done for a few days.

 Nearly done. Got a working LZW and multi-level Huffman; only cleanup/refactoring to do, plus a way to save/load the tree structure for external storage.

 Do I assume you would want me to post it? (Even incomplete and lacking a good portion of documentation/unittests)
January 06, 2013
07-Jan-2013 00:51, Era Scarecrow writes:
> On Thursday, 3 January 2013 at 21:45:24 UTC, Era Scarecrow wrote:
>> K, I'll likely re-work my multi-level huffman algorithmn and a LZW
>> compression, although the LZW wouldn't make use of the more exotic
>> features. Got half the LZW written, but I likely won't get this done
>> for a few days.
>
>   Nearly done. Got a working LZW, multi-level huffman working, only
> cleanup/refactoring to do, and a way to save/load the tree structure for
> external storage.
>
>   Do I assume you would want me to post it? (Even incomplete and lacking
> a good portion of documentation/unittests)

Sure thing! I'd love to check it out.

-- 
Dmitry Olshansky
January 06, 2013
On Sunday, 6 January 2013 at 21:14:08 UTC, Dmitry Olshansky wrote:
> 07-Jan-2013 00:51, Era Scarecrow wrote:

>> Nearly done. Got a working LZW, multi-level huffman working, only cleanup/refactoring to do, and a way to save/load the tree structure for external storage.
>>
>> Do I assume you would want me to post it? (Even incomplete and lacking a good portion of documentation/unittests)
>
> Sure thing! I'd love to check it out.

 Here you go. It's about 16k currently (5k for LZW, 10k for MLH). I write on the fly so some of it isn't as thought out as it could be.

 https://github.com/rtcvb32/Side-Projects/blob/master/mlh.d
January 07, 2013
07-Jan-2013 01:54, Era Scarecrow writes:
> On Sunday, 6 January 2013 at 21:14:08 UTC, Dmitry Olshansky wrote:
>> 07-Jan-2013 00:51, Era Scarecrow wrote:
>
>>> Nearly done. Got a working LZW, multi-level huffman working, only
>>> cleanup/refactoring to do, and a way to save/load the tree structure
>>> for external storage.
>>>
>>> Do I assume you would want me to post it? (Even incomplete and
>>> lacking a good portion of documentation/unittests)
>>
>> Sure thing! I'd love to check it out.
>
>   Here you go. It's about 16k currently (5k for LZW, 10k for MLH). I
> write on the fly so some of it isn't as thought out as it could be.
>
>   https://github.com/rtcvb32/Side-Projects/blob/master/mlh.d

Thanks; unfortunately I got busy with my own stuff ... bookmarked for review.

-- 
Dmitry Olshansky
January 09, 2013
 Having insomnia has left me with a certain gifted curse. :P

 I've been thinking, and I think I've written BitArray all wrong. In order to structure it better I would break it up into four parts. The following would remove the complication of 'isCompact' and the union, and the code can be @safe (rather than @trusted).

 BitString: the core of BitArray; handles all basic operations but doesn't manage memory of any kind.

 BitArrayRange: a forwarding reference to a BitString, including various range functions.

 Then we come to memory management. This can be done simply as wrappers.

 BitArray - manages memory and dynamically allocates/reallocates as needed.
 BitArrayFixed - a template that includes a fixed amount of memory for small-memory optimization.

It goes something like... (API not yet decided fully, but you should get the idea)

struct BitString {
  //bitfields for start/ending
  size_t[]* bulkRaw; //pointer to the array?
  inout(ubyte)[] noEndianBulkRaw() inout {
    return cast(inout(ubyte)[]) *bulkRaw;
  }
  this(size_t[]* raw) {
    bulkRaw = raw;
  }

  //binary operators and opIndex, etc.
  //length modifier, but no memory re-allocation allowed
}

//Template with bool isConst?
//struct BitStringRange(bool isConst) {
//  static if (isConst) { const(BitString)* array; }
//  else                { BitString* array; }

struct BitStringRange {
  BitString* array;
  ulong start, end;
  //forwarding operators for start/end
  //range functions
}

struct BitArray {
  BitString array;
  size_t[] rawMemory;
  alias array this;
  //include length modifiers and dynamic allocation
  void init() {}
  //opCast can convert to BitArrayFixed?
}

struct BitArrayFixed(int fixedSize) {
  BitString array;
  size_t[fixedSize] rawMemory;
  alias array this;

  void init(){ array = BitString(rawMemory); }
  //opCast can convert to BitArray?
}

 Thoughts and comments?
January 09, 2013
On Wednesday, 9 January 2013 at 20:00:59 UTC, Era Scarecrow wrote:
> BitString {
>   //bitfields for start/ending
>   size_t[]* bulkRaw; //pointer to the array?
>   //binaryOperators and opIndex, etc
>   //length modifier, but no memory re-allocation allowed
> }
>
> BitArray {
>   BitString array;
>   size_t[] rawMemory;
>   alias array this;
>   //include length modifiers and dynamic allocation
>   void init() {}
>   //opCast can convert to BitArrayFixed?
> }

 Hmm as I'm thinking about it more, a mixin for BitString may be better..

//    'size_t[] bulkRaw' required by enclosing struct/class
  mixin template BitString() {
    struct BitString {
    }
  }

  struct BitArray {
    mixin BitString;
    size_t[] bulkRaw;

    BitString array;
    alias array this;

    //length setter, handling allocation/reallocation
  }

  //since BitArray and BitArrayFixed may have different pointer/setups?
  //Or would either work together?
  struct BitArrayRange(BS, bool isConst) {
    BS* bitString;
  }

  If anyone wants to talk live with me, I tend to be on YIM, so rtcvb32 or Era_Scarecrow.
January 09, 2013
On Wednesday, 9 January 2013 at 21:16:37 UTC, Era Scarecrow wrote:
>  Hmm as I'm thinking about it more, a mixin for BitString may be better..

 I'm having trouble understanding why this won't work. Can anyone clarify?



mixin template BitString()
{
    struct BitString {
        size_t opIndex(int i) {
//Error: this for bulkRaw needs to be type BitArray not type BitString
            return bulkRaw[i];
        }
    }
}

struct BitArray {
    mixin BitString;
    size_t[] bulkRaw;
    BitString ba;
}

January 10, 2013
10-Jan-2013 01:47, Era Scarecrow writes:
> On Wednesday, 9 January 2013 at 21:16:37 UTC, Era Scarecrow wrote:
>>  Hmm as I'm thinking about it more, a mixin for BitString may be better..
>

The overall direction is good (I mean the separation of ranges/containers). I'll post more comments once I think it through. The minor detail is that BitString sounds like an individual container, whereas it's a "binary view of packed bools".

>   I'm having troubles trying to understand why this won't work. Anyone
> clarify?
>
>
>
> mixin template BitString()
> {
>      struct BitString {
>          size_t opIndex(int i) {
> //Error: this for bulkRaw needs to be type BitArray not type BitString
>              return bulkRaw[i];
>          }
>      }
> }

Do you want to mix in a definition of an inner struct BitString into each bit array?
Or to have all of the functions of BitString mixed into BitArray itself?

If the latter then:

 mixin template BitString()
 {
     size_t opIndex(int i) {
         return bulkRaw[i];
     }
     //....
 }

i.e. no struct; it's just a template.
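A compilable sketch of that suggestion (renamed `BitStringOps` here to keep the template and the struct distinct; names are illustrative): the mixin template injects its members directly into the mixing-in aggregate, so `bulkRaw` resolves against the enclosing BitArray.

```d
// The mixin template holds only member declarations, no struct:
// they are injected into whatever aggregate mixes them in.
mixin template BitStringOps()
{
    size_t opIndex(size_t i)
    {
        return bulkRaw[i]; // looked up in the mixing-in scope
    }
}

struct BitArray
{
    size_t[] bulkRaw;
    mixin BitStringOps; // injects opIndex into BitArray itself
}

void main()
{
    auto ba = BitArray([size_t(10), 20, 30]);
    assert(ba[1] == 20);
}
```

Because the injected opIndex is a member of BitArray, `this` has the right type and there is no nested-struct context problem.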

> struct BitArray {
>      mixin BitString;
>      size_t[] bulkRaw;
>      BitString ba;
> }
>


-- 
Dmitry Olshansky