August 07, 2014
On 8/6/14, 6:40 PM, Era Scarecrow wrote:
> When I re-wrote the BitArray I personally think it is an improvement in
> many ways, and being a re-write you can't just do bug #xxxx and then bug
> #xxxx and then bug... etc etc.

Welcome back!

Regarding BitArray, when I implemented HeapBlock in https://github.com/andralex/phobos/blob/allocator/std/allocator.d I needed a way to "see" a ulong[] as an array of bits, with primitives such as "find a run of at least n zeros" and "set n bit to zero or one" etc. A "bit range" abstraction built on top of a ulong[] would have been quite useful. It would have the classic range primitives but also functions specific to runs of bits, which are unique to it.

I wanted to sit down and extract the abstraction at some point, but didn't get to it yet. Sounds like a great thing to include in std.bitmanip.


Andrei

August 07, 2014
On Thursday, 7 August 2014 at 10:08:30 UTC, Marc Schütz wrote:
> I haven't looked at your rewrite, but from what I've seen the current implementation is indeed very awkward and full of bugs.
> ... This seems to imply that not even the semantics are completely clear. I guess a complete rewrite would be appropriate in such a situation.

On Thursday, 7 August 2014 at 14:24:13 UTC, Andrei Alexandrescu wrote:
> Welcome back!

 Thanks i guess :)

> I wanted to sit down and extract the abstraction at some point, but didn't get to it yet. Sounds like a great thing to include in std.bitmanip.


 So the gist of it, is there's still interest in the rewrite. Interesting...

 As for being able to find x number of bits that are 0 or 1 in a row, that both sounds easy and hard at the same time (easy if you don't mind it being slow). In my rewrite there was a bulk template I created that was intended to do huge speedups with binary data (when the data was naturally aligned correctly and large enough to use the binary operators on normal types). Some of the unittests and example code also used them in a read-only fashion that could be specialized for finding a certain type of pattern...


 As I'm trying to recall, I think the largest bug in my implementation was the confusion of when it should be reference vs value type, since when it was small enough it didn't need to allocate any data (64/128 bits or less) but larger than that it had to allocate data. Naturally this causes problems...

 Of course I also implemented it to be able to pass around a slice of a BitArray which was always reference...

 I guess the things to do is make sure the current bug fixes are included in my implementation (including BitFields, I had it programmed to accept default values), get it to merge with phobos, afterwards continue forward and try to make improvements and further bug fixes...
August 08, 2014
05-Aug-2014 13:30, Era Scarecrow пишет:
>   It's been a little while since I've been back. Got burned out and
> stuff. Currently watching all the Dconf 2014 videos (although I never
> finished the 2013 ones).
>
>   So, I don't suppose there's a short quick & dirty summary of what's
> happened in the last 18 months?

E-hm, not easy. Lots of good stuff...
@nogc, tons of bug fixes, better packages. And we still await on allocators ;)

>   Also somewhat curious if anyone took my BitArray updates and got them
> implemented in phobos or not. Quite annoying I still haven't gotten a
> full understanding of GitHub yet, and at this rate probably won't.

Glad to see you're back!

FYI
https://github.com/D-Programming-Language/phobos/pull/2248
https://github.com/D-Programming-Language/phobos/pull/2249

These were mostly bugfixes not trying to fix any design flaws.

-- 
Dmitry Olshansky
August 08, 2014
On Friday, 8 August 2014 at 21:35:59 UTC, Dmitry Olshansky wrote:
> E-hm, not easy. Lots of good stuff...
> @nogc, tons of bug fixes, better packages. And we still await on allocators ;)

 Hmm good allocation fixes would be nice... I know i'd like to see as much of the phobos library adjusted not to use the GC as possible, or custom allocators for bare metal programming and stuff where the OS can't help you.
> Era Scarecrow wrote:
>>  Also somewhat curious if anyone took my BitArray updates and got them implemented in phobos or not. Quite annoying I still haven't gotten a
>> full understanding of GitHub yet, and at this rate probably won't.
>
> Glad to see you're back!

 Thanks..

> FYI
> https://github.com/D-Programming-Language/phobos/pull/2248
> https://github.com/D-Programming-Language/phobos/pull/2249
>
> These were mostly bugfixes not trying to fix any design flaws.

 Yeah doesn't look like any of my code. Unfortunately due to the huge re-write i did most of those fixes go in the garbage (my rewrite probably covered most of them anyways).

 I think i'll concentrate on bitfields first, then when that's good i'll try and get BitArray in. As i recall my BitArray requires default assignment on bitfields, so i had to add that before my BitArray worked right. Depending on how much the CTFE has improved, i may be able to cut out extra bit/type enum i added which gives useful static information on the side i used for my BitArray. Far more likely the log2 can be reduced/removed...
August 08, 2014
On 8/7/14, 12:40 PM, Era Scarecrow wrote:
>   As for being able to find x number of bits that are 0 or 1 in a row,
> that both sounds easy and hard at the same time (easy if you don't mind
> it being slow). In my rewrite there was a bulk template I created that
> was intended to do huge speedups with binary data (when the data was
> naturally aligned correctly and large enough to use the binary operators
> on normal types). Some of the unittests and example code also used them
> in a read-only fashion that could be specialized for finding a certain
> type of pattern...

A thought: if whatever work on bit arrays you do isn't fast, it's probably not worth doing; people who opt for that kind of packing are most often performance motivated.

Alignment is often not an issue - you handle the setup/teardown misalignments separately and to the bulk 64 bits at a time.

Instead of a full-blown abstraction you may want to instead opt for defining some simple primitives using ulong[] that are accessible to people having data embedded in odd places. The bit operations in std.allocator would be good and practical.


Andrei

August 08, 2014
On Friday, 8 August 2014 at 22:28:56 UTC, Era Scarecrow wrote:
> On Friday, 8 August 2014 at 21:35:59 UTC, Dmitry Olshansky wrote:
>> FYI
>> https://github.com/D-Programming-Language/phobos/pull/2248
>> https://github.com/D-Programming-Language/phobos/pull/2249
>>
>> These were mostly bugfixes not trying to fix any design flaws.
>
>  Yeah doesn't look like any of my code. Unfortunately due to the huge re-write i did most of those fixes go in the garbage (my rewrite probably covered most of them anyways).
>

Yea, one of those was my code and the other was somebody else's PR that I revived since they weren't responding.

I was moving forward with the philosophy that we should make the existing implementation as correct as possible and leave new features to new designs.

I think it will be difficult to make a "one size fits all" BitArray that satisfies everybody's wishes.
E.g.:
Bit level slice operations versus performance.
Value semantics versus D slice semantics.
Having compatibility with other parts of phobos versus having a maximum of 2^35-1 bits on a 32 bit system.

This is not as bad making a one size fits all fixed point integer, but it's not pleasant either.
August 08, 2014
On Friday, 8 August 2014 at 22:43:38 UTC, Andrei Alexandrescu wrote:
> A thought: if whatever work on bit arrays you do isn't fast, it's probably not worth doing; people who opt for that kind of packing are most often performance motivated.
>
> Alignment is often not an issue - you handle the setup/teardown misalignments separately and to the bulk 64 bits at a time.
>
> Instead of a full-blown abstraction you may want to instead opt for defining some simple primitives using ulong[] that are accessible to people having data embedded in odd places. The bit operations in std.allocator would be good and practical.

 And now i'm confused. There's enhancement requests for making it so you can prepend or append to the bitarray really quick, which ensures it's not aligned right unless you rebuild the entire thing. Quite often you'll be working with only a few bits at a time vs large bulk operations, and if they don't align right somehow, then either bit-shifting has to be present and sort of a leap-frog approach, or they align at the same place so you can treat it more like an array...

 Of course you can always expand it so 1 bit becomes 1 byte, and then afterwards repack it into something smaller for storage purposes...




On Friday, 8 August 2014 at 22:46:44 UTC, safety0ff wrote:
> Yea, one of those was my code and the other was somebody else's PR that I revived since they weren't responding.
>
> I was moving forward with the philosophy that we should make the existing implementation as correct as possible and leave new features to new designs.

 Not surprising. We want to move forward/advance as much as possible. Kinda my internal motto when playing chess :P


> I think it will be difficult to make a "one size fits all" BitArray that satisfies everybody's wishes.
> E.g.:
> Bit level slice operations versus performance. Value semantics versus D slice semantics. Having compatibility with other parts of phobos versus having a  maximum of 2^35-1 bits on a 32 bit system.
>
> This is not as bad making a one size fits all fixed point integer, but it's not pleasant either.

 I recall trying to satisfy everyone's requests. The only thing that really came up was it was sometimes a reference (>64 bits) and sometimes a value type (<=64 bits) and that was confusing only because it wasn't consistent. That and someone wants to be able to have a static size that probably doesn't rely on allocation at all. Possible but not sure how to make it...

 As someone mentioned before, i think it comes down that we really aren't sure what it is really for, it hasn't been fully defined. Is it a storage container? An array? Is it mostly so we can foreach over the elements for something? Is it for bit packing/unpacking for compression? (Which is how i see it mostly). Encryption?


 Suddenly the idea of having it attached to a portion of memory, then that's converted to a byte array that can be treated as a normal array works fine, then just having it update the original array on a sync call or something comes to mind... That might work... But who owns the data? If the GC owns it, then you create a portion of memory and assign it, or you tell the BitArray this memory goes to this slice of the data, and 'do this' with it.

 Hmmm new ideas... but doesn't seem easy to do...
August 08, 2014
On Fri, Aug 08, 2014 at 11:30:07PM +0000, Era Scarecrow via Digitalmars-d wrote: [...]
>  I recall trying to satisfy everyone's requests. The only thing that
> really came up was it was sometimes a reference (>64 bits) and
> sometimes a value type (<=64 bits) and that was confusing only because
> it wasn't consistent.  That and someone wants to be able to have a
> static size that probably doesn't rely on allocation at all. Possible
> but not sure how to make it...
> 
>  As someone mentioned before, i think it comes down that we really
> aren't sure what it is really for, it hasn't been fully defined. Is it
> a storage container? An array? Is it mostly so we can foreach over the
> elements for something? Is it for bit packing/unpacking for
> compression? (Which is how i see it mostly). Encryption?
[...]

I think this indicates that perhaps there is not a single BitArray type that will satisfy everybody, but perhaps 2-3 related types that share some algorithms.  At least the following types come to mind (these are all temporary names so please don't bust out the rainbow bikeshed just yet):

- SmallBitArray: fits inside a ulong, and has value semantics. Behaves
  like a ulong: minimal cost for passing it around, making multiple
  copies of it, etc.. Basically for juggling bits in a ulong.

- StaticBitArray: a value type that fits inside n ulong's. Appends only
  work up to the max capacity of n ulong's. This one is mainly for
  people who need to manipulate bitmasks, perform bit operations en
  masse, etc.. Backed by an embedded static array, so it has value
  semantics, and you can pass it around, copy it, etc., without worrying
  about aliasing.

- DynamicBitArray: a reference type backed by the GC. This one is for
  people who are after something that behaves like D dynamic arrays;
  slicing is supported (by means of extra info about how many bits at
  the beginning/end of the array is actually part of an array, so you
  can slice it at arbitrary bit positions not aligned with word
  boundaries), as is arbitrary appending, etc.. The emphasis for this
  one is memory efficiency (e.g. for people who need to store massive
  numbers of bools without wasting an entire byte per bool), at the cost
  of potentially slower copying / overhead of maintaining start/end bit
  indices, etc..

Obviously, these types are closely related, so they will probably share quite a good number of common algorithms. So potentially they can be implemented as a core of common functions that can work on all 3 BitArray types, with a few customized methods for each type.


T

-- 
I've been around long enough to have seen an endless parade of magic new techniques du jour, most of which purport to remove the necessity of thought about your programming problem.  In the end they wind up contributing one or two pieces to the collective wisdom, and fade away in the rearview mirror. -- Walter Bright
August 09, 2014
On Friday, 8 August 2014 at 23:57:17 UTC, H. S. Teoh via Digitalmars-d wrote:
> - SmallBitArray: fits inside a ulong, and has value semantics. Behaves
>   like a ulong: minimal cost for passing it around, making multiple
>   copies of it, etc.. Basically for juggling bits in a ulong.

 Easy enough to make a struct that doesn't care, would be able to foreach over it and do direct access as long as you don't have to worry about where it starts/ends (the length). Just forcibly convert a type to a small bitarray and it works... slicing and other advanced features would be missing...

> - StaticBitArray: a value type that fits inside n ulong's. Appends only
>   work up to the max capacity of n ulong's. This one is mainly for
>   people who need to manipulate bitmasks, perform bit operations en
>   masse, etc.. Backed by an embedded static array, so it has value
>   semantics, and you can pass it around, copy it, etc., without worrying
>   about aliasing.

 I remember starting something like this, it started to become a template hell... And code bloat... I ended up deciding it was simpler to have one type that allowed slicing as part of it's design.

> - DynamicBitArray: a reference type backed by the GC. This one is for
>   people who are after something that behaves like D dynamic arrays;
>   slicing is supported (by means of extra info about how many bits at
>   the beginning/end of the array is actually part of an array, so you
>   can slice it at arbitrary bit positions not aligned with word
>   boundaries), as is arbitrary appending, etc.. The emphasis for this
>   one is memory efficiency (e.g. for people who need to store massive
>   numbers of bools without wasting an entire byte per bool), at the cost
>   of potentially slower copying / overhead of maintaining start/end bit
>   indices, etc..
>
> Obviously, these types are closely related, so they will probably share quite a good number of common algorithms. So potentially they can be implemented as a core of common functions that can work on all 3 BitArray types, with a few customized methods for each type.

 So UFCS rather than be template functions inside of template structs... that would simplify separating the storage vs shared algorithm/design. Yeah i think i can do that... And hopefully satisfy all three basic types.

 So to decide how it identifies these, what makes up a BitArray. For simplicity probably readable fields, being bitslicestart and bitslicelength. Finally a flag that says if it's dynamic or not. bitslice_isdynamic... Probably those three things (Unless someone else has a better idea).

 After that what operators do they all support... Definitely basic opIndex get/set. And being able to foreach over them, or a wrapper to make it an input/output range.

 If anyone wants to help with the exact details for what the requirements of all the BitArray types are, and a good list of what kind of unittests we should use to verify it's usage and correctness then i'm all ears.
August 09, 2014
 But naturally before i can do anything i need to re-familiarize myself with Git and GitHub... and immediately remember why i pretty much rage-quit 18 months ago...

 Is there anyone who i can refer to via IM/PM or some quick method when i start using this stuff to get it sorted out (rather than these forums)?

 Although it's pretty likely i'll either start a brand new fork/repository of phobos or destroy my current saved changes (commits, fixes, everything) as they will be a pain to figure out since i never could get it to work last time... even after i rebased it last time...