July 29, 2012
On 29-Jul-12 03:16, Era Scarecrow wrote:
> On Saturday, 28 July 2012 at 23:02:17 UTC, Dmitry Olshansky wrote:
>> On 29-Jul-12 02:31, Era Scarecrow wrote:
>>> On Saturday, 28 July 2012 at 22:19:05 UTC, Dmitry Olshansky wrote:
>>>> Mhm? Not getting you it at all. What's locking it ?
>>>>
>>>> What's wrong with:
>>>> struct BitArray
>>>> {
>>>>
>>>> union {
>>>>    struct {
>>>>        size_t[] data;
>>>>        ulong length;    //in bits
>>>>
>>>> //length in words would be just
>>>> // (length + size_t.sizeof*4) / // (size_t.sizeof*8);
>>>> // as it's a power of 2 it should get optimized
>>>>
>>>>    }
>>>>    size_t[2+ulong.sizeof/size_t.sizeof] compact;
>>>>
>>>> // and say 1 bit is used for isCompact would be great if it could use
>>>> // the least significant bit of pointer from data[], dunno if it's
>>>> easy enough
>>>
>>> I'd rather not rely on the pointer at all as a reference for if it's
>>> compact.
>>
>> No problem, I redesigned a bit:
>>
>> struct BitArray
>> {
>>
>> union {
>>     struct {
>>         ulong length;    //in bits, the top bit used as isCompact
>>         size_t[] data;
>>     }
>>     struct{
>>     ushort smallLength; //here as well
>>     ubyte[size_t.sizeof*2+ulong.sizeof-2] compact;
>>     }
>> }
>>
>> Up 2^63 of bits would occupy 2^60 of bytes should be good enough for
>> the years to come.
>>  I'm not seeing it as a problem as at very least processors have to
>> actually be able to address more then 42 (or was it 48?) bits that
>> they can now. (see AMD or Intel or ARM datasheets on 64 bit chips)
>
> 2^63, or drop 8 so it takes up no more than 2^55 bytes of memory. Yes
> that's quite a large chunk.
>
>   Curiously enough the current structure is already smaller than the
> proposed unions.

OK. Now back to the biggest question:
Slices use references (sometimes) to previous bitarray. Since it's a slice (like an array) this could be expected.

The "sometimes" part is not god enough! The problem is that small string optimization (or small array optimization) doesn't work with reference semantics. You may dig up discussion on proposal for small string optimization for char[] and string that was shot down on these grounds.
Bulit-in like semantics call for simple ptr/ptr+length struct and you can't get that can you?.

E.g.

void mess_with(BitArray ba)
{
ba ~= true;
ba ~= false;
 //does it change anything outside this function? *maybe*
ba[0] ^= true;
ba[1] ^= true;
auto x = ba; //even if used ref BitArray ba, x *maybe* references ba
//is x now a reference or a copy? *maybe*

}

Thus BitArray either becomes value type (that may come as unexpected, see your option c/d). Or it becomes full reference type as in option a (though it sucks).
Sorry but b is no go, as it makes code unpredictable I'd rather take my idea with separate ranges then b.

Solutions:
 a) Drop the union, make all functions @safe (compared to @trusted), and they are all ref/slices
 b) Leave as sometimes ref, sometimes not; Use .dup when you NEED to ensure different unique copies.
 c) Apply COW (Copy-On-Write) where a known slice type (canExpand is false) when a change would apply, it replaces the present slice into it's own unique array.
 d) All slices become new dup'ed allocated arrays (Most expensive of them all)

Because of d, I proposed that slices be separate type that is always a reference to original. Then only coping arrays themselves involves dup.

>
>> Begin takes extra ulong I'm not sure it is needed in the container
>> itself.
>
>   No it doesn't. bitfields are used to handle the beginning/ending
> offsets, equaling a total of 14 bits. It's the most confusing part
> (aside from [offset + sliceoffset + i[ formulas). canExpand and
> isCompact fill the other two bits.

OK, finally I think I understand how your current version works. It's clever but leaves you in half-way value-reference state.

To point out few problems with unification of slice type and BitArray:

- Every time a small slice is taken say [1..30] it becomes a small array(?) thus it is a value type and doesn't affect original array

- No idea if a = b make a copy will make programmers nervous, esp if one is writing a library function as in this case there is no assumption that fits all

- Same thing but more, does this work?
	auto c = ba; //if ba is compact
	ba ~= true;
	assert(c.length = ba.length-1);
	//c should remain with the same length unaffected (at least like built in arrays)

-- 
Dmitry Olshansky
July 29, 2012
On Sunday, 29 July 2012 at 06:27:32 UTC, Dmitry Olshansky wrote:
> OK. Now back to the biggest question:
> Slices use references (sometimes) to previous bitarray. Since it's a slice (like an array) this could be expected.
>
> The "sometimes" part is not god enough! The problem is that small string optimization (or small array optimization) doesn't work with reference semantics. You may dig up discussion on proposal for small string optimization for char[] and string that was shot down on these grounds.

> Bulit-in like semantics call for simple ptr/ptr+length struct and you can't get that can you?.>
> E.g.
>
> void mess_with(BitArray ba)
> {
> ba ~= true;
> ba ~= false;
>  //does it change anything outside this function? *maybe*

If it was a previous slice, then it never chances outside and appending would automatically allocate new memory.

> ba[0] ^= true;
> ba[1] ^= true;
> auto x = ba; //even if used ref BitArray ba, x *maybe* references ba
> //is x now a reference or a copy? *maybe*
>
> Thus BitArray either becomes value type (that may come as unexpected, see your option c/d). Or it becomes full reference type as in option a (though it sucks).

> Sorry but b is no go, as it makes code unpredictable I'd rather take my idea with separate ranges then b.

> Solutions:
>  a) Drop the union, make all functions @safe (compared to @trusted), and they are all ref/slices
>  b) Leave as sometimes ref, sometimes not; Use .dup when you NEED to ensure different unique copies.
>  c) Apply COW (Copy-On-Write) where a known slice type (canExpand is false) when a change would apply, it replaces the present slice into it's own unique array.
>  d) All slices become new dup'ed allocated arrays (Most expensive of them all)

 Agreed, B is a bad choice. I'm going to suggest c myself, as changes would only propagate to a new true copy, otherwise it's read-only as a slice.

> Because of d, I proposed that slices be separate type that is always a reference to original. Then only coping arrays themselves involves dup.

 Mmmm... i think you're right...

>> No it doesn't. bitfields are used to handle the beginning/ending offsets, equaling a total of 14 bits. It's the most confusing part (aside from [offset + sliceoffset + i] formulas). canExpand and isCompact fill the other two bits.
>
> OK, finally I think I understand how your current version works. It's clever but leaves you in half-way value-reference state.

 I didn't want to originally use the 1-byte per startBit/endingBit, but it does actually work when you have the functions working.

> To point out few problems with unification of slice type and BitArray:
>
> - Every time a small slice is taken say [1..30] it becomes a small array(?) thus it is a value type and doesn't affect original array

 No, slices of compact arrays remain compact arrays. slices of larger arrays are reference until length is set, dupped, or appended to (forcing a resize/dup)

> - No idea if a = b make a copy will make programmers nervous, esp if one is writing a library function as in this case there is no assumption that fits all

 If it's compact, it's an auto separate copy/valuetype. If it's not compact, it's a reference to the original.

> - Same thing but more, does this work?
> 	auto c = ba; //if ba is compact
> 	ba ~= true;

 ba may or may not be a normal/compact at this point

> 	assert(c.length = ba.length-1);
> 	//c should remain with the same length unaffected (at least like built in arrays)

 Most likely it would be compact afterwards (Even if it allocated memory).

 I guess after going this over, two things need to happen. First it needs to be non-reference type, I don't want to go option d route, so I'll go for c instead. On a copy I'll add a postblit that makes the 'copy' now a slice which will create a new array when it changes in any way. Only question is when you pass the BitArray to a function as you showed, it would still think it's the original, unless the postblit/opAssign (or similar) can handle that. Hmmm time for some testing.
July 29, 2012
On Sunday, 29 July 2012 at 06:27:32 UTC, Dmitry Olshansky wrote:
> Thus BitArray either becomes value type (that may come as unexpected, see your option c/d). Or it becomes full reference type as in option a (though it sucks). Sorry but b is no go, as it makes code unpredictable I'd rather take my idea with separate ranges then b.

 I've had a thought. Why not make it fully reference and at the same time not? Consider making a slice which is the real BitArray, then you have an outer container that when you want value semantics you use it and it would make appropriate copies every time it needed to (including when first put in the container). It would be implicitly convertible to a slice. So...

BitArraySlice {
  //real bitarray stuff and pointer ref
}

BitArray { //value semantics wrapper?
  BitArraySlice ba;
  alias ba this;

  //include wrapper versions of binary operators

  this(this) {
  }
}

 Doing it this way the arguments being passed as reference slices would ensure unneeded copies wouldn't be made. On the other hand a smaller portion of the code would need to be updated. Along with that a slice could be set as COW allowing quite a bit of flexibility.

 Thoughts?
July 29, 2012
On 29-Jul-12 12:55, Era Scarecrow wrote:
> On Sunday, 29 July 2012 at 06:27:32 UTC, Dmitry Olshansky wrote:
>> Thus BitArray either becomes value type (that may come as unexpected,
>> see your option c/d). Or it becomes full reference type as in option a
>> (though it sucks). Sorry but b is no go, as it makes code
>> unpredictable I'd rather take my idea with separate ranges then b.
>
>   I've had a thought. Why not make it fully reference and at the same
> time not? Consider making a slice which is the real BitArray, then you
> have an outer container that when you want value semantics you use it
> and it would make appropriate copies every time it needed to (including
> when first put in the container). It would be implicitly convertible to
> a slice. So...
>
> BitArraySlice {
>    //real bitarray stuff and pointer ref
> }
>
> BitArray { //value semantics wrapper?
>    BitArraySlice ba;
>    alias ba this;
>
>    //include wrapper versions of binary operators
>
>    this(this) {
>    }
> }
>
>   Doing it this way the arguments being passed as reference slices would
> ensure unneeded copies wouldn't be made. On the other hand a smaller
> portion of the code would need to be updated. Along with that a slice
> could be set as COW allowing quite a bit of flexibility.
>
>   Thoughts?

Hm... and small array optimizations go where? BitArraySlice won't be a full reference type if it does it.

-- 
Dmitry Olshansky
July 29, 2012
On Sunday, 29 July 2012 at 09:03:08 UTC, Dmitry Olshansky wrote:
> On 29-Jul-12 12:55, Era Scarecrow wrote:
>> Doing it this way the arguments being passed as reference slices would ensure unneeded copies wouldn't be made. On the other hand a smaller portion of the code would need to be updated. Along with that a slice could be set as COW allowing quite a bit of flexibility.
>>
>>  Thoughts?
>
> Hm... and small array optimizations go where? BitArraySlice won't be a full reference type if it does it.

 I was thinking about that; If need be the slice can point to another slice which holds the actual small array (Kinda a long workaround). The way to small array optimization to work would be if BitArray generated it, since when you go to slicing you either reference the original data in the slice, or you could drop the small array optimization; although a loss it would simplify code until you come back to it once the code is working the way it should.

 Another idea is to convert the struct into a class instead, which would allow some options, but won't be a perfect fit, and inheritance could be used; which would solve some of the small array optimization, but at what cost?
July 29, 2012
On 29-Jul-12 13:18, Era Scarecrow wrote:
> On Sunday, 29 July 2012 at 09:03:08 UTC, Dmitry Olshansky wrote:
>> On 29-Jul-12 12:55, Era Scarecrow wrote:
>>> Doing it this way the arguments being passed as reference slices
>>> would ensure unneeded copies wouldn't be made. On the other hand a
>>> smaller portion of the code would need to be updated. Along with that
>>> a slice could be set as COW allowing quite a bit of flexibility.
>>>
>>>  Thoughts?
>>
>> Hm... and small array optimizations go where? BitArraySlice won't be a
>> full reference type if it does it.
>
>   I was thinking about that; If need be the slice can point to another
> slice which holds the actual small array (Kinda a long workaround). The
> way to small array optimization to work would be if BitArray generated
> it, since when you go to slicing you either reference the original data
> in the slice, or you could drop the small array optimization; although a
> loss it would simplify code until you come back to it once the code is
> working the way it should.
>
>   Another idea is to convert the struct into a class instead, which
> would allow some options, but won't be a perfect fit, and inheritance
> could be used; which would solve some of the small array optimization,
> but at what cost?

I have simpler suggestion. Maybe doing it as 2 containers:

BitSet is a plain value type with small array optimization (what you called BitArray in last 2 posts) and no slicing (only as opSlice() i.e. as a whole). Best used for small things under < 1000 bits (as even dup will be cheap enough).

BitArray is class or struct that has reference smenatics and drops small array optimization in favor of having simple & fast slicing
Best used for big arrays with no less then 100-200 bits.


-- 
Dmitry Olshansky
July 29, 2012
On Sunday, 29 July 2012 at 09:30:06 UTC, Dmitry Olshansky wrote:
>
> I have simpler suggestion. Maybe doing it as 2 containers:
>
> BitSet is a plain value type with small array optimization (what you called BitArray in last 2 posts) and no slicing (only as opSlice() I.e. as a whole). Best used for small things under < 1000 bits (as even dup will be cheap enough).
>
> BitArray is class or struct that has reference smenatics and drops small array optimization in favor of having simple & fast slicing Best used for big arrays with no less then 100-200 bits.

 As long as you only use one or the other then it sounds great; I'm sure a conversion/constructor would be available to convert from one to another. That does seem like the best solution, but also feels like we're building it twice (Little code re-use).

 Might be a third option: A template struct that represents a BitArray type (with 90% of the code); you then insert a structure with key functions that would handle the value/reference & duplication part.

 I'll meditate on it for a few days and see if anything pops up.
July 29, 2012
On 29-Jul-12 14:28, Era Scarecrow wrote:
> On Sunday, 29 July 2012 at 09:30:06 UTC, Dmitry Olshansky wrote:
>>
>> I have simpler suggestion. Maybe doing it as 2 containers:
>>
>> BitSet is a plain value type with small array optimization (what you
>> called BitArray in last 2 posts) and no slicing (only as opSlice()
>> I.e. as a whole). Best used for small things under < 1000 bits (as
>> even dup will be cheap enough).
>>
>> BitArray is class or struct that has reference smenatics and drops
>> small array optimization in favor of having simple & fast slicing Best
>> used for big arrays with no less then 100-200 bits.
>
>   As long as you only use one or the other then it sounds great; I'm
> sure a conversion/constructor would be available to convert from one to
> another. That does seem like the best solution, but also feels like
> we're building it twice (Little code re-use).
>

Yup.

>   Might be a third option: A template struct that represents a BitArray
> type (with 90% of the code); you then insert a structure with key
> functions that would handle the value/reference & duplication part.
>

Sounds a lot like mixin template. I'd try to invent some small set of primitive operations and auto-magically construct the others from it.

>   I'll meditate on it for a few days and see if anything pops up.

And just to keep your mind busy :) how about adding SIMD to speed it all up:
http://dlang.org/simd.html
It's kind of new & cool thing but not very well documented.

-- 
Dmitry Olshansky
July 29, 2012
On Sunday, 29 July 2012 at 10:34:50 UTC, Dmitry Olshansky wrote:
>> Might be a third option: A template struct that represents a BitArray type (with 90% of the code); you then insert a structure with key functions that would handle the value/reference & duplication part.
>>
>
> Sounds a lot like mixin template. I'd try to invent some small set of primitive operations and auto-magically construct the others from it.

 Pretty much...  syntactical sugar, just add tea (T) I made a funny! :D

>> I'll meditate on it for a few days and see if anything pops up.
>
> And just to keep your mind busy :) how about adding SIMD to speed it all up:
> http://dlang.org/simd.html
> It's kind of new & cool thing but not very well documented.

 Mmm I'll look it over. If it can work for larger bulk operations I don't see why not.
July 29, 2012
On Sunday, 29 July 2012 at 09:30:06 UTC, Dmitry Olshansky wrote:
> I have simpler suggestion. Maybe doing it as 2 containers:
>
> BitSet is a plain value type with small array optimization (what you called BitArray in last 2 posts) and no slicing (only as opSlice() I.e. as a whole). Best used for small things under < 1000 bits (as even dup will be cheap enough).
>
> BitArray is class or struct that has reference semantics and drops small array optimization in favor of having simple & fast slicing Best used for big arrays with no less then 100-200 bits.

 Another thought coming to mind, is to have the BitArray with a few different modes it could operate in. One of the modes could be 'reference' which would make slices and them all work exactly as you would expect, while with referencing off it would act as a normal value type struct, since the mode/flags would modify how the postblit and assignment operators worked, along with the length. (Reference wouldn't use small array optimization).

 But having them statically separated by name/type seems much more likely to be safer in the long run with reliable results.