repack ubyte[] to use only 7 bits

Is there a standard way to do this? The code below is untested: I haven't yet written the matching x7to8 routine, and I have since come up with a better way to accomplish what this was meant to do. Still, it feels as if this should be somewhere in the standard library, if I could only find it.

/** Repack the data from an array of ubytes into an array of ubytes of
 * which only the low 7 bits are significant.  The high bit will be set
 * only if the byte would otherwise be zero.
 * Untested draft: a trailing group of fewer than 7 bits is dropped.  */
ubyte[] x8to7 (ubyte[] bin)
{
    ubyte[] bout;
    //  bit masks (fBit => mask for current byte, mask for next byte):
    //      0 => 0xfe = 11111110, 0x00 = 00000000
    //      1 => 0x7f = 01111111, 0x00 = 00000000
    //      2 => 0x3f = 00111111, 0x80 = 10000000
    //      3 => 0x1f = 00011111, 0xc0 = 11000000
    //      4 => 0x0f = 00001111, 0xe0 = 11100000
    //      5 => 0x07 = 00000111, 0xf0 = 11110000
    //      6 => 0x03 = 00000011, 0xf8 = 11111000
    //      7 => 0x01 = 00000001, 0xfc = 11111100
    if (bin.length < 1) return bout;
    int fByte, fBit;    // current byte, and bit offset within it (0 = high bit)
    while (fByte < bin.length)
    {
        // On the last byte a group starting past bit 1 would need a next byte.
        if (fByte + 1 == bin.length && fBit > 1) break;
        ubyte b;
        switch (fBit)
        {
            case 0:
                b = bin[fByte] / 2;
                break;
            case 1:
                b = bin[fByte] & 0x7f;
                break;
            case 2:
                ubyte b1 = (bin[fByte] & 0x3f) << 1;
                ubyte b2 = (bin[fByte + 1] & 0x80) >>> 7;
                b = b1 | b2;    // plain assignment; b is a single ubyte, not an array
                break;
            case 3:
                ubyte b1 = (bin[fByte] & 0x1f) << 2;
                ubyte b2 = (bin[fByte + 1] & 0xc0) >>> 6;
                b = b1 | b2;
                break;
            case 4:
                ubyte b1 = (bin[fByte] & 0x0f) << 3;
                ubyte b2 = (bin[fByte + 1] & 0xe0) >>> 5;
                b = b1 | b2;
                break;
            case 5:
                ubyte b1 = (bin[fByte] & 0x07) << 4;
                ubyte b2 = (bin[fByte + 1] & 0xf0) >>> 4;
                b = b1 | b2;
                break;
            case 6:
                ubyte b1 = (bin[fByte] & 0x03) << 5;
                ubyte b2 = (bin[fByte + 1] & 0xf8) >>> 3;
                b = b1 | b2;
                break;
            case 7:
                ubyte b1 = (bin[fByte] & 0x01) << 6;
                ubyte b2 = (bin[fByte + 1] & 0xfc) >>> 2;
                b = b1 | b2;
                break;
            default:
                assert (false, "This path should never be taken");
        }   //  switch (fBit)
        if (b == 0) bout ~= 0x80;
        else        bout ~= b;
        fBit = fBit + 7;
        if (fBit > 7)
        {
            fByte++;
            fBit -= 8;  // a byte holds 8 bits, so wrap by 8 (not 7)
        }
    }
    return bout;
}
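
For comparison, the same repacking can be written without the eight-way switch by keeping a small bit accumulator. This is only a sketch under stated assumptions, not a Phobos routine: the names `pack8to7` and `unpack7to8` are made up for illustration, and, unlike the draft above, a trailing partial group is padded with zero bits rather than dropped so that the inverse can recover every input byte.

```d
import std.stdio : writeln;

/// Pack 8-bit input into 7-bit groups, high-order bits first.
/// Assumption: a final partial group is padded with zero bits.
/// A group that would be 0 is written as 0x80, as in the post.
ubyte[] pack8to7(const(ubyte)[] input) pure nothrow @safe
{
    ubyte[] output;
    uint acc;    // bit accumulator; pending bits sit in the low end
    uint nbits;  // number of pending bits held in acc
    foreach (b; input)
    {
        acc = (acc << 8) | b;
        nbits += 8;
        while (nbits >= 7)
        {
            nbits -= 7;
            immutable ubyte g = (acc >> nbits) & 0x7f;
            if (g == 0) output ~= 0x80;
            else        output ~= g;
        }
        acc &= (1u << nbits) - 1;   // keep only the bits not yet emitted
    }
    if (nbits > 0)
    {
        immutable ubyte g = (acc << (7 - nbits)) & 0x7f;
        if (g == 0) output ~= 0x80;
        else        output ~= g;
    }
    return output;
}

/// Inverse of pack8to7: masking with 0x7f turns 0x80 back into 0.
ubyte[] unpack7to8(const(ubyte)[] packed) pure nothrow @safe
{
    ubyte[] output;
    uint acc, nbits;
    foreach (g; packed)
    {
        acc = (acc << 7) | (g & 0x7f);
        nbits += 7;
        if (nbits >= 8)
        {
            nbits -= 8;
            output ~= cast(ubyte)(acc >> nbits);
        }
        acc &= (1u << nbits) - 1;
    }
    return output;   // leftover padding bits are discarded
}

void main()
{
    ubyte[] data = [0xFF, 0x00, 0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc];
    auto packed = pack8to7(data);
    assert(unpack7to8(packed) == data);
    writeln(packed);
}
```

Because no output byte is ever 0x00, the packed form can pass through zero-terminated C strings, at the cost of roughly one extra byte per seven input bytes.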

December 07, 2014

Re: repack ubyte[] to use only 7 bits

Posted by Charles Hixson
in reply to bearophile

Your comments would be reasonable if this were destined for a library, but I haven't even finished checking it (and probably won't, since I've switched to a simple zero-elimination scheme). This is also a bit specialized for a library: a library version should probably deal with arbitrary ints from 8 to 64 or 128 bits (though I don't think the 128-bit type is standard yet, only reserved). I just thought that something like this should be available, possibly along the lines of Python's pack and unpack, and wondered where it was and what it was called.
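
For whole integers, the nearest Phobos counterpart to Python's pack and unpack is probably `std.bitmanip`, whose `append` and `read` templates write and consume fixed-width values in a chosen byte order. The sketch below only shows that byte-level packing; it does not do the 7-bit regrouping asked about in this thread.

```d
import std.array : appender;
import std.bitmanip : append, read;

void main()
{
    // Pack: append a ushort and a uint in big-endian order (the default).
    auto buffer = appender!(ubyte[])();
    buffer.append!ushort(261);
    buffer.append!uint(17_110_537);
    ubyte[] bytes = buffer.data;
    assert(bytes == [1, 5, 1, 5, 22, 9]);

    // Unpack: read consumes bytes from the front of the range.
    assert(bytes.read!ushort() == 261);
    assert(bytes.read!uint() == 17_110_537);
    assert(bytes.length == 0);
}
```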

Additionally, I'm clearly not the best person to write the library version, as I still have LOTS of trouble with D templates. And I have not successfully wrapped my mind around D ranges, which is odd, because neither Ruby nor Python ranges give me much trouble. Perhaps it's the syntax.

As for "pure", "@safe", and "nothrow"... I'd first like to understand them well enough to know that I COULD use those annotations. (The "in" I agree should be applied. I understand that one.)
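
As a rough guide to those annotations: `pure` promises the function does not touch mutable global state, `nothrow` that it throws no exceptions, and `@safe` that it performs no memory-unsafe operations; the compiler rejects the function if a promise is broken. A minimal sketch, using a hypothetical helper `countZeros` that is not part of the posted code:

```d
/// Count the zero bytes in data (hypothetical helper for illustration).
/// `in` marks the parameter as read-only input.
size_t countZeros(in ubyte[] data) pure nothrow @safe
{
    size_t n;
    foreach (b; data)
        if (b == 0) ++n;
    return n;
}

void main()
{
    ubyte[] sample = [0, 1, 0, 2];
    assert(countZeros(sample) == 2);
}
```

Adding the attributes one at a time and reading the compiler errors is a practical way to find out which of them a given function can carry.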

As for size_t for indexes...here I think we disagree. It would be a bad mistake to use an index that size. I even considered using short or ushort, but I ran into a comment a while back saying that one should never use those for local variables. This *can't* be an efficient enough method to be appropriate for a huge array...but that should probably be documented if it were for a library. (If I were to use size_t indexing, I'd want to modify things so that I could feed it a file as input, and that's taking it well away from what I was building it for: converting input to a redis database so that I could feed it raw serial data streams without first converting them into human-readable formats.) I wanted to make everything ASCII-7 binary data, which, when I thought about it more, was overkill. All I need to do is eliminate internal zeros, since C handles various extended character formats by ignoring them.
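
The post does not say which zero-elimination scheme was chosen. One common possibility is byte stuffing with an escape byte, sketched below purely as an assumption: the escape value 0x01 and the names `stuffZeros` and `unstuffZeros` are invented for illustration.

```d
enum ubyte ESC = 0x01;   // hypothetical escape byte

/// Encode so that no 0x00 byte appears in the result:
/// 0x00 becomes ESC 0x02, and ESC itself becomes ESC ESC.
ubyte[] stuffZeros(in ubyte[] data) pure nothrow @safe
{
    ubyte[] result;
    foreach (b; data)
    {
        if (b == 0x00)     { result ~= ESC; result ~= 0x02; }
        else if (b == ESC) { result ~= ESC; result ~= ESC; }
        else               result ~= b;
    }
    return result;
}

/// Decode the output of stuffZeros.
ubyte[] unstuffZeros(in ubyte[] data) pure nothrow @safe
{
    ubyte[] result;
    for (size_t i = 0; i < data.length; ++i)
    {
        if (data[i] != ESC)
        {
            result ~= data[i];
            continue;
        }
        assert(i + 1 < data.length, "dangling escape byte");
        ++i;
        if (data[i] == 0x02) result ~= 0x00;
        else                 result ~= ESC;
    }
    return result;
}

void main()
{
    ubyte[] raw = [0x41, 0x00, 0x01, 0x42];
    auto enc = stuffZeros(raw);
    assert(enc == [0x41, 0x01, 0x02, 0x01, 0x01, 0x42]);
    assert(unstuffZeros(enc) == raw);
}
```

Compared with 7-bit repacking, this only grows the data where a 0x00 or escape byte actually occurs.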

I'm not clear what you mean by a "final switch".  fBit must adopt various different values during execution.  If you mean it's the same as a nest of if...else if ... statements, that's all I was really expecting, but I thought switch was a bit more readable.
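
For context, `final switch` is a D statement that forbids a `default` case and, when the switched value has an enum type, makes the compiler verify that every member is handled. With a plain int such as fBit the compiler has no list of legal values to check against, which is presumably why the suggestion was qualified with "meaningfully". A small sketch with a hypothetical enum `Phase`:

```d
enum Phase { start, middle, end }   // hypothetical enum for illustration

string describe(Phase p)
{
    // Removing any one case below is a compile-time error.
    final switch (p)
    {
        case Phase.start:  return "start";
        case Phase.middle: return "middle";
        case Phase.end:    return "end";
    }
}

void main()
{
    assert(describe(Phase.middle) == "middle");
}
```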

Binary literals would be more self-documenting, but would make the code harder to read.  If I'd thought of them I might have used them...but maybe not.
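
To make the trade-off concrete, here are a few of the masks from the code written both ways. D allows underscores inside numeric literals, which helps somewhat with readability. This is only an illustrative sketch.

```d
// The same masks in binary and in hex; checked at compile time.
static assert(0b0111_1111 == 0x7f);
static assert(0b0011_1111 == 0x3f);
static assert(0b1000_0000 == 0x80);
static assert(0b1111_1100 == 0xfc);

void main()
{
    ubyte x = 0b1010_1100;
    // Keep the low six bits, as in "case 2" of the posted code.
    assert((x & 0b0011_1111) == 0b0010_1100);
}
```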

Output range?  Here I'm not sure what you're suggesting, probably because I don't understand D ranges.
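
In short, an output range is anything that accepts elements through `put`, so a function written against it does not care whether the results go into an array, a file, or something else. A minimal sketch, assuming a hypothetical helper `emitGroups` and using `std.array.appender` as the sink:

```d
import std.array : appender;
import std.range.primitives : isOutputRange, put;

/// Write each 7-bit value to sink, substituting 0x80 for zero.
/// Works with any output range that accepts ubyte elements.
void emitGroups(Sink)(ref Sink sink, const(ubyte)[] groups)
    if (isOutputRange!(Sink, ubyte))
{
    foreach (g; groups)
    {
        immutable ubyte v = g & 0x7f;
        if (v == 0) put(sink, cast(ubyte) 0x80);
        else        put(sink, v);
    }
}

void main()
{
    auto sink = appender!(ubyte[])();
    emitGroups(sink, [0x00, 0x41, 0x7f]);
    assert(sink.data == [0x80, 0x41, 0x7f]);
}
```

The caller decides where the bytes end up, so the function itself never has to build or return an array.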

The formatting got a bit messed up during pasting from the editor to the mail message.  I should have looked at it more carefully.  My standard says that unless the entire block is on a single line, the closing brace should align with the opening brace.  And I use tabs for spacing which works quite well in the editor, but I *do* need to remember to convert it to spaces before doing a cut and paste.

Thanks for your comments.  I guess that means that there *isn't* a standard function that does this.
Charles

On 12/06/2014 03:01 PM, bearophile via Digitalmars-d-learn wrote:
> Charles Hixson:
>
>> byte[]    x8to7 (ubyte[] bin)
>
> Better to add some annotations, like pure, @safe, nothrow, if you can, and to annotate the bin with an "in".
>
>
>>     int    fByte, fBit;
>
> It's probably better to define them as size_t.
>
>
>
>>         switch (fBit)
>
> I think D doesn't yet allow this switch to be _meaningfully_ a final switch.
>
>
>>                 b    =    bin[fByte] & 0x7f;
>
> D allows binary number literals as 0b100110010101.
>
>
>>                 b    ~=    (b1 | b2);
>
> Perhaps an output range is better?
>
>
>>         if    (b == 0)    bout    ~= 0x80;
>>         else            bout    ~=    b;
>>         fBit    =    fBit + 7;
>>         if    (fBit > 7)
>>         {    fByte++;
>>             fBit -=    7;
>
> The formatting seems a bit messy.
>
> Bye,
> bearophile
>
