Variable-Length Bit-Level Encoding

Nov 12, 2016

Nordlöw

Nov 12, 2016

Nordlöw

Nov 12, 2016

Nov 13, 2016

Nov 13, 2016

Nov 14, 2016

I'm looking for libraries/snippets (either in D or similar languages) that perform variable-length encoding of unsigned integers onto a bit-stream. Requirement is that smaller inputs (integer values) should be encoded with equal or fewer bits. This 0 => [0] 1 => [1,0] 2 => [1,1,0] is easy but assumes a too extreme input value distribution. Does anybody have a suggestion for an encoder that is more suitable for real-world values that are, for instance, normally distributed?

On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote: > Does anybody have a suggestion for an encoder that is more suitable for real-world values that are, for instance, normally distributed? I don't recall the name, but there is an algorithm for encoding data of an arbitrary number of bytes/bits into a stream of octets. It reserves the MSB of each octet for use as a marker. If it is set to 1, then there are yet more bits to read. If it is set to 0, then this is the last group of bits. This enables each octet to carry 7 bits at a time, while allowing you to encode data of any bit size into an octet stream. You just need to break your data down into groups of 7 bits.

On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote: > 0 => [0] > 1 => [1,0] > 2 => [1,1,0] > > is easy but assumes a too extreme input value distribution. > > Does anybody have a suggestion for an encoder that is more suitable for real-world values that are, for instance, normally distributed? Hmm off hand I'm recalling there's a different encoding that uses 0's to determine how long the encoding is, and then the first 1 breaks it off to the actual encoding. Great for short ints. 1 = 0 010 = 1 011 = 2 00101 = 4 000011111 = 30 https://en.wikipedia.org/wiki/Exponential-Golomb_coding If you know the range of the data, I'd almost be more inclined to do range encoding mixed with arithmetic coding.

November 13, 2016

Re: Variable-Length Bit-Level Encoding

Posted by LiNbO3
in reply to James Buren

Permalink

LiNbO3

Posted in reply to James Buren

Permalink

On Saturday, 12 November 2016 at 19:19:58 UTC, James Buren wrote:
> On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:
>> Does anybody have a suggestion for an encoder that is more suitable for real-world values that are, for instance, normally distributed?
>
> I don't recall the name, but there is an algorithm for encoding data of an arbitrary number of bytes/bits into a stream of octets. It reserves the MSB of each octet for use as a marker. If it is set to 1, then there are yet more bits to read. If it is set to 0, then this is the last group of bits. This enables each octet to carry 7 bits at a time, while allowing you to encode data of any bit size into an octet stream. You just need to break your data down into groups of 7 bits.

Maybe the LEB128 encoding [1] ?

[1] https://en.wikipedia.org/wiki/LEB128

On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote: > I'm looking for libraries/snippets (either in D or similar languages) that perform variable-length encoding of unsigned integers onto a bit-stream. Requirement is that smaller inputs (integer values) should be encoded with equal or fewer bits. > > This > > 0 => [0] > 1 => [1,0] > 2 => [1,1,0] > > is easy but assumes a too extreme input value distribution. > > Does anybody have a suggestion for an encoder that is more suitable for real-world values that are, for instance, normally distributed? If you have a sample of your data, perhaps Huffman codes (https://en.wikipedia.org/wiki/Huffman_coding) might be an option?

Forums