March 14, 2018 Ecoji-d v1.0.0 is released - Base1024 using emojis
I'm glad to announce that ecoji-d - pure D implementation of ecoji encoding version 1️⃣.0️⃣.0️⃣ is finally released❗

What is ecoji?

Ecoji encodes data as base1024 with an emoji character set. It can be used instead of boring and old base64 🤮🤮🤮.

Encoding example:

---
$ echo "Base64 is so 1999, isn't there something better?" | ecoji-d
๐๐ฉ๐ฆ๐๐๐๐ฏ๐๐๐ฝ๐๐๐ฑ๐ฅ๐๐ฑ๐๐ญ๐ฎ๐ต๐ข๐ฅ๐ญ๐ธ๐๐ฒ๐ฆ๐ถ๐ข๐ฅ๐ฎ๐บ๐๐ธ๐ฎ๐ผ๐ฆ๐๐ฅด๐
---

And decoding:

---
$ echo -n "๐๐ฉ๐ฆ๐๐๐๐ฏ๐๐๐ฝ๐๐๐ฑ๐ฅ๐๐ฑ๐๐ญ๐ฎ๐ต๐ข๐ฅ๐ญ๐ธ๐๐ฒ๐ฆ๐ถ๐ข๐ฅ๐ฎ๐บ๐๐ธ๐ฎ๐ผ๐ฆ๐๐ฅด๐" | ecoji-d -d
Base64 is so 1999, isn't there something better?
---

Ecoji-d's features:

✔️ Range interface
✔️ Lazy encoding/decoding
✔️ Low memory usage
✔️ @safe and pure when possible
✔️ Many tests
✔️ Can be used as a library and as a CLI utility

API consists of just 2️⃣ functions:

`encode`, which does encoding
`decode`, which does decoding

Links:

DUB package page: http://code.dlang.org/packages/ecoji-d
GitHub repository: https://github.com/ohdatboi/ecoji-d
GitHub repository of the reference Go implementation: https://github.com/keith-turner/ecoji
March 15, 2018 Re: Ecoji-d v1.0.0 is released - Base1024 using emojis
Posted in reply to Anton Fediushin | On Wednesday, 14 March 2018 at 17:30:18 UTC, Anton Fediushin wrote:
> I'm glad to announce that ecoji-d - pure D implementation of ecoji encoding version 1️⃣.0️⃣.0️⃣ is finally released❗
>
> What is ecoji?
>
> Ecoji encodes data as base1024 with an emoji character set. It can be used instead of boring and old base64 🤮🤮🤮.
>
> Encoding example:
>
> ---
> $ echo "Base64 is so 1999, isn't there something better?" | ecoji-d
> ๐๐ฉ๐ฆ๐๐๐๐ฏ๐๐๐ฝ๐๐๐ฑ๐ฅ๐๐ฑ๐๐ญ๐ฎ๐ต๐ข๐ฅ๐ญ๐ธ๐๐ฒ๐ฆ๐ถ๐ข๐ฅ๐ฎ๐บ๐๐ธ๐ฎ๐ผ๐ฆ๐๐ฅด๐
> ---
>
> And decoding:
>
> ---
> $ echo -n "๐๐ฉ๐ฆ๐๐๐๐ฏ๐๐๐ฝ๐๐๐ฑ๐ฅ๐๐ฑ๐๐ญ๐ฎ๐ต๐ข๐ฅ๐ญ๐ธ๐๐ฒ๐ฆ๐ถ๐ข๐ฅ๐ฎ๐บ๐๐ธ๐ฎ๐ผ๐ฆ๐๐ฅด๐" | ecoji-d -d
> Base64 is so 1999, isn't there something better?
> ---
>
>
> Ecoji-d's features:
>
> ✔️ Range interface
> ✔️ Lazy encoding/decoding
> ✔️ Low memory usage
> ✔️ @safe and pure when possible
> ✔️ Many tests
> ✔️ Can be used as a library and as a CLI utility
>
>
> API consists of just 2️⃣ functions:
>
> `encode`, which does encoding
> `decode`, which does decoding
>
>
> Links:
>
> DUB package page: http://code.dlang.org/packages/ecoji-d
> GitHub repository: https://github.com/ohdatboi/ecoji-d
> GitHub repository of the reference Go implementation: https://github.com/keith-turner/ecoji
Fun, but seems pretty useless in practice.
March 15, 2018 Re: Ecoji-d v1.0.0 is released - Base1024 using emojis
Posted in reply to bauss | On Thursday, 15 March 2018 at 09:32:50 UTC, bauss wrote:
> Fun, but seems pretty useless in practice.
I disagree. Ecoji (base1024) has a bigger character set, meaning that it can encode more information per emoji than base64 can encode per character.
For example, ecoji-encoded "abcde" looks like this: "๐๐ธ๐ฆ๐ญ"
And the base64-encoded one looks like this: "YWJjZGU=".
Even though each emoji is 4 bytes long, there is a noticeable difference in size when we are talking about larger chunks of data:
---
$ dd if=/dev/urandom bs=4K count=16K of=test.raw
16384+0 records in
16384+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 1.90423 s, 35.2 MB/s
$ dd if=test.raw | ./ecoji-d | wc -c
67108864 bytes (67 MB, 64 MiB) copied, 6.7699 s, 9.9 MB/s
71591534 # Size increased just by 6%
$ dd if=test.raw | base64 | wc -c
67108864 bytes (67 MB, 64 MiB) copied, 0.750174 s, 89.5 MB/s
90655837 # 35%(!) increase in size
---
And if we move to real-world scenarios, where web pages are gzipped most of the time:
---
$ dd if=test.raw | gzip -c | wc -c
67108864 bytes (67 MB, 64 MiB) copied, 5.49022 s, 12.2 MB/s
67119122 # Raw files are terrible for compression
$ dd if=test.raw | ./ecoji-d | gzip -c | wc -c
67108864 bytes (67 MB, 64 MiB) copied, 27.9972 s, 2.4 MB/s
32178275 # 48% improvement
$ dd if=test.raw | base64 | gzip -c | wc -c
67108864 bytes (67 MB, 64 MiB) copied, 10.3381 s, 6.5 MB/s
68892893 # Pretty bad, yeah
---
So yeah, ecoji is better than base64 in everything but speed. Speed will be improved. Later.
March 16, 2018 Re: Ecoji-d v1.0.0 is released - Base1024 using emojis
Posted in reply to Anton Fediushin | On Thursday, 15 March 2018 at 18:45:51 UTC, Anton Fediushin wrote:
> On Thursday, 15 March 2018 at 09:32:50 UTC, bauss wrote:
>> Fun, but seems pretty useless in practice.
>
> I disagree. Ecoji (base1024) has a bigger character set, meaning that it can encode more information per emoji than base64 can encode per character.
>
> For example ecoji encoded "abcde" looks like this: "๐๐ธ๐ฆ๐ญ"
> And base64 encoded one looks like this: "YWJjZGU=".
>
> Even though each emoji is 4 bytes long, there is a noticeable difference in size when we are talking about larger chunks of data:
>
> ---
> $ dd if=/dev/urandom bs=4K count=16K of=test.raw
> 16384+0 records in
> 16384+0 records out
> 67108864 bytes (67 MB, 64 MiB) copied, 1.90423 s, 35.2 MB/s
> $ dd if=test.raw | ./ecoji-d | wc -c
> 67108864 bytes (67 MB, 64 MiB) copied, 6.7699 s, 9.9 MB/s
> 71591534 # Size increased just by 6%
> $ dd if=test.raw | base64 | wc -c
> 67108864 bytes (67 MB, 64 MiB) copied, 0.750174 s, 89.5 MB/s
> 90655837 # 35%(!) increase in size
> ---
>
> And if we move to real-world scenarios, where web pages are gzipped most of the time:
>
> ---
> $ dd if=test.raw | gzip -c | wc -c
> 67108864 bytes (67 MB, 64 MiB) copied, 5.49022 s, 12.2 MB/s
> 67119122 # Raw files are terrible for compression
> $ dd if=test.raw | ./ecoji-d | gzip -c | wc -c
> 67108864 bytes (67 MB, 64 MiB) copied, 27.9972 s, 2.4 MB/s
> 32178275 # 48% improvement
> $ dd if=test.raw | base64 | gzip -c | wc -c
> 67108864 bytes (67 MB, 64 MiB) copied, 10.3381 s, 6.5 MB/s
> 68892893 # Pretty bad, yeah
> ---
>
> So yeah, ecoji is better than base64 in everything but speed. Speed will be improved. Later.
If you care about the size of the data, then you're not going to encode it anyway.
Same goes for speed.
Besides, your encoding isn't going to work with actual web pages anyway, because your encoder doesn't have browser support.
Sure, you can encode your data and gzip it, but once it reaches the browser and the browser unzips it, then what? The browser doesn't know what to do with the data. You can't even use base64 for HTTP headers.
At most it could be used for email clients, since they do support "Content-Transfer-Encoding", but browsers don't. They only support "Content-Encoding", which at most can be a compression such as gzip.
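
To illustrate the email case: a MIME message records how its body was encoded in the "Content-Transfer-Encoding" header, so the receiving client knows how to undo it. The following is only a minimal sketch using Python's standard `email` package; the payload bytes are made up for illustration, and nothing here involves ecoji itself.

```python
from email.message import EmailMessage

# Hypothetical binary payload; any bytes would do.
payload = bytes(range(256))

msg = EmailMessage()
# For bytes content the standard library uses base64 as the transfer
# encoding by default and records that choice in the message headers.
msg.set_content(payload, maintype="application", subtype="octet-stream")

# This is the header a mail client reads to pick the right decoder.
print(msg["Content-Transfer-Encoding"])

# Decoding on the receiving side restores the original bytes.
assert msg.get_content() == payload
```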
March 16, 2018 Re: Ecoji-d v1.0.0 is released - Base1024 using emojis
Posted in reply to Anton Fediushin |
On 15/03/2018 19:45, Anton Fediushin wrote:
> $ dd if=test.raw | ./ecoji-d | gzip -c | wc -c
> 67108864 bytes (67 MB, 64 MiB) copied, 27.9972 s, 2.4 MB/s
> 32178275 # 48% improvement
If you can compress random data to 52% of the original data, you should repeat this step until there is a single byte left.
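
The point can be checked without ecoji at all. Below is a rough sketch, assuming Python's `zlib` (DEFLATE, the same algorithm gzip uses) as a stand-in for `gzip -c` and 1 MiB of `os.urandom` output as a stand-in for test.raw. Random bytes should not shrink, and base64 text of random bytes should shrink back to roughly the original size but not below it.

```python
import base64
import os
import zlib

raw = os.urandom(1 << 20)            # 1 MiB of random bytes
encoded = base64.encodebytes(raw)    # base64 with line breaks, like the CLI tool

packed_raw = zlib.compress(raw, 9)       # deflate the random bytes directly
packed_b64 = zlib.compress(encoded, 9)   # deflate their base64 text

print(f"raw:          {len(raw)}")
print(f"deflated raw: {len(packed_raw)}")
print(f"base64:       {len(encoded)}")
print(f"deflated b64: {len(packed_b64)}")
```

If an encoder followed by a compressor ever lands well below the raw size for random input, data has been lost somewhere along the way.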
March 18, 2018 Re: Ecoji-d v1.0.0 is released - Base1024 using emojis
Posted in reply to Anton Fediushin | On 2018-03-14 18:30, Anton Fediushin wrote:
> I'm glad to announce that ecoji-d - pure D implementation of ecoji encoding version 1️⃣.0️⃣.0️⃣ is finally released❗
>
> What is ecoji?
>
> Ecoji encodes data as base1024 with an emoji character set. It can be used instead of boring and old base64 🤮🤮🤮.
>
> Encoding example:
>
> ---
> $ echo "Base64 is so 1999, isn't there something better?" | ecoji-d
> ๐๐ฉ๐ฆ๐๐๐๐ฏ๐๐๐ฝ๐๐๐ฑ๐ฅ๐๐ฑ๐๐ญ๐ฎ๐ต๐ข๐ฅ๐ญ๐ธ๐๐ฒ๐ฆ๐ถ๐ข๐ฅ๐ฎ๐บ๐๐ธ๐ฎ๐ผ๐ฆ๐๐ฅด๐
>
Useful feature: Easy manual verification.
March 17, 2018 Re: Ecoji-d v1.0.0 is released - Base1024 using emojis
Posted in reply to Anton Fediushin | On 15 March 2018 at 11:45, Anton Fediushin via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com> wrote:
> Even though each emoji is 4 bytes long, there is a noticeable difference in size when we are talking about larger chunks of data:
This doesn't make sense. For every 10 bits, you're emitting 32 bits... you're more than tripling the size of the data.
Base64 takes 6 bits and emits 8 bits, which is a third larger.
1.333x is smaller than 3.2x. O_o
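
That arithmetic can be worked through for the 67108864-byte test file from the earlier post. This is a sketch under stated assumptions: `base64_size` and `ecoji_size_estimate` are names made up here, base64 output is assumed to wrap at 76 characters per line (the GNU `base64` default), and every emoji is assumed to take 4 bytes in UTF-8, as stated in the thread. The ecoji figure ignores line breaks, so it is only a rough lower bound.

```python
def base64_size(n_bytes, wrap=76):
    """Bytes written by base64 for n_bytes of input, one newline per output line."""
    chars = 4 * -(-n_bytes // 3)      # 4 output chars per 3 input bytes, rounded up
    newlines = -(-chars // wrap)      # every line, including a partial last one, ends in "\n"
    return chars + newlines


def ecoji_size_estimate(n_bytes, bytes_per_emoji=4):
    """Rough lower bound: 4 emoji per 5 input bytes, line breaks ignored."""
    groups = -(-n_bytes // 5)         # 5 input bytes = 40 bits = 4 symbols of 10 bits
    return groups * 4 * bytes_per_emoji


n = 67108864  # size of test.raw in the thread
print(base64_size(n))           # 90655837, the same number `wc -c` reported for base64
print(ecoji_size_estimate(n))   # 214748368
```

The base64 figure reproduces the reported output size exactly, while the reported ecoji size (71591534) is about a third of the estimate, which suggests that not all of the input was encoded.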
March 18, 2018 Re: Ecoji-d v1.0.0 is released - Base1024 using emojis
Posted in reply to Anton Fediushin | On Thursday, 15 March 2018 at 18:45:51 UTC, Anton Fediushin wrote:
> $ dd if=test.raw | gzip -c | wc -c
> 67108864 bytes (67 MB, 64 MiB) copied, 5.49022 s, 12.2 MB/s
> 67119122 # Raw files are terrible for compression
> $ dd if=test.raw | ./ecoji-d | gzip -c | wc -c
> 67108864 bytes (67 MB, 64 MiB) copied, 27.9972 s, 2.4 MB/s
> 32178275 # 48% improvement
> $ dd if=test.raw | base64 | gzip -c | wc -c
> 67108864 bytes (67 MB, 64 MiB) copied, 10.3381 s, 6.5 MB/s
> 68892893 # Pretty bad, yeah
Randomness isn't compressible. The fact that ecoji-d compresses anything above 1% shows only that there is a bug in your library:
```
$ dd if=/dev/urandom bs=4K count=16K of=test.raw
16384+0 records in
16384+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 0.373423 s, 180 MB/s
$ dd if=test.raw | ./ecoji-d | gzip -c | gzip -cd | ./ecoji-d -d > test2.raw
131072+0 records in
131072+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 24.9523 s, 2.7 MB/s
$ wc -c test.raw test2.raw
67108864 test.raw
11185155 test2.raw
```
So definitely not the same files before and after compression/decompression. However the beginning is the same:
```
$ xxd test.raw | head
00000010: a05f c801 bf01 13c1 04a2 556a 6d79 a09c ._........Ujmy..
00000020: 8032 523e 851d 419a b0d3 0c4f e7ba 93e1 .2R>..A....O....
00000030: 9fdc 7c55 2645 f6e7 3f9e f5db bc92 1e29 ..|U&E..?......)
00000040: 457a a3b9 c274 3b08 6bde 486a 1798 f281 Ez...t;.k.Hj....
00000050: 9d91 e97a f13f db8b 5d0c 114a 27be 2154 ...z.?..]..J'.!T
00000060: a9a2 3a17 36e4 9181 64f2 35b6 aa91 064d ..:.6...d.5....M
00000070: 863a ddbd 8776 f87d 3eb2 634f 12dc 6e7f .:...v.}>.cO..n.
00000080: 46c9 bc95 2620 b315 e84d 9ee4 8651 d172 F...& ...M...Q.r
00000090: 836d 7bf8 9e1c 09c3 0e10 b787 7e06 bc39 .m{.........~..9
$ xxd test2.raw | head
00000010: a05f c801 bf01 13c1 04a2 556a 6d79 a09c ._........Ujmy..
00000020: 8032 523e 851d 419a b0d3 0c4f e7ba 93e1 .2R>..A....O....
00000030: 9fdc 7c55 2645 f6e7 3f9e f5db bc92 1e29 ..|U&E..?......)
00000040: 457a a3b9 c274 3b08 6bde 486a 1798 f281 Ez...t;.k.Hj....
00000050: 9d91 e97a f13f db8b 5d0c 114a 27be 2154 ...z.?..]..J'.!T
00000060: a9a2 3a17 36e4 9181 64f2 35b6 aa91 064d ..:.6...d.5....M
00000070: 863a ddbd 8776 f87d 3eb2 634f 12dc 6e7f .:...v.}>.cO..n.
00000080: 46c9 bc95 2620 b315 e84d 9ee4 8651 d172 F...& ...M...Q.r
00000090: 836d 7bf8 9e1c 09c3 0e10 b787 7e06 bc39 .m{.........~..9
```
So I think ecoji-d just truncates its input at some point.
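
A check like this can be automated. The sketch below is only an illustration, not part of ecoji-d: `common_prefix_len` and `is_truncated_copy` are hypothetical helper names. It reports how many leading bytes two files share and whether the second file is a strict prefix of the first.

```python
import os


def common_prefix_len(path_a, path_b, chunk_size=1 << 16):
    """Return how many leading bytes the two files have in common."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            n = min(len(a), len(b))
            if a[:n] != b[:n]:
                # Locate the first differing byte inside this chunk.
                return offset + next(i for i in range(n) if a[i] != b[i])
            offset += n
            # Stop when either file is exhausted.
            if n == 0 or len(a) != len(b):
                return offset


def is_truncated_copy(original, candidate):
    """True if candidate is shorter than original and matches its beginning."""
    size_c = os.path.getsize(candidate)
    size_o = os.path.getsize(original)
    return size_c < size_o and common_prefix_len(original, candidate) == size_c
```

For the files in this post, `is_truncated_copy("test.raw", "test2.raw")` would confirm or rule out plain truncation, and `common_prefix_len` gives the offset where the two files stop matching.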
March 18, 2018 Re: Ecoji-d v1.0.0 is released - Base1024 using emojis
Posted in reply to bauss | On Friday, 16 March 2018 at 08:25:30 UTC, bauss wrote:
> Besides your encoding isn't going to work with actual web-pages anyway, because your encoder doesn't have browser support.
Well, the encoding is not *mine*, only the D implementation is. What do you mean by "browser support"? Indeed, ecoji-d cannot be used on the client side, but since the algorithm is simple and the code is publicly available, anyone can implement decoding in JavaScript or any other language.
> Sure you can encode your data and gzip it, but once it reaches the browser and it unzips it, then what? The browser doesn't know what to do with the data. You can't even use base64 for http headers.
Then you use a client-side decoder, of course!
March 18, 2018 Re: Ecoji-d v1.0.0 is released - Base1024 using emojis
Posted in reply to Cym13 | On Sunday, 18 March 2018 at 11:25:45 UTC, Cym13 wrote:
> So I think ecoji-d just truncates its input at some point.
Indeed, there's an error somewhere. For some reason it stops after 7457792 bytes. I'll create an issue for that and look into it later.