December 28, 2021

On Monday, 27 December 2021 at 21:38:03 UTC, Era Scarecrow wrote:

>

Well, to add functionality with, say, ANSI, you enter an escape code and then stuff like offset, color, effect, etc. UTF-8 automatically has escape codes, being anything 128 or over, so as long as the terminal understands it, it should be what's handling it.

https://www.robvanderwoude.com/ansi.php

In the end it's all just a binary string of 1's and 0's.

Thanks for that post!! I already knew about some of these "escape codes", but a full list of them will come in handy ;)
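For anyone following along, here is a minimal sketch of the "escape code, then parameters" idea in D. The helper name `colorize` is made up for illustration, and it assumes the terminal understands the standard SGR color codes (31 is red, 0 resets):

```d
import std.conv : to;
import std.stdio : writeln;

/// Hypothetical helper: wraps `text` in an ANSI SGR sequence.
/// ESC (0x1B) followed by '[' starts the sequence, the number selects
/// the effect, and 'm' ends it. "\x1b[0m" resets all attributes.
string colorize(string text, int sgrCode)
{
    return "\x1b[" ~ sgrCode.to!string ~ "m" ~ text ~ "\x1b[0m";
}

void main()
{
    // Shows up in red on terminals that honor SGR code 31.
    writeln(colorize("error", 31));
}
```

The escape byte and everything after it are plain ASCII, so this part works the same regardless of which Unicode encoding the rest of the text uses.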

December 28, 2021

On Monday, 27 December 2021 at 07:12:24 UTC, rempas wrote:

>

I don't understand that. Based on your calculations, the results should have been different. Also how are the numbers fixed? Like you said the amount of bytes of each encoding is not always standard for every character. Even if they were fixed this means 2-bytes for each UTF-16 character and 4-bytes for each UTF-32 character so still the numbers don't make sense to me. So still the number of the "length" property should have been the same for every encoding or at least for UTF-16 and UTF-32. So are the sizes of every character fixed or not?

Your string is represented by 8 code points. The number of code units needed to represent them in memory depends on the encoding. D supports working with 3 different encodings (the Unicode standard defines more than these 3):

string  utf8s  = "Hello 😂\n";
wstring utf16s = "Hello 😂\n"w;
dstring utf32s = "Hello 😂\n"d;

Here is the canonical Unicode representation of your string:

   H      e      l      l      o             😂     \n
U+0048 U+0065 U+006C U+006C U+006F U+0020 U+1F602 U+000A

Let's see how these 3 variables are represented in memory:

utf8s : 48 65 6C 6C 6F 20 F0 9F 98 82 0A

11 char in memory using 11 bytes

utf16s: 0048 0065 006C 006C 006F 0020 D83D DE02 000A

9 wchar in memory using 18 bytes

utf32s: 00000048 00000065 0000006C 0000006C 0000006F 00000020 0001F602 0000000A

8 dchar in memory using 32 bytes
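These counts can be checked directly in D. A minimal sketch, assuming the source file is saved as UTF-8 and using `walkLength` from Phobos to count code points instead of code units:

```d
import std.range : walkLength;
import std.stdio : writefln;

void main()
{
    string  utf8s  = "Hello 😂\n";
    wstring utf16s = "Hello 😂\n"w;
    dstring utf32s = "Hello 😂\n"d;

    // .length counts code units (char, wchar, dchar), not code points.
    assert(utf8s.length  == 11);
    assert(utf16s.length == 9);
    assert(utf32s.length == 8);

    // Bytes in memory = code units * size of one code unit.
    assert(utf8s.length  * char.sizeof  == 11);
    assert(utf16s.length * wchar.sizeof == 18);
    assert(utf32s.length * dchar.sizeof == 32);

    // walkLength decodes the string, so all three report 8 code points.
    assert(utf8s.walkLength  == 8);
    assert(utf16s.walkLength == 8);
    assert(utf32s.walkLength == 8);

    // U+1F602 does not fit in one wchar, so UTF-16 stores a surrogate pair.
    assert(utf16s[6] == 0xD83D && utf16s[7] == 0xDE02);

    writefln("code units: %s / %s / %s",
             utf8s.length, utf16s.length, utf32s.length);
}
```

So the `length` property differs between encodings because it reports code units, while the number of code points stays the same.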

As you can see, the most compact form is generally UTF-8, which is why it is the preferred encoding for Unicode.

UTF-16 is supported for legacy reasons: it is used in the Windows API and also internally in Java.

UTF-32 has one advantage: it has a 1-to-1 mapping between code point and array index. In practice that is not much of an advantage, because code points and characters are distinct concepts (a single user-perceived character can be made of several code points). UTF-32 uses a lot of memory for practically no benefit (when you read in the forum about the big auto-decode error of D, it is linked to this).
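To illustrate why indexing by code point is not the same as indexing by character, here is a small sketch using `byGrapheme` from `std.uni`. The example string (an "e" followed by a combining accent) is my own choice for illustration:

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: displayed as one "é".
    dstring s = "e\u0301"d;

    // Even in UTF-32 this takes two array slots (two code points)...
    assert(s.length == 2);

    // ...but it is a single grapheme, i.e. one user-perceived character.
    assert(s.byGrapheme.walkLength == 1);
}
```

So even with `dstring`, `s[0]` is not guaranteed to be a whole character, only a whole code point.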

December 28, 2021
On Tuesday, 28 December 2021 at 07:03:25 UTC, rempas wrote:
> I already knew about some of these "escape codes", but a full list of them will come in handy ;)

https://invisible-island.net/xterm/ctlseqs/ctlseqs.html

and that's not quite full either..... it really is a mess from hell
December 28, 2021
On Tuesday, 28 December 2021 at 06:46:57 UTC, ag0aep6g wrote:
> It's actually just the first byte that tells you how many are in the sequence. The continuation bytes don't have redundancies for that.

Right, but they do have that high bit set and next bit clear so you can tell you're in the middle and thus either go backward to the count byte to recover this character or go forward to the next count byte and drop this char while recovering the stream. My brain mixed this up with the rest of it and wrote it poorly lol.
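A rough sketch of that recovery logic in D, in case it helps. The function names are invented for this example, and it assumes the index passed in is within the bounds of the array:

```d
/// Continuation bytes have the high bit set and the next bit clear: 10xxxxxx.
bool isContinuation(char c) pure nothrow @nogc @safe
{
    return (c & 0xC0) == 0x80;
}

/// Go backward from `i` to the lead ("count") byte of the current sequence.
size_t seekBackToLead(const(char)[] s, size_t i) pure nothrow @nogc @safe
{
    while (i > 0 && isContinuation(s[i]))
        --i;
    return i;
}

/// Go forward from `i` to the next lead byte, dropping the partial character.
size_t skipForwardToLead(const(char)[] s, size_t i) pure nothrow @nogc @safe
{
    while (i < s.length && isContinuation(s[i]))
        ++i;
    return i;
}

/// Only the lead byte says how long the sequence is; returns 0 if `c`
/// is a continuation byte or otherwise not a valid lead byte.
size_t sequenceLength(char c) pure nothrow @nogc @safe
{
    if (c < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((c & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((c & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((c & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

void main()
{
    string s = "Hello \U0001F602\n";   // bytes 6..9 are F0 9F 98 82
    assert(isContinuation(s[8]));      // landed mid-character
    assert(seekBackToLead(s, 8) == 6);     // recover this character
    assert(skipForwardToLead(s, 8) == 10); // or drop it and move on
    assert(sequenceLength(s[6]) == 4);
}
```

This only resynchronizes on byte patterns; it does not validate that the sequence is well-formed UTF-8.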
December 28, 2021
On Tuesday, 28 December 2021 at 06:51:52 UTC, rempas wrote:
> That's pretty nice. In this case is even better because at least for now, I will not work on Windows by myself because making the library work on Linux is a bit of a challenge itself.

What is your library? You might be able to just use my terminal.d too....

>> The Windows API is an absolute pleasure to work with next to much of the trash you're forced to deal with on Linux.
>
> Whaaaat??? Don't crash my dreams sempai!!! I mean, this may sound stupid but which kind of API you are referring to? Do you mean system library stuff (like "unistd.h" for linux and "windows.h" for Windows) or low level system calls?

Virtually all of it; Windows is just way easier to develop for. You'll see if you get deeper in this terminal stuff.... reading mouse data from Windows is a simple read of input event structs. Doing it from a Linux system is....... not simple.
December 28, 2021
On Tuesday, 28 December 2021 at 12:56:11 UTC, Adam D Ruppe wrote:
>
> https://invisible-island.net/xterm/ctlseqs/ctlseqs.html
>
> and that's not quite full either..... it really is a mess from hell

Still less complicated and organized than my life...
December 28, 2021
On Tuesday, 28 December 2021 at 13:04:26 UTC, Adam D Ruppe wrote:
> What is your library? You might be able to just use my terminal.d too....

My library will be "libd". It will be like "libc" but better and cooler! And it will be native to D! And of course it will not depend on "libc" and it will not require any special runtime support as it will be "betterC". I don't plan to replace the other "default libs" like "libm", "librt", "libpthread" etc. tho. At least not for now...

> Virtually all of it; Windows is just way easier to develop for. You'll see if you get deeper in this terminal stuff.... reading mouse data from Windows is a simple read of input event structs. Doing it from a Linux system is....... not simple.

Well sucks to be me....
December 28, 2021
On Tuesday, 28 December 2021 at 14:53:57 UTC, rempas wrote:
> On Tuesday, 28 December 2021 at 12:56:11 UTC, Adam D Ruppe wrote:
>>
>> https://invisible-island.net/xterm/ctlseqs/ctlseqs.html
>>
>> and that's not quite full either..... it really is a mess from hell
>
> Still less complicated and organized than my life...

"Less complicated and more organized" is what I wanted to say. Damn I can't even make a joke right...