Why is char initialized to 0xFF ? - D Programming Language Discussion Forum

Thread overview

Why is char initialized to 0xFF ?
Jun 08, 2019 James Blachly
Jun 08, 2019 Adam D. Ruppe
Jun 09, 2019 Andrej Mitrovic
Jun 09, 2019 KnightMare
Jun 09, 2019 KnightMare
Jun 09, 2019 Mike Parker
Jun 09, 2019 KnightMare
Jun 09, 2019 Patrick Schluter
Jun 09, 2019 KnightMare
Jun 09, 2019 Jonathan M Davis
Jun 10, 2019 KnightMare
Jun 09, 2019 Ola Fosheim Grøstad
Jun 09, 2019 James Blachly
Jun 09, 2019 Patrick Schluter
Jun 09, 2019 lithium iodate

June 08, 2019

Why is char initialized to 0xFF ?

Posted by James Blachly

Disclaimer: I am not a unicode expert.

Background: I have added UTF8 character type support to lldb in conjunction with adding support for D string/wstring/dstring.

Dlang char is analogous to C++20 char8_t[1] AFAICT.

The default initialization value in C++20 is u8'\0', whereas in D char.init is '\xFF'[2]. Likewise, wchar.init is 0xFFFF and dchar.init is 0x0000FFFF.
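A minimal D sketch, assuming any reasonably recent DMD or LDC, that checks the .init values listed above at compile time and shows that a local declared without an initializer gets the same value:

```d
// Checks the default (.init) values of D's three character types.
void main()
{
    // char is a UTF-8 code unit; its default is 0xFF.
    static assert(char.init == 0xFF);
    // wchar is a UTF-16 code unit; its default is 0xFFFF.
    static assert(wchar.init == 0xFFFF);
    // dchar is a UTF-32 code unit; its default is 0x0000FFFF.
    static assert(dchar.init == 0x0000FFFF);

    // A local variable declared without an initializer gets the same value.
    char c;
    assert(c == '\xFF');
}
```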

char is a UTF8 character, but 0xFF is specifically forbidden[3] by the UTF8 specification.

What is the reasoning behind this? Is it related to zero-termination of C strings? Should it be considered for change?

It is surprising that these do not init to the null value, which is valid UTF.
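To make the contrast concrete, here is a small sketch using Phobos' std.utf.validate, which throws a UTFException on malformed UTF-8. It assumes only that a default-initialized char buffer is all 0xFF bytes, as stated above; the buffer names are illustrative:

```d
import std.exception : assertNotThrown, assertThrown;
import std.utf : UTFException, validate;

void main()
{
    // A default-initialized static array: every element is char.init (0xFF).
    char[4] defaulted;
    // 0xFF can never appear in well-formed UTF-8, so validation fails.
    assertThrown!UTFException(validate(defaulted[]));

    // The same buffer filled with '\0' is valid UTF-8 (U+0000 is a legal code point).
    char[4] zeroed = '\0';
    assertNotThrown!UTFException(validate(zeroed[]));
}
```

In other words, 0xFF is detectable as "never written" precisely because it is invalid, while '\0' is indistinguishable from real data.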

Kind regards
James


[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0482r6.html
[2] https://dlang.org/spec/type.html
[3] https://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences

June 08, 2019

Re: Why is char initialized to 0xFF ?

Posted by Adam D. Ruppe
in reply to James Blachly

On Saturday, 8 June 2019 at 17:55:07 UTC, James Blachly wrote:
> char is a UTF8 character, but 0xFF is specifically forbidden[3] by the UTF8 specification.

And that is exactly why it is the default: the idea here is to make uninitialized variables obvious, because they will be a predictable, but invalid value when they appear.

Same reason why floats are NaN and class references are null, btw. `int` is the exception: it is default-initialized to something that happens to be really useful.
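A short sketch of the defaults described above; the class name Widget is a hypothetical placeholder:

```d
import std.math : isNaN;

// Hypothetical class used only to show the default value of a class reference.
class Widget {}

void main()
{
    // Floating-point types default to NaN.
    double d;
    assert(isNaN(d));
    assert(isNaN(float.init));

    // Class references default to null.
    Widget w;
    assert(w is null);

    // Integral types have no invalid bit pattern, so they default to 0 (bool to false).
    int i;
    assert(i == 0);
    assert(bool.init == false);
}
```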

(And arrays are kinda special too. Technically they are null, but the runtime automatically allocates storage for a null array when needed, so it works transparently anyway... and ends up being super useful.)
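A sketch of the "null but transparently usable" behaviour of dynamic arrays mentioned above:

```d
void main()
{
    // A default-initialized dynamic array is null: null pointer, zero length.
    int[] a;
    assert(a is null);
    assert(a.length == 0);

    // Appending to a null array makes the runtime allocate storage on demand.
    a ~= 42;
    assert(a !is null);
    assert(a.length == 1 && a[0] == 42);

    // Growing .length on a null array also allocates; new elements are int.init (0).
    int[] b;
    b.length = 3;
    assert(b == [0, 0, 0]);
}
```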

June 09, 2019

Re: Why is char initialized to 0xFF ?

Posted by Andrej Mitrovic
in reply to Adam D. Ruppe

On Saturday, 8 June 2019 at 18:04:46 UTC, Adam D. Ruppe wrote:
> On Saturday, 8 June 2019 at 17:55:07 UTC, James Blachly wrote:
>> char is a UTF8 character, but 0xFF is specifically forbidden[3] by the UTF8 specification.
>
> And that is exactly why it is the default: the idea here is to make uninitialized variables obvious, because they will be a predictable, but invalid value when they appear.

To me they are not really obvious or useful, especially when I interface with C/C++. If I pass some default-initialized char or float to a C/C++ library (by mistake), I get weird output written to some distant data field. The end result is either broken data somewhere down the line, or garbled output in the UI.

I much prefer default values which are correct for 99% of the intended use-cases. I make full use of the fact that integers default-initialize to zero; I think it's a great "feature". If there were a NaN for integers, I'd probably hate it..

I would prefer it if the compiler (or a tool!) had a switch --check-use-before-initialize or something of the sort, with code-flow analysis and all that good stuff.

June 09, 2019

Re: Why is char initialized to 0xFF ?

Posted by KnightMare
in reply to Adam D. Ruppe

On Saturday, 8 June 2019 at 18:04:46 UTC, Adam D. Ruppe wrote:
> On Saturday, 8 June 2019 at 17:55:07 UTC, James Blachly wrote:
>> char is a UTF8 character, but 0xFF is specifically forbidden[3] by the UTF8 specification.
>
> And that is exactly why it is the default: the idea here is to make uninitialized variables obvious, because they will be a predictable, but invalid value when they appear.

double d;
Most compilers fire an error: "using uninitialized variable".
On the other side: "I (the D compiler) will tell you nothing about that, but you'll get shit! haha"

OK, let's look at structs now.
struct S { double d; }
S s;
With most compilers s will contain zeros; in C/C++, garbage.
People come to D not as their first language; they already have trouble with garbage in structs, and they still forget to initialize them properly (I do), so the rule "all initialization is zeros" is the best and right thing there can be.
If you don't want initialization, use "= void"; that is good too.
But initializing ints as 0, pointers as null, chars as #FF and doubles as NaN: that was invented while on mushrooms.
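For reference, a sketch of what D actually does with the struct from this post, plus the "= void" opt-out; the variable x is illustrative:

```d
import std.math : isNaN;

// The struct from the post: no explicit initializer on the field.
struct S { double d; }

void main()
{
    // D does not leave s.d as garbage; it gets double.init, i.e. NaN.
    S s;
    assert(isNaN(s.d));

    // "= void" skips default initialization: the value is unspecified
    // until written, so it must be assigned before it is read.
    double x = void;
    x = 1.5;
    assert(x == 1.5);
}
```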

People come to D and see char=#ff, double=NaN:
https://www.youtube.com/watch?v=Qsa41csyNU8

June 09, 2019

Re: Why is char initialized to 0xFF ?

Posted by KnightMare
in reply to KnightMare

On Sunday, 9 June 2019 at 07:48:46 UTC, KnightMare wrote:
> double d;
> Most compilers fire an error: "using uninitialized variable".
Not exactly on this line, but when we first try to read from it, as in "d += ...".

June 09, 2019

Re: Why is char initialized to 0xFF ?

Posted by Mike Parker
in reply to KnightMare

On Sunday, 9 June 2019 at 07:48:46 UTC, KnightMare wrote:

>
> OK, let's look at structs now.
> struct S { double d; }
> S s;

You can set the default initializer in this case:

struct S { double d = 0.0; }
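A slightly fuller sketch of field initializers; Point is a hypothetical struct mixing an explicit default with an implicit one:

```d
import std.math : isNaN;

// Field with an explicit default initializer, as suggested above.
struct S { double d = 0.0; }

// Hypothetical struct mixing explicit and implicit field defaults.
struct Point
{
    double x = 0.0; // explicit default
    double y;       // implicit default: double.init (NaN)
}

void main()
{
    S s;
    assert(s.d == 0.0);
    // S.init reflects the field initializers.
    assert(S.init.d == 0.0);

    Point p;
    assert(p.x == 0.0);
    assert(isNaN(p.y));

    // Newly allocated arrays of S also pick up the field defaults.
    auto arr = new S[](3);
    foreach (e; arr)
        assert(e.d == 0.0);
}
```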

> But initializing ints as 0, pointers as null, chars as #FF and doubles as NaN: that was invented while on mushrooms.
>

Not at all. It's quite practical for debugging. Uninitialized variables are a pain in C and C++. Default initializing to invalid values makes them stand out in the debugger. The drawback is that the integrals (and bool) have no invalid value, so we're stuck with 0 (and false).

June 09, 2019

Re: Why is char initialized to 0xFF ?

Posted by Patrick Schluter
in reply to KnightMare

On Sunday, 9 June 2019 at 07:48:46 UTC, KnightMare wrote:
> On Saturday, 8 June 2019 at 18:04:46 UTC, Adam D. Ruppe wrote:
>> On Saturday, 8 June 2019 at 17:55:07 UTC, James Blachly wrote:
>>> char is a UTF8 character, but 0xFF is specifically forbidden[3] by the UTF8 specification.
>>
>> And that is exactly why it is the default: the idea here is to make uninitialized variables obvious, because they will be a predictable, but invalid value when they appear.
>
> double d;
> Most compilers fire an error: "using uninitialized variable".

Which is technically not possible in D, because D always initializes variables. In C and C++, if you declared
double d=0.0; you wouldn't get the "using uninitialized variable" warning either, regardless of whether 0 is the right or the wrong initial value.

> On the other side: "I (the D compiler) will tell you nothing about that, but you'll get shit! haha"
>
> OK, let's look at structs now.
> struct S { double d; }
> S s;
> With most compilers s will contain zeros; in C/C++, garbage.
> People come to D not as their first language; they already have trouble with garbage in structs, and they still forget to initialize them properly (I do), so the rule "all initialization is zeros" is the best and right thing there can be.

No, by putting NaN in d you have a deterministic error. In C and C++ you will have undefined behaviour that will vary with compiler, version, options, OS version, architecture, position of the moon, etc., and sometimes undetectable bugs.
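A sketch of that "deterministic error" in practice: the helper below is hypothetical and deliberately forgets to initialize its accumulator. Because the language specifies double.init as NaN, the result is NaN on every run instead of whatever happened to be on the stack:

```d
import std.math : isNaN;

// Hypothetical helper with a deliberate bug: the accumulator is never set to 0.0.
double sumForgettingInit(const double[] values)
{
    double total; // bug: should be "double total = 0.0;"
    foreach (v; values)
        total += v; // NaN + anything is NaN
    return total;
}

void main()
{
    // The forgotten initializer shows up as NaN, reproducibly.
    assert(isNaN(sumForgettingInit([1.0, 2.0, 3.0])));

    // NaN also compares unequal to everything, including itself.
    double d;
    assert(d != d);
}
```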

> If you don't want initialization, use "= void"; that is good too.
> But initializing ints as 0, pointers as null, chars as #FF and doubles as NaN: that was invented while on mushrooms.

No. If there were an equivalent of NaN for ints, it would also be used. (Personally, I would really prefer int.init == int.min and uint.init == uint.max.)
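The built-in defaults cannot be changed, but a user-defined type can pick its own sentinel. The sketch below is a hypothetical wrapper (SentinelInt is not a Phobos type) that approximates "int.init == int.min":

```d
// Hypothetical wrapper: an int whose default value is the sentinel int.min.
struct SentinelInt
{
    int value = int.min;

    // True once the value holds something other than the sentinel.
    bool isSet() const { return value != int.min; }
}

void main()
{
    SentinelInt counter;           // default-initialized to the sentinel
    assert(!counter.isSet);
    assert(counter.value == int.min);

    counter.value = 0;             // an explicit assignment clears the sentinel
    assert(counter.isSet);

    // Built-in integral types keep their language-defined default of 0.
    static assert(int.init == 0);
    static assert(uint.init == 0);
}
```

The trade-off is that int.min can no longer be stored as a real value, which is exactly why a sentinel is harder to justify for built-in integers than NaN is for floats.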

Default initialisation of variables is there to give deterministic behaviour between versions and runs, i.e. to get rid of nasal demons, not to read the programmer's mind for the appropriate initial value of a variable; that is still the programmer's responsibility.

>
> People come to D and see char=#ff, double=NaN:
> https://www.youtube.com/watch?v=Qsa41csyNU8

June 09, 2019

Re: Why is char initialized to 0xFF ?

Posted by KnightMare
in reply to Mike Parker

On Sunday, 9 June 2019 at 08:26:45 UTC, Mike Parker wrote:

> Not at all. It's quite practical for debugging. Uninitialized variables are a pain in C and C++. Default initializing to invalid values makes them stand out in the debugger. The drawback is that the integrals (and bool) have no invalid value, so we're stuck with 0 (and false).

I agree that memory must be initialized unless otherwise stated.
I disagree that garbage (an uninitialized value) should be FF and NaN.
Again, "all zeroes" is the best and right thing.
People are the main resource. They have expectations, and they expect zeroes. You can poll them: "What values should be used for uninitialized variables?" Think about it: what would they answer? Anyone on the street... no, anyone in an IT park.

IMO, because nobody uses FF and NaN in D code now (as in: the default is FF, so I just do "ch += 1" and I've got 00! I am a cool hacker!), we can change them to the most expected values (I think zero). In any case, we could start with a poll among D users.
Or let's set up a tagline for D: "We have our own way, don't boomboom our brain!". Joke. Maybe a little bit of trolling.
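For the record, the joke above does hold: arithmetic on char wraps modulo 256, so the default 0xFF plus one becomes zero. A minimal sketch:

```d
void main()
{
    char ch;                 // char.init == 0xFF
    assert(ch == 0xFF);

    // Compound assignment truncates the result back to char: 0xFF + 1 wraps to 0x00.
    ch += 1;
    assert(ch == '\0');
}
```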

June 09, 2019

Re: Why is char initialized to 0xFF ?

Posted by KnightMare
in reply to Patrick Schluter

On Sunday, 9 June 2019 at 08:36:30 UTC, Patrick Schluter wrote:

I read the bible too. I know the reasons why the leaders decided to use NaN and FF.
But
what is the best solution:
do some math and get garbage in C++ or NaN in D?
Or have the compiler report "using uninitialized variable" before any math?

June 09, 2019

Re: Why is char initialized to 0xFF ?

Posted by Ola Fosheim Grøstad
in reply to Patrick Schluter

On Sunday, 9 June 2019 at 08:36:30 UTC, Patrick Schluter wrote:
> No, by putting NaN in d you have a deterministic error. In C and C++ you will have undefined behaviour that will vary with compiler, version, options, OS version, architecture, position of the moon, etc., and sometimes undetectable bugs.

I don't think it is undefined though… If something has an arbitrary value, you could still compute with it if the algorithm takes that into account, assuming that every bit pattern represents a defined value (which is the case for IEEE floating-point bit patterns).

Anyway, the obvious advantage with having structs default initialized to all-bits-zero is that you can have an allocator that clears bits in the background (bypassing caches so they are not polluted).

Then you have no penalty when allocating an array of one million struct values. Which is very useful. Just allocate memory-chunks that are already set to zero-bits.

You usually want an array of floating point values to be pre-initialized to zeros. You almost never want an array of floating point values being initialized to NaN.
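A sketch of the all-bits-zero point. The helper initIsAllZeroBits is hypothetical: it simply inspects the bytes of T.init to check whether freshly zeroed memory would already hold valid default values of T. It says nothing about how druntime actually allocates; the structs IntPair and HasDouble are illustrative:

```d
import std.math : isNaN;

// Hypothetical helper: true if every byte of T.init is zero.
bool initIsAllZeroBits(T)()
{
    T value = T.init;
    auto bytes = (cast(const(ubyte)*) &value)[0 .. T.sizeof];
    foreach (b; bytes)
        if (b != 0)
            return false;
    return true;
}

struct IntPair { int a; int b; }
struct HasDouble { double d; }

void main()
{
    assert(initIsAllZeroBits!int());
    assert(initIsAllZeroBits!IntPair());
    // double.init is NaN and char.init is 0xFF, so neither is all-zero bits.
    assert(!initIsAllZeroBits!double());
    assert(!initIsAllZeroBits!char());
    assert(!initIsAllZeroBits!HasDouble());

    // A new double[] starts out filled with NaN...
    auto a = new double[](1_000);
    assert(isNaN(a[0]));
    // ...so code that wants zeros has to write the array a second time.
    a[] = 0.0;
    assert(a[999] == 0.0);
}
```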

Copyright © 1999-2021 by the D Language Foundation