July 05

Today many people have spent some time to try and understand Walter's belief that C is "good enough" for bit fields in terms of guarantees.

I believe I have understood a core component to this.

From the C23 standard:

> An implementation may allocate any addressable storage unit large enough to hold a bit-field. If
> enough space remains, a bit-field that immediately follows another bit-field in a structure shall be
> packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that
> does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The
> order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is
> implementation-defined. The alignment of the addressable storage unit is unspecified.

What matters is the initial type in the bit-field, the rest of the types do not matter.

As long as you do not start and finish in two separate memory addresses for that initial type it will be predictable.

I have filed a ticket for dscanner to introduce a warning to tell you that the compiler is going to do a bad thing, that will cause you problems and the compiler will not assist you.

Ideally, we wouldn't allow it for extern(D) code at all.

As of right now, assuming we get Dscanner to give the warning I can withdraw my concerns, although I do think that extern(D) shouldn't be offering you such a heavy foot-gun.

July 05
On 7/5/24 07:37, Richard (Rikki) Andrew Cattermole wrote:
> Today many people have spent some time to try and understand Walter's belief that C is "good enough" for bit fields in terms of guarantees.
> 
> I believe I have understood a core component to this.
> 
>  From the C23 standard:
> 
>> An implementation may allocate any addressable storage unit large enough to hold a bit-field. If
> enough space remains, a bit-field that immediately follows another bit-field in a structure shall be
> packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that
> does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The
> order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is
> implementation-defined. The alignment of the addressable storage unit is unspecified.
> 
> What matters is the _initial_ type in the bit-field, the rest of the types _do not_ matter.
> ...

According to this text, none of the types matter for layout guarantees. Only the bit sizes matter somewhat. And then the implementation still has way too much leeway in how it allocates things.

Walter's reasoning has been that _in practice_, C implementations are a bit more sane than what the standard allows. I don't think it is fruitful to try and find any useful guarantees in the standard. If there were any, that's what Walter would point to instead.

> As long as you _do not_ start and finish in two separate memory addresses for that initial type it will be predictable.
> ...

According to the standard, no.

E.g.:

int a:7;
int b:25;

According to the standard, this could put `a` in a 1-byte unit and `b` in a subsequent 32-byte unit. It could put `a` in a 1-byte unit, use the last bit for `b`, then put the remaining 24 bits of `b` in a new unit.

It could also put both in separate 4-byte integers. Or it could pack them into a single 4-byte location. It is not specified. In practice, implementations will usually put both of them in a single 4-byte location, and this is what Walter is relying on. The C standard gives you almost nothing (it could even choose to put both `a` and `b` into an 8-byte or larger unit; there is no upper limit on the size, only a lower one.)

And I did not even get into different possible orderings of bit fields within a unit.

July 05
On 7/5/24 11:13, Timon Gehr wrote:
> ...
> It could also put both in separate 4-byte integers.

Actually no, this is one of the few things it cannot do. I got a bit too excited there. Anyway, the point stands.

July 06
On 05/07/2024 9:42 PM, Timon Gehr wrote:
> On 7/5/24 11:13, Timon Gehr wrote:
>> ...
>> It could also put both in separate 4-byte integers.
> 
> Actually no, this is one of the few things it cannot do. I got a bit too excited there. Anyway, the point stands.

Oh oh no, you are so right, I was applying the type there that I shouldn't have been.

Don't read the C standard after you've been awake more than 12 hours, folks!

However in saying that, the point that we can mitigate it using a dscanner warning does still stand. Therefore my original post stating I withdraw my concerns is valid.

The only problem is it'll be a word-size-specific and alignment-specific check now.

I hate every bit that we need to make such a specific mitigation for what amounts to a brand new feature. It is quite frankly ridiculous to need a _mitigation_ for this.
July 05
On 7/4/2024 10:37 PM, Richard (Rikki) Andrew Cattermole wrote:
> Today many people have spent some time to try and understand Walter's belief that C is "good enough" for bit fields in terms of guarantees.

It's straightforward. If you use uint as the field type, you'll get the same layout across every C compiler I've ever heard of.

The reason for this is straightforward:

1. it's the obvious way to do things
2. professional C compiler developers are sensible people
3. professional C compiler developers want to compile existing code and have it behave the same way on the same platform, they don't care to antagonize their users

The differences crop up when using multiple field types *and* porting to a different ecosystem. These problems are trivially avoided. Even so, within a particular ecosystem, the C compilers are all compatible with each other. Why? Because C compiler developers want their compiler to be useful!

Is anyone surprised that gcc/clang/ImportC work exactly the same on each ecosystem?

Consider also that the C standard does not specify the size of a 'char'. There are C compilers for special CPUs that have different char sizes - notably 32 bit chars for some DSP processors, and 10 bit chars for the CPU on a Mattel Intellivision game computer. C on a PDP-10 has 36 bit ints, too! and 18 bit shorts.

I can pretty much guarantee that all C code developed on a conventional CPU will fail to work on those machines.

But so what. When you port to a diverse machine, you expect such problems.
July 05
On Friday, 5 July 2024 at 14:18:43 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Don't read the C standard after you've been awake more than 12 hour folks!
>
> However in saying that, the point that we can mitigate it using a dscanner warning does still stand. Therefore my original post stating I withdraw my concerns is valid.
>

Given that today is July 5, 2024, the publication of the C23 standard is imminent, with the limit date for publication being July 12, 2024. This means that within a week, the C23 standard should be officially published, marking a significant milestone for the C programming language and for D.

Is it a good time to start planning for any necessary updates to our existing codebases or libraries to ensure compatibility with C23? Can we say that DMD will also support this in parallel with the developments?

SDB@79
July 06
On 06/07/2024 4:48 AM, Salih Dincer wrote:
> Is it a good time to start planning for any necessary updates to our existing codebases or libraries to ensure compatibility with C23? Can we say that DMD will also support this in parallel with the developments?

As of right now, the only thing planned is changing our identifiers to match the C23 identifier tables, which are UAX31-based.

I've implemented this and it has been in a release, although we have not transitioned over yet; the breakage is expected as of 2.119 (the tables are both bigger and smaller than C99's *sigh*; right now we use a combination of all the different tables).

Walter really does not want the normalization stuff that UAX31, and with it C23, requires. Some of it was implemented, but alas.

But the other things like different float types are not currently planned to be supported as far as I know. We should probably discuss that at some point as a community.

Other things like nodiscard on a function have no D equivalent just yet although we have allowed for it to occur in the future as part of ``@mustuse``.

Apart from identifiers there isn't much you should need to deal with for your code base :)
July 05
On 7/5/24 18:35, Walter Bright wrote:
> 
> Consider also that the C standard does not specify the size of a 'char'.

D does specify it.

> There are C compilers for special CPUs that have different char sizes - notably 32 bit chars for some DSP processors, and 10 bit chars for the CPU on a Mattel Intellivision game computer. C on a PDP-10 has 36 bit ints, too! and 18 bit shorts.
> 
> I can pretty much guarantee that all C code developed on a conventional CPU will fail to work on those machines.
> 
> But so what. When you port to a diverse machine, you expect such problems.

Well, this is the D newsgroup.
July 05

On Friday, 5 July 2024 at 16:35:43 UTC, Walter Bright wrote:
> On 7/4/2024 10:37 PM, Richard (Rikki) Andrew Cattermole wrote:
>> Today many people have spent some time to try and understand Walter's belief that C is "good enough" for bit fields in terms of guarantees.
>
> It's straightforward. If you use uint as the field type, you'll get the same layout across every C compiler I've ever heard of.

What if you need > 32 bits or want to pack into a ulong? Is the behavior sane across compilers?

-Steve

July 05

On Friday, 5 July 2024 at 19:35:10 UTC, Steven Schveighoffer wrote:
> What if you need > 32 bits or want to pack into a ulong? Is the behavior sane across compilers?

The following struct has a different layout for different platforms:

struct S { unsigned int x; unsigned long long a:20, b:20, c:24; };

Windows layout:

         0 | struct S
         0 |   unsigned int x
    8:0-19 |   unsigned long long a
   10:4-23 |   unsigned long long b
   13:0-23 |   unsigned long long c
           | [sizeof=16, align=8]

Linux x86_64 layout:

         0 | struct S
         0 |   unsigned int x
    4:0-19 |   unsigned long long a
    8:0-19 |   unsigned long long b
   10:4-27 |   unsigned long long c
           | [sizeof=16, align=8]

Linux i686 layout:

         0 | struct S
         0 |   unsigned int x
    4:0-19 |   unsigned long long a
    6:4-23 |   unsigned long long b
    9:0-23 |   unsigned long long c
           | [sizeof=12, align=4]