Thread overview
Actual immutability enforcement by placing immutable data into read-only sections
  • Siarhei Siamashka (Dec 19, 2022)
  • Siarhei Siamashka (Dec 19, 2022)
  • bauss (Dec 19, 2022)
  • bauss (Dec 19, 2022)
  • IGotD- (Dec 19, 2022)
  • bauss (Dec 19, 2022)
  • IGotD- (Dec 19, 2022)
  • Nick Treleaven (Dec 21, 2022)
  • Siarhei Siamashka (Dec 19, 2022)
  • Tejas (Dec 19, 2022)
  • Siarhei Siamashka (Dec 19, 2022)
December 19, 2022

Right now D compilers place string literals into a read-only section, but most of the other types of static immutable data have no protection against rogue writes.

https://forum.dlang.org/post/cmtaeuedmdwxjecpcrjh@forum.dlang.org is an example of a non-obvious case of immutable data corruption. What's happening there is that druntime modifies the static immutable instance of Exception when throwing it.

The old bugreport https://issues.dlang.org/show_bug.cgi?id=12118 is also related to throwing an immutable Exception, but the corruption is done by the user code in the catch block.
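A minimal D sketch for observing both problems described above: it throws a CTFE-built `static immutable` Exception, checks that the catch block receives a mutable reference to the same object, and compares the `info` (stack trace) field before and after the throw. The helper name `traceSetBeforeAndAfter` is made up for illustration. The printed result may vary with the compiler, optimization level and druntime configuration (for example whether a trace handler is installed), not least because a compiler is allowed to assume that immutable data never changes.

```d
import std.stdio : writeln;

/// Hypothetical helper: throws a CTFE-built static immutable Exception,
/// catches it, and reports whether its `info` (stack trace) field was
/// set before and after the throw.
bool[2] traceSetBeforeAndAfter()
{
    static immutable e = new Exception("test");
    immutable before = e.info !is null;

    try
    {
        throw e;
    }
    catch (Exception caught)
    {
        // The handler receives a mutable reference to the very same
        // object, which is the hole described in issue 12118.
        assert(caught is e);
    }

    return [before, e.info !is null];
}

void main()
{
    auto r = traceSetBeforeAndAfter();
    writeln("info set before throw: ", r[0], ", after throw: ", r[1]);
}
```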

Troubleshooting such problems would have been so much easier if immutable objects were actually placed in a read-only section and any write attempts triggered segfaults at runtime. I think that bare metal code for microcontrollers could also potentially benefit from this, because this would enable placing immutable data generated by CTFE into NOR flash instead of wasting SRAM space.
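To make the CTFE case concrete, here is a hypothetical example of data that never needs to be written at run time: a 256-entry bit-count table computed entirely at compile time. The name `popcountTable` is illustrative. Where the table actually ends up (a read-only section or writable data) depends on the compiler; going by the first paragraph of this post, it currently gets no write protection.

```d
import std.stdio : writeln;

/// Bit counts for every byte value, computed entirely by CTFE.
immutable ubyte[256] popcountTable = () {
    ubyte[256] t;
    foreach (i; 0 .. 256)
    {
        ubyte bits = 0;
        for (uint v = i; v != 0; v >>= 1)
            bits += v & 1;
        t[i] = bits;
    }
    return t;
}();

// These checks run at compile time, so the contents are fully known
// before the program ever starts.
static assert(popcountTable[0xFF] == 8);
static assert(popcountTable[0x0F] == 4);

void main()
{
    writeln(popcountTable[0xA5]); // 0xA5 = 1010_0101, prints 4
}
```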

What do you think about it? Does this require a new DIP?

December 19, 2022

On Monday, 19 December 2022 at 12:13:08 UTC, Siarhei Siamashka wrote:

> What do you think about it? Does this require a new DIP?

BTW, I experimented with the DMD code and managed to place some array literals into a read-only section: https://github.com/ssvb/dmd/commit/44c3a7c312b042fa7fafd357775aedf904ba0700
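One way to check where a given piece of data actually landed when experimenting with such a patch is to look up its address in the process memory map. The sketch below assumes Linux (`/proc/self/maps`); `mappingPerms` is a hypothetical helper that returns the permission string of the mapping containing an address, e.g. `r--p` for a read-only mapping or `rw-p` for a writable one. The output depends on the compiler and linker, so no particular result is implied here.

```d
import std.array : split;
import std.conv : to;
import std.stdio : File, writeln;

/// Hypothetical helper (Linux only): returns the permission field, e.g.
/// "r--p" or "rw-p", of the /proc/self/maps entry containing `addr`,
/// or null if no mapping contains it.
string mappingPerms(const(void)* addr)
{
    immutable a = cast(size_t) addr;
    foreach (line; File("/proc/self/maps").byLineCopy)
    {
        // Format: "start-end perms offset dev inode [path]", hex addresses.
        auto fields = line.split;
        auto bounds = fields[0].split("-");
        immutable lo = bounds[0].to!size_t(16);
        immutable hi = bounds[1].to!size_t(16);
        if (lo <= a && a < hi)
            return fields[1];
    }
    return null;
}

void main()
{
    static immutable int[4] table = [1, 2, 3, 4];
    writeln("string literal:          ", mappingPerms("hello".ptr));
    writeln("static immutable int[4]: ", mappingPerms(table.ptr));
}
```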

But much more seems to be needed to get it right. Nested array literals, such as the [1,2] part of [[1,2],[3,4]], don't seem to have the immutable flag set when checked from https://github.com/dlang/dmd/blob/v2.101.1/compiler/src/dmd/todt.d#L456-L490
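Since `immutable` is transitive in D, the inner literals of such a nested initializer are just as immutable as the outer array, so in principle they qualify for a read-only section too, even if the flag checked in todt.d is not set on them. A small sketch that verifies the types at compile time (the variable name `nested` is illustrative):

```d
// Module-level immutable data initialized from a nested array literal.
immutable int[][] nested = [[1, 2], [3, 4]];

// immutable is transitive: the inner slices and the ints they point to
// are immutable too, so the [1, 2] and [3, 4] payloads are as much
// candidates for a read-only section as the outer array of slices.
static assert(is(typeof(nested) == immutable(int[][])));
static assert(is(typeof(nested[0]) == immutable(int[])));
static assert(is(typeof(nested[0][0]) == immutable(int)));

void main()
{
    assert(nested[1][0] == 3);
}
```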

Additionally, the immutable flag seems to be stripped at https://github.com/dlang/dmd/blob/v2.101.1/compiler/src/dmd/tocsym.d#L189-L219 from immutable class and struct instances if they have constructors. But does this really matter for the data generated by CTFE?

Detecting whether the data was generated by CTFE also doesn't seem to be very obvious. I tried to check the .ownedByCtfe field, but I'm getting strange results.

Can anyone give me some hints?

December 19, 2022

On Monday, 19 December 2022 at 12:13:08 UTC, Siarhei Siamashka wrote:

> [...]
> What do you think about it? Does this require a new DIP?

Isn't it going to be difficult to implement properly? You can't really place individual pieces of data into read-only memory; you have to protect whole pages, e.g. with VirtualProtect() on Windows. Especially since immutable data can still be allocated through the GC. Or am I not understanding something about this at all?
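To illustrate the page granularity concern: mprotect() and VirtualProtect() change the protection of whole pages, so protecting a small object means protecting the pages around it as well. A minimal sketch, assuming a POSIX system (`sysconf` is not available on Windows, where GetSystemInfo reports the page size); `pageSpan` is a hypothetical helper and the object address is made up.

```d
import core.sys.posix.unistd : sysconf, _SC_PAGESIZE;
import std.stdio : writeln;

/// Hypothetical helper: rounds [addr, addr + len) outward to page
/// boundaries, i.e. the smallest range whose protection mprotect()
/// can change. `pageSize` must be a power of two.
size_t[2] pageSpan(size_t addr, size_t len, size_t pageSize)
{
    immutable mask = ~(pageSize - 1);
    return [addr & mask, (addr + len + pageSize - 1) & mask];
}

void main()
{
    immutable pageSize = cast(size_t) sysconf(_SC_PAGESIZE);

    // A made-up 76-byte object that starts 10 bytes before a page boundary.
    immutable addr = 16 * pageSize - 10;
    immutable span = pageSpan(addr, 76, pageSize);

    // Write-protecting those 76 bytes affects two whole pages, including
    // whatever unrelated data shares them.
    writeln(span[1] - span[0], " bytes in ", (span[1] - span[0]) / pageSize, " pages");
}
```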

December 19, 2022

On Monday, 19 December 2022 at 14:06:50 UTC, bauss wrote:

> [...]
> Isn't it going to be difficult to properly implement? Since you can't really place data into read-only memory, but you have to protect whole pages ex. VirtualProtect() on Windows. Especially with how immutable data can still be allocated through GC. Or am I not understanding something about this at all?

Of course literals can be placed in read-only memory, and they should be, so in that case I think this would be good, BUT I don't think it's possible to really do this for all immutable data.

December 19, 2022

On Monday, 19 December 2022 at 14:07:49 UTC, bauss wrote:

> Of course literals can be placed in read-only and should be, so in that case I think this would be good, BUT I don't think it's possible to really do for all immutable data.

https://dlang.org/articles/const-faq.html

What is immutable good for?

Immutable data, once initialized, is never changed. This has many uses:

  • Access to immutable data need not be synchronized when multiple threads read it.
  • Data races, tearing, sequential consistency, and cache consistency are all non-issues when working with immutable data.
  • *When doing a deep copy of a data structure, the immutable portions need not be copied.*
  • *Invariance allows a large chunk of data to be treated as a value type even if it is passed around by reference (strings are the most common case of this).*
  • Immutable types provide more self-documenting information to the programmer.
  • Immutable data can be placed in hardware protected read-only memory, or even in ROMs.
  • If immutable data does change, it is a sure sign of a memory corruption bug, and it is possible to automatically check for such data integrity.
  • Immutable types provide for many program optimization opportunities.

const acts as a bridge between the mutable and immutable worlds, so a single function can be used to accept both types of arguments.

I always interpreted immutable as something that must be constructed during compile time and put in the RO section of the program.

December 19, 2022

On Monday, 19 December 2022 at 15:25:17 UTC, IGotD- wrote:

> [...]
> I always interpreted immutable as something that must be constructed during compile time and put in the RO section of the program.

Yes, but that's not the reality. Immutable data can be constructed at runtime, and that happens all the time in shared static constructors etc. I think it would be too big a breaking change if you suddenly couldn't do that anymore.

For example, the following program is valid:

import std.stdio : writeln;
import std.datetime : Clock;

immutable int a;

shared static this()
{
    a = Clock.currTime().year;
}

void main()
{
    writeln(a);
}

In the above example "a" cannot be placed in read-only memory.

Of course, my example isn't something you would do in an everyday program, BUT the year could just as well be replaced by values loaded from a file, etc.

December 19, 2022

On Monday, 19 December 2022 at 15:34:35 UTC, bauss wrote:

> Yes, but it's not the reality. Immutable data can be constructed at runtime and it happens all the time in shared static constructors etc. [...]

Couldn't D have just used the 'const' keyword for such data?

December 19, 2022

On Monday, 19 December 2022 at 14:06:50 UTC, bauss wrote:

> [...]
> Isn't it going to be difficult to properly implement? Since you can't really place data into read-only memory, but you have to protect whole pages ex. VirtualProtect() on Windows. Especially with how immutable data can still be allocated through GC. Or am I not understanding something about this at all?

That is why he specified static immutable rather than just immutable.
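The distinction can be checked directly: a `static immutable` array initialized at compile time lives in the program's static data, while immutable data created at run time comes from the GC heap. A minimal sketch using `core.memory.GC.addrOf`, which returns null for pointers outside GC-managed memory; `isGcOwned` and the variable names are illustrative.

```d
import core.memory : GC;
import std.stdio : writeln;

// Initialized at compile time and stored in the program's static data.
static immutable int[3] fromCtfe = [1, 2, 3];

/// Hypothetical helper: true if `p` points into a GC-managed block.
bool isGcOwned(const(void)* p)
{
    return GC.addrOf(cast(void*) p) !is null;
}

void main()
{
    // Also immutable, but allocated on the GC heap at run time.
    immutable int[] fromGc = [1, 2, 3].idup;

    writeln("static immutable in GC heap: ", isGcOwned(fromCtfe.ptr)); // false
    writeln("idup'ed array in GC heap:    ", isGcOwned(fromGc.ptr));   // true
}
```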

December 19, 2022

On Monday, 19 December 2022 at 14:06:50 UTC, bauss wrote:

> > What do you think about it? Does this require a new DIP?
>
> Isn't it going to be difficult to properly implement? Since you can't really place data into read-only memory, but you have to protect whole pages ex. VirtualProtect() on Windows. Especially with how immutable data can still be allocated through GC. Or am I not understanding something about this at all?

I did mention static immutable and CTFE in my initial message. Some of the immutable data is generated at compile time and can safely go into read-only sections. Right now I'm only interested in trying to improve just this.

But since you mentioned catching write accesses to immutable data backed by GC allocations, this can be done with some help from extra tools or instrumentation. For example, I used valgrind to debug the code from https://forum.dlang.org/post/cmtaeuedmdwxjecpcrjh@forum.dlang.org

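/* C helper, compiled separately and linked into the D program */
#include <stddef.h>
#include <valgrind/memcheck.h>

/* Registers [p, p + size) as a named block and marks it as inaccessible,
   so that valgrind reports every access to it with a stack trace. */
void vg_mark_block(void *p, size_t size)
{
    int valgrind_handle = VALGRIND_CREATE_BLOCK(p, size, "MARKED BLOCK");
    (void)valgrind_handle; /* the block handle is not needed afterwards */
    VALGRIND_MAKE_MEM_NOACCESS(p, size);
}

// D test program
extern(C) void vg_mark_block(void *p, size_t size) @nogc;

void main() @nogc {
    try {
        static immutable e = new Exception("test");
        // Mark the whole class instance of the static immutable Exception.
        vg_mark_block(cast(void*)e, __traits(classInstanceSize, typeof(e)));
        throw e; // druntime writes into e while throwing it
    } catch (Exception e) {
        assert(e.msg == "test");
    }
}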
==3369== Invalid write of size 8
==3369==    at 0x4D5BEAE: _d_createTrace (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x4D5D4F9: _d_throwdwarf (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x1091C2: _Dmain (in /tmp/test/test)
==3369==    by 0x4D5CEBE: void rt.dmain2._d_run_main2(char[][], ulong, extern (C) int function(char[][])*).runAll().__lambda2() (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x4D5CD6D: void rt.dmain2._d_run_main2(char[][], ulong, extern (C) int function(char[][])*).tryExec(scope void delegate()) (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x4D5CE46: void rt.dmain2._d_run_main2(char[][], ulong, extern (C) int function(char[][])*).runAll() (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x4D5CD6D: void rt.dmain2._d_run_main2(char[][], ulong, extern (C) int function(char[][])*).tryExec(scope void delegate()) (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x4D5CCD6: _d_run_main2 (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x4D5CA9F: _d_run_main (in /usr/lib64/libphobos2.so.0.99.1)
==3369==    by 0x10923F: main (in /tmp/test/test)
==3369==  Address 0x10c098 is 56 bytes inside a MARKED BLOCK of size 76 client-defined
==3369==    at 0x1095A1: vg_mark_block (in /tmp/test/test)
==3369==    by 0x1091B3: _Dmain (in /tmp/test/test)

Unfortunately valgrind reports both read and write accesses to this area in the log, so the noise about "invalid reads" needs to be filtered out. It doesn't support marking an address range as read-only out of the box: https://valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs (but maybe this can be improved?).

ASAN instrumentation could also be useful in the future for catching write accesses to immutable data backed by GC allocations. And I'm pleasantly surprised to see that ASAN is already available in LDC. However, just like valgrind, ASAN currently doesn't support poisoning a memory area as read-only: https://www.mail-archive.com/address-sanitizer@googlegroups.com/msg01948.html
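As a stopgap while neither tool can mark memory as read-only, a debugging helper could copy an object into its own freshly mapped pages and mprotect() them read-only: reads stay silent, and any write raises SIGSEGV at the offending instruction, so a debugger shows the culprit's backtrace. This is a hypothetical sketch assuming POSIX mmap/mprotect; `readOnlyCopy` is not an existing druntime or Phobos API, and the pages are deliberately never unmapped. For a class instance such as the Exception above, one would presumably copy `__traits(classInstanceSize, ...)` bytes and use the copy instead of the original, but that part is untested.

```d
import core.stdc.string : memcpy;
import core.sys.posix.sys.mman : mmap, mprotect, MAP_ANON, MAP_FAILED,
    MAP_PRIVATE, PROT_READ, PROT_WRITE;
import core.sys.posix.unistd : sysconf, _SC_PAGESIZE;

/// Hypothetical debugging helper: copies `len` bytes from `src` into freshly
/// mapped pages and makes those pages read-only, so that any later write to
/// the copy raises SIGSEGV while reads work normally. Returns null on
/// failure. The mapping is intentionally never released.
const(void)* readOnlyCopy(const(void)* src, size_t len)
{
    immutable pageSize = cast(size_t) sysconf(_SC_PAGESIZE);
    immutable mapLen = (len + pageSize - 1) / pageSize * pageSize;
    if (mapLen == 0)
        return null;

    void* p = mmap(null, mapLen, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return null;

    memcpy(p, src, len);
    if (mprotect(p, mapLen, PROT_READ) != 0)
        return null;
    return p;
}

void main()
{
    static immutable int[4] table = [1, 2, 3, 4];
    auto copy = cast(const(int)*) readOnlyCopy(table.ptr, table.sizeof);
    assert(copy !is null);
    assert(copy[0 .. 4] == table[]);  // reading the protected copy is fine
    // (cast(int*) copy)[0] = 5;      // a write like this would raise SIGSEGV
}
```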

December 19, 2022

On Monday, 19 December 2022 at 15:34:35 UTC, bauss wrote:

> [...]
> Yes, but it's not the reality. Immutable data can be constructed at runtime and it happens all the time in shared static constructors etc. I think it would be a too big breaking change that you suddenly can't do that anymore.
>
> Ex. the following program is valid:
>
> import std.stdio : writeln;
> import std.datetime : Clock;
>
> immutable int a;
>
> shared static this()
> {
>     a = Clock.currTime().year;
> }
>
> void main()
> {
>     writeln(a);
> }
>
> In the above example "a" cannot be placed in read-only memory.

The compiler will reject your constructor if you change "immutable int a;" to "immutable int a = 2030;":

test.d(8): Error: cannot modify `immutable` expression `a`

If a variable is both declared and initialized simultaneously, then it's probably safe to be placed into a read-only section. Please correct me if I'm wrong.
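A minimal sketch of this distinction, with the runtime-initialized `a` from the quoted example next to a hypothetical `b` that is declared and initialized at once. The `static assert` mirrors the compiler error quoted above: not even the module constructor may write `b`, so no legitimate code needs write access to it after compilation.

```d
import std.datetime : Clock;
import std.stdio : writeln;

// Initialized at run time by the module constructor, so its storage must be
// writable at least until the constructor has run.
immutable int a;

// Declared and initialized at once: the value is fixed at compile time.
immutable int b = 2030;

shared static this()
{
    a = Clock.currTime().year;

    // Writing `b` is rejected even inside the module constructor.
    static assert(!__traits(compiles, b = 2031));
}

void main()
{
    writeln(a, " ", b);
}
```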
