June 08, 2023
https://issues.dlang.org/show_bug.cgi?id=23978

--- Comment #10 from Iain Buclaw <ibuclaw@gdcproject.org> ---
What the Mem.xmalloc/xrealloc calls are telling me is that the D front-end with `-lowmem` is reusing some memory that was previously allocated (and subsequently freed) for some other purpose.

---

Despite non-determinism, some things are always constant:

1. The object that causes the segfault is a ThisDeclaration.

2. The AA struct always has 9 nodes and a bucket size of 32.

3. It's always array index 7 that has a value assigned seemingly from out of nowhere.

---

Is it plausible that there might still be references within the AST to memory xrealloc'd or xfree'd by the front-end?  I could at least believe that can happen.

Why did it take the switch from function _d_newclass to template _d_newclassT to hit this?  Still haven't a clue, but it is very clear that before `_d_newclassT`, it is impossible to hit this segfault.

--
June 08, 2023
https://issues.dlang.org/show_bug.cgi?id=23978

--- Comment #11 from Iain Buclaw <ibuclaw@gdcproject.org> ---
(In reply to Vladimir Panteleev from comment #7)
> I have not been able to reproduce this. It would be good to have a test case which more reliably reproduces the bug. I tried wrapping the code into a static foreach, but that did not help.

(In reply to Vladimir Panteleev from comment #8)
> Also because we need something to put in the test suite to prevent this from regressing again.
Can you drop this into compiler/test/compilable/test23978.d?

---
// REQUIRED_ARGS: -preview=dip1021 -lowmem
// PERMUTE_ARGS: -debug=A -debug=B -debug=C -debug=D -debug=E -debug=F -debug=G -debug=H
class LUBench { }
void lup(ulong , ulong , int , int = 1)
{
    new LUBench;
}
void lup_3200(ulong iters, ulong flops)
{
    lup(iters, flops, 3200);
}
void raytrace()
{
    struct V
    {
        float x, y, z;
        auto normalize() { }
        struct Tid { }
        auto spawnLinked() { }
        string[] namesByTid;
        class MessageBox { }
        auto cross() { }
    }
}
---

The long list of permutations should make the test compile ~256 times, which is enough to ensure that it never succeeds on any of my dev machines.
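The ~256 figure follows from the eight `-debug` flags, assuming the test runner tries every subset of the PERMUTE_ARGS list (2^8 combinations). A minimal sketch of that count, where the `combinations` helper is made up for illustration and is not part of the test suite:

```d
import std.array : join;
import std.stdio : writeln;

/// All subsets of `flags`, one space-joined string per subset.
/// Hypothetical helper, only meant to illustrate the 2^n count.
string[] combinations(string[] flags)
{
    string[] result;
    foreach (mask; 0 .. 1UL << flags.length)
    {
        string[] chosen;
        foreach (i, f; flags)
            if (mask & (1UL << i))
                chosen ~= f;
        result ~= chosen.join(" ");
    }
    return result;
}

void main()
{
    auto flags = ["-debug=A", "-debug=B", "-debug=C", "-debug=D",
                  "-debug=E", "-debug=F", "-debug=G", "-debug=H"];
    writeln(combinations(flags).length); // 256
}
```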

--
June 09, 2023
https://issues.dlang.org/show_bug.cgi?id=23978

Dennis <dkorpel@live.nl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dkorpel@live.nl
            Summary|[REG 2.103.0] ICE:          |[REG 2.103.0] ICE:
                   |Segmentation fault in       |EscapeBy[] is malloced, but
                   |dmd.root.aav.dmd_aaGetRvalu |contains GC-allocated
                   |e at src/dmd/root/aav.d:127 |objects

--
June 09, 2023
https://issues.dlang.org/show_bug.cgi?id=23978

Dlang Bot <dlang-bot@dlang.rocks> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |pull

--- Comment #12 from Dlang Bot <dlang-bot@dlang.rocks> ---
@dkorpel created dlang/dmd pull request #15302 "Fix 23978 - ICE: EscapeBy[] is malloced, but contains GC-allocated objects" fixing this issue:

- Fix 23978 - ICE: EscapeBy[] is malloced, but contains GC-allocated objects

https://github.com/dlang/dmd/pull/15302

--
June 09, 2023
https://issues.dlang.org/show_bug.cgi?id=23978

--- Comment #13 from Iain Buclaw <ibuclaw@gdcproject.org> ---
From valgrind:
---
==1582870== Invalid read of size 8
==1582870==    at 0x6A33B5: _D3dmd4root3aav15dmd_aaGetRvalueFNaNbNiPSQBnQBmQBk2AAPvZQd (aav.d:127)
==1582870==    by 0x6A36D7: _D3dmd4root3aav__T10AssocArrayTCQBe10identifier10IdentifierTCQCh7dsymbol7DsymbolZQCl7opIndexMFNaNbNixCQDwQCsQCjZQCa (aav.d:313)
==1582870==    by 0x512F97: DsymbolTable::lookup(Identifier const*) (dsymbol.d:2408)
==1582870==    by 0x510B6D: ScopeDsymbol::search(Loc const&, Identifier*, int) (dsymbol.d:1470)
==1582870==    by 0x50D2D7: StructDeclaration::search(Loc const&, Identifier*, int) (dstruct.d:279)
==1582870==    by 0x5E8A0A: _D3dmd6opover15search_functionFCQBe7dsymbol12ScopeDsymbolCQCe10identifier10IdentifierZCQDhQCd7Dsymbol (opover.d:1424)
==1582870==    by 0x49DC50: _D3dmd5clone19hasIdentityOpEqualsFCQBh9aggregate20AggregateDeclarationPSQCs6dscope5ScopeZCQDk4func15FuncDeclaration (clone.d:462)
==1582870==    by 0x49DF98: _D3dmd5clone13buildOpEqualsFCQBb7dstruct17StructDeclarationPSQCh6dscope5ScopeZCQCz4func15FuncDeclaration (clone.d:519)
==1582870==    by 0x523A57: DsymbolSemanticVisitor::visit(StructDeclaration*) (dsymbolsem.d:4790)
==1582870==    by 0x50D9E1: StructDeclaration::accept(Visitor*) (dstruct.d:502)
==1582870==    by 0x514E65: dsymbolSemantic(Dsymbol*, Scope*) (dsymbolsem.d:131)
==1582870==    by 0x576A2B: ExpressionSemanticVisitor::visit(DeclarationExp*) (expressionsem.d:5607)
==1582870==  Address 0x20ec8348ec8b485d is not stack'd, malloc'd or (recently) free'd
---

Prodding this in vgdb
---
(gdb) p aa.b
$10 = (dmd.root.aav.aaA **) 0x5ebb990
(gdb) monitor who_points_at 0x5ebb990
==1582870== Searching for pointers to 0x5ebb990
==1582870== *0x5ef8600 points at 0x5ebb990
 Address 0x5ef8600 is in a rw- anonymous segment
(gdb) p aa
$11 = (dmd.root.aav.AA *) 0x5ef8600
---

There is nobody referencing the base address that was GC.realloc'd but the AA. So blaming xrealloc is the wrong thing here.

The next thing to check is who points at each address in the range &aa.b[0] .. &aa.b[b_length].
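As a rough sketch of that step, the slot addresses can be computed from the base pointer printed above (0x5ebb990). This assumes 8-byte pointers and takes b_length to be 32, the bucket size reported earlier in the thread. The `slotAddress` helper is made up for illustration; `monitor who_points_at` is the vgdb command already used above.

```d
import std.stdio : writefln;

/// Address of slot `i` in a pointer array starting at `base`.
/// `ptrSize` defaults to 8, assuming a 64-bit target.
size_t slotAddress(size_t base, size_t i, size_t ptrSize = 8)
{
    return base + i * ptrSize;
}

void main()
{
    enum size_t base = 0x5ebb990; // value of aa.b in the gdb session
    enum size_t bLength = 32;     // assumed: the reported bucket size
    // Emit one vgdb command per bucket slot.
    foreach (i; 0 .. bLength)
        writefln("monitor who_points_at 0x%x    // &aa.b[%s]",
                 slotAddress(base, i), i);
}
```

The output can be pasted into the gdb session one line at a time.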

--
June 09, 2023
https://issues.dlang.org/show_bug.cgi?id=23978

--- Comment #14 from Iain Buclaw <ibuclaw@gdcproject.org> ---
Hit!

---
(gdb) monitor who_points_at 0x5ebb998              // &aa.b[1]
==1582870== Searching for pointers to 0x5ebb998
(gdb) monitor who_points_at 0x5ebb9a0              // &aa.b[2]
==1582870== Searching for pointers to 0x5ebb9a0
(gdb) monitor who_points_at 0x5ebb9a8              // &aa.b[3]
==1582870== Searching for pointers to 0x5ebb9a8
(gdb) monitor who_points_at 0x5ebb9b0              // &aa.b[4]
==1582870== Searching for pointers to 0x5ebb9b0
(gdb) monitor who_points_at 0x5ebb9b8              // &aa.b[5]
==1582870== Searching for pointers to 0x5ebb9b8
(gdb) monitor who_points_at 0x5ebb9c0              // &aa.b[6]
==1582870== Searching for pointers to 0x5ebb9c0
(gdb) monitor who_points_at 0x5ebb9c8              // &aa.b[7]
==1582870== Searching for pointers to 0x5ebb9c8
==1582870== *0x5eef430 points at 0x5ebb9c8
 Address 0x5eef430 is in a rw- anonymous segment
==1582870== *0xd964580 points at 0x5ebb9c8
 Address 0xd964580 is in a rw- anonymous segment
---


There is no hint as to where those references live at run-time; however, it is clear that the ThisDeclaration/VarDeclaration object is a live object in the AST.

---
(gdb) p aa.b[5]
$57 = (dmd.root.aav.aaA *) 0x5efa4a0
(gdb) p aa.b[6]
$58 = (dmd.root.aav.aaA *) 0x0
(gdb) p aa.b[7]
$59 = (dmd.root.aav.aaA *) 0xde2c100
(gdb) p aa.b[8]
$60 = (dmd.root.aav.aaA *) 0x5efa480
(gdb) monitor who_points_at 0x5efa4a0              // aa.b[5]
==1582870== Searching for pointers to 0x5efa4a0
==1582870== *0x5ebb9b8 points at 0x5efa4a0         // &aa.b[5] points at
(gdb) monitor who_points_at 0x5efa480              // aa.b[8]
==1582870== Searching for pointers to 0x5efa480
==1582870== *0x5ebb9d0 points at 0x5efa480         // &aa.b[8] points at
 Address 0x5ebb9d0 is in a rw- anonymous segment
(gdb) monitor who_points_at 0xde2c100              // aa.b[7]
==1582870== Searching for pointers to 0xde2c100
==1582870== *0x5ebb9c8 points at 0xde2c100         // &aa.b[7] points at
 Address 0x5ebb9c8 is in a rw- anonymous segment
==1582870== *0xdddef80 points at 0xde2c100         // and many more...
 Address 0xdddef80 is in a rw- anonymous segment
==1582870== *0xdddef90 points at 0xde2c100
 Address 0xdddef90 is in a rw- anonymous segment
==1582870== *0xde1bec0 points at 0xde2c100
 Address 0xde1bec0 is in a rw- anonymous segment
==1582870== *0xde25490 points at 0xde2c100
 Address 0xde25490 is in a rw- anonymous segment
==1582870== *0xde254b0 points at 0xde2c100
 Address 0xde254b0 is in a rw- anonymous segment
==1582870== *0xde254c8 points at 0xde2c100
 Address 0xde254c8 is in a rw- anonymous segment
==1582870== *0xde2c3c8 points at 0xde2c100
 Address 0xde2c3c8 is in a rw- anonymous segment
==1582870== *0xde2d0e8 points at 0xde2c100
 Address 0xde2d0e8 is in a rw- anonymous segment
==1582870== *0xde30550 points at 0xde2c100
 Address 0xde30550 is in a rw- anonymous segment
==1582870== *0xde30880 points at 0xde2c100
 Address 0xde30880 is in a rw- anonymous segment
==1582870== *0xde35428 points at 0xde2c100
 Address 0xde35428 is in a rw- anonymous segment
==1582870== *0xde35568 points at 0xde2c100
 Address 0xde35568 is in a rw- anonymous segment
==1582870== *0xde35668 points at 0xde2c100
 Address 0xde35668 is in a rw- anonymous segment
==1582870== *0xde3d980 points at 0xde2c100
 Address 0xde3d980 is in a rw- anonymous segment
==1582870== *0xde694a8 points at 0xde2c100
 Address 0xde694a8 is in a rw- anonymous segment
==1582870== *0xde73b28 points at 0xde2c100
 Address 0xde73b28 is in a rw- anonymous segment
==1582870== *0xde795a8 points at 0xde2c100
 Address 0xde795a8 is in a rw- anonymous segment
==1582870== *0xde7c168 points at 0xde2c100
 Address 0xde7c168 is in a rw- anonymous segment
==1582870== tid 1 register RAX pointing at 0xde2c100

--
June 09, 2023
https://issues.dlang.org/show_bug.cgi?id=23978

--- Comment #15 from Iain Buclaw <ibuclaw@gdcproject.org> ---
The new title seems wrong to me.  EscapeBy[] is GC allocated.

--
June 09, 2023
https://issues.dlang.org/show_bug.cgi?id=23978

Dennis <dkorpel@live.nl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[REG 2.103.0] ICE:          |[REG 2.103.0] ICE: dip1021
                   |EscapeBy[] is malloced, but |memory corruption
                   |contains GC-allocated       |
                   |objects                     |

--
June 09, 2023
https://issues.dlang.org/show_bug.cgi?id=23978

--- Comment #16 from Dennis <dkorpel@live.nl> ---
I changed it back to something more generic. My initial diagnosis was wrong indeed, but it still seems to be related to the `Mem.xrealloc` call in dmd.escape.checkMutableArguments, since changing that makes the test case pass.

--
June 10, 2023
https://issues.dlang.org/show_bug.cgi?id=23978

--- Comment #17 from Iain Buclaw <ibuclaw@gdcproject.org> ---
(In reply to Dennis from comment #16)
> I changed it back to something more generic. My initial diagnosis was wrong indeed, but it still seems to be related to the `Mem.xrealloc` call in dmd.escape.checkMutableArguments, since changing that makes the test case pass.
Indeed, I still stand by my initial assessment that there are live references to memory being marked as free by the GC.

To clarify, these are being explicitly marked free by the program, rather than the GC scan failing to find live references.

Valgrind/vgdb confirms this at the point in the program immediately before the segfault occurs.

```
(gdb) monitor who_points_at 0x5ebb9c8
==1582870== Searching for pointers to 0x5ebb9c8
==1582870== *0x5eef430 points at 0x5ebb9c8
 Address 0x5eef430 is in a rw- anonymous segment
==1582870== *0xd964580 points at 0x5ebb9c8
 Address 0xd964580 is in a rw- anonymous segment
```

The expected output would be for valgrind to find no references, because the memory block is actively being used as part of a dynamic array (starting at 0x5ebb990).

We know from the initial printf debug traces of malloc/realloc addresses that the memory block was first allocated for another purpose, then subsequently marked as free'd in the GC.

```
Mem.xrealloc((nil), 624) = 0x5ebb990
...
Mem.xrealloc(0x5ebb990, 832) = 0x5e63000
...
Mem.xmalloc(768) = 0x5ebb990
```
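The reuse visible in that trace can be spotted mechanically. The sketch below is only an illustration: the `Event` struct and `reusedAddresses` function are hypothetical and not part of dmd, and it assumes that a realloc returning a different address releases the old one. Only the three quoted events are used; the elided lines are left out.

```d
import std.stdio : writefln;

/// One allocator event from a debug trace. `old` is 0 for a plain
/// malloc or for realloc(null, ...).
struct Event { size_t old; size_t size; size_t result; }

/// Returns the addresses that were handed out again after being released.
size_t[] reusedAddresses(const Event[] trace)
{
    bool[size_t] released; // addresses assumed to be free again
    size_t[] reused;
    foreach (e; trace)
    {
        if (e.result in released)
        {
            reused ~= e.result;
            released.remove(e.result);
        }
        // Assumption: a realloc that moved the block releases the old base.
        if (e.old != 0 && e.old != e.result)
            released[e.old] = true;
    }
    return reused;
}

void main()
{
    // The three events quoted in the trace above.
    const trace = [
        Event(0,         624, 0x5ebb990),
        Event(0x5ebb990, 832, 0x5e63000),
        Event(0,         768, 0x5ebb990),
    ];
    foreach (addr; reusedAddresses(trace))
        writefln("0x%x was released and later handed out again", addr);
}
```

Any pointer into the first 624-byte block that outlived the second call would now alias the unrelated 768-byte allocation.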

Catching the moment xrealloc is called the second time and dumping all memory references would confirm or disprove my suspicions.

Both GC.realloc and GC.free mark the base pointer as "free" in the GC regardless of whether there are any other live references to the memory block.

This makes both Mem.xrealloc and Mem.xfree unsafe when they are used for non-trivial data structures.

--