App hangs, GC.collect() fixes it. Why?
June 05
I've been tracking down a hang in our pilot app. Using writeln, it appears to hang at newing a slice. After many hours of trying things, I discovered that program flow would continue past that point when I inserted a call to `GC.collect()` just before. Then it stalled again at a call to Win32 `SetMenu()`. Again, inserting `GC.collect()` before that made the problem go away.

This band-aid isn't going to scale in the long run. I feel I'm treating symptoms, and wonder what the cause is. Any ideas?

I know the GC is not disabled somehow because if I print `GC.profileStats()`, I see that there are collections even without my explicit calls to `GC.collect()`.

Thanks,

Bastiaan.
June 05
On 6/5/20 1:57 PM, Bastiaan Veelo wrote:
> I've been tracking down a hang in our pilot app. Using writeln, it appears to hang at newing a slice. After many hours of trying things, I discovered that program flow would continue past that point when I inserted a call to `GC.collect()` just before. Then it stalled again at a call to Win32 `SetMenu()`. Again, inserting `GC.collect()` before that made the problem go away.
> 
> This band-aid isn't going to scale in the long run. I feel I'm treating symptoms, and wonder what the cause is. Any ideas?
> 
> I know the GC is not disabled somehow because if I print `GC.profileStats()`, I see that there are collections even without my explicit calls to `GC.collect()`.

1. Collections happen automatically when you allocate memory and the GC can't find any free memory to allocate with.
2. Even if it can't find any free memory after a collection, and it runs out of memory, it should throw an Error instead of hanging.
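
A minimal sketch of how point 1 could be observed, using the `GC.profileStats()` call mentioned in the question. The helper name `collectionsDuring` and the allocation sizes are made up for illustration; how many collections actually run depends on the runtime and its configuration.

```d
import core.memory : GC;
import std.stdio : writeln;

/// Hypothetical helper: number of collections that ran while `work` executed.
size_t collectionsDuring(scope void delegate() work)
{
    const before = GC.profileStats().numCollections;
    work();
    return GC.profileStats().numCollections - before;
}

void main()
{
    // No explicit GC.collect() here: any collection is triggered by allocation.
    const n = collectionsDuring(delegate() {
        foreach (i; 0 .. 1_000)
        {
            auto tmp = new ubyte[](64 * 1024);
            tmp[0] = cast(ubyte) i; // touch the block so the allocation is used
        }
    });
    writeln("automatic collections during the loop: ", n);
}
```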

The only thing I can think of is to open in a debugger and see what it is doing.

This kind of sounds like a codegen bug, a race condition, or (worst case) memory corruption.

-Steve
September 28
On Friday, 5 June 2020 at 21:20:09 UTC, Steven Schveighoffer wrote:
> This kind of sounds like a codegen bug, a race condition, or (worst case) memory corruption.

I think it must have been memory corruption: I had not realized that our old Pascal compiler aligns struct members on one byte boundaries, and also uses ubyte as the base type for enumerations (or ushort if required) instead of uint. When using memory mapped files this binary incompatibility likely caused the corruption.
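
To make the layout difference concrete, here is a small sketch with made-up record types (`PascalRecord`, `DefaultRecord` and the enums are hypothetical, not taken from the application). It only shows what 1-byte member alignment and a `ubyte` enum base type do to offsets and sizes in D.

```d
// Hypothetical types for illustration only.
enum PackedColour : ubyte { red, green, blue } // one byte, as the old Pascal compiler would store it
enum DefaultColour { red, green, blue }        // four bytes by default in D

struct PascalRecord
{
align(1):
    PackedColour colour; // offset 0, 1 byte
    int          count;  // offset 1, because members are packed
    double       length; // offset 5
}

struct DefaultRecord
{
    DefaultColour colour; // offset 0, 4 bytes
    int           count;  // offset 4
    double        length; // offset 8
}

static assert(PackedColour.sizeof == 1);
static assert(PascalRecord.count.offsetof == 1);
static assert(PascalRecord.length.offsetof == 5);
static assert(PascalRecord.sizeof == 13);
static assert(DefaultRecord.sizeof == 16);

void main() {}
```

A `memcpy` of 13 bytes from a file into a 16-byte `DefaultRecord` (or the reverse) would therefore put every field after the first in the wrong place.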

But, after correcting that mistake, suddenly things broke that had been working for a long time. Having no idea what could be wrong this time, I spent quite some time dustmiting (thanks Vladimir!) and manually reducing the code. Voilà:


import std.stdio;
import core.memory;

struct Nothing
{
}

struct Info
{
align(1):
  ubyte u;
  Nothing*[2] arr;
}

Info* info;

void main()
{
  info = new Info;
  writeln("1");
  GC.collect();
  info.arr[0] = new Nothing;
  writeln("2");
  GC.collect();
  info.arr[1] = new Nothing;
  writeln("info.arr[0]  = ", info.arr[0]);
  writeln("info.arr[1]  = ", info.arr[1]);
  assert(info.arr[0] != info.arr[1], "Live object was collected!");
}


(The assert triggers on Windows, not on run.dlang.org.) Unfortunately for me, I cannot blame this on the compiler. The code itself violates a requirement from the spec:

  "Do not misalign pointers if those pointers may point into the GC heap" (https://dlang.org/spec/garbage.html)

I am glad to have found the cause of the breakage finally, but it won't be easy to find a generic solution...
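
For reference, the misalignment in the reduced example can be made visible with `offsetof`. This is a sketch: `Packed` mirrors the `Info` struct above, and `Natural` is a hypothetical variant without `align(1)`.

```d
struct Nothing {}

struct Packed
{
align(1):
    ubyte u;
    Nothing*[2] arr; // starts at offset 1: the pointers are misaligned
}

struct Natural
{
    ubyte u;
    Nothing*[2] arr; // padding is inserted so the pointers are aligned
}

static assert(Packed.arr.offsetof == 1);
static assert(Natural.arr.offsetof == (void*).sizeof);
static assert(Natural.arr.offsetof % (void*).alignof == 0);

void main() {}
```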

-Bastiaan.
September 28
On 9/28/20 8:57 AM, Bastiaan Veelo wrote:

> I am glad to have found the cause of the breakage finally, but it won't be easy to find a generic solution...

Obviously, this isn't a real piece of code, but there is no way around this. You have to align your pointers. The other option is to not use the GC and use manual memory management.

If this is a compatibility thing between D and Pascal, and you absolutely have to have the same layout, is there a way to adjust the structure in Pascal? Like put the elements that misalign the pointers at the end of the structure?

Another totally drastic approach would be to supply your own even-more-conservative GC which will scan misaligned pointers. Probably going to hurt performance quite a bit. You might be able to get away with marking only certain blocks as having misaligned pointers, but you will have to scan all the stacks with this assumption.
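
The cost of scanning for misaligned pointers can be illustrated with a toy model. This is not the druntime implementation, only a sketch under the assumption that a scan looks at one word per pointer-sized step; `scanFinds` and the fake address are invented for the example.

```d
import core.stdc.string : memcpy;

/// Looks for `target` in `block`, reading one word every `stride` bytes.
bool scanFinds(const(ubyte)[] block, size_t target, size_t stride)
{
    for (size_t off = 0; off + size_t.sizeof <= block.length; off += stride)
    {
        size_t word;
        memcpy(&word, block.ptr + off, size_t.sizeof); // unaligned-safe read
        if (word == target)
            return true;
    }
    return false;
}

void main()
{
    ubyte[32] block;                                     // stands in for a heap block
    size_t fakePointer = 0x1234_5678;                    // stands in for an address
    memcpy(block.ptr + 1, &fakePointer, size_t.sizeof);  // stored at offset 1

    assert(!scanFinds(block[], fakePointer, size_t.sizeof)); // aligned scan misses it
    assert( scanFinds(block[], fakePointer, 1));             // byte-stride scan finds it
}
```

The byte-stride scan inspects `size_t.sizeof` times as many candidate words, which is where the performance hit would come from.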

Some more information about the setup you are using might help (I'm assuming D and Pascal are using the same memory in the same process, otherwise this wouldn't be a problem). In particular, where does the data come from, and how malleable is it in your system? Are there times where references to the D data only exist in Pascal?

-Steve
September 28
On Monday, 28 September 2020 at 15:44:44 UTC, Steven Schveighoffer wrote:
> On 9/28/20 8:57 AM, Bastiaan Veelo wrote:
>
>> I am glad to have found the cause of the breakage finally, but it won't be easy to find a generic solution...
>
> Obviously, this isn't a real piece of code, but there is no way around this. You have to align your pointers. The other option is to not use the GC and use manual memory management.
>
> If this is a compatibility thing between D and Pascal, and you absolutely have to have the same layout, is there a way to adjust the structure in Pascal? Like put the elements that misalign the pointers at the end of the structure?
>
> Another totally drastic approach would be to supply your own even-more-conservative GC which will scan misaligned pointers. Probably going to hurt performance quite a bit. You might be able to get away with marking only certain blocks as having misaligned pointers, but you will have to scan all the stacks with this assumption.
>
> Some more information about the setup you are using might help (I'm assuming D and Pascal are using the same memory in the same process, otherwise this wouldn't be a problem). In particular, where does the data come from, and how malleable is it in your system? Are there times where references to the D data only exist in Pascal?
>
> -Steve

Thanks a lot for thinking with me. I’m not linking any Pascal objects, so I don’t need to maintain binary compatibility in memory, only compatibility of data files. The problem arises when those files are read using memory mapped files, from which structs are memcpy’d over. This is of course the result of machine translation of the current Pascal implementation.

Manual memory management is an option and would be straightforward in principle, as we’ve done that for ages. The only thing is that this memory cannot contain other allocations on the GC heap, such as strings or other slices, unless they are both aligned and their root is registered.
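
A sketch of what "aligned and registered" could look like, using `GC.addRange` and `GC.removeRange` from `core.memory`. The `Design` struct and the two helper functions are hypothetical.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;

struct Design          // hypothetical record with default alignment
{
    int id;
    string name;       // slice into the GC heap, naturally aligned here
}

Design* makeDesign(int id, string name)
{
    auto p = cast(Design*) malloc(Design.sizeof);
    assert(p !is null, "malloc failed");
    GC.addRange(p, Design.sizeof); // tell the GC to scan this block for references
    *p = Design(id, name);
    return p;
}

void freeDesign(Design* p)
{
    GC.removeRange(p);             // must happen before the memory is released
    free(p);
}

void main()
{
    auto d = makeDesign(7, "hull".idup);
    GC.collect();                  // `name` stays alive because the range is registered
    assert(d.name == "hull");
    freeDesign(d);
}
```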

Fixing the alignment in Pascal is possible in principle, but any old files would then need to first be processed by the last Pascal version of the programs, which we then would need to keep around indefinitely. There would also be issues when we port from 32 bit to 64 bit.

Another option could be to use 1-byte aligned structs for I/O, and copy the members over in default aligned versions. But this cannot be part of the automated transcompilation.
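
Roughly, that option could look like the sketch below. The struct and field names are invented; the point is only that the packed struct carries no pointers and the default-aligned struct is the one the rest of the program uses.

```d
struct RecordOnDisk    // hypothetical file layout, packed, no pointers
{
align(1):
    ubyte  kind;
    uint   count;
    double length;
}

struct Record          // in-memory version with default alignment
{
    ubyte  kind;
    uint   count;
    double length;
    string note;       // GC references are fine here: the field is aligned
}

/// Copies the members one by one; the compiler handles the unaligned reads.
Record toMemory(ref const RecordOnDisk d)
{
    return Record(d.kind, d.count, d.length, null);
}

static assert(RecordOnDisk.sizeof == 13);

void main()
{
    auto onDisk = RecordOnDisk(2, 5, 1.5);
    auto r = toMemory(onDisk);
    assert(r.kind == 2 && r.count == 5 && r.length == 1.5);
}
```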

Thanks for suggesting a custom gc, which I had not thought of.

I’m leaning towards ditching the memory mapped I/O on the D end, and replacing it with regular serialisation/deserialisation. That will be a manual rewrite though, which is a bit of a bummer as memory mapped files are widely used in our Pascal code. But this will probably give the best end result.
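
A minimal sketch of field-by-field deserialisation with `std.bitmanip.read`, assuming the files are little-endian and using a made-up `Record` layout. Because each field is read individually, the in-memory struct is free to use default alignment.

```d
import std.bitmanip : read;
import std.system : Endian;

struct Record          // hypothetical in-memory record, default alignment
{
    ubyte  kind;
    uint   count;
    double length;
}

/// Consumes 13 bytes from the front of `bytes` (assumed little-endian).
Record deserialise(ref const(ubyte)[] bytes)
{
    Record r;
    r.kind   = bytes.read!(ubyte,  Endian.littleEndian);
    r.count  = bytes.read!(uint,   Endian.littleEndian);
    r.length = bytes.read!(double, Endian.littleEndian);
    return r;
}

void main()
{
    // kind = 2, count = 5, length = 1.5 (0x3FF8_0000_0000_0000)
    const(ubyte)[] bytes = [2, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xF8, 0x3F];
    auto r = deserialise(bytes);
    assert(r.kind == 2 && r.count == 5 && r.length == 1.5);
    assert(bytes.length == 0); // everything was consumed
}
```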

-Bastiaan.
September 28
On 9/28/20 3:28 PM, Bastiaan Veelo wrote:
> I’m leaning towards ditching the memory mapped I/O on the D end, and replacing it with regular serialisation/deserialisation. That will be a manual rewrite though, which is a bit of a bummer as memory mapped files are widely used in our Pascal code. But this will probably give the best end result.

2 things:

1. I agree this is the answer. If you ever ditch the old Pascal code, then you can reactivate the memory-mapped code.
2. You can possibly do the translation outside of your programs. That is, it wouldn't be entirely impossible to simply have a process running that ensures the "D view" and the "Pascal view" of the same file is kept in sync. Then you can keep the memory mapped code the same, and just define sane structures in your D code.

If you aren't required to have both Pascal and D programs reading and writing the file at the same time, this shouldn't be a problem.

BTW, one further thing I don't understand -- if this is memory mapped data, how come it has issues with the GC? And what do the "pointers" mean in the memory mapped data? I'm sure there's good answers, and your actual code is more complex than the simple example, but I'm just curious.

-Steve
September 29
On Monday, 28 September 2020 at 21:58:31 UTC, Steven Schveighoffer wrote:
> On 9/28/20 3:28 PM, Bastiaan Veelo wrote:
>> I’m leaning towards ditching the memory mapped I/O on the D end, and replacing it with regular serialisation/deserialisation. That will be a manual rewrite though, which is a bit of a bummer as memory mapped files are widely used in our Pascal code. But this will probably give the best end result.
>
> 2 things:
>
> 1. I agree this is the answer. If you ever ditch the old Pascal code, then you can reactivate the memory-mapped code.
> 2. You can possibly do the translation outside of your programs. That is, it wouldn't be entirely impossible to simply have a process running that ensures the "D view" and the "Pascal view" of the same file is kept in sync. Then you can keep the memory mapped code the same, and just define sane structures in your D code.
>
> If you aren't required to have both Pascal and D programs reading and writing the file at the same time, this shouldn't be a problem.

There is no need to run both versions concurrently. The issue is that design offices typically maintain a library of past designs for as long as they are in existence, to build new designs off of. So being able to read or import the files that were written with an ancient version of our software is very valuable. Our old compiler offered two alternatives for file i/o: one where all elements are of the same type, the other one (memory mapped files) being the "only" option for files of mixed type. Ideally, the structs that are used for i/o do not have any pointers in them, and certainly in the more recent file versions that would be the case. In older versions that might not be the case; then the pointers obviously would be given meaningful values after the structs would have been read back in. These cases we would be able to work around, though, by converting the old structs to new ones upon import.

> BTW, one further thing I don't understand -- if this is memory mapped data, how come it has issues with the GC? And what do the "pointers" mean in the memory mapped data? I'm sure there's good answers, and your actual code is more complex than the simple example, but I'm just curious.

The main problem is that the transpiler doesn't know which structs are used for i/o and would need 1-byte alignment, and which structs have pointers into GC memory and must not be 1-byte aligned. The alternative to switching to serialisation/deserialisation is to stay with the automated translation of the memory mapped file implementation, but instead of automatically 1-byte aligning every struct, manually align only the ones that are used in i/o. This is however error-prone, and the translated mmfile implementation has a bit of a smell to it. It is also not portable, as it uses the WinAPI directly. Still, it may be the quickest route to get us back on track.
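
One way to catch such mistakes at compile time might be a check along these lines. It is a sketch: `hasMisalignedIndirections` is a hypothetical helper, it only looks at the direct fields of a struct, and it does not recurse into nested structs.

```d
import std.traits : hasIndirections;

/// True if a direct field of T that may hold a pointer sits at an offset
/// that is not a multiple of the pointer size.
bool hasMisalignedIndirections(T)() if (is(T == struct))
{
    bool bad = false;
    static foreach (i; 0 .. T.tupleof.length)
    {
        static if (hasIndirections!(typeof(T.tupleof[i])))
        {
            if (T.tupleof[i].offsetof % (void*).sizeof != 0)
                bad = true;
        }
    }
    return bad;
}

struct Nothing {}

struct Packed  { align(1): ubyte u; Nothing*[2] arr; } // same layout as Info above
struct Natural { ubyte u; Nothing*[2] arr; }
struct PlainIO { align(1): ubyte u; uint n; }          // packed, but no pointers

static assert( hasMisalignedIndirections!Packed());
static assert(!hasMisalignedIndirections!Natural());
static assert(!hasMisalignedIndirections!PlainIO());

void main() {}
```

A `static assert(!hasMisalignedIndirections!S())` next to each manually aligned struct would turn a wrong `align(1)` into a build error rather than a collected live object.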

I am very glad to have identified the problem, and there being ways to deal with it. I just hope this will be the last big hurdle :-)

-Bastiaan.