Thread overview
Known reasons why D crashes without any message?
Sep 13, 2017
Thorsten Sommer
Sep 13, 2017
Vladimir Panteleev
Sep 13, 2017
rikki cattermole
Sep 13, 2017
Moritz Maxeiner
Sep 14, 2017
qznc
Sep 14, 2017
Ali Çehreli
Sep 15, 2017
Thorsten Sommer
Sep 15, 2017
Daniel Kozak
Sep 16, 2017
Johan Engelen
Sep 15, 2017
Swoorup Joshi
Sep 15, 2017
Suliman
Sep 16, 2017
Swoorup Joshi
Sep 17, 2017
Daniel Kozak
Sep 15, 2017
apz28
Sep 16, 2017
Adam D. Ruppe
Sep 15, 2017
Neia Neutuladh
Sep 16, 2017
Thorsten Sommer
September 13, 2017
Dear Community,

My student assistant and I have been working on an artificial intelligence library in D for a while. This library is part of my PhD thesis and is used to perform several experiments to push the state of the art.

(Yes, after the thesis is published, the entire library will be open-sourced on GitHub, including the novel algorithms.)

Right now, we are done with the development and ready to start experiments. So far, almost everything runs fine in our unit tests.

Besides the unit tests, the main program is now able to start up, but it crashes after a while without any message at all. No stack trace, no exception, nothing. Obviously, this makes it hard to debug anything...

To give a rough impression of what the code uses (maybe this information will help to narrow down the possibilities):

- External dependencies: fluent-asserts, requests and our own library quantum-random for physical randomness
- Heavy meta-programming, e.g. with templates, across 9,000 lines of code
- The code was designed to be OOP... composition, inheritance, delegation, polymorphism...
- We call many instances of an external Go program with a Maze simulation (the task for the AI) by using pipeProcess()
- We use parallel foreach loops for scaling (we also have issues with that; I may open another thread for it)
- We send thousands of HTTP requests using the requests library
- The entire simulation runs in Docker containers on huge servers (144 CPU Cores, ~470 GB RAM). Base image uses DMD 2.076.0 + Ubuntu Server 16.04

Are there any well-known circumstances, bugs, etc. where an abrupt interruption of a D program without any message is possible? My expectation was that I would receive at least a stack trace. For debugging, I disabled parallelism entirely in order to rule out effects such as exceptions hidden in threads, missing or wrong variable sharing, etc.

I would be grateful for any ideas, as I am currently stuck and no longer know how and where to continue debugging.


Best regards
Thorsten

September 13, 2017
On Wednesday, 13 September 2017 at 10:20:48 UTC, Thorsten Sommer wrote:
> Are there any well-known circumstances, bugs, etc. where an abrupt interruption of a D program without any message is possible?

A stack overflow is one.

Why not run the program under a debugger?
September 13, 2017
1) You really need to switch to LDC; even for small neural networks, it makes a MASSIVE difference!
2) In release mode, who knows what will happen. Add some logging (in a version or debug block, of course) to help figure out where things go wrong.
3) Wrap it in a try-catch and write out the message yourself. FYI, you want to catch Error, not Exception.

Not terribly helpful, but a good place to begin anyway.
Of course, if somebody is calling the C exit function, it may very well bypass D's exception handling altogether.
September 13, 2017
On Wednesday, 13 September 2017 at 10:20:48 UTC, Thorsten Sommer wrote:
> [...]
>
> Besides the unit tests, the main program is now able to start up, but it crashes after a while without any message at all. No stack trace, no exception, nothing. Obviously, this makes it hard to debug anything...
>
> [...]
>
> Are there any well-known circumstances, bugs, etc. where an abrupt interruption of a D program without any message is possible? My expectation was that I would receive at least a stack trace. For debugging, I disabled parallelism entirely in order to rule out effects such as exceptions hidden in threads, missing or wrong variable sharing, etc.
>
> [...]

Things that D generally leaves to the platform to deal with (such as null pointer dereferences) won't yield a message from the D side.
What is the exit code of the program? If it's of the form `128+n` with `n == SIGXYZ`, you know more about why it crashed [1]. If the exit code is 139, for example, you know some code tried to access memory via an invalid reference (as SIGSEGV == 11 on Linux x64), which often means you dereferenced a null pointer somewhere.
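
A minimal sketch of that decoding, written in Python for brevity. `describe_exit` is a hypothetical helper name, and the handling of negative values is an addition that covers Python's `subprocess`, which reports death by signal n as -n instead of 128+n.

```python
import signal


def describe_exit(code: int) -> str:
    """Describe an exit status from a shell (128+n) or Python's subprocess (-n)."""
    if code == 0:
        return "exited normally"
    # A shell reports death by signal n as 128+n; subprocess reports it as -n.
    if code < 0:
        signum = -code
    elif code > 128:
        signum = code - 128
    else:
        return f"exited with status {code}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Not a known signal number, so treat it as an ordinary exit status.
        return f"exited with status {code}"
    return f"killed by {name} (signal {signum})"


print(describe_exit(139))
```

On Linux, where SIGSEGV is signal 11, this reports 139 as a SIGSEGV. It is only a heuristic, since a program can also pass a value above 128 to exit on its own.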

[1] http://www.tldp.org/LDP/abs/html/exitcodes.html
September 14, 2017
On Wednesday, 13 September 2017 at 10:20:48 UTC, Thorsten Sommer wrote:
> Right now, we are done with the development and ready to start experiments. So far, almost everything runs fine in our unit tests.
>
> Besides the unit tests, the main program is now able to start up, but it crashes after a while without any message at all. No stack trace, no exception, nothing. Obviously, this makes it hard to debug anything...

I assume you see a return code which is nonzero, because you say it "crashes". Which one?

The most likely cause is a segmentation fault (invalid memory access, stack overflow, null pointer dereference, etc.). Use a debugger. Compile with debug info and run the program under gdb. It should stop right where it crashes and can show you a stack trace. If necessary, inspect the values of variables.

If gdb does not stop on its own, someone is calling exit to terminate prematurely. Set a breakpoint at exit to get a stack trace.

If you cannot use gdb on your server and you cannot trigger the crash on your desktop, maybe you can let it coredump on the server? Then use gdb to inspect the dump.

Did you try to annotate your code with @safe? It helps to avoid errors leading to segmentation faults.
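
The gdb run described above can be scripted, which helps when the crash only happens inside a container. A minimal sketch in Python, assuming gdb is installed and the binary was built with debug info (for DMD, the `-g` switch). The helper names and the `./app` path are placeholders, not anything from the original program.

```python
import subprocess


def gdb_command(program, args=()):
    """Build a gdb command line that runs the program and prints a backtrace."""
    return [
        "gdb",
        "-batch",        # exit after the commands below instead of prompting
        "-ex", "run",    # start the program; gdb stops where it crashes
        "-ex", "bt",     # print the stack trace of the stopped program
        "--args", program, *args,
    ]


def run_under_gdb(program, args=()):
    """Run the program under gdb and return gdb's exit status."""
    return subprocess.run(gdb_command(program, args)).returncode


# Example (not executed here): run_under_gdb("./app", ["--some-flag"])
```

If the program exits normally, the `bt` step simply reports that there is no stack.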

September 14, 2017
On 09/13/2017 03:20 AM, Thorsten Sommer wrote:

> No stack trace, no exception, nothing.

Maybe the OOM Killer if running on Linux.

Ali

September 15, 2017
Thank you very much for the different approaches. Vladimir, I installed GDB today and am trying to gain new insights with it. Rikki, we are aware of the advantages of LDC, but first of all we want the program to run with DMD. After that, we will switch to LDC.

I have already put try-catch blocks catching "Throwable" around all parts of the program, and we also use logging. Unfortunately, neither measure helps.

Moritz, thank you for the idea of checking the exit code. I have adjusted the Dockerfile accordingly: Our code leads to at least one segmentation fault. I hope to be able to identify the position with GDB.

Qznc, we just put your suggestion into practice and hope to find out more with GDB now. I installed GDB in the Docker container and automated the launch. It should work; the test is running while I am writing this text.

Ali, thanks for the tip about the OOM killer; I was not aware of it. At the moment, the segmentation fault occurs before we even come close to a memory limit. However, I will keep this in mind for further work and testing.

Thank you all so much. We will now work with GDB and hopefully solve the problem.
September 15, 2017
On Wednesday, 13 September 2017 at 10:20:48 UTC, Thorsten Sommer wrote:
> Dear Community,
>
> My student assistant and I have been working on an artificial intelligence library in D for a while. This library is part of my PhD thesis and is used to perform several experiments to push the state of the art.
>
> [...]

I had the same issue trying to use the std.experimental.xml library.

* Ran an example
* It crashed somewhere in a POSIX C library call while writing to a file.
* Gave up, now looking at another programming language (Rust)
September 15, 2017
http://vibed.org/docs#handling-segmentation-faults
this should help

On Fri, Sep 15, 2017 at 8:17 AM, Thorsten Sommer via Digitalmars-d < digitalmars-d@puremagic.com> wrote:

> Thank you very much for the different approaches. Vladimir, I installed GDB today and am trying to gain new insights with it. Rikki, we are aware of the advantages of LDC, but first of all we want the program to run with DMD. After that, we will switch to LDC.
>
> I have already put try-catch blocks catching "Throwable" around all parts of the program, and we also use logging. Unfortunately, neither measure helps.
>
> Moritz, thank you for the idea of checking the exit code. I have adjusted the Dockerfile accordingly: Our code leads to at least one segmentation fault. I hope to be able to identify the position with GDB.
>
> Qznc, we just put your suggestion into practice and hope to find out more with GDB now. I installed GDB in the Docker container and automated the launch. It should work; the test is running while I am writing this text.
>
> Ali, thanks for the tip about the OOM killer; I was not aware of it. At the moment, the segmentation fault occurs before we even come close to a memory limit. However, I will keep this in mind for further work and testing.
>
> Thank you all so much. We will now work with GDB and hopefully solve the problem.
>


September 15, 2017
On Friday, 15 September 2017 at 06:22:01 UTC, Swoorup Joshi wrote:
> On Wednesday, 13 September 2017 at 10:20:48 UTC, Thorsten Sommer wrote:
>> Dear Community,
>>
>> My student assistant and I have been working on an artificial intelligence library in D for a while. This library is part of my PhD thesis and is used to perform several experiments to push the state of the art.
>>
>> [...]
>
> I had the same issue trying to use the std.experimental.xml library.
>
> * Ran an example
> * It crashed somewhere in a POSIX C library call while writing to a file.
> * Gave up, now looking at another programming language (Rust)

What did you expect from an unofficial alpha package?