December 23, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8774


Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh@gmail.com


--- Comment #9 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-12-23 01:58:21 PST ---
This one was actually simple to decipher - map recomputes values across
multiple iterations. I instrumented the code to show
a) ref address of threads in start loop and join loop
b) print Mapped each time a functor that creates the thread is run.

I get 2 times and 2 different addresses. The only question left is why it ever worked.

The right aproach would be to call array on result of task map and do not attempt 2nd lazy evluation of it.

Modifed source below:


import std.algorithm ;
import std.datetime ;
import std.range ;
import std.stdio;

import core.thread ;

import output_d ;

shared double sum ;
shared Object sumMutex ;

void partialSum ( immutable int id , immutable int sliceSize , immutable double
delta ) {
  immutable start = 1 + id * sliceSize ;
  immutable end = ( id + 1 ) * sliceSize ;
  auto localSum = 0.0 ;
  foreach ( i ; start .. end + 1 ) {
    immutable x = ( i - 0.5 ) * delta ;
    localSum += 1.0 / ( 1.0 + x * x ) ;
  }
  synchronized ( sumMutex ) { sum += localSum ; }
}

void execute ( immutable int numberOfThreads ) {
  immutable n = 1000000000 ;
  immutable delta = 1.0 / n ;
  StopWatch stopWatch ;
  stopWatch.start ( ) ;
  immutable sliceSize = n / numberOfThreads ;
  sum = 0.0 ;
  auto threads = map ! ( ( int i ) {
      auto closedPartialSum ( ) {
        immutable ii = i ;
        return delegate ( ) { partialSum ( ii , sliceSize , delta ) ; } ;
      }
      writeln("Mapped!");
      return new Thread ( closedPartialSum ) ;
      } ) ( iota ( numberOfThreads ) ) ;
  foreach ( thread ; threads ) {
    writefln("%x", cast(void*)thread);
    thread.start ( ) ;
  }
  foreach ( thread ; threads ) {
    writefln("%x", cast(void*)thread);
    thread.join ( ) ;
  }
  immutable pi = 4.0 * delta * sum ;
  stopWatch.stop ( ) ;
  immutable elapseTime = stopWatch.peek ( ).hnsecs * 100e-9 ;
  output ( __FILE__ , pi , n , elapseTime , numberOfThreads ) ;
}

int main ( immutable string[] args ) {
  sumMutex = new shared ( Object ) ;
  execute ( 1 ) ;
  execute ( 2 ) ;
  execute ( 8 ) ;
  execute ( 32 ) ;
  return 0 ;
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
December 23, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8774



--- Comment #10 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-12-23 02:32:18 PST ---
(In reply to comment #8)
> Here is a simple test case:
> 
> -----
> 
> module program;
> 
> import std.stdio;
> import core.thread;
> 
> void main () {
> 
>   Thread t1, t2;
> 
>   t1 = new Thread(delegate { t2.start(); });
>   t2 = new Thread(delegate { Thread.sleep(dur!"seconds"(1)); });
> 
>   t1.start();
>   t2.join();
> 
> }
> 
> -----
> 
> http://dpaste.dzfl.pl/0d24dd06
> 
> output:
>   core.thread.ThreadException@src/core/thread.d(780): Unable to join thread
> 
> if t2.join occurs after t2 already finished then exception is not thrown, hence the sleep

This one is a genuine race condition: t2.join could be called before t2 is actually started by t1. And as far as I can tell this is the *most* *probable* outcome. So it can't be seriously taken as test case without proper synchronization between threads.

What it shows though is that you can't join a thread that isn't started and the error is "Unable to join thread".

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
December 25, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8774


Walter Bright <bugzilla@digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla@digitalmars.com


--- Comment #11 from Walter Bright <bugzilla@digitalmars.com> 2012-12-25 03:36:13 PST ---
Dmitry, does this mean this is not a bug in the compiler or library?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
December 25, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8774



--- Comment #12 from Alex Rønne Petersen <alex@lycus.org> 2012-12-25 12:40:19 CET ---
Well, we probably should make sure that by the time Thread.start() returns, the
thread is joinable (waiting for the thread to start in Thread.join() would
probably be undesirable).

In any case, it probably isn't a bug, but a very reasonable enhancement request - I know I'd expect it to behave like that.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
December 25, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8774



--- Comment #13 from luka8088 <luka8088@owave.net> 2012-12-25 04:42:33 PST ---
(In reply to comment #10)
> (In reply to comment #8)
> > Here is a simple test case:
> > 
> > -----
> > 
> > module program;
> > 
> > import std.stdio;
> > import core.thread;
> > 
> > void main () {
> > 
> >   Thread t1, t2;
> > 
> >   t1 = new Thread(delegate { t2.start(); });
> >   t2 = new Thread(delegate { Thread.sleep(dur!"seconds"(1)); });
> > 
> >   t1.start();
> >   t2.join();
> > 
> > }
> > 
> > -----
> > 
> > http://dpaste.dzfl.pl/0d24dd06
> > 
> > output:
> >   core.thread.ThreadException@src/core/thread.d(780): Unable to join thread
> > 
> > if t2.join occurs after t2 already finished then exception is not thrown, hence the sleep
> 
> This one is a genuine race condition: t2.join could be called before t2 is actually started by t1. And as far as I can tell this is the *most* *probable* outcome. So it can't be seriously taken as test case without proper synchronization between threads.
> 
> What it shows though is that you can't join a thread that isn't started and the error is "Unable to join thread".


Yes, you are correct, it does not make much sense, but there is another issue. The following code throws an exception on 64-bit linux (32-bit linux and 32-bit windows executes without throwing an exception). On 64-bit linux t2 is never started.

-----

module program;

import std.stdio;
import core.thread;

void main () {

  Thread t1, t2;
  bool runned = false;

  t1 = new Thread(delegate { t2.start(); });
  t2 = new Thread(delegate { runned = true; });

  t1.start();
  Thread.sleep(dur!"seconds"(1));
  assert(runned);

}

-----

http://dpaste.dzfl.pl/5dc9733e


Should I file a new bug report ?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
December 25, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8774



--- Comment #14 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-12-25 06:03:35 PST ---
(In reply to comment #11)
> Dmitry, does this mean this is not a bug in the compiler or library?

From what I can gather the bug is not in compiler but in the original code making wrong assumptions about behavior of map in Phobos(it doesn't cache any value but computes them on demand).

I suspect that this problem with the code was apparently hidden by the fact that join didn't always throw exception on the wrong join (e.g. on a thread that isn't started). In other words the code worked by pure luck and it needs fixing.

However I'm still getting access violation after fixing the original bug. I'll investigate further as to what is the cause. It may be an unrelated bug with a closure as there are 3 level of these.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
December 25, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8774



--- Comment #15 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-12-25 06:14:16 PST ---
(In reply to comment #13)
> 
> Yes, you are correct, it does not make much sense, but there is another issue. The following code throws an exception on 64-bit linux (32-bit linux and 32-bit windows executes without throwing an exception). On 64-bit linux t2 is never started.
> 
How did you get to this conclusion?

> -----
> 
> module program;
> 
> import std.stdio;
> import core.thread;
> 
> void main () {
> 
>   Thread t1, t2;
>   bool runned = false;
> 
>   t1 = new Thread(delegate { t2.start(); });
>   t2 = new Thread(delegate { runned = true; });
> 
>   t1.start();
>   Thread.sleep(dur!"seconds"(1));
>   assert(runned);
> 
> }
> 
> -----
> 
> http://dpaste.dzfl.pl/5dc9733e
> 
> 
> Should I file a new bug report ?

And again there is no guarantee that:
a) access to runned is properly guarded and made visible in all threads as it's
on stack and not declared as shared nor there are memory barriers. TEchnically
compiler is free to do what the heck it wants to.
b) sleep is not a tool to coordinate threads, use locks and condition variables
(or spin on atomic variables) as there is *always* a *chance* to get to
assert(runned) before t2 is executed.

With all that being said it very well *may* be a bug. Yet the test is still bogus. Again until proper synchronization is enacted it's wrong to conclude *anything* aside from the fact that it's ultimately indeterministic.

So if you'd get this in a form where there are no probabilities involved then sure it's a bug and the report is appreciated.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
December 26, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8774


Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|2.059 worked 2.060 does     |2.059 worked 2.060 does
                   |not: Unable to join thread  |not: nested delegate memory
                   |                            |corruption


--- Comment #16 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-12-26 03:57:08 PST ---
Okay I drilled this down and the bug is not related to threading.

Short-list for Russel:
- the bug in the original code is fixed by .array on result of map
- to workaround a strange stackframe corruption bug pass 'i' instead of 'ii' in
partialSum and it works.


Now on to the real problem discovered, I guess Walter has some work to do though as it doesn't look related to threading and run-time at all.

The minimized test case with threading removed,
shows that variable sliceSize gets corrupted:

import std.stdio;
import std.algorithm;
import std.range;

int sliceBefore;

void partialSum ( immutable int id , immutable int sliceSize , immutable double
delta ) {
  writeln("PSUM: ", id, "  SLICE: ", sliceSize, " DELTA:", delta);
  assert(sliceSize == sliceBefore);
}

void corrupt( immutable int numberOfThreads ) {
    immutable n = 1000000000 ;
    immutable delta = 1.0 / n ;
    immutable sliceSize = n / numberOfThreads ;
    sliceBefore = sliceSize;
    writeln("REAL SLICE SIZE: ", sliceSize);
    auto threads = map ! ( ( int i ) {
        auto closedPartialSum ( ) {
            immutable ii = i ;
            return delegate ( ) {
                                //change ii to i and it works
                partialSum (ii , sliceSize , delta ) ;
            };
        }
        return closedPartialSum;
    }) ( iota ( numberOfThreads ) ).array;
    foreach(dg; threads)
        dg();
}

void main(){
  corrupt(1);
}


Yields the following (exact corrupted number in SLICE varies):

REAL SLICE SIZE: 1000000000
PSUM: 0  SLICE: 1588842464 DELTA:1e-09
core.exception.AssertError@quadrature(15): Assertion failure
....

Can somebody having 2.059 check if this simplified test case passes there.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
December 26, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8774



--- Comment #17 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-12-26 05:19:36 PST ---
Spent some more time & removed Phobos dependencies.
The sample below runs fine with -version=OK and triggers assert without.
Curiously the only difference is a = a[1..$] vs the same via popFront (where
array is passed by ref).

void popFront(A)(ref A[] a)
{
    a = a[1 .. $];
}

private struct MapResult(alias fun, Range)
{
    alias Range R;
    R _input;

    this(R input)
    {
        _input = input;
    }

    @property bool empty()
    {
        return _input.length == 0;
    }

    void popFront()
    {
        version(OK)
            _input = _input[1 .. $];
        else
            _input.popFront();
        //
    }

    @property auto ref front()
    {
        return fun(_input[0]);
    }

    @property auto length()
    {
        return _input.length;
    }

    alias length opDollar;
}

auto map(alias fun, Range)(Range r)
{
    return MapResult!(fun, Range)(r);
}

int sliceBefore;

void corrupt( immutable int numberOfThreads ) {
    immutable n = 1000000000 ;
    immutable delta = 1.0 / n ;
    immutable sliceSize = n / numberOfThreads ;
    sliceBefore = sliceSize;
    auto threads = map!( ( int i ) {
        auto closedPartialSum() {
            immutable ii = i ;
            return delegate ( ) {
                assert(sliceSize == sliceBefore);
            };
        }
        return closedPartialSum();
    }) ( [0] );

    //create array of delegates and copy them using range-based foreach
    auto dgs = new void delegate()[threads.length];
    size_t i=0;
    foreach(dg; threads) //empty-front-popFront
        dgs[i++] = dg;

    foreach(dg; dgs)
        dg();
}

void main(){
  corrupt(1);
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
December 26, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8774


Maxim Fomin <maxim@maxim-fomin.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maxim@maxim-fomin.ru


--- Comment #18 from Maxim Fomin <maxim@maxim-fomin.ru> 2012-12-26 05:28:43 PST ---
This looks like template lambda bug, but strictly speaking it is not. However, symptoms are same - usage of map with delegates/lambdas corrupts values.

May be related: issue 8899 and
- 8514 : delegate and map leads to segfault, importing standard module affects
behavior
- 8854 : as described above
- 8832 : similar code which uses map and delegate has different and incorrect
behavior
- 5064 (???): program crashes when using map and delegate
- 7978 : map, take and iota functions with delegates

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------