May 24, 2013 Re: Passing large or complex data structures to threads

On 2013-05-24, 15:26, Joseph Rushton Wakeling wrote:
> Hello all,
>
> Are there any recommended strategies for passing large or complex data structures (particularly reference types) to threads?
>
> For the purpose of this discussion we can assume that it's read-only data, so if we're talking about just an array (albeit perhaps a large one) I guess just passing an .idup copy would be best. However, the practical situation I have is a data structure of the form,
>
> Tuple!(size_t, size_t)[][]
>
> ... which I _could_ .idup, but it's a little bit of a hassle to do so, so I'm wondering if there are alternative ways or suggestions.

First, *is* it read-only? If so, store it as immutable and enjoy free sharing. If not, how and why not?

--
Simen
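What "free sharing" means in practice can be sketched as follows, assuming `std.concurrency` is used for the threads (the thread does not say which mechanism is in play). The alias `Pair`, the functions `worker` and `shareOnce`, and the sample data are hypothetical. Because the array is fully `immutable`, `spawn` accepts it without any copy or cast, and the worker reports back the same base address the owner sees.

```d
import std.concurrency : ownerTid, receiveOnly, send, spawn;
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

alias Pair = Tuple!(size_t, size_t); // hypothetical shorthand for the element type

// Runs in its own thread: reads the shared immutable data and reports back
// a checksum plus the address of the outer array as seen from this thread.
void worker(immutable(Pair[][]) data)
{
    size_t total = 0;
    foreach (row; data)
        foreach (p; row)
            total += p[0] + p[1];
    ownerTid.send(total, cast(size_t) data.ptr);
}

// Spawns one worker and returns (checksum, whether the worker saw the same memory).
Tuple!(size_t, bool) shareOnce()
{
    immutable Pair[][] data = [[Pair(1, 2), Pair(3, 4)], [Pair(5, 6)]];
    spawn(&worker, data); // immutable data: accepted as-is, nothing is copied
    auto reply = receiveOnly!(size_t, size_t)();
    return tuple(reply[0], reply[1] == cast(size_t) data.ptr);
}

void main()
{
    auto r = shareOnce();
    writeln("checksum = ", r[0], ", same memory: ", r[1]);
}
```

Mutable or `const` data would be rejected by `spawn` at compile time, which is the safety net that makes the immutable route attractive.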
May 26, 2013 Re: Passing large or complex data structures to threads

On 05/24/2013 04:39 PM, Simen Kjaeraas wrote:
> First, *is* it read-only? If so, store it as immutable and enjoy free sharing. If not, how and why not?
I can confess that it's as simple as feeling extremely uncomfortable dealing with immutable where it relates to any kind of complex data structure.
I mean, for that double array structure I'd have to do something like,
```d
Tuple!(size_t, size_t)[][] dataCopy;

foreach(x; data) {
    Tuple!(size_t, size_t)[] xCopy;
    foreach(y; x) {
        immutable(Tuple!(size_t, size_t)) yCopy = y.idup;
        xCopy ~= cast(Tuple!(size_t, size_t)) yCopy;
    }
    immutable xImm = assumeUnique(xCopy);
    dataCopy ~= cast(Tuple!(size_t, size_t)[]) xImm;
}

immutable dataImm = assumeUnique(dataCopy);
```
... no? Which feels like a lot of hassle compared to just being able to pass each thread the information to independently load the required data.
I'd be delighted to discover I'm wrong about the hassle of converting the data to immutable -- I don't think I understand how to use it at all well, bad experiences in the past have meant that I've tended to avoid it.
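If the original really must stay mutable, so that a separate immutable copy is needed, the nested loops and casts can still be avoided. A minimal sketch, assuming only that the element type has no indirections (true for `Tuple!(size_t, size_t)`); `Pair` and `toImmutableCopy` are hypothetical names. Each row is `.idup`'ed, and the resulting array of immutable rows is `.idup`'ed once more.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.stdio : writeln;
import std.typecons : Tuple;

alias Pair = Tuple!(size_t, size_t); // hypothetical shorthand

// Deep copy of a jagged array into fully immutable storage:
// row.idup copies each inner array, .array collects the immutable rows,
// and the final .idup makes the outer array immutable as well.
immutable(Pair[][]) toImmutableCopy(const(Pair[][]) data)
{
    return data.map!(row => row.idup).array.idup;
}

void main()
{
    Pair[][] data = [[Pair(0, 0), Pair(0, 1)], [Pair(1, 0)]];
    auto copy = toImmutableCopy(data);
    data[0][0] = Pair(9, 9); // the original stays mutable...
    writeln(copy[0][0]);     // ...and the copy still holds the old value
}
```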

May 26, 2013 Re: Passing large or complex data structures to threads

On Sun, 26 May 2013 14:06:39 +0200, Joseph Rushton Wakeling <joseph.wakeling@webdrake.net> wrote:
> I mean, for that double array structure I'd have to do something like,
>
>     Tuple!(size_t, size_t)[][] dataCopy;
>
>     foreach(x; data) {
>         Tuple!(size_t, size_t)[] xCopy;
>         foreach(y; x) {
>             immutable(Tuple!(size_t, size_t)) yCopy = y.idup;
>             xCopy ~= cast(Tuple!(size_t, size_t)) yCopy;
>         }
>         immutable xImm = assumeUnique(xCopy);
>         dataCopy ~= cast(Tuple!(size_t, size_t)[]) xImm;
>     }
>
>     immutable dataImm = assumeUnique(dataCopy);
>
> ... no? Which feels like a lot of hassle compared to just being able to pass each thread the information to independently load the required data.

That looks very complex for what it purports to do. I take it `data` is the original data, before sharing? If so, will that array ever change again?

I think a bit more information is needed. I'm going to assume this is (roughly) how things work:

1. Read from file/generate/load from database/create data.
2. Share data with other threads.
3. Never change data again.

If this is correct, this should work:

```d
import std.exception : assumeUnique;
import std.typecons : Tuple;

// createData() and sendToOtherThreads() stand for your own code.
Tuple!(size_t, size_t)[][] data = createData();
immutable dataImm = assumeUnique(data);
data = null; // Simply to ensure no mutable references exist.
sendToOtherThreads(dataImm);
```

And that's it. If nobody's going to change the data again, it's perfectly safe to tell the compiler 'this is now immutable'. No copies need to be made, no idup, no explicit casting (except that done internally by assumeUnique), no troubles.

--
Simen
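A related option avoids `assumeUnique` entirely, assuming the data can be built by a `pure` function whose parameters carry no mutable references. D lets the result of such a call convert implicitly to `immutable`, because nothing else can still refer to the freshly allocated array. The `createData` below is a hypothetical stand-in for the one in the reply above (its parameters are made up), and `Pair` is shorthand.

```d
import std.stdio : writeln;
import std.typecons : Tuple;

alias Pair = Tuple!(size_t, size_t); // hypothetical shorthand

// Hypothetical stand-in for createData(). It is pure and takes only value
// parameters, so its mutable result is unique and may become immutable.
Pair[][] createData(size_t rows, size_t cols) pure
{
    auto result = new Pair[][](rows, cols);
    foreach (i, ref row; result)
        foreach (j, ref p; row)
        {
            p[0] = i;
            p[1] = j;
        }
    return result;
}

void main()
{
    // Implicit conversion: no cast and no assumeUnique needed.
    immutable Pair[][] data = createData(2, 3);
    writeln(data);
}
```

If `createData` took a mutable slice or pointer parameter, the compiler would no longer accept the implicit conversion, since the result could then alias that argument.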

May 26, 2013 Re: Passing large or complex data structures to threads

Posted in reply to Simen Kjaeraas

On 05/26/2013 05:38 AM, Simen Kjaeraas wrote:
> Tuple!(size_t, size_t)[][] data = createData();
> immutable dataImm = assumeUnique(data);
> data = null; // Simply to ensure no mutable references exist.

The last line is not needed. assumeUnique already does that. :)

Ali
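Ali's point can be checked directly. When `std.exception.assumeUnique` is called on a mutable variable, the `ref` overload is chosen: it returns the slice typed as `immutable` and resets the original variable to `null`, without copying the memory. A minimal check, with `Pair` as hypothetical shorthand:

```d
import std.exception : assumeUnique;
import std.typecons : Tuple;

alias Pair = Tuple!(size_t, size_t); // hypothetical shorthand

void main()
{
    Pair[][] data = [[Pair(0, 1)], [Pair(2, 3)]];
    immutable addr = cast(size_t) data.ptr; // remember where the array lives

    immutable dataImm = assumeUnique(data);

    assert(data is null);                     // the mutable reference is gone
    assert(cast(size_t) dataImm.ptr == addr); // same memory, now immutable
    assert(dataImm[1][0][1] == 3);
}
```

Passing an rvalue (for example `assumeUnique(createData())`) selects the non-`ref` overload instead, which has nothing to reset.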

May 26, 2013 Re: Passing large or complex data structures to threads

Posted in reply to Ali Çehreli

On Sun, 26 May 2013 17:59:32 +0200, Ali Çehreli <acehreli@yahoo.com> wrote:
> On 05/26/2013 05:38 AM, Simen Kjaeraas wrote:
>> Tuple!(size_t, size_t)[][] data = createData();
>> immutable dataImm = assumeUnique(data);
>> data = null; // Simply to ensure no mutable references exist.
>
> The last line is not needed. assumeUnique already does that. :)

Cool. I thought it might.

--
Simen

May 27, 2013 Re: Passing large or complex data structures to threads

Posted in reply to Ali Çehreli

On 05/26/2013 05:59 PM, Ali Çehreli wrote:
> On 05/26/2013 05:38 AM, Simen Kjaeraas wrote:
>
>
>> Tuple!(size_t, size_t)[][] data = createData();
>> immutable dataImm = assumeUnique(data);
>> data = null; // Simply to ensure no mutable references exist.
>
> The last line is not needed. assumeUnique already does that. :)
That's fantastic, thank you both very much. Does that also work for arbitrary data structures (e.g. also associative arrays, complex structs/classes etc.)?
Related question -- assume that I now want to store that immutable data inside a broader storage class, but I want that storage class to be agnostic as to whether the data is immutable, const or mutable.
Something like this:
```d
import std.typecons : Tuple;

class MyDataStore
{
    float[] someData;
    uint[] someMoreData;
    Tuple!(size_t, size_t)[][] importedData;

    this(float[] sd, uint[] smd, Tuple!(size_t, size_t)[][] id)
    {
        someData = sd;
        someMoreData = smd;
        importedData = id;
    }
}
```
... which of course fails if you try passing it immutable data for any of the parameters. So, is there a way to make this broader storage class type-qualifier-agnostic?
I guess applying "inout" to the input parameters is necessary, but it's clearly not sufficient as the code then fails when trying to assign to the class' internal variables.

May 27, 2013 Re: Passing large or complex data structures to threads

On Mon, 27 May 2013 14:08:12 +0200, Joseph Rushton Wakeling <joseph.wakeling@webdrake.net> wrote:
> That's fantastic, thank you both very much. Does that also work for arbitrary data structures (e.g. also associative arrays, complex structs/classes etc.)?

Absolutely. So long as your code does not squirrel away other, mutable references to the data, assumeUnique is perfectly safe.

> ... which of course fails if you try passing it immutable data for any of the parameters. So, is there a way to make this broader storage class type-qualifier-agnostic?
>
> I guess applying "inout" to the input parameters is necessary, but it's clearly not sufficient as the code then fails when trying to assign to the class' internal variables.

A few questions:

Why use a class? Will MyDataStore be subclassed?

Will you have some instances of MyDataStore that will be mutated, and others that will always stay the same?

If the answer was yes, will these be in the same array?

Short answer: If you will have mixed arrays, no. There's no way to make that safe. If you don't have mixed arrays, there are ways. This will work:

```d
import std.stdio : writeln;
import std.exception : assumeUnique;
import std.typecons : Tuple, tuple;

class MyDataStore
{
    float[] someData;
    uint[] someMoreData;
    Tuple!(size_t, size_t)[][] importedData;

    inout this(inout float[] sd, inout uint[] smd, inout Tuple!(size_t, size_t)[][] id)
    {
        someData = sd;
        someMoreData = smd;
        importedData = id;
    }
}

void main()
{
    // Build the tuples with size_t fields: tuple(0u, 0u) yields Tuple!(uint, uint),
    // which does not convert to Tuple!(size_t, size_t) when size_t is 64-bit.
    alias Pair = Tuple!(size_t, size_t);

    float[] sdMut = [1, 2, 3];
    uint[] smdMut = [4, 5, 6];
    Pair[][] idMut = [[Pair(0, 0), Pair(0, 1)], [Pair(1, 0), Pair(1, 1)]];

    immutable float[] sdImm = [1, 2, 3];
    immutable uint[] smdImm = [4, 5, 6];
    immutable Pair[][] idImm = [[Pair(0, 0), Pair(0, 1)], [Pair(1, 0), Pair(1, 1)]];

    auto a = new MyDataStore(sdMut, smdMut, idMut);
    immutable b = new immutable MyDataStore(sdImm, smdImm, idImm);
    const c = new const MyDataStore(sdImm, smdMut, idImm); // mixed arguments: const
}
```

(Tested with 2.063 beta, it's possible there are complications in 2.062)

--
Simen
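On the associative-array part of Joseph's question: `std.exception.assumeUnique` has overloads for slices and for associative arrays, so an AA built in one place can be handed over the same way. A small sketch with made-up word counts. (There is no `assumeUnique` overload for a struct or class object; the same uniqueness argument would have to be expressed some other way, for example an explicit `cast(immutable)` or a pure factory function.)

```d
import std.exception : assumeUnique;
import std.stdio : writeln;

void main()
{
    // Build a mutable associative array in the usual way (sample data).
    size_t[string] counts;
    counts["alpha"] = 3;
    counts["beta"] = 5;

    // Same idea as with slices: hand over the only reference, get an immutable AA back.
    immutable frozen = assumeUnique(counts);

    assert(counts is null); // the mutable handle has been cleared
    assert(frozen["beta"] == 5);
    writeln(frozen);
}
```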

May 28, 2013 Re: Passing large or complex data structures to threads

On 05/27/2013 11:33 PM, Simen Kjaeraas wrote:
> A few questions:
>
> Why use a class? Will MyDataStore be subclassed?

It was important to me that it have reference semantics, in particular that a = b implies a is b.

> Will you have some instances of MyDataStore that will be mutated, and others that will always stay the same?
>
> If the answer was yes, will these be in the same array?

I'm not sure I understand the question. If you mean, are there certain members of the class that will be mutated and some that won't, then yes. So, I don't think I can follow your example of an immutable instance of the whole class.

I'll give a longer explanation of what I'm trying to do, just for context.

I'm carrying out Monte Carlo simulations and have been trying to write a fairly generic set of code for that purpose. Essentially I define a range which covers successive steps of the Monte Carlo process.

What we're doing here is simulating a model on a system with a given configuration. The Monte Carlo process randomly mutates the configuration and then either accepts or rejects the mutation depending on a given fitness function.

So, the process needs to be handed two sets of data. The first, which I call the "state", defines the variables that change when the model is run. The second set, which I call the "seed", are the parameters that are constant relative to the model being examined, but that can be mutated by the Monte Carlo process. (So, for example, one can optimize the configuration according to certain criteria for how we want the model to behave.)

Now, depending on the models I'm examining, obviously the contents of the state and the seed may vary. The solution I found was to define a struct of the form,

```d
struct MonteCarlo(State, Seed /* some other parameters */)
{
    this(ref State st, ref Seed sd /* other stuff */)
    {
    }
}
```

... and internally, this stores three State/Seed pairs: one which stores the optimal solution found; one which stores the currently selected state and seed; and finally, one which stores the mutated state and seed.

I guess there could be other ways to handle these kinds of variable input data in a generic way, but the easiest I could think of was just to define State and Seed storage classes that would gather together all the relevant variables. Both have forms along the lines of

```d
class StateInstance
{
    double[] a;
    double[] b;
    size_t c;
}

class SeedInstance
{
    double[] d;
    size_t[] e;
    Tuple!(size_t, size_t)[][] f;
}
```

Now, _some_ of what goes into the Seed can be data imported from a file, which really will never change, and it's convenient to pass it to threads as immutable. But I don't want to force it to _always_ be immutable inside the Seed class, because there could be other cases where it's that data that's being mutated by the Monte Carlo process.

All of this feels like a lot of fuss over not a lot, because I have working solutions -- it just needs to be edited and recompiled in order to run with different input data, which is not actually that onerous for my use case. But it'd be nice to be able to tidy everything up a bit in case it can be useful to other people when it gets released.

Hence the question about passing data to threads, and then the problem of how to incorporate that data into a storage class.

> Short answer: If you will have mixed arrays, no. There's no way to make that safe. If you don't have mixed arrays, there are ways.

So you mean there's no way to have one member variable be immutable, the rest mutable, without hardcoding that into the class/struct design?

May 28, 2013 Re: Passing large or complex data structures to threads

Posted in reply to Joseph Rushton Wakeling

On 05/27/2013 06:55 PM, Joseph Rushton Wakeling wrote:
> On 05/27/2013 11:33 PM, Simen Kjaeraas wrote:
>> Short answer: If you will have mixed arrays, no. There's no way to make
>> that safe. If you don't have mixed arrays, there are ways.
>
> So you mean there's no way to have one member variable be immutable, the rest
> mutable, without hardcoding that into the class/struct design?
That is a difficult situation to manage, though: what operations are valid under each of the 8 possible combinations of mutable and non-mutable for the 3 members? The compiler must know which operations are to be applied to which type of data so that it can both check the code and compile it accordingly.
```d
void foo(MyDataStore my)
{
    my.someData[0] = 1.5;    // valid if someData is mutable
    my.someMoreData[0] = 42; // valid if someMoreData is mutable
}
```
Clearly, the code above can be compiled only if the two members of MyDataStore are mutable. The compiler does not have anything other than the static definition of MyDataStore, which implies that the type qualifiers of the members must be known at compile time.
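The compile-time nature of this check can be made visible with `__traits(compiles, ...)`. In this minimal sketch (the `Store` struct and the `canWrite` helper are hypothetical), the same element write is accepted or rejected purely on the static element type, before any object exists:

```d
// Hypothetical cut-down store whose element type is fixed at compile time.
struct Store(T)
{
    T[] someData;
}

// True when writing through someData type-checks for this instantiation.
enum bool canWrite(T) = __traits(compiles, {
    Store!T s;
    s.someData[0] = 1.5;
});

static assert(canWrite!double);
static assert(!canWrite!(const double));
static assert(!canWrite!(immutable double));

void main() {}
```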
To have such flexibility, the type must be templatized. When it is templatized, though, different instantiations of the template are different types, which means that they cannot take part in a collection of a specific type unless they implement a common interface, like DataStore below:
```d
import std.typecons;

interface DataStore
{
    void doWork();
}

class MyDataStoreImpl(SomeDataT,
                      SomeMoreDataT,
                      ImportedDataT) : DataStore
{
    SomeDataT[] someData;
    SomeMoreDataT[] someMoreData;
    ImportedDataT[][] importedData;

    this(SomeDataT[] sd, SomeMoreDataT[] smd, ImportedDataT[][] id)
    {
        someData = sd;
        someMoreData = smd;
        importedData = id;
    }

    void doWork()
    {
        // What can we do here?
        // There are 8 combinations of type qualifiers.
    }
}

auto makeMyDataStore(SomeDataT,
                     SomeMoreDataT,
                     ImportedDataT)(SomeDataT[] sd,
                                    SomeMoreDataT[] smd,
                                    ImportedDataT[][] id)
{
    return new MyDataStoreImpl!(SomeDataT,
                                SomeMoreDataT,
                                ImportedDataT)(sd, smd, id);
}

void foo(DataStore[] store)
{
    foreach (data; store) {
        data.doWork();
    }
}

void main()
{
    DataStore[] store;

    // The combination: mutable, const, immutable
    store ~= makeMyDataStore(new float[1],
                             new const(double)[3],
                             [ [ immutable(Tuple!(size_t, size_t))() ] ]);

    // Another combination: immutable, mutable, const
    store ~= makeMyDataStore(new immutable(float)[2],
                             new double[4],
                             [ [ const(Tuple!(size_t, size_t))() ] ]);

    foo(store);
}
```
There is the big question of how to implement MyDataStoreImpl.doWork(). How do we support every valid combination?
Conditional compilation is one way:
```d
import std.stdio : writeln;

class MyDataStoreImpl(SomeDataT,
                      SomeMoreDataT,
                      ImportedDataT) : DataStore
{
    // ...

    void doWork()
    {
        static if (is (SomeMoreDataT == const) &&
                   is (ImportedDataT == immutable)) {
            // ...
            writeln("case 1");

        } else static if (is (SomeDataT == immutable) &&
                          is (ImportedDataT == const)) {
            // ...
            writeln("case 2");

        } else {
            // ...
        }
    }
}
```
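When the only question is whether a member may be written, `std.traits.isMutable` can stand in for pairs of `is (T == const)` / `is (T == immutable)` tests. A minimal sketch with a single hypothetical `Store` class (not Ali's `MyDataStoreImpl`); the returned strings exist only so that the branch taken can be observed:

```d
import std.stdio : writeln;
import std.traits : isMutable;

// Hypothetical single-member store; T may be mutable, const or immutable.
class Store(T)
{
    T[] data;

    this(T[] data) { this.data = data; }

    string doWork()
    {
        static if (isMutable!T)
        {
            if (data.length)
                data[0] = T.init; // writing is only compiled for mutable T
            return "mutable";
        }
        else
        {
            return "const or immutable";
        }
    }
}

void main()
{
    auto m = new Store!int([5, 6]);
    auto i = new Store!(immutable int)([5, 6]);
    writeln(m.doWork(), ": ", m.data);
    writeln(i.doWork(), ": ", i.data);
}
```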
Perhaps mixins can be used to inject different functionality depending on different type qualifiers if the functionality is as simple as in the following case:
```d
import std.stdio;

// This template contains code for a mutable slice and the function that goes
// with it:
template Foo(T)
    if (!is (T == immutable) &&
        !is (T == const))
{
    T[] data;

    void doWork()
    {
        writeln("called for mutable data");

        if (data.length < 1) {
            data.length = 1;
        }

        data[0] = T.init;
    }
}

// This one is for immutable and const data:
template Foo(T)
    if (is (T == immutable) ||
        is (T == const))
{
    T[] data;

    void doWork()
    {
        // We cannot modify data[0] here...
        writeln("called for immutable data");
    }
}

interface DataStore
{
    void doWork();
}

class MyDataStoreImpl(T) : DataStore
{
    mixin Foo!T;
}

void main()
{
    DataStore[] store;

    store ~= new MyDataStoreImpl!(double)();
    store ~= new MyDataStoreImpl!(immutable(double))();

    foreach (data; store) {
        data.doWork();
    }
}
```
Ali
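Applied to the `SeedInstance` class described earlier in the thread, the templating approach might look like the sketch below. It is an illustration under assumptions, not code from the thread: the type of the imported member `f` becomes a template parameter, so one instantiation can hold `immutable` data loaded from file while another keeps it mutable for the Monte Carlo step. `Pair`, `mutate` and the sample values are hypothetical, and, as Ali notes, the two instantiations are distinct types.

```d
import std.stdio : writeln;
import std.traits : isMutable;
import std.typecons : Tuple;

alias Pair = Tuple!(size_t, size_t); // hypothetical shorthand

// Hypothetical Seed storage class: only the imported member's type varies.
class SeedInstance(Imported)
{
    double[] d;
    size_t[] e;
    Imported f; // e.g. Pair[][] or immutable(Pair[][])

    this(double[] d, size_t[] e, Imported f)
    {
        this.d = d;
        this.e = e;
        this.f = f; // first assignment in the constructor, allowed even if immutable
    }

    // Hypothetical mutation step: changes whatever this instantiation permits.
    void mutate()
    {
        if (d.length)
            d[0] += 1.0;
        static if (isMutable!(typeof(f[0][0])))
        {
            if (f.length && f[0].length)
                f[0][0][0] += 1;
        }
    }
}

void main()
{
    immutable Pair[][] fromFile = [[Pair(1, 2)]];
    auto fixed = new SeedInstance!(immutable(Pair[][]))([0.5], [1], fromFile);
    auto editable = new SeedInstance!(Pair[][])([0.5], [1], [[Pair(1, 2)]]);

    fixed.mutate();    // only d changes; f is immutable
    editable.mutate(); // d and f both change
    writeln(fixed.f, " ", editable.f);
}
```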
Copyright © 1999-2021 by the D Language Foundation