Jump to page: 1 2
Thread overview
A question about move() and a rant about shared
Jan 24, 2014
Stanislav Blinov
Jan 24, 2014
Stanislav Blinov
Jan 24, 2014
Dmitry Olshansky
Jan 24, 2014
Stanislav Blinov
Jan 25, 2014
Dmitry Olshansky
Jan 25, 2014
Stanislav Blinov
Jan 25, 2014
Dmitry Olshansky
Jan 25, 2014
Stanislav Blinov
Jan 26, 2014
Stanislav Blinov
Jan 26, 2014
Stanislav Blinov
Jan 26, 2014
Stanislav Blinov
January 24, 2014
Ok, this is going to be a long one, so please bear with me.

I'll start with a question.

1. std.algorithm.move() and std.container

TDPL describes when a compiler can and cannot perform a move automatically.
For cases when it isn't done automatically but we explicitly require a move, we have
std.algorithm.move(). This function comes in extremely handy especially when we
want to pass around data that cannot be otherwise copied (disabled this(this)).
For example, sinking some unique value into a thread or storing it in a container.
Or even both.

But why is there no practical way of storing such uncopyable data in standard
containers? I.e. both Array and DList do try perform a copy when insert() is called,
and happily fail when this(this) is @disabled. Same with access: front() returns
by value, so again no luck with @disabled this(this).

What is interesting though is that range interfaces for containers
do allow for moveFront() et al., and for some containers they're even defined.
So it's safe to move contents *out* but not *in*?
Is there some deeper technical reasoning behind this that I fail to see?




Below is a medium rant that's somewhat unrelated to the above, and is
aimed at receiving insights from those who're interested,
so if you're not, just skip it :)

I'll use quotes here to distinguish words from language qualifier.
This is mostly my current thoughts on "shared" and its usage, and I'd like
that you could point out where I'm wrong in this sensitive topic, any feedback
is greately appreciated.


2. "shared" is transitive. How transitive?

Declaring something as "shared" means that all its representation is also "shared".
This is a good thing, right?. But it does have certain implications. Consider a shared
data structure (for example, a (multi)producer (multi)consumer queue). If it's designed
to store anything that has indirections (pointers, references), those are better be
either provably unique (not possble in D except for immutable data),
or "shared". In fact, the whole stored type should just be "shared",
which is enforced by the compiler. Thus, we come to this:

shared class Queue(T) {
	private Container!T q;  // that'll be shared(Container!T),
	                        // which will in turn store shared(T)
	alias shared(T) Type;

	// push and pop are of course synchronized
	void push(Type) { ... }
	Type pop() { ... }
}

But in case when there are no indirections (i.e. a primitive type, or, more practically,
a struct with some primitive fields and a bunch of methods that reason about that data
or maybe do something with it) it all comes down to usage. In case of that queue, no two threads
could possibly access the same data simultaneously.
Let me define it real quick (somewhat contrived but should state the intent):

struct Packet {
	ulong ID;
	ubyte[32] header;
	ubyte[64] data;

	string type() inout @property { ... }
	ulong checkSum() inout @property { ... }
	Variant payload() inout @property { ... }
}

Note that I do have arrays in there, but they cannot possibly introduce any aliasing,
since they're static.

As soon as a producer pushes such value, it releases ownership of it, and some consumer
later gains ownership. Remember, there are no indirections, so no two threads could race
against the same data. But I cannot just declare a plain struct and then start
pushing it into that queue. It wouldn't work, because queue expects "shared" type.

One solution would be to use a cast. On one hand, this is feasible: such data is
really only logically shared when it's somewhere "in-between" threads, i.e. sits in a queue.
The "shared" queue owns the data for a moment, and thus makes the data iself "shared".
As soon as a consumer pops that value off a queue, it can be cast back to non-"shared".

This is ugly: it imposes certain convention in handling one "shared"
type (the queue) with another non-"shared" one (the struct): i.e. "always cast when push or pop".
Convention is not a reasonable justification for overlooking type system.

Another solution would be to instantiate those structs as "shared" in the first place.
But that won't work either: now any methods that those structs have must also have
"shared" overloads. In other words, I suddenly need to provide "shared" interface
for my struct. Well ok, I can do that trivially, by just declaring the whole struct type
as "shared". But this is wrong. "shared" advertises certain promise: this data is allowed to be
accessed by more than one thread at a time. This implies that access to the data
is better be synchronized. In other words, I would have to actually *write* the synchronization
for something that would never *need* synchronization. If I don't do it and simply
leave the struct declared as "shared" (or have all relevant "shared" overloads), I'm
shooting someone in the foot: imagine that later someone starts using my types, sees that this struct
is declared "shared" and happily assumes that it can be used concurrently. In that
contrived example it would be easy to see that's not actually the case. But reality is cruel.
Bang.

So ideally I'd want the queue to handle this situation for me, and luckily I can:

shared class Queue(T) {
	static if (hasUnsharedAliasing!T) {
		private Container!T q;           // as before
		alias shared(T) Type;
	} else {
		private __gshared Container!T q; // nothing is "shared" here
		alias T Type;
	}

	// push and pop are still synchronized :)
	void push(Type) { ... }
	Type pop() { ... }
}

But that's not the end of it. As seen from that definition, I'm using some
container (Container(T)) as actual storage. Current definition of Queue
requires that Container(T) have "shared" interface. Either that, or implement
the whole storing business myself right there in the Queue. The latter is certainly not
feasible, especially since, depending on requirements for the Queue,
I may need different storage capabilities.
In short: I don't need that container to be "shared" at all (provided it's a sane container
that doesn't do anything else with the data except for storing it). And in fact, if it were
"shared" already, I wouldn't need to define Queue at all, I'd just use Container directly.

Therefore, final iteration of Queue would look like this:

shared class Queue(T) {
	static if (hasUnsharedAliasing!T)
		alias shared(T) Type;
	else
		alias T Type;

	private __gshared Container!Type q;

	// still synchronized :)
	void push(Type) { ... }
	Type pop() { ... }	
}

All synchronization issues are handled by Queue, Container merely stores the data,
which is in turn "shared" or not depending on its representation.

This is conceptually how std.concurrency works: it allows you to send and receive
plain value types without imposing any "shared" qualification on them, but as soon
as you try to send a non-"shared" reference type or a struct with non-"shared" pointers,
it won't allow it.

There is, however, a nag with this: __gshared is not @safe. But getting rid of it
would mean only one thing: the Queue could only ever store shared(T), which
kind of kills initial message.

So, is "shared" really not as transitive as D wants it to be?

I imagine by now you already have a big list of "you're incorrect"s to stick in my face,
or you probably have already stopped reading :)

I have some more doubts regarding my handling of "shared", but I'll leave them for later so as
to not bore you to death.
January 24, 2014
Ouch. Please excuse the formatting. I didn't pay attention when pasting this :|
January 24, 2014
24-Jan-2014 21:07, Stanislav Blinov пишет:
> Ok, this is going to be a long one, so please bear with me.
>
> I'll start with a question.

2 unrelated questions would be better as 2 nice smaller posts.
Just saying.

> 1. std.algorithm.move() and std.container

[snip]

> But why is there no practical way of storing such uncopyable data in
> standard
> containers? I.e. both Array and DList do try perform a copy when
> insert() is called,
> and happily fail when this(this) is @disabled. Same with access: front()
> returns
> by value, so again no luck with @disabled this(this).

Consider that std.container is an incomplete piece of work that is going to get
an overhaul sometime "soon". Reasons are numerous but one is that sealed container concept couldn't be implemented well enough back then (hence need to copy or explicitly move on access). The second one is that the lack of allocators has pretty much blocked the development of std.container.

> What is interesting though is that range interfaces for containers
> do allow for moveFront() et al., and for some containers they're even
> defined.
> So it's safe to move contents *out* but not *in*?
> Is there some deeper technical reasoning behind this that I fail to see?

moveFront/moveBack/moveAt is a shameful and unknown thing about ranges usually kept in a basement during public talks. Ask Andrei of where these things go.

If I understand the current direction is to go with ref T and make some language
level/compiler level changes to the escaping reference.

> 2. "shared" is transitive. How transitive?
>
> Declaring something as "shared" means that all its representation is
> also "shared".
> This is a good thing, right?.

It is a necessary thing.

> But in case when there are no indirections (i.e. a primitive type, or,
> more practically,
> a struct with some primitive fields and a bunch of methods that reason
> about that data
> or maybe do something with it) it all comes down to usage.
> In case of
> that queue, no two threads
> could possibly access the same data simultaneously.
> Let me define it real quick (somewhat contrived but should state the
> intent):
>
> struct Packet {
>      ulong ID;
>      ubyte[32] header;
>      ubyte[64] data;
>
>      string type() inout @property { ... }
>      ulong checkSum() inout @property { ... }
>      Variant payload() inout @property { ... }
> }
>
> Note that I do have arrays in there, but they cannot possibly introduce
> any aliasing,
> since they're static.

They can. To copy the thing out of somewhere you need to reach into memory area  that is shared between threads and then all bets are off.
//Example:
shared Packet[4] buffer;

There are easily races on fields ID, header and data here. High level invariants that deal with ownership are not represented in D typesystem and hence exist only in the head of people writing code/comments and better be encapsulated in something manageable.

> As soon as a producer pushes such value, it releases ownership of it,
> and some consumer
> later gains ownership. Remember, there are no indirections, so no two
> threads could race
> against the same data. But I cannot just declare a plain struct and then
> start
> pushing it into that queue. It wouldn't work, because queue expects
> "shared" type.

Depends on how you've defined push/pop. The thing is that both operation either accept local Packet or produce local Packet.
void push(ref Packet p);  //copy from local memory to the queue
Packet pop(); //pop to the local memory

But it is intrinsic to the way queue which is local (T1) --> shared --> local (T2) bridge of sorts T1, T2 being some threads.

>
> One solution would be to use a cast. On one hand, this is feasible: such
> data is
> really only logically shared when it's somewhere "in-between" threads,
> i.e. sits in a queue.
> The "shared" queue owns the data for a moment, and thus makes the data
> iself "shared".
> As soon as a consumer pops that value off a queue, it can be cast back
> to non-"shared".

> This is ugly: it imposes certain convention in handling one "shared"
> type (the queue) with another non-"shared" one (the struct): i.e.
> "always cast when push or pop".
> Convention is not a reasonable justification for overlooking type system.

To simplify you put shared qualifier on type and then suddenly it doesn't do magic. Sad but you have to think at what happens where with sharing. Casts or no casts is an internal thing and sadly at the time pretty much required because compiler can't grasp ownership for you.

> Another solution would be to instantiate those structs as "shared" in
> the first place.

And it doesn't work nicely because they are in fact not, only inside of the queue they are shared.


> In short: I don't need that container to be "shared" at all (provided
> it's a sane container
> that doesn't do anything else with the data except for storing it).

There you are horribly wrong. All containers need to be aware of the fact that they are shared between threads else some ugly things are going to happen. (Simultaneous insert and removal from an shared array?)

>
> Therefore, final iteration of Queue would look like this:
>
> shared class Queue(T) {
>      static if (hasUnsharedAliasing!T)
>          alias shared(T) Type;
>      else
>          alias T Type;
>
>      private __gshared Container!Type q;
>
>      // still synchronized :)
>      void push(Type) { ... }
>      Type pop() { ... }
> }

In essence all you need is a locking wrapper container that gives you shared interface around a mutable container or a container that is shared-aware by its nature (lock-free or whatever it may be inside).

The need is well recognized and you are not alone in this.

> There is, however, a nag with this: __gshared is not @safe. But getting
> rid of it
> would mean only one thing: the Queue could only ever store shared(T), which
> kind of kills initial message.

Forget __gshared it's nothing but a transitive kludge.

> So, is "shared" really not as transitive as D wants it to be?

It is. The problem with shared and immutable is that we humans by our very nature expect:

immutable(SomeContainer!T)

to mean the same as
ImmutableSomeContainer!T

Where ImmutableSomeContainer is almost the same thing but in fact is a different type that has immutable interface and handles immutable data well.

Example - ImmutableVector vs Vector.
ImmutableVector need not to known about things such as capacity, has different kind of range and is a different type in general not just Vector with some auto-magic restrictions bolted on top of it.

The same could be said about shared.


-- 
Dmitry Olshansky
January 24, 2014
On Friday, 24 January 2014 at 19:54:31 UTC, Dmitry Olshansky wrote:

> 2 unrelated questions would be better as 2 nice smaller posts.
> Just saying.

I know. Sorry if it got you irritated. I have a feeling I'll be coming back with questions that are based on both move() and "shared" tied up together, hence I posted these together too.

> Consider that std.container is an incomplete piece of work...

Understood. Thanks.

> moveFront/moveBack/moveAt is a shameful and unknown thing about ranges usually kept in a basement during public talks. Ask Andrei of where these things go.

Now you've got me intrigued :)

> If I understand the current direction is to go with ref T and make some language
> level/compiler level changes to the escaping reference.

Escaping... where? By "go with ref T" you mean return value of e.g. front()?

>> struct Packet {
>>     ulong ID;
>>     ubyte[32] header;
>>     ubyte[64] data;
>> }
>>
>> Note that I do have arrays in there, but they cannot possibly introduce any aliasing, since they're static.
>
> They can. To copy the thing out of somewhere you need to reach into memory area  that is shared between threads and then all bets are off.
> //Example:
> shared Packet[4] buffer;

Aha, I see you responded as you went through my post :) I've got this case covered further down. Of course they can this way. But instantiating them as "shared" doesn't make sense anyway, exactly because they are not "shared"-aware.

> High level invariants that deal with ownership are not represented in D typesystem and hence exist only in the head of people writing code/comments and better be encapsulated in something manageable.

But they actually are. immutable === doesn't care: share or own, you only ever can read it. Value type === single owner. Because you can only ever give away the value entirely (move() it) or a copy of it (which is coincidentally both shallow and deep copy). References (any kind thereof) === sharing. Granted, this system is lacking (i.e. it'd be nice to have unique references). But that's at least a bare base.

>> It wouldn't work, because queue expects "shared" type.
>
> Depends on how you've defined push/pop.

I'm losing you here. Without any __gshared "tricks" we have this:

shared class Queue(T) {
    private Container!T q;

    void push(T);
    T pop();
}

Note that the class Queue is "shared"-qualified, meaning that "q" is also "shared", meaning that (due to transitive nature of "shared"), the container ultimately stores shared(T). In fact, the definition above would even fail to compile in this case, because push should take shared(T) and pop should return shared(T) (since Queue will be dealing with a container of shared Ts, right?)

> The thing is that both operation either accept local Packet or produce local Packet.
> void push(ref Packet p);  //copy from local memory to the queue
> Packet pop(); //pop to the local memory

> But it is intrinsic to the way queue which is local (T1) --> shared --> local (T2) bridge of sorts T1, T2 being some threads.

Umm... I did post an interface of Queue, did I not? Where'd that "ref Packet" come from? :) push() should take an rvalue, pop() should return an rvalue. The fact that pushing thread managed to create an rvalue to push means precisely that it no longer has any aliases to pushed data (remember, we're talking value types here). Same with popping thread: as soon as it gets the value, Queue no longer stores it.
If Packet weren't value type (e.g. contained references), it'd break the first invariant I imposed in my original post and would fall into "must be "shared"" category.

>> One solution would be to use a cast...
>> This is ugly...
>> Convention is not a reasonable justification for overlooking type system.
>
> To simplify you put shared qualifier on type and then suddenly it doesn't do magic.

:D Was I that bad at explaining? Shared qualifier doesn't do any magic by itself anyway, it's our task as programmers to make it do what it means. I.e. put synchronization where it belongs.

> Sad but you have to think at what happens where with sharing.

I do. Lately, probably too much :) That's why I posted this.

> Casts or no casts is an internal thing and sadly at the time pretty much required because compiler can't grasp ownership for you.

It can't, and therefore it's my task to solve. I mean, in all of my example the only rock-solid shared thing is the queue. The queue would be the object all threads access simultaneously, the queue would be declared somewhere as "shared(Queue) theQueue;", the queue would be pushed and popped and fed and milked dry of its contents. But the contents, each individual packet, would *never* be accessed by more than one thread at a time, this is by design in this particular case. Generalizing that design to "always support shared" just doesn't make sense: why would I need to implement shared interface (and thus, shared access and all that comes with it) for those Packets if they're never used concurrently?

>> Another solution would be to instantiate those structs as "shared" in
>> the first place.
>
> And it doesn't work nicely because they are in fact not, only inside of the queue they are shared.

Eggz-actly. And even this "sharing" is purely incidental, due to the very nature of both such queues and this particular type Packet. Consider an already existing example of all that jazz with producer-consumer queue: std.concurrency. Only in that case that would be multi producer single consumer. It is perfectly safe to send() value types all day. You don't have to design your structs (or your ints :) ) as shared in order to pass them from one thread to another in a message. That is as long as your types don't have any references or pointers, which would *have* to be "shared".

> There you are horribly wrong. All containers need to be aware of the fact that they are shared between threads else some ugly things are going to happen. (Simultaneous insert and removal from an shared array?)

Umm... I don't follow the logic. As you suggest further below, all I'm doing is designing a container type instances of which can be safely shared between threads. I can't build that container using only other "shared"-enabled types, and why should I? The underlying "storage" container is forever encapsulated in my shared one, the shared one deals with synchronization.

> In essence all you need is a locking wrapper container that gives you shared interface around a mutable container or a container that is shared-aware by its nature (lock-free or whatever it may be inside).

Not exactly. In essence what I need is a "shared" wrapper that deals with all synchronization issues (not necessarily locking). Why does whatever it uses for *storage* have to know anything about sharing? Classical implementations of such queues on top of dlists, ring buffers, other queues...

I know that there are shared structures that can be implemented more efficiently entirely from scratch (Andrei proposes a couple as examples in TDPL, the latter being, in his own words, somewhat barbaric), i.e. that maintain their own storage. But that's not always required nor needed. If I'd reduce my Queue to single producer single consumer (a pipe) it can be happily built on top of a plain array plus a couple of pointers.

Do I understand you correctly that *everything* down to the lowest level should be shared-enabled, i.e. thread-safe, i.e. synchronized? Should Phobos just have e.g. two versions each of Array, DList, RBTree, you-name-it, one shared, one - not? Following that logic, how would you build some shared conatiner on top of D arrays which are by their nature not synchronized?

> The need is well recognized and you are not alone in this.

That's good to hear. That is, if I didn't lose your meaning completely :)

> It is. The problem with shared and immutable is that we humans by our very nature expect:
>
> immutable(SomeContainer!T)
>
> to mean the same as
> ImmutableSomeContainer!T
>
> Where ImmutableSomeContainer is almost the same thing but in fact is a different type that has immutable interface and handles immutable data well.

I'd say that's more of a way of thinking imposed by some other language(s) rather than our human nature. Then again, the boundaries do blur sometimes :)

> Example - ImmutableVector vs Vector.
> ImmutableVector need not to known about things such as capacity, has different kind of range and is a different type in general not just Vector with some auto-magic restrictions bolted on top of it.
>
> The same could be said about shared.

ImmutableVector is a strange concept altogether, but I understand what you mean here. What I still don't completely understand is the general message of "with "shared" everything should be "shared"". How can it be?
January 25, 2014
25-Jan-2014 02:13, Stanislav Blinov пишет:
> On Friday, 24 January 2014 at 19:54:31 UTC, Dmitry Olshansky wrote:
>
>> 2 unrelated questions would be better as 2 nice smaller posts.
>> Just saying.
>
> I know. Sorry if it got you irritated. I have a feeling I'll be coming
> back with questions that are based on both move() and "shared" tied up
> together, hence I posted these together too.
>
>> Consider that std.container is an incomplete piece of work...
>
> Understood. Thanks.
>
>> moveFront/moveBack/moveAt is a shameful and unknown thing about ranges
>> usually kept in a basement during public talks. Ask Andrei of where
>> these things go.
>
> Now you've got me intrigued :)
>
>> If I understand the current direction is to go with ref T and make
>> some language
>> level/compiler level changes to the escaping reference.
>
> Escaping... where? By "go with ref T" you mean return value of e.g.
> front()?
>
>>> struct Packet {
>>>     ulong ID;
>>>     ubyte[32] header;
>>>     ubyte[64] data;
>>> }
>>>
>>> Note that I do have arrays in there, but they cannot possibly
>>> introduce any aliasing, since they're static.
>>
>> They can. To copy the thing out of somewhere you need to reach into
>> memory area  that is shared between threads and then all bets are off.
>> //Example:
>> shared Packet[4] buffer;
>
> Aha, I see you responded as you went through my post :) I've got this
> case covered further down. Of course they can this way. But
> instantiating them as "shared" doesn't make sense anyway, exactly
> because they are not "shared"-aware.
>
>> High level invariants that deal with ownership are not represented in
>> D typesystem and hence exist only in the head of people writing
>> code/comments and better be encapsulated in something manageable.
>
> But they actually are. immutable === doesn't care: share or own, you
> only ever can read it. Value type === single owner. Because you can only
> ever give away the value entirely (move() it) or a copy of it (which is
> coincidentally both shallow and deep copy). References (any kind
> thereof) === sharing. Granted, this system is lacking (i.e. it'd be nice
> to have unique references). But that's at least a bare base.
>
>>> It wouldn't work, because queue expects "shared" type.
>>
>> Depends on how you've defined push/pop.
>
> I'm losing you here. Without any __gshared "tricks" we have this:
>
> shared class Queue(T) {
>      private Container!T q;
>
>      void push(T);
>      T pop();
> }

Which is the same as:

> class Queue(T) {
>      shared private Container!T q;
>
>      void push(T) shared;
>      T pop() shared;
> }
>
> Note that the class Queue is "shared"-qualified, meaning that "q" is
> also "shared", meaning that (due to transitive nature of "shared"), the
> container ultimately stores shared(T).

For functions only this pointer becomes shared. For data - yes, transitivity, turtles all the way down.

> In fact, the definition above
> would even fail to compile in this case, because push should take
> shared(T) and pop should return shared(T) (since Queue will be dealing
> with a container of shared Ts, right?)

No Queue in your definition of Queue it may take local T no problem at all. It all depends on the contents of pop/push to understand if it compiles.

>
>> The thing is that both operation either accept local Packet or produce
>> local Packet.
>> void push(ref Packet p);  //copy from local memory to the queue
>> Packet pop(); //pop to the local memory
>
>> But it is intrinsic to the way queue which is local (T1) --> shared
>> --> local (T2) bridge of sorts T1, T2 being some threads.
>
> Umm... I did post an interface of Queue, did I not? Where'd that "ref
> Packet" come from? :)

Well seeing that Packet is such a fat type, I instinctively went for 'ref' :)

> push() should take an rvalue, pop() should return
> an rvalue. The fact that pushing thread managed to create an rvalue to
> push means precisely that it no longer has any aliases to pushed data
> (remember, we're talking value types here).

If the queue ultimately copies the data to its store it doesn't matter if there are aliases to the original thread's local packet.

From the type system point of view there already can't be aliases in the queue as it's shared an the argument/return type is not :)

> Same with popping thread: as
> soon as it gets the value, Queue no longer stores it.
> If Packet weren't value type (e.g. contained references), it'd break the
> first invariant I imposed in my original post and would fall into "must
> be "shared"" category.
>

Given the interface alone it doesn't break anything. The whole "as soon as it gets value" is the only interesting part. There are different hack/tricks that could be used to make the queue go
shared --copy-> local by hand.

What I can deduce is that shared containers in general that return a copy may do a shallow unqual, i.e. mark one "level" of a indirections as not shared since it is a copy.

>>> One solution would be to use a cast...
>>> This is ugly...
>>> Convention is not a reasonable justification for overlooking type
>>> system.
>>
>> To simplify you put shared qualifier on type and then suddenly it
>> doesn't do magic.
>
> :D Was I that bad at explaining? Shared qualifier doesn't do any magic
> by itself anyway, it's our task as programmers to make it do what it
> means. I.e. put synchronization where it belongs.

Okay.

>> Sad but you have to think at what happens where with sharing.
>
> I do. Lately, probably too much :) That's why I posted this.
>
>> Casts or no casts is an internal thing and sadly at the time pretty
>> much required because compiler can't grasp ownership for you.
>
> It can't, and therefore it's my task to solve. I mean, in all of my
> example the only rock-solid shared thing is the queue. The queue would
> be the object all threads access simultaneously, the queue would be
> declared somewhere as "shared(Queue) theQueue;", the queue would be
> pushed and popped and fed and milked dry of its contents.
> But the
> contents, each individual packet, would *never* be accessed by more than
> one thread at a time, this is by design in this particular case.

Mmm as I've seen above there is no problem modeling it - that is taking local/mutable argument and put it into shared container.

> Generalizing that design to "always support shared" just doesn't make
> sense: why would I need to implement shared interface (and thus, shared
> access and all that comes with it) for those Packets if they're never
> used concurrently?

You don't need shared interface for Packets to store them in a shared container. In this case only because of no indirections property of Packets.

>> In essence all you need is a locking wrapper container that gives you
>> shared interface around a mutable container or a container that is
>> shared-aware by its nature (lock-free or whatever it may be inside).
>
> Not exactly. In essence what I need is a "shared" wrapper that deals
> with all synchronization issues (not necessarily locking). Why does
> whatever it uses for *storage* have to know anything about sharing?
> Classical implementations of such queues on top of dlists, ring buffers,
> other queues...

Yes, by locking wrapper I meant exactly this. I just have no idea how you'd generally wrap unknown beforehand container otherwise (well there are spinlocks and whatnot, but still locks).

> Do I understand you correctly that *everything* down to the lowest level
> should be shared-enabled, i.e. thread-safe, i.e. synchronized? Should
> Phobos just have e.g. two versions each of Array, DList, RBTree,
> you-name-it, one shared, one - not?

No, for many wrapper is enough or even all the boundary of what we could do. What's needed is more containers that are (by their nature) shared (thread-safe). See Java's ConcurrentHashMap.

> Following that logic, how would you
> build some shared conatiner on top of D arrays which are by their nature
> not synchronized?

New type with a shared interface, lock and hacks with casts inside.

>> The need is well recognized and you are not alone in this.
>
> That's good to hear. That is, if I didn't lose your meaning completely :)

There was a lot of confusion, partly because the problem statement was defined as "rant" ;) I've meant here the need for shared containers and/or wrappers around non-shared ones.

>> Example - ImmutableVector vs Vector.
>> ImmutableVector need not to known about things such as capacity, has
>> different kind of range and is a different type in general not just
>> Vector with some auto-magic restrictions bolted on top of it.
>>
>> The same could be said about shared.
>
> ImmutableVector is a strange concept altogether, but I understand what
> you mean here. What I still don't completely understand is the general
> message of "with "shared" everything should be "shared"". How can it be?

For _data_ it must else you allow low-level races.
"Everything" is bit too general a word.

-- 
Dmitry Olshansky
January 25, 2014
25-Jan-2014 02:13, Stanislav Blinov пишет:
> On Friday, 24 January 2014 at 19:54:31 UTC, Dmitry Olshansky wrote:
>> If I understand the current direction is to go with ref T and make
>> some language
>> level/compiler level changes to the escaping reference.
>
> Escaping... where? By "go with ref T" you mean return value of e.g.
> front()?

Say front returns a reference to a first element in an array.
Say user stores &someArray.front as pointer somewhere, and the does
someArray.insert([1,2,3,4,5]);
In case array got resized the pointer is now dangling. Sealed containers were meant to prevent such class of errors entirely.

Other links:
http://forum.dlang.org/thread/bug-5541-3@http.d.puremagic.com/issues/
https://github.com/D-Programming-Language/dmd/pull/1903
>
>>> struct Packet {
>>>     ulong ID;
>>>     ubyte[32] header;
>>>     ubyte[64] data;
>>> }
>>>
>>> Note that I do have arrays in there, but they cannot possibly
>>> introduce any aliasing, since they're static.
>>
>> They can. To copy the thing out of somewhere you need to reach into
>> memory area  that is shared between threads and then all bets are off.
>> //Example:
>> shared Packet[4] buffer;
>
> Aha, I see you responded as you went through my post :) I've got this
> case covered further down. Of course they can this way. But
> instantiating them as "shared" doesn't make sense anyway, exactly
> because they are not "shared"-aware.

It makes sense as in memory being visible from many threads.
E.g. buffer[0].ID is accessible via atomic CAS operation. Same with header and data fields.

>> High level invariants that deal with ownership are not represented in
>> D typesystem and hence exist only in the head of people writing
>> code/comments and better be encapsulated in something manageable.
>
> But they actually are. immutable === doesn't care: share or own, you
> only ever can read it.

> Value type === single owner.

No. You may have many pointers to one shared value type.
And most certainly not '==='.

Anyway I was talking about ownership as in objects that own other objects.
See for example these 2:
http://bartoszmilewski.com/2009/09/22/ownership-systems-against-data-races/
http://bartoszmilewski.com/2009/06/02/race-free-multithreading-ownership/

> Because you can only
> ever give away the value entirely (move() it) or a copy of it (which is
> coincidentally both shallow and deep copy).
> References (any kind
> thereof) === sharing. Granted, this system is lacking (i.e. it'd be nice
> to have unique references). But that's at least a bare base.
>

And that's the center point. D's type system doesn't statically ensure if there are references present or not. And shared just implies "may be referenced from many threads" and local/mutable is "may be referenced from current thread". Not a single thing in the D type system makes sure e.g. that unique value has exactly one reference to it.

See e.g. Rust where authors introduce such notion in the type system.

-- 
Dmitry Olshansky
January 25, 2014
On Saturday, 25 January 2014 at 10:03:55 UTC, Dmitry Olshansky wrote:

>> In fact, the definition above
>> would even fail to compile in this case, because push should take
>> shared(T) and pop should return shared(T) (since Queue will be dealing
>> with a container of shared Ts, right?)
>
> No Queue in your definition of Queue it may take local T no problem at all. It all depends on the contents of pop/push to understand if it compiles.

It can already be inferred from the interface. The Queue uses some Container as storage. Obviously push()/pop() would actually access that storage to store/remove elements. If Container does not support "shared", we're in trouble.

Let's get a little more concrete:

http://dpaste.dzfl.pl/5bd15697

A dastardly simple implementation of Queue. Note that Queue is being designed as shared type, what it uses for storage is its business. However, due to transitivity of "shared", it cannot really use anything *but* other "shared" containers. Therefore, we either need casts or __gshared :(
Note that "T" for Queue is "Packet", bur for Container it's "shared(Packet)".

Well, ok. Let's take a step back and forget about existing containers. I could implement the storage myself, inside the Queue type, right? Something like that:

class Queue(T) {
   struct Node {
       T value;
       Node* prev, next;
   }

   private Node* head, tail;
   // Implementation follows...
}

I'd implement the whole business with doubly-linked list, maybe even employ some interesting algorithm to make the whole thing lock-free... But again, transitivity of "shared" now dictates that Queue can only store shared(T).

Now about this Queue type, it has a very interesting property: it's a shared data structure that never allows more than one thread access to any single stored item at a time. It allows to push() and pop() concurrently (well, ok, this one is blocking). But it does not have random access, in fact, in its public interface it doesn't have anything *but* push() and pop()! Therefore, even though the Queue is a shared type (and instances of it will always be shared), it can *safely* store non-shared data. This does not mean that it *shouldn't* store shared data, it just means that the transitivity of "shared" can be lifted in one particular case: iff !hasUnsharedAliasing!(T). That's the same basis upon which std.concurrency is built. And that's why I went through that whole business with __gshared in my original post :)

>> Where'd that "ref Packet" come from? :)
>
> Well seeing that Packet is such a fat type, I instinctively went for 'ref' :)

Aww, 'tis not fat, just a little overweight :D

>> push() should take an rvalue, pop() should return
>> an rvalue. The fact that pushing thread managed to create an rvalue to
>> push means precisely that it no longer has any aliases to pushed data
>> (remember, we're talking value types here).
>
> If the queue ultimately copies the data to its store it doesn't matter if there are aliases to the original thread's local packet.

Eggz-actly. But this is only true for data types such as that Packet. If it contained pointers or references... no go.

> From the type system point of view there already can't be aliases in the queue as it's shared an the argument/return type is not :)

You don't know that beforehand. Unlike its guts, which dictate that what it stores is in fact shared(T), it's interface does not say anything about T. T is T. Queue could be instantiated as Queue!(shared(FancyType)) too :)

> Given the interface alone it doesn't break anything. The whole "as soon as it gets value" is the only interesting part. There are different hack/tricks that could be used to make the queue go shared --copy-> local by hand.

> What I can deduce is that shared containers in general that return a copy may do a shallow unqual, i.e. mark one "level" of a indirections as not shared since it is a copy.

Almost my thinking. In fact, there's no sense in *copying* shared data anyway. Moving - yes. OTOH, copying *references* to shared data is perfectly fine.

>> But the
>> contents, each individual packet, would *never* be accessed by more than
>> one thread at a time, this is by design in this particular case.
>
> Mmm as I've seen above there is no problem modeling it - that is taking local/mutable argument and put it into shared container.

Yes, but either by using casts, or __gshared :)

> You don't need shared interface for Packets to store them in a shared container. In this case only because of no indirections property of Packets.

That would depend on a container. As you've demostrated earlier, if a container did allow concurrent access to its items, the Packets could provide a world of hurt. The Queue does not allow such things, therefore it's indeed safe.

> Yes, by locking wrapper I meant exactly this. I just have no idea how you'd generally wrap unknown beforehand container otherwise (well there are spinlocks and whatnot, but still locks).

Of course it's not unknown beforehand. I've already mentioned that such queues could be built on top of dlists, other queues, ring buffers, you-name-it. But if those are not "shared", we again come back to all these shennanigans with casts or __gshared.

>> Do I understand you correctly that *everything* down to the lowest level
>> should be shared-enabled?..
>
> No, for many wrapper is enough or even all the boundary of what we could do. What's needed is more containers that are (by their nature) shared (thread-safe). See Java's ConcurrentHashMap.

Whew :)

>> Following that logic, how would you
>> build some shared conatiner on top of D arrays which are by their nature
>> not synchronized?
>
> New type with a shared interface, lock and hacks with casts inside.

Just like the Queue from my original post :D

>> What I still don't completely understand is the general
>> message of "with "shared" everything should be "shared"". How can it be?
>
> For _data_ it must else you allow low-level races.
> "Everything" is bit too general a word.

Hmm... I'll take that into my reply to your next message ;)
January 25, 2014
On Saturday, 25 January 2014 at 11:51:04 UTC, Dmitry Olshansky wrote:

>> Escaping... where? By "go with ref T" you mean return value of e.g.
>> front()?
>
> Say front returns a reference to a first element in an array.
> Say user stores &someArray.front as pointer somewhere, and the does
> someArray.insert([1,2,3,4,5]);
> In case array got resized the pointer is now dangling. Sealed containers were meant to prevent such class of errors entirely.
>
> Other links:
> http://forum.dlang.org/thread/bug-5541-3@http.d.puremagic.com/issues/
> https://github.com/D-Programming-Language/dmd/pull/1903

Aha, got it. Thanks, I'll be looking into that.

>> I've got this case covered further down. Of course they can [introduce aliasing] this way. But
>> instantiating them as "shared" doesn't make sense anyway, exactly
>> because they are not "shared"-aware.
>
> It makes sense as in memory being visible from many threads.
> E.g. buffer[0].ID is accessible via atomic CAS operation. Same with header and data fields.

Yup, but that's not the case with Queue, as I hope is already established in my previous reply.

>> Value type === single owner.
>
> No. You may have many pointers to one shared value type.
> And most certainly not '==='.

Pointers are references. Whatever one thread does with pointers to its own data is its business and has nothing to do with D's "shared" qualifier. Yes, it's sharing too, only it's not meant in the context "shared between threads".
I didn't talk about ownership as in "one object owns another". I was talking about ownership of data in the context of threads. You cannot share a value type with another thread, unless you give it a pointer. And you can't do that in idiomatic D, because you're not allowed to "send" a pointer-to-non-"shared" to another thread.

> Anyway I was talking about ownership as in objects that own other objects.
> See for example these 2:
> http://bartoszmilewski.com/2009/09/22/ownership-systems-against-data-races/
> http://bartoszmilewski.com/2009/06/02/race-free-multithreading-ownership/

I've read that :)

>> Because you can only
>> ever give away the value entirely (move() it) or a copy of it (which is
>> coincidentally both shallow and deep copy).
>> References (any kind
>> thereof) === sharing. Granted, this system is lacking (i.e. it'd be nice
>> to have unique references). But that's at least a bare base.
>>

> And that's the center point. D's type system doesn't statically ensure if there are references present or not. And shared just implies "may be referenced from many threads" and local/mutable is "may be referenced from current thread". Not a single thing in the D type system makes sure e.g. that unique value has exactly one reference to it.

I've been having fun with various approaches to implementing Unique. I have some ideas on where it can be improved, but without language support, this is gruesome indeed.

> See e.g. Rust where authors introduce such notion in the type system.

I've never investigated Rust, never hurts to take a look, I guess...
January 25, 2014
On 1/25/14 3:51 AM, Dmitry Olshansky wrote:
> 25-Jan-2014 02:13, Stanislav Blinov пишет:
>> On Friday, 24 January 2014 at 19:54:31 UTC, Dmitry Olshansky wrote:
>>> If I understand the current direction is to go with ref T and make
>>> some language
>>> level/compiler level changes to the escaping reference.
>>
>> Escaping... where? By "go with ref T" you mean return value of e.g.
>> front()?
>
> Say front returns a reference to a first element in an array.
> Say user stores &someArray.front as pointer somewhere, and the does
> someArray.insert([1,2,3,4,5]);
> In case array got resized the pointer is now dangling. Sealed containers
> were meant to prevent such class of errors entirely.

We plan to disallow taking address of a ref return from a function.

Andrei

January 26, 2014
On Saturday, 25 January 2014 at 18:47:17 UTC, Andrei Alexandrescu wrote:

>
> We plan to disallow taking address of a ref return from a function.
>

What about sneaky cases? E.g. like this:

NullableRef!T nref(T)(ref T v) { return NullableRef!T(&v); }

struct Container(T) {
    //...
    ref inout(T) front() inout { ... }
}

Container!int cnt;

//...

auto n = nref(cnt.front());
« First   ‹ Prev
1 2