March 08, 2014 Re: Remember that Go vs D MQTT thing and how we wondered about dmd vs gdc?

Posted in reply to Russel Winder

On 3/8/14, 3:22 AM, Russel Winder wrote:
> Dataflow is, though, where "Big Data" is going. There are commercial
> offerings in the JVM space and they are making huge profits on
> licensing, simply because the frameworks work.
Do you have a couple of relevant links describing dataflow?
Andrei

March 08, 2014 Re: Remember that Go vs D MQTT thing and how we wondered about dmd vs gdc?

Posted in reply to logicchains

On Saturday, 8 March 2014 at 12:13:07 UTC, logicchains wrote:
> On Saturday, 8 March 2014 at 11:23:17 UTC, Russel Winder wrote:
>> I guess D could be said to have actors already using spawn and the
>> message queue.
>
> In std.concurrency, the documentation states that: "Right now, only in-process threads are supported and referenced by a more specialized handle called a Tid. It is effectively a subclass of Cid, with additional features specific to in-process messaging". Is there any timeline on when out-of-process threads will be supported? I think that would bring D closer to being able to achieve Erlang-style concurrency.
There's already a pull request in place to support green threads. If you mean IPC, we really need serialization first, and it would be nice to have a decent network API as well. But I've been meaning to sort out a prototype anyway. Tid will remain the reference to a thread regardless of which process it lives in, and I'll be adding a Node type.
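
For concreteness, this is roughly what the "actors via spawn and the message queue" style mentioned above looks like with today's std.concurrency (a minimal sketch; the worker protocol and the messages are made up for illustration):

import std.concurrency;
import std.stdio : writeln;

// A minimal actor-style worker: receive() dispatches on the type of
// each incoming message until the worker is told to stop.
void worker()
{
    bool running = true;
    while (running)
    {
        receive(
            (int n)    { writeln("square: ", n * n); },
            (string s) { if (s == "stop") running = false; }
        );
    }
}

void main()
{
    Tid tid = spawn(&worker); // Tid: the in-process handle discussed above
    tid.send(3);
    tid.send(7);
    tid.send("stop");
}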

March 08, 2014 Re: Remember that Go vs D MQTT thing and how we wondered about dmd vs gdc?

Posted in reply to Andrei Alexandrescu

On Sat, 2014-03-08 at 08:53 -0800, Andrei Alexandrescu wrote:
> On 3/8/14, 3:22 AM, Russel Winder wrote:
> > Dataflow is, though, where "Big Data" is going. There are commercial offerings in the JVM space and they are making huge profits on licensing, simply because the frameworks work.
>
> Do you have a couple of relevant links describing dataflow?

First and foremost we have to distinguish dataflow software architectures from dataflow computers. The latter were an alternative hardware architecture that failed to gain traction, but there is an awful lot of literature out there on it, so just searching the Web is likely to turn up a lot of that, especially from the period 1980 to 1995.

The dataflow software architectures are modelled directly on the structural concepts of dataflow hardware, so the terminology is exactly the same. However, whereas an operator in hardware means add, multiply, etc., in a software architecture it just means some sequential computation that requires certain inputs and delivers some outputs. The computation must be a process, so effectively a function with no free variables.

The GPars version of this is at:

http://gpars.codehaus.org/Dataflow
http://gpars.org/guide/guide/dataflow.html

GPars needs some more work, but I haven't had a chance to focus on it recently.

This introduces all the cute jargon:

http://www.cs.colostate.edu/cameron/dataflow.html

Wikipedia has this page:

http://en.wikipedia.org/wiki/Dataflow_programming

but it is clearly in need of some sub-editing. Hopefully this serves as a start; I can try to hunt up some other things if that would help.

The commercial offering I know something of is called DataRush, a product from a subgroup of the Pervasive group for the JVM (and optionally Hadoop):

http://en.wikipedia.org/wiki/DataRush_Technology

I played with this in 2008 before it was formally released, and on and off since. GPars dataflow should compete with this, but they are a company with resources, and GPars has two fairly non-active (due to work commitments) volunteer developers. We had been hoping that, since GPars is core Groovy technology required for Grails and all the other Gr8 technology, people would step up. However, the very concept of a concurrency and parallelism framework seems to frighten off even some of the best programmers.

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
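
To make the "operator = process with declared inputs and outputs" idea concrete, here is a minimal D sketch using std.concurrency message queues to stand in for dataflow channels (illustrative only; the real GPars API is Groovy/Java and considerably richer):

import std.concurrency;
import std.stdio : writeln;

// A dataflow-style "operator": a process with no free variables that
// blocks until all of its input tokens have arrived, then fires once
// and sends its output token downstream.
void adder(Tid downstream)
{
    int a = receiveOnly!int(); // wait for the first input
    int b = receiveOnly!int(); // wait for the second input
    downstream.send(a + b);    // deliver the output
}

void main()
{
    Tid op = spawn(&adder, thisTid); // wire the operator's output to us
    op.send(2);
    op.send(3);
    writeln(receiveOnly!int()); // prints 5 once the operator has fired
}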

March 12, 2014 Re: Remember that Go vs D MQTT thing and how we wondered about dmd vs gdc?

Posted in reply to Bienlein

On 07.03.2014 22:11, Bienlein wrote:
>> One question - doesn't Vibe.d already use green threads?
>
> What they are saying on their web site is that they are using fibers and at the same time they say they are using libevent. That is confusing for me. On http://vibed.org/features they write: "Instead of using classic blocking I/O together with multi-threading for doing parallel network and file operations, all operations are using asynchronous operating system APIs. By default, >>libevent<< is used to access these APIs operating system independently."
>
> Further up on the same page they write: "The approach of vibe.d is to use asynchronous I/O under the hood, but at the same time make it seem as if all operations were synchronous and blocking, just like ordinary I/O. What makes this possible is D's support for so called >>fibers<<".
>
>> It does. Bienlein has a very vague knowledge of topics he comments about.
>
> I thought the vibe.d guys would shed some light on this at the occasion, but no luck. What I don't understand is how fibers can listen to input that comes in through connections they hold on to. AFAICS, a fiber only becomes active when its call method is called. So who calls the call method when a connection becomes active? Is that then again a kernel thread? How does the kernel thread know something arrived through a connection? It can't do a blocking wait, as the system would run out of kernel threads very quickly.

Sorry, I've been busy with some non-programming business over the past days and didn't have a chance to reply. A small article about the internal workings of the task/fiber system has been planned for a long time now, but there are so many items with higher priority that it unfortunately hasn't happened so far. See my reply [1] in the other thread for a rough outline.

>> I think what Go and Erlang do is to use green threads (or the equivalent, goroutines in Go) on the application side and a kernel thread pool within the runtime doing "work stealing" on the green threads. This is more or less (ish) what the Java Fork/Join framework of Doug Lea does as well.
>
> When in Go a channel runs empty, the scheduler detaches the thread that served it and attaches it to a non-empty channel. In Go all this is in the language and the runtime, where it can be done more efficiently than in a library. AFAIU, this is a main selling point of Go.

I actually don't see a reason why it can't be just as efficient when done as a library. Taking the example of vibe.d, fibers are currently never moved between threads (although technically, they could be), but they are still stored in a free list and reused for later tasks. There is not much more overhead than a few variable assignments and the fiber context switches.

>> Vert.x is claiming to be able to handle millions of active connections.
>
> All right, as you can't have millions of threads on the JVM, they must do that through some asynchronous approach (I guess Java NIO). I read that an asynchronous solution is not as fast as one with many blocking threads, as in Go or Erlang. I don't understand why; it was just claimed that this was the case.

AFAIK they use callback-based asynchronous I/O (mostly for server applications) combined with a thread pool for parallelizing synchronous I/O (mostly for client-type applications/tasks). So it's basically a hybrid system that still makes a lot of trade-offs between performance and comfort.

Disclaimer: this statement is based only on looking at a few examples and maybe a blog post; I don't have any first-hand experience with vert.x.
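
The reuse scheme described above can be sketched with core.thread.Fiber roughly as follows (a simplified illustration that assumes each task runs to completion when resumed; vibe.d's actual task machinery is more involved):

import core.thread : Fiber;

// Each fiber loops forever: run the current task, then park until
// the scheduler hands it a new one.
class TaskFiber : Fiber
{
    void delegate() task;

    this() { super(&run); }

    private void run()
    {
        for (;;)
        {
            task();        // execute the current task
            Fiber.yield(); // park here until this fiber is reused
        }
    }
}

TaskFiber[] freeList;

void runTask(void delegate() dg)
{
    TaskFiber f;
    if (freeList.length)      // reuse a parked fiber if one is available...
    {
        f = freeList[$ - 1];
        freeList.length -= 1;
    }
    else                      // ...otherwise pay for a new one, once
        f = new TaskFiber;
    f.task = dg;
    f.call();                 // little more than a context switch
    freeList ~= f;            // parked again, ready for the next task
}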

March 12, 2014 Re: Remember that Go vs D MQTT thing and how we wondered about dmd vs gdc?

Posted in reply to Sean Kelly

On 07.03.2014 23:29, Sean Kelly wrote:
> On Friday, 7 March 2014 at 18:58:18 UTC, Russel Winder wrote:
>> On Fri, 2014-03-07 at 16:53 +0000, Sean Kelly wrote:
>> […]
>>> 68K connections is nothing. I'll start getting interested when his
>>> benchmarks are 200K+. Event-based systems in C can handle millions
>>> of concurrent connections if implemented properly. I'd like to
>>> believe vibe.d can approach this as well.
>>
>> There used to be a 100k problem, i.e. maintaining more than 100k active
>> connections (active meaning regularly generating traffic, not just being
>> dormant for a few centuries), but so many frameworks can now support that
>> that it has become a non-metric. I don't know if Spring or JavaEE can
>> handle this, but on the JVM Vert.x certainly can, and I suspect Node.js
>> can as well. Vert.x is claiming to be able to handle millions of active
>> connections.
>>
>> I suspect it is now at the stage that the OS is the bottleneck, not the
>> language or the framework.
>
> I think the biggest issue at very large numbers of connections is memory
> use. In fact, I don't expect even vibe.d to scale beyond a few hundred K
> if it allocates a fiber per connection. It would have to use a free list
> of fibers and make a top-level read effectively release the current
> fiber into the free list. Scaling at this level in C generally meant
> retaining little to no state per connection, basically by necessity.
A free list is already used for fibers actually. Each fiber can be reused for any number of "tasks". This is also why `Fiber` as a type doesn't occur in the public API, but rather the `Task` struct, which internally points to a fiber + a task ID.
But since the memory pages of a fiber's stack are allocated lazily, at least on a 64-bit OS, where address space is not an issue, you can actually scale to very high numbers with a decent amount of RAM. Certainly you don't need to have the amount of RAM that the typical dedicated server for such tasks would have.
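
This is easy to observe experimentally. The following sketch reserves a generous stack for each of a large number of fibers; on a 64-bit OS the resident memory stays far below fibers times stack size, because pages are only committed when touched (the counts here are arbitrary, and very large counts may need OS tuning, e.g. vm.max_map_count on Linux):

import core.thread : Fiber;
import std.stdio : writeln;

void main()
{
    enum fiberCount = 10_000;
    enum stackSize  = 512 * 1024; // 512 KiB of reserved address space each

    auto fibers = new Fiber[fiberCount];
    foreach (i; 0 .. fiberCount)
        fibers[i] = new Fiber({ Fiber.yield(); }, stackSize);

    foreach (f; fibers)
        f.call(); // start each fiber; it parks at its yield

    // ~5 GiB of address space is now reserved, but resident memory
    // remains modest since almost none of the stack pages were touched.
    writeln("started ", fiberCount, " fibers");
}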
Having said that, it may be an interesting idea to offer a callback based overload of waitForData(), so that you can do something like this:
listenTCP(port, &onConnection);

void onConnection(TCPConnection conn)
{
    conn.waitForData(&onData);
    // return (exits the task and puts the fiber
    // into the free list)
}

void onData(TCPConnection conn)
{
    // onData gets called as a new task, so that no fiber is
    // occupied between the wait and the read calls
    conn.read(...);
}

March 12, 2014 Re: Remember that Go vs D MQTT thing and how we wondered about dmd vs gdc?

Posted in reply to Sönke Ludwig

On Wednesday, 12 March 2014 at 09:26:28 UTC, Sönke Ludwig wrote:
> I actually don't see a reason why it can't be just as efficient when done as a library. Taking the example of vibe.d, fibers are currently never moved between threads (although technically, they could be), but they are still stored in a free list and reused for later tasks.

I believe several kernel threads are in play to call fibers. Then the free list must be synchronized, which can make a difference on a heavily loaded system at the end of the day. HawtDispatch (http://hawtdispatch.fusesource.org) applies some tricks to reduce synchronization on its free lists for that reason. But I honestly don't have a clue how exactly that works.

March 12, 2014 Re: Remember that Go vs D MQTT thing and how we wondered about dmd vs gdc?

Posted in reply to Bienlein

On Wednesday, 12 March 2014 at 12:10:04 UTC, Bienlein wrote:
> On Wednesday, 12 March 2014 at 09:26:28 UTC, Sönke Ludwig wrote:
>
>> I actually don't see a reason why it can't be just as efficient when done as a library. Taking the example of vibe.d, fibers are currently never moved between threads (although technically, they could be), but they are still stored in a free list and reused for later tasks.
>
> I believe several kernel threads are in play to call fibers. Then the free list must be synchronized, which can make a difference on a heavily loaded system at the end of the day. HawtDispatch (http://hawtdispatch.fusesource.org) applies some tricks to reduce synchronization on its free lists for that reason. But I honestly don't have a clue how exactly that works.

Bypassing the kernel could be more efficient for fibers if it were possible, and using thread affinity could remove some interruptions by setting the maxcpus option in the kernel. The alternative to locking via the kernel is queueing, using the "freeway overpass" method described here: http://blog.erratasec.com/2013/02/multi-core-scaling-its-not-multi.html I think HawtDispatch may be using queues to fit into this synchronization method. Snort is also a good example of mostly lock-less multi-core operation, using "memory mapped regions".

I'm also very interested in optimizing fibers further, as it would give D an edge in an area where it already does well.

March 12, 2014 Re: Remember that Go vs D MQTT thing and how we wondered about dmd vs gdc?

Posted in reply to Etienne

On Wednesday, 12 March 2014 at 15:11:45 UTC, Etienne wrote:
> On Wednesday, 12 March 2014 at 12:10:04 UTC, Bienlein wrote:
>> On Wednesday, 12 March 2014 at 09:26:28 UTC, Sönke Ludwig wrote:
>>
>>> I actually don't see a reason why it can't be just as efficient when done as a library. Taking the example of vibe.d, fibers are currently never moved between threads (although technically, they could be), but they are still stored in a free list and reused for later tasks.
>>
>> I believe several kernel threads are in play to call fibers. Then the free list must be synchronized, which can make a difference on a heavily loaded system at the end of the day. HawtDispatch (http://hawtdispatch.fusesource.org) applies some tricks to reduce synchronization on its free lists for that reason. But I honestly don't have a clue how exactly that works.
>
> Bypassing the kernel could be more efficient for fibers if it were possible, and using thread affinity could remove some interruptions by setting the maxcpus option in the kernel. The alternative to locking via the kernel is queueing, using the "freeway overpass" method described here: http://blog.erratasec.com/2013/02/multi-core-scaling-its-not-multi.html I think HawtDispatch may be using queues to fit into this synchronization method. Snort is also a good example of mostly lock-less multi-core operation, using "memory mapped regions".
>
> I'm also very interested in optimizing fibers further, as it would give D an edge in an area where it already does well.

I think this article puts it well. Bypassing the kernel for fibers should be a long-term plan :)

http://highscalability.com/blog/2013/5/13/the-secret-to-10-million-concurrent-connections-the-kernel-i.html

March 12, 2014 Re: Remember that Go vs D MQTT thing and how we wondered about dmd vs gdc?

Posted in reply to Etienne

On 12 March 2014 18:05, Etienne <etcimon@gmail.com> wrote:
> On Wednesday, 12 March 2014 at 15:11:45 UTC, Etienne wrote:
>>
>> On Wednesday, 12 March 2014 at 12:10:04 UTC, Bienlein wrote:
>>>
>>> On Wednesday, 12 March 2014 at 09:26:28 UTC, Sönke Ludwig wrote:
>>>
>>>> I actually don't see a reason why it can't be just as efficient when
>>>> done as a library. Taking the example of vibe.d, fibers are currently never
>>>> moved between threads (although technically, they could be), but they are still
>>>> stored in a free list and reused for later tasks.
>>>
>>>
>>> I believe several kernel threads are in play to call fibers. Then the free list must be synchronized, which can make a difference on a heavily loaded system at the end of the day. HawtDispatch (http://hawtdispatch.fusesource.org) applies some tricks to reduce synchronization on its free lists for that reason. But I honestly don't have a clue how exactly that works.
>>
>>
>> Bypassing the kernel could be more efficient for fibers if it were possible, and using thread affinity could remove some interruptions by setting the maxcpus option in the kernel. The alternative to locking via the kernel is queueing, using the "freeway overpass" method described here: http://blog.erratasec.com/2013/02/multi-core-scaling-its-not-multi.html I think HawtDispatch may be using queues to fit into this synchronization method. Snort is also a good example of mostly lock-less multi-core operation, using "memory mapped regions".
>>
>> I'm also very interested in optimizing fibers further, as it would give D an edge in an area where it already does well.
>
>
> I think this article puts it well. Bypassing the kernel for fibers should be a long-term plan :)
>
> http://highscalability.com/blog/2013/5/13/the-secret-to-10-million-concurrent-connections-the-kernel-i.html
Not just fibers, but the entire synchronisation stack - which is currently just a wrapper around pthreads/winthreads.

March 13, 2014 Re: Remember that Go vs D MQTT thing and how we wondered about dmd vs gdc?

Posted in reply to Etienne

On Wednesday, 12 March 2014 at 18:05:38 UTC, Etienne wrote:
> I think this article puts it well. Bypassing the kernel for fibers should be a long-term plan :)
>
> http://highscalability.com/blog/2013/5/13/the-secret-to-10-million-concurrent-connections-the-kernel-i.html
I have seen one real-world project where it was done. The point is not specifically about fibers, though, but about scheduling as a whole: when all the resources of the system are supposed to be devoted to a single service, general-purpose OS scheduling creates problems, as it is intended for universal multi-tasking.