Tips on TCP socket to postgresql middleware

February 19, 2022

Posted by Chris Piker

Hi D

I'm about to start a small program whose job is to:

* Connect to a server over a TCP socket
* Read a packetized real-time data stream
* Update/insert to a PostgreSQL database as data arrive.

In general it should buffer data in RAM to avoid exerting back pressure on the input socket and to allow for dropped connections to the PG database. Since the data stream is at most 1.5 megabits/sec (not bytes) I can buffer for quite some time before running out of space.
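
To put a rough number on "quite some time", here is a back-of-the-envelope sketch. The 1 GiB buffer size is an assumed figure for illustration only; the stream rate is the 1.5 megabits/sec stated above.

```python
def seconds_until_full(buffer_bytes: int, rate_bits_per_s: float = 1.5e6) -> float:
    """Time a RAM buffer can absorb the stream while the database is unreachable."""
    bytes_per_s = rate_bits_per_s / 8      # 1.5 Mbit/s is 187,500 bytes/s
    return buffer_bytes / bytes_per_s


one_gib = 1024 ** 3                        # hypothetical buffer size
print(round(seconds_until_full(one_gib) / 60, 1), "minutes")   # prints: 95.4 minutes
```

So under that assumption a full-rate stream could be buffered for roughly an hour and a half of database downtime.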

So far, I've never written a multi-threaded D program where one thread writes to a FIFO and the other reads from it, so I'm reading the last few chapters of Ali Çehreli's book as background. On top of that preparation, I'm looking for:

* general tips on which libraries to examine
* gotchas you've run into in your multi-threaded (or just concurrent) programs,
* other resources to consult
* etc.

Thanks for any advice you want to share.

Best,

February 20, 2022

Re: Tips on TCP socket to postgresql middleware

Posted by eugene
in reply to Chris Piker

On Saturday, 19 February 2022 at 20:13:01 UTC, Chris Piker wrote:

> general tips on which libraries to examine

Most people will probably say this is crazy,
but as to PG, one can do without libraries.

I have been doing so for years (in C, not D) and
have not run into any really complex trouble.
I mean I do not use libpq; instead I implement the
subset of the protocol that a particular program needs.
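
As an illustration of how small such a subset can start out, here is a minimal sketch of PostgreSQL wire framing, written in Python for brevity. It is not the code described above: the function names are hypothetical, and it covers only message framing (a one-byte type followed by a 4-byte big-endian length that counts itself), leaving out startup and authentication.

```python
import struct


def build_query(sql: str) -> bytes:
    """Frame a simple-query ('Q') message: type byte, length, NUL-terminated SQL."""
    body = sql.encode("utf-8") + b"\x00"
    return b"Q" + struct.pack("!i", len(body) + 4) + body


def split_message(buf: bytes):
    """Return (type, payload, rest), or None if buf holds no complete message yet."""
    if len(buf) < 5:
        return None
    (length,) = struct.unpack("!i", buf[1:5])
    end = 1 + length                      # the length field excludes the type byte
    if len(buf) < end:
        return None
    return buf[0:1], buf[5:end], buf[end:]
```

Because `split_message` returns `None` on partial input, it can be called on whatever bytes a non-blocking read() has delivered so far, which is what makes this style fit a single-threaded event loop.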

What I do not like about all these libs for working
with widely used services (postgres, redis etc.) is
the fact that they all hide the I/O stuff inside,
including the TCP connect.

Why have connect() in each library?
It is a universal thing, as are read() and write().
If I want several connections to a DBMS in a program,
libraries like libpq compel me to use multithreading.

But what if I want to do many-many-many things concurrently
in a single thread?

Usually I design more or less complex (network) programs using
the event-driven paradigm (reactor pattern) plus state machines.
In other words, programs designed this way are, so to say,
a hierarchical team of state machines, interacting with
each other as well as with the outside world (signals,
timers, events from sockets etc.).
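
A minimal sketch of that shape, in Python and under stated assumptions: one loop waits for socket events and hands each event to the state machine that owns the socket. The `Receiver` class, its state names, and `run_reactor` are hypothetical illustrations, not the engine described above, and the sketch handles only read events.

```python
import selectors
import socket


class Receiver:
    """Tiny state machine: IDLE until data arrives, READING after, DONE on EOF."""

    def __init__(self):
        self.state = "IDLE"
        self.received = b""

    def on_readable(self, sock):
        data = sock.recv(4096)
        if data:
            self.state = "READING"
            self.received += data
        else:                              # empty read: the peer closed
            self.state = "DONE"
        return self.state


def run_reactor(machines):
    """Single-threaded event loop; `machines` maps socket -> state machine."""
    sel = selectors.DefaultSelector()
    for sock, sm in machines.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, sm)
    active = len(machines)
    while active:
        for key, _events in sel.select(timeout=1.0):
            # Dispatch the event, then fall back into the loop.
            if key.data.on_readable(key.fileobj) == "DONE":
                sel.unregister(key.fileobj)
                key.fileobj.close()
                active -= 1
    sel.close()
```

Each handler returns to the single loop after processing one event, so any number of connections can be served without threads.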

February 20, 2022

Re: Tips on TCP socket to postgresql middleware

Posted by Chris Piker
in reply to eugene

On Sunday, 20 February 2022 at 15:20:17 UTC, eugene wrote:

> Most people will probably say this is crazy,
> but as to PG, one can do without libraries.

Very interesting. I need to stand up this program and two others in one week, so it looks like dpq2 and message passing is a good short-term solution to reduce implementation effort. But I would like to return to your idea in a couple of months so that I can try a fiber-based implementation instead.

It sounds like you might have a rigorous way of defining and keeping track of your state machines. I could probably learn quite a bit from reading your source code, or the source for similarly implemented programs. Are there examples you would recommend?

February 20, 2022

Re: Tips on TCP socket to postgresql middleware

Posted by Ali Çehreli
in reply to Chris Piker

On 2/19/22 12:13, Chris Piker wrote:

>    * gotchas you've run into in your multi-threaded (or just concurrent)
> programs,

I use the exact scenario that you describe: Multiple threads process data and pass the results to a "writer" thread that persists them in a file.

The main gotcha is your thread disappearing without a trace. The most common reason is that it throws an exception and dies. Unlike the main thread, there is nobody to catch and report this "uncaught" exception.

Another one is to set the message box sizes to throttle. Otherwise, producers could produce more than the available memory before the consumer could consume it.

  https://dlang.org/phobos/std_concurrency.html#.setMaxMailboxSize
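
Both gotchas can be sketched in a few lines. The following is a Python stand-in, not std.concurrency code: a bounded `queue.Queue` plays the role of a size-limited mailbox, and the worker catches its own exceptions and hands them to the owner instead of dying silently. The names `writer`, `run` and `max_mailbox` are hypothetical.

```python
import queue
import threading


def writer(mailbox, errors, sink):
    """Consumer thread: drain the mailbox and report any failure to the owner."""
    try:
        while True:
            item = mailbox.get()
            if item is None:               # sentinel: the producer is finished
                return
            sink(item)
    except Exception as exc:               # otherwise the thread vanishes silently
        errors.put(exc)


def run(items, sink, max_mailbox=8):
    """Feed items to the writer; return None on success or the writer's exception."""
    mailbox = queue.Queue(maxsize=max_mailbox)   # put() blocks when full: throttling
    errors = queue.Queue()
    t = threading.Thread(target=writer, args=(mailbox, errors, sink))
    t.start()
    for item in list(items) + [None]:
        while t.is_alive():                # stop feeding a consumer that has died
            try:
                mailbox.put(item, timeout=0.1)
                break
            except queue.Full:
                pass                       # mailbox still full: check liveness again
    t.join()
    return None if errors.empty() else errors.get()
```

The timeout on `put` matters: with a plain blocking `put`, a producer facing a full mailbox would wait forever on a consumer that has already died.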

You need to experiment with different numbers of threads, the buffer size that you mention, different lengths of message boxes, etc. For example, I could not gain more benefit in my program beyond 3 threads (but still set the number to 4 :p Humans are crazy.).

In case you haven't seen it yet, the recipe for std.concurrency that works for me is summarized here:

  https://www.youtube.com/watch?v=dRORNQIB2wA&t=1735s

Ali

February 20, 2022

Re: Tips on TCP socket to postgresql middleware

Posted by eugene
in reply to Chris Piker

On Sunday, 20 February 2022 at 16:55:44 UTC, Chris Piker wrote:

> But I would like to return to your idea in a couple months so that I can try a fiber based implementation instead.

I thought about implementing my engine using fibers but...
it seemed to me they are not very convenient, because
a coroutine's yield returns to the caller, but I want
to return to a single event loop (after processing an event).

> Are there examples you would recommend?

Yes, here is my engine with an example (echo client/server pair):

edsm = 'event driven state machines'

As to the program you are writing: I have written a couple of dozen programs
more or less similar to what you are going to do (data acquisition) using the engine above (C) for production systems, and they all serve very well.

February 20, 2022

Re: Tips on TCP socket to postgresql middleware

Posted by eugene
in reply to Chris Piker

On Saturday, 19 February 2022 at 20:13:01 UTC, Chris Piker wrote:

> In general it should buffer data in RAM to avoid exerting back pressure on the input socket and to allow for dropped connections to the PG database.

If I get it right, you want to restore the connection
if it was closed by the server for some reason.

I use a special SM (state machine) for that purpose; see this picture

In each state where this SM has to send/recv data, it takes
a sending/receiving SM from a pool and commands it to
perform the task. Upon reaching the IDLE state this machine
sends a message to the user (another SM) of the connection
and sits in this state until the user detects that the connection is lost
(in which case it sends M2 to the DB-LINK SM). Then DB-LINK
goes to the WAIT state, where it starts a timer, and when the timer expires
it goes to the CONN state, where it tries to reconnect (using
a sending SM: when the connection is ready we get POLLOUT on the socket).
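
Since the picture is not reproduced here, the reconnect cycle can be approximated as a transition table. This Python sketch uses only the states named above (CONN, IDLE, WAIT) and the M2 message; the other event names, the `failed` transition, and the `DbLink` class are assumptions made for illustration, and the real machine may well have more states.

```python
# (state, event) -> next state
TRANSITIONS = {
    ("CONN", "connected"): "IDLE",   # POLLOUT on the socket: connection is ready
    ("CONN", "failed"): "WAIT",      # assumption: a failed connect also backs off
    ("IDLE", "M2"): "WAIT",          # the user reports that the connection is lost
    ("WAIT", "timer"): "CONN",       # back-off timer expired: try to reconnect
}


class DbLink:
    """Reconnecting link modelled as a table-driven state machine."""

    def __init__(self):
        self.state = "CONN"
        self.history = []

    def handle(self, event):
        nxt = TRANSITIONS.get((self.state, event))
        if nxt is None:              # event has no meaning in this state: ignore it
            return self.state
        self.history.append((self.state, event, nxt))
        self.state = nxt
        return nxt
```

Because the whole machine is driven by `handle(event)`, several `DbLink` instances can be fed from one event loop in one thread.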

You can have as many such connectors as you want,
so you have multiple connections within a single thread.

I often use two connections, one to perform the main task
(uploading some data and the like) and the second for getting
notifications from PG, 'cause it is very inconvenient to
do both in a single connection.

February 21, 2022

Re: Tips on TCP socket to postgresql middleware

Posted by Chris Piker
in reply to Ali Çehreli

On Sunday, 20 February 2022 at 17:58:41 UTC, Ali Çehreli wrote:

> Another one is to set the message box sizes to throttle.

Message sizes and rates are relatively well known, so it will be
easy to pick a throttle point that's unlikely to back up the
source yet provide for some quick DB maintenance in the middle
of a testing session.

> In case you haven't seen yet, the recipe for std.concurrency that works for me is summarized here:
>
>   https://www.youtube.com/watch?v=dRORNQIB2wA&t=1735s

Thanks!  I like your simple exception wrapping pattern, will use that.

February 21, 2022

Re: Tips on TCP socket to postgresql middleware

Posted by Chris Piker
in reply to eugene

On Sunday, 20 February 2022 at 18:00:26 UTC, eugene wrote:

> Yes, here is my engine with an example (echo client/server pair):
>
> In D (for Linux & FreeBSD)

The code is terse and clean, thanks for sharing :) I'm averse to reading it closely, since there was no license file and I don't want to accidentally violate copyright.

I noticed there were no dub files in the package. Not surprised. Dub is such a restrictive tool compared to say, setup.py/.cfg in python.

February 21, 2022

Re: Tips on TCP socket to postgresql middleware

Posted by Chris Piker
in reply to eugene

On Sunday, 20 February 2022 at 18:36:21 UTC, eugene wrote:

> I often use two connections, one to perform the main task
> (uploading some data and the like) and the second for getting
> notifications from PG, 'cause it is very inconvenient to
> do both in a single connection.

Ah, a very handy tip. It would be convoluted to multiplex notifications
on the data connection.

February 21, 2022

Re: Tips on TCP socket to postgresql middleware

Posted by eugene
in reply to Chris Piker

On Monday, 21 February 2022 at 04:48:56 UTC, Chris Piker wrote:

> On Sunday, 20 February 2022 at 18:36:21 UTC, eugene wrote:
>
> > I often use two connections, one to perform the main task
> > (uploading some data and the like) and the second for getting
> > notifications from PG, 'cause it is very inconvenient to
> > do both in a single connection.
>
> Ah, a very handy tip. It would be convoluted to multiplex notifications
> on the data connection.

I am reminded of the psql client's behavior: it sees notifications only after some request. It is really inconvenient to perform regular tasks and be ready to pick up notifications at any moment in one connection.
