Thread overview
std.socket - problems closing socket
Sep 30, 2011
simendsjo
Oct 03, 2011
Regan Heath
Oct 03, 2011
simendsjo
Oct 03, 2011
Regan Heath
Oct 03, 2011
simendsjo
Oct 03, 2011
Regan Heath
Oct 03, 2011
simendsjo
Oct 03, 2011
simendsjo
Oct 04, 2011
Regan Heath
Oct 03, 2011
Regan Heath
September 30, 2011
Not sure if this is a problem with std.socket, nginx or my knowledge of sockets. I'm pretty sure it's the last one.

I'm experimenting with fastcgi on nginx, and the socket stays in TIME_WAIT even after I call
  socket.shutdown(SocketShutdown.BOTH);
  socket.close();

(Crossposted from SO: http://stackoverflow.com/questions/7616601/nginx-fastcgi-and-open-sockets)
October 03, 2011
On Sat, 01 Oct 2011 00:26:35 +0100, simendsjo <simendsjo@gmail.com> wrote:

> Not sure if this is a problem with std.socket, nginx or my knowledge of sockets. I'm pretty sure it's the last one.
>
> I'm experimenting with fastcgi on nginx, and the socket stays in TIME_WAIT even after I call
>    socket.shutdown(SocketShutdown.BOTH);
>    socket.close();
>
> (Crossposted from SO: http://stackoverflow.com/questions/7616601/nginx-fastcgi-and-open-sockets)

For a "graceful" close you're supposed to ensure there is no data pending.  To do that you:

shutdown(SD_SEND);  // send only, not recv
<enter a loop reading all data remaining on the socket>
close();

The loop should read until recv returns 0.  If recv returns -1 and the socket is blocking it should error/exit.  If recv returns -1 and the socket is non-blocking it should check for [WSA]EWOULDBLOCK (and select/sleep + loop) or error/exit.
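
A minimal sketch of that sequence in D, assuming a blocking std.socket.Socket. The helper name gracefulClose is made up for illustration and is not part of Phobos. For a non-blocking socket the -1 case would also need the EWOULDBLOCK check described above.

```d
import std.socket;

/// Half-close, drain, then close.
/// Returns true if the peer closed its side cleanly (receive returned 0).
/// Assumption: the socket is in blocking mode.
bool gracefulClose(Socket sock)
{
    // Stop sending only; the receive direction stays open.
    sock.shutdown(SocketShutdown.SEND);

    ubyte[1024] buf;
    for (;;)
    {
        auto n = sock.receive(buf[]);
        if (n > 0)
            continue;               // discard whatever was still pending

        // n == 0: peer finished sending. n == Socket.ERROR: give up.
        bool clean = (n == 0);
        sock.close();
        return clean;
    }
}
```
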

The reason to do this is to flush all the data from the socket buffers on both the remote and local ends. Otherwise a close can turn remotely buffered data into a "connection broken" error on the remote end, and/or (I am guessing a little here) close the socket in the middle of negotiating a graceful close, and/or leave it in a TIME_WAIT state because of buffered data or data "in flight".

... are you setting any close options/timeouts, e.g. LINGER?

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
October 03, 2011
On 03.10.2011 11:36, Regan Heath wrote:
> For a "graceful" close you're supposed to ensure there is no data
> pending.  To do that you:
>
> shutdown(SD_SEND);  // send only, not recv
> <enter a loop reading all data remaining on the socket>
> close();
>
> The loop should read until recv returns 0.  If recv returns -1 and the
> socket is blocking it should error/exit.  If recv returns -1 and the
> socket is non-blocking it should check for [WSA]EWOULDBLOCK (and
> select/sleep + loop) or error/exit.
>
> The reason to do this is to flush all the data from the socket buffers
> on the remote and local ends, otherwise a close can cause remote
> buffered data to cause a "connection broken" error on the remote end,
> and/or (I am guessing a little here) may cause the socket to close while
> negotiating a graceful close, and/or remain in a TIME_WAIT state due to
> buffered data or data "in flight".
>
> ... are you setting any close options/timeouts i.e. LINGER?

Thanks.

recv returns -1 for many requests. The errors are only WSAECONNABORTED and WSAECONNRESET as described here: http://msdn.microsoft.com/en-us/library/ms740668.aspx

I'm doing socket.shutdown(SocketShutdown.SEND) now after sending all my data and reading until I receive 0 or -1. (doesn't really matter as sending the FastCGI EndRequest makes the server shut it down as it doesn't handle multiplexing)

I have tried with linger too, but it doesn't help:
socket.setOption(SocketOptionLevel.SOCKET, SocketOption.LINGER, std.socket.linger(1, 30));

Could this be caused by some bad settings on the webserver?

PS: Seems my computer can handle about 16000 TIME_WAIT before it starts "hanging".
October 03, 2011
On Mon, 03 Oct 2011 12:57:56 +0100, simendsjo <simendsjo@gmail.com> wrote:

> On 03.10.2011 11:36, Regan Heath wrote:
>> For a "graceful" close you're supposed to ensure there is no data
>> pending.  To do that you:
>>
>> shutdown(SD_SEND);  // send only, not recv
>> <enter a loop reading all data remaining on the socket>
>> close();
>>
>> The loop should read until recv returns 0.  If recv returns -1 and the
>> socket is blocking it should error/exit.  If recv returns -1 and the
>> socket is non-blocking it should check for [WSA]EWOULDBLOCK (and
>> select/sleep + loop) or error/exit.
>>
>> The reason to do this is to flush all the data from the socket buffers
>> on the remote and local ends, otherwise a close can cause remote
>> buffered data to cause a "connection broken" error on the remote end,
>> and/or (I am guessing a little here) may cause the socket to close while
>> negotiating a graceful close, and/or remain in a TIME_WAIT state due to
>> buffered data or data "in flight".
>>
>> ... are you setting any close options/timeouts i.e. LINGER?
>
> Thanks.

:)

> recv returns -1 for many requests. The errors are only WSAECONNABORTED and WSAECONNRESET as described here: http://msdn.microsoft.com/en-us/library/ms740668.aspx

To help me understand (I know nothing about fastcgi or nginx) can you clarify...
1. Your D code is the client side, connecting to the web server and sending GET/POST style requests?
2. You get these ABORTED and RESET errors on the client side?
3. As #2, even after doing as I described: shutdown(SEND), recv, then close?

If yes to all the above, then it sounds like the web server/fastcgi is closing the socket without reading all the data you're sending, which probably means you're sending something it's not expecting.  I would start by verifying exactly what data you're sending, and that it's all expected by the remote end.

> I'm doing socket.shutdown(SocketShutdown.SEND) now after sending all my data and reading until I receive 0 or -1. (doesn't really matter as sending the FastCGI EndRequest makes the server shut it down as it doesn't handle multiplexing)

So, the socket closure is initiated by fastcgi/the web server.  This supports the theory that it's not reading some of your data, because it's not expecting it, and this is likely the cause of the ABORT/RESET errors you're seeing.

> I have tried with linger too, but it doesn't help:
> socket.setOption(SocketOptionLevel.SOCKET, SocketOption.LINGER, std.socket.linger(1, 30));

The default LINGER options should be fine, as-is.  But, double check the D socket code just in case it is setting different LINGER options by default (I haven't used it, or looked myself, sorry).
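
One way to double check is to read the option back from the socket. This is only a sketch: currentLinger is a hypothetical helper name, and it assumes a current Phobos, where the option type is spelled Linger (the std.socket.linger spelling quoted above is the older one).

```d
import std.socket;

/// Reads back SO_LINGER so the effective setting can be inspected.
/// `on` is nonzero when lingering is enabled; `time` is in seconds.
Linger currentLinger(Socket sock)
{
    Linger l;
    sock.getOption(SocketOptionLevel.SOCKET, SocketOption.LINGER, l);
    return l;
}
```

Calling it on a freshly created socket, before any setOption call, shows what the library leaves as the default.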

> Could this be caused by some bad settings on the webserver?

It is possible, but I would double check your requests first.  There may be a setting, or settings for aborting connections which take too long, or fail to send certain data, or connect from the wrong IP, or...  If your requests are otherwise working, then I suspect you're sending some 'extra' data which is not being read.

> PS: Seems my computer can handle about 16000 TIME_WAIT before it starts "hanging".

You'll be running out of operating system handles or similar at that point :p

October 03, 2011
This might be a useful read..
http://msdn.microsoft.com/en-us/library/windows/desktop/ms738547(v=vs.85).aspx
October 03, 2011
On 03.10.2011 16:16, Regan Heath wrote:
> On Mon, 03 Oct 2011 12:57:56 +0100, simendsjo <simendsjo@gmail.com> wrote:
(...)
> To help me understand (I know nothing about fastcgi or nginx) can you
> clarify...
> 1. Your D code is the client side, connecting to the web server and
> sending GET/POST style requests?
> 2. You get these ABORTED and RESET errors on the client side?
> 3. As #2, even after doing as I described: shutdown(SEND), recv, then close?
>
> If yes to all the above, then it sounds like the web server/fastcgi is
> closing the socket without reading all the data you're sending, which
> probably means you're sending something it's not expecting. I would
> start by verifying exactly what data you're sending, and that it's all
> expected by the remote end.


Yes. I've coded the client as follows:
1) start listening socket
2) wait for incoming connections or incoming data
3) receive(). If a socket returns 0 or -1, close it and move on to the next socket with data
4) read fastcgi request from server
5) write fastcgi response
6) write fastcgi EndRequest (the server should now end the request)
7) if the application should close the request, send shutdown(send)
8) accept incoming connection
9) back to 2)
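
For readers following along, steps 2), 3) and 8) could be sketched roughly like this in D. It is not my actual code: pollOnce and the onData callback are names invented for the sketch, the sockets are assumed to be blocking, and the FastCGI parsing of steps 4) to 7) is left to the callback.

```d
import core.time : msecs;
import std.socket;

/// One pass of the loop: wait for activity, read from clients that have
/// data, drop the ones that closed, then accept a pending connection.
/// Returns the clients that are still open.
Socket[] pollOnce(Socket listener, Socket[] clients,
                  void delegate(Socket, const(ubyte)[]) onData)
{
    auto readSet = new SocketSet();
    readSet.add(listener);
    foreach (c; clients)
        readSet.add(c);

    // select returns the number of ready sockets, 0 on timeout,
    // or -1 if it was interrupted.
    if (Socket.select(readSet, null, null, 100.msecs) <= 0)
        return clients;

    Socket[] alive;
    ubyte[4096] buf;
    foreach (c; clients)
    {
        if (!readSet.isSet(c))
        {
            alive ~= c;             // nothing to read yet
            continue;
        }
        auto n = c.receive(buf[]);
        if (n == 0 || n == Socket.ERROR)
        {
            c.close();              // peer closed (0) or error (-1)
            continue;
        }
        onData(c, buf[0 .. n]);     // steps 4) to 7) would happen here
        alive ~= c;
    }

    if (readSet.isSet(listener))
        alive ~= listener.accept(); // step 8)
    return alive;
}
```
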

FastCGI connections work in one of two ways: the server is responsible for closing the connections (supports multiplexing) or the application should close the connection after a request has been sent.
For the latter I send SocketShutdown.SEND after writing EndRequest in step 6), but it doesn't really matter as nginx doesn't support multiplexing. It closes the connection after each request anyway. I see the same result no matter what option I use.

I'm running the exact same request and writing the exact same response for all queries, so there shouldn't be any unknown fields.

I also only get an error on <1/5 of the requests, and even when the error occurs, the response has been written completely to the browser.


>> I'm doing socket.shutdown(SocketShutdown.SEND) now after sending all
>> my data and reading until I receive 0 or -1. (doesn't really matter as
>> sending the FastCGI EndRequest makes the server shut it down as it
>> doesn't handle multiplexing)
>
> So, the socket closure is initiated by fastcgi/the web server. This
> supports the theory that it's not reading some of your data, because
> it's not expecting it, and this is likely the cause of the ABORT/RESET
> errors you're seeing.
>
>> I have tried with linger too, but it doesn't help:
>> socket.setOption(SocketOptionLevel.SOCKET, SocketOption.LINGER,
>> std.socket.linger(1, 30));
>
> The default LINGER options should be fine, as-is. But, double check the
> D socket code just in case it is setting different LINGER options by
> default (I haven't used it, or looked myself, sorry).


Linger is default off, but it doesn't help to turn it on. RCV/SNDTIMO is also set to 0.


>> Could this be caused by some bad settings on the webserver?
>
> It is possible, but I would double check your requests first. There may
> be a setting, or settings for aborting connections which take too long,
> or fail to send certain data, or connect from the wrong IP, or... If
> your requests are otherwise working, then I suspect you're sending some
> 'extra' data which is not being read.


The requests are handled in ~1msec, so there shouldn't be any timeouts. The default timeout on nginx for fastcgi is 60 seconds too.
I can easily process ~200 requests per second (nginx and my server don't break a sweat; it's my curl spammers that are using all the CPU).


>> PS: Seems my computer can handle about 16000 TIME_WAIT before it
>> starts "hanging".
>
> You'll be running out of operating system handles or similar at that
> point :p
>

Yup. I'll probably never have that problem in a production environment though :)
October 03, 2011
On Mon, 03 Oct 2011 17:33:57 +0100, simendsjo <simendsjo@gmail.com> wrote:
> Yes. I've coded the client as follows:
> 1) start listening socket
> 2) wait for incoming connections or incoming data
> 3) receive(). If a socket returns 0 or -1, close it and process next with data
> 4) read fastcgi request from server
> 5) write fastcgi response
> 6) write fastcgi EndRequest (the server should now end the request)
> 7) if the application should close the request, send shutdown(send)
> 8) accept incoming connection
> 9) back to 2)
>
> FastCGI connections work in one of two ways: the server is responsible for closing the connections (supports multiplexing) or the application should close the connection after a request has been sent.
> For the latter I send SocketShutdown.SEND after writing EndRequest in step 6), but it doesn't really matter as nginx doesn't support multiplexing. It closes the connection after each request anyway. I see the same result no matter what option I use.

Ok, so your "client" (that you have coded) is also the "application" you refer to in the bit about FastCGI above?  Or are there 2 components here, and are both written in D?

Does the fastcgi "EndRequest" close the socket/connection?  If so, doing a socket.Shutdown /after/ this is not going to work as the socket has already been closed (which implicitly does a shutdown(BOTH)).  In that case, try doing the shutdown /before/ the EndRequest, and make sure you also read any/all data remaining on the socket before doing the EndRequest/close.

The key question seems to be, at which point does nginx close the connection?  and therefore, is there any unread data on the socket (at either end) when it does.  If, for example, it flushes the response to the other end, but does not wait for it to be read, and closes the socket, you will get CONNRESET/ABORTED errors on the other end.

> I'm running the exact same request and writing the exact same response for all queries, so there shouldn't be any unknown fields.

I didn't mean an unknown "field"; I meant extra data of any kind. But I suspect you're using an API to form the requests etc., so this is probably not the case.

> I also only get an error on <1/5 of the requests, and even when the error occurs, the response has been written completely to the browser.

Ahh, ok, I believe the problem is simply the timing of the 'close/EndRequest'.  Sometimes it happens /before/ the data has been completely read (1/5), other times after (4/5).

R
October 03, 2011
On 03.10.2011 20:02, Regan Heath wrote:
> On Mon, 03 Oct 2011 17:33:57 +0100, simendsjo <simendsjo@gmail.com> wrote:
>> Yes. I've coded the client as follows:
>> 1) start listening socket
>> 2) wait for incoming connections or incoming data
>> 3) receive(). If a socket returns 0 or -1, close it and process next
>> with data
>> 4) read fastcgi request from server
>> 5) write fastcgi response
>> 6) write fastcgi EndRequest (the server should now end the request)
>> 7) if the application should close the request, send shutdown(send)
>> 8) accept incoming connection
>> 9) back to 2)
>>
>> FastCGI connections work in one of two ways: the server is
>> responsible for closing the connections (supports multiplexing) or the
>> application should close the connection after a request has been sent.
>> For the latter I send SocketShutdown.SEND after writing EndRequest in
>> step 6), but it doesn't really matter as nginx doesn't support
>> multiplexing. It closes the connection after each request anyway. I
>> see the same result no matter what option I use.
>
> Ok, so your "client" (that you have coded) is also the "application" you
> refer to in the bit about FastCGI above? Or are there 2 components here,
> and are both written in D?


It's just one component to handle FastCGI requests. I didn't want to rely on the external libfcgi.


> Does the fastcgi "EndRequest" close the socket/connection? If so, doing
> a socket.Shutdown /after/ this is not going to work as the socket has
> already been closed (which implicitly does a shutdown(BOTH)). In that
> case, try doing the shutdown /before/ the EndRequest, and make sure you
> also read any/all data remaining on the socket before doing the
> EndRequest/close.


EndRequest doesn't actually close the socket; it's just a message telling the server that the full response has been written (request handled). If the server (nginx) is responsible for closing, it can reuse the connection for other requests. If the server says that the application is responsible, shutdown(send) is called. This is part of the specification.
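
To make that concrete, here is a rough sketch of the EndRequest record and the "who closes" check, following the record layout in the FastCGI specification (8-byte header plus 8-byte body; FCGI_KEEP_CONN is a flag in the BeginRequest body). The function names are made up for illustration and this is not the code I'm actually running.

```d
enum ubyte FCGI_VERSION_1        = 1;
enum ubyte FCGI_END_REQUEST      = 3;
enum ubyte FCGI_REQUEST_COMPLETE = 0;
enum ubyte FCGI_KEEP_CONN        = 1;

/// Builds an FCGI_END_REQUEST record: 8-byte header followed by 8-byte body.
ubyte[16] endRequestRecord(ushort requestId, uint appStatus)
{
    ubyte[16] r;                            // zero-filled: padding/reserved stay 0
    r[0]  = FCGI_VERSION_1;
    r[1]  = FCGI_END_REQUEST;
    r[2]  = cast(ubyte)(requestId >> 8);    // requestIdB1
    r[3]  = cast(ubyte)(requestId & 0xFF);  // requestIdB0
    r[5]  = 8;                              // contentLengthB0 (B1 stays 0)
    r[8]  = cast(ubyte)(appStatus >> 24);   // appStatusB3 .. B0, big-endian
    r[9]  = cast(ubyte)(appStatus >> 16);
    r[10] = cast(ubyte)(appStatus >> 8);
    r[11] = cast(ubyte)(appStatus);
    r[12] = FCGI_REQUEST_COMPLETE;          // protocolStatus
    return r;
}

/// True when the application must close the connection itself, i.e. the
/// server did not set FCGI_KEEP_CONN in the BeginRequest flags.
bool appMustClose(ubyte beginRequestFlags)
{
    return (beginRequestFlags & FCGI_KEEP_CONN) == 0;
}
```

When appMustClose is true, the shutdown(send) goes out after this record has been written.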


> The key question seems to be, at which point does nginx close the
> connection? and therefore, is there any unread data on the socket (at
> either end) when it does. If, for example, it flushes the response to
> the other end, but does not wait for it to be read, and closes the
> socket, you will get CONNRESET/ABORTED errors on the other end.
>
>> I'm running the exact same request and writing the exact same response
>> for all queries, so there shouldn't be any unknown fields.
>
> I didn't mean unknown "field" I mean extra data of any kind, but I
> suspect you're using an API to form the requests etc so this is probably
> not the case.
>
>> I also only get an error on <1/5 of the requests, and even when the
>> error occurs, the response has been written completely to the browser.
>
> Ahh, ok, I believe the problem is simply the timing of the
> 'close/EndRequest'. Sometimes it happens /before/ the data has been
> completely read (1/5), other times after (4/5).
>
> R


It seems nginx is to blame here, and not me. I tried lighttpd and it works. It gives several EWOULDBLOCK, but I can just handle these again with no problem. I should have tried this sooner... I've spent a lot of time trying to track down these problems :|

Thanks for all your help - I'll update this thread if I find a solution to the nginx issue.
October 03, 2011
On 03.10.2011 20:41, simendsjo wrote:
>
> It seems nginx is to blame here, and not me. I tried lighttpd and it
> works. It gives several EWOULDBLOCK, but I can just handle these again
> with no problem. I should have tried this sooner... I've used a lot of
> time trying to track down these problems :|
>
> Thanks for all your help - I'll update this thread if I find a solution
> to the nginx issue.

Well, that was quick... Seems I was running a development version of nginx. I downloaded the stable version, and things work as expected - I can finally try to get some actual coding done :)
October 04, 2011
On Mon, 03 Oct 2011 19:41:14 +0100, simendsjo <simendsjo@gmail.com> wrote:
> It seems nginx is to blame here, and not me. I tried lighttpd and it works. It gives several EWOULDBLOCK, but I can just handle these again with no problem. I should have tried this sooner... I've spent a lot of time trying to track down these problems :|

EWOULDBLOCK is to be expected, it simply means you've tried to read when there is no data available, before the close/shutdown(SEND) from the other end. :)
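
As a rough illustration of telling those cases apart in D (a sketch only: ReadResult and tryRead are names invented here, while wouldHaveBlocked() is the std.socket helper that checks the last error for EWOULDBLOCK/EAGAIN):

```d
import std.socket;

enum ReadResult { data, wouldBlock, closed, failed }

/// Classifies one receive() call on a non-blocking socket.
/// `got` is set to the number of bytes read when the result is `data`.
ReadResult tryRead(Socket sock, ubyte[] buf, out size_t got)
{
    auto n = sock.receive(buf);
    if (n > 0)
    {
        got = n;
        return ReadResult.data;
    }
    if (n == 0)
        return ReadResult.closed;       // peer did close/shutdown(SEND)

    // n == Socket.ERROR: "no data yet" is not a real failure.
    return wouldHaveBlocked() ? ReadResult.wouldBlock : ReadResult.failed;
}
```

On wouldBlock the caller just goes back to select (or sleeps) and tries again later.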
