Unix Programming - Socket errors and errno.


Author Socket errors and errno.
Lawrie

2005-04-14, 6:03 pm

Hi,

I am fairly new to socket programming and am extremely paranoid about
making sure I catch any error condition that may affect my application
(and attempt a recovery if possible).

I am using the select() function to notify my program of socket events.

My application sends data to remote hosts. I have used shutdown() to
disable socket reads so that client applications cannot send me data.

If I lose the connection (for whatever reason) with any remote host or my
write socket fails (for whatever reason), I need to take remedial action
(if possible), re-establish the connection (if possible) and continue to
send data.

I wonder if anyone could point me in the direction of any resources
which may help with the following queries:

1> If a remote host disconnects, the select function will inform me that
the socket is ready to read/write. I intend to use the value of
errno from my write attempt to the socket (which will fail if
disconnected) to diagnose the problem and trigger my application to
re-establish the connection with the remote host.

Does this seem a sensible approach?

2> What error conditions should I check for to determine errors on my
local (write) socket. If I detect errors with my local socket is it
best to drop the socket and create a new socket before attempting to
reconnect to the remote host?

Many thanks

Lawrie

loic-dev@gmx.net

2005-04-18, 7:59 am

Hello Lawrie,

> I am fairly new to socket programming and am extremely paranoid about
> making sure I catch any error condition that may affect my application
> (and attempt a recovery if possible).
>
> I am using the select() function to notify my program of socket events.

[snip]

> 1> If a remote host disconnects the select function will inform me that
> the socket is ready to read/write. I am intending to use the value of
> errno if my write attempt (which will fail if disconnected) to the
> socket to diagnose the problem and trigger my application to
> re-establish the connection with the remote host.
>
> Does this seem a sensible approach?


Yes, I guess. But you must be aware of one fact: if you have received an
RST and you write to the socket, then the SIGPIPE signal will be
delivered to your process, and its default action is to terminate the
process. This would happen, for instance, if the remote application has
crashed.

The way to deal with this is to ignore SIGPIPE. In this case,
write() will return -1 and set errno to EPIPE.
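
To make this concrete, here is a minimal sketch of ignoring SIGPIPE and then checking the result of write(). The helper names ignore_sigpipe() and write_detect_broken() are made up for illustration, and treating ECONNRESET the same way as EPIPE is my assumption, not something stated above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Ignore SIGPIPE process-wide so that writing to a broken connection
 * fails with EPIPE instead of terminating the process.
 * Returns 0 on success, -1 on failure. */
int ignore_sigpipe(void)
{
    struct sigaction sa;

    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Hypothetical helper: one write() attempt on fd.
 * Returns  0 if write() succeeded (possibly a partial write),
 *          1 if the peer is gone (EPIPE, or ECONNRESET by assumption),
 *         -1 for any other error (errno is left untouched). */
int write_detect_broken(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);

    ssize_t n = write(fd, buf, len);
    if (n >= 0)
        return 0;
    if (errno == EPIPE || errno == ECONNRESET)
        return 1;
    return -1;
}
```

Call ignore_sigpipe() once at start-up, before the first write. A return value of 1 from write_detect_broken() is then the point at which the application could trigger its reconnect logic.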


> 2> What error conditions should I check for to determine errors on my
> local (write) socket. If I detect errors with my local socket is it
> best to drop the socket and create a new socket before attempting to
> reconnect to the remote host?


I think the relevant question is: why would the remote host
disconnect? I can think of only 3 possibilities (perhaps I am missing
some):

1) The remote application voluntarily closes or shuts down the socket.
2) The remote application crashes.
3) There is a failure in the network path.

Unless you are actively monitoring the health of the network,
you have no "reasonable" means to detect 3) directly. [Using a timeout
might give you an indirect hint that something is going wrong.]
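
One possible way to apply such a timeout on the sending side is to wait with select() until the socket accepts more data, and give up after a few seconds. This is only a sketch: the name write_with_timeout() is hypothetical, and the idea that "not writable for timeout_sec seconds" hints at a stalled path (the peer no longer acknowledging data, so the send buffer stays full) is an assumption about how the application would interpret it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_sec seconds for fd to become
 * writable, then perform a single write().
 * Returns the number of bytes written (> 0), 0 on timeout,
 * or -1 on error with errno set.
 * Note: with a blocking socket, write() itself may still block if len
 * exceeds the free buffer space; a non-blocking socket avoids that. */
ssize_t write_with_timeout(int fd, const void *buf, size_t len, int timeout_sec)
{
    assert(buf != NULL && len > 0 && timeout_sec >= 0);

    if (fd < 0 || fd >= FD_SETSIZE) {   /* select() cannot handle this fd */
        errno = EBADF;
        return -1;
    }

    for (;;) {
        fd_set wfds;
        struct timeval tv;

        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        int rc = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (rc > 0)
            return write(fd, buf, len);
        if (rc == 0)
            return 0;                   /* timed out: indirect hint only */
        if (errno != EINTR)
            return -1;
        /* Interrupted by a signal: retry with the full timeout
         * (a simplification; the elapsed time is not subtracted). */
    }
}
```

A return value of 0 does not prove a network failure. It only says the socket did not become writable in time.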

You can detect 2) with the EPIPE mechanism, but you can't really
recover from that failure unless you can re-spawn the remote
application or the remote application is programmed against that kind
of failure (using e.g. a watchdog that restarts the application if it
crashes).

It might make sense to re-initiate the connection in case 1)... But why
would the remote application voluntarily close or shut down the socket
in the first place?


Perhaps detecting the EPIPE error condition and trying to reconnect
could add some more reliability to the program. But without knowing
the applications in question in detail, it is difficult to tell.
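
If you do try to reconnect, a rough sketch could look like the following. It assumes the usual practice of closing the failed descriptor and creating a new socket, since a TCP socket whose connection has broken generally cannot be connected again. The function name reconnect_fresh() is made up, and any retry or back-off policy is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: discard a failed socket and connect a new one.
 * old_fd may be -1 if there is nothing to close.
 * Returns the new connected descriptor, or -1 with errno set by
 * socket() or connect(). */
int reconnect_fresh(int old_fd, const struct sockaddr *addr, socklen_t addrlen)
{
    assert(addr != NULL);

    if (old_fd >= 0)
        close(old_fd);                  /* drop the broken connection */

    int fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (connect(fd, addr, addrlen) < 0) {
        int saved = errno;              /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

The caller keeps the remote address from the original connection and passes it in again. Whether the reconnect succeeds of course depends on why the remote side went away in the first place.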


Cheers,
Loic.

Lawrie

2005-04-18, 5:53 pm

I should have mentioned that my application sends periodic heartbeat
messages. I can monitor the state of the connection to the remote host
by examining the value of errno (if the write fails) and take the
relevant action.

Loic Domaigne

2005-04-20, 5:52 pm

Hello Lawrie,

> I should have mentioned that my application sends periodic heartbeat
> messages.


Are you heartbeating your remote app? Great!
With TCP?
Do you have a redundant network path?

> I can monitor the state of the connection to the remote host
> by examining the value of errno (if the write fails) and take the
> relevant action.


Assuming that you are using TCP, the reasons why the heartbeat fails could be:

(1) The remote host has crashed (for instance, kernel panic, sudden
power off etc.)
(2) The remote app has crashed (e.g. SIGSEGV), but the remote host is OK.
(3) The remote app has closed or shut down the socket.
(4) There is a networking failure (network cable unplugged, NIC
problem that didn't lead to a kernel panic etc.)

Unless you are using a redundant network path, you have no means to
distinguish all of those cases. However, you might tell whether it is
one of cases 1) and 4), or one of cases 2) and 3).

Cases 1) and 4) can (only) be detected by a timeout, whereas cases 2) and
3) can be detected with the EPIPE error condition...
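
As an illustration of that split, a failed heartbeat write could be sorted by errno roughly as below. The enum, the function name and the exact grouping are my assumptions. In particular, I am assuming that the kernel's own TCP timeout surfaces as ETIMEDOUT (or an unreachable error) on a later write, and ECONNRESET can also show up after the remote host has rebooted, so the mapping is a hint rather than a diagnosis.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>

/* Hypothetical classification of errno after a failed write(). */
enum write_failure {
    WF_RETRY,        /* transient: try the same write again later      */
    WF_PEER_GONE,    /* likely cases 2) and 3): peer closed or reset   */
    WF_UNREACHABLE,  /* likely cases 1) and 4): timeout or no route    */
    WF_OTHER         /* anything else, e.g. a local programming error  */
};

enum write_failure classify_write_errno(int err)
{
    assert(err != 0);   /* only meaningful after write() returned -1 */

    /* if-chains rather than switch: EAGAIN and EWOULDBLOCK may be equal */
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK)
        return WF_RETRY;
    if (err == EPIPE || err == ECONNRESET)
        return WF_PEER_GONE;
    if (err == ETIMEDOUT || err == EHOSTUNREACH ||
        err == ENETUNREACH || err == ENETDOWN)
        return WF_UNREACHABLE;
    return WF_OTHER;
}
```

After a write() that returned -1, the application would call classify_write_errno(errno) and, for example, reconnect on WF_PEER_GONE or WF_UNREACHABLE and simply retry on WF_RETRY.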


Regards,
Loic.