
> I have a wrapper for connect() in which I need to implement self-
> connect detection. In the event of a self-connect, the wrapper will
> close the socket and respond as if it had received ECONNREFUSED.
> (The documented semantics for this wrapper implicitly make self-
> connect equivalent to a failure to connect at all.)
>
> (For readers not familiar with TCP self-connect: it’s permissible for
> a socket to connect to itself if the local and remote ports match and
> the remote IP address is in fact a local IP address, ie the address
> of one of the local host’s interfaces. This condition is most often
> met when a process tries to connect to a local port in the ephemeral
> range which is currently unused, and gets assigned that port as the
> socket’s local port.)

(snip)

> However, I was concerned that it might be possible on a multihomed
> host (which would really include just about every host, if we count
> the loopback interface) for the local address to use one local IP
> address and the remote to use another, in which case I’d get a false
> negative. Does anyone know offhand if that’s legal, and if so of any
> implementations that behave that way?

It still seems a little strange that self connect could happen.

My guess is that self connect, as you describe, is a problem because
(local IP, local port, remote IP, remote port) is exactly the same as
(remote IP, remote port, local IP, local port), though that would not
be true if two different addresses on the same host were used.

You would fail to detect it, but I don’t see why it wouldn’t work.

I agree – the 4-tuples would be different, and so it would not be a
self-connection. Instead, it would most likely give ECONNREFUSED due
to the lack of a listening socket at the other endpoint.
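
To make the 4-tuple argument concrete, here is a minimal sketch of the comparison being described. The names Quad, reversed_quad and is_self_connect are made up for illustration, and the addresses are documentation placeholders. It only models the "both ends see the same 4-tuple" view; it says nothing about what a particular stack does when the two addresses differ.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """Hypothetical holder for one endpoint's view of a TCP connection."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


def reversed_quad(q: Quad) -> Quad:
    """Return the same connection as seen from the remote endpoint."""
    return Quad(q.remote_ip, q.remote_port, q.local_ip, q.local_port)


def is_self_connect(q: Quad) -> bool:
    """True only when the 4-tuple is identical from both ends."""
    return q == reversed_quad(q)


# Same address and port on both ends: the two views are identical.
same = Quad("192.0.2.10", 60000, "192.0.2.10", 60000)
# Same port but two different local addresses: the views differ.
mixed = Quad("192.0.2.10", 60000, "127.0.0.1", 60000)
```

With these definitions, is_self_connect(same) is true and is_self_connect(mixed) is false, which is exactly the case a simple address comparison would miss if the stack did loop such a connection back to the same socket.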

Maybe. In v2 Stevens says:

A process creates a socket and connects it to itself using the
system calls: socket, bind a local port (say 3000), and then
connect to this same port and some local IP address. (960)

The “some local IP address” bit is what worries me. It appears,
based on Stevens’ description of the control flow (962), that any
local IP address will be a match, because the implementation he’s
looking at – BSD 4.4 – handles an outbound packet for any local
interface by queuing it for the loopback interface.

I ran across an interesting piece by Craig Milo Rogers on the
subject of self-connect in Linux.[1] It seems to imply that Linux
is unusual in allowing *accidental* self-connect, which is what’s
happening here; that some (most?) implementations where self-
connect is possible at all (which it should be) have a check to
prevent assigning an ephemeral port which matches the destination
port (possibly only if the destination address is a local one,
though there’s not really any need to make that check).

He believes he remembers Jon (Postel, presumably) recommending
this check in the stack as a defense against accidental self-
connect.

So unfortunately it appears that accidental self-connect is all
too possible on Linux, and that it may happen when source address
!= destination address, making it troublesome to detect. OTOH,
fixing my wrapper code so it only tries a reasonable number of
connects will make it *much* less likely, and checking for the
easy case where source address == destination address will catch
at least some of any accidental self-connects that sneak through.
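
A rough sketch of that plan, in Python rather than whatever language the real wrapper uses. The names connect_checked and looks_like_self_connect, and the max_attempts and timeout parameters, are assumptions for illustration only. It covers just the easy case (local address and port equal to peer address and port) and bounds the number of connect attempts.

```python
import errno
import socket


def looks_like_self_connect(local, peer):
    """Easy case only: local (address, port) equals peer (address, port)."""
    return local[:2] == peer[:2]


def connect_checked(address, max_attempts=3, timeout=5.0):
    """Hypothetical wrapper: a detected self-connect is reported as ECONNREFUSED."""
    last_error = ConnectionRefusedError(errno.ECONNREFUSED, "Connection refused")
    for _ in range(max_attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect(address)
        except OSError as exc:
            sock.close()
            last_error = exc
            continue
        if looks_like_self_connect(sock.getsockname(), sock.getpeername()):
            # Close the socket and behave as if nothing was listening.
            sock.close()
            last_error = ConnectionRefusedError(
                errno.ECONNREFUSED, "self-connect detected")
            continue
        return sock
    raise last_error
```

This does not catch a self-connect where the source and destination addresses differ; it only narrows the window, as described above.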

1. http://www.ussg.iu.edu/hypermail/li…909.3/0510.html

I might have thought that before assigning ephemeral ports
that the system would check that the port wasn’t already in use.

Though as machines could make tens of thousands of connections
that would be a little too strict. Checking that the same
quad wasn’t already in use wouldn’t catch self connect.

Otherwise, if I telnet localhost 60000 and the source
port happens to be 60000 and the source IP is different then
I would think it should work.

Though I believe that many systems adjust the source
address to match the destination net, in which case it
would not be able to do that.

> He believes he remembers Jon (Postel, presumably) recommending
> this check in the stack as a defense against accidental self-
> connect.

It isn’t so obvious that self connect can’t be made to work.
If the system knows which socket the packet came from it
should just send it to the other one. That slightly violates
that rule that the quad is unique to each side of the TCP
connection. That seems more work than the OS detecting it.

Yes, but TCP connections are not distinguished by interface, but by IP
address. It is perfectly possible for a multihomed host (with IP
addresses i1 and i1′) to have two concurrent TCP connections to a
single other host (i2), with IP addresses and ports (i1,p1,i2,p2) and
(i1′,p1,i2,p2). These connections are distinct, since the quadruples
differ. The same would be true for a purported self-connection from
(i1,p1) to (localhost,p1) – the quadruples differ, so the connection
wouldn’t happen.

I am not sure what you mean by wouldn’t happen.

The quad (i1,p1,127.1,p1) should make a perfectly fine TCP
connection. The quad (i1,p1,i1,p1) might not.

Not with self-connect. Self-connect connects a single socket to
itself. The socket goes from CLOSED to SYN_SENT to SYN_RCVD to
ESTABLISHED. Stevens has a more detailed description.
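
For anyone who wants to see that sequence spelled out, here is a toy walk through the states. It is a simplified sketch: the table lists only the transitions a self-connecting socket passes through, the segment names are shorthand, and the loopback is modelled by feeding every segment the socket sends straight back to it. The names TRANSITIONS and simulate_self_connect are invented for this example.

```python
# (current state, event) -> (next state, segment sent in response)
TRANSITIONS = {
    ("CLOSED", "active_open"): ("SYN_SENT", "SYN"),
    ("SYN_SENT", "SYN"): ("SYN_RCVD", "SYN+ACK"),
    ("SYN_RCVD", "SYN+ACK"): ("ESTABLISHED", None),
}


def simulate_self_connect():
    """Return the list of states visited when a socket receives its own segments."""
    state = "CLOSED"
    event = "active_open"
    path = [state]
    while event is not None:
        state, sent = TRANSITIONS[(state, event)]
        path.append(state)
        # Self-connect: whatever this socket sends is what it receives next.
        event = sent
    return path
```

Calling simulate_self_connect() yields the same CLOSED, SYN_SENT, SYN_RCVD, ESTABLISHED path as a simultaneous open, except that one socket plays both sides.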

> Say I decide to run a telnet server on port 60000 hoping
> that no-one will notice. (Security through obscurity.)

Not sure where you’re going with this, but a client trying to connect
to port 60000 on the local machine when the server isn’t running (no
socket in LISTEN state for port 60000) should self-connect if its
source port is 60000.

As I noted previously, though, BSD 4.3 and some implementations based
on it had a bug that prevented self-connect (and simultaneous open)
from succeeding; some later implementations had a bug that would
crash the stack (cf the “LAND attack”); and many specifically avoid
assigning an ephemeral port that matches the destination port, so
accidental self-connect is impossible. (It’s still possible on such
systems to deliberately self-connect by binding the client to the
destination port before calling connect.)
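
A deliberate self-connect along those lines can be sketched as follows. The function name deliberate_self_connect and its parameters are made up for illustration, and it binds port 0 so the stack picks a free port instead of a fixed one. Whether the connect succeeds depends on the stack, as noted above, so the caller should expect OSError on some systems.

```python
import socket


def deliberate_self_connect(host="127.0.0.1", timeout=2.0):
    """Bind a socket to a local port, then connect it to that same address and port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.bind((host, 0))               # let the stack choose a free port
        sock.connect(sock.getsockname())   # destination == our own address and port
    except OSError:
        sock.close()
        raise
    return sock
```

If it succeeds, getsockname() and getpeername() on the returned socket report the same address and port, which is the easy case a wrapper can test for.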

> I might have thought that before assigning ephemeral ports
> that the system would check that the port wasn’t already in use.

Apparently most do, but not Linux.

> Though as machines could make tens of thousands of connections
> that would be a little too strict. Checking that the same
> quad wasn’t already in use wouldn’t catch self connect.

No, but checking that next-ephemeral-port != destination-port is
trivial when assigning the ephemeral port in connect().
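
As a sketch of how small that check is (in Python, not kernel code): the function name next_ephemeral_port is hypothetical, and the default range is the IANA dynamic range, which is an assumption here since actual ranges vary by system. It ignores the separate question of whether the port is already in use.

```python
def next_ephemeral_port(candidate, dest_port, low=49152, high=65535):
    """Return `candidate`, stepped past `dest_port` if they match, wrapping within [low, high]."""
    span = high - low + 1
    port = low + (candidate - low) % span
    if port == dest_port:
        # Skip the one port that would make an accidental self-connect possible.
        port = low + (port - low + 1) % span
    return port
```

For example, next_ephemeral_port(60000, 60000) moves on to 60001, while next_ephemeral_port(50000, 60000) leaves 50000 alone.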

> Though I believe that many systems adjust the source
> address to match the destination net, in which case it
> would not be able to do that.

Yes, that occurred to me. When I’m connecting with an unbound
socket, the stack has to assign a source IP address as well as
an ephemeral port, of course, and if I’m connecting to a local
address then it seems sensible that the stack would assign that
same address – and so my check for self-connect would work.

>
> It isn’t so obvious that self connect can’t be made to work.

It’s supposed to, and it does. The problem (for my library) is
detecting it if it happens accidentally, or (for the stack)
preventing it from happening accidentally.

> If the system knows which socket the packet came from it
> should just send it to the other one. That slightly violates
> that rule that the quad is unique to each side of the TCP
> connection.

Apparently that rule is a gloss which doesn’t correctly cover the
case of self-connect. According to the references I’ve seen,
self-connect, bizarre though it seems, is not only legal but
necessary for a fully-compliant TCP implementation. It falls out
from support for simultaneous open (which RFC 1122 requires) coupled
with the description of the TCP state machine.
