Descriptor passed w/SCM_RIGHTS is invalid
Web Server forum

    Descriptor passed w/SCM_RIGHTS is invalid  
Jo
09-30-04 03:46 PM

I've been banging my head for a week now over this issue, and now I'm
stuck dead. I have an open socket in a parent process, and I'm using a
pipe created by socketpair() to pass this socket to a child process.
For some bizarre reason, even though I can write to the socket in the
parent, write()s to the socket in the child fail, and errno is set to
EBADF, even though lsof gives the descriptor in the child as a valid
read/write socket.

The child has this piece:
int desNewSocket = GetNewSocket();
fprintf(a_logs_descriptor, "the child's (%d) socket descriptor: %d\n",
        getpid(), desNewSocket);
if (write(desNewSocket, "hey man", 8) == -1 && errno == EBADF)
    fprintf(a_logs_descriptor, "but it won't write\n");

So the log reports:
the child's (5555) socket descriptor: 4
but it won't write

Now if I run lsof after this descriptor has been read, I see that
process 5555 has 4u under FD. So it should be able to write to it,
right? Furthermore, process 5554 (the parent) has 20u open to the same
socket (lsof reports the same NAME for both). The parent
CAN write to descriptor 20 without any problem. In case you are
wondering whether descriptor 20 in the parent is interfering with
descriptor 4 in the child, don't. I've tried various ways of
dealing with 20: closing it right away, waiting for the child to
use the socket before the parent closes it, etc. BTW, the client
program to which this socket is connected doesn't seem to experience a
peer reset until BOTH descriptors are closed.
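The client seeing no reset until both descriptors are closed is expected. A descriptor received through SCM_RIGHTS refers to the same open socket as the sender's, as though it had been created with dup(). Below is a minimal sketch of that behaviour, assuming a Linux/POSIX system. It uses dup() as a stand-in for descriptor passing, and the function name `last_close_wins` is made up for illustration:

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Show that a connected socket stays open until every descriptor that
   refers to it is closed. dup() stands in for SCM_RIGHTS here: both
   give a second descriptor for the same open socket.
   Returns 0 if the peer sees EOF only after the last close, else -1. */
int last_close_wins(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    int copy = dup(sv[0]);              /* second reference, like the child's fd */
    if (copy == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[0]);                       /* the "parent" closes its descriptor */

    char c = 'k';
    /* Still usable through the copy, and the peer still receives data. */
    int ok = write(copy, &c, 1) == 1 && read(sv[1], &c, 1) == 1;

    close(copy);                        /* last reference gone */
    ssize_t n = read(sv[1], &c, 1);     /* now the peer sees end-of-file */
    close(sv[1]);
    return ok && n == 0 ? 0 : -1;
}
```

Usage: `assert(last_close_wins() == 0);`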

With lsof showing the process having a valid descriptor, I can't
imagine what the problem could be, but I suppose you'll want to see my
code.

The part before and just after the fork() is pretty simple:
int desAncils[2];
socketpair(AF_UNIX, SOCK_STREAM, 0, desAncils);

pid_t nPid = fork();
switch (nPid) {
    case 0: {                       //child
        close(desAncils[1]);
        desAncil = desAncils[0];    //desAncil is a global
        break;
    }
    case -1: assert(false);
    default: {                      //parent
        close(desAncils[0]);
        desAncil = desAncils[1];
    }
}
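The setup above can also report failures instead of asserting on them. With NDEBUG defined, `assert(false)` compiles to nothing, and `case -1` then falls through into the parent branch. The sketch below assumes POSIX socketpair()/fork(). `fork_with_channel` is a hypothetical helper name, and which end goes to which process simply mirrors the original:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create an AF_UNIX stream pair, fork, and keep one end in each process.
   Returns the fork() result (0 in the child, the child's pid in the
   parent, -1 on error) and stores this process's end in *channel. */
pid_t fork_with_channel(int *channel)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        perror("socketpair");
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {          /* child keeps sv[0] */
        close(sv[1]);
        *channel = sv[0];
    } else {                 /* parent keeps sv[1] */
        close(sv[0]);
        *channel = sv[1];
    }
    return pid;
}
```

The parent should eventually reap the child with waitpid(); the child typically finishes with _exit().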

I've researched routines for sending ancillary messages like crazy,
and I've wasted a lot of time on a lot of crap, but here's what I've
ended up with:
void SendNewSocket(int desChild, int desNewSocket) {
    int sendfd = desNewSocket;
    int nbytes = 100;
    char ptr[nbytes];

    struct msghdr   msg;
    struct iovec    iov[1];

    union {
        struct cmsghdr  cm;
        char            control[CMSG_SPACE(sizeof(int))];
    } control_un;
    struct cmsghdr  *cmptr;

    msg.msg_control = control_un.control;
    msg.msg_controllen = sizeof(control_un.control);

    cmptr = CMSG_FIRSTHDR(&msg);
    cmptr->cmsg_len = CMSG_LEN(sizeof(int));
    cmptr->cmsg_level = SOL_SOCKET;
    cmptr->cmsg_type = SCM_RIGHTS;
    *((int *) CMSG_DATA(cmptr)) = sendfd;

    msg.msg_name = NULL;
    msg.msg_namelen = 0;

    iov[0].iov_base = ptr;
    iov[0].iov_len = nbytes;
    msg.msg_iov = iov;
    msg.msg_iovlen = 1;

    struct cmsghdr *cmsg = cmptr;
    ssize_t sent = sendmsg(desChild, &msg, 0);
    assert(sent == 1);
}
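Here is a sending side along the lines James suggests later in the thread. It sends a single zero byte as the payload and copies the descriptor into the control buffer with memcpy() to avoid the unaligned int store. It is a sketch, not a drop-in replacement, and `send_fd` is a hypothetical name. One detail in the original is worth checking: sendmsg() returns the number of payload bytes sent. With a 100-byte iovec it would normally return 100, so `assert(sent == 1)` would fire unless NDEBUG is defined.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send descriptor fd over the AF_UNIX socket chan using SCM_RIGHTS.
   The payload is exactly one '\0' byte, so the receiver has something
   concrete to check. Returns 0 on success, -1 on failure. */
int send_fd(int chan, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };

    union {
        struct cmsghdr hdr;                 /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    memset(&control, 0, sizeof(control));

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int)); /* no unaligned int store */

    /* sendmsg() returns payload bytes sent: exactly 1 on success here. */
    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}
```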

Now, here's the code in the child that does the receiving:
int GetNewSocket() {
    int nbytes = 100;
    char ptr[nbytes];

    struct msghdr   msg;
    struct iovec    iov[1];
    int             recvfd;

    union {
        struct cmsghdr  cm;
        char            control[CMSG_SPACE(sizeof(int))];
    } control_un;
    struct cmsghdr  *cmptr;

    msg.msg_control = control_un.control;
    msg.msg_controllen = sizeof(control_un.control);
    msg.msg_name = NULL;
    msg.msg_namelen = 0;

    iov[0].iov_base = ptr;
    iov[0].iov_len = nbytes;
    msg.msg_iov = iov;
    msg.msg_iovlen = 1;

    //desAncil is a global, don't forget
    if ( (recvfd = recvmsg(desAncil, &msg, 0)) <= 0)
        return(recvfd);

    if ( (cmptr = CMSG_FIRSTHDR(&msg)) != NULL &&
            cmptr->cmsg_len == CMSG_LEN(sizeof(int))) {
        assert (cmptr->cmsg_level == SOL_SOCKET && cmptr->cmsg_type ==
                SCM_RIGHTS);
        recvfd = *((int *) CMSG_DATA(cmptr));
    } else
        recvfd = -1;           /* descriptor was not passed */

    return recvfd;
}
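A matching receive side follows. It is again only a sketch, with a hypothetical name (`recv_fd`). It keeps recvmsg()'s return value (a byte count) separate from the descriptor and folds the cmsg_level/cmsg_type checks into the `if` instead of an assert. It also treats MSG_CTRUNC in msg_flags (control data truncated) as a failure:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Receive one descriptor sent with SCM_RIGHTS on chan.
   Returns the new descriptor, or -1 if the read failed, the peer closed
   the channel, or no complete SCM_RIGHTS message arrived. */
int recv_fd(int chan)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };

    union {
        struct cmsghdr hdr;                 /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } control;

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    /* recvmsg() returns a byte count, not a descriptor. */
    ssize_t n = recvmsg(chan, &msg, 0);
    if (n <= 0)
        return -1;
    if (msg.msg_flags & MSG_CTRUNC)
        return -1;                          /* control data was truncated */

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL ||
        cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS ||
        cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;                          /* descriptor was not passed */

    int fd;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```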

So that should do it, right? Remember, the desNewSocket parameter for
SendNewSocket() (which usually turns out to be 20) can be written to
in the parent, so the socket should be good. The descriptor that
appears out of GetNewSocket() (usually 4) is called a "bad descriptor"
upon writing, even though lsof shows it to be valid.

    Re: Descriptor passed w/SCM_RIGHTS is invalid  
Jo
10-03-04 02:13 AM

I should also point out that I've run some more tests and I've
successfully performed fstat() on the descriptor. I find it
interesting that fstat() has no problem with the descriptor,
but select(), write() and read() do.
I'm not sure which results would be meaningful. The st_mode member is
0140777 (octal), making it a socket, and completely permissive. It's also
interesting that when fstat() is called, the socket is reset, i.e. the
client end of the socket detects a peer reset, and the socket
disappears from the lsof output. So what the heck is wrong with this socket?

    Re: Descriptor passed w/SCM_RIGHTS is invalid  
James Antill
10-03-04 02:13 AM

On Wed, 29 Sep 2004 20:42:43 -0700, Jo wrote:

> I've been banging my head for a week now over this issue, and now I'm
> stuck dead. I have an open socket in a parent process, and I'm using a
> pipe created by socketpair() to pass this socket to a child process.
> For some bizarre reason, even though I can write to the socket in the
> parent, write()s to the socket in the child fail, and errno is set to
> EBADF, even though lsof gives the descriptor in the child as a valid
> read/write socket.
>
> The child has this piece:
> int desNewSocket = GetNewSocket();
> fprintf(a_logs_descriptor, "the child's (%d) socket descriptor: %d\n",
> getpid(), desNewSocket);
> if (write(desNewSocket, "hey man", 8) == -1 && errno == EBADF)
>     fprintf(a_logs_descriptor, "but it won't write\n");
>
> So the log reports:
> the child's (5555) socket descriptor: 4
> but it won't write
>
> Now if I do a lsof after this descriptor has been read, I see that
> process 5555 has 4u under FD. So it should be able to write to it,
> right? Furthermore, process 5554 (the parent) has 20u open to the same
> socket (lsof reports that they both have the same NAME). The parent
> CAN write to descriptor 20 without any problem. In case you are
> wondering if descriptor 20 in the parent is interfering with
> descriptor 4 in the child, don't. I've tried various different ways of
> dealing with 20, by closing it right away, or waiting for the child to
> use the socket before the parent closes it, etc. BTW, the client
> program to which this socket is connected doesn't seem to experience a
> peer reset until BOTH descriptors are closed.
>
> With lsof showing the process having a valid descriptor, I can't
> imagine what the problem could be, but I suppose you'll want to see my
> code.
>
> The part before and just after the fork() is pretty simple:
> int desAncils[2];
> socketpair(AF_UNIX, SOCK_STREAM, 0, desAncils);
>
> pid_t nPid = fork();
> switch (nPid) {
> 	case 0:	{				//child
> 		close(desAncils[1]);
> 		desAncil = desAncils[0];//desAncil is a global
> 		break;
> 	}
> 	case -1: assert(false);
> 	default: {		//parent
> 		close(desAncils[0]);
> 		desAncil = desAncils[1];
> 	}
> }
>
> I've researched routines for sending ancillary messages like crazy,
> and I've wasted a lot of time on a lot of crap, but here's what I've
> ended up with:
> void SendNewSocket(int desChild, int desNewSocket) {
> 	int sendfd = desNewSocket;
> 	int nbytes = 100;
> 	char ptr[nbytes];
>
> 	struct msghdr   msg;
> 	struct iovec    iov[1];
>
> 	union {
> 		struct cmsghdr        cm;
> 		char                          control[CMSG_SPACE(sizeof(int))];
> 	} control_un;
> 	struct cmsghdr  *cmptr;
>
> 	msg.msg_control = control_un.control;

I'd do...

msg.msg_control = &control_un.cm;

...here, but your version is probably correct.

> 	msg.msg_controllen = sizeof(control_un.control);
>
> 	cmptr = CMSG_FIRSTHDR(&msg);
> 	cmptr->cmsg_len = CMSG_LEN(sizeof(int));
> 	cmptr->cmsg_level = SOL_SOCKET;
> 	cmptr->cmsg_type = SCM_RIGHTS;
> 	*((int *) CMSG_DATA(cmptr)) = sendfd;

This is unaligned, but probably ok.

> 	msg.msg_name = NULL;
> 	msg.msg_namelen = 0;
>
> 	iov[0].iov_base = ptr;
> 	iov[0].iov_len = nbytes;

Passing random data like this isn't a good idea IMO. If you don't
care about the data, I'd pass a single char of zero or something and check for that
at the other end.

> 	msg.msg_iov = iov;
> 	msg.msg_iovlen = 1;
>
> 	struct cmsghdr *cmsg = cmptr;
> 	ssize_t sent = sendmsg(desChild, &msg, 0);
> 	assert(sent == 1);
> }
>
> Now, here's the code in the child that does the receiving:
> int GetNewSocket() {
> 	int nbytes = 100;
> 	char ptr[nbytes];
>
> 	struct msghdr   msg;
> 	struct iovec    iov[1];
> 	int         recvfd;
>
> 	union {
> 		struct cmsghdr        cm;
> 		char              control[CMSG_SPACE(sizeof(int))];
> 	} control_un;
> 	struct cmsghdr  *cmptr;
>
> 	msg.msg_control = control_un.control;
> 	msg.msg_controllen = sizeof(control_un.control);
> 	msg.msg_name = NULL;
> 	msg.msg_namelen = 0;
>
> 	iov[0].iov_base = ptr;
> 	iov[0].iov_len = nbytes;
> 	msg.msg_iov = iov;
> 	msg.msg_iovlen = 1;
>
> 		//desAncil is a global, don't forget
> 	if ( (recvfd = recvmsg(desAncil, &msg, 0)) <= 0)
> 					return(recvfd);

I'd print out msg.msg_controllen at this point.

> 	if ( (cmptr = CMSG_FIRSTHDR(&msg)) != NULL &&
> 			cmptr->cmsg_len == CMSG_LEN(sizeof(int))) {
> 					assert (cmptr->cmsg_level == SOL_SOCKET && cmptr->cmsg_type ==
> SCM_RIGHTS);

This is a bad thing to assert IMO, this should be part of the if.

> 					recvfd = *((int *) CMSG_DATA(cmptr));
> 	} else
> 					recvfd = -1;           /* descriptor was not passed */
>
> 	return recvfd;
> }
>
> So that should do it, right? Remember, the desNewSocket parameter for
> SendNewSocket() (which usually turns out to be 20) can be written to
> in the parent, so the socket should be good. The descriptor that
> appears out of GetNewSocket() (usually 4) is called a "bad descriptor"
> upon writing, even though lsof shows it to be valid.

Well it's hard to see without the other code[1], the above looks about
right, although previous code[2] I've done using CMSGS is different on the
send side.

[1] You want to post at least the code calling the above functions, if you
want more help.

[2] http://www.and.org/socket_com/

--
James Antill -- james@and.org
Need an efficient and powerful string library for C?
http://www.and.org/vstr/

    Re: Descriptor passed w/SCM_RIGHTS is invalid  
Jo
10-03-04 07:48 AM

James Antill <james-netnews@and.org> wrote in message news:<pan.2004.10.02.20.17.50.850215@and.org>...
> On Wed, 29 Sep 2004 20:42:43 -0700, Jo wrote:
> 
>
>  I'd do...
>
>  	msg.msg_control = &control_un.cm;
>
> ...here, but your version is probably correct.
> 
>
>  This is unaligned, but probably ok.
> 
>
>  Passing random data like this isn't a good idea IMO, if you don't
>  care about that I'd pass a single char of zero or something and check that
> at the other end.

What do you mean by random data? This isn't my code, so I had to
take a leap of faith when adapting it, and in the case of ptr, I
assumed that it was a buffer for some other function to use. FWIW, I
tried preceding the assignment with ptr[0] = '\0', if that's what you
meant, but it didn't fix it.

> 
>
>  I'd print out msg.msg_controllen at this point.

Comin' right up....... 16. Does that tell you anything? I seem to
remember the length of something being 16 on the send side.

> 
>
>  This is a bad thing to assert IMO, this should be part of the if.

Why? I didn't think it would be the sort of problem that would be
expected in working code, just in development code.

> 
>
>  Well it's hard to see without the other code[1], the above looks about
> right, although previous code[2] I've done using CMSGS is different on the
> send side.

It's a lot of code, and I'm really not familiar with these
functions, so I think I'll leave it until we get stuck.

>
> [1] You want to post at least the code calling the above functions, if you
> want more help.

That makes sense, if the problem isn't with the above, but there
isn't anything that performs any operations on any of the descriptors,
except the code that creates them, and select()s for them; so I guess
I'll show you that:

Here's what creates the socket that listens for incoming AF_INET
connections:
BOOL SetupListener(int nPort) {
    int client_sockfd;
    int nServerLen;
    struct sockaddr_in server_address;

    m_desListenSocket = socket(AF_INET, SOCK_STREAM, 0);
    if (m_desListenSocket == -1) {
        Log((string)"Error getting socket: " + strerror(errno));
        return FALSE;
    }
    int yes = 1;
    if (setsockopt(m_desListenSocket, SOL_SOCKET, SO_REUSEADDR, &yes,
                   sizeof(yes)) != 0) {
        Log((string)"Error setting socket option: " + strerror(errno));
        return FALSE;
    }

    server_address.sin_family = AF_INET;
    server_address.sin_addr.s_addr = htonl(INADDR_ANY);
    server_address.sin_port = htons( nPort );
    nServerLen = sizeof(server_address);

    if (bind(m_desListenSocket, (struct sockaddr*)&server_address,
             nServerLen) == -1) {
        Log((string)"Error binding socket: " + strerror(errno));
        return FALSE;
    }

    if (listen(m_desListenSocket, LISTENQUEUESIZE) == -1) {
        Log((string)"Error listening for socket: " + strerror(errno));
        return FALSE;
    }

    return TRUE;
}

Now here's what listens for those incoming AF_INET connections:
BOOL Listen(int nTimeOutMili, int &desNewSocket) {
    desNewSocket = 0;
    struct timeval timeout;
    timeout.tv_sec = 0;
    timeout.tv_usec = nTimeOutMili*1000;
    fd_set fdSet;
    int desMax;
    FD_ZERO(&fdSet);
    FD_SET(m_desListenSocket, &fdSet);
    desMax = m_desListenSocket;
    int nSelReturn = select(desMax+1, &fdSet, NULL, NULL, &timeout);

    if (nSelReturn == -1) {
        if (errno != EINTR) {
            Log("Error while listening", TRUE);
        }
        return FALSE;
    }

    if (FD_ISSET(m_desListenSocket, &fdSet)) {
        desNewSocket = accept(m_desListenSocket, NULL, NULL);
        assert(desNewSocket != 0);
        if (desNewSocket == -1) {
            Log((string)"Error accepting connection");
            return FALSE;
        }
    }

    return TRUE;
}
No operations are performed on desNewSocket after this function
returns until the code that sends it along the AF_UNIX socket is
called, which is listed above.

That's really all there is to it. The only other pertinent code is
whatever uses the descriptor once it is received. There, a
single call to write(), read() or select() fails with EBADF.
fstat() does not return an error, but it seems to kill the
socket. I haven't been able to get other functions, like fcntl() and
ioctl(), to kill the socket or return an error.
After the AF_INET socket has been sent along the AF_UNIX
socket, what I do with it on the sender side depends on what I feel
like testing at the moment. Sometimes I write to it, sometimes I close
it right after it is sent, sometimes I wait until the child has had a
go at it before I close it. In all of these cases the parent can use
the socket properly, but the child cannot.

Also, someone recommended I list /proc/self/fd right
before the socket is killed, so I did, and here's what I got:
lrwx------  1 jo users 64 Oct  2 01:24 4 -> socket:[9124]
l-wx------  1 jo users 64 Oct  2 01:24 5 -> pipe:[8962]
lrwx------  1 jo users 64 Oct  2 01:24 6 -> socket:[8963]
lr-x------  1 jo users 64 Oct  2 01:24 7 -> /proc/4881/fd

#4 is probably the AF_INET socket, while 5 is created by pipe(),
and 6 looks like the AF_UNIX socket I use to pass the AF_INET socket.
Does that tell you anything?
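The /proc/self/fd listing can also be produced from inside the process, right at the point of failure. The sketch below is Linux-only, and `dump_fds` is a hypothetical name. The directory stream it opens appears in its own output, which is likely what fd 7 (-> /proc/4881/fd) is in the listing above:

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Print "fd -> target" for every entry in /proc/self/fd (Linux only).
   Returns the number of entries printed, or -1 if /proc/self/fd
   cannot be opened. */
int dump_fds(FILE *out)
{
    DIR *d = opendir("/proc/self/fd");
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;                       /* skip "." and ".." */
        char path[300], target[256];
        snprintf(path, sizeof(path), "/proc/self/fd/%s", e->d_name);
        ssize_t n = readlink(path, target, sizeof(target) - 1);
        if (n == -1)
            continue;
        target[n] = '\0';                   /* readlink() does not terminate */
        fprintf(out, "%s -> %s\n", e->d_name, target);
        count++;
    }
    closedir(d);
    return count;
}
```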

I feel that a solution would involve finding out what is wrong with
the socket that makes it fail at that point. So far all I
can tell is that some functions are able to operate on the descriptor,
it behaves like a normal descriptor, and lsof shows a
healthy bouncing baby socket that I can read from and write to. Listing
/proc doesn't show anything bad that I can see. Are there
any other ways of analysing a descriptor/socket?

Finally, the idea that something psychotic about my system is to
blame hasn't escaped me. I'm running RHFC2, but I tried it on a
different distro and with a different kernel, and it behaved just the
same.
