|
Thread: select() "hangs" (Unix Programming archive, December 2006)
|
|
| ralph.schlosser@googlemail.com 2006-12-18, 1:20 pm |
| Hello folks,
perhaps you UNIX experts can shed some new light on a nasty little
problem I'm currently stuck with. Oh, and my apologies if I'm
posting to the wrong newsgroup.
Now, the problem I have is about the select() function call.
I'm developing a small UNIX daemon which works as a serial
multiplexer, multiplexing data arriving on a physical serial interface
/dev/ttySx to a number of logical interfaces /dev/smplexdy and vice
versa. The logical interfaces have been implemented using the PTY file
system.
Please consider the following code excerpt:
    // Initialize data structures for the select system call.
    FD_ZERO(&m_fd_set_read);
    max_fd = tty_dev[0];

    // Add TTY's to be monitored.
    for(n = 0; n < tty_count; n++)
    {
        FD_SET(tty_dev[n], &m_fd_set_read);
        if(tty_dev[n] > max_fd)
            max_fd = tty_dev[n];
    }

    // Add PTY's to be monitored.
    for(n = 0; n < chn_count * tty_count; n++)
    {
        FD_SET(pty_dev_array[n], &m_fd_set_read);
        if(pty_dev_array[n] > max_fd)
            max_fd = pty_dev_array[n];
    }

    // Prepare max_fd and backup copy for select() call.
    backup = m_fd_set_read;
    max_fd++;

    while(1)
    {
        // Add timeval to select.
        m_timeval.tv_sec = 0;
        m_timeval.tv_usec = 200000;

        // Restore the set.
        m_fd_set_read = backup;
        m_select_return = select(max_fd, &m_fd_set_read, NULL, NULL,
                                 &m_timeval);
        ...
    }
This is the multiplexer's main loop. My problem lies in the select()
function call. The code I have posted above works well for some time;
however, sooner or later select() blocks, ignoring the timeout value
specified earlier, and the multiplexer effectively stops working.
Perhaps you can see an obvious problem in my code? I'd be glad if you
had some suggestions for me, since frankly I am clueless :-)
Thanks for your input!
Ralph
| |
| Alex Fraser 2006-12-18, 1:20 pm |
| <ralph.schlosser@googlemail.com> wrote in message
news:1166464162.429245.243330@l12g2000cwl.googlegroups.com...
[snip]
> Please consider the following code excerpt:
>
> // Initialize data structures for the select system call.
> FD_ZERO(&m_fd_set_read);
> max_fd = tty_dev[0];
>
> // Add TTY's to be monitored.
> for(n = 0; n < tty_count; n++)
> {
> FD_SET(tty_dev[n], &m_fd_set_read);
> if(tty_dev[n] > max_fd)
> max_fd = tty_dev[n];
> }
> // Add PTY's to be monitored.
> for(n = 0; n < chn_count * tty_count; n++)
> {
> FD_SET(pty_dev_array[n], &m_fd_set_read);
> if(pty_dev_array[n] > max_fd)
> max_fd = pty_dev_array[n];
> }
> // Prepare max_fd and backup copy for select() call.
> backup = m_fd_set_read;
> max_fd++;
>
> while(1)
> {
> // Add timeval to select.
> m_timeval.tv_sec = 0;
> m_timeval.tv_usec = 200000;
>
> // Restore the set.
> m_fd_set_read = backup;
> m_select_return = select(max_fd, &m_fd_set_read, NULL, NULL,
> &m_timeval);
>
> ...
> }
>
> This is the multiplexer's main loop. My problem lies in the select()
> function call. The code I have posted above works well for some time,
> however, sooner or later select() will block, blatantly ignoring the
> timeout value specified earlier and thus effectively cause the
> multiplexer to stop working properly.
I have not spotted anything obviously wrong with the code above. How do you
know it is select() which blocks? Have you run the code under a system call
tracer? Are all descriptors set to non-blocking mode?
Alex
| |
| matevzb 2006-12-18, 1:20 pm |
| On Dec 18, 6:49 pm, ralph.schlos...@googlemail.com wrote:
> This is the multiplexer's main loop. My problem lies in the select()
> function call. The code I have posted above works well for some time,
> however, sooner or later select() will block, blatantly ignoring the
> timeout value specified earlier and thus effectively cause the
> multiplexer to stop working properly.
I might be wrong (it's been a while), but shouldn't you call
FD_ZERO()/FD_SET() every time you call select()?
--
WYCIWYG - what you C is what you get
| |
| matevzb 2006-12-18, 1:20 pm |
| On Dec 18, 7:21 pm, "matevzb" <mate...@gmail.com> wrote:
> I might be wrong (it's been a while), but shouldn't you call
> FD_ZERO()/FD_SET() every time you call select()?
Google Groups kept refusing to let me post! I spotted your "backup" just
after my post, so it does look OK. See Alex's suggestions...
--
WYCIWYG - what you C is what you get
| |
| ralph.schlosser@googlemail.com 2006-12-18, 1:20 pm |
| Alex,
> [snip]
> I have not spotted anything obviously wrong with the code above. How do you
> know it is select() which blocks? Have you run the code under a system call
> tracer? Are all descriptors set to non-blocking mode?
I definitely know it's the select call since I have put a debug
statement directly after the select() line. I'm not aware of any system
call tracers other than strace (under Linux), which in my case can't be
used directly as I have a daemon application.
Is there any better tool than strace you would suggest?
As to your question if the descriptors are opened non-blocking, no they
are not:
fd = open(fd_dev, O_RDWR | O_NOCTTY | O_SYNC | O_DIRECT);
Could this block select()? Now that you mention it, it seems to me
superfluous, if not wrong, to open the files in a blocking manner...
Thanks,
Ralph
| |
| matevzb 2006-12-18, 1:20 pm |
| On Dec 18, 7:21 pm, "matevzb" <mate...@gmail.com> wrote:
> I might be wrong (it's been a while), but shouldn't you call
> FD_ZERO()/FD_SET() every time you call select()?
Then again, I just spotted your "backup" =)
--
WYCIWYG - what you C is what you get
| |
| William Ahern 2006-12-18, 7:22 pm |
| On Mon, 18 Dec 2006 10:49:15 -0800, ralph.schlosser wrote:
> Alex,
>
>
> I definitely know it's the select call since I have put a debug
> statement directly after the select() line. I'm not aware of any system
> call tracers other than strace (under Linux), which in my case can't be
> used directly as I have a daemon application.
>
strace -p <PID>
See the strace(1) man page.
| |
| loic-dev@gmx.net 2006-12-18, 7:22 pm |
| Hello Ralph,
<snip>
> This is the multiplexer's main loop. My problem lies in the select()
> function call. The code I have posted above works well for some time,
> however, sooner or later select() will block, blatantly ignoring the
> timeout value specified earlier and thus effectively cause the
> multiplexer to stop working properly.
>
> Perhaps you can see an obvious problem in my code? I'd be glad if you
> had some suggestions to me since frankly I am clueless :-)
The first thing is to verify that your assumption is correct, namely
that your process blocks in select(). Please follow William's advice and
attach strace to your daemon when it seems to be stuck, by running
$ strace -p PID
where PID is the PID of your daemon process.
Another possibility would be to "follow" the children using the '-f'
flag when starting your daemon:
$ strace -o trace_file -f your_daemon
where /trace_file/ is the name of the file where the traces should be
written, and /your_daemon/ the name of your daemon's executable.
Cheers,
Loic.
| |
| John L Fjellstad 2006-12-19, 7:32 am |
| ralph.schlosser@googlemail.com writes:
> Please consider the following code excerpt:
>
> // Initialize data structures for the select system call.
> FD_ZERO(&m_fd_set_read);
> max_fd = tty_dev[0];
>
> // Add TTY's to be monitored.
> for(n = 0; n < tty_count; n++)
> {
> FD_SET(tty_dev[n], &m_fd_set_read);
> if(tty_dev[n] > max_fd)
> max_fd = tty_dev[n];
> }
> // Add PTY's to be monitored.
> for(n = 0; n < chn_count * tty_count; n++)
> {
> FD_SET(pty_dev_array[n], &m_fd_set_read);
> if(pty_dev_array[n] > max_fd)
> max_fd = pty_dev_array[n];
> }
> // Prepare max_fd and backup copy for select() call.
> backup = m_fd_set_read;
> max_fd++;
>
> while(1)
> {
> // Add timeval to select.
> m_timeval.tv_sec = 0;
> m_timeval.tv_usec = 200000;
>
> // Restore the set.
> m_fd_set_read = backup;
> m_select_return = select(max_fd, &m_fd_set_read, NULL, NULL,
> &m_timeval);
>
> ...
> }
>
> This is the multiplexer's main loop. My problem lies in the select()
> function call. The code I have posted above works well for some time,
> however, sooner or later select() will block, blatantly ignoring the
> timeout value specified earlier and thus effectively cause the
> multiplexer to stop working properly.
>
> Perhaps you can see an obvious problem in my code? I'd be glad if you
> had some suggestions to me since frankly I am clueless :-)
Are you sure the timeval gets reset before every select() call? (Is
this the original code?) On Linux, select() modifies the timeval during
the call.
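To illustrate the point about the timeval, here is a minimal sketch. It is not the poster's code: wait_readable() and its timeout_ms parameter are hypothetical names. It builds a fresh timeval on every call, so whatever select() writes back cannot carry over into the next iteration. Note that on Linux a stale timeval usually ends up as a zero timeout, which would normally show up as a busy loop rather than a hang.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: wait up to timeout_ms for any descriptor in
 * readfds to become readable. Returns whatever select() returns. */
int wait_readable(int nfds, fd_set *readfds, long timeout_ms)
{
    struct timeval tv;

    assert(timeout_ms >= 0);

    /* Rebuild the timeout on every call; never reuse the old value. */
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* On Linux, tv holds the time NOT slept once select() returns. */
    return select(nfds, readfds, NULL, NULL, &tv);
}
```

The caller still has to restore the fd_set before each call, as the original loop does with its "backup" copy.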
--
John L. Fjellstad
web: http://www.fjellstad.org/ Quis custodiet ipsos custodes
Replace YEAR with current four digit year
| |
| ralph.schlosser@googlemail.com 2006-12-19, 1:23 pm |
| John,
> Are you sure the timeval gets reset before every select() call? (is
> this the original code?). In Linux, the timeval gets changed during the
> call.
I'm quite sure the timeval gets reset, in other words this IS in fact
the original code and I made it this way because, as you have
mentioned, the timeval gets changed under Linux.
Thanks,
Ralph
| |
| ralph.schlosser@googlemail.com 2006-12-19, 1:23 pm |
| Hi!
First of all, thanks for your numerous replies.
Today I wasn't in my office so I shall try the things some of you have
mentioned tomorrow and, if necessary, come back here with an strace
output.
Perhaps my problem is caused by blocking file descriptors or perhaps
it's something else entirely; we will see. Thanks anyway for your time
and effort!
Ralph
| |
| Alex Fraser 2006-12-19, 1:23 pm |
| <ralph.schlosser@googlemail.com> wrote in message
news:1166467755.699217.184110@73g2000cwn.googlegroups.com...
>
> I definitely know it's the select call since I have put a debug
> statement directly after the select() line.
Unless you also have one before it, how do you know it got to select()?
> I'm not aware of any system call tracers other than strace (under Linux),
> which in my case can't be used directly as I have a daemon application.
strace -p should do the job. Running as normal and attaching once the
application gets into this apparent stuck state might be enough.
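If attaching a tracer is not convenient, one rough way to tell whether the process reaches select() at all, and whether it comes back, is to log on both sides of the call. The sketch below assumes a hypothetical wrapper traced_select() and a log stream chosen by the caller; neither comes from the original code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical wrapper: write one log line before and one after select(). */
int traced_select(int nfds, fd_set *readfds, struct timeval *tv, FILE *log)
{
    int rc;
    int saved_errno;

    assert(log != NULL);

    fprintf(log, "select: enter nfds=%d\n", nfds);
    fflush(log);    /* flush now so the line survives a real hang */

    rc = select(nfds, readfds, NULL, NULL, tv);
    saved_errno = errno;    /* fprintf() below may clobber errno */

    if (rc < 0)
        fprintf(log, "select: error: %s\n", strerror(saved_errno));
    else
        fprintf(log, "select: leave rc=%d\n", rc);
    fflush(log);

    errno = saved_errno;
    return rc;
}
```

An "enter" line with no matching "leave" line points at select(); a "leave" line followed by silence points at whatever runs next, for example a blocking read() or write().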
> As to your question if the descriptors are opened non-blocking, no they
> are not:
>
> fd = open(fd_dev, O_RDWR | O_NOCTTY | O_SYNC | O_DIRECT);
>
> Could this block select()? Now that you mention it, it seems to me
> superfluous, if not wrong, to open the files in a blocking manner...
The operation of select() is not affected by the non-blocking mode of the
descriptors, but the mode will naturally affect read() and write(), which is
where I thought it more likely that you were blocking.
O_SYNC and O_DIRECT do not seem to make sense in your application: when
using select() (or poll() for that matter) to multiplex IO, you want
reads/writes to complete as close to instantly as possible and they must not
block. This implies making full use of buffering in the kernel. Therefore, I
would remove O_SYNC and O_DIRECT, and add O_NONBLOCK so you get EAGAIN
instead of blocking.
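As a minimal sketch of that suggestion (the helper names set_nonblocking() and read_available() are hypothetical, and the -2 return value is just a convention chosen here): the descriptor is switched to non-blocking mode with fcntl(), and read() reports "nothing available" via EAGAIN instead of stalling the loop. Opening with O_RDWR | O_NOCTTY | O_NONBLOCK in the first place would have the same effect as the fcntl() step.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: put an already open descriptor into
 * non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);

    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: read whatever is available right now.
 * Returns the byte count (0 means end of file), -1 on a real error
 * with errno set, or -2 if no data is available yet. */
ssize_t read_available(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);    /* retry if interrupted */

    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;    /* would have blocked: go back to select() */
    return n;
}
```

With this in place, a descriptor that select() reported as readable but that turns out to have no data costs one failed read() rather than a stuck daemon.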
Alex
|