| skaller 2007-05-03, 1:23 pm |
| On Sat, 28 Apr 2007 21:37:18 +0000, James Antill wrote:
>
> This proves little, and esp. if you expected differently it'd be much
> better to simplify what is being tested as much as possible.
Actually, what is being tested is my asynchronous I/O library.
However I expect you mean, the effect of 'close' with nonblocking mode.
>
> There is a huge amount of doubt, why are you assuming that your code is
> perfect and wget/firefox are perfect ...
Well, I'm not assuming my code is perfect. Clearly there's a problem
somewhere, most likely in my code. The problem is that I can't
figure out what it could possibly be, other than what I said:
Linux is simply flushing the buffer on close. As far as I can
tell that is allowed by Posix .. for the simple reason it has
to be. The only alternative would be to block on close until the
buffers were empty.
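
For reference, the "block on close" alternative does exist as an opt-in socket option, SO_LINGER. Without it, socket(7) says close() returns immediately and the closing is done in the background. Below is a minimal sketch in Python of turning the option on; the helper name `enable_linger` and the 10 second value are illustrative assumptions, not anything from the library under test. How lingering interacts with O_NONBLOCK differs between systems, so treat this as something to experiment with rather than a guaranteed fix.

```python
import socket
import struct


def enable_linger(sock, seconds):
    """Ask close() to wait up to `seconds` for queued data to be sent.

    Hypothetical helper for illustration. The option value is a
    `struct linger` (two C ints): l_onoff and l_linger.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, seconds))


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_linger(s, 10)  # 10 seconds is an arbitrary example value
    raw = s.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                       struct.calcsize("ii"))
    print(struct.unpack("ii", raw))  # (l_onoff, l_linger) as reported back
    s.close()
```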
Using strace I can confirm the writes are all being done,
and the close is being done after the last write.
And I can tell you both wget and firefox fail to read all
the data sometimes: it is behaving AS IF close was flushing
the output buffers.
That is confirmed by the simple experiment of delaying the
close long enough.
On localhost, it takes a HUGE file to cause this problem.
On a remote site, NO files get sent properly -- my explanation
is that this is because my ADSL connection is very slow
compared to the server machine, and so the server write
buffers are still partially full at the time close() is issued.
Of course none of this is any kind of proof that there's a bug.
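
One way to take close() out of the picture, instead of delaying it by a fixed amount, is a half-close: shut down the write side, wait until the peer closes its end, and only then close. The sketch below (Python; `graceful_close`, the timeout and the buffer size are assumptions made up for illustration) shows the idea. It suits a client that reads until EOF and then closes, as wget does when there is no Content-Length; a client that keeps the connection open would just make this wait for the timeout.

```python
import socket


def graceful_close(sock, timeout=5.0, bufsize=4096):
    """Half-close, wait for the peer's EOF (or a timeout), then close.

    Hypothetical helper: shutdown(SHUT_WR) queues a FIN behind any data
    still in the send buffer, and recv() returning b"" means the peer
    has closed its side of the connection.
    """
    sock.settimeout(timeout)  # also works if the socket was nonblocking
    try:
        sock.shutdown(socket.SHUT_WR)
        while sock.recv(bufsize):
            pass  # discard anything the peer still sends
    except OSError:
        pass  # timeout or connection reset: give up and close anyway
    finally:
        sock.close()
```

If the transfer still comes up short with this in place, the close is probably not where the data is being lost.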
> but kernel APIs that are used
> orders of magnitude more often are behaving differently from how people
> are telling you they do?
It is hard to know what the behaviour should be. The Linux man pages
AND the Posix specifications are vague. They typically do not
tell the whole truth, at least in one place.
Furthermore .. people tell me different stories. It's quite hard
to know ;(
> I've just tested this on kernel-2.6.20-1.2944.fc6 with and-httpd and it
> does what I'd expect when contacted via wget. I.e., the relevant part of
> the strace for a request of a 506 byte file is:
>
> accept() = 5
> fcntl64(5, F_SETFD, FD_CLOEXEC) = 0
> epoll_ctl(3, EPOLL_CTL_ADD, 5, {0, {u32=151302912, u64=151302912}}) = 0
> fcntl64(5, F_GETFL) = 0x2 (flags O_RDWR)
> fcntl64(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> epoll_ctl(3, EPOLL_CTL_MOD, 5, {EPOLLIN, {u32=151302912, u64=151302912}}) = 0
> readv(5) = 99
> stat64(fname)
> getxattr(fname, "user.mime_type")
> open() = 6
> fstat64(6)
> fadvise64_64(6, 0, 504, POSIX_FADV_SEQUENTIAL) = 0
> epoll_ctl(3, EPOLL_CTL_MOD, 5, {0, {u32=151302912, u64=151302912}}) = 0
> setsockopt(5, SOL_TCP, TCP_NODELAY, [0], 4) = 0
> setsockopt(5, SOL_TCP, TCP_CORK, [1], 4) = 0
> readv(6) = 504
> close(6)
> writev(5) = 782
> close(5)
>
> ...everything got to the other end fine, I also tried it with a
> 1,013,075 byte file and it had the same length and md5sum at the other
> end (the strace looks roughly the same).
But of course this proves nothing either: I only had problems
on localhost with a 2.8Meg file. Smaller files were OK.
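
The length and md5sum comparison is easy to repeat for each file size. A minimal sketch in Python, assuming the served file and the downloaded copy are both available as local paths (the function name `fingerprint` is made up for illustration):

```python
import hashlib


def fingerprint(path, bufsize=1 << 16):
    """Return (length in bytes, md5 hex digest) of a file, read in blocks."""
    md5 = hashlib.md5()
    length = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            md5.update(block)
            length += len(block)
    return length, md5.hexdigest()
```

Comparing `fingerprint(original)` with `fingerprint(downloaded)` shows whether a transfer was truncated (the lengths differ) or corrupted (same length, different digest).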
>
> This is a disingenuous argument, esp. given the use of threading.
It's not an argument: it's a question. What could POSSIBLY go wrong?
I have confirmed the close is done after the last write (using
my own debugging AND strace), so it isn't a thread synchronisation
problem.
> This is one difference from and-httpd, in that it always sends a
> Content-Length header ... so it's _possible_ the clients are acting
> differently, although that wouldn't be the first place I'd look.
Dynamically generated streamed data cannot always send Content-Length,
because it isn't known. Of course, serving static pages it should
be sent -- but the webserver at the moment is simply a convenient
way to test my async I/O library.
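
For streamed data of unknown length, HTTP/1.1 offers chunked transfer coding, which tells the client where the body ends without relying on the connection being closed. A minimal sketch of the framing in Python; the generator name `chunked` is an illustrative assumption, and the response would also need a `Transfer-Encoding: chunked` header and an HTTP/1.1 client:

```python
def chunked(blocks):
    """Yield HTTP/1.1 chunked-transfer frames for an iterable of bytes.

    Each frame is the block size in hex, CRLF, the block, CRLF.
    """
    for block in blocks:
        if block:  # a zero-length chunk would end the body early
            yield b"%x\r\n" % len(block) + block + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk followed by an empty trailer section


if __name__ == "__main__":
    print(b"".join(chunked([b"hello", b"world!"])))
```

With this framing a truncated response shows up on the client as a missing last chunk instead of looking like a normal end of stream.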
|