|
Home > Archive > Unix Programming > March 2007 > recv() sends extra garbage bytes
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
recv() sends extra garbage bytes
|
|
| chsalvia@gmail.com 2007-03-27, 7:16 am |
| I have a C program running on Linux/UNIX that sends data over a
network using the SOCK_STREAM protocol. One computer sends and
another receives. The incoming data is prefixed with a 4 byte
integer containing the size of the message.
Problem: I've noticed that very rarely, the receiving computer will
receive additional garbage bytes in the receive buffer. I'm not
exactly sure why this happens. To fix this, I used a decreasing
receive buffer size on each call, to match the remaining bytes on the
message, rather than the usual fixed "buffersize" value I normally
pass to receive. But I've never seen this technique done anywhere
else. Usually, people just specify some fixed buffer size value for
the recv() function, and the recv function never sends any more data
than it receives from the socket anyway. But if I do this, I receive
garbage data sometimes.
Here is what I'm doing:
int recv(int new_socket, void* buffer)
{
uint32_t recv_size;
/* Read 4 bytes to obtain size of recv'd data */
switch (recv(new_socket, buffer, sizeof(uint32_t), 0)) {
case -1: return SOCKET_ERR;
case 0: return CLOSE_SOCKET;
default: memcpy(&recv_size, buffer, sizeof(uint32_t));
}
uint32_t rem_bytes = recv_size - sizeof(uint32_t);
uint32_t recvd_bytes;
/* Receive rest of the data */
do {
switch (recvd_bytes = .recv(new_socket, buffer, rem_bytes,
0)) {
case -1: return SOCKET_ERR;
case 0: return CLOSE_SOCKET;
default: break;
}
rem_bytes -= recvd_bytes;
} while (rem_bytes != 0);
}
The literature on recv indicates that you should pass a constant value
to the recv() function to indicate the maximum number of bytes to
receive, like:
recv(sock, buffer, MAX_BYTES, 0);
....rather than the decreasing rem_bytes value I used above, since recv
is supposed to only receive exactly what was sent and nothing more,
regardless of what value you pass as MAX_BYTES. Plus, I assume that
even with the above method, the undesirable garbage bytes will stay in
the stream and simply be returned the next time I call recv().
So why am I receiving extra garbage data with recv()? Is this just
something that happens from time to time over networks?
| |
| Frank Cusack 2007-03-27, 7:16 am |
| On 26 Mar 2007 23:56:41 -0700 chsalvia@gmail.com wrote:
> int recv(int new_socket, void* buffer)
> {
> uint32_t recv_size;
>
> /* Read 4 bytes to obtain size of recv'd data */
> switch (recv(new_socket, buffer, sizeof(uint32_t), 0)) {
1. I didn't think you could name your function recv and then also call
the system recv. But looking at code below, I guess this is C++.
2. I would have to see the rest of the code, but buffer may not be
aligned properly.
> case -1: return SOCKET_ERR;
> case 0: return CLOSE_SOCKET;
> default: memcpy(&recv_size, buffer, sizeof(uint32_t));
You are not guaranteed to have received 4 bytes of data (since new_sock
is a SOCK_STREAM socket).
> }
>
> uint32_t rem_bytes = recv_size - sizeof(uint32_t);
If you did not receive 4 bytes, you just hosed yourself.
> uint32_t recvd_bytes;
>
> /* Receive rest of the data */
> do {
> switch (recvd_bytes = .recv(new_socket, buffer, rem_bytes,
Looks like a typo here (.recv). It's always best to copy&paste the
exact code you are using (or an exact test case) for best answers.
I'd be worried that rem_bytes may be larger than buffer can hold, first
because you don't pass the size of buffer to compare it against, and
second because of the bug above.
> 0)) {
> case -1: return SOCKET_ERR;
> case 0: return CLOSE_SOCKET;
> default: break;
> }
> rem_bytes -= recvd_bytes;
> } while (rem_bytes != 0);
> }
>
> The literature on recv indicates that you should pass a constant value
> to the recv() function to indicate the maximum number of bytes to
> receive, like:
>
> recv(sock, buffer, MAX_BYTES, 0);
>
> ...rather than the decreasing rem_bytes value I used above, since recv
> is supposed to only receive exactly what was sent and nothing more,
> regardless of what value you pass as MAX_BYTES.
Not for SOCK_STREAM. For SOCK_STREAM sockets, what is sent is a byte
stream without record boundaries so depending on pacing, yes you will
never receive more than what was sent, but the definition of "what was
sent" is somewhat loose.
> Plus, I assume that
> even with the above method, the undesirable garbage bytes will stay in
> the stream and simply be returned the next time I call recv().
>
> So why am I receiving extra garbage data with recv()? Is this just
> something that happens from time to time over networks?
You have at least 2 bugs. You are not checking that you received 4
bytes initially, and you are not checking that the length of the
message will fit in your buffer. Otherwise there might be a bug on
the sender's side.
-frank
| |
| chsalvia@gmail.com 2007-03-27, 7:16 am |
| > You are not guaranteed to have received 4 bytes of data (since new_sock
> is a SOCK_STREAM socket).
>
>
>
> If you did not receive 4 bytes, you just hosed yourself.
Good point. I overlooked the possibility that the initial recv might
get less than 4 bytes. I changed it so that the initial receive loops
until it gets 4 bytes.
> I'd be worried that rem_bytes may be larger than buffer can hold, first
> because you don't pass the size of buffer to compare it against, and
> second because of the bug above.
As you noticed above, I didn't copy and paste the exact code. The
reason I didn't was because my actual code uses a customized C++
buffer class that handles memory allocation. I tried to simplify my
code for the purpose of posting it here - perhaps that was a bad
idea. The loop is actually like this:
do {
switch (buffer.recv(new_socket, recv_size)) {
case -1: return SOCKET_ERR;
case 0: return CLOSE_SOCKET;
default: break;
}
} while (buffer.used_bytes < recv_size);
With buffer.recv() defined as:
inline int recv(int m_sock, uint32_t maxrecv, int flags = 0) {
if (used_bytes + maxrecv > size) realloc_buffer(used_bytes +
maxrecv);
int status = ::recv(m_sock, (byte*) data + used_bytes, maxrecv,
flags);
if (status > 0) used_bytes += status;
return status;
}
And of course, the member variable used_bytes keeps track of the
buffer length, and size keeps track of allocated space.
> You have at least 2 bugs. You are not checking that you received 4
> bytes initially, and you are not checking that the length of the
> message will fit in your buffer. Otherwise there might be a bug on
> the sender's side.
Thanks for pointing out that out. Before you said that "what is
actually sent" is loosely defined. Is there any possible way that a
server could recv() *more* data than a client sends? In my case, I
notice that every now and then, I'll receive an additional 1000 bytes
or so of garbage data on the tail end of the message. (Typical
message length is approx 300K.)
This problem doesn't seem like it could be related to the first bug
you pointed out, because even if I received less than the full 4 bytes
and then erroneously assigned the incomplete value to recv_size, this
would either result in me receiving *less* bytes than I expected,
(because a 4 byte integer containing a value like 300000 for example
would be if only 1 byte or 2 bytes were received), OR, if recv_size
contained some initial garbage value in it, it would be set to some
arbitrary high value, in which case I would recv() the full amount
anyway. This doesn't explain why I'd receive an additional ~1000
bytes of garbage data.
Is there any reason why recv() might get more data than was actually
sent by send()?
| |
| Frank Cusack 2007-03-27, 7:16 am |
| On 27 Mar 2007 00:47:48 -0700 chsalvia@gmail.com wrote:
> Thanks for pointing out that out. Before you said that "what is
> actually sent" is loosely defined. Is there any possible way that a
> server could recv() *more* data than a client sends?
Not for SOCK_STREAM.
By "loosely defined" I just mean that while you may do a recv for 1
"message" the other side may have sent multiple messages. If your
protocol operates in lock step this won't happen.
> In my case, I notice that every now and then, I'll receive an
> additional 1000 bytes or so of garbage data on the tail end of the
> message. (Typical message length is approx 300K.)
>
> This problem doesn't seem like it could be related to the first bug
> you pointed out, because even if I received less than the full 4 bytes
> and then erroneously assigned the incomplete value to recv_size, this
> would either result in me receiving *less* bytes than I expected,
> (because a 4 byte integer containing a value like 300000 for example
> would be if only 1 byte or 2 bytes were received),
if you are little-endian, and if buffer were zero'd out to begin with
> OR, if recv_size
> contained some initial garbage value in it, it would be set to some
> arbitrary high value, in which case I would recv() the full amount
> anyway. This doesn't explain why I'd receive an additional ~1000
> bytes of garbage data.
Sounds correct.
> Is there any reason why recv() might get more data than was actually
> sent by send()?
nope.
Maybe others will have some ideas.
-frank
| |
| David Schwartz 2007-03-27, 1:19 pm |
| On Mar 27, 12:47 am, chsal...@gmail.com wrote:
> do {
> switch (buffer.recv(new_socket, recv_size)) {
> case -1: return SOCKET_ERR;
> case 0: return CLOSE_SOCKET;
> default: break;
> }
> } while (buffer.used_bytes < recv_size);
>
> With buffer.recv() defined as:
>
> inline int recv(int m_sock, uint32_t maxrecv, int flags = 0) {
> if (used_bytes + maxrecv > size) realloc_buffer(used_bytes +
> maxrecv);
> int status = ::recv(m_sock, (byte*) data + used_bytes, maxrecv,
> flags);
> if (status > 0) used_bytes += status;
> return status;
> }
This is broken. Suppose you're trying to receive 10 bytes but you
receive 8. You now need to receive 2 more, but your code tries to
receive 10 again.
If I understand what you are trying to do, the correct fix is to
change:
switch (buffer.recv(new_socket, recv_size)) {
to
switch (buffer.recv(new_socket, buffer.used_bytes-recv_size)) {
However, IMO it is poor design to use the buffer level for this
purpose. If the protocol says I should be receiving 10 bytes, I want
to be dealing with the number 10, not 10 plus however many bytes were
already in the buffer.
So I think you should stop ignoring the return value from
'buffer.recv' and use it for its intended purpose -- to tell you how
many bytes you received so you know how many you still need to
receive.
DS
| |
| Bin Chen 2007-03-27, 1:19 pm |
| On Mar 27, 2:56 pm, chsal...@gmail.com wrote:
> I have a C program running on Linux/UNIX that sends data over a
> network using the SOCK_STREAM protocol. One computer sends and
> another receives. The incoming data is prefixed with a 4 byte
> integer containing the size of the message.
>
> Problem: I've noticed that very rarely, the receiving computer will
> receive additional garbage bytes in the receive buffer. I'm not
> exactly sure why this happens. To fix this, I used a decreasing
> receive buffer size on each call, to match the remaining bytes on the
> message, rather than the usual fixed "buffersize" value I normally
> pass to receive. But I've never seen this technique done anywhere
> else. Usually, people just specify some fixed buffer size value for
> the recv() function, and the recv function never sends any more data
> than it receives from the socket anyway. But if I do this, I receive
> garbage data sometimes.
Are you behind NAT? In my environment, the server will send 4 blank
chars to the client to keep the NAT window open.
| |
| chsalvia@gmail.com 2007-03-27, 1:19 pm |
| > However, IMO it is poor design to use the buffer level for this
> purpose. If the protocol says I should be receiving 10 bytes, I want
> to be dealing with the number 10, not 10 plus however many bytes were
> already in the buffer.
>
> So I think you should stop ignoring the return value from
> 'buffer.recv' and use it for its intended purpose -- to tell you how
> many bytes you received so you know how many you still need to
> receive.
>
> DS
David, are you suggesting something more like this:
/* Read 4 bytes to obtain size of recv'd data */
do {
switch (buffer.recv(new_socket, sizeof(uint32_t))) {
case -1: return SOCKET_ERR;
case 0: return CLOSE_SOCKET;
default: break;
}
} while (buffer.used_bytes < sizeof(uint32_t));
memcpy(&recv_size, buffer.data, sizeof(uint32_t));
uint32_t remaining = recv_size - sizeof(uint32_t);
uint32_t next;
do {
switch (next = buffer.recv(new_socket, remaining)) {
case -1: return SOCKET_ERR;
case 0: return CLOSE_SOCKET;
default: break;
}
remaining -= next;
} while (buffer.used_bytes < recv_size);
This is how I originally coded it. This way, I only recv() the amount
of bytes that I expect should be remaining. This seems to be what
you're saying to do. But this brings me back to my original
question. All the literature on recv() never indicates you should do
this. And others here have indicated that recv() will never actually
recv() more than was sent. So even if I do something like:
buffer.recv(new_sock, 10000000);
....I'll still only recv the amount that was sent. So I shouldn't need
to keep subtracting the amount received in each iteration from the
value I pass to recv().
| |
| David Schwartz 2007-03-28, 1:20 am |
| On Mar 27, 10:41 am, chsal...@gmail.com wrote:
> David, are you suggesting something more like this:
>
> /* Read 4 bytes to obtain size of recv'd data */
> do {
> switch (buffer.recv(new_socket, sizeof(uint32_t))) {
> case -1: return SOCKET_ERR;
> case 0: return CLOSE_SOCKET;
> default: break;
> }
> } while (buffer.used_bytes < sizeof(uint32_t));
>
> memcpy(&recv_size, buffer.data, sizeof(uint32_t));
>
> uint32_t remaining = recv_size - sizeof(uint32_t);
> uint32_t next;
>
> do {
> switch (next = buffer.recv(new_socket, remaining)) {
> case -1: return SOCKET_ERR;
> case 0: return CLOSE_SOCKET;
> default: break;
> }
> remaining -= next;
> } while (buffer.used_bytes < recv_size);
> This is how I originally coded it. This way, I only recv() the amount
> of bytes that I expect should be remaining. This seems to be what
> you're saying to do. But this brings me back to my original
> question. All the literature on recv() never indicates you should do
> this. And others here have indicated that recv() will never actually
> recv() more than was sent. So even if I do something like:
>
> buffer.recv(new_sock, 10000000);
>
> ...I'll still only recv the amount that was sent. So I shouldn't need
> to keep subtracting the amount received in each iteration from the
> value I pass to recv().
This is not a particularly good solution either. There are two
problems, one in complexity and one in performance. In complexity, it
requires two loops, one to get all the length bytes and one to get all
the data. In performance, it will require at least two call to recv
even if all the data is there.
The basic outline of a textbook solution is this:
1) Call 'recv' passing it however much buffer space you have left.
2) If you do not have enough data to know how many bytes you need, go
to step 1.
3) Compute how many bytes you need. If you don't have that many, go to
step 1.
4) Process the bytes you have received.
5) If you do not have any leftover bytes, empty the buffer and go to
step 1.
6) If you have leftover bytes, arrange things so the buffer only has
the leftover and go to step 2.
In pseudo code, assuming the packet length encoded in the data
includes the four bytes for the length.
int packet_len(unsigned char *buf_ptr)
{ /* unpack length of packet */
int ret=buf_ptr[0];
ret<=8; ret|=buf_ptr[1];
ret<=8; ret|=buf_ptr[2];
ret<=8; ret|=buf_ptr[3];
return ret;
}
int i, buf_ptr=0, packet_len;
unsigned char buf[BUF_SIZE];
/* This takes the packet without the length header */
extern void ProcessPacket(unsigned char *data, int length);
while(1)
{
i=recv(socket, buf+buf_ptr, BUF_SIZE-buf_ptr);
if(i<=0) return; /* error/eof */
buf_ptr+=i;
while( (buf_ptr>=4) &&
(buf_ptr>=(packet_len=get_len(buf))) )
{
ProcessPacket(buf+4, packet_len-4);
if(packet_len==buf_ptr) buf_ptr=0;
else
{ /* save leftover */
memmove(buf, buf+buf_ptr, buf_ptr-packet_len);
buf_ptr-=packet_len;
}
}
}
Hope that helps.
DS
| |
| James Antill 2007-03-28, 1:18 pm |
| On Tue, 27 Mar 2007 10:41:13 -0700, chsalvia wrote:
>
> David, are you suggesting something more like this:
>
> /* Read 4 bytes to obtain size of recv'd data */ do {
> switch (buffer.recv(new_socket, sizeof(uint32_t))) {
> case -1: return SOCKET_ERR;
> case 0: return CLOSE_SOCKET;
> default: break;
> }
> } while (buffer.used_bytes < sizeof(uint32_t));
This is wrong. The common "simple" solution to this problem is something
like:
void full_read(int fd, void *buf, size_t len)
{
while (len > 0)
{ /* loop until all of the read request is done */
ssize_t ret = read(fd, buf, len);
if ((ret == -1) && (errno != EINTR))
err(EXIT_FAILURE, "write");
if (ret == -1) continue;
buf += (size_t)ret;
len -= (size_t)ret;
}
}
full_read(new_socket, &recv_size, sizeof(uint32_t));
recv_size = ntohl(recv_size);
if (recv_size < recv_size) /* err */ ;
recv_size -= sizeof(uint32_t);
if (recv_size > buf_size) /* err */ ;
if (!recv_size) /* nothing */ ;
full_read(new_socket, buf, recv_size);
/* deal with data in buf */
> This is how I originally coded it. This way, I only recv() the amount
> of bytes that I expect should be remaining. This seems to be what
> you're saying to do. But this brings me back to my original question.
> All the literature on recv() never indicates you should do this. And
> others here have indicated that recv() will never actually recv() more
> than was sent. So even if I do something like:
>
> buffer.recv(new_sock, 10000000);
>
> ...I'll still only recv the amount that was sent. So I shouldn't need
> to keep subtracting the amount received in each iteration from the value
> I pass to recv().
You don't have to worry about receiving more than was sent, but you have
to worry about two things:
1. receiving less than was sent, data can be split into packets at any
boundary ... so at any point you might only get a single byte from recv().
2. receiving more than was sent in a single write, data can be combined
into packets at any boundary ... so at any point you might get more data
than you "want".
....if the literature you are reading doesn't talk about those two points,
you need better literature.
--
James Antill -- james@and.org
http://www.and.org/and-httpd/ -- $2,000 security guarantee
http://www.and.org/vstr/
| |
| chsalvia@gmail.com 2007-03-29, 1:21 am |
| Thanks for the example code. Should the send() side do a similar loop?
| |
| David Schwartz 2007-03-29, 7:19 am |
| On Mar 28, 6:51 pm, chsal...@gmail.com wrote:
> Thanks for the example code. Should the send() side do a similar loop?
Have a look at the code I posted in my Mar 27 reply. It's simpler, I
think, and saves the extra calls to 'recv'.
Sending is much simpler because you always know how many bytes you
need to send. Just try avoid sending the length bytes in a separate
call to 'send' from the payload -- you want to do it in one shot or
you will tremendously increase the time it takes to send short
messages.
DS
| |
| James Antill 2007-03-29, 1:24 pm |
| On Thu, 29 Mar 2007 00:07:35 -0700, David Schwartz wrote:
> On Mar 28, 6:51 pm, chsal...@gmail.com wrote:
>
> Have a look at the code I posted in my Mar 27 reply. It's simpler, I
> think, and saves the extra calls to 'recv'.
I think you mean it's closer in style to the more efficient
implementations, and so is familiar for people well versed in the art
(like yourself). However it contains much more pointer arithmetic and you
have to think about multiple parts of the problem at once, which can be a
significant problem for people not well versed in the art.
I agree it's worth looking at something like that to see where you could
go to improve performance ... but that specific design has very bad worst
case behaviour when you get a lot of data at once (the memmove dominates).
See: http://www.and.org/texts/network_io ... and I'll say again much
like if you want to use a decent RDBMS, understanding how something works
is good but just reusing known working and efficient code is almost
certainly the best thing to do.
Your code was also buggy as hell, which kind of proves my point. The
ones I spotted:
1. packet_len uses <= instead of <<=
2. One side sends two "msgs" and then waits for an answer it deadlocks.
3. memmove uses an offset of buf_ptr instead of packet_len.
4. handling of corrupted data via. packet_len being too big, is not good
(negative values might even be exploitable).
--
James Antill -- james@and.org
http://www.and.org/and-httpd/ -- $2,000 security guarantee
http://www.and.org/vstr/
| |
| David Schwartz 2007-03-30, 1:18 am |
| On Mar 29, 9:45 am, James Antill <james-netn...@and.org> wrote:
> Your code was also buggy as hell, which kind of proves my point. The
> ones I spotted:
>
> 1. packet_len uses <= instead of <<=
> 2. One side sends two "msgs" and then waits for an answer it deadlocks.
> 3. memmove uses an offset of buf_ptr instead of packet_len.
> 4. handling of corrupted data via. packet_len being too big, is not good
> (negative values might even be exploitable).
All the code posted in this thread has bugs of that type, but you have
a point.
DS
|
|
|
|
|