Unix Programming - preventing data from being mangled


Mantorok Redgormor

2004-01-23, 5:20 pm

If I am using read() with a 513-byte buffer to accommodate the longest
messages specified by the IRC protocol, and I connect to a server
which produces a MOTD, how can I prevent my data from being mangled?

This is very annoying. I don't want to get data a byte at a time
or have to use an incredibly large buffer. I want my buffer to be just
large enough, in accordance with the IRC protocol. Or do I have to use
an extremely large buffer? Even a buffer of 2048 bytes mangles the data
if the MOTD is large enough.

is there a way around this so I can parse the messages appropriately?

With a 513-byte buffer, a call to read() either fills the entire
buffer or returns however much data is available. So I might have the
end of one message from the server towards the end of the buffer,
followed by the beginning of another message that comes after the
\r\n of the previous one.

So my buffer would end up like this (hypothetically):

this is a message from the server\r\nthis is\0

then a second call to "read" would produce:

the second message\r\n\0

This is where it gets mangled and I can't parse messages appropriately
unless, as I have already said, I use a rather large buffer size.

There doesn't seem to be a way to prevent this if I am using a buffer
of 513 bytes. Anyone have any ideas? Maybe I am missing something.

Also, this only happens with the MOTD. Everything works fine once
connected; it is just that all the data from the MOTD gets mangled.

--
nethlek
DINH Viet Hoa

2004-01-23, 5:20 pm

nethlek@tokyo.com wrote :
quote:

> if I am using read with a 513 byte buffer to compensate for longest
> messages specified by the irc protocol, and I connect to a server
> which produces a motd, how can I prevent my data from being mangled?



You have to handle the case of mangled data.
read() can return however many bytes the system has available, up to
the size you asked for (a return value of 0 means that the connection
was closed). If the remote server sends you 1024 bytes, for example,
read() can return any value between 1 and 1024.

You have to handle that case and concatenate the returned buffers.
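
A minimal sketch of that concatenation, assuming a blocking descriptor; the helper name read_until_closed is made up for illustration and is not from the thread. It only shows that each read() may deliver any number of bytes and that the pieces have to be joined by the caller:

```cpp
#include <cerrno>
#include <string>
#include <unistd.h>

// Hypothetical helper: keep calling read() and append whatever arrives,
// however little, until the peer closes the connection (read() returns 0).
std::string read_until_closed(int fd) {
    std::string data;
    char chunk[513];
    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n > 0) {
            // n can be anything from 1 to sizeof chunk
            data.append(chunk, static_cast<std::size_t>(n));
        } else if (n == 0) {
            break;  // connection closed
        } else if (errno != EINTR) {
            break;  // real error; an actual client would report it
        }
    }
    return data;
}
```

A real IRC client would not wait for the connection to close, of course; it would split the accumulated data into lines as it goes.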
quote:

> --
> nethlek



signature is preceded by "-- "
                            ^
note the space here --------+

--
DINH V. Hoa,

"monde de merde" -- Erwan David

Corey Murtagh

2004-01-23, 5:20 pm

Mantorok Redgormor wrote:
quote:

> if I am using read with a 513 byte buffer to compensate for longest
> messages specified by the irc protocol, and I connect to a server
> which produces a motd, how can I prevent my data from being mangled?
>


<snip>
quote:

>
> with a 513 byte buffer, and calling "read", it will fill up the entire
> buffer or depends on how much data is to be read. So I might have the
> end
> of a message from the server towards the end of the buffer and after
> that message, it might fit in the beginning of another message which
> comes
> after the \r\n of the previous message



The key is to read the incoming data and break it off into a list of
messages. In this case, the messages will be easy to split up, since
they're just CRLF-terminated text lines.

You'll need a container to store the incoming messages (lines) and a
string to store the current line (for incomplete lines). If you're
using C++ I'd go for a std::vector<std::string> for the list and
std::string for the current line object.

So... call read() to get the next chunk of data and scan it for CRLF
pairs. You can do it one char at a time, or line at a time, or whatever
you prefer. Just remember to keep the partial lines and add to them at
the start of the next block.
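
Roughly, that could look like the sketch below. The LineSplitter name and its feed() member are hypothetical, chosen only for this example; the std::vector<std::string> list and the std::string for the current line are the containers suggested above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collects complete lines and keeps the unfinished tail.
struct LineSplitter {
    std::vector<std::string> lines;  // complete messages, CRLF removed
    std::string current;             // partial line carried between reads

    // Feed one chunk exactly as read() returned it (no '\0' terminator needed).
    void feed(const char* data, std::size_t len) {
        current.append(data, len);
        std::size_t start = 0;
        std::size_t end;
        while ((end = current.find("\r\n", start)) != std::string::npos) {
            lines.push_back(current.substr(start, end - start));
            start = end + 2;         // skip past the CRLF
        }
        current.erase(0, start);     // keep only the incomplete remainder
    }
};
```

Call feed(buf, n) after every successful read() and hand the entries of lines to the parser. Because each chunk is appended before scanning, a CRLF that happens to be split across two reads is still found.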

--
Corey Murtagh
The Electric Monk
"Quidquid latine dictum sit, altum viditur!"

David Schwartz

2004-01-23, 5:20 pm


"Corey Murtagh" <emonk@slingshot.co.nz.no.uce> wrote in message
news:1068756056.170876@radsrv1.tranzpeer.net...
quote:

> So... call read() to get the next chunk of data and scan it for CRLF
> pairs. You can do it one char at a time, or line at a time, or whatever
> you prefer. Just remember to keep the partial lines and add to them at
> the start of the next block.



    What I like to do is use a reasonably sized buffer (4 KB to 16 KB) and
keep a count of how many bytes are in it. Call 'read' passing it a pointer
to the first empty byte of the buffer and the number of bytes of space left
in the buffer.

Then scan the buffer for complete lines and pass them on to the parser.
Any leftover is moved to the beginning of the buffer and the count set to
the number of bytes of leftover.

This is more efficient than one might at first expect because the vast
majority of the time, there is no leftover and hence no copy.
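
A sketch of how this might be written, under a few assumptions: RecvBuffer and read_lines are hypothetical names, lines are CRLF-terminated, and a buffer that fills up without containing a CRLF is simply treated as an error.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <unistd.h>

// Hypothetical receive buffer: 'used' counts the bytes currently held.
struct RecvBuffer {
    char buf[4096];
    std::size_t used = 0;
};

// Reads once into the free space, hands each complete line (without CRLF)
// to on_line, then moves any leftover to the front of the buffer.
// Returns read()'s result: >0 bytes read, 0 on close, -1 on error.
// A full buffer with no CRLF in it is reported as an error too.
ssize_t read_lines(int fd, RecvBuffer& rb,
                   const std::function<void(const std::string&)>& on_line) {
    if (rb.used == sizeof rb.buf) return -1;
    ssize_t n = read(fd, rb.buf + rb.used, sizeof rb.buf - rb.used);
    if (n <= 0) return n;
    rb.used += static_cast<std::size_t>(n);

    std::size_t start = 0;
    for (std::size_t i = 0; i + 1 < rb.used; ++i) {
        if (rb.buf[i] == '\r' && rb.buf[i + 1] == '\n') {
            on_line(std::string(rb.buf + start, i - start));
            start = i + 2;
            ++i;                       // step over the '\n' as well
        }
    }
    if (start > 0) {                   // at least one line was consumed
        rb.used -= start;              // bytes of leftover (often zero)
        std::memmove(rb.buf, rb.buf + start, rb.used);
    }
    return n;
}
```

When the data ends exactly on a CRLF, the leftover count is zero and the memmove copies nothing, which is the common case described above.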

DS


Corey Murtagh

2004-01-23, 5:20 pm

David Schwartz wrote:
quote:

> "Corey Murtagh" <emonk@slingshot.co.nz.no.uce> wrote in message
> news:1068756056.170876@radsrv1.tranzpeer.net...
>
>
> What I like to do is use a reasonably sized buffer (4 KB to 16 KB) and
> keep a count of how many bytes are in it. Call 'read' passing it a pointer
> to the first empty byte of the buffer and the number of bytes of space left
> in the buffer.
>
> Then scan the buffer for complete lines and pass them on to the parser.
> Any leftover is moved to the beginning of the buffer and the count set to
> the number of bytes of leftover.



Personally I just find it simpler to transfer the remainder into a
string that I can append to next time through the loop. Depending on
how you do it, and how often, it shouldn't be terribly inefficient. The
bigger the buffer the better though, since you want to eliminate as many
part-message copies, and do as few part-reads, as possible.

Of course if the string class is inefficient, my way will suck arse
compared to yours.

--
Corey Murtagh
The Electric Monk
"Quidquid latine dictum sit, altum viditur!"

David Schwartz

2004-01-23, 5:20 pm


"Corey Murtagh" <emonk@slingshot.co.nz.no.uce> wrote in message
news:1068784385.758069@radsrv1.tranzpeer.net...
quote:

> Personally I just find it simpler to transfer the remainder into a
> string that I can append to next time through the loop. Depending on
> how you do it, and how often, it shouldn't be terribly inefficient. The
> bigger the buffer the better though, since you want to eliminate as many
> part-message copies, and do as few part-reads, as possible.


quote:

> Of course if the string class is inefficient, my way will suck arse
> compared to yours.



I would suspect your way is likely to be less efficient than mine, but
the inefficiency would be in an infrequently used code path. Most of the
time, the data you read will consist of some number of complete lines.

You may be surprised to find that the buffer size doesn't significantly
affect the number of part reads. Generally the client outpaces the server
(since it has little else to do) and the network is slower than either
(assuming the server isn't on the same LAN as the client). So the program
will wind up receiving data roughly in chunks the same size as the packets
flowing over the network, assuming the buffer exceeds the MTU.

    In the rare case where the client doesn't get back to calling read or
recv for two packets, the buffer size won't matter so long as it's at
least twice the MTU. 4 KB is more than twice the typical MTU for Ethernet. So
a 64 KB buffer won't receive more data than a 4 KB buffer unless you don't
get around to calling read/recv for the time it takes 3 packets to traverse
the network.
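
The arithmetic can be checked with a few lines. This is only an illustration: it assumes the usual Ethernet MTU of 1500 bytes (the actual payload per packet is a little smaller once headers are taken off), and full_packets is a made-up name.

```cpp
#include <cstddef>

// Assumed figure: 1500 bytes is the usual Ethernet MTU.
constexpr std::size_t kEthernetMtu = 1500;

// How many full MTU-sized packets fit in a buffer of the given size.
constexpr std::size_t full_packets(std::size_t buffer_size,
                                   std::size_t mtu = kEthernetMtu) {
    return buffer_size / mtu;
}

// A 4 KB buffer takes two packets' worth of data but not three, so a larger
// buffer only helps once three or more packets have queued up unread.
static_assert(full_packets(4096) == 2, "4 KB holds two packets, not three");
static_assert(full_packets(65536) == 43, "64 KB holds many more");
```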

DS


Corey Murtagh

2004-01-23, 5:21 pm

David Schwartz wrote:
quote:

> "Corey Murtagh" <emonk@slingshot.co.nz.no.uce> wrote in message
> news:1068784385.758069@radsrv1.tranzpeer.net...


<snip>
quote:

>
> I would suspect your way is likely to be less efficient than mine, but
> the inefficiency would be in an infrequently used code path. Most of the
> time, the data you read will consist of some number of complete lines.



Yes, in the general case a message will arrive complete into the
buffer... assuming you're polling the connection often enough. It's
only the special case that we need to worry about, and that case
/should/ be rare.
quote:

> You may be surprised to find that the buffer size doesn't significantly
> affect the number of part reads. Generally the client outpaces the server
> (since it has little else to do) and the network is slower than either
> (assuming the server isn't on the same LAN as the client). So the program
> will wind up receiving data roughly in chunks the same size as the packets
> flowing over the network, assuming the buffer exceeds the MTU.



No, the buffer size is fairly inconsequential, so long as it's at least
large enough to hold the maximum message size. Actually, if you're
using a separate part-message buffer (which in effect is what I'm
suggesting) then you can have any size buffer you want... just so long
as you don't care about how many part-message reads you do.

That said, a larger buffer means fewer reads when you've got a backlog
waiting in the TCP stack's buffers, which means a few more clock cycles
to devote to processing. Large buffers are A Good Thing... up to a point.
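
One rough way to see the effect of a backlog is to count the read() calls needed to drain data that is already queued. The sketch below is an assumption-laden illustration: reads_needed is a hypothetical helper, and in a quick experiment a pipe can stand in for the socket's receive queue.

```cpp
#include <cstddef>
#include <unistd.h>
#include <vector>

// Hypothetical helper: drain 'total' bytes that are already queued on fd,
// using a buffer of 'buffer_size' bytes, and count the read() calls made.
std::size_t reads_needed(int fd, std::size_t total, std::size_t buffer_size) {
    std::vector<char> buf(buffer_size);
    std::size_t calls = 0;
    while (total > 0) {
        std::size_t want = total < buf.size() ? total : buf.size();
        ssize_t n = read(fd, buf.data(), want);
        if (n <= 0) break;  // closed or error: stop counting
        total -= static_cast<std::size_t>(n);
        ++calls;
    }
    return calls;
}
```

With 4000 bytes waiting, a 513-byte buffer needs at least eight calls, while a 4096-byte buffer can take the lot in one if the system hands it all over at once.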


--
Corey Murtagh
The Electric Monk
"Quidquid latine dictum sit, altum viditur!"
