Unix Programming - Periodic calls to sendto() vs bursts


Spoon

2006-05-21, 7:14 pm

Hello everyone,

I've run into something I cannot explain.

I have a PCI card producing data at 38 Mbit/s in an x86 PC.

If my process explicitly sleeps until enough data is available,
CPU utilization is 0. Note that even if the process requests a 1 ms
sleep, the kernel will actually make it sleep much longer (10-20 ms)
therefore the process will send data in bursts.

(PACKET_SIZE = 1316 bytes)

while ( 1 )
{
    /* Poll until the card holds at least one full packet. */
    while (get_load(PCI_card) < PACKET_SIZE)
        usleep(1000);
    Read(PCI_card, buf, PACKET_SIZE);
    sendto(sock, buf, PACKET_SIZE, 0, (struct sockaddr *)&addr, sizeof addr);
    ++count;
}

BUFFER=1316 BYTES AND SLEEP
COUNT=361090 packets
0.00user 0.01system 1:40.01elapsed 0%CPU
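
The oversleep mentioned above (a 1 ms request turning into 10-20 ms) can be measured directly. The following is a minimal sketch, not the original test program: measure_sleep_us() is a hypothetical helper, and it uses nanosleep() in place of usleep() and CLOCK_MONOTONIC for timing.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Sleep for the requested number of microseconds and return how long the
 * call really took, in microseconds, measured with CLOCK_MONOTONIC.
 * Hypothetical helper for illustration only. */
long measure_sleep_us(long requested_us)
{
    struct timespec req, start, end;

    req.tv_sec = requested_us / 1000000L;
    req.tv_nsec = (requested_us % 1000000L) * 1000L;

    clock_gettime(CLOCK_MONOTONIC, &start);
    nanosleep(&req, NULL);   /* may return early on a signal; ignored here */
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (end.tv_sec - start.tv_sec) * 1000000L
         + (end.tv_nsec - start.tv_nsec) / 1000L;
}
```

Calling measure_sleep_us(1000) a few hundred times and looking at the spread of the returned values should show how coarse the sleep granularity is on a given kernel.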


But, if my process does not explicitly sleep, it is blocked in the
device driver, waiting for enough data to become available. The result
is that packets are sent periodically, at regular intervals. The strange
thing is that, even though I send *the same number of packets*, CPU
utilization is much higher in this case...

while ( 1 )
{
    /* Read() blocks in the driver until a full packet is available. */
    Read(PCI_card, buf, PACKET_SIZE);
    sendto(sock, buf, PACKET_SIZE, 0, (struct sockaddr *)&addr, sizeof addr);
    ++count;
}

BUFFER=1316 BYTES
COUNT=361106 packets
0.12user 2.47system 1:40.01elapsed 2%CPU

Also note that if I don't perform the sendto() operation,
CPU utilization is also slightly different:

BUFFER=1316
COUNT=108346
0.02user 0.20system 0:30.01elapsed 0%CPU

BUFFER=1316 AND SLEEP
COUNT=108326
0.00user 0.01system 0:30.01elapsed 0%CPU

In other words, when I send packets periodically, the sendto() call adds
roughly two seconds of system time over a 100-second run. Can anyone see
the reason for this? Could this be a DMA issue? A zero-copy or
scatter/gather issue? Something else?

(My kernel is Linux 2.6.14)

Regards.

davids@webmaster.com

2006-05-21, 7:14 pm

Your 'sendto' is a blocking operation. It can take as long as it wants
to. My bet is that the delay is caused by the receiving rate of whatever
you are sending data to.

What does the 'sendto' actually do? Is it TCP? UDP? Local? Remote? Or
what?

DS

Rick Jones

2006-05-22, 1:15 pm

Perhaps when your application sleeps and so bursts there are fewer
interrupts. You might check /proc/interrupts before and after your
test to see if there is a difference in the counts.
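
One way to take the before/after counts programmatically is sketched below. It assumes the usual /proc/interrupts layout (an "NN:" label, one counter per CPU, then the controller and device names). Both function names are hypothetical, and the device match is a plain substring test on the first matching line.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-CPU counters on one /proc/interrupts line.
 * Returns 0 and stores the sum in *total if the line mentions `device`,
 * otherwise returns -1. Hypothetical helper for illustration. */
int irq_count_from_line(const char *line, const char *device,
                        unsigned long long *total)
{
    const char *p = strchr(line, ':');
    if (p == NULL || strstr(line, device) == NULL)
        return -1;
    p++;                                   /* skip the "NN:" label */

    unsigned long long sum = 0;
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;                         /* reached the controller name */
        char *end;
        sum += strtoull(p, &end, 10);
        p = end;
    }
    *total = sum;
    return 0;
}

/* Scan /proc/interrupts and return the total count for `device`,
 * or -1 if the file cannot be read or no line matches. */
long long irq_count(const char *device)
{
    FILE *f = fopen("/proc/interrupts", "r");
    if (f == NULL)
        return -1;

    char line[4096];
    long long result = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long long n;
        if (irq_count_from_line(line, device, &n) == 0) {
            result = (long long)n;
            break;
        }
    }
    fclose(f);
    return result;
}
```

Calling irq_count("eth0") before and after a run and subtracting the two values gives the number of interrupts raised during the test.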

You could also consider taking a CPU profile of the system (include
the kernel) to see where time is spent in each case. If the
utilization is low the profile may have to be for a longer time of
course.

rick jones
--
firebug n, the idiot who tosses a lit cigarette out his car window
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

Spoon

2006-05-22, 7:15 pm

Rick Jones wrote:
> Perhaps when your application sleeps and so bursts there are fewer
> interrupts. You might check /proc/interrupts before and after your
> test to see if there is a difference in the counts.


Excellent suggestion. I'll take a look tomorrow.

> You could also consider taking a CPU profile of the system (include
> the kernel) to see where time is spent in each case.


It seems the generic Slackware kernel does not include oprofile :-(

Spoon

2006-05-22, 7:15 pm

davids wrote:

> Your 'sendto' is a blocking operation. It can take as long as it wants
> to. My bet is that the delay is caused by the receiving rate of whatever
> you are sending data to.


(You snipped too much. There is no context left.)

You seem to have misunderstood the output of /usr/bin/time

0.12user 2.47system 1:40.01elapsed 2%CPU

means the process ran for 100 seconds, and 2.47 seconds (id est 7e9
cycles) were actively spent inside the kernel on behalf of the process.
Blocking in sendto() does not spin the CPU...
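
The arithmetic behind that reading can be spelled out as a small sketch. The function names are hypothetical, and the 2.8 GHz clock rate used in the usage note is an assumption inferred from the "7e9 cycles" figure; it is not stated anywhere in the thread.

```c
/* Percentage of wall-clock time that the process spent on the CPU. */
double cpu_percent(double user_s, double sys_s, double elapsed_s)
{
    return 100.0 * (user_s + sys_s) / elapsed_s;
}

/* Cycles consumed during `seconds` of CPU time at `clock_hz`.
 * The clock rate is supplied by the caller; it is an assumption here. */
double cycles_spent(double seconds, double clock_hz)
{
    return seconds * clock_hz;
}
```

cpu_percent(0.12, 2.47, 100.01) comes to about 2.6, consistent with the 2%CPU shown, and cycles_spent(2.47, 2.8e9) comes to about 6.9e9, that is, roughly 7e9 cycles.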

> What does the 'sendto' actually do? Is the TCP? UDP? Local? Remote?
> Or what?


It sends a UDP datagram to another PC on the same Ethernet LAN.

Spoon

2006-05-23, 7:17 pm

Rick Jones wrote:

> Perhaps when your application sleeps and so bursts there are fewer
> interrupts. You might check /proc/interrupts before and after your
> test to see if there is a difference in the counts.


Bingo.

Over a 3-minute run, with the call to usleep() thus *with* bursts:

timer IRQ = 45002 (does this mean HZ=1500?!?)
packet count = 649981 (21666 packets per second)
PCI card IRQ = 653194 (~one per packet)
eth0 IRQ = 125184 (5.2 packets per IRQ)
CPU time = 0%

without usleep() thus with regular calls to sendto():

timer IRQ = 45003
packet count = 650006
PCI card IRQ = 653343 (still ~one per packet)
eth0 IRQ = 650219 (~one IRQ per packet)
CPU time = 2.5%

The IRQ count is 5.2 times higher in the second case! I think there is
an optimization for bursts in the Ethernet device driver: when it gets a
frame to send, it waits "a little bit" to see if another frame needs to
be sent shortly afterwards.

Is that explanation reasonable? Would 525000 executions of the eth0
interrupt handler account for ~13e9 cycles? I think my kernel is not
configured to use the modern APIC, because /proc/interrupts only
mentions XT-PIC. Robert Redelmeier once wrote XT-PIC was slower than
APIC. Is there any reason why my generic kernel is not configured to use
APIC by default?
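
For scale, the figures above can be turned into a per-interrupt cost. This is only a back-of-the-envelope sketch with hypothetical function names; it takes the ~13e9 cycle estimate at face value and attributes all of it to the extra eth0 interrupts, which may well overstate the handler's share.

```c
/* Interrupts added by switching from bursts to evenly spaced sends. */
unsigned long extra_irqs(unsigned long periodic, unsigned long bursty)
{
    return periodic - bursty;
}

/* Average cycles per interrupt if `total_cycles` is charged entirely
 * to `irqs` interrupts (a simplifying assumption). */
double cycles_per_irq(double total_cycles, unsigned long irqs)
{
    return total_cycles / (double)irqs;
}
```

extra_irqs(650219, 125184) is 525035, and cycles_per_irq(13e9, 525035) comes to roughly 24,800 cycles per interrupt.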

The driver's source code is here:
http://www.kernel.org/hg/linux-2.6/.../net/eepro100.c

with some documentation available here:
http://www.scyld.com/eepro100.html

When sending packets, why would an Ethernet device raise an IRQ? To tell
the OS it is ready to accept more packets? Would it be possible to
completely turn "send IRQs" off and have the device driver check whether
the card can accept a new packet when it needs to send one?

Regards.

Spoon

2006-05-29, 5:32 pm

Spoon wrote:

> Rick Jones wrote:
>
>
> Bingo.
>
> Over a 3-minute run, with the call to usleep() thus *with* bursts:
>
> timer IRQ = 45002 (does this mean HZ=1500?!?)


Doh! 3 min = 180 s
45002 IRQs in 3 min => 250 IRQs per second
thus HZ=250 and timer tick = 4 ms

> packet count = 649981 (21666 packets per second)


I meant ~3611 packets per second.
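
The corrected rates can be cross-checked against the 38 Mbit/s figure from the first post. A minimal sketch with hypothetical helper names, using only numbers quoted in this thread:

```c
/* Events per second over a run of `seconds`. */
double per_second(unsigned long count, double seconds)
{
    return (double)count / seconds;
}

/* Payload bit rate in Mbit/s for a given packet rate and packet size. */
double mbit_per_second(double packets_per_s, unsigned packet_bytes)
{
    return packets_per_s * packet_bytes * 8.0 / 1e6;
}
```

per_second(45002, 180) is about 250 (the timer rate), per_second(649981, 180) is about 3611 packets per second, and mbit_per_second(3611, 1316) is about 38.0, which matches the card's data rate.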

> PCI card IRQ = 653194 (~one per packet)
> eth0 IRQ = 125184 (5.2 packets per IRQ)
> CPU time = 0%
>
> without usleep() thus with regular calls to sendto():
>
> timer IRQ = 45003
> packet count = 650006
> PCI card IRQ = 653343 (still ~one per packet)
> eth0 IRQ = 650219 (~one IRQ per packet)
> CPU time = 2.5%
>
> The IRQ count is 5.2 times higher in the second case! I think there is
> an optimization for bursts in the Ethernet device driver: when it gets a
> frame to send, it waits "a little bit" to see if another frame needs to
> be sent shortly afterwards.
>
> Is that explanation reasonable? Would 525000 executions of the eth0
> interrupt handler account for ~13e9 cycles? I think my kernel is not
> configured to use the modern APIC, because /proc/interrupts only
> mentions XT-PIC. Robert Redelmeier once wrote XT-PIC was slower than
> APIC. Is there any reason why my generic kernel is not configured to use
> APIC by default?
>
> The driver's source code is here:
> http://www.kernel.org/hg/linux-2.6/.../net/eepro100.c
>
> with some documentation available here:
> http://www.scyld.com/eepro100.html
>
> When sending packets, why would an Ethernet device raise an IRQ? To tell
> the OS it is ready to accept more packets? Would it be possible to
> completely turn "send IRQs" off and have the device driver check whether
> the card can accept a new packet when it needs to send one?


Would anybody care to comment?
Copyright 2003 - 2008 webservertalk.com