Help troubleshooting high interrupt problem
| Ian East 2005-12-07, 7:52 am |
|
Hi,
I have 10 machines in a cluster. All have exactly the same hardware
and run Debian sarge. For 9 of them, the baseline stats are
within about 5-10% of each other, which is fairly normal. However, one
of them has CPU utilization and a load average 10 times higher than
the others. After some investigation with vmstat, I discovered this
machine has an interrupt rate about 4 times as high as the others.
My question is, how can I find the device that's causing this
problem? I checked all of the parameters with sysctl and nothing is
too far out of the ordinary. The vmstat figures were also all
reasonably close, aside from CPU utilization and interrupt rate. Even
when the machine is relatively idle, the CPU still hovers around 35%
use by system processes. The other machines are less than 1%
utilized.
This is really driving me crazy and I need to know if it's a hardware
problem so I can return it before the warranty expires.
Thanks for any help.
| |
| Ian East 2005-12-07, 7:52 am |
|
I have discovered sysstat and have a little more info. I take it this
machine is toast. Both machines were practically idle.
This is a normal machine:
#sar -u -I XALL 30 1
Linux 2.4.26-1-686-smp (cow25) 12/07/05
Average:          CPU     %user     %nice   %system   %iowait     %idle
Average:          all      0.50      0.00      0.50      0.00     99.00
Average:         INTR    intr/s
Average:           14      5.20
Average:           54    593.00
Average:           55    347.00
This is the funky machine:
Average:          CPU     %user     %nice   %system   %iowait     %idle
Average:          all      1.46      0.00     38.67      0.00     59.87
Average:         INTR    intr/s
Average:           14      4.60
Average:           16  49268.90
Average:           18  49816.20
Average:           19  49961.90
Average:           54 341781.00
Average:           55 342663.20
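For anyone wanting to compare the two listings mechanically, here is a minimal Python sketch. The rates are copied from the sar output above; the function name `suspicious_irqs` and the 10x threshold are arbitrary choices made for illustration, not anything sysstat provides.

```python
# Per-IRQ rates (intr/s) copied from the two sar listings in this post.
normal = {14: 5.20, 54: 593.00, 55: 347.00}
funky = {14: 4.60, 16: 49268.90, 18: 49816.20, 19: 49961.90,
         54: 341781.00, 55: 342663.20}


def suspicious_irqs(baseline, suspect, factor=10.0):
    """Return {irq: ratio} for IRQs far busier on the suspect host.

    The ratio is None when the IRQ is missing from (or zero in) the
    baseline listing, so no meaningful ratio can be computed.
    """
    flagged = {}
    for irq, rate in sorted(suspect.items()):
        base = baseline.get(irq)
        if not base:
            if rate > 0:
                flagged[irq] = None
        elif rate / base >= factor:
            flagged[irq] = rate / base
    return flagged


if __name__ == "__main__":
    for irq, ratio in suspicious_irqs(normal, funky).items():
        if ratio is None:
            note = "not listed on the normal machine"
        else:
            note = f"about {ratio:.0f}x the normal rate"
        print(f"IRQ {irq}: {funky[irq]:.1f} intr/s ({note})")
```

With these numbers it flags IRQs 16, 18, 19, 54 and 55 and leaves IRQ 14 alone, which matches what the eye picks out of the two listings.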
Here are the devices... The machines are identical:
# lspci -v
0000:00:00.0 Host bridge: Intel Corp. Server Memory Controller Hub (rev 0c)
    Subsystem: Intel Corp.: Unknown device 1079
    Flags: bus master, fast devsel, latency 0
    Capabilities: [40] #09 [4105]
0000:00:00.1 ff00: Intel Corp. Memory Controller Hub Error Reporting Register (rev 0c)
    Subsystem: Intel Corp.: Unknown device 1079
    Flags: fast devsel
0000:00:01.0 System peripheral: Intel Corp. Memory Controller Hub DMA Controller (rev 0c)
    Subsystem: Intel Corp.: Unknown device 1079
    Flags: fast devsel, IRQ 16
    Memory at fcdff000 (32-bit, non-prefetchable) [disabled] [size=4K]
    Capabilities: [b0] Message Signalled Interrupts: 64bit- Queue=0/1 Enable-
0000:00:02.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port A0 (rev 0c) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=01, subordinate=03, sec-latency=0
    I/O behind bridge: 0000d000-0000dfff
    Memory behind bridge: fce00000-fcffffff
    Capabilities: [50] Power Management version 2
    Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable-
    Capabilities: [64] #10 [0041]
0000:00:1d.0 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #1 (rev 02) (prog-if 00 [UHCI])
    Subsystem: Intel Corp.: Unknown device 1079
    Flags: bus master, medium devsel, latency 0, IRQ 16
    I/O ports at c800 [size=32]
0000:00:1d.1 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #2 (rev 02) (prog-if 00 [UHCI])
    Subsystem: Intel Corp.: Unknown device 1079
    Flags: bus master, medium devsel, latency 0, IRQ 19
    I/O ports at c880 [size=32]
0000:00:1d.2 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #3 (rev 02) (prog-if 00 [UHCI])
    Subsystem: Intel Corp.: Unknown device 1079
    Flags: bus master, medium devsel, latency 0, IRQ 18
    I/O ports at cc00 [size=32]
0000:00:1d.7 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02) (prog-if 20 [EHCI])
    Subsystem: Intel Corp.: Unknown device 1079
    Flags: bus master, medium devsel, latency 0, IRQ 23
    Memory at fcdfec00 (32-bit, non-prefetchable) [size=1K]
    Capabilities: [50] Power Management version 2
    Capabilities: [58] #0a [20a0]
0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev c2) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=04, subordinate=04, sec-latency=32
    I/O behind bridge: 0000e000-0000efff
    Memory behind bridge: fd000000-febfffff
0000:00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC Bridge (rev 02)
    Flags: bus master, medium devsel, latency 0
0000:00:1f.2 IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150 Storage Controller (rev 02) (prog-if 8a [Master SecP PriP])
    Subsystem: Intel Corp.: Unknown device 3437
    Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 18
    I/O ports at <unassigned>
    I/O ports at <unassigned>
    I/O ports at <unassigned>
    I/O ports at <unassigned>
    I/O ports at fc00 [size=16]
0000:00:1f.3 SMBus: Intel Corp. 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
    Subsystem: Intel Corp.: Unknown device 1079
    Flags: medium devsel, IRQ 17
    I/O ports at 0540 [size=32]
0000:01:00.0 PCI bridge: Intel Corp. PCI Bridge Hub A (rev 09) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=01, secondary=02, subordinate=02, sec-latency=64
    Capabilities: [44] #10 [0071]
    Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
    Capabilities: [6c] Power Management version 2
    Capabilities: [d8] PCI-X bridge device.
0000:01:00.1 PIC: Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt Controller A (rev 09) (prog-if 20 [IO(X)-APIC])
    Subsystem: Intel Corp.: Unknown device 1079
    Flags: bus master, fast devsel, latency 0
    Memory at fcefe000 (32-bit, non-prefetchable) [size=4K]
    Capabilities: [44] #10 [0001]
    Capabilities: [6c] Power Management version 2
0000:01:00.2 PCI bridge: Intel Corp. PCI Bridge Hub B (rev 09) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=01, secondary=03, subordinate=03, sec-latency=64
    I/O behind bridge: 0000d000-0000dfff
    Memory behind bridge: fcf00000-fcffffff
    Capabilities: [44] #10 [0071]
    Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
    Capabilities: [6c] Power Management version 2
    Capabilities: [d8] PCI-X bridge device.
0000:01:00.3 PIC: Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt Controller B (rev 09) (prog-if 20 [IO(X)-APIC])
    Subsystem: Intel Corp.: Unknown device 1079
    Flags: bus master, fast devsel, latency 0
    Memory at fceff000 (32-bit, non-prefetchable) [size=4K]
    Capabilities: [44] #10 [0001]
    Capabilities: [6c] Power Management version 2
0000:03:04.0 Ethernet controller: Intel Corp. 82546GB Gigabit Ethernet Controller (rev 03)
    Subsystem: Intel Corp. PRO/1000 MT Dual Port Network Connection
    Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 54
    Memory at fcfa0000 (64-bit, non-prefetchable) [size=128K]
    I/O ports at d880 [size=64]
    Capabilities: [dc] Power Management version 2
    Capabilities: [e4] PCI-X non-bridge device.
    Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
0000:03:04.1 Ethernet controller: Intel Corp. 82546GB Gigabit Ethernet Controller (rev 03)
    Subsystem: Intel Corp. PRO/1000 MT Dual Port Network Connection
    Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 55
    Memory at fcfe0000 (64-bit, non-prefetchable) [size=128K]
    I/O ports at dc00 [size=64]
    Capabilities: [dc] Power Management version 2
    Capabilities: [e4] PCI-X non-bridge device.
    Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
0000:04:0c.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
    Subsystem: Intel Corp.: Unknown device 1079
    Flags: bus master, stepping, medium devsel, latency 64, IRQ 17
    Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
    I/O ports at e800 [size=256]
    Memory at febff000 (32-bit, non-prefetchable) [size=4K]
    Expansion ROM at febc0000 [disabled] [size=128K]
    Capabilities: [5c] Power Management version 2
| |
| Bill Marcum 2005-12-07, 5:48 pm |
|
/proc/interrupts shows a count for each interrupt source.
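Since /proc/interrupts holds running totals rather than rates, two snapshots taken a few seconds apart are needed to see which source is busiest right now. The following is a rough Python sketch of that idea. It assumes the usual layout (a header row naming the CPUs, then one row per source with a count per CPU followed by the controller and driver names); the function names `parse_interrupts`, `interrupt_rates` and `top_interrupts` are made up for this example.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Rows such as ERR carry fewer counters than there are CPUs.
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def interrupt_rates(before, after, seconds):
    """Return {label: interrupts per second} between two snapshots."""
    return {
        label: (after[label][0] - before[label][0]) / seconds
        for label in after
        if label in before
    }


def top_interrupts(interval=5, path="/proc/interrupts", limit=10):
    """Sample the file twice and print the busiest sources first."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_interrupts(f.read())
    rates = interrupt_rates(first, second, interval)
    busiest = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
    for label, rate in busiest[:limit]:
        print(f"{label:>4} {rate:12.1f}/s  {second[label][1]}")
```

Calling `top_interrupts()` on the affected machine and on a healthy one should show the same contrast as sar, with the driver names attached to each line, which is the part sar does not give you.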
--
I don't understand the HUMOUR of the THREE STOOGES!!
| |
| adminskynet 2005-12-08, 2:46 am |
| It seems that the crazy machine is using USB?
Interrupts 16, 18 and 19 are USB related.
Same hardware, but maybe not the same software configuration?
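To check which devices sit on which interrupt line, the lspci -v listing posted earlier can be grouped by IRQ. Below is a small Python sketch of that; the excerpt is copied from that listing (with the wrapped header lines rejoined), and the name `devices_by_irq` is just a label chosen here. Note that in the listing IRQ 16 is shared by the Memory Controller Hub DMA controller and UHCI #1, and IRQ 18 by UHCI #3 and the SATA controller, so the IRQ numbers alone may not be enough to pin the problem on USB.

```python
import re
from collections import defaultdict

# A PCI address such as 0000:00:1d.0 at the start of a line opens a device entry.
DEVICE_RE = re.compile(r"^([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.*)$")
IRQ_RE = re.compile(r"\bIRQ (\d+)")

# Excerpt of the lspci -v output from this thread.
EXCERPT = """\
0000:00:01.0 System peripheral: Intel Corp. Memory Controller Hub DMA Controller (rev 0c)
    Flags: fast devsel, IRQ 16
0000:00:1d.0 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #1 (rev 02)
    Flags: bus master, medium devsel, latency 0, IRQ 16
0000:00:1d.2 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #3 (rev 02)
    Flags: bus master, medium devsel, latency 0, IRQ 18
0000:00:1f.2 IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150 Storage Controller (rev 02)
    Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 18
"""


def devices_by_irq(lspci_text):
    """Map each IRQ number to the devices whose Flags line mentions it."""
    mapping = defaultdict(list)
    current = None
    for line in lspci_text.splitlines():
        stripped = line.strip()
        header = DEVICE_RE.match(stripped)
        if header:
            current = f"{header.group(1)} {header.group(2)}"
            continue
        irq = IRQ_RE.search(stripped)
        if irq and current and stripped.startswith("Flags:"):
            mapping[int(irq.group(1))].append(current)
    return dict(mapping)


if __name__ == "__main__":
    for irq, devices in sorted(devices_by_irq(EXCERPT).items()):
        shared = " (shared)" if len(devices) > 1 else ""
        print(f"IRQ {irq}{shared}:")
        for device in devices:
            print(f"    {device}")
```

Running the same function over the full output from both the good and the bad machine would show whether the IRQ assignments really are identical, which bears on the software configuration question.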
|