Help troubleshooting high interrupt problem
    Help troubleshooting high interrupt problem  
Ian East


12-07-05 12:52 PM


Hi,

I have 10 machines in a cluster.  All are exactly the same hardware
and running debian-sarge.  For 9 of them, the baseline stats are
within about 5-10% of each other which is fairly normal.  However, one
of them has a CPU utilization and load average 10 times higher than
the others.  Upon some investigation with vmstat, I discovered this
machine has an interrupt rate about 4 times as high as the others.

My question is, how can I troubleshoot the device that's causing this
problem?  I checked all of the parameters with sysctl and nothing is
too out of the ordinary.  The vmstat figures were also all
reasonably close apart from CPU utilization and interrupt rate.  Even
when the machine is relatively idle, about 35% of the CPU is still
used by system processes.  The other machines would be less than 1%
utilized.

This is really driving me crazy and I need to know if it's a hardware
problem so I can return it before the warranty expires.

Thanks for any help.






    Re: Help troubleshooting high interrupt problem  
Ian East


12-07-05 12:52 PM

I have discovered sysstat and have a little more info.  I take it this
machine is toast.  Both machines were practically idle.

This is a normal machine:
# sar -u -I XALL 30 1
Linux 2.4.26-1-686-smp (cow25)     12/07/05
Average:         CPU     %user     %nice   %system   %iowait     %idle
Average:         all      0.50      0.00      0.50      0.00     99.00

Average:         INTR    intr/s
Average:          14      5.20
Average:          54    593.00
Average:          55    347.00


This is the funky machine:
Average:        CPU     %user     %nice   %system   %iowait     %idle
Average:        all      1.46      0.00     38.67      0.00     59.87

Average:         INTR    intr/s
Average:           14      4.60
Average:           16  49268.90
Average:           18  49816.20
Average:           19  49961.90
Average:           54 341781.00
Average:           55 342663.20
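
For anyone comparing the two listings by eye, here is a rough sketch that does the arithmetic. The intr/s figures are copied from the sar output above. The `compare` helper and its `factor=10.0` threshold are made up for illustration, and it assumes an IRQ missing from the normal listing means no activity on that line.

```python
# intr/s averages copied from the two sar runs above.
normal = {14: 5.20, 54: 593.00, 55: 347.00}
funky = {14: 4.60, 16: 49268.90, 18: 49816.20, 19: 49961.90,
         54: 341781.00, 55: 342663.20}


def compare(baseline, suspect, factor=10.0):
    """Return (IRQs absent from the baseline, {irq: ratio} for inflated IRQs).

    `factor` is an arbitrary illustrative threshold, not a sysstat setting.
    """
    new = sorted(irq for irq in suspect if irq not in baseline)
    inflated = {
        irq: suspect[irq] / baseline[irq]
        for irq in suspect
        if irq in baseline
        and baseline[irq] > 0
        and suspect[irq] >= factor * baseline[irq]
    }
    return new, inflated


new_irqs, inflated = compare(normal, funky)
print("IRQs only active on the suspect machine:", new_irqs)
for irq, ratio in sorted(inflated.items()):
    print(f"IRQ {irq}: roughly {ratio:.0f}x the baseline rate")
```

IRQ 14 stays out of both lists because its rate is about the same on both machines.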

Here are the devices... The machines are identical:
# lspci -v
0000:00:00.0 Host bridge: Intel Corp. Server Memory Controller Hub
(rev 0c)
Subsystem: Intel Corp.: Unknown device 1079
Flags: bus master, fast devsel, latency 0
Capabilities: [40] #09 [4105]

0000:00:00.1 ff00: Intel Corp. Memory Controller Hub Error Reporting
Register (rev 0c)
Subsystem: Intel Corp.: Unknown device 1079
Flags: fast devsel

0000:00:01.0 System peripheral: Intel Corp. Memory Controller Hub DMA
Controller (rev 0c)
Subsystem: Intel Corp.: Unknown device 1079
Flags: fast devsel, IRQ 16
Memory at fcdff000 (32-bit, non-prefetchable) [disabled] [size=4K]
Capabilities: [b0] Message Signalled Interrupts: 64bit- Queue=0/1
Enable-

0000:00:02.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express
Port A0 (rev 0c) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=01, subordinate=03, sec-latency=0
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fce00000-fcffffff
Capabilities: [50] Power Management version 2
Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1
Enable-
Capabilities: [64] #10 [0041]

0000:00:1d.0 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB
UHCI #1 (rev 02) (prog-if 00 [UHCI])
Subsystem: Intel Corp.: Unknown device 1079
Flags: bus master, medium devsel, latency 0, IRQ 16
I/O ports at c800 [size=32]

0000:00:1d.1 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB
UHCI #2 (rev 02) (prog-if 00 [UHCI])
Subsystem: Intel Corp.: Unknown device 1079
Flags: bus master, medium devsel, latency 0, IRQ 19
I/O ports at c880 [size=32]

0000:00:1d.2 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB
UHCI #3 (rev 02) (prog-if 00 [UHCI])
Subsystem: Intel Corp.: Unknown device 1079
Flags: bus master, medium devsel, latency 0, IRQ 18
I/O ports at cc00 [size=32]

0000:00:1d.7 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB2
EHCI Controller (rev 02) (prog-if 20 [EHCI])
Subsystem: Intel Corp.: Unknown device 1079
Flags: bus master, medium devsel, latency 0, IRQ 23
Memory at fcdfec00 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2
Capabilities: [58] #0a [20a0]

0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev c2)
(prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=04, subordinate=04, sec-latency=32
I/O behind bridge: 0000e000-0000efff
Memory behind bridge: fd000000-febfffff

0000:00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC
Bridge (rev 02)
Flags: bus master, medium devsel, latency 0

0000:00:1f.2 IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150
Storage Controller (rev 02) (prog-if 8a [Master SecP PriP])
Subsystem: Intel Corp.: Unknown device 3437
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 18
I/O ports at <unassigned>
I/O ports at <unassigned>
I/O ports at <unassigned>
I/O ports at <unassigned>
I/O ports at fc00 [size=16]

0000:00:1f.3 SMBus: Intel Corp. 82801EB/ER (ICH5/ICH5R) SMBus
Controller (rev 02)
Subsystem: Intel Corp.: Unknown device 1079
Flags: medium devsel, IRQ 17
I/O ports at 0540 [size=32]

0000:01:00.0 PCI bridge: Intel Corp. PCI Bridge Hub A (rev 09)
(prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=01, secondary=02, subordinate=02, sec-latency=64
Capabilities: [44] #10 [0071]
Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0
Enable-
Capabilities: [6c] Power Management version 2
Capabilities: [d8] PCI-X bridge device.

0000:01:00.1 PIC: Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt
Controller A (rev 09) (prog-if 20 [IO(X)-APIC])
Subsystem: Intel Corp.: Unknown device 1079
Flags: bus master, fast devsel, latency 0
Memory at fcefe000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [44] #10 [0001]
Capabilities: [6c] Power Management version 2

0000:01:00.2 PCI bridge: Intel Corp. PCI Bridge Hub B (rev 09)
(prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=01, secondary=03, subordinate=03, sec-latency=64
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fcf00000-fcffffff
Capabilities: [44] #10 [0071]
Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0
Enable-
Capabilities: [6c] Power Management version 2
Capabilities: [d8] PCI-X bridge device.

0000:01:00.3 PIC: Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt
Controller B (rev 09) (prog-if 20 [IO(X)-APIC])
Subsystem: Intel Corp.: Unknown device 1079
Flags: bus master, fast devsel, latency 0
Memory at fceff000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [44] #10 [0001]
Capabilities: [6c] Power Management version 2

0000:03:04.0 Ethernet controller: Intel Corp. 82546GB Gigabit Ethernet
Controller (rev 03)
Subsystem: Intel Corp. PRO/1000 MT Dual Port Network Connection
Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 54
Memory at fcfa0000 (64-bit, non-prefetchable) [size=128K]
I/O ports at d880 [size=64]
Capabilities: [dc] Power Management version 2
Capabilities: [e4] PCI-X non-bridge device.
Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0
Enable-

0000:03:04.1 Ethernet controller: Intel Corp. 82546GB Gigabit Ethernet
Controller (rev 03)
Subsystem: Intel Corp. PRO/1000 MT Dual Port Network Connection
Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 55
Memory at fcfe0000 (64-bit, non-prefetchable) [size=128K]
I/O ports at dc00 [size=64]
Capabilities: [dc] Power Management version 2
Capabilities: [e4] PCI-X non-bridge device.
Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0
Enable-

0000:04:0c.0 VGA compatible controller: ATI Technologies Inc Rage XL
(rev 27) (prog-if 00 [VGA])
Subsystem: Intel Corp.: Unknown device 1079
Flags: bus master, stepping, medium devsel, latency 64, IRQ 17
Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
I/O ports at e800 [size=256]
Memory at febff000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at febc0000 [disabled] [size=128K]
Capabilities: [5c] Power Management version 2







    Re: Help troubleshooting high interrupt problem  
Bill Marcum


12-07-05 10:48 PM

/proc/interrupts shows a count for each interrupt source.
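
Since those counts are cumulative since boot, a rate needs two readings. Below is a minimal sketch of that idea. The function names (`parse_interrupts`, `interrupt_rates`, `sample`) are my own, and it assumes the usual layout of a CPU header row followed by one `label: counts... description` row per source, which can vary between kernels.

```python
import time


def parse_interrupts(text):
    """Return {label: (total_count, description)} from /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    # The header row names one column per CPU (CPU0, CPU1, ...).
    ncpu = len(lines[0].split())
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # Some rows (for example ERR) carry fewer counters than CPUs.
        nums = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            nums.append(int(field))
        desc = " ".join(fields[len(nums):])
        counts[label.strip()] = (sum(nums), desc)
    return counts


def interrupt_rates(before, after, seconds):
    """Interrupts per second for every source present in both snapshots."""
    return {
        irq: (after[irq][0] - before[irq][0]) / seconds
        for irq in after
        if irq in before
    }


def sample(interval=5.0, path="/proc/interrupts"):
    """Read the file twice, `interval` seconds apart, and return the rates."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_interrupts(f.read())
    return interrupt_rates(first, second, interval)
```

Calling `sample(30)` on a good machine and on the bad one should give numbers comparable to sar's intr/s column, and the description field shows which driver registered each line.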


--
I don't understand the HUMOUR of the THREE STOOGES!!






    Re: Help troubleshooting high interrupt problem  
adminskynet


12-08-05 07:46 AM

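It seems that the crazy machine is using USB?
Interrupts 16, 18 and 19 are USB-related: in the lspci output they
belong to the three UHCI controllers (IRQ 16 is shared with the MCH
DMA controller and IRQ 18 with the SATA controller).

Same hardware, but maybe not the same software configuration?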
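
To check which devices sit on each interrupt line, the `lspci -v` output can be grouped by IRQ. This is only a sketch: `devices_by_irq` is a name I made up, the excerpt is copied from the listing posted earlier in the thread, and it assumes each device block starts with a PCI address and has its IRQ on the `Flags:` line.

```python
import re
from collections import defaultdict

# A device block starts with a PCI address such as 0000:00:1d.0;
# the leading domain part is treated as optional.
DEVICE_RE = re.compile(
    r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.*)$"
)
IRQ_RE = re.compile(r"\bIRQ (\d+)\b")


def devices_by_irq(lspci_text):
    """Map IRQ number -> list of 'address description' strings."""
    by_irq = defaultdict(list)
    current = None
    for raw in lspci_text.splitlines():
        line = raw.strip()
        m = DEVICE_RE.match(line)
        if m:
            current = f"{m.group(1)} {m.group(2)}"
            continue
        if current and line.startswith("Flags:"):
            irq = IRQ_RE.search(line)
            if irq:
                by_irq[int(irq.group(1))].append(current)
    return dict(by_irq)


# Excerpt of the lspci -v output from earlier in the thread.
EXCERPT = """\
0000:00:01.0 System peripheral: Intel Corp. Memory Controller Hub DMA
Controller (rev 0c)
Flags: fast devsel, IRQ 16
0000:00:1d.0 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB
UHCI #1 (rev 02) (prog-if 00 [UHCI])
Flags: bus master, medium devsel, latency 0, IRQ 16
0000:00:1d.1 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB
UHCI #2 (rev 02) (prog-if 00 [UHCI])
Flags: bus master, medium devsel, latency 0, IRQ 19
0000:00:1d.2 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB
UHCI #3 (rev 02) (prog-if 00 [UHCI])
Flags: bus master, medium devsel, latency 0, IRQ 18
0000:00:1f.2 IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150
Storage Controller (rev 02) (prog-if 8a [Master SecP PriP])
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 18
"""

for irq, devices in sorted(devices_by_irq(EXCERPT).items()):
    print(irq, devices)
```

A line shared by two devices (16 and 18 here) cannot be pinned on one of them from the interrupt count alone, so the driver names in /proc/interrupts are worth checking as well.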







All times are GMT.