Simple script that locks up my box with recent kernels
10-07-06 12:22 AM
Hi,
I've been using this very simple script for a while to do test
builds of the kernel:
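#!/bin/bash
# Build 100 kernels from random configs, rejecting any random config
# that has CONFIG_EXPERIMENTAL=y.
for i in $(seq 1 100); do
    nice make distclean
    # Keep regenerating until grep finds no CONFIG_EXPERIMENTAL=y
    # (grep -q exits with status 1 when there is no match).
    while true; do
        nice make randconfig
        grep -q "CONFIG_EXPERIMENTAL=y" .config
        if [ $? -eq 1 ]; then
            break
        fi
    done
    # Save the accepted config and the full build output for this round.
    cp .config config.${i}
    nice make -j3 > build.log.${i} 2>&1
done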
This has worked great in the past, but with recent kernels it has
become a sure way to cause a complete lockup within 1 hour :-(
The last kernel where I know for sure that it ran without problems is
2.6.17.13.
The first kernel where I know for sure that it caused lockups is
2.6.18-git15. I've also tested 2.6.18-git16, 2.6.18-git21 and
2.6.19-rc1-git2, and all three lock up solid as well.
The lockup usually happens within 30 minutes. Sometimes the box
survives longer, but I've never seen it survive for more than 60
minutes.
It doesn't seem to matter if I leave it alone just building kernels or
if I use it for other purposes while building in the background - if
anything, it seems to survive longer when I do other work while it
builds.
When the lockup happens the box just freezes and doesn't respond to
anything at all. Sometimes I can reboot with alt+sysrq+b but sometimes
not even that works.
Here's exactly what I do, so you can try to reproduce it:
1) boot my distro (Slackware 11.0) into runlevel 4 (multi-user with
X), using kernel 2.6.19-rc1-git2 (or one of the other "known-bad"
kernels).
2) Log in via kdm, and once I'm at my KDE desktop I start 'konsole'.
3) cd into a dir holding a fresh copy of the 2.6.19-rc1-git2 source
and run the above script from a file named build-random.sh that I have
placed in the root of the source dir and made executable.
4) wait for 0-60 minutes.
After a reboot I find nothing in the logs, so I can't give you many
hints on what goes wrong, unfortunately.
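Since nothing survives in the logs, one way to at least pin down when the box froze is a heartbeat file written alongside the builds. This is a sketch, not part of the setup described above: the function names, the log path and the 10-second interval are all assumptions, and calling `sync` after each write only makes it more likely (not certain) that the last timestamp reaches the disk before a hard lockup.

```bash
#!/bin/bash
# Hypothetical heartbeat logger (not part of the original setup).
# Appends a timestamped line to a log file at a fixed interval and calls
# sync after each write, so that after a hard lockup and reboot the last
# line in the file shows roughly when the machine stopped responding.

# heartbeat_line: print "YYYY-MM-DD HH:MM:SS <message>".
heartbeat_line() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1"
}

# heartbeat: append a line to $1 every $2 seconds (default 10), forever.
heartbeat() {
    local logfile=$1
    local interval=${2:-10}
    while true; do
        heartbeat_line "alive" >> "$logfile"
        sync    # push the line to disk; unsynced data is lost on a lockup
        sleep "$interval"
    done
}

# Assumed usage, from the kernel source dir:
#   heartbeat /var/tmp/heartbeat.log 10 &
#   ./build-random.sh
```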
Attached you can find the config I'm using for my current
2.6.19-rc1-git2 kernel that very consistently exhibits the problem,
and below are some details about my hardware and software environment.
I've run memtest86+ for ~12hrs without problems, just to rule out bad
RAM, and I've seen nothing at all in my logs to suggest a hardware
problem. Also, the fact that I can run the above script for hours and
hours without problems when I boot into 2.6.17.13 indicates to me that
this is not a hardware issue.
# uname -a
Linux dragon 2.6.19-rc1-git2 #1 SMP PREEMPT Sat Oct 7 00:30:45 CEST 2006 i686 athlon-4 i386 GNU/Linux
# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 35
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
stepping : 2
cpu MHz : 2200.149
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
lm 3dnowext 3dnow pni lahf_lm cmp_legacy ts fid vid ttp
bogomips : 4402.75
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 35
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
stepping : 2
cpu MHz : 2200.149
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
lm 3dnowext 3dnow pni lahf_lm cmp_legacy ts fid vid ttp
bogomips : 4399.53
# cat /proc/meminfo
MemTotal: 2071360 kB
MemFree: 1683228 kB
Buffers: 29092 kB
Cached: 193184 kB
SwapCached: 0 kB
Active: 165528 kB
Inactive: 141904 kB
HighTotal: 1179328 kB
HighFree: 895532 kB
LowTotal: 892032 kB
LowFree: 787696 kB
SwapTotal: 763076 kB
SwapFree: 763076 kB
Dirty: 184 kB
Writeback: 0 kB
AnonPages: 85096 kB
Mapped: 48360 kB
Slab: 66968 kB
SReclaimable: 33216 kB
SUnreclaim: 33752 kB
PageTables: 1256 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 1798756 kB
Committed_AS: 285864 kB
VmallocTotal: 114680 kB
VmallocUsed: 6344 kB
VmallocChunk: 107532 kB
# lspci -vvx
00:00.0 Host bridge: ALi Corporation M1695 K8 Northbridge [PCI Express
and HyperTransport]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Capabilities: [40] HyperTransport: Slave or Primary Interface
Command: BaseUnitID=0 UnitCnt=3 MastHost- DefDir- DUL-
Link Control 0: CFlE- CST- CFE- <LkFail- Init+ EOC-
TXO- <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b-
Link Config 0: MLWI=16bit DwFcIn- MLWO=16bit DwFcOut-
LWI=16bit DwFcInEn- LWO=16bit DwFcOutEn-
Link Control 1: CFlE- CST- CFE- <LkFail- Init+ EOC-
TXO- <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b-
Link Config 1: MLWI=16bit DwFcIn- MLWO=16bit DwFcOut-
LWI=8bit DwFcInEn- LWO=16bit DwFcOutEn-
Revision ID: 1.05
Link Frequency 0: 800MHz
Link Error 0: <Prot- <Ovfl- <EOC- CTLTm-
Link Frequency Capability 0: 200MHz+ 300MHz- 400MHz+
500MHz- 600MHz+ 800MHz+ 1.0GHz+ 1.2GHz+ 1.4GHz- 1.6GHz- Vend-
Feature Capability: IsocFC- LDTSTOP+ CRCTM- ECTLT- 64bA- UIDRD-
Link Frequency 1: 800MHz
Link Error 1: <Prot- <Ovfl- <EOC- CTLTm-
Link Frequency Capability 1: 200MHz+ 300MHz- 400MHz+
500MHz- 600MHz+ 800MHz+ 1.0GHz+ 1.2GHz+ 1.4GHz- 1.6GHz- Vend-
Error Handling: PFlE- OFlE- PFE- OFE- EOCFE- RFE-
CRCFE- SERRFE- CF- RE- PNFE- ONFE- EOCNFE- RNFE- CRCNFE- SERRNFE-
Prefetchable memory behind bridge Upper: 00-00
Bus Number: 00
Capabilities: [5c] HyperTransport: MSI Mapping
Capabilities: [68] HyperTransport: UnitID Clumping
Capabilities: [74] HyperTransport: Interrupt Discovery and Configuration
Capabilities: [7c] Message Signalled Interrupts: 64bit+
Queue=0/1 Enable-
Address: 00000000fee00000 Data: 0000
00: b9 10 95 16 07 00 10 00 00 00 00 06 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00
00:01.0 PCI bridge: ALi Corporation PCI Express Root Port (prog-if 00
[Normal decode])
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
Memory behind bridge: ff200000-ff2fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] Message Signalled Interrupts: 64bit+
Queue=0/1 Enable-
Address: 00000000fee00000 Data: 0000
Capabilities: [58] Express Root Port (Slot+) IRQ 0
Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag+
Device: Latency L0s <64ns, L1 <1us
Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
Link: Supported Speed 2.5Gb/s, Width x16, ASPM L0s L1, Port 0
Link: Latency L0s <2us, L1 <32us
Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
Link: Speed unknown, Width x1
Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug- Surpise-
Slot: Number 0, PowerLimit 0.000000
Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
Slot: AttnInd Off, PwrInd Off, Power-
Root: Correctable- Non-Fatal- Fatal- PME-
Capabilities: [7c] HyperTransport: MSI Mapping
Capabilities: [88] HyperTransport: Revision ID: 1.05
00: b9 10 4b 52 06 01 10 00 00 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 01 00 f0 00 00 00
20: 20 ff 20 ff f0 ff 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 03 00
00:02.0 PCI bridge: ALi Corporation PCI Express Root Port (prog-if 00
[Normal decode])
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
Memory behind bridge: ff300000-ff3fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] Message Signalled Interrupts: 64bit+
Queue=0/1 Enable-
Address: 00000000fee00000 Data: 0000
Capabilities: [58] Express Root Port (Slot+) IRQ 0
Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag+
Device: Latency L0s <64ns, L1 <1us
Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
Link: Supported Speed 2.5Gb/s, Width x2, ASPM L0s L1, Port 0
Link: Latency L0s <2us, L1 <32us
Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
Link: Speed unknown, Width x1
Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug- Surpise-
Slot: Number 0, PowerLimit 0.000000
Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
Slot: AttnInd Off, PwrInd Off, Power-
Root: Correctable- Non-Fatal- Fatal- PME-
Capabilities: [7c] HyperTransport: MSI Mapping
Capabilities: [88] HyperTransport: Revision ID: 1.05
00: b9 10 4c 52 06 01 10 00 00 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 02 02 00 f0 00 00 00
20: 30 ff 30 ff f0 ff 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 03 00
00:04.0 Host bridge: ALi Corporation M1689 K8 Northbridge [Super K8 Single Chip]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Region 0: Memory at dc000000 (32-bit, prefetchable) [size=64M]
Capabilities: [40] HyperTransport: Slave or Primary Interface
Command: BaseUnitID=4 UnitCnt=1 MastHost- DefDir- DUL-
Link Control 0: CFlE- CST- CFE- <LkFail- Init+ EOC-
TXO- <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b-
Link Config 0: MLWI=16bit DwFcIn- MLWO=8bit DwFcOut-
LWI=16bit DwFcInEn- LWO=8bit DwFcOutEn-
Link Control 1: CFlE- CST- CFE- <LkFail+ Init- EOC+
TXO+ <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b-
Link Config 1: MLWI=8bit DwFcIn- MLWO=8bit DwFcOut-
LWI=8bit DwFcInEn- LWO=8bit DwFcOutEn-
Revision ID: 1.04
Link Frequency 0: 800MHz
Link Error 0: <Prot- <Ovfl- <EOC- CTLTm-
Link Frequency Capability 0: 200MHz+ 300MHz- 400MHz+
500MHz- 600MHz+ 800MHz+ 1.0GHz- 1.2GHz- 1.4GHz- 1.6GHz- Vend-
Feature Capability: IsocFC- LDTSTOP+ CRCTM- ECTLT- 64bA- UIDRD-
Link Frequency 1: 200MHz
Link Error 1: <Prot- <Ovfl- <EOC- CTLTm-
Link Frequency Capability 1: 200MHz- 300MHz- 400MHz-
500MHz- 600MHz- 800MHz- 1.0GHz- 1.2GHz- 1.4GHz- 1.6GHz- Vend-
Error Handling: PFlE- OFlE- PFE- OFE- EOCFE- RFE-
CRCFE- SERRFE- CF- RE- PNFE- ONFE- EOCNFE- RNFE- CRCNFE- SERRNFE-
Prefetchable memory behind bridge Upper: 00-00
Bus Number: 00
Capabilities: [60] HyperTransport: Interrupt Discovery and Configuration
Capabilities: [80] AGP version 3.0
Status: RQ=28 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64-
HTrans- 64bit- FW- AGP3- Rate=x1,x2,x4
Command: RQ=1 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit-
FW- Rate=<none>
00: b9 10 89 16 06 01 10 00 00 00 00 06 00 00 00 00
10: 08 00 00 dc 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00
00:05.0 PCI bridge: ALi Corporation AGP8X Controller (prog-if 00
[Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Bus: primary=00, secondary=03, subordinate=03, sec-latency=64
Memory behind bridge: ff400000-ff4fffff
Prefetchable memory behind bridge: c7f00000-d7efffff
Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity+ SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
00: b9 10 46 52 07 01 20 00 00 00 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 03 03 40 f0 00 20 22
20: 40 ff 40 ff f0 c7 e0 d7 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b 00
00:06.0 PCI bridge: ALi Corporation M5249 HTT to PCI Bridge (prog-if
01 [Subtractive decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Bus: primary=00, secondary=04, subordinate=04, sec-latency=32
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: ff500000-ff5fffff
Prefetchable memory behind bridge: 88000000-880fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
00: b9 10 49 52 07 01 00 00 00 01 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 04 04 20 d0 d0 00 22
20: 50 ff 50 ff 00 88 00 88 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 00
00:07.0 ISA bridge: ALi Corporation M1563 HyperTransport South Bridge (rev 70)
Subsystem: ASRock Incorporation Unknown device 1563
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0 (250ns min, 6000ns max)
00: b9 10 63 15 0f 00 00 02 70 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 63 15
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 18
00:07.1 Bridge: ALi Corporation M7101 Power Management Controller [PMU]
Subsystem: ASRock Incorporation Unknown device 7101
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
00: b9 10 01 71 00 00 00 02 00 00 80 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 01 71
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00:11.0 Ethernet controller: ALi Corporation ULi 1689,1573 integrated
ethernet. (rev 40)
Subsystem: ASRock Incorporation Unknown device 5263
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (5000ns min, 10000ns max), Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at e800 [size=256]
Region 1: Memory at ff6ffc00 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: b9 10 63 52 07 01 10 02 40 00 00 02 08 20 00 00
10: 01 e8 00 00 00 fc 6f ff 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 63 52
30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 01 14 28
00:12.0 IDE interface: ALi Corporation M5229 IDE (rev c7) (prog-if 8a
[Master SecP PriP])
Subsystem: ASRock Incorporation Unknown device 5229
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32
Interrupt: pin A routed to IRQ 0
Region 0: I/O ports at <ignored>
Region 1: I/O ports at <ignored>
Region 2: I/O ports at <ignored>
Region 3: I/O ports at <ignored>
Region 4: I/O ports at ff00 [size=16]
00: b9 10 29 52 05 00 a0 02 c7 8a 01 01 00 20 00 00
10: f1 01 00 00 f5 03 00 00 71 01 00 00 75 03 00 00
20: 01 ff 00 00 00 00 00 00 00 00 00 00 49 18 29 52
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00
00:13.0 USB Controller: ALi Corporation USB 1.1 Controller (rev 03)
(prog-if 10 [OHCI])
Subsystem: ASRock Incorporation Unknown device 5237
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (20000ns max), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 11
Region 0: Memory at ff6fe000 (32-bit, non-prefetchable) [size=4K]
00: b9 10 37 52 17 01 a8 02 03 10 03 0c 10 20 80 00
10: 00 e0 6f ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 37 52
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 01 00 50
00:13.1 USB Controller: ALi Corporation USB 1.1 Controller (rev 03)
(prog-if 10 [OHCI])
Subsystem: ASRock Incorporation Unknown device 5237
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (20000ns max), Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 3
Region 0: Memory at ff6fd000 (32-bit, non-prefetchable) [size=4K]
00: b9 10 37 52 17 01 a8 02 03 10 03 0c 10 20 80 00
10: 00 d0 6f ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 37 52
30: 00 00 00 00 00 00 00 00 00 00 00 00 03 02 00 50
00:13.2 USB Controller: ALi Corporation USB 1.1 Controller (rev 03)
(prog-if 10 [OHCI])
Subsystem: ASRock Incorporation Unknown device 5237
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (20000ns max), Cache Line Size: 64 bytes
Interrupt: pin C routed to IRQ 11
Region 0: Memory at ff6fc000 (32-bit, non-prefetchable) [size=4K]
00: b9 10 37 52 17 01 a8 02 03 10 03 0c 10 20 80 00
10: 00 c0 6f ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 37 52
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 03 00 50
00:13.3 USB Controller: ALi Corporation USB 2.0 Controller (rev 01)
(prog-if 20 [EHCI])
Subsystem: ASRock Incorporation Unknown device 5239
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (4000ns min, 8000ns max), Cache Line Size: 64 bytes
Interrupt: pin D routed to IRQ 5
Region 0: Memory at ff6ff800 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Debug port
00: b9 10 39 52 16 01 b0 02 01 20 03 0c 10 20 80 00
10: 00 f8 6f ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 39 52
30: 00 00 00 00 50 00 00 00 00 00 00 00 05 04 10 20
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8
[Athlon64/Opteron] HyperTransport Technology Configuration
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Capabilities: [80] HyperTransport: Host or Secondary Interface
!!! Possibly incomplete decoding
Command: WarmRst+ DblEnd-
Link Control: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0
Link Config: MLWI=16bit MLWO=16bit LWI=16bit LWO=16bit
Revision ID: 1.02
00: 22 10 00 11 00 00 10 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8
[Athlon64/Opteron] Address Map
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
00: 22 10 01 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8
[Athlon64/Opteron] DRAM Controller
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
00: 22 10 02 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8
[Athlon64/Opteron] Miscellaneous Control
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
00: 22 10 03 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA Parhelia
AGP (rev 03) (prog-if 00 [VGA])
Subsystem: Matrox Graphics, Inc. Parhelia 128Mb
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (4000ns min, 8000ns max), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 5
Region 0: Memory at c8000000 (32-bit, prefetchable) [size=128M]
Region 1: Memory at ff4fe000 (32-bit, non-prefetchable) [size=8K]
Expansion ROM at ff4c0000 [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [f0] AGP version 2.0
Status: RQ=32 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64-
HTrans- 64bit- FW+ AGP3- Rate=x1,x2,x4
Command: RQ=1 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit-
FW- Rate=<none>
00: 2b 10 27 05 07 00 b0 02 03 00 00 03 10 20 00 00
10: 08 00 00 c8 00 e0 4f ff 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 2b 10 40 08
30: 00 00 4c ff dc 00 00 00 00 00 00 00 05 01 10 20
04:05.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 0a)
Subsystem: Creative Labs SBLive! 5.1 eMicro 28028
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (500ns min, 5000ns max)
Interrupt: pin A routed to IRQ 20
Region 0: I/O ports at d880 [size=32]
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 02 11 02 00 05 01 90 02 0a 00 01 04 00 20 80 00
10: 81 d8 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 02 11 67 80
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 02 14
04:05.1 Input device controller: Creative Labs SB Live! Game Port (rev 0a)
Subsystem: Creative Labs Gameport Joystick
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32
Region 0: I/O ports at dc00 [size=8]
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 02 11 02 70 05 01 90 02 0a 00 80 09 00 20 80 00
10: 01 dc 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 02 11 20 00
30: 00 00 00 00 dc 00 00 00 00 00 00 00 00 00 00 00
04:06.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
Subsystem: Adaptec 29160N Ultra160 SCSI Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (10000ns min, 6250ns max), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 19
BIST result: 00
Region 0: I/O ports at d400 [disabled] [size=256]
Region 1: Memory at ff5ff000 (64-bit, non-prefetchable) [size=4K]
Expansion ROM at 88000000 [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 05 90 80 00 16 01 b0 02 02 00 00 01 10 20 00 80
10: 01 d4 00 00 04 f0 5f ff 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 05 90 a0 62
30: 00 00 5c ff dc 00 00 00 00 00 00 00 03 01 28 19
04:07.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 42)
Subsystem: D-Link System Inc DFE-530TX rev B
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (750ns min, 2000ns max), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 18
Region 0: I/O ports at d000 [size=256]
Region 1: Memory at ff5fec00 (32-bit, non-prefetchable) [size=256]
Expansion ROM at 88020000 [disabled] [size=64K]
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 06 11 65 30 17 01 10 02 42 00 00 02 10 20 00 00
10: 01 d0 00 00 00 ec 5f ff 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 11 01 14
30: 00 00 ff ff 40 00 00 00 00 00 00 00 0b 01 03 08
root@dragon:/home/juhl/download/kernel/linux-2.6.19-rc1-git2# scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
Linux dragon 2.6.19-rc1-git2 #1 SMP PREEMPT Sat Oct 7 00:30:45 CEST 2006 i686 athlon-4 i386 GNU/Linux
Gnu C 3.4.6
Gnu make 3.81
binutils 2.15.92.0.2
util-linux 2.12r
mount 2.12r
module-init-tools 3.2.2
e2fsprogs 1.39
reiserfsprogs 3.6.19
quota-tools 3.13.
PPP 2.4.4b1
Linux C Library 2.3.6
Dynamic linker (ldd) 2.3.6
Linux C++ Library 6.0.3
Procps 3.2.7
Net-tools 1.60
Kbd 1.12
Sh-utils 5.97
udev 097
Modules Loaded snd_seq_oss snd_seq_midi_event snd_seq
snd_pcm_oss snd_mixer_oss agpgart snd_emu10k1 snd_rawmidi
snd_ac97_codec snd_ac97_bus snd_pcm snd_seq_device snd_timer
snd_page_alloc snd_util_mem snd_hwdep evdev snd
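All of the environment details above come from a handful of standard commands, so they can be captured into a single file before posting a report. A minimal sketch, assuming it is run as root (lspci needs root to show the full capability lists) from the top of a kernel source tree (for `scripts/ver_linux`); the function names are made up for illustration.

```bash
#!/bin/bash
# Sketch: collect the environment details from this report into one file.
# Function names are hypothetical; the commands are the ones shown above.

# run_section: print a "# <command>" header, then the command's output.
# A missing or failing command is noted instead of aborting the report.
run_section() {
    echo "# $*"
    "$@" 2>&1 || echo "(command failed: $*)"
}

collect_report() {
    run_section uname -a
    run_section cat /proc/cpuinfo
    run_section cat /proc/meminfo
    run_section lspci -vvx
    # ver_linux ships in the kernel tree; only run it if we are in one.
    if [ -f scripts/ver_linux ]; then
        run_section sh scripts/ver_linux
    fi
}

# Usage: collect_report > sysinfo.txt
```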
--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
Attachment: config.gz
Re: Simple script that locks up my box with recent kernels
10-07-06 12:22 AM
On Sat, 7 Oct 2006 01:36:24 +0200
"Jesper Juhl" <jesper.juhl@gmail.com> wrote:
> Hi,
>
> I've been using this very simple script for a while to do test
> builds of the kernel :
>
>
> #!/bin/bash
>
> for i in $(seq 1 100); do
> nice make distclean
> while true; do
> nice make randconfig
> grep -q "CONFIG_EXPERIMENTAL=y" .config
> if [ $? -eq 1 ]; then
> break
> fi
> done
> cp .config config.${i}
> nice make -j3 > build.log.${i} 2>&1
> done
>
>
> Which has worked great in the past, but with recent kernels it has
> been a sure way to cause a complete lockup within 1 hour :-(
>
This is probably one of those nobody-but-you-can-reproduce-it things.
>
> The last kernel where I know for sure that it ran without problems is
> 2.6.17.13 .
> The first kernel where I know for sure it caused lockups is
> 2.6.18-git15 . I've also tested 2.6.18-git16, 2.6.18-git21 and
> 2.6.19-rc1-git2 and those 3 also lock up solid.
>
> The lockup usually happens within 30 minutes, but sometimes the box
> survives longer, but I've not seen it survive for more than 60 minutes
> at most.
> It doesn't seem to matter if I leave it alone just building kernels or
> if I use it for other purposes while building in the background - if
> anything, it seems to survive longer when I do other work while it
> builds.
>
> When the lockup happens the box just freezes and doesn't respond to
> anything at all. Sometimes I can reboot with alt+sysrq+b but sometimes
> not even that works.
If you can do sysrq-b then you can do sysrq-t, too?
Please ensure that you have all the CONFIG_DEBUG_* things set, apart from
PAGEALLOC.
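The sysrq-t suggestion can be checked ahead of time, while the box still responds. A sketch assuming root and a kernel built with CONFIG_MAGIC_SYSRQ=y: writing 1 to /proc/sys/kernel/sysrq enables all SysRq functions, and writing `t` to /proc/sysrq-trigger has the same effect as pressing Alt+SysRq+T. The function names and output file name are arbitrary. A full task dump can be larger than the kernel log buffer, so the beginning of it may be missing from dmesg.

```bash
#!/bin/bash
# Sketch: enable magic SysRq and capture a sysrq-t task dump (run as root).

# enable_sysrq: allow all SysRq functions (1 = everything enabled).
enable_sysrq() {
    echo 1 > /proc/sys/kernel/sysrq
}

# dump_tasks: same as Alt+SysRq+T, then save the kernel log to a file.
dump_tasks() {
    local out=${1:-sysrq-t.txt}
    echo t > /proc/sysrq-trigger
    dmesg > "$out"
}

# Usage (as root): enable_sysrq; dump_tasks /tmp/sysrq-t.txt
```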
> Here's exactly what I do, so you can try to reproduce it:
>
> 1) boot my distro (Slackware 11.0) into runlevel 4 (multi-user with
> X), using kernel 2.6.19-rc1-git2 (or one of the other "known-bad"
> kernels).
>
> 2) Log in via kdm, and once I'm at my KDE desktop I start 'konsole'.
>
> 3) cd into a dir holding a fresh copy of the 2.6.19-rc1-git2 source
> and run the above script from a file named build-random.sh that I have
> placed in the root of the source dir and made executable.
>
> 4) wait for 0-60 minutes.
>
>
> After a reboot I find nothing in the logs, so I can't give you many
> hints on what goes wrong, unfortunately.
>
Once you've got the test set up and running, you can do the ctrl-alt-F1
thing to take you out of X and into the VGA console. I suggest you leave
it running that way and see if anything pops up when it hangs.
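Switching to the VGA console can be combined with raising the console log level, so that any oops or warning printed at the moment of the hang is actually shown on screen rather than only written to the log. A sketch assuming root, util-linux `dmesg` (its `-n` option sets the console log level) and `chvt` from the kbd package; the function name is hypothetical.

```bash
#!/bin/bash
# Sketch: send kernel messages of every priority to the console,
# then switch to virtual terminal 1.

watch_on_console() {
    dmesg -n 8    # console log level 8: print messages of all priorities
    chvt 1        # same as pressing Ctrl-Alt-F1 from X
}

# Usage (as root, with the build script already running in X):
#   watch_on_console
```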
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Simple script that locks up my box with recent kernels
10-07-06 12:22 AM
On 07/10/06, Andrew Morton <akpm@osdl.org> wrote:
> On Sat, 7 Oct 2006 01:36:24 +0200
> "Jesper Juhl" <jesper.juhl@gmail.com> wrote:
>
>
> This is probably one of those nobody-but-you-can-reproduce-it things.
>
I hope not. But that's actually why I posted the script, to try and
get more people to reproduce it...
>
> If you can do sysrq-b then you can do sysrq-t, too?
>
I don't know, I haven't tried, but I'll try the next few times it locks up.
> Please ensure that you have all the CONFIG_DEBUG_* things set, apart from
> PAGEALLOC.
>
$ zgrep CONFIG_DEBUG_ /proc/config.gz
# CONFIG_DEBUG_DRIVER is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_PI_LIST=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_HIGHMEM=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_RODATA=y
Is that good enough?
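The request was for every CONFIG_DEBUG_* option except PAGEALLOC, and a small filter can check a config listing against exactly that. This is a sketch with a made-up function name; reading /proc/config.gz requires a kernel built with CONFIG_IKCONFIG_PROC.

```bash
#!/bin/bash
# Sketch: report CONFIG_DEBUG_* options that differ from the request
# "all CONFIG_DEBUG_* set, apart from PAGEALLOC". Reads .config lines on stdin.

check_debug_config() {
    local line opt
    while IFS= read -r line; do
        case $line in
            "# CONFIG_DEBUG_"*" is not set")
                # Strip the "# " prefix and the " is not set" suffix.
                opt=${line#"# "}
                opt=${opt%" is not set"}
                if [ "$opt" != CONFIG_DEBUG_PAGEALLOC ]; then
                    echo "unset: $opt"
                fi
                ;;
            CONFIG_DEBUG_PAGEALLOC=y)
                echo "set but should be off: CONFIG_DEBUG_PAGEALLOC"
                ;;
        esac
    done
}

# Usage: zgrep CONFIG_DEBUG_ /proc/config.gz | check_debug_config
```

Fed the listing above, it would report CONFIG_DEBUG_DRIVER and CONFIG_DEBUG_KOBJECT as unset and CONFIG_DEBUG_PAGEALLOC as set.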
>
> Once you've got the test set up and running, you can do the alt-ctl-F1
> thing to take you out of X and into the vga console. I suggest you leave
> it running that way, see if anything pops up when it hangs.
>
I've done that on a few occasions already without seeing anything, but
I'll try a few more times.
--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
10-07-06 06:16 AM
On 07/10/06, Jesper Juhl <jesper.juhl@gmail.com> wrote:
> On 07/10/06, Andrew Morton <akpm@osdl.org> wrote:
> I've done that on a few occasions already without seeing anything, but
> I'll try a few more times.
>
Hmm, trying to do this (with 2.6.19-rc1-git2) seems to have revealed
yet another problem.
If I try to switch to tty1 just after boot, everything is fine. It's
still fine after using the box for a few minutes doing random stuff
like reading email, surfing the web etc., but once my build script has
been running for a few minutes (tested twice, after ~5 min runs) I
just get a completely white screen when switching to tty1, and when
switching back to X I also just get a white screen :-(
Something is definitely broken here...
--
Jesper Juhl <jesper.juhl@gmail.com>
10-07-06 06:16 AM
On Sat, 7 Oct 2006, Jesper Juhl wrote:
>
> Which has worked great in the past, but with recent kernels it has
> been a sure way to cause a complete lockup within 1 hour :-(
Reliable lock-ups (and "within 1 hour" is quite quick too) are actually
great.
> 2.6.17.13 .
> The first kernel where I know for sure it caused lockups is
> 2.6.18-git15 . I've also tested 2.6.18-git16, 2.6.18-git21 and
> 2.6.19-rc1-git2 and those 3 also lock up solid.
Can I bother you to just bisect it?
Even if you decide that it's too painful to bisect to the very end, "git
bisect" will give great results after as few as four or five reboots,
and hopefully narrow down the thing a _lot_.
So, for example, while my git tree doesn't contain the stable release
numbers, you can trivially just get my tree, and then point "git fetch" at
the stable git tree and get v2.6.17.13 that way.
Then you can do just
git bisect start
git bisect good v2.6.17.13
git bisect bad $(cat patch-2.6.18-git15.id)
and off you go - it will pick a half-way point for you to test, and then
if that one was good, you just say "git bisect good", and it will pick the
next one.
(that "patch-2.6.18-git15.id" thing is from kernel.org - it's how you can
get the exact git state of any particular snapshot, even if it's not
tagged in any real tree - that particular one seems to have SHA1 ID
1bdfd554be94def718323659173517c5d4a69d25.)
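Before the SHA1 from such a file is passed to "git bisect bad", it can be worth checking that it really is one. A minimal sketch follows. It assumes the .id file holds just the 40-hex-digit commit ID on its first line, which is how Linus describes it but is not verified here; the function name is hypothetical.

```bash
# read_snapshot_id FILE
# Hypothetical helper: print the commit ID stored in a snapshot ".id" file
# such as patch-2.6.18-git15.id, failing unless it is a 40-hex-digit SHA1.
read_snapshot_id() {
	local id
	read -r id < "$1" || [ -n "$id" ] || return 1
	case $id in
		*[!0-9a-f]*) return 1 ;;   # reject anything that isn't lowercase hex
	esac
	[ ${#id} -eq 40 ] || return 1
	echo "$id"
}
```

Usage: `id=$(read_snapshot_id patch-2.6.18-git15.id) && git bisect bad "$id"`.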
"git bisect" really does kick XXX. Don't worry if it says "10374 commits
to test after this" - because it does a binary search, it basically
cuts the commits to test in half each time, and so if you do just five
bisections, you'll have cut down the 10,000 commits to just a few hundred.
At that point, maybe we even have a clue, or we might ask you to test a
few more times to narrow things down even more.
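The halving arithmetic can be checked with two small hypothetical helpers. They assume every step cuts the candidates exactly in half (rounding up). That is a simplification: on a real history DAG, git can only pick an approximate half-way point.

```bash
# bisect_remaining N STEPS
# Upper bound on the candidates left after STEPS bisection steps over N
# commits, assuming each step halves the set (rounding up).
bisect_remaining() {
	local n=$1 steps=$2
	while [ "$steps" -gt 0 ] && [ "$n" -gt 1 ]; do
		n=$(( (n + 1) / 2 ))
		steps=$(( steps - 1 ))
	done
	echo "$n"
}

# bisect_steps N
# Steps needed, under the same assumption, until a single commit remains
# (that is, ceil(log2(N))).
bisect_steps() {
	local n=$1 steps=0
	while [ "$n" -gt 1 ]; do
		n=$(( (n + 1) / 2 ))
		steps=$(( steps + 1 ))
	done
	echo "$steps"
}
```

For the 10374 commits in the example, `bisect_remaining 10374 5` prints 325 and `bisect_steps 10374` prints 14.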
Linus
10-07-06 06:16 AM
On Sat, 7 Oct 2006 01:36:24 +0200, "Jesper Juhl" <jesper.juhl@gmail.com> wrote:
>Hi,
>
>I've been using the this very simple script for a while to do test
>builds of the kernel :
>
>
>#!/bin/bash
>
>for i in $(seq 1 100); do
> nice make distclean
> while true; do
> nice make randconfig
> grep -q "CONFIG_EXPERIMENTAL=y" .config
> if [ $? -eq 1 ]; then
> break
> fi
> done
> cp .config config.${i}
> nice make -j3 > build.log.${i} 2>&1
>done
>
>
>Which has worked great in the past, but with recent kernels it has
>been a sure way to cause a complete lockup within 1 hour :-(
There are some no-nos Adrian Bunk pointed out back when I was doing this.
Here's what I used last year. It recently ran a hundred compiles, but
I forgot or lost the script that interpreted the results:
grant@sempro:~$ cat /usr/local/bin/zrandom-build
#!/bin/bash
#
# 2.6 kernel random .config compiler driver
#
# Copyright (C) 2005 Grant Coady gcoady.lk@gmail.com
#
# GPL v2 per linux/COPYING by reference
#
# Thanks to:
# comp.unix.shell people:
# Chris F.A. Johnson <http://cfaj.freeshell.org> for CLI number test
# Ed Morton <morton@lsupcaemnt.com> for 'awk' solution in resuming
# for answers to query 2005-07-27 for improvements to this script.
#
# linux-kernel people:
# Adrian Bunk Don't bother with useless CONFIG_BROKEN= .config
# CONFIG_STANDALONE=
# Jesper Juhl Feedback
#
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
# What?
# ``````
# A script to build random kernel .configs to discover kbuild errors.
# The .config, compiler output and time are recorded into the destination
# directory. Run several in parallel with outputs to different directories.
#
# The .config and compiler result are linked by a three digit number at
# start of filename.
#
# Files
# ``````
# 000-about record settings for a particular run
# ???-config the .config
# ???-result build (compiler) output
# ???-time time to build in seconds and mm:ss (curiosity)
#
# Post processing of results lists each error (or warning) and the first
# .config file triggering the error/warning. Another script.
#
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
# globals
clean="" # -c set "Y" to do 'make clean' prior to compiles
store="../" # -d destination directory
jobnr="" # -jn make job control
limit=100 # -n number of .config builds to make
build="Y" # -t clear to not build .config for testing
patch="" # set "Y" to skip retry CONFIG_BROKEN=y .configs
count=0 # build counter
retry=0 # retry counter for useless .config filter
#
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
# Setup the trial series, command line interface
function show_usage
{
echo "random-build
random .config compiler driver for 2.6 series kernel
usage: random-build [-c] [-d destination_directory] [-jn] [-n nnn] [-t]
-c do 'make clean' prior to each compile, default off
-d dir destination for results, default ../
-jn make job control, n = 0..9
-n nnn number of compile runs, default 100
-t testing config driver, no build .config
cd into linux top-level directory, specify an output directory
outside the kernel directory, example command:
random-build -c -n 333 -d ../trial-2.6.13-rc3-mm2-1
would make clean prior to each build and place the results of
333 random .config to directory ../trial-2.6.13-rc3-mm2-1
Useless .config generated are skipped, read script source to see
current setting; CONFIG_BROKEN=y is definitely useless
"
exit 1
}
function check_config_limit # limit
{
case $1 in
*[!0-9]*) limit=0;;
* ) limit=$1;;
esac
if [ $limit -lt 1 -o $limit -gt 999 ]; then
limit=100
fi
}
function check_create_dest # destination
{
local crap="n"
if [ ! -d "$1" ]; then
echo -e \
"Non-existent destination $1 specified, create it? (y/N) \c"
read crap
echo
if [ "$crap" == "y" -o "$crap" == "Y" ]; then
mkdir "$1"
else
echo "bad dest"; show_usage
fi
fi
store=$1
}
# parse command line
while [ $# -gt 0 ]; do
case $1 in
-c ) clean="Y";; # do 'make clean'
-d ) check_create_dest "$2"; shift;;
-j[0-9]) jobnr=$1;;
-n ) check_config_limit $2; shift;;
-t ) build="";; # disable build
* ) echo "bad CLI"; show_usage;;
esac
shift
done
echo "
#==>>
#==>> Grant's random kernel configs $(date)
#==>> $0 from $PWD
#==>> host: linux-$(uname -r) on $HOSTNAME
#==>> store=$store
#==>> limit=$limit
#==>> clean=$clean
#==>> build=$build
#==>> job control=$jobnr
#==>>
" 2>&1 | tee "$store/000-about"
#
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
# Run the trial series
#
# check if destination contains results, if so assume restart test run and
# thus overwrite the last partial result, remove leading zeroes so number
# is seen as decimal, not octal! Queried comp.unix.shell - 2005-07-27...
function perhaps_resume_trial
{
count=$(ls "$store"/*-config 2>/dev/null \
| awk -F/ '{f=$NF}END{print f+0}')
if [ $count -gt 0 -a $count -lt 1000 ]; then
if [ $count -gt 1 ]; then
echo -e "\n#==>> Resuming: $count\n"
fi
((count--))
else
count=0
fi
}
function check_config
{
# succeed (exit 0) only if none of the "useless" settings is in .config
! grep -Eq 'CONFIG_BROKEN=|CONFIG_STANDALONE=|CONFIG_DEBUG_INFO=' .config
}
function create_random_config
{
if [ -n "$patch" ]; then
make randconfig > /dev/null
else
while true; do
make randconfig > /dev/null
check_config && break
echo -e "\tRetry ($((++retry))): skipped useless .config"
done
fi
cp .config "$store/$trial-config"
}
function build_random_config
{
if [ -n "$build" ]; then
[ -n "$clean" ] && make clean
make $jobnr 2> "$store/$trial-result"
fi
}
stamp=$SECONDS
function write_timestamp_file
{
local t=0 m=0 s=0
t=$((SECONDS - stamp))
m=$(printf "%2d" $((t / 60)))
s=$(printf "%02d" $((t % 60)))
echo -e "$t\t$m:$s" > "$store/$trial-time"
stamp=$SECONDS
}
perhaps_resume_trial
while [ $((++count)) -le $limit ]; do
trial=$(printf %03d "$count")
echo "#==>> $0, run $count: make randconfig"
create_random_config
build_random_config
write_timestamp_file
done
echo "skipped $retry useless .config :o)"
# end
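The header says a separate script handled post-processing, and Grant no longer has it. A minimal sketch of that step follows. It assumes the NNN-result / NNN-config naming used above and that compiler diagnostics contain "error:" or "warning:"; the function name is hypothetical.

```bash
# summarize_results DIR
# Hypothetical post-processor: print each distinct "error:"/"warning:" line
# from DIR/NNN-result once, followed by the first NNN-config that produced it.
summarize_results() {
	local dir=${1:-.}
	local files=("$dir"/[0-9][0-9][0-9]-result)   # glob sorts 001, 002, ...
	[ -e "${files[0]}" ] || { echo "no results in $dir" >&2; return 1; }
	awk '/error:|warning:/ && !seen[$0]++ {
		cfg = FILENAME
		sub(/-result$/, "-config", cfg)
		print $0
		print "\tfirst seen in: " cfg
	}' "${files[@]}"
}
```

Identical messages are reported once. Messages that differ only in file or line number are treated as distinct.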
10-08-06 12:24 AM
On 07/10/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Sat, 7 Oct 2006, Jesper Juhl wrote:
>
> Reliable lock-ups (and "within 1 hour" is quite quick too) are actually
> great.
>
>
> Can I bother you to just bisect it?
>
Sure, but it will take a little while since building + booting +
starting the test + waiting for the lockup takes a fair bit of time
for each kernel and also due to the fact that my git skills are pretty
limited, but I'll figure it out (need to improve those git skills
anyway) :-)
I'll be back with more info.
--
Jesper Juhl <jesper.juhl@gmail.com>
10-08-06 12:24 AM
On Sat, 7 Oct 2006, Jesper Juhl wrote:
>
>
> Sure, but it will take a little while since building + booting +
> starting the test + waiting for the lockup takes a fair bit of time
> for each kernel
Sure. That said, we've tried to narrow down things that took hours or days
(under real loads, not some nice test-script) to reproduce, and while it
doesn't always work, the real problem tends to be if the problem case
isn't really reproducible. It sounds like yours is pretty clear-cut, and
that will make things much easier.
> and also due to the fact that my git skills are pretty
> limited, but I'll figure it out (need to improve those git skills
> anyway) :-)
"git bisect" in particular isn't that hard to use, and it will really do
a lot of heavy lifting for you.
Although since it will just select a random commit (well, it's not
"random": it's strictly as half-way as it can possibly be, but it's
automated without any regard for anything else), you can sometimes hit a
situation where git will ask you to test a kernel that simply doesn't work
at all, and you can't even test whether it reproduces your particular bug
or not.
For example, "git bisect" might pick a kernel that just doesn't compile,
because of some stupid bug that was fixed almost immediately afterwards.
In those cases, the total automation of "git bisect" ends up being
something that has to be helped along by hand, and then it definitely
helps to know more about how git works.
Anyway, the quick tutorial about "git bisect" is that once you've given it
the required first "good" and "bad" points, it will create a new branch in
the repository (called "bisect", in case you care), and after that point
it will do a search in the commit DAG (aka "history tree" - it's not a
tree, it's a DAG, since merges will join branches together) for the next
commit that will neatly "split" the DAG into two equal pieces. It will
keep splitting the commit history until you get fed up, or until it has
pinpointed the single commit that caused the problem.
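The splitting described above can be illustrated on a simplified, linear history. The sketch uses a plain list instead of git's commit DAG, and the function name is hypothetical. It keeps a known-good and a known-bad index and tests the midpoint until the two are adjacent, which is the core of what "git bisect" automates.

```bash
# bisect_linear CHECK ITEM...
# Hypothetical illustration of bisection on a linear history. ITEMs are
# ordered oldest to newest; the first is assumed good, the last bad.
# "CHECK item" must exit 0 for a good item and non-zero for a bad one.
# Prints the first bad item.
bisect_linear() {
	local check=$1
	shift
	local items=("$@")
	local good=0 bad=$(( ${#items[@]} - 1 )) mid
	# invariant: items[good] is good and items[bad] is bad
	while [ $(( bad - good )) -gt 1 ]; do
		mid=$(( (good + bad) / 2 ))
		if "$check" "${items[mid]}"; then
			good=$mid
		else
			bad=$mid
		fi
	done
	echo "${items[bad]}"
}
```

For example, with `is_good() { [ "$1" -lt 7 ]; }`, the call `bisect_linear is_good 1 2 3 4 5 6 7 8 9 10` tests 5, 7 and 6, then prints 7.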
The nicest tool to use during bisection is to just do a
git bisect visualize
that simply starts up "gitk" (the default git history visualizer) to show
what the current state of bisection is. Now, if there are thousands and
thousands of commits, you'll have a really hard time getting a visual clue
about what is going on, but especially once you get to a smaller set of
commits, it's very useful indeed.
And it's _especially_ useful if you hit one of the problem spots where you
can't test the resulting tree for some unrelated reason. When that
happens, you should _not_ mark the problematic commit as being "bad",
because you really don't know - the "badness" of that commit is probably
not related to the "badness" that you're actually searching for.
Instead, you should say "ok, I refuse to test this commit at all, because
it's got other problems, and I will select another commit instead". The
bisection algorithm doesn't care which commit you pick, as long as it's
within the set of "unknown" commits that you'll see with the visualization
tool.
Of course, for efficiency reasons, the _closer_ you get to the half-way
mark, the better. So it's useful to try to pick a commit that is close to
the one that "git bisect" originally chose for you, but that's not a
correctness issue, that's just an issue of "if we have a thousand
potential commits, we're better off bisecting it 400/600 rather than
1/999, even if the exact half-way point isn't testable".
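Picking a substitute close to the half-way mark can be sketched as a search outward from the midpoint. In the sketch, commits are plain indices between a good one at LO and a bad one at HI. CHECK is a hypothetical command that succeeds for candidates that can be tested at all, for example ones that compile. The function name is also hypothetical.

```bash
# pick_near_midpoint LO HI CHECK
# Hypothetical helper: print the index strictly between LO and HI that is
# closest to the midpoint and for which "CHECK index" succeeds (i.e. the
# candidate is testable). Lower neighbours win ties. Returns 1 if none is.
pick_near_midpoint() {
	local lo=$1 hi=$2 check=$3
	local mid=$(( (lo + hi) / 2 ))
	local d
	for (( d = 0; mid - d > lo || mid + d < hi; d++ )); do
		if (( mid - d > lo )) && "$check" $(( mid - d )); then
			echo $(( mid - d ))
			return 0
		fi
		if (( d > 0 && mid + d < hi )) && "$check" $(( mid + d )); then
			echo $(( mid + d ))
			return 0
		fi
	done
	return 1
}
```

With candidates 1..9 where 4 and 5 cannot be built, `pick_near_midpoint 0 10 testable` tries 5, then 4, then 6, and prints 6.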
So if you need to decide to pick another point than the one "git bisect"
chose for you automatically, just select that commit in the visualizer
(which will select its SHA1 name so you can paste it), and then do
git reset --hard <paste-sha1-here>
to reset the "bisect" branch to that point instead. And then compile and
test that kernel instead (and then if that's good or bad, you can do the
"git bisect good" or "git bisect bad" thing to mark it so, and git will
continue to bisect the set of commits).
It can be a bit boring, but damn, it's effective. I've used "git bisect"
several times when I've been too lazy to try to really think about what is
going on - I'll happily brute-force bug-finding even if it might take a
little longer, if it's guaranteed to find it (and if the bug is
reproducible, git bisect definitely guarantees to find what made it
appear, even if that may not necessarily be the deeper _cause_ of the bug)
Linus
10-09-06 12:23 AM
Ok, some preliminary results on this before I go get some sleep + a
working day tomorrow...
On 07/10/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Sat, 7 Oct 2006, Jesper Juhl wrote:
>
> Sure. That said, we've tried to narrow down things that took hours or days
> (under real loads, not some nice test-script) to reproduce, and while it
> doesn't always work, the real problem tends to be if the problem case
> isn't really reproducible. It sounds like yours is pretty clear-cut, and
> that will make things much easier.
>
Yeah, it seems pretty clear-cut, but I'm a bit nervous that it may
sometimes take longer than my observed 60min to reproduce, rendering
my git-bisection less than perfect (more on that below).
>
> "git bisect" in particular isn't that hard to use, and it will really do
> a lot of heavy lifting for you.
>
(...)
Thanks a lot for the tutorial, that really helped.
For some reason I couldn't get git to accept 2.6.17.13 as a "good"
starting point, so I used 2.6.17 instead, and the sha1 you gave me for
2.6.18-git15 as the "bad" starting point.
Here's where I am right now (a log of what I've done) :
[bisection start]
Bisecting: 5188 revisions left to test after this
[92164c5dd1ade33f4e90b72e407910de6694de49] USB: OHCI hub code unaligned access
[git bisect good]
Bisecting: 2567 revisions left to test after this
[e41542f5167d6b506607f8dd111fa0a3e468ccb8] [DCCP]: Introduce dccp_probe
[git bisect good]
Bisecting: 1351 revisions left to test after this
[b98adfccdf5f8dd34ae56a2d5adbe2c030bd4674] Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6
[git bisect good]
Bisecting: 635 revisions left to test after this
[538d9d532b0e0320c9dd326a560b5a72d73f910d] irq: remove a extra line
[git bisect good]
Bisecting: 292 revisions left to test after this
[db1a19b38f3a85f475b4ad716c71be133d8ca48e] Merge branch 'intelfb-patches' of master.kernel.org:/pub/scm/linux/kernel/git/airlied/intelfb-2.6
[git bisect bad]
Bisecting: 146 revisions left to test after this
[1db27c11e9a0c6d659040ac0b7c64a339e248fa1] istallion: Remove private baud rate decoding, which is also broken in this case on some platforms
[git bisect bad]
Bisecting: 73 revisions left to test after this
[3171a0305d62e6627a24bff35af4f997e4988a80] simplify update_times (avoid jiffies/jiffies_64 aliasing problem)
[git bisect good]
Bisecting: 37 revisions left to test after this
[29b884921634e1e01cbd276e1c9b8fc07a7e4a90] set EXIT_DEAD state in do_exit(), not in schedule()
[currently testing this kernel]
Looking at "git bisect visualize" the current status is this :
bisect/good: 3171a0305d62e6627a24bff35af4f997e4988a80
bisect/bad: 1db27c11e9a0c6d659040ac0b7c64a339e248fa1
Current bisect marker at: 29b884921634e1e01cbd276e1c9b8fc07a7e4a90
I'm a little worried though that my results may not be completely reliable.
There's no doubt that you can trust the kernels that I told git were
"bad", since those resulted in a hang and there's just no getting
around that. So we know for a fact that the bad commit is somewhere
between 2.6.17 and my last found bad kernel; what we don't know with
the same certainty is whether the bad commit lies between my last
found good kernel and the last found bad one.
What I'm worried about is the kernels I've marked as "good". Before
starting this run I had never experienced a hang if the kernel
survived past the one-hour mark, so I concluded that testing each
kernel for 80 min would be enough to prove it good or bad. That now
seems not to be completely reliable, since my second bad kernel
happened to hang only after ~2 hrs. I noticed that because I forgot to
check my computer after 80 min and only came back to it some 3 hrs later
(I know when it hung because I had an xterm running
"while true; do sleep 10; uptime; done", so I could check).
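Watching uptime in an xterm only helps if the screen still shows the last value after the lockup. An alternative is to append a timestamp to a file and call sync after each write, so that after the reboot the last line shows roughly when the box stopped. Whether that final write reaches the disk depends on the filesystem and on how the lockup hits, so this is best effort. A minimal sketch; the function name and the log path in the usage note are hypothetical:

```bash
# heartbeat LOGFILE [INTERVAL] [COUNT]
# Hypothetical helper: append "date + uptime" to LOGFILE every INTERVAL
# seconds (default 10), COUNT times (default -1 = forever), running sync
# after each line so the newest entry has a chance to be on disk.
heartbeat() {
	local log=$1 interval=${2:-10} count=${3:--1}
	while [ "$count" -ne 0 ]; do
		printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(uptime 2>/dev/null)" >> "$log"
		sync
		if [ "$count" -gt 0 ]; then
			count=$(( count - 1 ))
		fi
		if [ "$count" -ne 0 ]; then
			sleep "$interval"
		fi
	done
	return 0
}
```

Usage: start `heartbeat /var/tmp/heartbeat.log &` before the test, then check `tail -n 1 /var/tmp/heartbeat.log` after rebooting.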
This all means that my testing and concluding kernels were "good"
after 80min of test runtime may not be 100% reliable.
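One way to put a number on this worry is to model the time to lockup as exponentially distributed. The model, and a 30-minute mean taken from "usually within 30 minutes" in the original report, are assumptions for illustration only. Under them, a truly bad kernel survives an 80-minute run with probability exp(-80/30), about 7%. A run of roughly 138 minutes would be needed to bring that below 1%. Both function names are hypothetical.

```bash
# survival_prob MINUTES MEAN
# P(no lockup within MINUTES) if time-to-lockup is exponential with MEAN.
survival_prob() {
	awk -v t="$1" -v m="$2" 'BEGIN { printf "%.4f\n", exp(-t / m) }'
}

# test_minutes_for P MEAN
# Test length (minutes) that brings that survival probability down to P.
test_minutes_for() {
	awk -v p="$1" -v m="$2" 'BEGIN { printf "%.0f\n", -m * log(p) }'
}
```

Here `survival_prob 80 30` prints 0.0695 and `test_minutes_for 0.01 30` prints 138. A single wrong "good" mark sends the bisection into the wrong half of the history.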
Is it useful for me to continue bisecting from the point I'm at, or
should I reset from good==2.6.17 and bad==the_last_bad_commit_I_found?
Or do you have a likely culprit I should try reverting?
Whatever your answer it'll have to wait until tomorrow evening since
I'm going to go get some sleep now, but please let me know what you'd
like me to do ...
--
Jesper Juhl <jesper.juhl@gmail.com>
10-17-06 12:20 AM
Ok, finally got to the end of the bisection (see below; quoting all of
my previous email since my concerns from that one are still valid).
On 09/10/06, Jesper Juhl <jesper.juhl@gmail.com> wrote:
> Ok, some preliminary results on this before I go get some sleep + a
> working day tomorrow...
>
>
> On 07/10/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
> Yeah, it seems pretty clear-cut, but I'm a bit nervous that it may
> sometimes take longer than my observed 60min to reproduce, rendering
> my git-bisection less than perfect (more on that below).
>
>
> (...)
> Thanks a lot for the tutorial, that really helped.
>
> For some reason I couldn't get git to accept 2.6.17.13 as a "good"
> starting point, so I used 2.6.17 instead, and the sha1 you gave me for
> 2.6.18-git15 as the "bad" starting point.
>
> Here's where I am right now (a log of what I've done) :
>
> [bisection start]
>
> Bisecting: 5188 revisions left to test after this
> [92164c5dd1ade33f4e90b72e407910de6694de49] USB: OHCI hub code unaligned access
>
> [git bisect good]
>
> Bisecting: 2567 revisions left to test after this
> [e41542f5167d6b506607f8dd111fa0a3e468ccb8] [DCCP]: Introduce dccp_probe
>
> [git bisect good]
>
> Bisecting: 1351 revisions left to test after this
> [b98adfccdf5f8dd34ae56a2d5adbe2c030bd4674] Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6
>
> [git bisect good]
>
> Bisecting: 635 revisions left to test after this
> [538d9d532b0e0320c9dd326a560b5a72d73f910d] irq: remove a extra line
>
> [git bisect good]
>
> Bisecting: 292 revisions left to test after this
> [db1a19b38f3a85f475b4ad716c71be133d8ca48e] Merge branch 'intelfb-patches' of master.kernel.org:/pub/scm/linux/kernel/git/airlied/intelfb-2.6
>
> [git bisect bad]
>
> Bisecting: 146 revisions left to test after this
> [1db27c11e9a0c6d659040ac0b7c64a339e248fa1] istallion: Remove private baud rate decoding, which is also broken in this case on some platforms
>
> [git bisect bad]
>
> Bisecting: 73 revisions left to test after this
> [3171a0305d62e6627a24bff35af4f997e4988a80] simplify update_times (avoid jiffies/jiffies_64 aliasing problem)
>
> [git bisect good]
>
> Bisecting: 37 revisions left to test after this
> [29b884921634e1e01cbd276e1c9b8fc07a7e4a90] set EXIT_DEAD state in do_exit(), not in schedule()
>
> [currently testing this kernel]
>
>
> Looking at "git bisect visualize" the current status is this :
>
> bisect/good: 3171a0305d62e6627a24bff35af4f997e4988a80
> bisect/bad: 1db27c11e9a0c6d659040ac0b7c64a339e248fa1
> Current bisect marker at: 29b884921634e1e01cbd276e1c9b8fc07a7e4a90
>
>
> I'm a little worried though that my results may not be completely reliable.
>
> There's no doubt that you can trust the kernels that I told git were
> "bad", since those resulted in a hang and there's just no getting
> around that. So we know for a fact that the bad commit is somewhere
> between 2.6.17 and my last found bad kernel; what we don't know with
> the same certainty is whether the bad commit lies between my last
> found good kernel and the last found bad one.
>
> What I'm worried about is the kernels I've marked as "good". Before
> starting this run I had never experienced a hang if the kernel
> survived past the one-hour mark, so I concluded that testing each
> kernel for 80 min would be enough to prove it good or bad. That now
> seems not to be completely reliable, since my second bad kernel
> happened to hang only after ~2 hrs. I noticed that because I forgot to
> check my computer after 80 min and only came back to it some 3 hrs later
> (I know when it hung because I had an xterm running
> "while true; do sleep 10; uptime; done", so I could check).
>
> This all means that my testing and concluding kernels were "good"
> after 80min of test runtime may not be 100% reliable.
>
> Is it useful for me to continue bisecting from the point I'm at, or
> should I reset from good==2.6.17 and bad==the_last_bad_commit_I_found?
> Or do you have a likely culprit I should try reverting?
>
> Whatever your answer it'll have to wait until tomorrow evening since
> I'm going to go get some sleep now, but please let me know what you'd
> like me to do ...
>
In the end, this is what git told me:
1db27c11e9a0c6d659040ac0b7c64a339e248fa1 is first bad commit
commit 1db27c11e9a0c6d659040ac0b7c64a339e248fa1
Author: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date:   Fri Sep 29 02:01:38 2006 -0700

    [PATCH] istallion: Remove private baud rate decoding, which is
    also broken in this case on some platforms

    Signed-off-by: Alan Cox <alan@redhat.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

:040000 040000 0fc700de5e78b39acc130d529cf59437e9242b68 884b27574b6c38a5fa952d09ca945b167e36db84 M drivers
But that doesn't make much sense, so I very strongly suspect that my
test case was not as reliable as I thought.
We can trust the commits I marked as 'bad', though, since there's no
getting around a complete lockup of the box. So we know for sure now
that things broke between 2.6.17 and the commit above. But since that
commit makes no sense as the cause of the breakage, I must have marked
a kernel as 'good' that would eventually have turned out bad if I'd
run it longer :-(
Where do I go from here? The problem is still there... I'll test
2.6.19-rc2 tomorrow, but apart from that I don't know how to proceed,
other than trying to capture a sysrq+t dump when the box locks up...
Any ideas?
--
Jesper Juhl <jesper.juhl@gmail.com>