What puzzles me is the sequence shown in the second oops of
blkdev_release_request leading to a schedule. Possibly via an
interrupt? You yourself said that the function cannot sleep. The oops
was triggered my my detection code placed in schedule which fires
on just such an event.
>>EIP; c011426a <schedule+142/8f4> <=====
Trace; c011b064 <exit_notify+178/2a8>
Trace; c011b548 <do_exit+3b4/3d4>
Trace; c0107a38 <do_divide_error+0/9c>
Trace; c0113683 <do_page_fault+3f3/510>
Trace; c0113290 <do_page_fault+0/510>
Trace; c01d91ad <memcpy_toiovec+35/64>
Trace; c01d7c4c <skb_release_data+68/74>
Trace; c0113cc3 <reschedule_idle+63/214>
Trace; c01ef15d <tcp_recvmsg+839/a58> <-- urr?
Trace; c0107304 <error_code+34/3c> <-- huh?
Trace; c01a3c17 <blkdev_release_request+df/1b4>
Trace; c01a4807 <end_that_request_last+a3/130>
Trace; c0134176 <__free_pages+1e/24>
Trace; c01341a3 <free_pages+27/2c>
Trace; c014c51d <do_select+1e5/1fc>
Trace; c0122071 <sys_rt_sigaction+89/d8>
Trace; c0143842 <blkdev_ioctl+2e/34>
Trace; c014b9a9 <sys_ioctl+1f9/280>
Trace; c010721b <system_call+33/38>
> "Andrew Morton wrote:"
> > Yes, there are often interrupt entrails on the stack. If you can,
> > generate that trace and send it....
>
> > It may simplify your oops tracing to remove all the `inline'
> > qualifiers in ll_rw_blk.c, BTW. They tend to obscure things.
> > They should in fact be taken out permanently - someone went
> > absolutely insane there.
(I un-inlined all of ll_rw_blk.c)
> What [also] puzzles me is that I seem to get an oops immediately preceding
> theBUG() oops that I put in to check for scheduling with the
> io_request_lock held. Without the extra code added in for the
> debugging, this strange first oops does not happen.
>
> So there's the strange oops, then the BUG() oops, then the NMI watchdog
> kicks in, and syslog oopses as well. That shows why I was never
> able to log it.
First oops, possibly second in real time order, but it came up on
screen first. I am at a loss to explain it.
ksymoops 2.4.1 on i686 2.4.6-SMP-XFS. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.6-SMP-XFS/ (default)
-m /boot/System.map-2.4.6-SMP-XFS (default)
-S
Unable to handle kernel paging request at virtual address 00002004
c01a3c17
*pde = 00000000
Oops: 0002
CPU: 1
EIP: 0010:[<c01a3c17>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010046
eax: c0366dec ebx: c7371ce0 ecx: c7371cec edx: 00002000
esi: c0366dbc edi: 00000004 ebp: c31dbe28 esp: c31dbe1c
ds: 0018 es: 0018 ss: 0018
Process my-client (pid: 929, stackpage=c31db000)
Stack: c02e69e0 c7371ce0 c7371ce0 c31dbe3c c01a4807 c7371ce0 c88d3940 00000001
c31dbe7c c88cb5c8 c7371ce0 00000001 c88d0a07 c88cb592 c7371ce0 400fa078
c31dbf2c c3070aec c7371ce0 400fa078 c31dbf2c 00000246 00000001 00000000
Call Trace: [<c01a4807>] [<c0134176>] [<c01341a3>] [<c014c51d>] [<c0122071>] [<c0143842>] [<c014b9a9>]
[<c010721b>]
Code: 89 4a 04 89 53 0c 89 41 04 89 08 8d 0c bd 00 00 00 00 8d 46
>>EIP; c01a3c17 <blkdev_release_request+df/1b4> <=====
Trace; c01a4807 <end_that_request_last+a3/130>
Trace; c0134176 <__free_pages+1e/24>
Trace; c01341a3 <free_pages+27/2c>
Trace; c014c51d <do_select+1e5/1fc>
Trace; c0122071 <sys_rt_sigaction+89/d8>
Trace; c0143842 <blkdev_ioctl+2e/34>
Trace; c014b9a9 <sys_ioctl+1f9/280>
Trace; c010721b <system_call+33/38>
Code; c01a3c17 <blkdev_release_request+df/1b4> 00000000 <_EIP>:
Code; c01a3c17 <blkdev_release_request+df/1b4> 0: 89 4a 04 mov %ecx,0x4(%edx) <=====
Code; c01a3c1a <blkdev_release_request+e2/1b4> 3: 89 53 0c mov %edx,0xc(%ebx)
Code; c01a3c1d <blkdev_release_request+e5/1b4> 6: 89 41 04 mov %eax,0x4(%ecx)
Code; c01a3c20 <blkdev_release_request+e8/1b4> 9: 89 08 mov %ecx,(%eax)
Code; c01a3c22 <blkdev_release_request+ea/1b4> b: 8d 0c bd 00 00 00 00 lea 0x0(,%edi,4),%ecx
Code; c01a3c29 <blkdev_release_request+f1/1b4> 12: 8d 46 00 lea 0x0(%esi),%eax
> At this point in the dmesg output, my printk appears:
>
> Scheduling in process 929 with io lock held on cpu 1 by process 929 in function
> blkdev_release_request() ll_rw_blk.c:584
>
> and that appears to me to be pretty accurate. The line number won't be
> the same in your copy, as I addded debugging code. It's the entry point
> of the function - I didn't do any more detailed code bisection in this test.
>
> The fact that that output appeared indicates that the [diagnosis] is
> accurate (I won't go into why, but there is a spinlock protecting changes
> in the reported variables values, and they can also only be nonzero while
> the io spinlock is held, so the output indicates that we run schedule
> while the io spinlock is being held on the same cpu by the indicated
> function).
>
> Now here's the next one .. this is the BUG() I inserted to show
> when we shchedule with the io spinlock held.
kernel BUG at sched.c:576!
invalid operand: 0000
CPU: 1
EIP: 0010:[<c011426a>]
EFLAGS: 00010282
eax: 0000001b ebx: c02e69e0 ecx: 00000001 edx: 00000001
esi: c31da000 edi: c31da674 ebp: c31dbcf8 esp: c31dbcb8
ds: 0018 es: 0018 ss: 0018
Process my-client (pid: 929, stackpage=c31db000)
Stack: c02300d6 c02301f9 00000240 c122af20 c31da000 c31da674 c31d0018 c31d0018
c31dbcf8 c011b064 c31da000 00000011 c122af20 00000001 c31da000 c0339420
c31dbd0c c011b548 00000046 00000000 c31da000 c31dbd28 c0107a38 0000000b
Call Trace: [<c011b064>] [<c011b548>] [<c0107a38>] [<c0113683>] [<c0113290>] [<c01d91ad>] [<c01d7c4c>]
[<c0113cc3>] [<c01ef15d>] [<c0107304>] [<c01a3c17>] [<c01a4807>] [<c0134176>] [<c01341a3>] [<c014c51d>]
[<c0122071>] [<c0143842>] [<c014b9a9>] [<c010721b>]
Code: 0f 0b e9 9f 07 00 00 81 fb 0c 21 2f c0 75 1e f0 fe 0d 0c 21
>>EIP; c011426a <schedule+142/8f4> <=====
Trace; c011b064 <exit_notify+178/2a8>
Trace; c011b548 <do_exit+3b4/3d4>
Trace; c0107a38 <do_divide_error+0/9c>
Trace; c0113683 <do_page_fault+3f3/510>
Trace; c0113290 <do_page_fault+0/510>
Trace; c01d91ad <memcpy_toiovec+35/64>
Trace; c01d7c4c <skb_release_data+68/74>
Trace; c0113cc3 <reschedule_idle+63/214>
Trace; c01ef15d <tcp_recvmsg+839/a58>
Trace; c0107304 <error_code+34/3c>
Trace; c01a3c17 <blkdev_release_request+df/1b4>
Trace; c01a4807 <end_that_request_last+a3/130>
Trace; c0134176 <__free_pages+1e/24>
Trace; c01341a3 <free_pages+27/2c>
Trace; c014c51d <do_select+1e5/1fc>
Trace; c0122071 <sys_rt_sigaction+89/d8>
Trace; c0143842 <blkdev_ioctl+2e/34>
Trace; c014b9a9 <sys_ioctl+1f9/280>
Trace; c010721b <system_call+33/38>
Code; c011426a <schedule+142/8f4> 00000000 <_EIP>:
Code; c011426a <schedule+142/8f4> 0: 0f 0b ud2a <=====
Code; c011426c <schedule+144/8f4> 2: e9 9f 07 00 00 jmp 7a6 <_EIP+0x7a6> c0114a10 <schedule+8e8/8f4>
Code; c0114271 <schedule+149/8f4> 7: 81 fb 0c 21 2f c0 cmp $0xc02f210c,%ebx
Code; c0114277 <schedule+14f/8f4> d: 75 1e jne 2d <_EIP+0x2d> c0114297 <schedule+16f/8f4>
Code; c0114279 <schedule+151/8f4> f: f0 fe 0d 0c 21 00 00 lock decb 0x210c
> Then comes what appears to be an NMI watchdog output ...
NMI Watchdog detected LOCKUP on CPU0, registers:
CPU: 0
EIP: 0010:[<c02229c9>]
EFLAGS: 00000082
eax: 00002b00 ebx: 00000206 ecx: c2f92ea0 edx: 003ffff9
esi: 00002b00 edi: 00000018 ebp: c321dcc8 esp: c321dcc0
ds: 0018 es: 0018 ss: 0018
Process dd (pid: 1022, stackpage=c321d000)
Stack: 00000001 c2f92ea0 c321dce4 c01a42ca 00002b00 c2f92ea0 00000008 00000000
00200000 c321dd00 c01a4385 00000000 c2f92ea0 00000002 00000002 00001000
c321dd3c c01a460f 00000000 c2f92ea0 00000002 c321de98 00000000 00001000
Call Trace: [<c01a42ca>] [<c01a4385>] [<c01a460f>] [<c0142b86>] [<c01ac446>] [<c0114692>] [<c0142eff>]
[<c0133f06>] [<c012b5b2>] [<c0133e95>] [<c01274e8>] [<c01276aa>] [<c0113489>] [<c0113290>] [<c013a536>]
[<c0128771>] [<c0128fd8>] [<c0128b68>] [<c0139b38>] [<c010721b>]
Code: 7e f5 e9 57 08 f8 ff 80 3d e0 69 2e c0 00 f3 90 7e f5 e9 54
>>EIP; c02229c9 <stext_lock+917d/dbc9> <=====
Trace; c01a42ca <generic_make_request+be/120>
Trace; c01a4385 <submit_bh+59/78>
Trace; c01a460f <ll_rw_block+21f/28c>
Trace; c0142b86 <block_read+3b2/580>
Trace; c01ac446 <ide_do_request+2ce/320>
Trace; c0114692 <schedule+56a/8f4>
Trace; c0142eff <bdget+83/1e8>
Trace; c0133f06 <__alloc_pages+6a/270>
Trace; c012b5b2 <filemap_nopage+aa/440>
Trace; c0133e95 <_alloc_pages+19/20>
Trace; c01274e8 <do_no_page+2c/150>
Trace; c01276aa <handle_mm_fault+9e/130>
Trace; c0113489 <do_page_fault+1f9/510>
Trace; c0113290 <do_page_fault+0/510>
Trace; c013a536 <chrdev_open+a2/118>
Trace; c0128771 <do_munmap+55/2a8>
Trace; c0128fd8 <insert_vm_struct+9c/b4>
Trace; c0128b68 <do_brk+148/178>
Trace; c0139b38 <sys_read+98/d4>
Trace; c010721b <system_call+33/38>
Code; c02229c9 <stext_lock+917d/dbc9> 00000000 <_EIP>:
Code; c02229c9 <stext_lock+917d/dbc9> 0: 7e f5 jle fffffff7 <_EIP+0xfffffff7> c02229c0 <stext_lock+9174/dbc9> <=====
Code; c02229cb <stext_lock+917f/dbc9> 2: e9 57 08 f8 ff jmp fff8085e <_EIP+0xfff8085e> c01a3227 <blk_get_queue+b/9c>
Code; c02229d0 <stext_lock+9184/dbc9> 7: 80 3d e0 69 2e c0 00 cmpb $0x0,0xc02e69e0
Code; c02229d7 <stext_lock+918b/dbc9> e: f3 90 repz nop
Code; c02229d9 <stext_lock+918d/dbc9> 10: 7e f5 jle 7 <_EIP+0x7> c02229d0 <stext_lock+9184/dbc9>
Code; c02229db <stext_lock+918f/dbc9> 12: e9 54 00 00 00 jmp 6b <_EIP+0x6b> c0222a34 <stext_lock+91e8/dbc9>
> And then the other CPU joins the party ...
NMI Watchdog detected LOCKUP on CPU1, registers:
CPU: 1
EIP: 0010:[<c02229c0>]
EFLAGS: 00000082
eax: 00000306 ebx: 00000216 ecx: c32210e0 edx: 00102d81
esi: 00000306 edi: 000079bc ebp: c6b39d00 esp: c6b39cf8
ds: 0018 es: 0018 ss: 0018
Process syslogd (pid: 331, stackpage=c6b39000)
Stack: 00000001 c32210e0 c6b39d1c c01a42ca 00000306 c32210e0 00000002 00000001
000816c1 c6b39d38 c01a4385 00000001 c32210e0 c6acf618 00000001 00000400
c6b39d74 c01a460f 00000001 c32210e0 c6acf618 c32210e0 c02e86c8 00000400
Call Trace: [<c01a42ca>] [<c01a4385>] [<c01a460f>] [<c013c59f>] [<c0134231>] [<c01b0246>] [<c01afb30>]
[<c01b00b8>] [<c01b2059>] [<c01b6ca4>] [<c0134231>] [<c013de24>] [<c012cfe3>] [<c012d074>] [<c0139df5>]
[<c012cc38>] [<c0161726>] [<c0161713>] [<c013b77e>] [<c010721b>]
Code: 80 3d 0c 21 2f c0 00 f3 90 7e f5 e9 57 08 f8 ff 80 3d e0 69
>>EIP; c02229c0 <stext_lock+9174/dbc9> <=====
Trace; c01a42ca <generic_make_request+be/120>
Trace; c01a4385 <submit_bh+59/78>
Trace; c01a460f <ll_rw_block+21f/28c>
Trace; c013c59f <fsync_inode_buffers+11f/2b4>
Trace; c0134231 <nr_free_buffer_pages+11/58>
Trace; c01b0246 <ide_dmaproc+12a/1ec>
Trace; c01afb30 <ide_dma_intr+0/a0>
Trace; c01b00b8 <dma_timer_expiry+0/64>
Trace; c01b2059 <piix_dmaproc+15/2c>
Trace; c01b6ca4 <do_rw_disk+1bc/3b4>
Trace; c0134231 <nr_free_buffer_pages+11/58>
Trace; c013de24 <generic_commit_write+68/74>
Trace; c012cfe3 <generic_file_write+3ab/4c8>
Trace; c012d074 <generic_file_write+43c/4c8>
Trace; c0139df5 <do_readv_writev+1b1/220>
Trace; c012cc38 <generic_file_write+0/4c8>
Trace; c0161726 <ext2_fsync_inode+e/58>
Trace; c0161713 <ext2_sync_file+13/18>
Trace; c013b77e <sys_fsync+62/98>
Trace; c010721b <system_call+33/38>
Code; c02229c0 <stext_lock+9174/dbc9> 00000000 <_EIP>:
Code; c02229c0 <stext_lock+9174/dbc9> 0: 80 3d 0c 21 2f c0 00 cmpb $0x0,0xc02f210c <=====
Code; c02229c7 <stext_lock+917b/dbc9> 7: f3 90 repz nop
Code; c02229c9 <stext_lock+917d/dbc9> 9: 7e f5 jle 0 <_EIP>
Code; c02229cb <stext_lock+917f/dbc9> b: e9 57 08 f8 ff jmp fff80867 <_EIP+0xfff80867> c01a3227 <blk_get_queue+b/9c>
Code; c02229d0 <stext_lock+9184/dbc9> 10: 80 3d e0 69 00 00 00 cmpb $0x0,0x69e0
1 warning issued. Results may not be reliable.
I'll go ahead with the code bisection, since I can't explain what that error
call stuff is doing in the trace. Can you?
Peter
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/