What puzzles me is the sequence shown in the second oops of
blkdev_release_request leading to a schedule. Possibly via an
interrupt? You yourself said that the function cannot sleep. The oops
was triggered my my detection code placed in schedule which fires
on just such an event.
>>EIP; c011426a <schedule+142/8f4>   <=====
Trace; c011b064 <exit_notify+178/2a8>
Trace; c011b548 <do_exit+3b4/3d4>
Trace; c0107a38 <do_divide_error+0/9c>
Trace; c0113683 <do_page_fault+3f3/510>
Trace; c0113290 <do_page_fault+0/510>
Trace; c01d91ad <memcpy_toiovec+35/64>
Trace; c01d7c4c <skb_release_data+68/74>
Trace; c0113cc3 <reschedule_idle+63/214>
Trace; c01ef15d <tcp_recvmsg+839/a58>                 <-- urr?
Trace; c0107304 <error_code+34/3c>                    <-- huh?
Trace; c01a3c17 <blkdev_release_request+df/1b4>
Trace; c01a4807 <end_that_request_last+a3/130>
Trace; c0134176 <__free_pages+1e/24>
Trace; c01341a3 <free_pages+27/2c>
Trace; c014c51d <do_select+1e5/1fc>
Trace; c0122071 <sys_rt_sigaction+89/d8>
Trace; c0143842 <blkdev_ioctl+2e/34>
Trace; c014b9a9 <sys_ioctl+1f9/280>
Trace; c010721b <system_call+33/38>
> "Andrew Morton wrote:"
> > Yes, there are often interrupt entrails on the stack.  If you can,
> > generate that trace and send it....
> 
> > It may simplify your oops tracing to remove all the `inline'
> > qualifiers in ll_rw_blk.c, BTW.  They tend to obscure things.
> > They should in fact be taken out permanently - someone went
> > absolutely insane there.
(I un-inlined all of ll_rw_blk.c)
> What [also] puzzles me is that I seem to get an oops immediately preceding
>  theBUG() oops that I put in to check for scheduling with the
> io_request_lock held.  Without the extra code added in for the
> debugging, this strange first oops does not happen.
> 
> So there's the strange oops, then the BUG() oops, then the NMI watchdog
> kicks in, and syslog oopses as well. That shows why I was never
> able to log it.
First oops, possibly second in real time order, but it came up on
screen first. I am at a loss to explain it.
ksymoops 2.4.1 on i686 2.4.6-SMP-XFS.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.6-SMP-XFS/ (default)
     -m /boot/System.map-2.4.6-SMP-XFS (default)
     -S
Unable to handle kernel paging request at virtual address 00002004
c01a3c17
*pde = 00000000
Oops: 0002
CPU:    1
EIP:    0010:[<c01a3c17>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010046
eax: c0366dec   ebx: c7371ce0   ecx: c7371cec   edx: 00002000
esi: c0366dbc   edi: 00000004   ebp: c31dbe28   esp: c31dbe1c
ds: 0018   es: 0018   ss: 0018
Process my-client (pid: 929, stackpage=c31db000)
Stack: c02e69e0 c7371ce0 c7371ce0 c31dbe3c c01a4807 c7371ce0 c88d3940 00000001 
       c31dbe7c c88cb5c8 c7371ce0 00000001 c88d0a07 c88cb592 c7371ce0 400fa078 
       c31dbf2c c3070aec c7371ce0 400fa078 c31dbf2c 00000246 00000001 00000000 
Call Trace: [<c01a4807>] [<c0134176>] [<c01341a3>] [<c014c51d>] [<c0122071>] [<c0143842>] [<c014b9a9>] 
       [<c010721b>] 
Code: 89 4a 04 89 53 0c 89 41 04 89 08 8d 0c bd 00 00 00 00 8d 46 
>>EIP; c01a3c17 <blkdev_release_request+df/1b4>   <=====
Trace; c01a4807 <end_that_request_last+a3/130>
Trace; c0134176 <__free_pages+1e/24>
Trace; c01341a3 <free_pages+27/2c>
Trace; c014c51d <do_select+1e5/1fc>
Trace; c0122071 <sys_rt_sigaction+89/d8>
Trace; c0143842 <blkdev_ioctl+2e/34>
Trace; c014b9a9 <sys_ioctl+1f9/280>
Trace; c010721b <system_call+33/38>
Code;  c01a3c17 <blkdev_release_request+df/1b4> 00000000 <_EIP>:
Code;  c01a3c17 <blkdev_release_request+df/1b4>    0:   89 4a 04                  mov    %ecx,0x4(%edx)   <=====
Code;  c01a3c1a <blkdev_release_request+e2/1b4>    3:   89 53 0c                  mov    %edx,0xc(%ebx)
Code;  c01a3c1d <blkdev_release_request+e5/1b4>    6:   89 41 04                  mov    %eax,0x4(%ecx)
Code;  c01a3c20 <blkdev_release_request+e8/1b4>    9:   89 08                     mov    %ecx,(%eax)
Code;  c01a3c22 <blkdev_release_request+ea/1b4>    b:   8d 0c bd 00 00 00 00      lea    0x0(,%edi,4),%ecx
Code;  c01a3c29 <blkdev_release_request+f1/1b4>   12:   8d 46 00                  lea    0x0(%esi),%eax
> At this point in the dmesg output, my printk appears:
> 
>   Scheduling in process 929 with io lock held on cpu 1 by process 929 in function
> blkdev_release_request() ll_rw_blk.c:584
> 
> and that appears to me to be pretty accurate. The line number won't be
> the same in your copy, as I addded debugging code. It's the entry point
> of the function - I didn't do any more detailed code bisection in this test.
> 
> The fact that that output appeared indicates that the [diagnosis] is
> accurate (I won't go into why, but there is a spinlock protecting changes
> in the reported variables values, and they can also only be nonzero while
> the io spinlock is held, so the output indicates that we run schedule
> while the io spinlock is being held on the same cpu by the indicated
> function).
> 
> Now here's the next one .. this is the BUG() I inserted to show
> when we shchedule with the io spinlock held.
kernel BUG at sched.c:576!
invalid operand: 0000
CPU:    1
EIP:    0010:[<c011426a>]
EFLAGS: 00010282
eax: 0000001b   ebx: c02e69e0   ecx: 00000001   edx: 00000001
esi: c31da000   edi: c31da674   ebp: c31dbcf8   esp: c31dbcb8
ds: 0018   es: 0018   ss: 0018
Process my-client (pid: 929, stackpage=c31db000)
Stack: c02300d6 c02301f9 00000240 c122af20 c31da000 c31da674 c31d0018 c31d0018 
       c31dbcf8 c011b064 c31da000 00000011 c122af20 00000001 c31da000 c0339420 
       c31dbd0c c011b548 00000046 00000000 c31da000 c31dbd28 c0107a38 0000000b 
Call Trace: [<c011b064>] [<c011b548>] [<c0107a38>] [<c0113683>] [<c0113290>] [<c01d91ad>] [<c01d7c4c>] 
       [<c0113cc3>] [<c01ef15d>] [<c0107304>] [<c01a3c17>] [<c01a4807>] [<c0134176>] [<c01341a3>] [<c014c51d>] 
       [<c0122071>] [<c0143842>] [<c014b9a9>] [<c010721b>] 
Code: 0f 0b e9 9f 07 00 00 81 fb 0c 21 2f c0 75 1e f0 fe 0d 0c 21 
>>EIP; c011426a <schedule+142/8f4>   <=====
Trace; c011b064 <exit_notify+178/2a8>
Trace; c011b548 <do_exit+3b4/3d4>
Trace; c0107a38 <do_divide_error+0/9c>
Trace; c0113683 <do_page_fault+3f3/510>
Trace; c0113290 <do_page_fault+0/510>
Trace; c01d91ad <memcpy_toiovec+35/64>
Trace; c01d7c4c <skb_release_data+68/74>
Trace; c0113cc3 <reschedule_idle+63/214>
Trace; c01ef15d <tcp_recvmsg+839/a58>
Trace; c0107304 <error_code+34/3c>
Trace; c01a3c17 <blkdev_release_request+df/1b4>
Trace; c01a4807 <end_that_request_last+a3/130>
Trace; c0134176 <__free_pages+1e/24>
Trace; c01341a3 <free_pages+27/2c>
Trace; c014c51d <do_select+1e5/1fc>
Trace; c0122071 <sys_rt_sigaction+89/d8>
Trace; c0143842 <blkdev_ioctl+2e/34>
Trace; c014b9a9 <sys_ioctl+1f9/280>
Trace; c010721b <system_call+33/38>
Code;  c011426a <schedule+142/8f4>             00000000 <_EIP>:
Code;  c011426a <schedule+142/8f4>                0:   0f 0b                     ud2a      <=====
Code;  c011426c <schedule+144/8f4>                2:   e9 9f 07 00 00            jmp    7a6 <_EIP+0x7a6> c0114a10 <schedule+8e8/8f4>
Code;  c0114271 <schedule+149/8f4>                7:   81 fb 0c 21 2f c0         cmp    $0xc02f210c,%ebx
Code;  c0114277 <schedule+14f/8f4>                d:   75 1e                     jne    2d <_EIP+0x2d> c0114297 <schedule+16f/8f4>
Code;  c0114279 <schedule+151/8f4>                f:   f0 fe 0d 0c 21 00 00      lock decb 0x210c
> Then comes what appears to be an NMI watchdog output ...
NMI Watchdog detected LOCKUP on CPU0, registers:
CPU:    0
EIP:    0010:[<c02229c9>]
EFLAGS: 00000082
eax: 00002b00   ebx: 00000206   ecx: c2f92ea0   edx: 003ffff9
esi: 00002b00   edi: 00000018   ebp: c321dcc8   esp: c321dcc0
ds: 0018   es: 0018   ss: 0018
Process dd (pid: 1022, stackpage=c321d000)
Stack: 00000001 c2f92ea0 c321dce4 c01a42ca 00002b00 c2f92ea0 00000008 00000000 
       00200000 c321dd00 c01a4385 00000000 c2f92ea0 00000002 00000002 00001000 
       c321dd3c c01a460f 00000000 c2f92ea0 00000002 c321de98 00000000 00001000 
Call Trace: [<c01a42ca>] [<c01a4385>] [<c01a460f>] [<c0142b86>] [<c01ac446>] [<c0114692>] [<c0142eff>] 
       [<c0133f06>] [<c012b5b2>] [<c0133e95>] [<c01274e8>] [<c01276aa>] [<c0113489>] [<c0113290>] [<c013a536>] 
       [<c0128771>] [<c0128fd8>] [<c0128b68>] [<c0139b38>] [<c010721b>] 
Code: 7e f5 e9 57 08 f8 ff 80 3d e0 69 2e c0 00 f3 90 7e f5 e9 54 
>>EIP; c02229c9 <stext_lock+917d/dbc9>   <=====
Trace; c01a42ca <generic_make_request+be/120>
Trace; c01a4385 <submit_bh+59/78>
Trace; c01a460f <ll_rw_block+21f/28c>
Trace; c0142b86 <block_read+3b2/580>
Trace; c01ac446 <ide_do_request+2ce/320>
Trace; c0114692 <schedule+56a/8f4>
Trace; c0142eff <bdget+83/1e8>
Trace; c0133f06 <__alloc_pages+6a/270>
Trace; c012b5b2 <filemap_nopage+aa/440>
Trace; c0133e95 <_alloc_pages+19/20>
Trace; c01274e8 <do_no_page+2c/150>
Trace; c01276aa <handle_mm_fault+9e/130>
Trace; c0113489 <do_page_fault+1f9/510>
Trace; c0113290 <do_page_fault+0/510>
Trace; c013a536 <chrdev_open+a2/118>
Trace; c0128771 <do_munmap+55/2a8>
Trace; c0128fd8 <insert_vm_struct+9c/b4>
Trace; c0128b68 <do_brk+148/178>
Trace; c0139b38 <sys_read+98/d4>
Trace; c010721b <system_call+33/38>
Code;  c02229c9 <stext_lock+917d/dbc9>         00000000 <_EIP>:
Code;  c02229c9 <stext_lock+917d/dbc9>            0:   7e f5                     jle    fffffff7 <_EIP+0xfffffff7> c02229c0 <stext_lock+9174/dbc9>   <=====
Code;  c02229cb <stext_lock+917f/dbc9>            2:   e9 57 08 f8 ff            jmp    fff8085e <_EIP+0xfff8085e> c01a3227 <blk_get_queue+b/9c>
Code;  c02229d0 <stext_lock+9184/dbc9>            7:   80 3d e0 69 2e c0 00      cmpb   $0x0,0xc02e69e0
Code;  c02229d7 <stext_lock+918b/dbc9>            e:   f3 90                     repz nop 
Code;  c02229d9 <stext_lock+918d/dbc9>           10:   7e f5                     jle    7 <_EIP+0x7> c02229d0 <stext_lock+9184/dbc9>
Code;  c02229db <stext_lock+918f/dbc9>           12:   e9 54 00 00 00            jmp    6b <_EIP+0x6b> c0222a34 <stext_lock+91e8/dbc9>
> And then the other CPU joins the party ...
NMI Watchdog detected LOCKUP on CPU1, registers:
CPU:    1
EIP:    0010:[<c02229c0>]
EFLAGS: 00000082
eax: 00000306   ebx: 00000216   ecx: c32210e0   edx: 00102d81
esi: 00000306   edi: 000079bc   ebp: c6b39d00   esp: c6b39cf8
ds: 0018   es: 0018   ss: 0018
Process syslogd (pid: 331, stackpage=c6b39000)
Stack: 00000001 c32210e0 c6b39d1c c01a42ca 00000306 c32210e0 00000002 00000001 
       000816c1 c6b39d38 c01a4385 00000001 c32210e0 c6acf618 00000001 00000400 
       c6b39d74 c01a460f 00000001 c32210e0 c6acf618 c32210e0 c02e86c8 00000400 
Call Trace: [<c01a42ca>] [<c01a4385>] [<c01a460f>] [<c013c59f>] [<c0134231>] [<c01b0246>] [<c01afb30>] 
       [<c01b00b8>] [<c01b2059>] [<c01b6ca4>] [<c0134231>] [<c013de24>] [<c012cfe3>] [<c012d074>] [<c0139df5>] 
       [<c012cc38>] [<c0161726>] [<c0161713>] [<c013b77e>] [<c010721b>] 
Code: 80 3d 0c 21 2f c0 00 f3 90 7e f5 e9 57 08 f8 ff 80 3d e0 69 
>>EIP; c02229c0 <stext_lock+9174/dbc9>   <=====
Trace; c01a42ca <generic_make_request+be/120>
Trace; c01a4385 <submit_bh+59/78>
Trace; c01a460f <ll_rw_block+21f/28c>
Trace; c013c59f <fsync_inode_buffers+11f/2b4>
Trace; c0134231 <nr_free_buffer_pages+11/58>
Trace; c01b0246 <ide_dmaproc+12a/1ec>
Trace; c01afb30 <ide_dma_intr+0/a0>
Trace; c01b00b8 <dma_timer_expiry+0/64>
Trace; c01b2059 <piix_dmaproc+15/2c>
Trace; c01b6ca4 <do_rw_disk+1bc/3b4>
Trace; c0134231 <nr_free_buffer_pages+11/58>
Trace; c013de24 <generic_commit_write+68/74>
Trace; c012cfe3 <generic_file_write+3ab/4c8>
Trace; c012d074 <generic_file_write+43c/4c8>
Trace; c0139df5 <do_readv_writev+1b1/220>
Trace; c012cc38 <generic_file_write+0/4c8>
Trace; c0161726 <ext2_fsync_inode+e/58>
Trace; c0161713 <ext2_sync_file+13/18>
Trace; c013b77e <sys_fsync+62/98>
Trace; c010721b <system_call+33/38>
Code;  c02229c0 <stext_lock+9174/dbc9>         00000000 <_EIP>:
Code;  c02229c0 <stext_lock+9174/dbc9>            0:   80 3d 0c 21 2f c0 00      cmpb   $0x0,0xc02f210c   <=====
Code;  c02229c7 <stext_lock+917b/dbc9>            7:   f3 90                     repz nop 
Code;  c02229c9 <stext_lock+917d/dbc9>            9:   7e f5                     jle    0 <_EIP>
Code;  c02229cb <stext_lock+917f/dbc9>            b:   e9 57 08 f8 ff            jmp    fff80867 <_EIP+0xfff80867> c01a3227 <blk_get_queue+b/9c>
Code;  c02229d0 <stext_lock+9184/dbc9>           10:   80 3d e0 69 00 00 00      cmpb   $0x0,0x69e0
1 warning issued.  Results may not be reliable.
I'll go ahead with the code bisection, since I can't explain what that error
call stuff is doing in the trace. Can you?
Peter
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/