[RESEND] 2.4.20: ext3: Assertion failure in journal_forget()/Oops

Andreas Steinmetz (ast@domdv.de)
Wed, 04 Dec 2002 21:27:31 +0100


It seems that my previous post (below) was either too vague or got lost
or ignored. In any case, I had hoped it was a spurious hardware error.

Unfortunately today I got an Oops from a completely different system
using ext3 on software raid 0 and raid 1 with data=ordered that again
points to a problem with ext3. The ksymoops output is attached. I'm
really beginning to get worried.

Below is my previous post.
--------------------------

This started to happen during larger (10MB-420MB) rsync based writes to
a striped ext3 partition (/dev/md11) residing on 4 scsi disks which is
mounted with defaults, i.e. data=ordered (rsync over 100Mbps link):

Dec 1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)): ext3_new_block: Allocating block in system zone - block = 114696
Dec 1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)): ext3_new_block: Allocating block in system zone - block = 114697
Dec 1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)): ext3_new_block: Allocating block in system zone - block = 114700
Dec 1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)): ext3_new_block: Allocating block in system zone - block = 114701
Dec 1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)): ext3_new_block: Allocating block in system zone - block = 114702
Dec 1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)): ext3_new_block: Allocating block in system zone - block = 114706

<snip>

Dec 1 22:17:55 pollux kernel: EXT3-fs error (device md(9,11)): ext3_free_blocks: Freeing blocks in system zones - Block = 573501, count = 2
Dec 1 22:17:55 pollux kernel: EXT3-fs error (device md(9,11)): ext3_free_blocks: Freeing blocks in system zones - Block = 573552, count = 14
Dec 1 22:17:55 pollux kernel: Assertion failure in journal_forget() at transaction.c:1225: "!jh->b_committed_data"

Trying to access the partition resulted in processes hanging in D state:

5336 pts/0 D 0:00 ls -a -N --color=tty -T 0 -l /mnt/data8

The e2fsprogs version is 1.32, and the partition was created with this
version using 'mke2fs -j -b 2048 -i 4096 -R stride=16 /dev/md11'.

An earlier dump of the partition data using tune2fs -l gave:

tune2fs 1.32 (09-Nov-2002)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 7c8d7827-4b25-40ab-a3b8-1c4c6e286868
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal filetype needs_recovery sparse_super
Default mount options: (none)
Filesystem state: clean with errors
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 2621440
Block count: 5236992
Reserved block count: 261849
Free blocks: 4855697
Free inodes: 2621416
First block: 0
Block size: 2048
Fragment size: 2048
Blocks per group: 16384
Fragments per group: 16384
Inodes per group: 8192
Inode blocks per group: 512
Last mount time: Sat Nov 30 11:23:59 2002
Last write time: Sun Dec 1 14:09:55 2002
Mount count: 2
Maximum mount count: -1
Last checked: Fri Dec 1 19:18:16 2000
Check interval: 0 (<none>)
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal UUID: <none>
Journal inode: 8
Journal device: 0x0000
First orphan inode: 0
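
The block numbers in the kernel messages can be placed within the
filesystem using the geometry from the tune2fs output above. The
following is a minimal sketch, not part of the original report; the
function names are made up for illustration. It only computes which
block group each reported block falls into and its offset there. It
does not determine which metadata structure occupies that offset.

```python
# Geometry copied from the tune2fs -l output above.
FIRST_BLOCK = 0
BLOCK_COUNT = 5236992
BLOCKS_PER_GROUP = 16384
INODES_PER_GROUP = 8192
INODE_SIZE = 128
BLOCK_SIZE = 2048


def locate(block):
    """Return (group number, offset within that group) for a block."""
    return divmod(block - FIRST_BLOCK, BLOCKS_PER_GROUP)


def group_count():
    """Number of block groups, rounding the last partial group up."""
    return -(-(BLOCK_COUNT - FIRST_BLOCK) // BLOCKS_PER_GROUP)


def inode_table_blocks():
    """Blocks needed per group to hold that group's inode table."""
    return INODES_PER_GROUP * INODE_SIZE // BLOCK_SIZE


if __name__ == "__main__":
    # Block numbers taken from the EXT3-fs error messages.
    for block in (114696, 114706, 573501, 573552):
        group, offset = locate(block)
        print(f"block {block}: group {group}, offset {offset}")
    print("groups:", group_count())
    print("inode table blocks per group:", inode_table_blocks())
```

The derived values agree with the dump: the group count times
"Inodes per group" gives the reported "Inode count", and the inode
table size matches "Inode blocks per group".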

Running 'e2fsck -y /dev/md11' after a reboot reported so many errors,
and kept running for so many minutes, that I aborted it. I assume that
the file system was completely destroyed.

After recreating the filesystem on /dev/md11, an rsync run completed
without errors.

As a side note: the system holding the rsync sources has an identically
formatted partition (the systems are hardware twins) and doesn't show
any errors.

Some final information about the raid configuration of /dev/md11:

raiddev /dev/md11
raid-level 0
nr-raid-disks 4
nr-spare-disks 0
chunk-size 32
persistent-superblock 1
device /dev/sda13
raid-disk 0
device /dev/sdb13
raid-disk 1
device /dev/sdc15
raid-disk 2
device /dev/sdd15
raid-disk 3
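
As a consistency check between the raid configuration and the mke2fs
options, the stride value can be recomputed from the chunk size. This
is a small sketch, not part of the original report. It assumes that
the raidtab chunk-size is given in KiB and that stride is the number
of filesystem blocks per chunk; the helper name is hypothetical.

```python
CHUNK_SIZE_KIB = 32  # chunk-size from the raidtab entry (assumed KiB)
BLOCK_SIZE = 2048    # -b 2048 from the mke2fs command line
NR_RAID_DISKS = 4    # nr-raid-disks from the raidtab entry


def stride_blocks(chunk_kib, block_size):
    """Filesystem blocks per raid chunk (the assumed meaning of stride)."""
    return chunk_kib * 1024 // block_size


if __name__ == "__main__":
    stride = stride_blocks(CHUNK_SIZE_KIB, BLOCK_SIZE)
    print("stride:", stride)
    # Blocks covered by one full stripe across all member disks.
    print("blocks per full stripe:", stride * NR_RAID_DISKS)
```

Under these assumptions the result is 16, the same as the stride=16
passed to mke2fs.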

-- 
Andreas Steinmetz
D.O.M. Datenverarbeitung GmbH



Attachment: ext3-oops.txt

ksymoops 2.4.8 on i686 2.4.20.  Options used
     -V (default)
     -k /proc/ksyms (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20/ (default)
     -m /boot/System.map (default)

Dec  4 16:19:59 tux kernel: Unable to handle kernel paging request at virtual address 6574616c
Dec  4 16:19:59 tux kernel: c0185102
Dec  4 16:19:59 tux kernel: *pde = 00000000
Dec  4 16:19:59 tux kernel: Oops: 0000
Dec  4 16:19:59 tux kernel: CPU:    0
Dec  4 16:19:59 tux kernel: EIP:    0010:[<c0185102>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Dec  4 16:19:59 tux kernel: EFLAGS: 00013a87
Dec  4 16:19:59 tux kernel: eax: 00000001   ebx: 65746144   ecx: c12c09bc   edx: 65746144
Dec  4 16:19:59 tux kernel: esi: c0007e00   edi: 00000001   ebp: 00000000   esp: c1595ef0
Dec  4 16:19:59 tux kernel: ds: 0018   es: 0018   ss: 0018
Dec  4 16:19:59 tux kernel: Process kswapd (pid: 5, stackpage=c1595000)
Dec  4 16:19:59 tux kernel: Stack: c010bb08 c0512890 c102c01c 00003202 00003296 00000000 c12c09bc 000001d0 
Dec  4 16:19:59 tux kernel:        00003d6d c0512890 c017bccd c4119680 c12c09bc 000001d0 c014d672 c12c09bc 
Dec  4 16:19:59 tux kernel:        000001d0 00000000 c12c09bc c0140895 c12c09bc 000001d0 c1594000 00000200 
Dec  4 16:19:59 tux kernel: Call Trace:    [<c010bb08>] [<c017bccd>] [<c014d672>] [<c0140895>] [<c0140a93>]
Dec  4 16:19:59 tux kernel:   [<c0140b16>] [<c0140c2c>] [<c0140ca8>] [<c0140ddd>] [<c0105000>] [<c010578e>]
Dec  4 16:19:59 tux kernel:   [<c0140d40>]
Dec  4 16:19:59 tux kernel: Code: 8b 5b 28 f6 42 19 02 75 26 39 f3 75 f1 f7 44 24 34 50 00 00 


>>EIP; c0185102 <journal_try_to_free_buffers+32/c0>   <=====

>>ecx; c12c09bc <_end+d15204/2038c8a8>
>>esp; c1595ef0 <_end+fea738/2038c8a8>

Trace; c010bb08 <call_do_IRQ+5/d>
Trace; c017bccd <ext3_releasepage+2d/40>
Trace; c014d672 <try_to_release_page+52/70>
Trace; c0140895 <shrink_cache+265/310>
Trace; c0140a93 <shrink_caches+63/b0>
Trace; c0140b16 <try_to_free_pages_zone+36/50>
Trace; c0140c2c <kswapd_balance_pgdat+5c/b0>
Trace; c0140ca8 <kswapd_balance+28/40>
Trace; c0140ddd <kswapd+9d/c0>
Trace; c0105000 <_stext+0/0>
Trace; c010578e <kernel_thread+2e/40>
Trace; c0140d40 <kswapd+0/c0>

Code;  c0185102 <journal_try_to_free_buffers+32/c0>
00000000 <_EIP>:
Code;  c0185102 <journal_try_to_free_buffers+32/c0>   <=====
   0:   8b 5b 28                  mov    0x28(%ebx),%ebx   <=====
Code;  c0185105 <journal_try_to_free_buffers+35/c0>
   3:   f6 42 19 02               testb  $0x2,0x19(%edx)
Code;  c0185109 <journal_try_to_free_buffers+39/c0>
   7:   75 26                     jne    2f <_EIP+0x2f>
Code;  c018510b <journal_try_to_free_buffers+3b/c0>
   9:   39 f3                     cmp    %esi,%ebx
Code;  c018510d <journal_try_to_free_buffers+3d/c0>
   b:   75 f1                     jne    fffffffe <_EIP+0xfffffffe>
Code;  c018510f <journal_try_to_free_buffers+3f/c0>
   d:   f7 44 24 34 50 00 00      testl  $0x50,0x34(%esp,1)
Code;  c0185116 <journal_try_to_free_buffers+46/c0>
  14:   00 
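
The register dump and the faulting instruction can be cross-checked
with a few lines of arithmetic. This is a sketch added for
illustration, not part of the ksymoops output; the helper name is
hypothetical. It verifies that the faulting address equals ebx plus
the 0x28 displacement of the marked mov instruction, and shows the
register contents as little-endian ASCII bytes. Printable text in a
pointer register may hint that the pointer was overwritten with data,
but that is only one possible reading and proves nothing by itself.

```python
import struct

FAULT_ADDR = 0x6574616C  # "Unable to handle kernel paging request at ..."
EBX = 0x65746144         # ebx from the register dump
DISPLACEMENT = 0x28      # from: mov 0x28(%ebx),%ebx


def as_ascii(value):
    """Show a 32-bit value as its four little-endian bytes (i386 order)."""
    return struct.pack("<I", value).decode("ascii", errors="replace")


if __name__ == "__main__":
    # The faulting load reads from ebx + 0x28.
    print("ebx + 0x28 == fault address:", EBX + DISPLACEMENT == FAULT_ADDR)
    print("ebx as bytes:", repr(as_ascii(EBX)))
    print("fault address as bytes:", repr(as_ascii(FAULT_ADDR)))
```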



