[RESEND] 2.4.20: ext3: Assertion failure in journal_forget()/Oops

Andreas Steinmetz (ast@domdv.de)
Wed, 04 Dec 2002 21:27:31 +0100


This is a multi-part message in MIME format.
--------------050108060303020109030508
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

It seems that either my previous post (below) was to vague or it either
got lost or ignored. Anyway I did hope for a spurious hardware error.

Unfortunately today I got an Oops from a completely different system
using ext3 on software raid 0 and raid 1 with data=ordered that again
points to a problem with ext3. The ksymoops output is attached. I'm
really beginning to get worried.

Below is my previous post.
--------------------------

This started to happen during larger (10MB-420MB) rsync based writes to
a striped ext3 partition (/dev/md11) residing on 4 scsi disks which is
mounted with defaults, i.e. data=ordered (rsync over 100Mbps link):

Dec 1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_new_block:
Allocating block in system zone - block = 114696
Dec 1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_new_block:
Allocating block in system zone - block = 114697
Dec 1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_new_block:
Allocating block in system zone - block = 114700
Dec 1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_new_block:
Allocating block in system zone - block = 114701
Dec 1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_new_block:
Allocating block in system zone - block = 114702
Dec 1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_new_block:
Allocating block in system zone - block = 114706

<snip>

Dec 1 22:17:55 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_free_blocks
: Freeing blocks in system zones - Block = 573501, count = 2
Dec 1 22:17:55 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_free_blocks
: Freeing blocks in system zones - Block = 573552, count = 14
Dec 1 22:17:55 pollux kernel: Assertion failure in journal_forget() at
transaction.c:1225: "!jh->b_committed_data"

Trying to access the partition resulted in processes hanging in D state:

5336 pts/0 D 0:00 ls -a -N --color=tty -T 0 -l /mnt/data8

e2fstools version is 1.32 and the partition was created with this
version using 'mke2fs -j -b 2048 -i 4096 -R stride=16 /dev/md11'.

An earlier dump of the partition data using tune2fs -l gave:

tune2fs 1.32 (09-Nov-2002)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 7c8d7827-4b25-40ab-a3b8-1c4c6e286868
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal filetype needs_recovery sparse_super
Default mount options: (none)
Filesystem state: clean with errors
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 2621440
Block count: 5236992
Reserved block count: 261849
Free blocks: 4855697
Free inodes: 2621416
First block: 0
Block size: 2048
Fragment size: 2048
Blocks per group: 16384
Fragments per group: 16384
Inodes per group: 8192
Inode blocks per group: 512
Last mount time: Sat Nov 30 11:23:59 2002
Last write time: Sun Dec 1 14:09:55 2002
Mount count: 2
Maximum mount count: -1
Last checked: Fri Dec 1 19:18:16 2000
Check interval: 0 (<none>)
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal UUID: <none>
Journal inode: 8
Journal device: 0x0000
First orphan inode: 0

Trying 'e2fsck -y /dev/md11' after a reboot showed so many errors and
continued to run for minutes that I aborted e2fsck and do assume that
the file system was completely destroyed.

After recreation of the filesystem on /dev/md11 a rsync run completed
without errors.

As a side note: the system having the rsync sources has an identical
formatted partition (the systems are hardware twins) and doesn't show
any errors.

Some final information about the raid configuration of /dev/md11:

raiddev /dev/md11
raid-level 0
nr-raid-disks 4
nr-spare-disks 0
chunk-size 32
persistent-superblock 1
device /dev/sda13
raid-disk 0
device /dev/sdb13
raid-disk 1
device /dev/sdc15
raid-disk 2
device /dev/sdd15
raid-disk 3

-- 
Andreas Steinmetz
D.O.M. Datenverarbeitung GmbH

--------------050108060303020109030508 Content-Type: text/plain; name="ext3-oops.txt" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="ext3-oops.txt"

ksymoops 2.4.8 on i686 2.4.20. Options used -V (default) -k /proc/ksyms (specified) -l /proc/modules (default) -o /lib/modules/2.4.20/ (default) -m /boot/System.map (default)

Dec 4 16:19:59 tux kernel: Unable to handle kernel paging request at virtual address 6574616c Dec 4 16:19:59 tux kernel: c0185102 Dec 4 16:19:59 tux kernel: *pde = 00000000 Dec 4 16:19:59 tux kernel: Oops: 0000 Dec 4 16:19:59 tux kernel: CPU: 0 Dec 4 16:19:59 tux kernel: EIP: 0010:[<c0185102>] Not tainted Using defaults from ksymoops -t elf32-i386 -a i386 Dec 4 16:19:59 tux kernel: EFLAGS: 00013a87 Dec 4 16:19:59 tux kernel: eax: 00000001 ebx: 65746144 ecx: c12c09bc edx: 65746144 Dec 4 16:19:59 tux kernel: esi: c0007e00 edi: 00000001 ebp: 00000000 esp: c1595ef0 Dec 4 16:19:59 tux kernel: ds: 0018 es: 0018 ss: 0018 Dec 4 16:19:59 tux kernel: Process kswapd (pid: 5, stackpage=c1595000) Dec 4 16:19:59 tux kernel: Stack: c010bb08 c0512890 c102c01c 00003202 00003296 00000000 c12c09bc 000001d0 Dec 4 16:19:59 tux kernel: 00003d6d c0512890 c017bccd c4119680 c12c09bc 000001d0 c014d672 c12c09bc Dec 4 16:19:59 tux kernel: 000001d0 00000000 c12c09bc c0140895 c12c09bc 000001d0 c1594000 00000200 Dec 4 16:19:59 tux kernel: Call Trace: [<c010bb08>] [<c017bccd>] [<c014d672>] [<c0140895>] [<c0140a93>] Dec 4 16:19:59 tux kernel: [<c0140b16>] [<c0140c2c>] [<c0140ca8>] [<c0140ddd>] [<c0105000>] [<c010578e>] Dec 4 16:19:59 tux kernel: [<c0140d40>] Dec 4 16:19:59 tux kernel: Code: 8b 5b 28 f6 42 19 02 75 26 39 f3 75 f1 f7 44 24 34 50 00 00

>>EIP; c0185102 <journal_try_to_free_buffers+32/c0> <=====

>>ecx; c12c09bc <_end+d15204/2038c8a8> >>esp; c1595ef0 <_end+fea738/2038c8a8>

Trace; c010bb08 <call_do_IRQ+5/d> Trace; c017bccd <ext3_releasepage+2d/40> Trace; c014d672 <try_to_release_page+52/70> Trace; c0140895 <shrink_cache+265/310> Trace; c0140a93 <shrink_caches+63/b0> Trace; c0140b16 <try_to_free_pages_zone+36/50> Trace; c0140c2c <kswapd_balance_pgdat+5c/b0> Trace; c0140ca8 <kswapd_balance+28/40> Trace; c0140ddd <kswapd+9d/c0> Trace; c0105000 <_stext+0/0> Trace; c010578e <kernel_thread+2e/40> Trace; c0140d40 <kswapd+0/c0>

Code; c0185102 <journal_try_to_free_buffers+32/c0> 00000000 <_EIP>: Code; c0185102 <journal_try_to_free_buffers+32/c0> <===== 0: 8b 5b 28 mov 0x28(%ebx),%ebx <===== Code; c0185105 <journal_try_to_free_buffers+35/c0> 3: f6 42 19 02 testb $0x2,0x19(%edx) Code; c0185109 <journal_try_to_free_buffers+39/c0> 7: 75 26 jne 2f <_EIP+0x2f> Code; c018510b <journal_try_to_free_buffers+3b/c0> 9: 39 f3 cmp %esi,%ebx Code; c018510d <journal_try_to_free_buffers+3d/c0> b: 75 f1 jne fffffffe <_EIP+0xfffffffe> Code; c018510f <journal_try_to_free_buffers+3f/c0> d: f7 44 24 34 50 00 00 testl $0x50,0x34(%esp,1) Code; c0185116 <journal_try_to_free_buffers+46/c0> 14: 00

--------------050108060303020109030508--

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/