Re: 2.5.18 / ext3 / oracle trouble

Zlatko Calusic (zlatko.calusic@iskon.hr)
Mon, 27 May 2002 10:43:57 +0200


Andrew Morton <akpm@zip.com.au> writes:

> Christoph Rohland wrote:
>>
>> Hi Zlatko,
>>
>> On Sun, 26 May 2002, Zlatko Calusic wrote:
>> > Hi!
>> >
>> > After lots of testing, I can say that 2.5.18 works great in all
>> > three modes of ext3 for all but one purpose. Oracle database still
>> > gets corrupted during insert load. More precisely, online redo log
>> > gets corrupted, database panics and restore is in order.
>> >
>> > This leads me to thinking that there's something wrong with sysv
>> > shared memory in 2.5.x. Although the problem could also be in
>> > fsync() or swap_out() & co. paths, it's yet to be discovered.
>> >
>> > It could also be that journaled mode helps the trouble, and it could
>> > be that some swapping makes it more certain, but none of these two
>> > facts are proved for sure. Take it as an observation.
>> >
>> > Christoph, I don't know if you're still taking care of shmem in
>> > 2.5.x, so take my apologies if you didn't want to see this email.
>> >
>> > Regards,
>> > --
>> > Zlatko
>>
>> Unfortunately I do not have the time to work on shmem right now. Hugh
>> Dickins is the right guy to contact nowadays.
>>
>
> Most likely suspect here is that the heavy fsync() load is triggering
> some timing problem in ext3 - it'll be pushing the commits through
> at a high rate.
>
> I'll teach fsx-linux (great test app, btw) about fsync() and see
> how it stands up. And if Zlatko can retest on ext2 that would be a
> big help.
>

This is just a short note so that you know I'm working on it.

I did some testing last evening, but I need to run more
comprehensive tests before drawing any meaningful conclusion.

Test 1: compiled ext2 in, mounted partitions as ext2, tests passed
        (no corruption).
Test 2: rebooted, mounted as ext3 (journal/writeback). This time even
        ext3 passed the tests, so I got confused :)
Test 3: pushed things harder on ext3, the machine started swapping,
        restarted the tests and it finally choked (some kind of smon
        non-fatal error 1/100, a problem with writing the scn, and
        instance shutdown).

Obviously I need to run the tests on ext2 under swap load, and repeat
them a few times. Will do this evening (it takes some time to recover a
database after a corruption, so it's slightly time consuming).
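
For anyone who wants to poke at this without Oracle, the
write/fsync/verify pattern under discussion can be sketched roughly as
below. This is only an illustration under my own assumptions, not
fsx-linux and not what Oracle's log writer really does: the function
name fsync_stress, the record count and the record size are all made
up for the example.

```python
import hashlib
import os
import tempfile


def fsync_stress(path, records=200, record_size=512):
    """Append fixed-size records, fsync() after each, then verify them.

    Hypothetical helper: returns the number of records that do not
    read back with the contents that were written.
    """
    expected = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for i in range(records):
            # Deterministic payload, so a mismatch can be tied to a record.
            seed = hashlib.sha256(str(i).encode()).digest()
            payload = (seed * (record_size // len(seed) + 1))[:record_size]
            expected.append(payload)

            # os.write() may write fewer bytes than asked; loop until done.
            written = 0
            while written < len(payload):
                written += os.write(fd, payload[written:])

            # Force the record out, as a redo log writer would on commit.
            os.fsync(fd)
    finally:
        os.close(fd)

    # Read everything back and count the records that differ.
    mismatches = 0
    with open(path, "rb") as f:
        for payload in expected:
            if f.read(record_size) != payload:
                mismatches += 1
    return mismatches
```

Calling fsync_stress() on a file that lives on the partition under test
returns the number of bad records. Note that reading back right away is
mostly served from the page cache, so to check what really reached the
disk the verification would have to happen after a remount or reboot,
and with memory pressure added to get the swapping into the picture.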

Later,

-- 
Zlatko