Something always seems to panic. I haven't tried it recently
with the 2.1.6x series, though.
My system hardware is rock solid and stable, but also *very* fast.
Maybe a race condition somewhere?
-- mlord@pobox.com the Linux IDE guy
Brian Adams wrote:
>
> I have almost the same system, with dual 2940UW's, 1 Quantum 4.2GB, 2
> Seagate 4.1GB, Pentium 166, 128MB RAM. One IDE drive, but previously did
> not have the IDE drive.
>
> Red Hat Linux release 4.1 (Vanderbilt)
> Kernel 2.0.27 on an i586
>
> using 128MB for swap space (I always use the same as the RAM):
>
> /sbin/fdisk /dev/hda1
> Command (m for help): p
> Disk /dev/hda: 64 heads, 63 sectors, 648 cylinders
> Units = cylinders of 4032 * 512 bytes
>    Device Boot  Begin  Start    End   Blocks   Id  System
> /dev/hda1           1      1     66   133024+  82  Linux swap
> /dev/hda2          67     67    648  1173312   83  Linux native
>
> I have had no problems for over one year.
> While initially setting up the system I would get some kernel panic errors
> relating to the SCSI drives. I determined them to be one of the following:
> 1. bad SCSI cable
> 2. bad hard drive
> 3. bad controller card
>
> Here are the things I recommend trying to fix your errors:
>
> 1. Test both your SCSI cards for the swap space: basically just swap
>    them, or remove the other one, which you did not remove in your
>    previous test.
> 2. Do the same for your SCSI drives: move the swap space to your other drive.
> 3. If you have an IDE port on your system, put an IDE drive on your system and
>    move the swap partition onto the IDE drive.
>
> If my suggestions help you, maybe you can offer some help when I am ready
> to set up RAID on my system.
>
> Good Luck,
>
> Brian
>
> On Tue, 2 Dec 1997, David Mansfield wrote:
>
> > Hello, I've been trying to decide whether to set up a production web
> > server/dial-in server using the RAID 1 mirroring. I've set it up and it
> > seems to work OK but I've gotten a couple of oopses and a lot of
> > interesting syslog kernel messages. I am close to suspecting bad memory,
> > although the memtest (run about 10 times) doesn't show anything. The
> > system looks like this:
> >
> > Pentium 150.
> > 64 MB ram.
> > No ide drives
> > Adaptec 2940UW with 2x Quantum HD
> > (Note: I was running with twin adapters for a while and trimmed to
> > one, which didn't eliminate the problems)
> > Kernel 2.0.32 with raid145-0.36.3-2.0.30.gz patch.
> > (Note: although the patch is for 2.0.30 it applied cleanly...)
> > raidtools-0.41
> >
> > The rest is a stock RedHat 4.2 distribution.
> >
> > My tests are the following:
> > --- test 1 ---
> > cd /usr/src/linux
> > while true; do make dep; make clean; make zImage; make modules; done
> > --- test 2 ---
> > while true; do cp -a /usr/src/linux /tmp/test; rm -r /tmp/test; done
> > --- test 3 ---
> > short c program that mallocs 10 mb and writes a random value to a random
> > spot in this buffer (keeps all 10mb swapping)
> > ----
> >
> > I ran test1 + test2 + (7 x test3) to stress test the system. Note, since
> > the system has only 64 MB this puts me about 20MB into swap.
> >
> > Here are the results: lots of these (for some reason my syslog has
> > disappeared, but there are a number of these errors, at least 40 over the
> > period of 12 hours)
> >
> > kernel: Internal error: bad swap device
> > kernel: rw_swap_page: weirdness
> > kernel: swap_free: weirdness
> > kernel: Trying to free non-existant swap page
> > kernel: Trying to swap to non swap device
> >
> >
> > One of:
> > Dec 2 10:40:08 tempiws kernel: Unable to handle kernel paging request at virtual address 081c8000
> > Dec 2 10:40:08 tempiws kernel: current->tss.cr3 = 039a9000, 8r3 = 039a9000
> > Dec 2 10:40:08 tempiws kernel: *pde = 00bbd067
> > Dec 2 10:40:08 tempiws kernel: *pte = 68747561
> >
> > And three oops (first two copied by hand):
> > CPU: 0
> > EIP: 0010 [<00123d7a>]
> > EFLAGS: 00010246
> > eax: 00001800 ebx: 52565253 ecx: 0381944c edx: 00000c00
> > esi: 00000bc1 edi: 00000000 ebp: bffffe60 esp: 03993f84
> > ds:0018 es:0018 fs:002b gs:0026 ss:0018
> > Process update (pid: 305, process nr:27, stackpage:03993000)
> > Stack 0031b810 00000000 00000000 00126c94 00000000 00000000 0031b810 00000000
> >       00000000 0031b810 00126df1 0031b810 00000001 0010a86d 00000001 00000000
> >       00000000 00000001 00000000 bffffe60 ffffffda 0000002b 0000002b 0000002b
> > Call Trace: [<00126c94>] [<00126df1>] [<0010a86d>]
> > general protection: 0000
> >
> > and ksymoops says:
> > Using `/usr/src/linux/System.map' to map addresses to symbols.
> >
> > >>EIP: 123d7a <sync_inodes+1e/58>
> > Trace: 126c94 <sync_old_buffers+14/13c>
> > Trace: 126df1 <sys_bdflush+35/98>
> > Trace: 10a86d <system_call+55/7c>
> >
> > Second oops:
> > CPU 0
> > EIP: 0010: [<0011ac2b>]
> > EFLAGS: 00010246
> > eax: 00000000 ebx: 00fd2bfc ecx: 00000400 edx: 02001000
> > esi: 00fae660 edi: ds001000 ebp: 0009ad98 esp: 00006f58
> > ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
> > Process init (pid 1; process nr: 1; stackpage=00006000)
> > Stack: 0011aa5x bfd98f68 00099414 00099414 fffffff3 00105025 0010a4f3 000998b4
> >        00105025 00105025 00111624 00099414 0009ad98 bfd98000 00000001 00111508
> >        00000002 0804bccc bfd9906c 00099414 0377f618 bfd99720 0010a9d0 00006fbc
> > Call Trace: [<0011aa5c>] [<0010a4f3>] [<00111624>] [<00111508>] [<0010a9d0>]
> > Code: f3 ab 0b 55 0c 89 54 24 18 89 54 24 1c 8b 44 24 18 0c 40 89
> >
> > and ksymoops says:
> > Using `/usr/src/linux/System.map' to map addresses to symbols.
> >
> > >>EIP: 11ac2b <do_no_page+1cf/328>
> > Trace: 11ac2b <do_no_page+1cf/328>
> > Trace: 10a4f3 <handle_signal+5b/90>
> > Trace: 111624 <do_page_fault+11c/310>
> > Trace: 111624 <do_page_fault+11c/310>
> > Trace: 10a9d0 <error_code+40/48>
> >
> > Code: 11ac2b <do_no_page+1cf/328> repz stosl %eax,%es:(%edi)
> > Code: 11ac2d <do_no_page+1d1/328> orl 0xc(%ebp),%edx
> > Code: 11ac30 <do_no_page+1d4/328> movl %edx,0x18(%esp,1)
> > Code: 11ac34 <do_no_page+1d8/328> movl %edx,0x1c(%esp,1)
> > Code: 11ac38 <do_no_page+1dc/328> movl 0x18(%esp,1),%eax
> > Code: 11ac3c <do_no_page+1e0/328> orb $0x40,%al
> > Code: 11ac3e <do_no_page+1e2/328> movl %eax,(%eax)
> > Code: 11ac40 <do_no_page+1e4/328> nop
> > Code: 11ac41 <do_no_page+1e5/328> nop
> > Code: 11ac42 <do_no_page+1e6/328> nop
> >
> > Third oops (this one got logged so the symbols are already here):
> > Oops: 0009
> > CPU: 0
> > EIP: 0010:[ext2_file_write+585/1116]
> > EFLAGS: 00010216
> > eax: 028c8598 ebx: 00000400 ecx: 00000100 edx: 034f2400
> > esi: 081c8000 edi: 034f2400 ebp: 00000400 esp: 038efc04
> > ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
> > Process cc1 (pid: 4190, process nr: 33, stackpage=038ef000)
> > Stack: 000c0000 0814b000 0814b000 0010b000 00000000 00000000 001d8dfc 0007d000
> >        00000000 00000210 00084000 00000000 028c8598 00000000 03bf8a00 038efc90
> >        00eab500 00008180 03bf8a00 00bd9798 038efc90 00eab500 00125e1a 00bd9798
> > Call Trace: [do_coprocessor_segment_overrun+4/60] [__brelse+34/68]
> >        [ext2_create+341/360] [dump_write+28/44] [writenote+167/200]
> >        [dump_write+28/44] [elf_core_dump+2488/2640] [do_no_page+620/808]
> >        [timer_bh+193/820] [do_signal+495/632] [signal_return+18/56]
> > Code: 64 f3 a5 83 e3 03 89 d9 64 f3 a4 55 8b 54 24 34 8b 52 24 03
> >
> >
> > Does anyone have an opinion on whether RAID 1 is ready to play with the
> > big boyz? Should I tuck this one away and try again in 6 months? Does
> > it look like processor/memory weirdness? Other experiences and/or
> > comments welcome.
> >
> > David Mansfield
> > david@cobite.com
> >
>
> _/ _/ _/ _/ _/ _/ _/ Brian Adams
> _/ _/ _/ _/_/ _/ _/ _/ adams@xws.com
> _/ _/ _/ _/ _/ _/ XWS/Sawtooth Technologies
> _/ _/ _/_/ _/_/ _/ _/ http://www.xws.com
> _/ _/ _/ _/ _/ _/ 509-427-4865
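
For anyone wanting to reproduce the "test 3" load from the quoted report, here is a rough sketch of what such a program might look like. The original source was not posted, so everything here is an assumption: the function names stress_buffer and random_offset are made up for illustration, the write count is bounded so the sketch terminates (the original presumably looped until killed), and the buffer size is a parameter rather than a fixed 10 MB.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pick a pseudo-random offset in [0, limit).
 * Two rand() calls are combined because RAND_MAX may be as small as
 * 32767, which would not reach the end of a 10 MB buffer. */
static size_t random_offset(size_t limit)
{
    size_t r = (size_t)rand();
    r = r * ((size_t)RAND_MAX + 1u) + (size_t)rand();
    return r % limit;
}

/* Hypothetical stand-in for "test 3": allocate `bytes`, touch every
 * page once, then write `writes` random byte values at random offsets.
 * Returns 0 on success, -1 if the size is zero or malloc fails. */
int stress_buffer(size_t bytes, unsigned long writes, unsigned int seed)
{
    unsigned char *buf;
    volatile unsigned char *p;
    unsigned long i;

    if (bytes == 0)
        return -1;

    buf = malloc(bytes);
    if (buf == NULL)
        return -1;

    /* Touch the whole buffer so the kernel has to back it with pages. */
    memset(buf, 0, bytes);

    /* Write through a volatile pointer so the stores are not optimized away. */
    p = buf;
    srand(seed);
    for (i = 0; i < writes; i++)
        p[random_offset(bytes)] = (unsigned char)(rand() & 0xff);

    free(buf);
    return 0;
}
```

To approximate the load described above, one would call something like stress_buffer(10 * 1024 * 1024, some_large_count, seed) from main() and start several copies at once, so that the combined working set exceeds physical RAM and the random writes keep pages moving in and out of swap.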