I think I finally have and idea of what's happening. The major device # may be confusing the RAID bootup process.
During boot the devices are listed properly but then they don't all get bound.
Feb 28 05:42:48 yeti kernel: sda: sda1
Feb 28 05:42:48 yeti kernel: sdb: sdb1
Feb 28 05:42:48 yeti kernel: sdc: sdc1
Feb 28 05:42:48 yeti kernel: sdd: sdd1
Feb 28 05:42:48 yeti kernel: sde: sde1
Feb 28 05:42:48 yeti kernel: sdf: sdf1
Feb 28 05:42:48 yeti kernel: sdg: sdg1
Feb 28 05:42:48 yeti kernel: sdh: sdh1
Feb 28 05:42:48 yeti kernel: sdi: sdi1
Feb 28 05:42:48 yeti kernel: sdj: sdj1
Feb 28 05:42:48 yeti kernel: sdk: sdk1
Feb 28 05:42:48 yeti kernel: sdl: sdl1
Feb 28 05:42:48 yeti kernel: sdm: sdm1
Feb 28 05:42:48 yeti kernel: sdn: sdn1
Feb 28 05:42:48 yeti kernel: sdo: sdo1
Feb 28 05:42:48 yeti kernel: sdp: sdp1
Feb 28 05:42:48 yeti kernel: sdq: sdq1
Feb 28 05:42:48 yeti kernel: sdr: sdr1
Feb 28 05:42:48 yeti kernel: sds: sds1
Feb 28 05:42:48 yeti kernel: sdt: sdt1
Feb 28 05:42:49 yeti kernel: [events: 000000bc]
Feb 28 05:42:49 yeti kernel: md: bind<sdo1,1>
Feb 28 05:42:49 yeti kernel: [events: 000000bc]
Feb 28 05:42:49 yeti kernel: md: bind<sdp1,2>
Feb 28 05:42:49 yeti kernel: [events: 000000bc]
Feb 28 05:42:49 yeti kernel: md: bind<sdn1,3>
Feb 28 05:42:49 yeti kernel: [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdb1,1>
Feb 28 05:42:49 yeti kernel: [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdc1,2>
Feb 28 05:42:49 yeti kernel: [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sda1,3>
Feb 28 05:42:49 yeti kernel: [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sde1,4>
Feb 28 05:42:49 yeti kernel: [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdf1,5>
Feb 28 05:42:49 yeti kernel: [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdg1,6>
Feb 28 05:42:49 yeti kernel: [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdh1,7>
Feb 28 05:42:49 yeti kernel: [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdi1,8>
Feb 28 05:42:49 yeti kernel: [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdj1,9>
Feb 28 05:42:49 yeti kernel: [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdk1,10>
Feb 28 05:42:49 yeti kernel: [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdl1,11>
Feb 28 05:42:49 yeti kernel: [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdd1,12>
Feb 28 05:42:49 yeti kernel: [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdm1,13>
Note that sdq,sdr,sds , and sdt do not get an md: bind entry
The raid ends up in /proc/mdstat with just sdn,o,p in it.
I then stop and restart the array and everything is OK.
Feb 28 05:53:30 yeti kernel: md: md4 stopped.
Feb 28 05:53:30 yeti kernel: md: unbind<sdn1,2>
Feb 28 05:53:30 yeti kernel: md: export_rdev(sdn1)
Feb 28 05:53:30 yeti kernel: md: unbind<sdp1,1>
Feb 28 05:53:30 yeti kernel: md: export_rdev(sdp1)
Feb 28 05:53:30 yeti kernel: md: unbind<sdo1,0>
Feb 28 05:53:30 yeti kernel: md: export_rdev(sdo1)
Feb 28 05:53:34 yeti kernel: [events: 000000bc]
Feb 28 05:53:34 yeti kernel: md: bind<sdo1,1>
Feb 28 05:53:34 yeti kernel: [events: 000000bc]
Feb 28 05:53:34 yeti kernel: md: bind<sdp1,2>
Feb 28 05:53:34 yeti kernel: [events: 000000bc]
Feb 28 05:53:34 yeti kernel: md: bind<sdq1,3>
Feb 28 05:53:34 yeti kernel: [events: 000000bc]
Feb 28 05:53:34 yeti kernel: md: bind<sdr1,4>
Feb 28 05:53:34 yeti kernel: [events: 000000bc]
Feb 28 05:53:34 yeti kernel: md: bind<sds1,5>
Feb 28 05:53:34 yeti kernel: [events: 000000bc]
Feb 28 05:53:34 yeti kernel: md: bind<sdt1,6>
Feb 28 05:53:34 yeti kernel: [events: 000000bc]
Feb 28 05:53:34 yeti kernel: md: bind<sdn1,7>
Feb 28 05:53:34 yeti kernel: md: sdn1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sdt1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sdn1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sdt1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sds1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sdr1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sdq1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sdp1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sdo1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md4: max total readahead window set to 3072k
Feb 28 05:53:34 yeti kernel: md4: 6 data-disks, max readahead per data-disk: 512k
Feb 28 05:53:34 yeti kernel: raid5: device sdn1 operational as raid disk 0
Feb 28 05:53:34 yeti kernel: raid5: device sdt1 operational as raid disk 6
Feb 28 05:53:34 yeti kernel: raid5: device sds1 operational as raid disk 5
Feb 28 05:53:34 yeti kernel: raid5: device sdr1 operational as raid disk 4
Feb 28 05:53:34 yeti kernel: raid5: device sdq1 operational as raid disk 3
Feb 28 05:53:34 yeti kernel: raid5: device sdp1 operational as raid disk 2
Feb 28 05:53:34 yeti kernel: raid5: device sdo1 operational as raid disk 1
Feb 28 05:53:34 yeti kernel: raid5: allocated 7483kB for md4
Feb 28 05:53:34 yeti kernel: md: updating md4 RAID superblock on device
Feb 28 05:53:34 yeti kernel: md: sdn1 [events: 000000bd]<6>(write) sdn1's sb offset: 36081856
Feb 28 05:53:34 yeti kernel: md: sdt1 [events: 000000bd]<6>(write) sdt1's sb offset: 36081856
Feb 28 05:53:34 yeti kernel: md: sds1 [events: 000000bd]<6>(write) sds1's sb offset: 36081856
Feb 28 05:53:34 yeti kernel: md: sdr1 [events: 000000bd]<6>(write) sdr1's sb offset: 36081856
Feb 28 05:53:34 yeti kernel: md: sdq1 [events: 000000bd]<6>(write) sdq1's sb offset: 36081856
Feb 28 05:53:34 yeti kernel: md: sdp1 [events: 000000bd]<6>(write) sdp1's sb offset: 36081856
Feb 28 05:53:34 yeti kernel: md: sdo1 [events: 000000bd]<6>(write) sdo1's sb offset: 36081856
I noticed that the major node is different starting at sdq.
brw-rw---- 1 root disk 8, 208 Feb 22 2002 /dev/sdn
brw-rw---- 1 root disk 8, 224 Feb 22 2002 /dev/sdo
brw-rw---- 1 root disk 8, 240 Feb 22 2002 /dev/sdp
brw-rw---- 1 root disk 65, 0 Feb 22 2002 /dev/sdq
brw-rw---- 1 root disk 65, 16 Feb 22 2002 /dev/sdr
brw-rw---- 1 root disk 65, 32 Feb 22 2002 /dev/sds
brw-rw---- 1 root disk 65, 48 Feb 22 2002 /dev/sdt
So it looks to me like the md driver doesn't like the raid system crossing the major device boundary for some odd reason.
Although I'm not sure why I can stop and restart after the system is totally up but not start it in rc.local using the same
commands:
We're not hitting the 27 maximum disks so that isn't it.
Looking thru the code in md.c I don't quite see where the problem is (or may be).
Michael D. Black mblack@csi-inc.com
http://www.csi-inc.com/
http://www.csi-inc.com/~mike
321-676-2923, x203
Melbourne FL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/