Re: catch-22 in current partitioning layer

Luc Van Oostenryck (luc.vanoostenryck@easynet.be)
Wed, 30 Oct 2002 20:45:43 +0100


Peter T. Breuer wrote:
> Kernel 2.5.44
>
> 1) a module which needs to create a device needs to call add_disk
> in it's init, or it seems the kernel won't allow opens of the
> device node.
>
> * But it can't call add_disk normally without invoking
> * check_partitions (escape, do so with capacity=0 or minors=1)
> * which will stall as it needs to read the device or its
> * backer on disk, and we're still in init ...
>
> This hits NBD. Pavel's kernel NBD escaped (luckily?-) because his
> device is not partitionable, and so he called add_disk with
> minors=1 even though capacity was set to maximum. But he narrowly
> missed hanging in init.
>
> It's not clear how we're supposed to /subsequently/ go out and find
> some partitions over NBD, or how to avoid the races in doing so,
> since we are already sort of open for business. HEEEEELP. My own
> ENBD is being hit by this because it is partitionable.
>
> Can someone document the new genhd and partitions interfaces?
> I'm confused as anything, and fed up hunting for disappeared things
> that now seem to be available via strings of pointer references off
> of new structs (how do I get to rescan_partitions, for example!).
>
> 2) Partitions, ah yes, partitions.
>
> Well, seems the kernel is similarly not permitting opens of minors
> until we've run the partition check in add_disk. That's good
> encapsulation if you are planning on doing i/o to them, but I
> was planning on doing ioctls! And I don't think it's fair to
> stop me doing ioctls to the minors. Well, we can argue that, but it
> will break my userspace tools, and I'd rather not introduce
> incompatibilities between kernel versions.
>
> In any case, I don't think it's fair to equate minors with
> partitions. Partitions are ONE use of minors. I use minors as
> indices, to indicate which channel to a device should be used.
> The device has its partition structure, and any partition can
> be served (I'm speaking of requests) via any channel, i.e.,
> any minor.
>
> * My catch 22 is that I can't start using the channels/minors to
> * talk to the remote device (ENBD) until I've opened the device
> * and done the partition search via add_disk, which won't work
> * until the channels are open ...
>
>
> Help. I'm still fairly sane, but this is hurting.
>
> Ideally, what I would like, is to be able to open the minors for
> ioctls before allowing them to be open for r/w as partitions. Then
> I don't need to do any changes. I open NOBLOCK for ioctls, if
> that's any help.
>
> Now, what's with this register_region business? Can I maybe open 256
> devices on the same major, with 1 minor each, just to enable
> communications to the remote, then read the remote partition table, then
> unregister them all, then register them all again this time with 16
> minors per device, as I originally wanted? No! The partition check
> will fail, because there are too many partitions (>0) !
>
> Help.
>
> OK, I know what the userspace solution is, but I don't want to do it:
> Make all ioctls go the the whole disk minor. And keep them there. And
> even then, how am I supposed to rescan the device for partitions and
> avoid the races that must be in progress for access to those partitions
> at startup?
>
>
>
>
> Peter
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

I suppose this is related to the fact that with 2.5.44 (also in -ac),
trying to mount an unformated floppy give the following oops:

Unable to handle kernel NULL pointer dereference at virtual address 00000290
printing eip:
c01483bc
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0060:[<c01483bc>] Not tainted
EFLAGS: 00010202
EIP is at do_open+0x2bc/0x470
eax: c039ba08 ebx: f7f78c00 ecx: 000002a0 edx: f78c4a80
esi: f7881ec0 edi: f7881e60 ebp: c03114cc esp: f16f9d54
ds: 0068 es: 0068 ss: 0068
Process mount (pid: 159, threadinfo=f16f8000 task=f7921980)
Stack: 00000000 f7881ec0 f16f9e14 00000003 f7881ee0 00026c6c f7f78c00 00000000
00000000 021c9df4 00000000 0000001c c01485de f7881ec0 f78c4a80 f16f9e14
00000000 00000000 00000003 f7881ec0 00000000 00000000 f78c4a80 00000000
Call Trace:
[<c01485de>] blkdev_get+0x6e/0x80
[<c0146dba>] get_sb_bdev+0xaa/0x220
[<c017ee2f>] vfat_get_sb+0x1f/0x30
[<c017ed50>] vfat_fill_super+0x0/0xc0
[<c01470d0>] do_kern_mount+0x50/0xc0
[<c015b1b5>] do_add_mount+0x65/0x150
[<c015b4aa>] do_mount+0x16a/0x190
[<c015b2ed>] copy_mount_options+0x4d/0xa0
[<c015b9c5>] sys_mount+0xb5/0x130
[<c01072ef>] syscall_call+0x7/0xb

Code: 8b 41 f0 0b 41 f4 75 23 8b 46 44 ff 48 54 8b 46 44 8d 48 20

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/