Re: 64-bit kdev_t - just for playing

Roman Zippel (zippel@linux-m68k.org)
Mon, 31 Mar 2003 23:32:55 +0200 (CEST)


Hi,

On Mon, 31 Mar 2003, Joel Becker wrote:

> > > Can you envision solutions based on 16 bit kdev_t infrastructure?
> >
> > I know that 16bit is getting small (but with dynamic assignment even
> > that is still enough for most people), but OTOH I don't understand the
> > obsession for 64bit. 32bit is more than enough on a 32bit system.
>
> There are big companies out there that require thousands of
> disks. Many don't want to use hardware raid, they just want JBOD.
> There are installations today with 2k-4k disks covering the gamut from
> massive databases to HPC to research facilities. Today.
> A 32bit dev_t with the 12/20 Linus split provides 64k minors.
> That's 16k disks with our current 15-partitions-per limit. But if
> someone wants 35 partitions (I've seen that somewhere) you're down to
> 8k. And the places using 2-4k disks today will be getting to 8k disks
> soon after 2.6 becomes usable, if not before. They will likely hit 16k
> disks before 2.6 becomes an afterthought.

Umm, I must be missing something, I get here 1024k minors, 64k disks with
15 partitions and 16k disks with 63 partitions.
Anyway, the sooner userspace accepts that dev_t number is just a number
the better. The major/minor split is useless information in userspace and
it becomes less and less important in the kernel.

> > Somehow it sounds that we just have to introduce a huge dev_t and all our
> > problems are magically solved and I doubt that. If people want to encode
>
> 64bit dev_t is not a panacea. However, if we have to do this
> change, we want to do it once. This only solves the space issue. All
> of the other issues, such as naming, are orthogonal to this change.
> Holding one change until everything else has been written is silly.

SCSI already enumerates the devices sequentially, without userspace tools
already now it's very painful to manage thousands of disk, but all the
kernel can do is to give you all the information about the disks it has
and the number you can use to access the disk. How you call that thing is
a userspace issue not a kernel issue.
A lot of information is already exported via sysfs, missing information
can be added. Changing the kernel to sequentially assign dev_t numbers
starting 0x10000, when they are registered via add_disk, is a rather
simple change. Then you have can have 16M disks with 256 partitions, that
should be enough for a while?
What mostly is missing now is the userspace code. I haven't seen a
request, that someone has simple daemon and needs now this and that
information from the kernel to map a disk reliably to a name.

> There is no conspiracy. Everyone agrees we need more dev_t
> space. Peter, Andries, and others see it like I do; we only want to do
> this once, and we already can see a day when 20bits of minors isn't
> enough.

If you only want to do it once, then do it right. There are 2 basic
questions, which have to be answered:
1. How do we want to manage dev_t numbers in the future?
2. What compromises can we make for 2.6?
Without answering these questions now, we risk to pay heavily for it
later. The ones who ask now for a larger dev_t the loudest are likely the
first to demand later not change anything for "compability", because they
hardcoded certain assumptions about dev_t into their applications.

bye, Roman

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/