So, someone mentioned that Linux is all about "technical" issues: things
get decided not on what people want but purely on a 'this is good and this
is bad technology' basis.
OK, here is a possible "technical" problem. I want to have 2 Linux boxes
(or more) connected to the same SCSI disks (twin-tailed or what have you).
I have 2 instances of the same software running, both accessing those
disks. The reasons are obvious: load balancing, spreading jobs across
nodes, and failover. If a node fails, at least the other instance still
has access to the disk and can RECOVER the data, because my logfiles are
also 'shared', so I can access the other node's logfiles and recover from
them.
We cannot use a filesystem, since we do not have a real distributed
filesystem yet (note we need performance here, so don't come with Coda and
what have you...).
How do we solve this on every OS other than Linux? We use raw devices,
since when a write returns, we know it's on the disk (there are no issues
with SCSI controller cache...). All committed writes are captured, and
whenever something needs to be recovered we have all the data needed in
the logfiles (also on raw devices, so they contain all the data).
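
To make the "log every committed write, recover from the log" idea a bit
more concrete, here is a rough sketch in Python. It is not how any real
database does it. Everything in it is an assumption for illustration: the
record layout (RECORD), the helper names (open_synced, append_log,
replay_log), and the use of ordinary files opened with O_SYNC as a
stand-in for raw devices. O_SYNC only asks the kernel to complete each
write synchronously; it says nothing about caches further down, and the
sketch assumes a well-formed log with no short reads or writes.

```python
import os
import struct

# Hypothetical log record header: target offset in the data area (8 bytes)
# followed by the payload length (4 bytes), little-endian.
RECORD = struct.Struct("<QI")


def open_synced(path):
    """Open (or create) a file so that each write completes synchronously."""
    return os.open(path, os.O_RDWR | os.O_CREAT | os.O_SYNC, 0o600)


def append_log(log_fd, log_pos, data_offset, payload):
    """Write one redo record at log_pos and return the next free position."""
    record = RECORD.pack(data_offset, len(payload)) + payload
    os.pwrite(log_fd, record, log_pos)
    return log_pos + len(record)


def replay_log(log_fd, log_end, data_fd):
    """Reapply every logged write to the data area, as a surviving node might.

    Returns the number of records applied.
    """
    pos = 0
    applied = 0
    while pos < log_end:
        header = os.pread(log_fd, RECORD.size, pos)
        data_offset, length = RECORD.unpack(header)
        payload = os.pread(log_fd, length, pos + RECORD.size)
        os.pwrite(data_fd, payload, data_offset)
        pos += RECORD.size + length
        applied += 1
    return applied
```

The point is only that the surviving node needs nothing but read access to
the other node's log and write access to the shared data area.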
The way the disks are shared depends on the hardware architecture; we do
not really care, as long as all nodes can access the disks. Even when a
node fails, the disk local to that node should still remain visible to the
others: like with RVSDs, another server takes over control.
This is a widely used setup, very important for availability and failover.
And the same architecture lends itself well to load balancing as well.
But... obviously it is of no importance to some folks... or at least, it's
easy to say: no, we don't want that (coz we don't need it ourselves?). But
what is the alternative solution? Short or long term? I haven't heard any
yet in the entire thread that is going on. Too bad...
There is also the fact that raw IO for databases IS faster. Whatever type
of filesystem you design doesn't matter, since we know which blocks to
write where. An index entry points to a specific block/file/slot, so it's
easy to calculate the offset in the 'file' ;) And except for full table
scans, the data is spread all over the place, so read-ahead into the
buffer cache doesn't do diddly squat in that case.
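
As a tiny illustration of the offset calculation, assuming fixed-size
blocks and fixed-size slots (the sizes and the names BLOCK_SIZE,
SLOT_SIZE, slot_offset and read_slot are made up for this sketch, not
taken from any real database), an index entry of the form (block, slot)
maps straight to a byte offset, and one positioned read fetches the row:

```python
import os

BLOCK_SIZE = 8192  # hypothetical block size in bytes
SLOT_SIZE = 128    # hypothetical fixed row-slot size in bytes


def slot_offset(block_no, slot_no):
    """Byte offset of a slot, given the block and slot from an index entry."""
    if block_no < 0 or not 0 <= slot_no < BLOCK_SIZE // SLOT_SIZE:
        raise ValueError("block or slot number out of range")
    return block_no * BLOCK_SIZE + slot_no * SLOT_SIZE


def read_slot(fd, block_no, slot_no):
    """Read exactly one slot with a single positioned read, no read-ahead."""
    return os.pread(fd, SLOT_SIZE, slot_offset(block_no, slot_no))
```

For example, block 3, slot 2 lands at 3 * 8192 + 2 * 128 = 24832. Random
lookups like this touch one small region each, which is why read-ahead
buys nothing for them.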
Whether raw devices make it or not is not the issue for me (altho I think
Stephen did a cool job;)... but I would like to hear solutions to the
above, if raw devices ain't the way to go technology-wise... If you cannot
give a solution, then what keeps you from implementing raw devices as a
short-term solution (2.3)?
Right now, we do not have the possibility to run parallel server on Linux,
not because we do not 'want' it but because Linux does not offer us a
solution. And the other OSes do. Clustering is not just 'Beowulf'... there
is more than that. DLMs and all that stuff are doable... but the disk
access is what we miss.
all flames > /dev/null.
Cheers,
Wim.
The statements and opinions expressed here are my own and
do not necessarily represent those of Oracle Corporation.
--=_ORCL_26555240_0_0
Content-Type:message/rfc822
Date: 15 Dec 98 18:00:00
From:Harald Milz <hm@seneca.muc.de>
To:linux-kernel@vger.rutgers.edu
Subject:Re: PATCH: Raw device IO for 2.1.131
Reply-to:UNX03.US.ORACLE.COM:h.milz@seneca.muc.de
Newsgroups:linux.dev.kernel
Organization:Linux.DE
Lines:40
Message-ID:<756i4d$r8r$3@seneca.ak.munich.ibm.com>
References:<Pine.LNX.3.95.981212134618.2288F-100000@penguin.transmeta.com> <75521q$3r1$1@seneca.ak.munich.ibm.com> <13942.46943.37424.521841@watcher.ptim.com>
Matthew Brown <mbrown@smartpages.com> wrote:
> 1) Linus and most of us are not being paid for this, and we are not
> doing this to sell. There *are* only technical issues.
C'mon. Linus and you are not being paid for this, but there are people out
there who make their living selling Linux. Linux just for fun, that was
quite a while ago.
> 2) I'm sorry, but part of being a professional *is* telling the
> customer what's good for them. It's called professional
> integrity.
As a service professional, I learnt a) to listen to the customer's wishes,
b) not to argue against them, and c) to talk to development if my
product doesn't fulfil them. This is proven to work.
> 3) Isn't 'telling the customer what's good for them' exactly what
> Oracle et al. would be doing here?
No. I said they could argue "well we do support Linux but it's suboptimal
compared to HP, Sun because we can't use raw devices".
> 4) Linus has made it clear a number of times on this issue that he
> doesn't think raw disk device access is a good idea, and he's not
> prepared to add such a feature to the kernel just because 'everyone
> else does it'.
Yes I can read. My opinion is different, though, and not only mine.
> Well, you could still use a partition like /dev/sda1 for a database --
> the only issue is that this goes through the buffer cache instead of
> reading or writing directly to the disk.
Which is exactly what Oracle et al. are not going to do - for obvious
reasons. This is not an "issue" but a showstopper. But you were kidding
anyway, weren't you?
--
Error in operator: add beer
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
--=_ORCL_26555240_0_0--