[WBEL-users] WBEL, Clustering, GFS, and redundancy
Kirby C. Bohling
kbohling at birddog.com
Fri Apr 22 15:59:20 CDT 2005
On Fri, Apr 22, 2005 at 12:59:50PM -0700, Benjamin Smith wrote:
> Does anybody here have experience with clustering on WBEL/RHES/AS?
Not particularly in the way you're talking about. However, I have
pondered plenty of downtime-reduction techniques for databases.
> I have a web-based application hosted on WBEL/PostgreSQL and am looking for
> the best way to provide even higher uptimes. (So far, 30 minutes of unplanned
> downtime in over a year)
Cool. We are roughly in that neighborhood for downtime. Our LDAP
auth server went down and caused the PAM LDAP modules to lock up a
production server. We waited too long hoping it would
recover or time out eventually. No such luck. That cost about an
hour, but that's pretty much it for the last two years.
> I'm having some difficulty fully understanding what GFS is actually all about.
> Is my understanding of GFS such that I could have two machines, each with
> their own HDD, and have it set up with GFS as a sort of cross-machine RAID1?
I believe this is incorrect. Last I read up on GFS, you really
needed shared access to all of the block devices. They used
SCSI-3 locks to arbitrate access and avoid race conditions. I
don't believe GFS provides any type of network transport (I could
be all wrong on this one; it's been 18-24 months since I last
looked into GFS). I believe it all works over block devices.
I thought you needed a multi-path SAN, FireWire, or SCSI equipment
where you connect the physical drives themselves to multiple
machines to have GFS work. I didn't think it was "NFS-like" in
that it provided a network transport on the backend.
If what you really want is "cross-machine RAID1", that sounds a lot
more like using "nbd" (Network Block Device) or "enbd" (Enhanced
NBD). Or iSCSI: with the right software you can make an IDE drive
look like an iSCSI drive available over the network.
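To make the nbd idea concrete, here's a rough sketch of mirroring a
local disk against a remote one. The hostnames, ports, and device
names are all made up, and this assumes the nbd-server/nbd-client
tools and mdadm are installed; treat it as an outline, not a recipe:

```shell
# On the remote machine: export a partition as a network block device.
nbd-server 2000 /dev/hdb1

# On the local machine: attach the remote export as /dev/nbd0 ...
nbd-client remote-host 2000 /dev/nbd0

# ... then build a RAID1 mirror across the local partition and the
# remote block device, and put a filesystem on the mirror.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hda1 /dev/nbd0
mkfs.ext3 /dev/md0
mount /dev/md0 /data
```

Writes to /dev/md0 then land on both machines, which is about as close
to "cross-machine RAID1" as you get without shared storage. Latency of
the network link becomes your write latency, though.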
> If I could save data like this, and then replicate the PostgreSQL database
> with Slony, I *think* I'd have a good solution for redundant servers at the
> colo with minimal downtime in the event of a failure of either system... Am I
Don't know much about Slony. It depends on whether you are looking
for hot failover. In an Oracle environment we have, we intentionally
keep the hot standby on a separate machine. You roll archive logs
over a network protocol (that is my rough understanding of what
Slony does; in Oracle it's just a remote archive log destination),
then apply the archive logs to the standby. Essentially you
permanently have a spare machine moments from ready to recover. All
you really lose is the transactions since the last archive logs. So
you roll archive logs as often as you can stand (they are a
performance hit in Oracle, so you have to balance the performance
cost against the potential data loss).
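For what it's worth, PostgreSQL 8.0 grew a WAL-archiving mechanism
that is the closer analogue to Oracle's remote archive destination
(my understanding is that Slony itself replicates at the SQL level
rather than shipping logs). A config sketch, with made-up paths and
hostnames:

```shell
# postgresql.conf fragment on the master (PostgreSQL 8.0+):
# ship each completed WAL segment to the standby as it fills.
archive_command = 'rsync -a %p standby:/var/lib/pgsql/wal_archive/%f'

# recovery.conf fragment on the standby: keep replaying the shipped
# segments, so it sits permanently in recovery mode.
restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
```

Same trade-off as Oracle: the more often segments roll, the less you
can lose, at some cost in overhead.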
I would assume that you would use GFS to provide more of an Oracle
RAC-like setup. Oracle has a specific filesystem just for this
purpose. I forget the details about it, but you can download it
and run FireWire drives in a clustered filesystem with it. Never
done it, but I heard some IT guys I know talking about it.
> Ideas/thoughts/pointers? Has anybody here set up stuff like that?
It really depends on what you want to set up, and what your
requirements are. With a hot standby, you still lose the
transactions since the last archive log, and you have to implement
failover (for me, that's a manual process, and involves having to
change an IP).
With a GFS-type setup, in theory if one machine dies you should be
able to just start up the database instance on another machine. As
long as it wasn't a disk failure, you are in business. The new
machine will just start up (normally the secondary would already be
powered up; you'd just start the database) and do whatever the
primary machine would have done in recovery. That way you don't
have any downtime while investigating what went wrong with the
first machine, and you don't have to swap parts. We nearly bought a
SAN just so we could accomplish this. In the end, the downtime
costs didn't justify the cost of the SAN, so we designed the hot
standby instead.
If you are worried about disk failures, that's just a matter of
throwing more disks at it. It seems to me that GFS is providing a
solution so that if the network, CPU, or memory go bad, you can
essentially "hot-swap" machines (hot swap is the wrong term, but
fail over very quickly) while keeping the data on the same disks.
> Currently, we have two servers, a "live" server, and a hot backup which
> synchronizes its filesystems periodically to the live server with rsync. This
> gives a good "worst case scenario" recovery plan, but doesn't give much
> support for lighter-grade failures.
rsync is a really bad idea for a running database; unless Postgres
has some really cool recovery mechanisms, you'll get a very
inconsistent database (hmm, rsync'ing an LVM snapshot could work).
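The LVM-snapshot variant would look something like the sketch below.
Volume group, snapshot size, and paths are all made up. The snapshot
gives you a crash-consistent image, so the copy restored on the
standby would come up as if the master had lost power, which Postgres
should be able to recover from:

```shell
# Freeze a point-in-time view of the data volume, copy it, discard it.
lvcreate --snapshot --size 1G --name pgsnap /dev/vg0/pgdata
mount -o ro /dev/vg0/pgsnap /mnt/pgsnap
rsync -a --delete /mnt/pgsnap/ standby:/var/lib/pgsql/data/
umount /mnt/pgsnap
lvremove -f /dev/vg0/pgsnap
```

Size the snapshot to hold all writes that happen during the rsync, or
the snapshot fills up and becomes invalid mid-copy.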
What you really want to do is set up a secondary machine that is in
permanent disaster-recovery mode. You keep applying the archive
logs as they get generated. It's my understanding that there are
several projects that provide such functionality for Postgres;
Slony is one of them.
Then anytime the master goes down, you just finish the disaster
recovery steps on the standby, swap the IPs, and it's done. You
might be able to set up some other neat stuff like CARP/VRRP, some
other type of network-layer transparency, or some other type of
application-level failover.
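The manual IP swap itself is only a couple of commands. A sketch,
with a made-up service address and interface; the gratuitous ARP at
the end is so clients and routers stop sending traffic to the dead
master's MAC address:

```shell
# On the standby, after finishing recovery: take over the service IP.
ip addr add 192.0.2.10/24 dev eth0

# Broadcast an unsolicited ARP so neighbors update their caches.
arping -U -I eth0 192.0.2.10
```

CARP or VRRP automate exactly this step, which is why they're worth a
look if you want the failover to be hands-off.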
Not stuff I've ever set up in a production environment, but
scenarios I've run through looking for cost-effective redundancy.
In my experience, unless your downtime is incredibly expensive, a
hot standby scenario is the best balance of cost and downtime.
Yeah, and lots and lots of practice. We practice the hot failover
in a dev/staging environment on a fairly regular basis.