[WBEL-devel] Ideas for improving availability
John Morris
jmorris@beau.org
Fri, 3 Dec 2004 17:55:06 -0600 (CST)
On Thu, 2 Dec 2004, Jon Lewis wrote:
> > All but one update is now on the primary site awaiting the mirrors to
> > catch them. The missing one is httpd and considering the problems it has
>
> What is the "primary site"?...or is that a closely guarded secret to keep
> it (the library?) from being overrun by mobs of WBEL users who should be
> using the mirrors?
Well, since I suspect anyone who is reading -devel could figure it out,
http://www.whiteboxlinux.org/pub will get you to it via http. But
remember that it is at the end of a single T-1, so don't expect a lot of
throughput.
> I'd really like to set up a public mirror of WBEL, but I'd rather not
> mirror from a 3rd party mirror (like NCSU). Is NCSU the only mirror that
> talks to the primary site?...making them the tier 1 mirror that all other
> mirrors have to mirror if they want the most authoritative source they can
> get?
That is the current situation. Clearly, after catching up on email and
seeing the chaos that erupted while I was away, something more redundant
is needed. Short term, it is clear a second mirror needs to pull directly
from here, and the rest of the mirrors need to be divided between those
two, so a failure on one will still leave half of the mirror sites with
valid data.
Idea #1 would be to have a reliable European site begin to update via
rsync from here and have the other European mirrors pull from it. That
means less traffic on long-haul links when things are working, but it will
throw a LOT of transatlantic traffic when something goes wrong.
Idea #2, of course, is to randomize who pulls from whom, with the obvious
advantages and disadvantages.
Is there a consensus as to which way would be best?
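
For comparison, here is a minimal Python sketch of the two assignment
schemes. The mirror names, region labels, and upstream names are all
hypothetical; nothing here reflects the real WBEL mirror list.

```python
import random

def assign_by_region(mirrors, upstream_for_region, default_upstream):
    """Idea #1: each mirror pulls from the upstream serving its region."""
    return {name: upstream_for_region.get(region, default_upstream)
            for name, region in mirrors.items()}

def assign_randomly(mirrors, upstreams, seed=None):
    """Idea #2: shuffle the mirrors, then deal them out round-robin so
    each upstream feeds roughly the same number of mirrors."""
    rng = random.Random(seed)
    names = sorted(mirrors)
    rng.shuffle(names)
    return {name: upstreams[i % len(upstreams)]
            for i, name in enumerate(names)}

# Hypothetical mirror names and regions, for illustration only.
MIRRORS = {"mirror-a": "eu", "mirror-b": "eu",
           "mirror-c": "us", "mirror-d": "us"}
```

With the regional scheme, losing the European upstream strands every
European mirror at once; with the random deal, a failure strands about
half of the mirrors regardless of where they are.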
Longer term, I see a need for some sort of cron job that checks the mirrors
and rewrites the up2date mirror lists in
http://whiteboxlinux.org/up2date-mirrors to include only the mirrors with
the most complete set of packages. I can see a fair amount of thought going
into that little puppy to handle as many of the pathological cases as
possible. And considering how often Fedora's mirror system still falls
down, perhaps it can be made general purpose enough that they could benefit
from it?
A few ideas to get things started....
It should pull a file list from each mirror and compare it to the primary
site's list. Some sort of metric needs to be calculated for each mirror.
Then, taking into account the average bandwidth of each mirror and a guess
at the current WBEL traffic, a list gets built starting with the best site
and adding in less complete ones until enough are included to cover the
expected load, or just adding all of them if they are complete.
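
A rough Python sketch of that list-building step follows. It is one
reading of the scheme, not a worked-out design: the metric is simply the
fraction of the primary's files a mirror has, bandwidth and expected load
are assumed to be in the same (unspecified) units, and every fully
complete mirror is always included. All names are hypothetical.

```python
def completeness(mirror_files, prime_files):
    """Fraction of the primary site's files that the mirror also has."""
    prime = set(prime_files)
    if not prime:
        return 1.0
    return len(set(mirror_files) & prime) / len(prime)

def build_mirror_list(mirrors, prime_files, expected_load):
    """mirrors maps a mirror name to (file_list, bandwidth).

    Returns mirror names, best first. Complete mirrors are always
    listed; less complete ones are added only until the summed
    bandwidth covers expected_load.
    """
    scored = sorted(
        ((completeness(files, prime_files), bw, name)
         for name, (files, bw) in mirrors.items()),
        key=lambda t: (-t[0], -t[1], t[2]))
    chosen, capacity = [], 0.0
    for score, bw, name in scored:
        if score < 1.0 and capacity >= expected_load:
            break
        chosen.append(name)
        capacity += bw
    return chosen
```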
Some obvious problems with that scheme are:
1. There is currently NO feedback from mirrors as to how much traffic
they have served.
2. This means there is no practical way to even guesstimate the average
aggregate bandwidth needed, even if up2date traffic could be separated from
.iso pulls, etc.
3. Calculating a completeness metric would be tricky. A site with a .hdr
file and no .rpm doesn't have the goods, but what if it is only missing
the .src.rpm? What if it is only missing the very latest updates but
has a boatload of bandwidth? What if files in the base distro are
missing for some reason? Once I get around to populating the extras
directory, are its packages as important as updates?
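
One way to make those questions concrete is a weighted metric, sketched
below in Python. Everything in it is an assumption made for illustration:
the weights, the top-level directory names ("base", "updates", "extras"),
and the idea that a .hdr file sits next to its .rpm and differs only in
the extension.

```python
# Hypothetical weights; how much each kind of file should count is
# exactly the open question.
WEIGHTS = {"updates": 1.0, "base": 1.0, "extras": 0.5, "source": 0.1}

def classify(path):
    """Pick a weight category from the path (assumed layout)."""
    if path.endswith(".src.rpm"):
        return "source"
    top = path.split("/", 1)[0]
    return top if top in ("updates", "base", "extras") else "base"

def usable(path, present):
    """A .hdr only counts if the matching .rpm is on the mirror too."""
    if path.endswith(".hdr"):
        return path in present and path[:-4] + ".rpm" in present
    return path in present

def weighted_completeness(mirror_files, prime_files):
    """Weighted share of the primary's files the mirror can serve."""
    present = set(mirror_files)
    total = have = 0.0
    for path in set(prime_files):
        weight = WEIGHTS[classify(path)]
        total += weight
        if usable(path, present):
            have += weight
    return have / total if total else 1.0
```

With these weights, a mirror missing only a .src.rpm barely drops, while
one holding a .hdr without its .rpm is penalized for both files.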
Of course, in a perfect world the mirror lists would get regenerated on the
fly by a .cgi that looks up the source IP and prefers mirror sites close to
it, but that is just a dream, since I doubt our server would handle that
load and that sort of detailed IP map is probably too expensive.
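
Just to show the shape of that dream, a small Python sketch follows. The
prefix table is a hand-written stand-in for a real IP map, using
documentation-only address ranges, and the mirror URLs are hypothetical.

```python
import ipaddress

# Hypothetical prefix-to-region table; a real one would be far larger.
REGION_BY_PREFIX = {
    ipaddress.ip_network("192.0.2.0/24"): "us",
    ipaddress.ip_network("198.51.100.0/24"): "eu",
}

# Hypothetical mirrors as (url, region) pairs.
MIRRORS = [
    ("http://mirror-us.example.org/wbel", "us"),
    ("http://mirror-eu.example.org/wbel", "eu"),
]

def region_for(client_ip):
    """Return the region for an address, or None if it is unknown."""
    addr = ipaddress.ip_address(client_ip)
    for net, region in REGION_BY_PREFIX.items():
        if addr in net:
            return region
    return None

def mirror_list_for(client_ip):
    """List mirror URLs with those in the client's region first."""
    region = region_for(client_ip)
    # sorted() is stable, so mirrors keep their order within each group.
    ordered = sorted(MIRRORS, key=lambda m: m[1] != region)
    return [url for url, _ in ordered]
```

Under CGI, the client address would come from the REMOTE_ADDR environment
variable, and the script would print the resulting list.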
--
John M. http://www.beau.org/~jmorris This post is 100% M$ Free!
Geekcode 3.1:GCS C+++ UL++++$ P++ L+++ W++ w--- Y++ b++ 5+++ R tv- e* r