[TriLUG] BSD/Linux firewall with multiple ISP and failover?

Ryan Leathers ryan.leathers at globalknowledge.com
Mon Jan 30 10:28:56 EST 2006


Here are some ideas:

To get top shelf multi-provider load balancing and fail-over, you could
look at several high dollar solutions.  BigIP is sorta the flavor o' the
month in this space currently.  This class of device does the magic of
routing traffic to and from a single address space via multiple upstream
networks without the complexities of BGP.

Of course, you could surely run BGP on Linux using zebra, or on BSD with
gated, but just running the protocol isn't the trick.  The tough part is
in setting it up for multi-homed non-transit.  In BGP parlance you have
a stub, and you don't want any traffic other than the stuff to you and
from you to go over your network.  Can you imagine the awful situation
if BigISP X started sending loads of traffic to BigISP Y via your tiny
little pipe?  Just that sort of thing happened in the Chicago area back
in the mid-90s when some goof misconfigured what should have been a
stub, and it caused serious consequences for hours all over the
Internet.  In short, you can't just pluck an ASN out of thin air, and no
ISP is likely to even consider this sort of arrangement with you for
anything on such a small scale.  You wouldn't want to be responsible for
it even if they did.   

Now, the neatest way to do this, in my opinion, is to use Ultra Monkey.
I've tried to convince others (who shall remain nameless in California)
that there should be much love for the Monkey, but alas...  In a
nutshell, you could use a couple of throw-away wimpy PCs running your
favorite Linux distro, dual-home them, throw a serial cable between
them, and be close to done.  Each of these hosts would connect to one of
your two ISPs.  Each would connect to your private network.  You could
force traffic of a certain type, or destination, or source, to prefer
one of the two paths, but always have full redundancy.  Pretty slick,
huh?  The serial cable carries the heartbeat, so each box notices when
its peer dies and can take over (STONITH fencing, if you want it, is a
separate piece).
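
For the "prefer one of the two paths" bit, here is a rough sketch of one
way to do it with iproute2 policy routing plus an iptables mark.  Ultra
Monkey doesn't set this up for you; the peer address 192.168.1.3, table
number 200, mark value 2, and the 192.168.1.128/25 source range are all
made up for illustration.  It only prints the commands so you can
eyeball them before running anything as root:

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# Every address and number below is a placeholder, not a real value.

PEER_GW=192.168.1.3   # hypothetical: inside address of the other director
ALT_TABLE=200         # hypothetical: spare routing table number
MAIL_MARK=2           # hypothetical: firewall mark for SMTP traffic

prefer_other_path() {
    src_net=$1
    # A table whose only default route points at the other director
    echo "ip route add default via $PEER_GW table $ALT_TABLE"
    # By source: hosts in $src_net use that table
    echo "ip rule add from $src_net table $ALT_TABLE"
    # By type: mark outbound SMTP, then route marked packets via that table
    echo "iptables -t mangle -A PREROUTING -p tcp --dport 25 -j MARK --set-mark $MAIL_MARK"
    echo "ip rule add fwmark $MAIL_MARK table $ALT_TABLE"
}

prefer_other_path 192.168.1.128/25
```

If the other director dies, whatever handles fail-over would also need
to pull these rules back out (ip rule del ...) so the traffic falls back
to the local path.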

If you are wisely running Debian </wry grin> it's as easy as this to get
started:
 http://www.ultramonkey.org/3/installation-debian.sarge.html

Most of the examples you will see will discuss a single Internet
connection and a single director, but there is absolutely no reason you
couldn't have two directors with two Internet connections - in fact, it's
really silly not to.  You'll want to front-end your directors with a hub
(at least) or a good VLAN-capable switch (better) so you can get the
traffic to the right spot on failure.

I suppose it's worthwhile to point out here that there are three
'classes' of fail-over we could talk about.  DNS round-robin is an
example of the top level.  It's the slowest, but nicely decoupled from
the details of the underlying network.  BGP operations, in part, fall
under this class.  The middle level is IP address takeover.  This is
faster, and most everything you'll look at involves this.  The bottom
level, which is fastest, is MAC address takeover (or whatever the layer
2 address in use is).  ARP spoofing / gratuitous ARP operations are an
example of how we get the speed of layer 2 to control the behavior of
layer 3.
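
As a rough sketch of IP address takeover getting some layer 2 help:
after claiming the address, send a few gratuitous ARPs so the neighbors
update their caches right away instead of waiting for old entries to
expire.  This assumes iproute2 and the iputils flavor of arping (-U is
its unsolicited-ARP mode); the address and interface are borrowed from
Jon's example quoted below.  It prints the commands rather than running
them:

```shell
#!/bin/sh
# Dry-run sketch of IP address takeover followed by gratuitous ARP.
# Prints the commands; run them by hand (as root) once they look right.

takeover_cmds() {
    ip=$1; prefix=$2; dev=$3
    # Claim the shared service address on this box
    echo "ip addr add $ip/$prefix dev $dev"
    # Announce it: unsolicited ARP so neighbors learn the new MAC for $ip
    # (-U is the iputils arping flag; other arping builds may differ)
    echo "arping -U -c 3 -I $dev $ip"
}

takeover_cmds 192.168.1.1 24 eth0
```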

On Sun, 2006-01-29 at 23:46 -0500, Jon Carnes wrote:
> On Sat, 2006-01-28 at 17:04, Aaron S. Joyner wrote:
> 
> > You want something that allows you to have multiple paths to the 
> > internet, and should one of those paths die, you want to switch to using 
> > the alternate path.  This is actually a very easy thing to do, and only 
> > requires a second ethernet interface in the firewall in question (note 
> > the word interface, not network card, as technically this could be done 
> > with a managed switch, vlans, and some craziness if you want to keep 
> > your existing hardware platform).  In short bullet points, assuming you 
> > want to use Linux, it'll go something like this:
> > 
> > 1 - Get one ISP working, DHCP, whatever is required.  Shut down that 
> > interface.
> > 2 - Set up the second interface, get that ISP's connection working, shut 
> > that interface down.
> > 3 - Pick a few reliable hosts on the internet, I'd choose 6, to use as a 
> > measure of connectivity.
> > 4 - Configure DHCP on the backup internet connection not to write the 
> > default gateway or resolv.conf.  It helps if this connection has a 
> > static ip / default gateway.
> > 5 - Bring up both interfaces, and things should work as expected.  Note, 
> > you won't properly respond to traffic on the secondary interface, but 
> > having that interface turned up shouldn't interfere with the primary 
> > interface.
> > 6 - Set up iproute2 policy routing so that traffic leaving your 
> > secondary interface uses the secondary ISP's default gateway.  If 
> > your primary is also static you may be able to do the same for the 
> > primary ISP, or at worst you can leave it in the default table.  
> > This is a common technique for multi-homed servers, see here: 
> > http://www.linuxjournal.com/article/7291 for more information.  This was 
> > the first Google result for the query ["ip rule" multihomed]; feel free 
> > to look for other sources on how to set up multihomed servers to get a 
> > better feel for using the 'ip rule' and 'ip route' commands.  
> > You will need a thorough understanding of these topics to complete 
> > steps 7 and 8.
> > 7 - Set up custom "ip rule"s to each of your test hosts, to ensure that 
> > traffic to that test host goes over the correct interface.
> > 8 - Write a short script which attempts to connect to each of your 
> > primary ISP's test hosts to confirm that the connection is up.  If those 
> > tests fail, try the secondary ISP's test hosts; if those succeed, change 
> > the default 'ip rule' to point to the other table (see docs referenced 
> > in step 6 for more detail).
> > 
> > Come back and post again if you can't get it working correctly.  :)
> > 
> > Good luck Greg,
> > Aaron S. Joyner
> 
> Hmmm, interesting but a bit complex. I prefer to simply have the
> secondary take over the IP address of the primary - when the primary
> goes down.
> 
> If the internal primary interface has address 192.168.1.1, then the
> fail-over firewall runs this line:
>   ifconfig eth0:0 inet 192.168.1.1 netmask 255.255.255.0
> 
> ===
> You could initiate the fail-over with a script that uses a simple ping
> to see if the Primary server is up...
> 
> #! /bin/bash
> #
> # Server_check: Run a minute by minute check of the
> #   Master server with internal address
> #   of 192.168.1.1 (and secondary internal address
> #   of 192.168.10.1), trigger Failover if
> #   Master goes off-line, trigger Backdown if
> #   Master goes back on-line. 
> #   Run via cron - every minute
> #     * * * * *   /usr/local/sbin/Server_check
> ######
> #
> # Check for existence of Trigger file
> #   0 = Normal (Master is fine)
> #   1 = Failover (This server has taken over)
> #
> if [ ! -f /root/config/trigger ]; then
>     echo 0 > /root/config/trigger
> fi
> 
> TRIGGER=`cat /root/config/trigger`
> 
> # Do three pings in a minute.  If all three pings fail
> #  j="xxx" and we fail the server over (if it's not
> #  already in that state).
> #  If one of the pings works then we assume that the
> #  Master is up and we return to normal (if we are not
> #  already in that state).
> #
> j=""; 
> for i in 1 2 3 ;
>    do 
>    ping -qc1 192.168.10.1 >/dev/null || j="x"$j; 
>    sleep 10; 
> done; 
> 
> if [ "xxx" = "$j" ]; then 
>   if [ ! "1" = "$TRIGGER" ]; then
>     /sbin/ifconfig eth0:0 inet 192.168.1.1 netmask 255.255.255.0
>     echo 1 > /root/config/trigger
>     echo "Primary Firewall has failed - Secondary taking over" |mail -s "ALERT: Primary Firewall is down" root
>   fi
> else
>   if [ "1" = "$TRIGGER" ]; then
>    # Master is back: move the alias off 192.168.1.1 so the Master owns it again
>    /sbin/ifconfig eth0:0 inet 192.168.1.11 netmask 255.255.255.0
>    echo 0 > /root/config/trigger
>   fi
> fi
> 
> ===
> I like this because the Fail-over server does all the checking. It uses
> a secondary network (192.168.10.0) that is shared with the Primary
> Firewall. All testing is done across the secondary network. This lets
> you manipulate the primary network (192.168.1.0) and move the gateway
> for that network anytime you want, while still letting you test to see
> if the Primary Firewall comes back up.
> 
> It's elegant and it works great.
> Good Luck - Jon Carnes
> 
> 
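
For what it's worth, steps 6 through 8 of Aaron's list quoted above
could be sketched roughly like this.  It assumes Linux with iproute2 and
the iputils ping (-W is its timeout in seconds).  The table numbers, rule
priorities, gateways, interfaces, and test hosts are all placeholders I
made up, so substitute your own.  The routing commands are only printed,
not run, and the decision logic is kept in its own function so it can be
checked without touching the network:

```shell
#!/bin/sh
# Sketch of steps 6-8.  All numbers and addresses below are placeholders.

T1=101; GW1=192.0.2.1;    IF1=eth1   # primary ISP (hypothetical)
T2=102; GW2=198.51.100.1; IF2=eth2   # secondary ISP (hypothetical)
HOSTS1="192.0.2.10 192.0.2.11 192.0.2.12"           # checked via primary
HOSTS2="198.51.100.10 198.51.100.11 198.51.100.12"  # checked via secondary

# Steps 6 and 7: print (do not run) the one-time routing setup.
setup_cmds() {
    echo "ip route add default via $GW1 dev $IF1 table $T1"
    echo "ip route add default via $GW2 dev $IF2 table $T2"
    p=100   # test-host rules must sort ahead of the catch-all at 1000
    for h in $HOSTS1; do echo "ip rule add pref $p to $h table $T1"; p=$((p + 1)); done
    for h in $HOSTS2; do echo "ip rule add pref $p to $h table $T2"; p=$((p + 1)); done
}

# Count how many of the given hosts answer a single ping.
count_reachable() {
    n=0
    for h in "$@"; do
        ping -c 1 -W 2 "$h" >/dev/null 2>&1 && n=$((n + 1))
    done
    echo "$n"
}

# Step 8 decision: which table should carry default traffic?
# $1 = primary test hosts reachable, $2 = secondary test hosts reachable
decide() {
    if [ "$1" -gt 0 ]; then echo "$T1"
    elif [ "$2" -gt 0 ]; then echo "$T2"
    else echo none   # both look dead: leave routing alone
    fi
}

# Print the commands that repoint the catch-all rule at table $1.
switch_cmds() {
    echo "ip rule del pref 1000"
    echo "ip rule add pref 1000 table $1"
}

# Example: primary test hosts all failed, two secondary hosts answered.
t=$(decide 0 2)
[ "$t" != none ] && switch_cmds "$t"
```

In a real cron job the two arguments to decide would come from
count_reachable $HOSTS1 and count_reachable $HOSTS2, and the printed
commands would be run instead of echoed.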