[TriLUG] debugging DNS with proprietary VPN inside OpenVPN

Tom Roche Tom_Roche at pobox.com
Tue Nov 18 13:00:08 EST 2014


# summary

Networking beginner (me) needs to SSH from his Debian laptop, through a firewall, into login nodes on a compute cluster. The firewall requires a proprietary F5 SSL VPN, from which the laptop formerly connected directly to the cluster. However, cluster admins now require a public static IP# (et al), which I seek to provide via a linode jumpbox running OpenVPN, through which the F5VPN should tunnel.

With the OpenVPN client and server running, I can resolve public hostnames and log in to the F5VPN's remote-access site (which starts the F5VPN), with the world seeing the correct public/static IP#. With the OpenVPN client running, the F5VPN client seems to start correctly and allows login. But DNS fails immediately after I log in to the F5VPN, so I cannot resolve the private/intranet hostnames of the cluster login nodes, which in turn prevents SSH connection.

How to fix or further debug?

Apologies if the following is too long: I'm trying to include relevant detail. Feel free to skip to section='specific problem', and backtrack as needed.

# table of contents

# details
## background
## problem
### general problem
### specific problem
## diagnosis
### DNS problem
### routing problem
### support problem
## questions

# details

## background

I'm a student using data and other computing resources provided by a federal agency's compute clusters in RTP. For reasons unknown to me, a student accessing agency resources must be a federal contractor, aka a "business partner." The clusters are firewalled, but for 2 years I was able to SSH from home through the firewall to the clusters, using

* different 64-bit laptops owned/admin'ed by me, running Debian (various flavors)
* my ISP's connection hardware:
    * originally: a cablemodem provided by ISP=TWC
    * currently: a 4G modem provided by ISP=FreedomPop
* the agency-designated F5 SSL VPN (only available on Linux via an ancient 32-bit Firefox plugin[1]), aka the "F5NAP" (== "network access plugin")
* an agency-provided RSA SecurID

Note that F5 (the vendor)

- apparently does not provide a headless/non-GUI means to run this VPN--at least, not with the F5 versions which my agency apparently runs[2]
- is proprietary
- seems quite Windows-oriented (as is my agency)
- seems (in my experience) not very responsive to third-party users (such as myself)

On my laptop, the process to connect via the F5VPN was

1. Start a specially-configured, F5NAP'ed 32-bit Firefox, autobrowsing to the agency's remote-access URI.
2. Log in using a password partly generated by the SecurID.
3. Wait until the F5NAP put up a second, smaller Firefox window with login status=success. The small window also provided some UI for VPN status and to start/stop it.
4. For each cluster I wanted to access: from a terminal/shell, start an SSH session to its login host, using its FQDN hostname (which is not DNS-able off-LAN without the VPN).

For completeness, and since I reference it below: the process to disconnect from the F5VPN was

1. End the SSH session (`logout` or C-d).
2. Stop the F5VPN via UI in the F5NAP's smaller window.
3. Log out from the F5VPN: hit link=Logout in the F5NAP's large/main window (which auto-whacked the small window).
4. Kill the 32-bit Firefox. I noticed that it seemed to lose the ability to reconnect after a short while. Since the 32-bit Firefox was relatively quick to restart from a script--it only ran one tab, for the remote-access URI--I got into the habit of killing it after each remote session.

## problem

### general problem

3 months ago, other agency contractors (there are lots!) implemented new security requirements for "remote business partners":

1. IP#s: one may connect to the clusters only from a registered IP address/number. While I am not sure whether the agency intends for the connecting IP#s to be static, the registration process seems to be sufficiently slow/painful that one will not want one's cluster-connecting IP# to change.
2. host security: several requirements, all specified by ... XP screenshots[3].
3. all the old requirements: notably, to use the F5NAP.

While waiting for approval for my Linux host security proposal (i.e., how to provide the desired protections without running XP), I set up a linode jumpbox as recommended by Kevin Otte et al[4]. The intents of this design (IIUC) were

1. The linode provides a static public IP# and an OpenVPN server (et al).
2. My laptop provides an OpenVPN client and the F5NAPed Firefox GUI.
3. Before each cluster session, start an OpenVPN tunnel, run the F5NAP through the OpenVPN tunnel, then run SSH through the F5 VPN.

I got the linode set up, with the networking portion of the setup documented here[5], as best I could without access to the endpoint clusters; access was held up by the security contractors' refusal to specify what constituted a secure Linux host. Eventually (thanks to some of my management) I contacted someone "up the food chain" who forced the issue with the security contractors. I have now been approved to re-access the clusters, and can once again (3 months after getting disconnected) log in to the agency's remote-access URI.

Unfortunately, re-approval to access the clusters does not imply ability to access the clusters :-(

### specific problem

My current OpenVPN configuration (documented here[6]) allows me to

1. (on my linode) Start the OpenVPN server, which runs to `Initialization Sequence Completed`.

(all following on my laptop)

2. Start the OpenVPN client, which runs to `Initialization Sequence Completed`.
3. Start the F5NAP'ed 32-bit Firefox.
4. Browse to http://www.whatismyip.com/ and see the linode's IP#. (This works with either the F5NAP'ed Firefox or the laptop's normal/64-bit Firefox, since the OpenVPN client is currently configured to route all traffic through the OpenVPN server.)
5. Browse to the agency's remote-access URI and log in using the SecurID'ed password. (Login is blocked--i.e., the login web UI is unavailable--when not running the OpenVPN tunnel, which is further indication that the remote-access website is seeing the "correct" public/static IP#.)
6. At the resulting webpage, hit the link to start the F5VPN, which puts up the magic second/small window, with contents=success.

So at this point I appear to have successfully started both the OpenVPN and the F5VPN. Note that "F5VPN up" != "F5NAP up": for the F5VPN to run, one must both

* have a web browser up and running the F5NAP
* use that browser to log in to the F5VPN remote-access URI

End of goodness :-(

7. Try to start an SSH session using any of my previously-working SSH configs. Fails with DNS error like

    > ssh: Could not resolve hostname <FQDN of login node/>: Name or service not known

8. In same laptop terminal, under the same conditions (i.e., both the OpenVPN and F5VPN up), my DNS is broken: e.g., `nslookup www.google.com` fails with

    > ;; connection timed out; no servers could be reached

9. Disconnect from the F5VPN (but not the OpenVPN, which remains up): hit link=Logout in the F5NAP'ed Firefox, then kill it.
10. Test networking: `nslookup` resumes working correctly, using DNS primary server=8.8.8.8 (configured in my OpenVPN setup[7]).
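One way to see what changes between step 2 (OpenVPN only) and step 6 (F5VPN up) is to snapshot the resolver and routing state at each point and diff the results. The following is a minimal sketch, not a tested procedure for this setup: the function name `snapshot_net_state` and the default output directory `/tmp/netstate` are hypothetical, and it assumes the F5VPN makes its changes in `/etc/resolv.conf` and/or the kernel routing table.

```shell
#!/bin/sh
# Sketch: record resolver and routing state so two snapshots can be diffed.
# snapshot_net_state and /tmp/netstate are hypothetical names.
snapshot_net_state() {
    label="$1"                      # e.g. "openvpn-only" or "f5vpn-up"
    outdir="${2:-/tmp/netstate}"    # where snapshots are written
    mkdir -p "$outdir" || return 1
    {
        echo "### /etc/resolv.conf"
        cat /etc/resolv.conf 2>/dev/null || echo "(unavailable)"
        echo "### ip route show"
        ip route show 2>/dev/null || echo "(unavailable)"
        echo "### ip addr show"
        ip addr show 2>/dev/null || echo "(unavailable)"
    } > "$outdir/$label.txt"
    # Print the path of the snapshot just written.
    echo "$outdir/$label.txt"
}
```

Usage would be `snapshot_net_state openvpn-only` after step 2, `snapshot_net_state f5vpn-up` after step 6, then `diff -u /tmp/netstate/openvpn-only.txt /tmp/netstate/f5vpn-up.txt`. Any new nameserver lines, new interfaces, or replaced default routes in the diff are candidates for what breaks DNS.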

## diagnosis

### DNS problem

Before the new security requirements, I did not have the DNS problem: after starting the F5VPN, I was able not only to {`nslookup`, SSH to} cluster login nodes, but also to use most services normally from my laptop. SMTP was blocked (very annoyingly), but (e.g.) I was able to browse to the usual websites, stream last.fm, etc.

While configuring OpenVPN, I had a similar DNS problem (i.e., complete `nslookup` failure after starting OpenVPN on client) until I put the following lines (et al) in my linode:/etc/openvpn/server.conf (see complete listing here[7]):

    # choose DNS server(s) depending on your location
    # `nslookup 8.8.8.8` -> 8.8.8.8.in-addr.arpa  name = google-public-dns-a.google.com
    # TODO: determine if we need these, or can/should rely on 10.8.0.1
    push "dhcp-option DNS 8.8.8.8"
    push "dhcp-option DNS 8.8.4.4"
    # next line required by https://www.linode.com/docs/networking/vpn/secure-communications-with-openvpn-on-ubuntu-12-04-precise-and-debian-7#tunneling-all-connections-through-the-vpn to fix ultra-VPN routing
    # (i.e., not provided by https://wiki.debian.org/openvpn%20for%20server%20and%20client )
    push "dhcp-option DNS 10.8.0.1"

I also note (with caveat) the following sequence:

1. Immediately after booting my laptop, its DNS primary server=192.168.15.1 == my FreedomPop modem. (This is from memory: I should reboot and recheck.)
2. After I successfully start the OpenVPN client on the laptop (which implies having the OpenVPN server running on the linode) for the first time post-laptop-boot, the laptop's primary DNS server=8.8.8.8 (as set in linode:/etc/openvpn/server.conf).
3. After stopping the OpenVPN client on the laptop, the laptop's primary DNS server remains unchanged (i.e., 8.8.8.8) until laptop is rebooted.

This seems like a bug (ISTM stopping the OpenVPN client should restore the previous DNS config), though *very* minor (and probably due to configuration error by me, probably of the way the up and down scripts handle `resolvconf`). I mention it only because that observation, plus the observed F5VPN DNS failure, leads me to believe that

1. The F5VPN was successfully pushing a DNS server reference back to the laptop in "the good old days," which allowed me to resolve hostnames inside the firewall (and connect to them via SSH) as well as outside the firewall.
2. Now (in "the broken new days") the F5VPN is pushing *something* back to the laptop, but that something is completely broken: with the F5VPN up, I can't even `nslookup` the public internet, much less inside the firewall.

Am I missing something? If not, how to fix or further debug?
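To separate "the F5VPN pushed a DNS server that cannot be reached" from "no DNS traffic gets out at all," each configured nameserver can be queried directly. The following is a minimal sketch: the function names `list_nameservers` and `probe_nameservers` are hypothetical, and it assumes that the resolver configuration lives in `/etc/resolv.conf` and that `nslookup` exits nonzero when a lookup fails.

```shell
#!/bin/sh
# list_nameservers [FILE]: print the addresses on "nameserver" lines of a
# resolv.conf-style file (default: /etc/resolv.conf). Commented-out lines
# are skipped because their first field is "#".
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# probe_nameservers NAME [FILE]: ask each listed server directly for NAME
# and report which ones answer.
probe_nameservers() {
    name="$1"
    for server in $(list_nameservers "$2"); do
        if nslookup "$name" "$server" >/dev/null 2>&1; then
            echo "$server answers for $name"
        else
            echo "$server does NOT answer for $name"
        fi
    done
}
```

With the F5VPN up, `probe_nameservers www.google.com` shows which of the currently configured servers respond. Running `nslookup www.google.com 8.8.8.8` alongside it tests the server pushed by OpenVPN explicitly: if that also times out, the problem is more likely reachability (routing or filtering) than the contents of `/etc/resolv.conf`.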

### routing problem

Kevin Otte recommended[4] "[configuring] the laptop to route traffic to vpn.federal.gov over the Linode VPN," with artwork like

    +--------+ 10.0.100.2  10.0.100.1 +--------+ 192.0.1.1  198.51.100.1 +-----------------+
    | laptop | ---------------------- | linode | ----------------------- | vpn.federal.gov |
    +--------+                   tun0 +--------+ eth0                    +-----------------+

where (IIUC) `vpn.federal.gov`==the F5VPN "endpoint" to which my laptop connects via the F5VPN. The problem is, I don't know to what the F5VPN is connecting, which makes it difficult for me to `ip route add` ... or am I missing something?

Nor do I know how to make the F5NAP tell me what it's connecting to--it seems like a blackbox, and probably a deliberately proprietary one.

That being said, I don't know that there *is* a routing problem currently: I don't see clear evidence of one, at least not in the way that I have clear evidence of the DNS problem. I can hypothesize that the DNS problem is due to the F5VPN endpoint being unable to route something back to my laptop, but I don't know enough about how the F5VPN works to un/substantiate that hypothesis. (My lack of sufficient networking chops is probably also hurting me here.)
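Even if the F5NAP itself is a blackbox, the laptop's socket table may reveal what it connects to. The following is a minimal sketch: the function name `peer_addresses` is hypothetical, and it assumes (unverified) that the F5VPN tunnel is carried over TCP to port 443 on the endpoint, since it is an SSL VPN.

```shell
#!/bin/sh
# peer_addresses PORT: read `ss -tn` output on stdin and print the unique
# remote addresses of connections whose remote port is PORT.
# Assumes the last column of each data line is "Peer Address:Port".
peer_addresses() {
    awk -v port="$1" '
        NR > 1 {                           # skip the header line
            peer = $NF                     # last column: Peer Address:Port
            n = split(peer, parts, ":")
            if (parts[n] == port) {
                sub(/:[0-9]+$/, "", peer)  # drop the trailing :port
                print peer
            }
        }' | sort -u
}
```

Running `ss -tn | peer_addresses 443` once before and once after starting the F5VPN, and comparing the two lists, should show any address that appears only when the F5VPN is up. For each such address, `ip route get ADDRESS` reports which interface and gateway the laptop would use to reach it, which indicates whether that traffic goes through the OpenVPN tunnel.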

### support problem

Worse, my experience with the relevant "support personnel" (at both F5 and the agency) makes me expect that

- the people who would be willing to help me won't know how the F5VPN works (e.g., to what the F5VPN is connecting)
- the people who should know to what the F5VPN is connecting (at either F5 or the agency) won't be willing to help me

because

1. Aside from the folks who actually run the clusters (which are of course all Linux), the agency is definitely a "Windows shop." Particularly the "frontline" IT folks: non-security IT people are mostly quite friendly, but don't seem particularly knowledgeable (MCSAs at best).
2. The agency's security contractors' position seems to be: give no information unless forced to do so. They also seem completely Windows-centric. (This[3] is not an aberration in my experience--I've never seen agency doc for even Mac users, of which there are quite a few home users.)
3. F5 seems to have very limited doc for anything but Windows, and particularly poor Linux support.
4. F5 seems to be set up to support the folks who are running their stuff (e.g., admins in the bowels of the agency) and not the folks who are using their stuff (e.g., me).

## questions

First off, what am I missing? Is there anything obviously wrong with my analysis? Obviously I don't know nearly enough about networking, so I'm hoping I'm missing something simple ...

If I seem to be understanding the situation mostly correctly: how to fix this, or to further debug? Particularly, given the support problem (above),

1. How much relevant information can I determine myself?
2. For what information must I ask? ISTM "minimizing my ask" probably maximizes my likelihood of getting a response from inside the agency.

My guess is, I need to fix the F5VPN DNS problem first. Is there a simple way to do that, without knowing what the F5VPN (aka the F5NAP'ed Firefox running on the laptop) is doing DNS-wise? Alternatively, can I set up/instrument my networking so as to discover what the F5VPN is doing? E.g., is there a way to see "what is going on" in the F5NAP'ed Firefox? Some kind of console?

Similarly, to fully set up "end-to-end" routing rules for this usecase (i.e., from my laptop to the F5VPN endpoint) ISTM I would need to know the "real name" of the F5VPN endpoint (represented by `vpn.federal.gov` in the diagram above)--correct? Do I need to fix the routing to fix the DNS?

your assistance is appreciated, Tom Roche <Tom_Roche at pobox.com>

[1]: See, e.g., https://support.mozilla.org/en-US/questions/931873 , https://support.f5.com/kb/en-us/products/big-ip_apm/manuals/product/apm_compatibility_matrix_10_2_1.html
[2]: See, e.g., https://devcentral.f5.com/questions/how-to-run-f5-network-access-on-a-64-bit-linux , https://support.mozilla.org/en-US/questions/931873
[3]: http://drive.google.com/file/d/0BzDAFHgIxRzKRU56MEVFTHA5RUU/view?usp=sharing
[4]: http://www.trilug.org/pipermail/trilug/Week-of-Mon-20140825/072109.html
[5]: https://bitbucket.org/tlroche/linode_jumpbox_config/wiki/Home . FWIW, I've also got notes on setup for AIDE and ClamAV, and will port to that wiki Real Soon Now.
[6]: https://bitbucket.org/tlroche/linode_jumpbox_config/wiki/OpenVPN_install , with tunnel test @ https://bitbucket.org/tlroche/linode_jumpbox_config/wiki/OpenVPN_install#rst-header-client-test . Both are linked to/from https://bitbucket.org/tlroche/linode_jumpbox_config/wiki/Home .
[7]: https://bitbucket.org/tlroche/linode_jumpbox_config/wiki/OpenVPN_install#rst-header-user-session-2

