Too Busy For Words - the PaulWay Blog

Tunnelling PVM, or Clustered Computing Without IPSEC

I've experimented with PVM (Parallel Virtual Machine) for a long while. Like MPI, it's an abstraction of the message passing, data translation and task scheduling mechanisms that drive a parallel computation, so that the separate processes can be on separate processors or separate machines on a LAN. It grew up in the 'good old days', when your only firewall was the one that protected you from the outside (and sometimes not even that): each server you add to the PVM group communicates on a UDP port somewhere in the upper half of the port range. The port is chosen randomly when your initial PVM daemon starts up the PVM daemon on the new server, and is then 'shared knowledge' amongst the group that your PVM programs (unwittingly) use when sending work to the remote machine.

In these enlightened days, of course, running a machine without iptables blocking everything that you don't explicitly trust would be considered grounds for sectioning by some. Of course, if you're running a PVM cluster, you would normally put all the machines on your local LAN, put a big firewall at the door (or, even better, remove any connection between the LAN and the internet), remove iptables and anything that might slow the network down, and go for it. Fair enough.

Recently, here at the CLUG, we have a loaner IBM Power5 machine, and the irrepressible Steve Walsh has a 60 day free trial of a Sun Sunfire. It's more multiprocessor power than I've ever had ssh access to, but of course not only do I have to do something useful with this power (rather than just running distributed.net), but both machines are on the other side of campus, so they're both naturally running firewalls with everything locked down. Neither Steve nor Bob is going to be interested in opening UDP access to the upper half of the port range for a system which looks suspiciously like Sun RPC. Nor are they going to be interested in setting up IPSEC to VPN my machine to theirs. PVM, after all, runs as a user process for a good reason - separation of privilege. Let's keep it that way.

Now the applications I've written and the ones I'm trialling exclusively do a scatter/gather operation - a 'master' node starts a bunch of 'slave' nodes, sends them all a bunch of work to do, and collects the results. The slaves only communicate with the master. Other topologies are, of course, possible in PVM, but the tasks I'm working on (Mandelbrot sets and sequence comparison, for example) are suited to this star topology. This means that, as far as I'm concerned, I only have to provide a pipe between the master and each slave, which may be a job for ssh's port forwarding capabilities.
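The star topology described here can be illustrated without PVM at all. What follows is a minimal sketch in Python, with a thread pool standing in for PVM's slave tasks. None of the function names (escape_count, slave_task, master) come from PVM, and the tiny Mandelbrot grid is only there to give the slaves something to compute.

```python
from concurrent.futures import ThreadPoolExecutor


def escape_count(c, max_iter=50):
    """Number of iterations before z = z*z + c leaves the radius-2 disc."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def slave_task(row, width, height, max_iter):
    """Work done by one 'slave': compute a single row of the image."""
    y = -1.5 + 3.0 * row / (height - 1)
    counts = [
        escape_count(complex(-2.0 + 3.0 * col / (width - 1), y), max_iter)
        for col in range(width)
    ]
    return row, counts


def master(width=16, height=8, workers=4, max_iter=50):
    """Scatter one row per task, then gather the rows back in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Scatter: every task goes out from the master.
        futures = [
            pool.submit(slave_task, row, width, height, max_iter)
            for row in range(height)
        ]
        # Gather: every result comes back to the master only.
        gathered = dict(future.result() for future in futures)
    return [gathered[row] for row in range(height)]
```

The point of the sketch is the shape of the traffic: every message has the master at one end, so only master-to-slave links would need tunnelling.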

PVM, when starting the daemon remotely, ssh's in (using keys to avoid password prompts) and starts the remote daemon. I renamed my ssh to /usr/bin/real-ssh, put a script in as /usr/bin/ssh that echoed the date and the command line to a file, and then called the real ssh. (I also managed to create a recursive bomb when I forgot to change the script to use the real ssh program instead of itself... but that's typical for me.) The command line it uses is:

ssh remote $PVM_ROOT/lib/pvmd -s -d0x0 -nremote 1 c0a81701:8665 4080 12 c0a817fb:0000

($PVM_ROOT/lib/pvmd is actually a script which finds the correct pvmd binary, makes sure that its environment is correct, and runs the daemon. It includes the amusing little note:

export PVM_ARCH
# make a joyful noise.

)
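For reference, a logging wrapper of the kind that captured that command line can be sketched as follows. This is an illustration in Python rather than the original script: the log file path is hypothetical, and only /usr/bin/real-ssh is taken from the description above.

```python
import os
import shlex
from datetime import datetime

REAL_SSH = "/usr/bin/real-ssh"     # where the genuine binary was moved to
LOG_FILE = "/tmp/ssh-wrapper.log"  # hypothetical log location


def log_invocation(args, log_path=LOG_FILE, now=None):
    """Append the date and the ssh command line to the log; return the line."""
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    command = " ".join(shlex.quote(arg) for arg in args)
    line = f"{stamp} ssh {command}\n"
    with open(log_path, "a") as log:
        log.write(line)
    return line


def main(argv):
    """Log the arguments, then replace this process with the real ssh."""
    log_invocation(argv[1:])
    # Exec REAL_SSH, never this wrapper's own path: calling itself here
    # is exactly how the recursive bomb happens.
    os.execv(REAL_SSH, [REAL_SSH] + argv[1:])
```

Installed in place of /usr/bin/ssh, the file would end with a call to main(sys.argv) (plus an import of sys). It is left out here so the functions can be tried without exec'ing anything.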

-nremote is the name of the remote machine's daemon; my guess is that -s is the flag to say "you're the remote side over there". I guess that the 1 after -nremote is the number of processors the remote side has. Everything else is unexplained; there doesn't appear to be any documentation on the flags to the pvmd3 binary. So it's time to trawl the source, and try some experiments.

So far I've discovered that the :8665 part is the UDP port of the source machine, in hex. The UDP port of the remote machine must be communicated back through the ssh connection, because when I filter ssh traffic out of the Ethereal capture and add the host, the first thing I see is the local machine talking to the remote one on the new high port. I'll have to look at being more invasive with my fake ssh, and try to log the entire session bidirectionally. Oh goody.
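Decoding those hex tokens by hand gets tedious, so here is a small helper sketch. It assumes the eight hex digits before the colon are an IPv4 address in network byte order. That is a guess from their shape, not something confirmed from the pvmd source, and decode_endpoint is a made-up name.

```python
import ipaddress


def decode_endpoint(token):
    """Split 'hexaddr:hexport' into a dotted-quad string and a port number."""
    addr_hex, port_hex = token.split(":")
    # Assumption: the address is a 32-bit IPv4 address written in hex.
    address = ipaddress.IPv4Address(int(addr_hex, 16))
    port = int(port_hex, 16)
    return str(address), port


print(decode_endpoint("c0a81701:8665"))  # ('192.168.23.1', 34405)
print(decode_endpoint("c0a817fb:0000"))  # ('192.168.23.251', 0)
```

Under that assumption the first token is a private LAN address with a port in the upper half of the range, which fits the behaviour described earlier. The second token has a zero port, which might mean "not yet assigned", but that is speculation.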

This may all be futile anyway: ssh's tunnelling abilities only act in the way of a proxy. Since the remote PVM daemon is wanting to send UDP packets to the local machine directly, and not via another port on the remote machine, it may be that I can't do what I want with ssh anyway. Even if I added the ability to use a proxy for the remote end (which is stretching the bounds of my C hacking skills), I'd still have to make the local PVM process use a proxy, and possibly a different proxy per connection. And it still doesn't cover the case of true multiprocessing, where messages pass between slaves. Maybe IPSEC is the only way to go here...
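To make the proxy problem concrete: ssh's -L and -R port forwarding carries TCP streams, so any proxy would have to wrap each UDP datagram so that its boundaries survive the trip through a byte stream. The following is a minimal sketch of one way to do that, using a two-byte length prefix. The framing format and the function names are invented for illustration and have nothing to do with PVM's own wire protocol.

```python
import struct

HEADER = struct.Struct("!H")  # 2-byte big-endian length prefix


def frame(datagram):
    """Prefix one datagram with its length so it can travel in a stream."""
    if len(datagram) > 0xFFFF:
        raise ValueError("datagram too large for a 2-byte length prefix")
    return HEADER.pack(len(datagram)) + datagram


def unframe(buffer):
    """Peel complete datagrams off the front of a received byte buffer.

    Returns (datagrams, remainder); the remainder is a partial frame that
    needs more bytes from the stream before it can be decoded.
    """
    datagrams = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # the rest of this datagram has not arrived yet
        datagrams.append(buffer[offset + HEADER.size:end])
        offset = end
    return datagrams, buffer[offset:]
```

This covers only the encapsulation. Each end would still need a relay listening on UDP and feeding the tunnel, and the daemons would still have to be persuaded to talk to the relay rather than to each other, which is the hard part described above.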


Sat 29th Apr, 2006
