WebAuth: Guide to Fail-Over Management
CESNET
technical report number 14/2006
Petr Grolmus, Zdeněk Šustr
12.12.2006
1 Reliable WebAuth-Based Solutions
WebAuth is, as its name indicates, an authentication system for web pages and web-based applications. Its primary goal is to provide single sign-on (SSO), i.e. to allow the identity of an authenticated user to be shared among multiple web-based applications.
This Report introduces one of several possible ways of making a WebAuth infrastructure more robust and stable. It assumes that the reader has basic knowledge of the WebAuth system, acquired either through experience or by studying available materials such as [Gro05] or the WebAuth home page. The Report does not cover WebAuth or SSO basics; these are sufficiently described in [Gro05] and [GS05].
Before the WebAuth system [Gro05] can be put into operation successfully, it is necessary to ensure that it is reliable. Overall reliability is determined above all by the reliability of the WebKDC logon server, which is the core of the whole system. Reliability can obviously be increased by adding more logon servers. It is certainly possible to set up multiple logon servers and to configure different applications to contact different WebKDCs. However, isolated WebKDCs cannot provide a single sign-on environment, which is our main goal. It is therefore necessary to commission multiple WebKDC servers and to configure them so that they act uniformly as a single authentication server (a cluster), regardless of which of the servers the user has actually contacted.
Several servers may be configured to share a common DNS name, such as webkdc.zcu.cz, with the actual network address of a server provided to the client by the DNS Round-Robin technique.
DNS Round-Robin allows a name server to answer a single query with multiple addresses. One hostname (or service name) may be entered into the name server's configuration repeatedly with different addresses; some name server implementations also allow priorities and weights to be defined for individual addresses. When answering a query, the name server orders the relevant IP addresses according to the configured priority and weight, returning addresses of equal priority in a different order each time. It is up to the client to process the list of addresses, i.e. to try the second one if the first one fails, then the third one if necessary, and so on.
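The client-side behavior that Round-Robin relies on (moving to the next address when one fails) can be sketched in POSIX shell as follows. This is an illustration only: the function name try_addresses, the curl-based probe, and the use of getent are assumptions; real clients implement this logic inside their resolver or HTTP library.

```shell
#!/bin/sh
# try_addresses PROBE ADDRESS...
# Calls PROBE with each ADDRESS in the order given and prints the first
# address for which PROBE exits with status 0. Returns 1 if none succeeds.
try_addresses() {
    probe=$1
    shift
    for addr in "$@"; do
        if "$probe" "$addr"; then
            printf '%s\n' "$addr"
            return 0
        fi
    done
    return 1
}

# Example probe (an assumption for illustration): open an HTTPS connection
# to webkdc.zcu.cz at one specific address while keeping the original host
# name, so that the TLS certificate is still checked against it.
https_probe() {
    curl --silent --output /dev/null --connect-timeout 3 \
         --resolve "webkdc.zcu.cz:443:$1" https://webkdc.zcu.cz/
}

# Usage on a Linux client (getent prints each address once per socket
# type, hence the de-duplication in awk):
#   try_addresses https_probe $(getent ahostsv4 webkdc.zcu.cz | awk '!seen[$1]++ { print $1 }')
```

A client that "tries the first address and then gives up" corresponds to calling the probe only once; the loop is exactly what such clients lack.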
Unfortunately, when the system was being designed, there was concern that not all clients the system needed to support could process DNS replies containing multiple addresses correctly. Some clients try the first address and then give up. This makes the Round-Robin technique insufficient and calls for a more sophisticated solution that compensates for the shortcomings of such clients by shifting more of the load-balancing control to the name server.
The problem may be solved by a modified DNS server that not only answers standard DNS queries but also balances the load among several servers. This requirement is met, for example, by lbnamed (a "load-balanced named", i.e. a DNS server combined with load-balancing techniques), a tool written in Perl. Besides the DNS server itself, lbnamed relies on a poller component that periodically contacts the servers and checks not only their availability but also their current load (server load, number of users). To make this information available, each server must run the lbcd daemon, which answers the UDP requests sent by the poller. The information gathered by the poller is used to keep track of server availability. Should any of the servers become inaccessible, its IP address is no longer given as an answer to DNS queries. Similarly, should the load on any of the servers exceed a certain threshold, its IP address is offered in answer to proportionally fewer DNS queries than the addresses of the other servers in the cluster.
lbnamed always answers a query with a single IP address: the one that seems most suitable in terms of availability and load.
The TTL (Time to Live) of all DNS responses provided by lbnamed is set to 0. The TTL defines how long an answer remains valid; with a TTL of zero, the client may use the address immediately but must not keep it for future use. Whenever the client needs to resolve the server (cluster) name again, it has to contact the name server and get a new answer. This ensures that the address is not cached and that the user always receives the address of the logon server that is currently most suitable for processing the request.
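The decision described above can be illustrated by a much-simplified, hypothetical sketch: given the results of one polling round, drop the servers that did not answer and return a single A record with TTL 0 for the best remaining one. The input format, the load/weight score, and the function name pick_server are assumptions; lbnamed's actual scoring is not reproduced here, and the sketch always returns the single best-scoring server rather than spreading answers proportionally.

```shell
#!/bin/sh
# pick_server NAME < poll-results
# Hypothetical poller output, one line per server:
#   <hostname> <ip-address> <weight> <load>
# where <load> is "-" when the server did not answer the last poll.
# Prints one A record for NAME with TTL 0 pointing to the answering server
# with the lowest load/weight ratio; exits with status 1 if none answered.
pick_server() {
    awk -v name="$1" '
        $4 == "-" || $3 <= 0 { next }   # unreachable or zero weight: never advertised
        {
            score = $4 / $3             # a higher weight makes a server preferred
            if (best == "" || score < best_score) {
                best = $2
                best_score = score
            }
        }
        END {
            if (best == "") exit 1
            # TTL 0: the answer may be used once but must not be cached
            printf "%s. 0 IN A %s\n", name, best
        }'
}
```

For example, feeding it the lines "webkdc1 192.0.2.1 1 0.9", "webkdc2 192.0.2.2 1 0.2" and "webkdc3 192.0.2.3 2 -" with NAME set to webkdc.lb.zcu.cz yields a record for 192.0.2.2, since webkdc3 did not answer and webkdc2 is the less loaded of the remaining two.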
The Figure shows a diagram of a reliable (fault-tolerant) WebAuth solution. When several DNS servers are used, it is advisable, if not necessary, to connect them to different network segments, preferably relying on different network hardware. This makes the system resistant to partial network failures, DNS server hardware faults, etc. The load-balancing DNS servers (lbdns1.zcu.cz and lbdns2.zcu.cz) do not exchange availability information with each other; each of them gathers it independently by polling the logon servers.
The Figure does not cover the authentication itself, which is described in the documentation referred to above. All steps described here take place before the user is authenticated and authorized.
1. The user tries to access a WebAuth-enabled application.
2. The application does not know the user and redirects the user's browser to the WebAuth logon server, in our case named webkdc.zcu.cz.
3. The client does not know the corresponding IP address, so it sends a DNS query: "What is the IP address of webkdc.zcu.cz?"
4. The DNS server checks its configuration and finds that webkdc.zcu.cz is an alias for host webkdc in the lb.zcu.cz domain ("lb" = load-balanced), which is served independently by two load-balancing DNS servers, lbdns1 and lbdns2 (these run the lbnamed tool described earlier to monitor the logon servers). The client contacts both servers asking for the address.
5. Both servers, working independently, check the availability and load of the individual logon servers, pick the most suitable address, and send it back to the client as the answer.
6. The client uses only one address: the first one it receives from either of the servers (lbdns1 or lbdns2).
7. (Not shown in the Figure.) The client uses the answer (whose TTL is set to zero) to contact the logon server and then discards it. Further communication takes place over the WebAuth protocol.
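Steps 3 to 7 can be observed with the dig utility, whose +noall +answer options print only the answer section, one record per line in the form "name TTL class type data". The helper below is a hypothetical sketch (its name is an assumption) that checks whether such an answer ends in exactly one A record with TTL 0, as the load-balancing servers are expected to return.

```shell
#!/bin/sh
# check_lb_answer < dig-answer-section
# Reads lines in dig's "name TTL class type data" format and, if they contain
# exactly one A record whose TTL is 0, prints its address and succeeds.
check_lb_answer() {
    awk '
        $4 == "A" { n++; addr = $5; ttl = $2 }
        END {
            if (n != 1 || ttl != 0) exit 1
            print addr
        }'
}

# Usage: ask the site resolver for the public name (the CNAME is followed),
# then ask each load-balancing server directly.
#   dig +noall +answer webkdc.zcu.cz | check_lb_answer
#   dig @lbdns1.zcu.cz +noall +answer webkdc.lb.zcu.cz | check_lb_answer
#   dig @lbdns2.zcu.cz +noall +answer webkdc.lb.zcu.cz | check_lb_answer
```

Querying lbdns1 and lbdns2 separately is useful because, as described above, they decide independently and may legitimately return different addresses.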
Not all DNS servers need to run the load-balancing suite. The best solution is to create a virtual DNS zone, such as lb.zcu.cz, served by one or more lbnamed-enabled servers. Adding the following records to the DNS configuration of the zcu.cz zone ensures that DNS queries concerning the lb.zcu.cz domain are served by the load-balancing servers lbdns1.zcu.cz and lbdns2.zcu.cz:
lb IN NS lbdns1.zcu.cz.
lb IN NS lbdns2.zcu.cz.
Users need not see the extra subdomain label: a DNS alias (CNAME record) in the parent zone can map a shorter name to the load-balanced one. For example, webkdc.zcu.cz may be used to represent webkdc.lb.zcu.cz:
webkdc IN CNAME webkdc.lb.zcu.cz.
Finally, the lbnamed load-balancing cluster itself needs to be configured. The main lbnamed configuration file (sweet.config) maps actual servers to the names registered in the lb.zcu.cz load-balancing domain. Each record occupies a separate line and has the format "<hostname> <weight> <cluster>". Increasing the weight of a server designates it as preferred; weights may, for example, reflect the processing capacity of each server. One configuration file may define multiple load-balancing clusters. The sweet.config records for the example setup described above would be as follows:
webkdc1 1 webkdc
webkdc2 1 webkdc
webkdc3 1 webkdc
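Because one sweet.config file may define several clusters, a small helper that lists the members of one cluster can be handy when checking the configuration. This is a hypothetical sketch based only on the three-field record format described above; the function name and the rule that weights must be positive integers are assumptions, not documented lbnamed behavior.

```shell
#!/bin/sh
# cluster_members CLUSTER < sweet.config
# Prints "<hostname> <weight>" for every "<hostname> <weight> <cluster>"
# record that belongs to CLUSTER, and warns on standard error about records
# whose weight is not a positive integer (an assumed validity rule).
cluster_members() {
    awk -v cluster="$1" '
        NF == 3 && $3 == cluster {
            if ($2 !~ /^[0-9]+$/ || $2 == 0) {
                printf "warning: line %d: bad weight \"%s\"\n", NR, $2 > "/dev/stderr"
            } else {
                print $1, $2
            }
        }'
}

# Usage:
#   cluster_members webkdc < sweet.config
```

For the records above, the command prints the three webkdc servers, each with weight 1; records of other clusters in the same file are ignored.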
1.1 Synchronizing WebKDC AES Private Keys
SSO provided by a cluster of logon servers works only if all WebKDC servers share identical private keys. WebKDC servers use these keys to encrypt the proxy tokens issued to users as cookies. Proxy tokens contain the users' Kerberos tickets and are essential to SSO. Since DNS load balancing may direct a returning user to a different logon server than the one that issued the cookie, every logon server in the cluster must be able to decrypt cookies issued by any other.
With a single logon server, the lifetime of the private AES key may be set in the WebKDC server configuration, and key updates are carried out directly by the mod_webkdc module. With a cluster of servers, however, this function must be disabled, since it would make each WebKDC server replace its keys independently of the others. Key updates then have to be initiated "from the outside", with automatic private key updates disabled on all servers in the main Apache configuration file for the mod_webkdc module (see [Gro05]):
WebKdcKeyringAutoUpdate off
Fortunately, the WebAuth developers anticipated the multiple-WebKDC scenario and support it with wa_keyring, a private key management tool. One of the servers may be designated as the "master", which performs regular key updates initiated by cron (for example, once a month). The new key takes over from the original one, but the key lifespans should overlap by at least the standard lifetime of a proxy token (i.e. that of the user's Kerberos ticket), so that tokens encrypted with the old key can still be decrypted. Among other functions, wa_keyring provides a garbage collector that removes expired keys (such as keys older than a given number of days) from the keyring file.
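The overlap requirement can be checked with simple arithmetic. The sketch below assumes that "wa_keyring gc" drops keys whose validity started more than GC days ago, that the servers always encrypt with the newest valid key, and that the cron job adds a key every ROTATE days. Under these assumptions the delay passed to "wa_keyring add" cancels out: a superseded key stays in the keyring for at least (GC - ROTATE) days, and that period must cover the proxy-token lifetime. The function name is hypothetical.

```shell
#!/bin/sh
# check_key_overlap ROTATE_DAYS GC_DAYS TOKEN_HOURS
# ROTATE_DAYS: interval between cron runs that add a new key
# GC_DAYS:     age passed to "wa_keyring gc" (60 for "gc -60d")
# TOKEN_HOURS: maximum proxy-token (Kerberos ticket) lifetime in hours
# Prints the guaranteed number of spare hours and succeeds only if it is
# non-negative, i.e. a superseded key outlives every token encrypted with it.
check_key_overlap() {
    rotate_days=$1
    gc_days=$2
    token_hours=$3
    # A superseded key remains in the keyring for at least
    # (GC_DAYS - ROTATE_DAYS) days; subtract the token lifetime from that.
    spare=$(( (gc_days - rotate_days) * 24 - token_hours ))
    printf '%s\n' "$spare"
    [ "$spare" -ge 0 ]
}
```

Taking a month as 31 days, the 60-day garbage collection used in the script below, and a hypothetical 10-hour ticket lifetime, check_key_overlap 31 60 10 prints 686, a comfortable margin; collecting keys after only 30 days with a 30-day rotation would leave a negative margin.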
The keys generated by the master server then need to be distributed securely to the other servers in the cluster. This may be done, for example, with scp, the secure copy tool. To allow automatic key propagation without entering a password each time, an SSH key pair without a passphrase is generated on the master server with ssh-keygen, and its public key is installed on each of the other servers. The step-by-step procedure is as follows:
- Log on to the master server and generate a private/public key pair:
ssh-keygen -t dsa
- Generate the keys without a passphrase and store them in files, preferably the default /root/.ssh/id_dsa and /root/.ssh/id_dsa.pub respectively.
- Append (!) the contents of the public key file (id_dsa.pub) to the /root/.ssh/authorized_keys file on each of the other servers in the cluster.
- From the master server, log on manually as root to each of these servers once, to confirm that no password is requested and to accept the server's host key, which would otherwise block the unattended copying.
- Configure the master server's crontab to run the following script:
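#!/bin/sh
# Key rotation for the WebKDC cluster, run from the master server's crontab
KEYRING=/etc/webkdc/keyring
# Drop all keys older than 60 days; stop if the keyring cannot be updated
/usr/bin/wa_keyring -f "$KEYRING" gc -60d || exit 1
# Generate a new key with a validity period starting in 2 days, so that the
# copies below reach all servers before the key is first used
/usr/bin/wa_keyring -f "$KEYRING" add 2d || exit 1
# Copy the keyring file to the other servers in the cluster; a failed copy
# is reported (cron mails stderr) without skipping the remaining servers
for host in webkdc2.zcu.cz webkdc3.zcu.cz; do
    scp "$KEYRING" "root@$host:/etc/webkdc/" || echo "copy to $host failed" >&2
done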
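The one-time key distribution steps listed above can also be scripted. The sketch below is hypothetical: the function name, the RUN dry-run switch, and the use of ssh-copy-id (an OpenSSH helper that appends a public key to the remote authorized_keys file) are assumptions rather than part of the original procedure. It also uses an ed25519 key instead of the DSA key shown above, because OpenSSH releases since version 7.0 no longer accept DSA keys by default.

```shell
#!/bin/sh
# One-time setup on the master server (a sketch of the manual steps above).
# Creates a passphrase-less key pair for root unless one already exists,
# installs its public half on every host given as an argument, and then
# logs in once non-interactively to confirm that password-less access works.
# Set RUN=echo to print the commands instead of executing them.
RUN=${RUN:-}
KEY=${KEY:-/root/.ssh/id_ed25519}

setup_key_distribution() {
    if [ ! -f "$KEY" ]; then
        # -N '' sets an empty passphrase, since cron cannot type one
        $RUN ssh-keygen -t ed25519 -N '' -f "$KEY"
    fi
    for host in "$@"; do
        # ssh-copy-id appends to ~/.ssh/authorized_keys on the remote host;
        # this first connection also asks you to accept the host key
        $RUN ssh-copy-id -i "$KEY.pub" "root@$host"
        # BatchMode makes ssh fail instead of prompting for a password
        $RUN ssh -o BatchMode=yes "root@$host" true
    done
}
```

For example, setup_key_distribution webkdc2.zcu.cz webkdc3.zcu.cz prompts once per server for the root password and the host key confirmation; with RUN=echo set, it only prints the commands it would run. Since id_ed25519 is one of ssh's default identity files, the scp commands in the rotation script pick up the new key without further options.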
1.2 Conclusion
The technique described here has allowed us to set up a stable and secure SSO environment, which was put into pilot operation at the University of West Bohemia at the turn of 2003/2004. Standard operation began in August 2005. At present (November 2006), the system interconnects over 50 web-based applications, and its ability to keep working even when part of the network or its services is out of operation has become a key feature enabling the WebAuth solution to spread across the University's IT environment.
The lbnamed tool is certainly not the only possible solution. However, it is a very effective one, and one that meets the University's needs. A similar approach may be used to provide load balancing for other key services (such as LDAP).
References
| [Gro05] | Grolmus P.: WebISO: Single Sign-On řešení pro WWW [Single Sign-On Solution for WWW]. Technical Report 7/2005, CESNET, Praha, 2005. |
| [GS05] | Grolmus P., Švamberg M.: Single Sign-On řešení pro webové aplikace [Single Sign-On Solution for Web Applications]. In Proceedings of the XXVII. EurOpen Conference, EurOpen, Plzeň 2005, ISBN 80-86583-09-0, pp. 87-100. |