Caching Architectural Considerations

From: Brian W. Spolarich <briansp@dont-contact.us>
Date: Wed, 30 Oct 1996 16:42:47 -0500 (EST)

  ANS is considering implementing some sort of hierarchical caching
service. Recent usage data suggests that up to 75% of our long-haul
circuit usage is HTTP traffic. From what I
understand, we're somewhat late in the game in this arena.

  We'd like to implement this service for our users as easily as possible,
as proxy usage requires active participation on the part of individual
users (i.e. client configuration). The proxy service must be stable,
highly available, easy to use, and perform measurably better than
non-proxied connections. A service that doesn't meet these requirements
will not receive a very enthusiastic response from the user community,
which in this case is quite large.

  Cache servers should be distributed around the network backbone, and
ideally users would select caches that are topologically closest to
them, setting cache preferences according to network topology or even
topology + current network usage.

  First of all, is there any work being done in this area, or are there
any solutions specified or available? From what I understand, the NLANR
hierarchy, for example, distributes cache-to-cache access in a reasonable
way, but doesn't address how the client selects which cache server to
proxy through.

  One possibility would be to run the cache service on multihomed servers
on which the aliased address (or address prefix) is announced in multiple
places, and let routing protocols handle the traffic redirection. This
solution works, but does make monitoring such a service something of a
"black box" in that the monitoring system can never directly the service
at a particular node. This isn't necessarily a show-stopper, but is a
definite operational downside.

  Another possibility is to use dynamic client proxy configuration such as
that which Netscape provides. The JavaScript-based proxy configuration is
quite flexible, and we could even generate the configuration data based on
the client's IP address (perhaps using routing data to determine which
network aggregate the client is coming from and generating a list of
cache servers appropriate to that client).

  The problems with this approach are that it is not an open
one (Netscape is the only widely-distributed client of which I am aware
that offers this type of configuration), and that this type of dynamic
configuration would require clients to request the configuration data
every time they start up (at least, Netscape Navigator does). If this
configuration data were generated by a CGI or other server process, this
could result in significant load on the server generating the client
configuration. Again, not a show-stopper, but definitely an important
operational consideration.

  A middle ground is to use the same auto-configuration mechanism, but
provide multiple static config files and have clients select the one
appropriate to their geographic location. This would result in less load
on the config-serving machine, but would be less flexible.

  Maybe I'm making this too complicated. Do other folks have experience
and/or wisdom to share on this topic?

  I've begun testing squid 1.0 and 1.1, and the package seems stable and
easy to configure. Documentation seems to be lacking... is this arena
still an "experts-only" "muddle-your-way-through-to-wisdom" one?

  -bws

--
       Brian W. Spolarich - ANS - briansp@ans.net - (313)677-7311
             The net has fall'n upon me! I shall perish... 