>>>>> "Dancer" == Dancer <dancer@zeor.simegen.com> writes:
Dancer> I've thought about this from time to time, specifically that multiple URLs
Dancer> may refer to identical copies of an object, and that it would be preferable
Dancer> (from a bandwidth perspective, maybe from others as well) to store only a
Dancer> single copy of the object.
Well, one obvious answer is to use indirection. You keep a struct indexed by
URL that holds a pointer to the page content, and the content itself is
indexed by the SHA hash of that content. That way you never store more than
one copy of an identical object. The content object would need a reference
count, and perhaps a reverse link to the URL(s).
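
Here is a rough sketch of that indirection, in Python rather than C structs
just to keep it short. The names (DedupCache, ContentObject, store, fetch) are
made up for illustration, and SHA-256 is my pick; any SHA variant would do.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ContentObject:
    data: bytes
    refcount: int = 0                        # number of URL entries pointing here
    urls: set = field(default_factory=set)   # reverse links to the URL(s)


class DedupCache:
    """Hypothetical two-level cache: URL -> content hash -> content."""

    def __init__(self):
        self.by_url = {}    # URL -> hex digest of the content
        self.by_hash = {}   # hex digest -> ContentObject

    def store(self, url, data):
        digest = hashlib.sha256(data).hexdigest()
        old = self.by_url.get(url)
        if old == digest:
            return digest                    # same content already linked
        if old is not None:
            self._release(url, old)          # URL now points at new content
        obj = self.by_hash.setdefault(digest, ContentObject(data))
        obj.refcount += 1
        obj.urls.add(url)
        self.by_url[url] = digest
        return digest

    def fetch(self, url):
        digest = self.by_url.get(url)
        return None if digest is None else self.by_hash[digest].data

    def _release(self, url, digest):
        obj = self.by_hash[digest]
        obj.refcount -= 1
        obj.urls.discard(url)
        if obj.refcount == 0:                # last reference gone, free content
            del self.by_hash[digest]
```

Storing the same bytes under two different URLs leaves one ContentObject with
a reference count of 2. Storing different bytes under one of those URLs drops
that URL's reference to the old object and links the URL to a new one.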
If all the refresh data, etc. is kept in the URL entry rather than in the
content object, you avoid the issues. Of course, you now have to deal with
decisions about when to free the URL entry (if it has expired but the content
is still fresh via another URL, what do you do?).
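
One possible answer, sketched below under my own assumptions: expire each URL
entry on its own clock, and free the content only when the last URL entry
pointing at it goes away. The class and method names (Cache, store, drop,
sweep) and the plain numeric timestamps are hypothetical, and this is only one
policy among several you could choose.

```python
import hashlib


class Cache:
    """Hypothetical cache with per-URL expiry and shared content."""

    def __init__(self):
        self.urls = {}      # URL -> (content digest, expiry time)
        self.content = {}   # content digest -> [data, refcount]

    def store(self, url, data, expires):
        self.drop(url)      # release any previous entry for this URL
        digest = hashlib.sha256(data).hexdigest()
        slot = self.content.setdefault(digest, [data, 0])
        slot[1] += 1
        self.urls[url] = (digest, expires)

    def drop(self, url):
        entry = self.urls.pop(url, None)
        if entry is None:
            return
        digest = entry[0]
        slot = self.content[digest]
        slot[1] -= 1
        if slot[1] == 0:    # no URL refers to this content any more
            del self.content[digest]

    def sweep(self, now):
        # Collect expired URLs first so the dict is not mutated mid-iteration.
        expired = [u for u, (_, exp) in self.urls.items() if exp <= now]
        for url in expired:
            self.drop(url)
```

With this policy an expired URL entry is freed right away, but the content
stays cached for as long as some other, still-fresh URL entry references it.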
-- Carson Gaspar -- carson@tla.org carson@cs.columbia.edu carson@cugc.org
http://www.cs.columbia.edu/~carson/home.html
Queen Trapped in a Butch Body

Received on Sun Apr 09 2000 - 23:58:15 MDT