Following some IRC chat, I thought I'd start a discussion on a
possible improvement of refresh_pattern in Squid3.
The starting point for this discussion is the fact that
refresh_pattern is a source of confusion for many users, even expert
admins. It's not obvious what it does, how to achieve certain things,
or under what circumstances different bits of it apply or don't apply.
Currently refresh_pattern means different things depending on how the
response freshness was calculated: by an explicit header set by the
origin server (Cache-Control, Expires), by invoking the Last-Modified
algorithm (if the response had a Last-Modified header), or by neither,
when no freshness could be calculated by either of these methods.
It's quite complicated. I don't know what the right answer is.
Here is an idea though:
We could separate the configuration out into "standard" and "HTTP
violating" parts. Let us define "standard" as the two mechanisms that
are most semantically transparent:
1. Explicit expiration set by server (Cache-Control, Expires)
2. Heuristic expiration based on Last-Modified
And let's define "HTTP violating" as anything that either overrides
these, or anything that enforces cacheability in the absence of any
of these headers.
What configuration options do we need for each of these two categories?
For the "standard" configuration:
We don't need any options for the explicit expiry mechanism, as
it's... explicit :)
However, we do need a couple of global options for the Last-Modified
factor algorithm:
TAG: refresh_lastmod_factor (percent)
Default: 20
TAG: refresh_lastmod_max (minutes)
Default: 10080
These, then, are the only refresh options I propose for a
non-HTTP-violating setup.
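
As a rough illustration of how the two proposed global options might interact, here is a minimal Python sketch of a Last-Modified factor calculation. The function name and constants are hypothetical and simply mirror the proposed directives; this is not Squid's actual code, and it assumes the usual heuristic of taking a percentage of the interval between Last-Modified and the response Date, capped by the maximum.

```python
from datetime import datetime, timedelta

# Hypothetical constants mirroring the proposed directives (not real Squid options).
REFRESH_LASTMOD_FACTOR = 20    # percent
REFRESH_LASTMOD_MAX = 10080    # minutes (7 days)


def heuristic_lifetime(date, last_modified,
                       factor_percent=REFRESH_LASTMOD_FACTOR,
                       max_minutes=REFRESH_LASTMOD_MAX):
    """Return the heuristic freshness lifetime as a timedelta.

    date          -- when the response was generated (Date header)
    last_modified -- value of the Last-Modified header
    """
    # How old the object already was when it was fetched; never negative.
    age_at_fetch = max(date - last_modified, timedelta(0))
    # Take the configured percentage of that age...
    lifetime = age_at_fetch * factor_percent / 100
    # ...and cap it at the configured maximum.
    return min(lifetime, timedelta(minutes=max_minutes))
```

With the proposed defaults, an object last modified 10 days before it was fetched would be considered fresh for 2 days, while one modified 100 days earlier would hit the 7-day cap.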
Now for the "HTTP violating" overrides, which are more complicated.
Defaults are set first:
TAG: refresh_override_default options
Default: none
These can be refined by regex:
TAG: refresh_override_match [-i] pattern options
Default: none
where options can be any of:
  min=xxx
      minimum amount of time this object will be considered fresh
  max=xxx
      maximum amount of time this object will be considered fresh
  ignore-reload=on|off
      ignore all client headers that prevent serving a cached
      response
  reload-into-ims=on|off
      client reload is downgraded from unconditional to
      conditional GET
  ignore-no-cache=on|off
      ignore all server headers that prevent caching a response
  ignore-no-store=on|off
      ignore "Cache-Control: no-store" server header
  ignore-private=on|off
      ignore "Cache-Control: private" server header
  ignore-auth=on|off
      cache authorized responses, even if server didn't specify
      "Cache-Control: public"
  refresh-ims=on|off
      always pass client IMS requests through to the origin,
      even if we think our copy is fresh
For example:
refresh_override_default max=4320 reload-into-ims=on
refresh_override_match http://host/ ignore-reload=on ignore-no-cache=on ignore-no-store=on
refresh_override_match /path/ reload-into-ims=off
refresh_override_match \.jpe?g$ min=1440
refresh_override_match \.css$ max=60
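
To make the intended layering concrete, here is a small Python sketch of how the effective options for a URL could be resolved from the example above. It is only one possible reading: the proposal does not say what happens when several refresh_override_match lines match the same URL, so this sketch assumes every matching line is applied in configuration order, with later lines overriding earlier ones (a first-match-only rule would be equally compatible with the text). The function and variable names are hypothetical.

```python
import re

# The example configuration, transcribed as data.
DEFAULT = {"max": "4320", "reload-into-ims": "on"}
RULES = [
    (r"http://host/", {"ignore-reload": "on",
                       "ignore-no-cache": "on",
                       "ignore-no-store": "on"}),
    (r"/path/", {"reload-into-ims": "off"}),
    (r"\.jpe?g$", {"min": "1440"}),
    (r"\.css$", {"max": "60"}),
]


def effective_options(url, default=DEFAULT, rules=RULES):
    """Start from the defaults, then layer on every matching rule.

    ASSUMPTION: all matching rules apply, in order; the proposal
    leaves this unspecified.
    """
    options = dict(default)          # defaults always apply
    for pattern, overrides in rules:
        if re.search(pattern, url):  # unanchored regex match on the URL
            options.update(overrides)
    return options
```

Under that assumption, http://other/style.css would keep reload-into-ims=on from the default but have max lowered to 60, while http://host/path/a.jpg would pick up the three ignore-* options, min=1440, and reload-into-ims=off.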
Main differences in usage:
1. The overrides would always apply, regardless of how the expiration
time was arrived at - whether by explicit headers or last-modified
algorithm heuristics. Currently the Min, Max and Percent settings
each apply only in specific circumstances: Max and Percent apply only
to responses with a Last-Modified header, and Min applies only in the
absence of Last-Modified, Expires and Cache-Control max-age.
2. The refresh_override_default would always apply (although its
options may be overridden by those of a refresh_override_match).
Currently the default refresh_pattern only applies if no patterns
match the request, meaning you can't ever override default behaviour,
you can only fall back to it.
3. Under this proposal there is no way of setting the Last-Modified
factor percentage by regex! This is perhaps a big problem, and it
could be added as an option. But then it would be the only
non-HTTP-violating option allowed on an override line... and so would
spoil the separation slightly.
4. No need for global counterparts of refresh_pattern directives,
e.g. refresh_all_ims and reload_into_ims.
5. Frequently used override options could be stated once in the
default instead of on every subsequent line.
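
The first difference in the list can be sketched in Python by comparing the two ways of arriving at a freshness lifetime. This is a deliberate simplification that models only the behaviour described in this message, not the real logic in refresh.cc; all function and parameter names are hypothetical, times are in minutes, and the choice to apply min before max in the proposed version is an assumption.

```python
def current_lifetime(explicit, lm_age, min_, percent, max_):
    """Current refresh_pattern behaviour, as described in this message."""
    if explicit is not None:
        # Explicit expiry from the server: Min, Percent and Max are ignored.
        return explicit
    if lm_age is not None:
        # Last-Modified present: Percent and Max apply, Min is not enforced.
        return min(lm_age * percent / 100, max_)
    # No expiry information at all: only Min applies.
    return min_


def proposed_lifetime(explicit, lm_age, factor, lastmod_max,
                      min_=None, max_=None):
    """Proposed behaviour: standard calculation first, overrides always."""
    if explicit is not None:
        lifetime = explicit
    elif lm_age is not None:
        lifetime = min(lm_age * factor / 100, lastmod_max)
    else:
        lifetime = 0  # nothing to go on without an override
    # The HTTP-violating overrides apply however the lifetime was derived.
    # ASSUMPTION: min is applied first, so max wins if the two conflict.
    if min_ is not None:
        lifetime = max(lifetime, min_)
    if max_ is not None:
        lifetime = min(lifetime, max_)
    return lifetime
```

For a response that explicitly expires after 30 minutes, with Min 1440 configured, the first function returns 30 (Min is ignored) while the second returns 1440, because the override applies regardless of how the lifetime was arrived at.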
This may be completely the wrong way of looking at it, or it may be
just going too far. A smaller, but still helpful, step might be to
introduce a refresh_pattern_default whose values would be inherited
by any subsequent refresh_pattern match.
Any help or input on this would be very welcome indeed.
Doug
On 1 Jun 2006, at 20:06, Doug Dixon wrote:
> Hi
>
> I'm fixing bug 1202 (it's a simple fix) and am cleaning up
> refresh.cc at the same time.
>
> I'd like to review the various refresh_pattern options, as some of
> them are mutually exclusive in practice (although you can configure
> all of them) and it's not clear from the documentation what they
> all mean. They're quite hard to understand and use correctly.
>
>
> 1. reload-into-ims
>
> The following is legal:
>
> refresh_pattern html$ 5 20% 60 ignore-reload reload-into-ims
>
> but reload-into-ims will not have any effect. You could argue that
> this is obvious, but I think it should be caught at parse time.
>
> 2. As an aside - but I want to mention it here - we need to make it
> clearer that if an object does specify an expiry time, the Min,
> Percent and Max values in refresh_pattern will be completely
> ignored, but the options won't be. I'll change cf.data.pre accordingly.
>
> 3. override-expire
>
> override-expire enforces min age even if the server
> sent an Expires: header. Doing this VIOLATES the HTTP
> standard. Enabling this feature could make you liable
> for problems which it causes.
>
> If you do want to modify the behaviour of blindly obeying the
> server's explicit expiry time, you can - to an extent.
>
> The override-expire option enforces the Min time in cache, even if
> the origin stated it should expire before then.
> But it ignores the Max time (surprising!), and the L-M factor (more
> expected - not obvious what this would do anyway)
>
> It's not very intuitive. I think we should probably make this
> option enforce the Max time as well. Possibly even ignore the
> explicit expiry of the object altogether and fall back to last-
> modified factor??
>
> It could be a naming thing... override-expire doesn't really say
> what it does. enforce-min might be better. But then you've already
> stated a min and might expect it to be already enforced.
>
> 4. override-lastmod
>
> override-lastmod enforces min age even on objects
> that were modified recently.
>
> The Min time isn't enforced even when the last-modified factor
> algorithm does kick in. If the object was only just modified and
> the L-M factor algorithm results in a figure lower than the Min, it
> will be considered fresh for less than the configured Min.
>
> This isn't what I would expect. I know that the override-lastmod
> exists to let you do this, but it's really non-intuitive. I think
> the Min should always be enforced if we're using L-M factor
> algorithm, and that we should therefore lose the override-lastmod
> option. Can't see the point in the default (null) behaviour of Min
> otherwise.
>
>
> Thoughts?
>
> Doug
>
Received on Thu Jun 15 2006 - 05:33:13 MDT