Re: [squid-users] TIMEOUT_ROUNDROBIN_PARENT and poor SIBLING_HIT performance

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 24 Feb 2011 13:34:42 +1300

 On Wed, 23 Feb 2011 14:55:25 -0800, M. Leong Lists wrote:
> Hi,
>
> I've 2 problems where squid is taking excessive time to service a
> request.
>
> My setup:
> -Accelerator setup
> -backends are on load balancer, squid is configured to connect to the
> load balancer IP multiple times
> -squid's configured to store the cache as long as possible.
> -icp time is set to really high, otherwise some siblings doesn't
> respond in time. Should this be lowered?

 Think about that a bit:
  * the sibling is taking a very long time to respond to a single ICP
 packet.
    How fast do you think it will be when you send it a whole bunch of
 request and reply packets? Better, the same, or worse?

 So in the end, do you think it is a better idea to time out the ICP
 query quickly, mark the peers as down/unusable and move on to the
 alternatives, or to keep waiting?
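
 As a rough sketch of the "fail fast" option, assuming healthy siblings
 on your network answer ICP well within 2 seconds (the values below are
 illustrative guesses, not tested recommendations):

```
# Illustrative values only; tune against your own sibling latency.
# Times are in milliseconds. The posted config uses 7000 and 10000.
# Setting icp_query_timeout to 0 (the default) lets Squid pick a
# timeout from the round-trip time of recent ICP queries instead.
icp_query_timeout 2000
maximum_icp_query_timeout 2000

# How long a peer may go without answering ICP before it is treated
# as dead.
dead_peer_timeout 10 seconds
```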

>
> Version:
> Squid Cache: Version 2.7.STABLE9
> configure options: '--prefix=/apps/squid'
> '--enable-x-accelerator-vary' '--enable-linux-netfilter'
> '--enable-cache-digests' '--enable-htcp' '--enable-snmp'
> '--enable-referer-log' '--enable-useragent-log'
> '--enable-delay-pools'
> '--enable-icmp' '--enable-async-io=500' '--with-maxfd=10240'
> '--enable-removal-policies=lru,heap'
> '--enable-follow-x-forwarded-for'
> '--enable-epoll' '--with-large-files'
>
> Relevant config:
>
> http_port 80 vhost defaultsite=cache.example.com
> cache_mem 512 MB
>
> cache_peer lb.example.com parent 80 0 round-robin no-query
> originserver no-netdb-exchange no-digest name=lb_01
>
> ... <snip>
>
> cache_peer lb.example.com parent 80 0 round-robin no-query
> originserver no-netdb-exchange no-digest name=lb_10
>

 So you are manually load-balancing the way connections are made to a
 load balancer. Why? What happens if you remove these duplicate peer
 links?

 NP: Squid defaults to 10 connection attempts to each peer before it
 gives up. So you have potentially a grand total of 100 TCP connections
 made through the LB before the request fails.

 Update the LB to only make connection attempts to working sources and
 have Squid use it once. If it is already doing that smart logic, the
 duplicated cache_peer lines are not of much use.

 Or, if the LB is not smart enough to do that kind of control, it is of
 less use than the built-in load balancing which Squid does. Discard it
 and just use the round-robin selection directly to the peers behind the
 LB. All the problems you have with end-to-end path discovery, connection
 up/down status and persistence will disappear.
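
 The two alternatives could look roughly like this. The backend host
 names, the port 8080 and the name= labels are hypothetical; only
 lb.example.com and the option list come from the posted config:

```
# Option A: keep the LB, but list it once. round-robin is dropped
# because there is only one parent left to choose from.
cache_peer lb.example.com parent 80 0 no-query originserver no-netdb-exchange no-digest name=lb

# Option B: drop the LB and let Squid round-robin over the real
# backends. Hypothetical hosts and port; substitute the Tomcat servers.
cache_peer tomcat01.example.com parent 8080 0 round-robin no-query originserver no-netdb-exchange no-digest name=app_01
cache_peer tomcat02.example.com parent 8080 0 round-robin no-query originserver no-netdb-exchange no-digest name=app_02
```

 Use one option or the other, not both in the same squid.conf.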

> cache_peer cache01.example.com sibling 80 3130 proxy-only no-delay
> allow-miss weight=1 no-netdb-exchange no-digest name=cache01
>
> ..<snip>
>
> cache_peer cache08.example.com sibling 80 3130 proxy-only no-delay
> allow-miss weight=1 no-netdb-exchange no-digest name=cache08
>
> client_persistent_connections off
> server_persistent_connections off

 The above will be part of your lag problem. I know why you do it:
 persistent connections and load balancing do not work together very
 easily. Just saying that it will be a factor in the problem.
 Your Squid is reduced to a pure HTTP/1.0 level of efficiency, with TCP
 handshakes (possibly multiple) being done for every single client
 request. All the HTTP/1.1 efficiency features built around long-lived
 persistent connections become a net loss of performance when connections
 are forced closed all the time.

 You should be able to re-enable persistent connections to clients
 without problem. Given a reasonable timeout, this will let clients
 pipeline requests through their connection to Squid without leaving
 idle connections open for long periods. It has no effect on the
 server-facing connections and their LB.
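
 A minimal sketch of that change, assuming a 30 second idle timeout
 suits your clients (that value is a guess to be tuned, not a
 recommendation):

```
# Re-enable keep-alive towards clients only.
client_persistent_connections on

# Left as posted, so connections towards the LB still close after
# every request.
server_persistent_connections off

# How long an idle client connection waits for the next request.
# 30 seconds is an assumed value.
persistent_request_timeout 30 seconds
```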

> digest_generation off
>
> icp_access allow all
> icp_hit_stale on
> icp_query_timeout 7000
> maximum_icp_query_timeout 10000
> nonhierarchical_direct off
> url_rewrite_host_header off
>
> offline_mode on
> --------------------------------------
>
> TIMEOUT_ROUNDROBIN_PARENT
>
> All the TIMEOUT requests took at least 7000 ms, which is the value of
> icp_query_timeout. Some requests took at over 30 sec to complete. I
> crossed referenced those long requests against the backends and
> notice
> a big mismatch in the times. The backends are tomcat apps w/ Java
> 1.6. I extracted the times from the tomcat access log.
>
> Squid time (ms)   Backend time (ms)
> 7922              924
> 8422              1421
> 7488              487
> 12835             5833
> 25098             18096
> 34793             611
> 21806             14804

 The time difference will be inflated by the time Squid spends waiting
 for a TCP handshake to occur on every connection. That is the full RTT
 of three packets cycling Squid->LB->tomcat and back again. Multiply
 that by the 10-100 new connections your Squid is configured to make to
 the LB before aborting with failure.

>
> ------------------------------------
> High SIBLING_HIT response time:
>
> The same problem occurs with sibling hits. The logged process time
> on the sibling and the one requesting from the sibling vastly
> differs:
>
> Squid time (ms)   Time on sibling's log (ms)
> 4534              30
> 23994             12959
> 6661              40
>
> ---------------
> Does anyone know of a reason why it would take so long for squid to
> complete a request??
>
> mike

 That's all I can think of off the top of my head, maybe more later.
 Good luck.

 Amos
Received on Thu Feb 24 2011 - 00:34:46 MST
