Date sent: Fri, 7 Nov 1997 15:08:24 +0200
From: Oskar Pearson <oskar@is.co.za>
To: squid-dev@nlanr.net
> One of my caches just became quite unresponsive - here is
> what a strace -f -c revealed... (It only covers a short period of
> time, though, since I am afraid the strace itself is lagging squid)
>
> cache1:
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 27.02 3.412204 131 26048 oldselect
> 21.74 2.745975 1281 2143 open
> 19.03 2.403282 261 9223 3 write
> 11.46 1.447688 236 6126 61 read
> 0.32 0.040916 1137 36 getdents
> 0.11 0.014438 1604 9 stat
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 12.630594 93117 1739 total
>
>
> It appears that people are right about the 'open' being a
> problem... Why are we calling 'getdents' anyway?
Once I chased the way squid uses its L1/L2 directory structures, and
it appeared to me that an excessive number of directories slows things
down quite a bit. I don't recall the details exactly, but my impression
is that squid walks through every directory before returning to any one
of them a second time. Along the way it alternates both the L1 and L2
dirs so as to distribute load evenly between disks and between the L1
and L2 directories on each disk.
The bad news is that the Unix FS tries to cache directory entries to
speed up file lookups (open) within a directory, and this pass-through
pattern effectively defeats Unix's directory caching. By the time squid
returns to a once-used directory, quite a lot of time has passed and the
directory data is no longer in cache. By default squid creates 4096 dirs
per disk, and with a full cache each dir is about 4KB in size. To keep
the whole directory cache in RAM you would need about 16MB of spare RAM
per disk, which is very rarely the case.
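
To illustrate what I mean by "goes through every directory": the 4096
dirs per disk come from the default 16 L1 x 256 L2 layout, and the
mapping from file number to path spreads consecutive files across them
roughly like this. This is only a simplified sketch in C, not squid's
actual code; the constants and the path format are assumptions.

    #include <stdio.h>

    #define SWAP_DIRS  1     /* number of cache_dirs (assumed)         */
    #define L1        16     /* first-level dirs per disk (default)    */
    #define L2       256     /* second-level dirs per L1 dir (default) */

    /* Sketch: consecutive file numbers land in different directories,
     * and a given directory is revisited only every L1*L2 files, which
     * is exactly what defeats the kernel's directory caching. */
    static void swap_path(int fn, char *buf, size_t len)
    {
        int disk = fn % SWAP_DIRS;
        int d1   = (fn / SWAP_DIRS) % L1;
        int d2   = (fn / SWAP_DIRS / L1) % L2;
        snprintf(buf, len, "/cache%d/%02X/%02X/%08X", disk, d1, d2, fn);
    }

    int main(void)
    {
        char path[128];
        for (int fn = 0; fn < 4; fn++) {
            swap_path(fn, path, sizeof(path));
            printf("%s\n", path);
        }
        return 0;
    }
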
Thus, I believe that on every single open-for-write squid takes a
directory cache miss and needs physical disk io just for the directory.
open-for-read is random in nature and cannot be optimized very much.
All of this is the reasoning behind the suggestion in
squid.1.1.relnotes: you want the minimum number of directories needed
to hold the maximum possible number of objects on your disks. You also
want to increase the DNLC cache and have plenty of unused RAM so that
Unix can cache the directory io. Disabling inode access-time updates
and synchronous mode for directory io helps a lot too, but you know
what you are risking there...
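
As a back-of-the-envelope illustration of that relnotes sizing advice,
something like the following computes the smallest L1 that still holds
the maximum number of objects. The 2GB cache_dir, 13KB average object
size and 256 files per directory are only assumed example numbers.

    #include <stdio.h>

    int main(void)
    {
        /* All numbers below are assumptions for illustration only. */
        long cache_mb   = 2048;    /* size of one cache_dir in MB    */
        long avg_obj_kb = 13;      /* assumed average object size    */
        long per_dir    = 256;     /* target files per L2 directory  */
        long l2         = 256;     /* keep the usual L2 value        */

        long objects = cache_mb * 1024 / avg_obj_kb;
        long dirs    = (objects + per_dir - 1) / per_dir;
        long l1      = (dirs + l2 - 1) / l2;
        if (l1 < 1)
            l1 = 1;

        printf("~%ld objects -> ~%ld dirs -> use L1=%ld, L2=%ld\n",
               objects, dirs, l1, l2);
        return 0;
    }
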
Personally, I'd suggest changing squid's store logic a bit so that it
always uses the smallest numbered file available, instead of
distributing load between the L1/L2 dirs. Distributing load between
disks is a separate issue and doesn't add much to the dir-cache load.
With a fixed number of files per directory there is no loss of speed,
but for caches that have 100 times more directories created than there
could ever be objects in them, it is IMHO more reasonable to have a few
directories full of 256 objects each than zillions of dirs with a few
files in each. This would also make it possible to grow L1/L2 on the
fly when needed, and it would keep squid out of directories that were
created without ever being needed.
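
A "smallest numbered file available" policy is basically a scan for the
lowest free bit in the store's filemap bitmap. A minimal sketch of what
I have in mind (invented names, not squid's actual filemap code):

    #include <limits.h>
    #include <stddef.h>

    #define MAX_FILES     (1 << 20)
    #define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

    static unsigned long file_map[MAX_FILES / BITS_PER_WORD];

    /* Return the smallest unused file number and mark it used, or -1
     * when the map is full.  Low numbers are reused first, so objects
     * stay packed into the few low-numbered (and hence hot) dirs. */
    static long alloc_lowest_fileno(void)
    {
        for (size_t w = 0; w < MAX_FILES / BITS_PER_WORD; w++) {
            if (file_map[w] == ~0UL)
                continue;                  /* this word is full */
            for (size_t b = 0; b < BITS_PER_WORD; b++) {
                if (!(file_map[w] & (1UL << b))) {
                    file_map[w] |= (1UL << b);
                    return (long)(w * BITS_PER_WORD + b);
                }
            }
        }
        return -1;
    }

    /* Called when an object is released, so its fileno (and thus its
     * directory slot) becomes the next candidate for reuse. */
    static void free_fileno(long fn)
    {
        file_map[fn / BITS_PER_WORD] &= ~(1UL << (fn % BITS_PER_WORD));
    }
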
It would also be lovely if there was a separate filemap for each disk,
with the corresponding swapindex file written into that disk's
cache_dir root. This would make it possible to tolerate a failed disk
and to add or remove disk space on the fly. It would also allow load to
be distributed between disks more precisely, based on actual historical
bytes/sec of io if you like.
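
Roughly, I imagine the per-disk state looking something like this; it
is just a sketch of the data layout, and all of the names are invented:

    /* Per-cache_dir state, so a dead disk only takes its own filemap
     * and its own swap index with it. */
    typedef struct _cache_disk {
        char          *path;          /* cache_dir root, e.g. /cache1    */
        char          *swap_log;      /* swap index file inside path     */
        unsigned long *file_map;      /* per-disk bitmap of used filenos */
        int            map_size;      /* number of bits in file_map      */
        int            l1, l2;        /* directory layout for this disk  */
        double         bytes_per_sec; /* recent io rate, for balancing   */
    } cache_disk;

With that, losing a disk means dropping one cache_disk and its objects,
and the store could pick the disk with the lowest recent bytes_per_sec
when allocating a new file, rather than rotating by file number.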
----------------------------------------------------------------------
Andres Kroonmaa mail: andre@online.ee
Network Manager
Organization: MicroLink Online Tel: 6308 909
Tallinn, Sakala 19 Pho: +372 6308 909
Estonia, EE0001 http://www.online.ee Fax: +372 6308 901
----------------------------------------------------------------------