> > >> Once I chased the way squid uses its L1/L2 directory structures, and
> > >> it appeared to me that excessive amount of directories slows things
> > >> down quite a bit. I don't recall exactly, but I have an impression that
>
> > >that a configurable number of files (default 512, maybe a little too high)
> > >are written to a directory before going to the next one. Alternating
> > >among all cache_dirs is still done, too.
> >
> > I decided to apply the patch on a real production cache this weekend and
> > have gathered one day of business-hours (08:00h-18:00h) usage data (data
> > has been gathered with my Squid timer patch). The results are quite good
> > if you compare the results of last Friday and today (Monday):
> .
> .
> .
> > open(2)'s for write have dropped from 35ms to 5ms, open(2) for read had
> > dropped from 24ms to 8ms, read(2)'s have dropped from 12ms to 8ms. This
> > all leads to 46.3% idle time, i.e. Squid is waiting in select(2) for
> > something to happen. Waiting for disk I/O (diskr/w+openr/w) has dropped
> > from 68.37% to 25.93%.
>
> A question though - isn't this only going to be useful on a
> new, empty cache (or a cache with only a few files in it)?
>
> Given a totally full cache you are still going to get the original
> distribution, aren't you? So for a cache like ours (totally stuffed full
> the whole time) you wouldn't see any real benefit...
Depends on what you mean by full. There are two distinct meanings for it:
full by volume, and full by file count. If you distribute files across all
precreated dirs from the start, then by the time you are full by volume you
may still have very many sparsely filled dirs, and you already start to reuse
files. If you instead fill dirs one after another, then by the time you are
full by volume you have used only a fixed number of dirs, and once you start
reusing files you operate with a much smaller number of dirs, even if many
more have been created.

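To make the difference concrete, here is a rough sketch that counts how many directories end up holding files at the moment the cache becomes full by volume. The cache capacity, the directory counts and both placement rules are made-up illustrations, not Squid's actual code.

```python
# Sketch only: capacity, directory counts and both placement rules are
# hypothetical values chosen for illustration, not taken from Squid.

def dirs_touched(capacity_files, total_dirs, files_per_dir, sequential):
    """Return how many distinct directories hold files once the cache
    is full by volume, i.e. after capacity_files objects are stored."""
    used = set()
    for fileno in range(capacity_files):
        if sequential:
            # Fill one directory up to files_per_dir before moving on.
            d = fileno // files_per_dir
        else:
            # Spread consecutive files across all precreated directories.
            d = fileno % total_dirs
        used.add(d)
    return len(used)


if __name__ == "__main__":
    # Hypothetical cache: room for 100,000 objects, 16 * 256 = 4096 dirs,
    # at most 512 files per directory.
    print(dirs_touched(100_000, 4096, 512, sequential=False))
    print(dirs_touched(100_000, 4096, 512, sequential=True))
```

With these assumed numbers, the spread placement touches all 4096 directories, while the sequential placement touches only 196.
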
I believe the randomness in allocating files was introduced when there was
no fixed number of files per dir; back then it was the only way to guarantee
that no single directory held an excessive number of files. After the fixed
dir structure was introduced, this behaviour actually became a limiting
factor, IMHO.

Squid currently uses a bit array for fileno allocation. Its size is fixed to
accommodate 2^21 (2M) files. If we always use the smallest free fileno and map
it to a filename in a fixed manner, we have a fixed number of dirnames to
operate with. In addition, if our cache can hold only some fraction of 2M
files, we simply never reach the higher-numbered dirs; instead, we start
reusing old files. Possibly we don't need unlinkd at all; instead we just
overwrite old files with new data. LRU just frees the "right" files for us.
We'd need to unlink only to clean up very rarely used dirs. We also don't need
to create all the dirs beforehand; we can create them on the fly as needed.
We'd need to create dirs only when the cache is empty, or when the average
object size in the cache drops and we can accommodate more objects; in either
case this would happen relatively rarely.

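A minimal sketch of the "always use the smallest free fileno" idea, assuming a plain fixed-size bitmap. The class name FilenoMap and its methods are hypothetical and are not Squid's real file map interface; the hint field is just one possible way to avoid rescanning from zero on every allocation.

```python
class FilenoMap:
    """Fixed-size bitmap that always hands out the smallest free fileno.

    Hypothetical illustration; not Squid's actual data structure.
    """

    def __init__(self, size=1 << 21):
        self.size = size
        self.bits = bytearray((size + 7) // 8)
        self.hint = 0  # invariant: every fileno below hint is in use

    def _is_set(self, n):
        return self.bits[n >> 3] & (1 << (n & 7))

    def allocate(self):
        """Mark and return the lowest free fileno."""
        n = self.hint
        while n < self.size and self._is_set(n):
            n += 1
        if n == self.size:
            raise RuntimeError("fileno map is full")
        self.bits[n >> 3] |= 1 << (n & 7)
        self.hint = n + 1
        return n

    def release(self, n):
        """Free a fileno, e.g. when LRU evicts the object stored in it."""
        self.bits[n >> 3] &= ~(1 << (n & 7)) & 0xFF
        if n < self.hint:
            self.hint = n
```

Because a released fileno is handed out again before any higher one, files freed by LRU get overwritten in place and the set of directories in use stays as small as the cache contents allow.
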
The optimal structure seems to be to take MaxItemsPerDir as a function of the
optimal disk block size (making sure that MaxItems entries fit in a single
block), create 1 L1 dir and 1 L2 dir, and start working. Increment L2 as
needed, and when it reaches MaxItemsPerDir, go L1++.

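One possible reading of this scheme, as a sketch: derive MaxItemsPerDir from the block size, then map each fileno to an L1/L2 pair so that L2 advances every MaxItemsPerDir files and L1 advances every MaxItemsPerDir L2 dirs. The 16-byte directory entry size, the two entries reserved for "." and "..", the function names and the path format are all assumptions for illustration; real entry sizes depend on the filesystem.

```python
def max_items_per_dir(block_size, entry_size=16):
    """Entries that fit in one directory block.

    entry_size is an assumed fixed size; two slots are reserved for
    the "." and ".." entries.
    """
    return block_size // entry_size - 2


def fileno_to_dirs(fileno, max_items):
    """Map a fileno to (L1, L2): L2 increments every max_items files,
    L1 increments every max_items L2 directories."""
    l2 = (fileno // max_items) % max_items
    l1 = fileno // (max_items * max_items)
    return l1, l2


def fileno_to_path(cache_dir, fileno, max_items):
    """Build a path in a hypothetical cache_dir/L1/L2/fileno layout."""
    l1, l2 = fileno_to_dirs(fileno, max_items)
    return f"{cache_dir}/{l1:02X}/{l2:03X}/{fileno:08X}"


if __name__ == "__main__":
    items = max_items_per_dir(8192)  # assumed 8 KB directory blocks
    print(items)
    print(fileno_to_path("/cache1", 0, items))
    print(fileno_to_path("/cache1", items, items))
```

Under these assumptions an 8 KB block gives 510 items per directory, fileno 0 lands in /cache1/00/000, and fileno 510 is the first file in /cache1/00/001. Directories would be created only when the first fileno mapping to them is allocated.
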
Simply put, the user should provide only the number of cache_dirs and the
disk block size; everything else is a job for Squid.

Eventually we'd end up with the lowest-numbered dirs being the hotspot on any
cache_dir, each L1/L2 dir filled optimally, and a minimal number of
directories overall, which is good for directory caching.

The problem is not so much with the algorithm that maps fileno to filename as
with the fileno/filename usage pattern from the OS's point of view.
----------------------------------------------------------------------
Andres Kroonmaa mail: andre@online.ee
Network Manager
Organization: MicroLink Online Tel: 6308 909
Tallinn, Sakala 19 Pho: +372 6308 909
Estonia, EE0001 http://www.online.ee Fax: +372 6308 901
----------------------------------------------------------------------
Received on Tue Jul 29 2003 - 13:15:44 MDT