Re: [MERGE] wordlist refactoring

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 15 Aug 2008 19:13:15 +1200 (NZST)

> On Wed, Aug 13, 2008 at 9:11 AM, Amos Jeffries <squid3_at_treenet.co.nz>
> wrote:
>> Kinkie wrote:
>>>
>>> Hi all,
>>> this patch is an attempt at streamlining the wordlist implementation
>>> in
>>> 3.HEAD, by:
>>> - OO-izing it [1]
>>> - providing an unit-test for it
>>> - giving it a couple of convenience functions
>>> - documenting it
>>>
>>> It's still a work-in-progress, but in the end a wordlist would probably
>>> be
>>> better implemented by using a set of Strings anyways.
>>> It's however a step forward already, since it gives wordlist callers
>>> better c++ semantics than the current implementation in HEAD.
>>>
>>> The patch has been compile-tested, unit-tested, test-suite'd and
>>> run-tested.
>>>
>>> Please review and hopefully apply.
>>>
>>> Kinkie
>>>
>>
>> NP: all I've done is read the patch and mention things that stand out.
>> Some
>> of the below need fixing, others just explaining why.
>
> Sure. Thanks for taking the time to check it.
>
>>
>> 1)
>>
>> in src/Makefile.am
>> @@ -1188,6 +1188,7 @@
>> tests/testURL \
>> + tests/testWordlist
>> @STORE_TESTS@
>>
>> a) Missing a '\' on that line.
>> b) How did this pass build tests? particularly the new testbed ones.
>
> You're right. Maybe the tests failed silently...
>
> Fixed; re-ran the tests, seems to be fine.. Unfortunately the
> significant output gets drowned in huge noise from libtool and gcc. Is
> there any way to silence it, like for instance is done in the Linux
> kernel? I'm not an automake expert..

I'm looking at it. Theres output being sprayed in all directions from
sub-scripts.

>
>> 2)
>> if (authenticate) {
>> delete authenticate;
>> authenticate = NULL;
>> }
>> is equivalent to:
>> safe_delete(authenticate);
>> provided by the shared squid compat library headers.
>
> Are you sure? I can't find any safe_delete string in the whole
> sources; maybe you mean safe_free? If so, it's probably not a good
> choice to mix different allocators and destructors (new / xxfree),
> especially on platforms with very complex heap managers such as
> Windows.

I meant safe_delete, as you say its not good to mix alloc/deallocators.
My brain going again I think. I'm sure I saw it defined somewhere.
Oh well, its maybe needed anyway.

>
>> same for: Config.Program.redirect
>> same for: dnsservers->cmdline
>> same for: err->ftp.server_msg
>> same for: p->cmdline and p->arguments in external_acl line 194 & 574
>>
>> in fact everywhere you've gone delete. Whether you are using if()
>> already or
>> not. (The non-use of if() before delete is unsafe anyway).
>
> I tried using it only where I felt it was sensible. In the other cases
> the deleted member was expected to be there, and by not testing I'd
> prefer to expose bugs in the initialization code paths.
>
> In other words, there's two choices:
> where pointer is optionally initialized:
>
> if (pointer)
> delete pointer;
>
> where pointer is mandatorily pointing to something meaningful as part
> of the constructor:
> assert(pointer);
> delete pointer;
>
> I just implicitly assert()ed by not specifying it, it'd bomb anyways.

Understood. Just be sure it bombs with a good trace.

>
> It's a matter of style, so I'd like to have your feedback on this. My
> personal take is that I'd rather have squid bomb on broken object
> initialization rather than hiding the bug every time a member is
> accessed (after all in the end it's more efficient to make sure that
> something is there rather than continuously check whether it is, at
> least in rather straightforward cases such as these)

My personal preference is to always check. If something goes badly wrong
it will pass the destruct check and bomb anyways. Having a must-exist
requirement places a lot of burden on the callers.

The issue is a matter of efficiency, whether the pre-condition causes more
OPs allocating data for these objects than it saves from elimination of
duplicate data validation.

(I'm hoping we can come up with a good benchmark tool for this type of
thing at the meet).

>
>> 3)
>> There is a lot of 'new' allocation going on thats followed by 'delete'
>> at
>> the end of the parent scope.
>> ie RemovalPolicySettings member pointer, adaptation ServiceGroup
>> functions
>> On the face of the patch without digging into the usage code, they
>> appear to
>> be one-use memory
>> Why are these not able to be plain non-pointer member or local variable
>> now?
>
> Should be doable in some cases, but it exposes some warts, i.e. in
> parse_wordlist. In fact, I'd rather attack those from that angle.
>

Understood. The mess will need removing at some point. From your later
talk it looks like that is your long-term aim.
The few cases that don't have warts should be fixed in this patch anyway
to bump the performance gains sooner rather than later.

>> 4)
>> I'm sure the ACL Walkee's don't need wordlist**, now that you are
>> pre-allocating the wordlist.
>
> yup. See comment on parse_wordlist.
>
>>
>> 5)
>> === modified file 'src/client_side_request.cc'
>> --- src/client_side_request.cc 2008-06-09 01:58:19 +0000
>> +++ src/client_side_request.cc 2008-08-09 16:13:04 +0000
>> @@ -666,7 +666,7 @@
>> const char *url = http->uri;
>> HttpRequest *request = http->request;
>> HttpRequestMethod method = request->method;
>> - const wordlist *p = NULL;
>> + const wordlist *p;
>>
>>
>> Why is this small piece of pointer-safety being removed? '*p' is not
>> guaranteed to be initialized correctly outside the defining function
>> scope.
>
> p is assigned to 40-some lines later, before being dereferenced anywhere.
>
> Removing this initialization is, however, intentional: calling member
> functions on NULL is going to fail anyways, and that's what would be
> done to manipulate an uninitialized object. On the other hand not
> assigning the variable gives a chance to the compiler's static code
> path checks to detect uninitialized variables usage,

Okay.

>
>> 6)
>> === modified file 'src/tests/stub_tools.cc'
>> --- src/tests/stub_tools.cc 2006-09-15 02:13:23 +0000
>> +++ src/tests/stub_tools.cc 2008-08-11 14:54:43 +0000
>> @@ -45,3 +45,4 @@
>> {
>> fatal ("Not implemented");
>> }
>> +
>>
>> can be reverted from the merge.
>
> Done.
>
>> 7)
>> for the documentation you don't need all the: "\ingroup wordlist"
>> class methods are automatically grouped.
>
> Ok, done. Not an expert in doxygen, and I just copied from elsewhere.
>
>> 8) as mentioned the nastiness in wordlist old behavior. That should go
>> if
>> possible during this cleanup. IIRC it was mainly the O(N+1)
>> insertion/append
>> algorithm, a simple tail pointer should speed that up to O(1).
>> Adrian may have other bits he noticed, or you might have seen others
>> already
>> yourself.
>
> There's problems there, due to the fact that the current wordlist
> object is a C-ism, filling in both roles of a wordlist and
> wordlistNode. adding a tail-pointer would raise each node's memory use
> by 50%. On the other hand, splitting the class into wordlist,
> wordlistNode and wordlistIterator is quite a huge coding effort for
> something which should at the end of it all become something like
>
> typedef boost::intrusive::slist<String> wordlist;
>
> Getting to that point (not just for wordlist, but for all
> genericizable classes) is my long-term goal.

Ah, understood.

>
> The focus of my refactoring effort was (and unless there's consensus
> it should change, remains) to work on the callers' side rather than
> trying to be too clever on the callees' side.
>
>
> I'm attaching a new version of the patch, addressing most concerns. It
> obsoletes the previous one.
>
> --
> /kinkie
>

You will need to submit each as a new thread or at least as direct reply
to the thread head for BB.

Amos
Received on Fri Aug 15 2008 - 07:13:18 MDT

This archive was generated by hypermail 2.2.0 : Tue Aug 19 2008 - 12:00:07 MDT