Re: Possible reason for std::vector crashes

From: Kinkie <gkinkie_at_gmail.com>
Date: Wed, 26 Mar 2014 10:13:16 +0100

And instead..
the code testing Alex's suspects triggered in peer_select.cc:337, this
is the relevant snippet of stack trace:

#3 0x0000000000632f2d in Vector<RefCount<Comm::Connection> >::size (
    this=<optimized out>, this=<optimized out>) at ../src/base/Vector.h:355
#4 0x0000000000638a45 in peerSelectDnsResults (ia=0x29ff8b0, details=...,
    data=0x2cf59b0) at peer_select.cc:337
#5 0x000000000060d4e6 in ipcacheCallback (i=i_at_entry=0x29ff890,
    wait=wait_at_entry=19) at ipcache.cc:347
#6 0x000000000060dbd4 in ipcacheHandleReply (data=<optimized out>,
    answers=<optimized out>, na=<optimized out>, error_message=0x0)
    at ipcache.cc:497
#7 0x000000000058b357 in idnsCallback (q=0x24eec80, q_at_entry=0x24f5370,
    error=error_at_entry=0x0) at dns_internal.cc:1127

I'm currently investigating more in depth.

On Wed, Mar 26, 2014 at 8:32 AM, Kinkie <gkinkie_at_gmail.com> wrote:
> Hi,
> I think I have isolated the changelet in FwdState::connectionList.
> However WHAT in that changelet may be causing the crashes is still
> unknown. The hunt for iterate-after-clear has so far turned up
> nothing.
>
> On Wed, Mar 26, 2014 at 8:01 AM, Niki Gorchilov <niki_at_gorchilov.com> wrote:
>> Hello there,
>>
>> Is there any progress regarding the random coredumps related to
>> std::vector migration?
>>
>> Best,
>> Niki
>>
>> On Mon, Mar 17, 2014 at 6:41 PM, Kinkie <gkinkie_at_gmail.com> wrote:
>>> Yes, I will.
>>>
>>> On Mon, Mar 17, 2014 at 5:24 PM, Alex Rousskov
>>> <rousskov_at_measurement-factory.com> wrote:
>>>> Hello,
>>>>
>>>> I have discovered one difference between std::vector and Squid's own
>>>> Vector implementation that might explain some of the random crashes that
>>>> some of us have been seeing after the Vector class was replaced with
>>>> std::vector in trunk.
>>>>
>>>> Squid Vector sets the number of stored elements to zero on destruction.
>>>> Std::vector does not do that. Here is the output of a simple test code
>>>> quoted in the postscriptum:
>>>>
>>>>> alive std::vector size: 1
>>>>> deleted std::vector size: 3
>>>>> alive Squid Vector size: 1
>>>>> deleted Squid Vector size: 0
>>>>
>>>> Both vectors behave correctly, but std::vector code is optimized not to
>>>> do extra work such as maintaining the size member. This means that
>>>> iterating a deleted Squid Vector is often safe (until the freed memory
>>>> is overwritten) because the broken caller code would just discover that
>>>> there are no items in the vector and move on.
>>>>
>>>> Iterating the deleted std::vector usually leads to crashes because all
>>>> iteration-related methods such as size() rely on immediately deleted and
>>>> changed pointers (std::vector does not store its size as a data member
>>>> but uses memory pointers to compute the number of stored elements).
>>>>
>>>> This difference leads to std::vector migration problems such as this
>>>> trivially reproducible one:
>>>>
>>>>> Adaptation::AccessRules &
>>>>> Adaptation::AllRules()
>>>>> {
>>>>> static AccessRules TheRules;
>>>>> return TheRules;
>>>>> }
>>>>
>>>> After TheRules array is deallocated during global destructions,
>>>> iterating AllRules() becomes unsafe. The old Vector-based code usually
>>>> worked because deallocated TheRules had zero size.
>>>>
>>>> The specific bug mentioned above is trivial to work around by allocating
>>>> TheRules dynamically (and never deleting it). However, if similar bugs
>>>> exist in other code where Vector is used as a data member of some
>>>> transaction-related structure, then they will lead to crashes. It is
>>>> quite possible that the affected structure's memory itself is never
>>>> deleted when the offending code accesses it (due to cbdata locks, for
>>>> example) so the [equally buggy] Vector-based code "works".
>>>>
>>>> One way we can try to catch such cases is by temporary switching back to
>>>> Vector and then:
>>>>
>>>> * Modifying Vector to mark deleted Vector objects:
>>>>
>>>> Vector::~Vector() {
>>>> clean();
>>>> deleted = true;
>>>> }
>>>>
>>>> * And adjusting *every* Vector method to assert if a deleted object is
>>>> still being used. For example:
>>>>
>>>> template<class E>
>>>> size_t
>>>> Vector<E>::size() const
>>>> {
>>>> assert(!deleted);
>>>> return count;
>>>> }
>>>>
>>>> If one of those asserts is triggered, we would be closer to solving this
>>>> mystery.
>>>>
>>>>
>>>> Kinkie, if you agree with this analysis, would you be able to make the
>>>> above modifications and test?
>>>>
>>>>
>>>> Thank you,
>>>>
>>>> Alex.
>>>>
>>>>
>>>> Goes into Squid's main.cc:
>>>>> +#include <vector>
>>>>> +#include "base/Vector.h"
>>>>> int
>>>>> main(int argc, char **argv)
>>>>> {
>>>>> + typedef std::vector<int> StdVector;
>>>>> + StdVector *stdv = new StdVector;
>>>>> + stdv->push_back(1);
>>>>> + std::cout << "alive std::vector size: " << stdv->size() << "\n";
>>>>> + delete stdv;
>>>>> + std::cout << "deleted std::vector size: " << stdv->size() << "\n";
>>>>> +
>>>>> + typedef Vector<int> SqdVector;
>>>>> + SqdVector *sqdv = new SqdVector;
>>>>> + sqdv->push_back(1);
>>>>> + std::cout << "alive Squid Vector size: " << sqdv->size() << "\n";
>>>>> + delete sqdv;
>>>>> + std::cout << "deleted Squid Vector size: " << sqdv->size() << "\n";
>>>>> + if (true) return 0;
>>>>> return SquidMainSafe(argc, argv);
>>>>> }
>>>>
>>>
>>>
>>>
>>> --
>>> Francesco
>
>
>
> --
> Francesco

-- 
    Francesco
Received on Wed Mar 26 2014 - 09:13:28 MDT

This archive was generated by hypermail 2.2.0 : Thu Mar 27 2014 - 12:00:15 MDT