Support / SUPPORT-8

Possible race condition causing CPU-bound loop in signerd?


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: OpenDNSSEC 1.3.0
    • Fix Version/s: None
    • Component/s: Signer
    • Labels:

      Description

      I have a problem with ODS 1.3.2 (more precisely the 1.3 branch as of
      rev. 5653, though I have seen it in all 1.3 versions) running on a
      RHEL 5.7 system.

      Every second week or so, ods-signerd becomes stuck in a CPU loop in
      which most of its threads consume CPU. The problem appears to be related
      to a mutex lock (a futex, to be more specific): the pthread_cond_timedwait
      call in ods_thread_wait in signer/src/shared/locks.c.

      While stuck in the loop, it also keeps a lock on the kasp database.
      I have some other processes (backups of the key database, for example)
      that wait for the same database lock. Whether this is a race condition
      when accessing the kasp database or something internal to ods-signerd
      is unclear. It may even be a RHEL bug: futex locks appear to have had
      problems in earlier Linux versions (and programming with them is
      tricky in any version).

      I attach some gdb and other output from the process and its threads. I
      had planned to leave ODS in the "loop" state to allow further information
      to be collected, but after some strace commands the hang resolved itself.
      However, the next signing run got stuck in a waitpid call, and the only
      way to resolve that was to stop ODS (ods-control stop, plus killing some
      processes). Maybe some internal state got confused by the long time
      spent in a CPU-bound loop.

      Known problem? Linux problem? Some resource problem? Other?

      / Göran Bengtson
      Chalmers Univ. of Technology

            People

            Assignee:
            matthijs Matthijs Mekking
            Reporter:
            goeran@chalmers.se goeran
