  1. Support
  2. SUPPORT-187

1.4.7 -> 1.4.9 upgrade: lost state for IXFR(?)

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: OpenDNSSEC 2.0
    • Fix Version/s: None
    • Component/s: Signer
    • Labels:
      None
    • Environment:

      Description

      Hi,

      today a number of our signed zones failed our "remaining
      signature lifetime > 7 days" check. Looking at one such zone's
      backup2 file I find:

      ;OpenDNSSEC-backup-v3
      ;;Time: 1457291106
      ;;Zone: name korunikhum.no class 1 inbound 2015062912 internal 2016022600 outbound 2016022600
      ...
      korunikhum.no. 3600 IN SOA ns.uninett.no. elisabeth.uninett.no. 2016030600 28800 3600 604800 900
      ...
      korunikhum.no. 86400 IN RRSIG NS 8 2 86400 20160318015502 20160226120504 31363 korunikhum.no. kd0J2sH6ksROVmfRigVlaGv7to1HtkGE24aI2Z2ihxEuy6Jqn0POXQ7WXq7c1OpER+AGuVsfG9T0vALyn6f6s3kA41zeI+rlXLeb9M21hCqBM59EWKXDtFnHxut9ejtaSOMVM+ex2bSHHTz5DTqCmoqvGXAMazKBdBwzIy5YAZI=;

      {locator 559d9d0ca1306b4c56895e4bc31dfd00 flags 256}

      Note that the "internal" and "outbound" SOA version numbers in
      the header (2016022600) don't actually match the SOA record in
      the backup2 file itself (2016030600).
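
      The mismatch can be checked mechanically. The following is only a
      minimal sketch: it assumes the ";;Zone:" header layout shown in the
      excerpt above and a one-line SOA record, and the function names are
      hypothetical, not part of OpenDNSSEC.

```python
import re

# Assumed header layout, taken from the backup2 excerpt in this report.
HEADER_RE = re.compile(
    r"^;;Zone: name (?P<name>\S+) class \d+ "
    r"inbound (?P<inbound>\d+) internal (?P<internal>\d+) outbound (?P<outbound>\d+)"
)


def header_serials(text):
    """Return the inbound/internal/outbound serials from the ';;Zone:' header."""
    for line in text.splitlines():
        m = HEADER_RE.match(line.strip())
        if m:
            return {k: int(m.group(k)) for k in ("inbound", "internal", "outbound")}
    return None


def body_soa_serial(text):
    """Return the serial of the first SOA record found in the file body."""
    for line in text.splitlines():
        if line.lstrip().startswith(";"):
            continue  # skip header/comment lines
        fields = line.split()
        # owner ttl class type mname rname serial refresh retry expire minimum
        if len(fields) >= 7 and fields[2] == "IN" and fields[3] == "SOA":
            return int(fields[6])
    return None


SAMPLE = """\
;OpenDNSSEC-backup-v3
;;Time: 1457291106
;;Zone: name korunikhum.no class 1 inbound 2015062912 internal 2016022600 outbound 2016022600
korunikhum.no. 3600 IN SOA ns.uninett.no. elisabeth.uninett.no. 2016030600 28800 3600 604800 900
"""

hdr = header_serials(SAMPLE)
soa = body_soa_serial(SAMPLE)
status = "consistent" if soa == hdr["outbound"] else "MISMATCH"
print(f"header outbound={hdr['outbound']} body SOA={soa}: {status}")
```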

      Also, when I query the downstream public master, I get:

      % dig @ns.uninett.no. korunikhum.no. soa +short
      ns.uninett.no. elisabeth.uninett.no. 2016022500 28800 3600 604800 900
      %
      % dig @ns.uninett.no. korunikhum.no. ns +dnssec +short
      ns.uninett.no.
      nn.uninett.no.
      NS 8 2 86400 20160313052741 20160221021001 22861 korunikhum.no. Q7Q8snzWLtI53jiRK+z6lSzYslVD9kpds+Edo8P+uPvafR9+45221mXx wfAcREDV7NcEFBOCXltrn1XobUOwhoVzRkO1yK2H1/h1K4SSKa6+ZCp1 lNNoCGTosUMvESx6MGmvfVOfLEw9fqmABLLR0LMlm/y6SaoI7KVmdNdj FxA=
      %
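
      For reference, the lifetime check amounts to the arithmetic below.
      This is a sketch, not our actual monitoring script: the function
      names are hypothetical, and "now" is assumed to be the timestamp
      of the named log lines in this report, read as UTC.

```python
from datetime import datetime, timedelta, timezone

MIN_REMAINING = timedelta(days=7)


def rrsig_time(stamp):
    """Parse an RRSIG presentation-format timestamp (YYYYMMDDHHMMSS, UTC)."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def remaining_lifetime(expiration, now):
    """Time left until the signature expires, as a timedelta."""
    return rrsig_time(expiration) - now


# Assumption: evaluation time taken from the log lines, treated as UTC.
now = datetime(2016, 3, 6, 18, 28, 57, tzinfo=timezone.utc)

signatures = [
    ("public master (key tag 22861)", "20160313052741"),
    ("backup2 file (key tag 31363)", "20160318015502"),
]

for label, expiration in signatures:
    left = remaining_lifetime(expiration, now)
    verdict = "OK" if left > MIN_REMAINING else "FAIL"
    print(f"{label}: {left} remaining -> {verdict}")
```

      Under those assumptions the signature served by the public master
      has under seven days left, while the one in the backup2 file would
      pass.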

      Doing a "rndc refresh" on the downstream public BIND master does
      not change anything: it just logs

      Mar 6 18:28:57 ns named[182]: zone korunikhum.no/IN: Transfer started.
      Mar 6 18:28:57 ns named[182]: transfer of 'korunikhum.no/IN' from 158.38.0.175#53: connected using 158.38.130.5#51471
      Mar 6 18:28:57 ns named[182]: transfer of 'korunikhum.no/IN' from 158.38.0.175#53: Transfer status: up to date
      Mar 6 18:28:57 ns named[182]: transfer of 'korunikhum.no/IN' from 158.38.0.175#53: Transfer completed: 0 messages, 1 records, 0 bytes, 0.009 secs (0 bytes/sec)

      And querying the OpenDNSSEC server from the public master (using
      the TSIG key configured, of course) gives:

      korunikhum.no. 3600 IN SOA ns.uninett.no. elisabeth.uninett.no. 2016022600 28800 3600 604800 900

      and

      korunikhum.no. 86400 IN NS nn.uninett.no.
      korunikhum.no. 86400 IN NS ns.uninett.no.
      korunikhum.no. 86400 IN RRSIG NS 8 2 86400 20160318015502 20160226120504 31363 korunikhum.no. kd0J2sH6ksROVmfRigVlaGv7to1HtkGE24aI2Z2ihxEuy6Jqn0POXQ7W Xq7c1OpER+AGuVsfG9T0vALyn6f6s3kA41zeI+rlXLeb9M21hCqBM59E WKXDtFnHxut9ejtaSOMVM+ex2bSHHTz5DTqCmoqvGXAMazKBdBwzIy5Y AZI=

      So OpenDNSSEC responds with the updated signature on the NS set,
      and also with a newer SOA version number (2016022600, where the
      public master still serves 2016022500). However, come zone
      transfer time, precious few records are transferred (one,
      according to the log above), and neither the SOA nor the NS
      RRSIG records are updated.
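
      As a sanity check on the serials themselves, here is a small
      sketch of the RFC 1982 serial number comparison. The function name
      is hypothetical; it only shows that 2016022600 does count as newer
      than 2016022500, so a plain serial comparison would not by itself
      explain the "up to date" result.

```python
SERIAL_BITS = 32
MOD = 2 ** SERIAL_BITS
HALF = 2 ** (SERIAL_BITS - 1)


def serial_gt(a, b):
    """True if serial a is newer than serial b under RFC 1982 arithmetic.

    The comparison is undefined when the two differ by exactly 2**31;
    this sketch returns False in that case.
    """
    diff = (a - b) % MOD
    return 0 < diff < HALF


opendnssec_serial = 2016022600   # answered by the OpenDNSSEC server
public_serial = 2016022500       # still served by the public master

print(serial_gt(opendnssec_serial, public_serial))
print(serial_gt(public_serial, opendnssec_serial))
```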

      The previous signing of the zone was done on February 26, shortly
      after I upgraded from OpenDNSSEC 1.4.7+patches to 1.4.9 on this
      host. The zone has apparently not been re-signed thereafter.

      From testing with a couple of the other zones, explicitly asking
      for a re-signing with "ods-signer sign <zonename>" fixes the
      problem. However, this should not happen...

      This particular zone was among those where 1.4.9 complained:

      Feb 26 14:03:10 hugin ods-signerd: [zone] corrupted backup file zone korunikhum.no: read key error
      Feb 26 14:03:10 hugin ods-signerd: [engine] unable to recover zone korunikhum.no from backup, performing full sign

      but this problem looks different from the one I experienced
      earlier, where the intra-day SOA sequence number was reset to 0
      and the zone ended up with a lower SOA version number than it had
      published earlier. Here it seems that the IXFR isn't doing its
      thing properly. Turning up the log verbosity to 6 gives the
      attached log.

      Hmm, the SOA version 2016022500 is from a signing that OpenDNSSEC
      1.4.7 did. Has 1.4.9 lost track of the state related to that
      signing, causing an incomplete IXFR?

      I took a packet trace of the IXFR transfer, and there's no trace
      of the NS RRSIG record in the trace.

      Is this just another fallout from OpenDNSSEC 1.4.9 rejecting the
      backup2 files written by OpenDNSSEC 1.4.7? And do I need to
      manually run "ods-signer sign <zone>" once for all the zones
      which failed the reading of the .backup2 files due to a lack of
      backward compatibility in OpenDNSSEC 1.4.9?
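
      If a manual re-sign does turn out to be needed, the affected zones
      could be collected from syslog along these lines. This is a
      sketch only: it assumes the "[engine] unable to recover zone"
      message format quoted above, the helper name is hypothetical, and
      it merely prints the "ods-signer sign" commands rather than
      running them.

```python
import re

# Message format assumed from the ods-signerd log lines quoted in this report.
RECOVER_FAIL_RE = re.compile(r"\[engine\] unable to recover zone (\S+) from backup")


def zones_needing_resign(log_lines):
    """Return zone names whose backup recovery failed, in first-seen order."""
    zones = []
    for line in log_lines:
        m = RECOVER_FAIL_RE.search(line)
        if m and m.group(1) not in zones:
            zones.append(m.group(1))
    return zones


SAMPLE_LOG = [
    "Feb 26 14:03:10 hugin ods-signerd: [zone] corrupted backup file zone "
    "korunikhum.no: read key error",
    "Feb 26 14:03:10 hugin ods-signerd: [engine] unable to recover zone "
    "korunikhum.no from backup, performing full sign",
]

for zone in zones_needing_resign(SAMPLE_LOG):
    # Printed for review; nothing is executed here.
    print(f"ods-signer sign {zone}")
```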

      If so, the one who introduced the new rfc5011 flag in the header of
      the backup2 files should have thought of the consequences of not
      including any code to handle backwards compatibility in the function
      reading the backup2 file.

      Oh, and by the way: the list of "Affects Version/s" in Jira dearly needs
      updating, as it seems to indicate that none of 1.4.6, 1.4.7, 1.4.8* or
      1.4.9 has been released...

        Attachments

          Activity

            People

            Assignee:
            Unassigned
            Reporter:
            he Håvard Eidnes
