Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: OpenDNSSEC 1.4.7
Fix Version/s: None
Component/s: Signer
Labels: None
Environment: NetBSD/amd64 6.1_STABLE
I'm using OpenDNSSEC in the "zone transfer in, zone transfer
out" mode of operation. Judging from the lack of response to
my mailing list query, it doesn't look like there are many others
who do.
When restarting OpenDNSSEC, the signer all too often complains
bitterly about corrupted ixfr files. The code right after the point
where this error is detected unlinks the file and proceeds.
I've created a patch which instead renames the supposedly
corrupt file, so that it can be inspected instead of discarded.
The aim is to get rid of the supposed corruption by getting a fix
to either the "writer" or the "reader" part for these files,
because the file contents appear to be undamaged by any
external events.
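The idea behind the patch can be illustrated with a minimal sketch. This is not the attached patch itself: the helper name `save_bad_journal` is hypothetical and does not come from the OpenDNSSEC sources. The only detail taken from the logs is that the saved name is the journal file name with `-bad` appended.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not an OpenDNSSEC function): instead of
 * unlinking a journal file that failed to parse, rename it to
 * "<path>-bad" so it can be inspected later.
 *
 * On success returns 0 and leaves the new name in `saved`.
 * On failure returns -1 with errno set; the original file is
 * left in place.
 */
int save_bad_journal(const char *path, char *saved, size_t savedlen)
{
    int n = snprintf(saved, savedlen, "%s-bad", path);

    if (n < 0 || (size_t)n >= savedlen) {
        errno = ENAMETOOLONG;   /* new name does not fit the buffer */
        return -1;
    }
    /* rename() replaces any older "-bad" file with the same name */
    return rename(path, saved);
}
```

A caller would presumably log the saved name on success and fall back to the existing unlink behaviour if the rename fails, so that a bad journal never blocks startup.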
The typical log messages are now:
Oct 13 10:55:55 hugin ods-signerd: [zone] corrupted journal file zone 2.39.128.in-addr.arpa, skipping (General error)
Oct 13 10:55:55 hugin ods-signerd: [zone] corrupted journal for zone 2.39.128.in-addr.arpa saved as 2.39.128.in-addr.arpa.ixfr-bad
Oct 13 10:55:55 hugin ods-signerd: [backup] bad ixfr journal: trailing RRs after final SOA
The ixfr cache files I've found all contain a number of SOA records,
and if I read the code correctly, that's the way incremental changes
are represented. So the file format of the ixfr journal file isn't
supposed to be the same as an AXFR, with one SOA at the front and one
at the end. However, I'm also not quite able to decipher from the code
what the actual format of the ixfr journal files is supposed to be.
So the first gripe is that the error message being logged is probably misleading: it can lead the operator to think that the file is supposed to use the AXFR format with only two SOAs, and no RRs after the final SOA.
I attach below the patch I have to save the supposedly-bad ixfr files, and a copy of the ixfr file corresponding to the log message above.
The supposedly-bad ixfr file has no fewer than 8 SOA records, and I'm not able to work out what's wrong with it.
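For anyone wanting to make the same kind of count on a saved file, a rough sketch follows. It rests on an assumption that I have not verified against the signer code: that the journal holds one RR per line in presentation format (owner, TTL, class, type, rdata). The function name `count_soa_lines` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical inspection helper (not part of OpenDNSSEC).
 * ASSUMPTION: one RR per line, presentation format, so the type
 * mnemonic is one of the first four whitespace-separated fields.
 * Lines starting with ';' are treated as comments and skipped.
 * Lines longer than the buffer are read in pieces and may be
 * miscounted.
 */
int count_soa_lines(FILE *fp)
{
    char line[4096];
    int count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        char *tok;
        int field = 0;

        if (line[0] == ';')
            continue;
        for (tok = strtok(line, " \t\r\n");
             tok != NULL && field < 4;
             tok = strtok(NULL, " \t\r\n"), field++) {
            if (strcmp(tok, "SOA") == 0) {
                count++;
                break;
            }
        }
    }
    return count;
}
```

This only counts SOA lines; it says nothing about whether their ordering is what the journal reader expects, which is the part I cannot determine from the code.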
This is part of my push towards the goal of "OpenDNSSEC should be restartable and start up and run without scary-as-hell error messages, without first removing the temporary files it itself has written and left behind". If error messages can be made less misleading and/or more informative to the operator in the process, that would be a nice bonus. Getting rid of what appear to be "self-inflicted" errors (which this one seems to be) should also be a goal.
Regards,
Håvard