Care and Feeding of the DOM Serializer Tests
by Akkana Peck, akkzilla at shallowsky dot com.
Introduction
The DOM Serializers are the code which controls all output from Mozilla -- they're what translates html into plaintext when you send as plaintext from the html mail compose window, what produces the plaintext or html you see when you copy something in mozilla and paste it into another window, and what produces the html output from Composer.
Since the output of the serializers has historically been very sensitive to changes, the mozilla build now includes a set of automated serializer tests which are run as part of the Tinderbox cycle. This document describes how those tests work, how to add a new test, and what to do if something breaks one of the tests.
Overview of the tests
In a source tree, the serializer tests live in htmlparser/tests/outsinks. This may seem like a strange place given that the serializers themselves live in content/base/src. Indeed it is -- it's left over from when the serializers were part of the parser. When they got moved, the tests didn't move with them.
The tests consist of several components:
- The test program, a C++ program called TestOutput (which is run from dist/bin just like the mozilla application itself). Most of the code implementing this is in Convert.cpp.
- Input samples, generally named *.html in htmlparser/tests/outsinks. During a build these are copied or linked to dist/bin/OutTestData.
- Output samples: generally named *.out in htmlparser/tests/outsinks, copied to dist/bin/OutTestData during a build. TestOutput takes each input sample, runs it through the appropriate serializer with appropriate flags, then compares the result against one of the output samples.
- A perl script, TestOutSinks.pl, which contains the master list of tests, input and output files and flags. This is the script which is run from Tinderbox and which should be run by hand after making changes that might affect the serializer code.
What do I do if I broke the tests and need to fix them, fast?
- TestOutSinks.pl will print the name of the test that's failing in English -- for example, "Mail quoting test failed." Read the TestOutSinks.pl script to find out the command corresponding to that message (a command starting with ./TestOutput, for example, ./TestOutput -i text/html -o text/plain -f 2 -w 50 -c OutTestData/mailquote.out OutTestData/mailquote.html). Run that command by hand from dist/bin to see the output, which should include the offset where the error occurred.
- Then run that command again, but omit the "-c OutTestData/filename" part to see the actual output. It may help to capture the output in a file to make it easier to debug. Note the filename you're omitting (mailquote.out in this case): this is the "comparison file" and you'll need it later.
- In your favorite text editor, open both the .out file and the saved output from running the test. Move forward the reported number of characters from the beginning of the file (in emacs, that's ctrl-U <number> ctrl-f), then compare the saved output with the .out file to see how they differ.
- If the change is actually correct behavior (it's behaving better now than before), then just modify the comparison file in htmlparser/tests/outsinks/*.out and check that in along with the patch (be sure the reviewer knows about that part of the change and that everybody agrees the change in behavior is appropriate). If it's an error and the old behavior was better, then your patch introduced a bug and should be fixed.
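If counting characters in an editor is awkward, the comparison step can be approximated with a small script. The following is a minimal sketch, not part of the test suite: the function names are made up for illustration, and it assumes the offset reported by TestOutput is a character count from the start of the file (whether it is 0-based or 1-based is not stated here, so treat the number as approximate).

```python
from typing import Optional


def first_mismatch(actual: str, expected: str) -> Optional[int]:
    """Return the 0-based offset of the first differing character, or None."""
    for offset, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return offset
    if len(actual) != len(expected):
        # One text is a prefix of the other; they diverge where the shorter ends.
        return min(len(actual), len(expected))
    return None


def show_context(text: str, offset: int, radius: int = 20) -> str:
    """Return the characters around offset, with whitespace made visible."""
    start = max(0, offset - radius)
    return repr(text[start:offset + radius])


def compare_files(actual_path: str, expected_path: str) -> Optional[int]:
    """Print the text around the first difference between two files."""
    with open(actual_path, encoding="utf-8", errors="replace") as f:
        actual = f.read()
    with open(expected_path, encoding="utf-8", errors="replace") as f:
        expected = f.read()
    offset = first_mismatch(actual, expected)
    if offset is None:
        print("files are identical")
    else:
        print("first difference at offset", offset)
        print("actual:  ", show_context(actual, offset))
        print("expected:", show_context(expected, offset))
    return offset
```

For example, compare_files("saved_output.txt", "OutTestData/mailquote.out") would print the text on either side of the first difference, with newlines and trailing spaces shown as escapes. The name saved_output.txt stands for whatever file you captured the output in.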
How do I add a new test?
Thanks for asking! There are two ways to add a new test: 1. Add new cases to an existing test, or 2. Add an entirely new test.
- If you are adding another case which relates to one of the existing tests -- for example, if you have been working on a specific behavior in mail quotation blocks and want to make sure that the current behavior does not regress, but it's not covered in the current mail quotation test -- then just add a block into the existing .html file, add a corresponding block to the appropriate .out file (the easiest way may be just to run the appropriate test and capture the output in the .out file), and check them both in. You're done!
- Adding an entirely new html->plaintext test is slightly more difficult (but still pretty easy). You will need to:
  - Make a new .html file.
  - Add an appropriate ./TestOutput line to TestOutSinks.pl.
  - Generate a new .out file.
  - Add the new files (.html and .out) to the Makefile.in so they'll be installed into dist/bin/OutTestData.
  - Check in the new .html, the new .out, and the modified TestOutSinks.pl and Makefile.in.
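The "generate a new .out file" step amounts to running the test command without its comparison file and saving what it prints. Here is a rough sketch of that, modeled on the mailquote command shown earlier. The helper names are hypothetical, the parameter names flags and wrap are only guesses at what -f and -w mean, and the sketch assumes TestOutput writes its result to standard output.

```python
import subprocess
from typing import List, Optional


def build_test_command(input_file: str,
                       compare_file: Optional[str] = None,
                       in_type: str = "text/html",
                       out_type: str = "text/plain",
                       flags: int = 2,
                       wrap: int = 50) -> List[str]:
    """Assemble a ./TestOutput command line in the order used by the example."""
    cmd = ["./TestOutput", "-i", in_type, "-o", out_type,
           "-f", str(flags), "-w", str(wrap)]
    if compare_file is not None:
        # With -c, TestOutput compares against this file instead of just printing.
        cmd += ["-c", compare_file]
    cmd.append(input_file)
    return cmd


def regenerate_out(input_file: str, out_path: str, bin_dir: str = "dist/bin") -> int:
    """Run the command without -c from bin_dir and save stdout to out_path.

    Assumption: the serialized result goes to standard output.
    Note that out_path is relative to the caller's directory, not bin_dir.
    """
    cmd = build_test_command(input_file)
    with open(out_path, "wb") as out:
        return subprocess.run(cmd, cwd=bin_dir, stdout=out).returncode
```

Look over the generated file by hand before checking it in, since it becomes the definition of correct behavior for the new test.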
What if I want to test a different combination, say, html to html or xml to plaintext?
That used to be easy, but unfortunately a change in the serializer architecture has made it harder, and the TestOutput program no longer automatically supports any type besides html to plaintext. That doesn't mean it's impossible; it just means the system needs some work and no one has done it yet.
Here's what needs to be done for this to work:
The routine HTML2text currently takes inType and outType parameters (they're mime types), but it returns an error if inType != text/html or outType != text/plain. That's because it next creates an output sink of type NS_PLAINTEXTSINK_CONTRACTID. Back when all serialization was done through creating parser sinks, that was no problem -- there was a contractid for the html content sink too. But that all changed when the sinks were turned into serializers, and now there's no contractid that specifically claims to be an html sink.
Is that a problem? Really all we need is to create a serializer object (serializers can act like parser sinks), so it would probably work to use any contractid that happened to result in an nsHTMLContentSerializer or nsXMLContentSerializer object. So it's possible that all that is needed is to find the appropriate contractid definitions, add them to the top of Convert.cpp, then use them inside HTML2text depending on what inType and outType are.
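The selection logic that HTML2text would need can be sketched independently of the C++ details. The following is only an illustration of the idea, written in Python rather than as a patch to Convert.cpp. NS_PLAINTEXTSINK_CONTRACTID is the name mentioned above; the other two names are hypothetical placeholders standing in for whatever contractid definitions turn out to be appropriate, and the set of accepted input types is likewise an assumption.

```python
from typing import Dict, Set

# Symbolic names only, not real contractid strings.
# The two HYPOTHETICAL_* entries are placeholders for definitions still to be found.
SINK_FOR_OUTPUT_TYPE: Dict[str, str] = {
    "text/plain": "NS_PLAINTEXTSINK_CONTRACTID",
    "text/html": "HYPOTHETICAL_HTML_SERIALIZER_CONTRACTID",
    "text/xml": "HYPOTHETICAL_XML_SERIALIZER_CONTRACTID",
}

# Assumed input types; today only text/html is accepted.
SUPPORTED_INPUT_TYPES: Set[str] = {"text/html", "text/xml"}


def choose_sink(in_type: str, out_type: str) -> str:
    """Pick the sink name for a conversion, or raise for unsupported types."""
    if in_type not in SUPPORTED_INPUT_TYPES:
        raise ValueError("unsupported input type: " + in_type)
    if out_type not in SINK_FOR_OUTPUT_TYPE:
        raise ValueError("unsupported output type: " + out_type)
    return SINK_FOR_OUTPUT_TYPE[out_type]
```

The point of the table is that the existing check (an error unless inType is text/html and outType is text/plain) becomes a lookup, so supporting a new combination means adding one entry rather than another branch.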