Working with the SeaMonkey Tree
- Base concepts
- Example of the process in action
- Tree state assurance
- Links to Mozilla's tools
Consider the following to be a high-level description of the process for getting code into mozilla.org's SeaMonkey tree (i.e. doing work :-)). We've been refining this process for several years and believe it provides an efficient and safe model for software development. The model is driven entirely by peer understanding and peer pressure, and it distributes both the workload and the exposure to process failure.
The primary goal is to have the greatest number of engineers productive at any given point in time. The primary thing to avoid in tree management is a tree that doesn't compile. Once the tree has reached that state, the entire engineering organization cannot move forward (the most inefficient state). For example, Jane checks out the code (aka the "tree") and can't compile it. She either waits a while and checks out the tree again later, or spends precious time tracking down a compilation error that she didn't cause.
There are different ways to enforce these rules, and different organizations can instill them in varying ways. In the Mozilla project this model is driven largely by implicit peer pressure: those who don't adhere to the model are ostracized.
Base concepts that must be agreed upon (by engineers, the build team, and management, including upper management):
- Breaking the tree (at run time, compile time, or link time) is not OK. It costs lots of money (more than you can justify wasting) to have hundreds of engineers sitting idle waiting for a good tree to pull.
- Code will be backed out if it breaks the tree and the problem isn't rectified within a reasonable time frame (usually less than 60 minutes). The point is, if you can't fix your problem quickly, you have a problem to figure out and your code shouldn't be in the tree anyway.
- Someone must always be watching, and responsible for, the tree state. This is best done using a 24-hour rotating schedule so that no one person is responsible for it all the time. We call this person the "sheriff," and the sheriff schedule is laid out so that someone's always in the driver's seat. See this link for Mozilla's sheriff schedule.
- Individuals who check in code are also responsible for watching the tree until tinderbox has cycled green. This ensures that the individual's code didn't break something unforeseen, and that the individual is around to deal with it immediately.
- Builds are "verified" by appropriate people frequently so the tree is regularly in a known good state. Every weekday at 8am, the SeaMonkey build team automatically closes the tree and produces release builds, which are "smoketested" (see this link for Mozilla's smoketests) for basic functionality. Any bug that blocks a major portion of the tests is considered a blocker bug, and the tree is held closed until the bug is fixed and the build is respun. It is the responsibility of the sheriff and the hook (both are explained below) to fix the blocker bug. Once all blocker bugs have been fixed and QA has builds with which to do further testing, the hook is cleared, the tree is opened for checkins, and development continues. If there are no blocker bugs, the tree is typically open by 11am.
- Code is not checked in without meeting a minimum set of pre-checkin tests (like firing up the application and shutting it down). This set has to be small enough for engineers not to waste too much time verifying, yet large enough not to cause serious regressions. See this link for Mozilla's pre-checkin tests.
- Everyone is equal in this process. The senior engineer has no more, and no less, right to call someone on the carpet for breaking the build than the intern does. There is no "weight" to throw around. Code is either good or bad; it doesn't matter who wrote it.
- Peer pressure is a real and powerful force.
- There are no exceptions to the process.
- Because there is buy-in from upper management, escalation to them for an exception gets you nowhere.
- The tree state needs to be known as frequently as possible. This translates to build verification every 24 hours (except for weekends and holidays in the United States when, in theory, tree changes are reduced anyway).
- Extended tree closure is a good way to do even more extensive testing. Mozilla accomplishes this by closing the tree for approximately one week every six weeks in preparation for releasing a milestone.
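The backout rule from the list above can be sketched as a simple time check. This is only an illustration: the function name `should_back_out` and the constant `FIX_WINDOW` are hypothetical, and the 60-minute value is taken from the "usually less than 60 minutes" guideline rather than from any actual Mozilla tool.

```python
from datetime import datetime, timedelta

# Assumption: the text says "usually less than 60 minutes"; the exact
# limit is a project decision, not a fixed constant.
FIX_WINDOW = timedelta(minutes=60)


def should_back_out(broken_since: datetime, now: datetime,
                    window: timedelta = FIX_WINDOW) -> bool:
    """Return True once the tree has stayed broken longer than the fix window."""
    return now - broken_since > window


broke_at = datetime(2002, 1, 7, 14, 0)
# 30 minutes after the breakage: the author still has time to fix it.
print(should_back_out(broke_at, datetime(2002, 1, 7, 14, 30)))
# 90 minutes after the breakage: the code is a candidate for backout.
print(should_back_out(broke_at, datetime(2002, 1, 7, 15, 30)))
```

In practice the sheriff makes this call by judgment; the sketch only shows the time-based part of the rule.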
Example of the process in action
Writing code
I'm Joe and I've written some code that I want to check into the SeaMonkey tree. I've been writing my code for a week, and I want to make sure that it still works with the current tree, which has changed since my last checkout because other people have checked in.
If I work locally for too long, my local tree will drift too far out of sync with what's in the repository, and I'll have to spend more time updating my code.
Updating my local tree
Before I update my tree, I visit the tinderbox URL to see if the tree is green (see this link for Mozilla's SeaMonkey trunk tinderbox). If it is, I can update my tree; otherwise I wait for green, because checking out a red tree guarantees that my local tree will break and I won't be able to do any work. Once I've updated my tree, I rebuild to make sure my changes haven't broken anything in the recent tree.
If something broke, I need to fix my changes in my local tree so that they no longer break things. If I checked in in this broken state, everyone else's tree would break too, which isn't cool.
Nothing broke so my tree is "up-to-date".
Before I can check in, I'm required to have others review my code for contextual accuracy (does my code do the right thing?) and syntactical correctness (is my code using the style/syntax that the general project is using?). These are known as "review" and "super-review" respectively. (See this FAQ for more information about mozilla.org's code review processes.)
I call up (or email) Jane who knows this area of the code and show her cvs diffs of my modifications. I called Jane because she's widely considered a "module owner" (someone who knows a specific code area). She looks them over using some reviewer guidelines (see this link for Mozilla's reviewer guidelines) and her own expertise. She notices that I could be iterating my array more efficiently, points that out to me, and I update my code, produce new diffs, and she goes over the new diffs to make sure my new iterator is ok.
Technically speaking, the reviewers of code should be held just as accountable as the person who wrote the code.
In almost all cases a super-review of my code is needed as well. (See this link for an explanation of super-review.)
This is one of many areas where peer pressure comes into play. The fact that a peer is required to look at my code ensures that I'm going to do the best I can to produce good, working code. Otherwise I'd feel embarrassed.
Running the pre-checkin tests
Before I commit my code, I have to run a small set of tests (see this link for Mozilla's pre-checkin tests) to ensure that basic functionality (as defined by the project) works. If I check in code that prevents a URL from being loaded, for example, the build is pretty much useless even though tinderbox won't break (it's not a compile-time error). Running the pre-checkin tests ensures some level of verification before I check my code in.
If I want to be extra careful, I can run the actual smoketests that are used to verify a tree. If I do that, and I pass all of those tests, I'm an extra step ahead of the game because I know my stuff _really_ didn't break anything, and the odds of getting called in because I broke something go down even further.
Checking in code (first attempt)
I check tinderbox again to make sure that the tree is open (no one is allowed to check in if the tree is closed). The tree is closed this time because someone checked in code that caused a compiler error.
Checking in code (second attempt)
I go get a cup of coffee, and when I come back, I see that the tree is open again (the bustage was resolved), so I check in. My checkin comments indicate who did my code review(s) and describe, in fairly good detail, what my modifications do.
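The conditions Joe has to satisfy before checking in can be sketched as a checklist. Everything here is hypothetical and for illustration only: the `PendingCheckin` class, its fields, and the `blocking_reasons` function are assumed names, not part of tinderbox, bonsai, or cvs. The `needs_super_review` flag reflects the statement that super-review is needed "in almost all cases."

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PendingCheckin:
    """Hypothetical record of a change that is waiting to be checked in."""
    description: str
    reviewer: Optional[str] = None
    super_reviewer: Optional[str] = None
    needs_super_review: bool = True
    pre_checkin_tests_passed: bool = False


def blocking_reasons(checkin: PendingCheckin, tree_is_open: bool) -> List[str]:
    """Return every reason the checkin may not go in yet (empty means go)."""
    reasons = []
    if not tree_is_open:
        reasons.append("tree is closed")
    if checkin.reviewer is None:
        reasons.append("no review")
    if checkin.needs_super_review and checkin.super_reviewer is None:
        reasons.append("no super-review")
    if not checkin.pre_checkin_tests_passed:
        reasons.append("pre-checkin tests not passed")
    if not checkin.description.strip():
        reasons.append("no checkin comment")
    return reasons


change = PendingCheckin(description="Iterate the array more efficiently",
                        reviewer="jane")
# Lists what is still missing for this change while the tree is closed.
print(blocking_reasons(change, tree_is_open=False))
```

Returning a list of reasons rather than a single boolean makes it easy to see everything that is still outstanding in one pass.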
Being on the hook comes with responsibility. I'm required to watch tinderbox go through a green cycle on all of its builds before I can walk away from my machine. If a build fails, I have to check to see if it was me (tinderbox provides mechanisms to do this "checking"). Once everything cycles green, I can stop watching tinderbox, but I'm still on the hook (I may have introduced run-time bustage, for example, and I need to be held accountable for that).
The "hook" is an important concept in that it is a list of people (automatically deducible by tinderbox) who can be emailed or contacted regarding a particular build cycle. If someone notices a runtime problem with a build, they can email the hook to contact everyone who has checked into the tree since the last build verification (the last time the hook was "cleared"). So, let's say I notice that a particular dialog that used to come up isn't coming up. I can email the hook asking "did anyone mess with code that could affect dialog throwing?" Because this is a communal effort, chances are someone will respond. If you were on the hook and you didn't respond, you'll be called on the carpet for not paying attention and wasting other people's time (remember, this is the ultimate inefficiency). See this link for more information on the hook and how our bonsai tool can help you.
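As described above, the hook is the set of people who have checked in since the hook was last cleared. A minimal sketch of that computation follows. The function name `people_on_hook` and the `(author, time)` tuple format are assumptions made for this example; they do not reflect how tinderbox or bonsai actually store checkin data.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def people_on_hook(checkins: Iterable[Tuple[str, datetime]],
                   hook_cleared_at: datetime) -> List[str]:
    """Return everyone who checked in after the hook was last cleared.

    Authors are listed once each, in order of their first checkin.
    """
    on_hook: List[str] = []
    for author, when in sorted(checkins, key=lambda c: c[1]):
        if when > hook_cleared_at and author not in on_hook:
            on_hook.append(author)
    return on_hook


log = [
    ("joe", datetime(2002, 1, 7, 14, 0)),
    ("jane", datetime(2002, 1, 7, 12, 0)),
    ("george", datetime(2002, 1, 6, 17, 0)),  # before the hook was cleared
    ("joe", datetime(2002, 1, 7, 15, 0)),     # second checkin by the same person
]
cleared = datetime(2002, 1, 7, 11, 0)
print(people_on_hook(log, cleared))
```

George is not on the hook here because his checkin predates the last verification, and Joe appears only once even though he checked in twice.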
Runtime bustage
It turns out that the code I wrote is causing runtime problems in a functional area, and neither the review process nor tinderbox caught them. George noticed this because he was working in this area after I checked in. George contacts the sheriff to report that a regression has been introduced. The sheriff and George determine that the regression is severe enough that the tree should be closed, so the sheriff closes it. Because the tree is closed, a large number of people are blocked from proceeding with their checkins, and some of them scramble to help find the offending code. Generally the sheriff sends mail to the hook pointing out the regression and asking folks to determine whether it was their checkin that caused it.
The offending code is found to be mine. My code cycled green, so I walked away from my machine to get some dinner. Because I'm still on the hook, I'm ultimately responsible for my code, and if I'm not available (in this case I didn't bring my pager or cell phone with me) and no one else wants to, or can, fix my problem, I can be backed out. Because this regression is serious enough, the sheriff decides that my code should be backed out. The sheriff backs it out (the commands to do so are automatically generated by tinderbox) so he can reopen the tree and unblock people.
I come back from dinner and see that I've been backed out. I wasn't reasonably reachable (no one was able to get hold of me), so I accept that my code got yanked. I try my changes out in the scenario where the regression surfaced, notice the problem, fix it, and start the review/checkin process over again.
Tree state assurance
- Developers only check in code that has been verified to "do the right thing" (the review process), to compile, and not to break basic functionality (pre-checkin tests). This level is continual and is done at the individual level, to test an individual's code.
- Every 24 hours the tree closes to all checkins while the builds are "verified." This provides a more robust test cycle and guarantees the tree's state across many individuals' checkins.
- The extended tree closure before a milestone release generally occurs every six weeks, and QA hammers on these builds. This provides an even more robust test cycle across weeks of development.
Definitions
- SeaMonkey: The browser/mail/news application suite that forms the basis of the Mozilla 1.0 release.
- tree: cvs's representation of a set of code is referred to as a tree. Mozilla.org maintains a tree that encompasses all the code for the SeaMonkey application suite. Not surprisingly, this is known as the SeaMonkey tree.
- broken tree: A broken tree is one that doesn't compile or link. I can break my local tree by introducing a compile or link time error. If I check that error into cvs, anyone else checking out a tree will also get a broken tree. Broken trees are bad :-).
- commit/checkin: Committing or checking in code writes it into the cvs repository.
- tinderbox: Software (largely a bunch of Perl scripts) that continuously checks out a tree and builds it. Tinderbox provides a graphical representation of the state of the tree by doing builds. See this link for Mozilla's SeaMonkey trunk tinderbox. There are generally three states that tinderbox can be in: red, representing a broken tree; green, indicating that everything is building/linking/testing fine; and orange, indicating that automated tests run after the build successfully completes are failing. Tinderboxes (or "tinderboxen") are always maintained for the SeaMonkey trunk. Additional tinderboxen may be added when a branch is under active development. For example, see this link to the Mozilla 1.0 branch tinderboxen.
- build verification: The tree is built and verified for regressions and bugs. This is usually done by a combination of QA and the build team. The tree is in a closed state during verification so the verifiers are sure that nothing's changing out from underneath them. Code checked in to a closed tree gets backed out.
- closed tree: No one is allowed to commit code to a closed tree. A tree is generally closed while a build is being verified so the tree is not changed until we know that the tree is in a "good" state. The tree is also closed by the sheriff when the tree state becomes unknown.
- open tree: The tree is open for checkins. Code for which the pre-checkin process has been completed may be checked into the tree.
- lock down: During periods of increased tree control (sometimes referred to as "lock down" periods), only specifically approved checkins are permitted into the tree. This generally happens before a milestone release. At such times, a group (such as "email@example.com") or an individual may be designated as the approval granting entity.
- code review: Absolutely no code (no exceptions regardless of title, skill level, etc.) is checked into the tree without being reviewed by someone qualified to do a review. See this link for Mozilla's review guidelines (different organizations will build different guidelines accordingly).
- sheriff: A tree is never unattended and the sheriff is the attendant. A tree without a sheriff is a broken tree. The sheriff has the power to close the tree if things go awry (the tree goes "red," for example). The sheriff also has the power to have code backed out (removed) from the tree. It is also understood that the sheriff can call you at 2am on a Sunday morning if you were deemed to be the person who broke the tree. Again, the sheriff can be a single person, or can rotate shifts across members of various groups (as occurs with mozilla.org and Netscape).
- hook: The set of people who have checked in code to the tree since the last time the hook was cleared. The SeaMonkey build is verified daily, and after the build has been verified, the hook is cleared, meaning that everyone who had checked in since the last verification is "off the hook" for this build. The hook is responsible for getting the tree back to the state it was in at the time of the previous verification.
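The three tinderbox states described above (red, green, orange) can be sketched as a small classification. This is an illustrative model only: the names `TreeColor`, `build_color`, and `all_green` are hypothetical, and the real tinderbox scripts are written in Perl and are not reproduced here.

```python
from enum import Enum
from typing import Iterable


class TreeColor(Enum):
    GREEN = "green"    # everything is building, linking, and testing fine
    ORANGE = "orange"  # the build completed, but automated tests are failing
    RED = "red"        # the tree is broken (it does not compile or link)


def build_color(built_ok: bool, tests_ok: bool) -> TreeColor:
    """Classify one build cycle using the three states described in the text."""
    if not built_ok:
        return TreeColor.RED
    if not tests_ok:
        return TreeColor.ORANGE
    return TreeColor.GREEN


def all_green(colors: Iterable[TreeColor]) -> bool:
    """True only if there is at least one build and every build is green."""
    colors = list(colors)
    return bool(colors) and all(c is TreeColor.GREEN for c in colors)


# Hypothetical results for three build machines in one cycle.
cycle = [build_color(True, True), build_color(True, False), build_color(True, True)]
print([c.value for c in cycle])
print(all_green(cycle))
```

The `all_green` helper mirrors the requirement that someone who has just checked in watches tinderbox until all of its builds have cycled green.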
Links to Mozilla's tools
LXR - http://lxr.mozilla.org/seamonkey/
Bonsai - http://bonsai.mozilla.org/