You are currently viewing a snapshot of www.mozilla.org taken on April 21, 2008. Most of this content is highly out of date (some pages haven't been updated since the project began in 1998) and exists for historical purposes only. If there are any pages on this archive site that you think should be added back to www.mozilla.org, please file a bug.



Working with the SeaMonkey Tree


Consider the following to be a high level description of the process for getting code into mozilla.org's SeaMonkey tree (i.e. doing work :-)). We've been refining this process for several years and believe it provides an efficient and safe model for software development. The model is entirely driven by peer understanding/pressure. This model distributes workload and exposure to process failure.

The primary goal is to have the greatest number of engineers productive at any given point in time. The primary thing to avoid in tree management is a tree that doesn't compile. If the tree has reached that state, the entire engineering organization cannot move forward (the most inefficient state). Jane checks out the code (aka "tree") and she can't compile. She either waits awhile and checks out the tree later, or spends precious time tracking down the compilation error that she didn't cause.

There are different ways to enforce these rules and different organizations can instill them in varying ways. In the Mozilla project this model is driven largely by implicit peer pressure . . . those that don't adhere to the model are ostracized.

Base concepts that must be agreed upon (by engineers, build team, and management (including upper management)):
  • Breaking (run time, compile time, or link time) the tree is not ok. It costs lots of money (more than you can justify wasting) to have hundreds of engineers sitting idle waiting for a good tree to pull.
  • Code will be backed out if it breaks the tree and the offending code isn't rectified within a reasonable time frame set for resolution (usually less than 60 minutes). The point is, if you can't fix your problem quickly, you have a problem to figure out and your code shouldn't be in the tree anyway.
  • Someone must always be watching, and responsible for, the tree state. This is best done using a 24 hour rotating schedule so no one person is responsible for it all the time. We've called this person the "sheriff," and the sheriff schedule is laid out so someone's always in the driver's seat. See this link for Mozilla's sheriff schedule.
  • Individuals who check in code are also responsible for watching the tree until tinderbox has cycled green. This ensures that the individual's code didn't break something unforeseen, and that the individual is around to deal w/ it immediately.
  • Builds are "verified" by appropriate people frequently so the tree is regularly in a known good state. Every weekday at 8am, the SeaMonkey build team automatically closes the tree and produces release builds which are "smoketested" (see this link for Mozilla's smoketests) for basic functionality. If there are any bugs blocking a major portion of the tests, that is considered a blocker bug and the tree will be held closed until the bug is fixed and the build is respun. It is the responsibility of the sheriff and the hook (see below for an understanding of "the sheriff" and "the hook") to fix the blocker bug. Once all blocker bugs have been fixed and QA has builds with which to do further testing, the hook is cleared, the tree is opened for checkins and development continues. If there are no blocker bugs, the tree is typically open by 11am.
  • Code is not checked in without meeting a minimum set of pre-checkin tests (like firing up the application and shutting it down). This set has to be small enough for engineers not to waste too much time verifying, yet large enough not to cause serious regressions. See this link for Mozilla's pre-checkin tests.
  • Everyone is equal in this process. The senior engineer has no more, or less, right to drag someone out on the carpet for breaking the build than the intern does. There is no "weight" to throw around. Code is either good or bad, it doesn't matter who wrote it.
  • Code can only be checked in if it's been reviewed. Again, everyone's equal here. The inventor of JavaScript gets backed out if he doesn't go through the appropriate review process; no joke. See this link for Mozilla's review process.
  • Peer pressure is a real and powerful force.
  • There are no exceptions to the process.
  • Because there is buy in from upper management, escalation to them for an exception gets you nowhere.
  • The tree state needs to be known as frequently as possible. This translates to build verification every 24 hours (except for weekends and holidays in the United States when, in theory, tree changes are reduced anyway).
  • Extended tree closure is a good way to do even more extensive testing. Mozilla accomplishes this by closing the tree for approximately one week every six weeks in preparation for releasing a milestone.
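
The check-in rules in the list above boil down to a simple gate. Here's a minimal sketch of that gate in Python. It's only an illustration: the `Checkin` class, its field names, and `may_check_in` are hypothetical and aren't part of tinderbox, bonsai, or any other Mozilla tool.

```python
from dataclasses import dataclass


@dataclass
class Checkin:
    """A proposed check-in (all field names are hypothetical)."""
    author: str
    reviewed: bool                  # a qualified reviewer has signed off
    pre_checkin_tests_passed: bool  # the minimum pre-checkin tests were run


def may_check_in(checkin: Checkin, tree_open: bool) -> bool:
    """Return True only if the tree is open and the code was reviewed and tested.

    The author's seniority is deliberately not an input:
    everyone is equal in this process and there are no exceptions.
    """
    return tree_open and checkin.reviewed and checkin.pre_checkin_tests_passed
```

For example, `may_check_in(Checkin("joe", reviewed=True, pre_checkin_tests_passed=True), tree_open=False)` is false, because nobody checks into a closed tree no matter how well reviewed the code is.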

Example

Here's an example of getting code into the tree.

writing code

I'm Joe and I've written some code that I want to check into the SeaMonkey tree. I've been writing my code for a week, and I want to make sure that it still works w/ the current tree (which has changed (other people have checked in) since I last did a checkout).

If I work too long locally, my local tree will become too out of sync w/ what's in the repository, and I'll have to spend more time updating my code.

updating my local tree

Before I update my tree, I visit the tinderbox URL to see if the tree is green (see this link for Mozilla's SeaMonkey trunk tinderbox). If it is, then I can update my tree, otherwise I'll wait for green because checking out a red tree assures that my local tree will break and therefore I can't do any work. Once I've updated my tree, I re-build to make sure my changes haven't broken anything in the recent tree.

If something broke, I'll need to fix my changes so they work w/ the updated tree. If I checked in while my local tree was broken, everyone else's tree would break too, which isn't cool.

Nothing broke so my tree is "up-to-date".

getting reviews

Before I can checkin I'm required to have others review my code for contextual accuracy (does my code do the right thing?), and syntactical correctness (is my code using the style/syntax that the general project is using?). These are known as "review" and "super-review" respectively. (See this FAQ for more information about mozilla.org's code review processes.)

I call up (or email) Jane who knows this area of the code and show her cvs diffs of my modifications. I called Jane because she's widely considered a "module owner" (someone who knows a specific code area). She looks them over using some reviewer guidelines (see this link for Mozilla's reviewer guidelines) and her own expertise. She notices that I could be iterating my array more efficiently, points that out to me, and I update my code, produce new diffs, and she goes over the new diffs to make sure my new iterator is ok.

Technically speaking, the reviewers of code should be held just as accountable as the person who wrote the code.

In almost all cases a super-review of my code is needed as well. (See this link for an explanation of super-review.)

This is one of many areas where "peer" pressure comes into play. The fact that a peer is required to look at my code ensures that I'm going to do the best I can to produce good, working, code. Otherwise I feel embarrassed.

running the pre-checkin tests

Before I commit my code, I have to run a small set of tests (see this link for Mozilla's pre-checkin tests) to ensure that basic (defined by the project) functionality works. If I check in code that prevents a URL from being loaded for example, even though tinderbox won't break (it's not a compile time error), the build is pretty much useless. Running the pre-checkin tests ensures some level of verification before I check my stuff in.

If I want to be extra careful, I can run the actual smoketests that are used to verify a tree. If I do that, and I pass all of those tests, I'm an extra step ahead of the game because I know my stuff _really_ didn't break anything, and the odds of getting called in because I broke something go down even further.

checking in code (first attempt)

I check tinderbox again to make sure that the state of the tree is open (no-one is allowed to checkin if the tree is closed). The tree is closed this time because someone checked in code that caused a compiler error.

sheriff

The "sheriff" "closed" the tree so that everyone knows not to checkin until things are fixed (either the offending code is backed out of the tree, or the issue is quickly resolved). The sheriff is responsible for closing the tree anytime it is in an unknown or broken state. This is necessary so that even more uncertainty/bustage is not checked in, resulting in an even further degraded tree. You can imagine what happens if multiple people check in code that doesn't compile. A nasty spiraling effect occurs that can take a long time to sort out.

checking in code (second attempt)

I go get a cup of coffee, and when I come back, I see that the tree is open again (the bustage was resolved), so I checkin. My checkin comments indicate who did my code review(s), and describe, in fairly good detail, what my modifications do.

the hook

Now that my code has been committed to the tree I am "on the hook" to make sure that I didn't break anything. The sheriff may call on me w/ questions until I am "off the hook." The hook is cleared after the build team has smoketested the tree (done daily in the mornings) to ensure the tree is back to a known good state. The cycle is daily, so if I checkin code again, I'll be on the hook again, and so on and so on.

Being on the hook comes w/ responsibility. I'm required to watch tinderbox go through a green cycle on all of its builds before I can walk away from my machine. If a build fails, I have to check to see if it was me (tinderbox provides mechanisms to do this "checking"). Once everything cycles green, I can stop watching tinderbox, but I'm still on the hook (I may have introduced run-time bustage for example, and I need to be held accountable for that).

The "hook" is an important concept in that it is a list of people (automatically deducible by tinderbox) that can be emailed/contacted regarding a particular build cycle. If someone notices a runtime problem w/ a build, they can email the hook to contact everyone who had checked into the tree since the last build verification (the last time the hook was "cleared"). So, let's say I notice that a particular dialog isn't coming up that used to come up. I can email the hook asking "did anyone mess w/ code that could affect dialog throwing?" Because this is a communal effort, chances are someone will respond. If you were on the hook, and you didn't respond, you'll be dragged out on to the carpet for not paying attention and wasting other people's time (remember, this is the ultimate inefficiency). See this link for more information on the hook and how our bonsai tool can help you.
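
To make the bookkeeping concrete, here's a minimal sketch of the hook as a data structure, written in Python. It's an assumption-laden illustration, not how tinderbox or bonsai actually implement it: the `Hook` class and its method names are hypothetical.

```python
from typing import List


class Hook:
    """People who have checked in since the last build verification."""

    def __init__(self) -> None:
        self._members: List[str] = []

    def record_checkin(self, author: str) -> None:
        """Put the author on the hook (once, however many times they check in)."""
        if author not in self._members:
            self._members.append(author)

    def members(self) -> List[str]:
        """Everyone who should be contacted about the current build cycle."""
        return list(self._members)

    def clear(self) -> None:
        """Called after the build is verified: everyone is off the hook."""
        self._members.clear()
```

Emailing the hook then amounts to sending one message to everyone returned by `members()`. The list is emptied only by `clear()`, which is why I stay on the hook even after tinderbox has cycled green.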

runtime bustage

It turns out that the code I wrote is causing runtime problems in some functional area that neither the review process nor tinderbox caught. George noticed this because he was working in this area after I checked in. George contacts the sheriff to report that a regression has been introduced. The sheriff and George determine that the regression is severe enough that the tree should be closed, so the sheriff closes it. Because the tree is closed there's a large number of people who are blocked from proceeding with their checkins, and some of those people scramble to help find the offending code. Generally the sheriff sends mail to the hook pointing out the regression and asking folks to determine if it was their checkin that caused it.

The offending code is found to be mine. My code cycled green so I walked away from my machine to get some dinner. Because I'm still on the hook, I'm ultimately responsible for my code, and if I'm not available (in this case I didn't bring my pager or cell phone with me), and no-one else wants to, or can, fix my problem, I can be backed out. Because this regression was serious enough, the sheriff indicates that my code should be backed out. The sheriff backs it out (the commands to do so are automatically generated by tinderbox), so he can re-open the tree and unblock people.

I come back from dinner and see that I've been backed out. I wasn't reasonably reachable (no-one was able to get a hold of me), so I accept that my code got yanked. I try my changes out in the scenario where the regression surfaced, notice the problem, fix it, and start the review/checkin process over again.
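
The reasoning behind my backout can be written down as a small decision function. This Python sketch is a simplification under stated assumptions: the function and its parameter names are hypothetical, and a real sheriff uses judgment rather than three booleans.

```python
def should_back_out(regression_is_severe: bool,
                    author_reachable: bool,
                    someone_else_can_fix: bool) -> bool:
    """Back the code out only when the regression is severe and nobody can fix it.

    This mirrors the story above: the regression was serious, the author
    was at dinner without a pager, and no one else could fix the problem.
    """
    return (regression_is_severe
            and not author_reachable
            and not someone_else_can_fix)
```

In my case all three conditions lined up, so `should_back_out(True, False, False)` is true. Had I been reachable, I would have gotten the chance to fix the problem quickly instead.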

Tree state assurance

By now you can probably see how tree state is determined at various levels.

Developer
  • Developers only checkin code that has been verified to "do the right thing" (the review process), compile, and not break basic functionality (pre-checkin tests). This level is continual and done at the individual level, to test an individual's code.
Verification
  • Every 24 hours the tree closes to all checkins while the builds are "verified." This provides a more robust test cycle that guarantees the tree's state across many individuals' checkins.
Extended (milestone level) tree closure
  • This generally occurs every six weeks, and QA hammers on these builds. This provides an even more robust test cycle across weeks of development.

Terminology

SeaMonkey

  • The browser/mail/news application suite that forms the basis of the Mozilla 1.0 release.

Tree

  • cvs's representation of a set of code is referred to as a tree. Mozilla.org maintains a tree that encompasses all the code for the SeaMonkey application suite. Not surprisingly, this is known as the SeaMonkey tree.
Broken Tree (also known as Burning Tree, Red Tree, Busted Tree, Flaming Tree or "In Flames")
  • A broken tree is one that doesn't compile or link. I can break my local tree by introducing a compile or link time error. If I check that error into cvs, anyone else checking out a tree will also get a broken tree. Broken trees are bad :-).
Commit or check-in
  • Committing or checking-in code writes it into the cvs repository.
tinderbox
  • Software (largely a bunch of perl scripts) that continuously checks out a tree and builds it. Tinderbox provides a graphical representation of the state of the tree by doing builds. See this link for Mozilla's SeaMonkey trunk tinderbox. There are generally three states that tinderbox can be in: red, representing a broken tree; green, indicating that everything is building/linking/testing fine; and orange, indicating that the build completed successfully but the automated tests run after it are failing. Tinderboxes (or "tinderboxen") are always maintained for the SeaMonkey trunk. Additional tinderboxen may be added when a branch is under active development. For example, see this link to the Mozilla 1.0 branch tinderboxen.
Tree verification
  • The tree is built and verified for regressions and bugs. This is usually done by a combination of QA and the build team. The tree is in a closed state during verification so the verifiers are sure that nothing's changing out from underneath them. Code checked in to a closed tree gets backed out.
Closed tree
  • No one is allowed to commit code to a closed tree. A tree is generally closed while a build is being verified so the tree is not changed until we know that the tree is in a "good" state. The tree is also closed by the sheriff when the tree state becomes unknown.
Open tree
  • The tree is open for checkins. Code for which the pre-checkin process has been completed may be checked into the tree.
Checkin approval
  • During periods of increased tree control (sometimes referred to as "lock down" periods) only specifically approved checkins are permitted into the tree. This generally happens before a milestone release. At such times, a group (such as "drivers@mozilla.org") or an individual may be designated as the approval granting entity.
Checkin review
  • Absolutely no code (no exceptions regardless of title, skill level, etc.) is checked into the tree without it being reviewed by someone qualified to do a review. See this link for Mozilla's review guidelines (different organizations will build different guidelines accordingly).
Sheriff
  • A tree is never unattended and the sheriff is the attendant. A tree without a sheriff is a broken tree. The sheriff has the power to close the tree if things go awry (the tree goes "red" for example). The sheriff also has the power to have code backed out (removed) from the tree. It is also understood that the sheriff can call you at 2am on a Sunday morning if you were deemed to be the person that broke the tree. Again, the sheriff can be a single person, or can rotate shifts across members of various groups (as occurs with mozilla.org and Netscape).
The hook
  • A set of people who have checked in code to the tree since the last time the hook was cleared. The SeaMonkey build is verified daily, and after the build has been verified, the hook is cleared, meaning that everyone that had checked in since the last verification is "off the hook" for this build. The hook is responsible for getting the tree back to the state it was in at the time of the previous verification.
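
The three tinderbox states defined above map onto two questions: did the build succeed, and did the automated tests pass? Here's a minimal Python sketch of that mapping. The names `TreeState`, `tree_state`, and `safe_to_update` are hypothetical, and the real tinderbox reports a state per build machine rather than one value for the whole tree.

```python
from enum import Enum


class TreeState(Enum):
    RED = "red"        # the tree does not compile or link
    ORANGE = "orange"  # the build succeeded but automated tests are failing
    GREEN = "green"    # building, linking, and testing are all fine


def tree_state(build_ok: bool, tests_ok: bool) -> TreeState:
    """Classify one build cycle into a tinderbox color."""
    if not build_ok:
        return TreeState.RED
    if not tests_ok:
        return TreeState.ORANGE
    return TreeState.GREEN


def safe_to_update(state: TreeState) -> bool:
    """The rule from the example: only update a local tree when it's green."""
    return state is TreeState.GREEN
```

Note that a failed build is red regardless of the test result, since tests can't run meaningfully against a tree that doesn't compile.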

Mozilla's tools

SeaMonkey trunk tinderbox - http://tinderbox.mozilla.org/showbuilds.cgi?tree=SeaMonkey&hours=12&nocrap=1
LXR - http://lxr.mozilla.org/seamonkey/
Bonsai - http://bonsai.mozilla.org/