You are currently viewing a snapshot of www.mozilla.org taken on April 21, 2008. Most of this content is highly out of date (some pages haven't been updated since the project began in 1998) and exists for historical purposes only. If there are any pages on this archive site that you think should be added back to www.mozilla.org, please file a bug.



Text Searching in Bugzilla

By Sean Richardson

A query that includes a well-chosen text search pattern often returns a bug list with more relevant bugs and fewer irrelevant ones than a simple substring search would. That translates directly to less time wasted repeating similar queries or looking through long bug lists.

With experience, and some background knowledge about how the various text search types available in Bugzilla work, you can make Bugzilla queries that are faster and better targeted. This guide is designed to provide you with enough background knowledge and real-life examples to be able to use the other text matching types Bugzilla offers (especially regular expressions) -- and that's how you'll get the experience.

Search Types

Bugzilla provides six types of text searches within the Summary, the description and comments, and the status whiteboard. Often there is more than one way to do a text search, but each search type is best for some purposes:

  case-insensitive substring: Good for matching single words (no variations) or exact phrases.
  case-sensitive substring: Good for the rare searches where case must match exactly.
  all words: Good for requiring that two or more words (no variations) must appear.
  any words: Good for matching if any of a list of synonyms match; no variations.
  regular expression: Good for matching variations on a text pattern.
  not (regular expression): Good for excluding matches of a text pattern with variations.

Substring searches match exactly within the text, but word boundaries do not matter. That's good if you want to match widely -- "color" will match any of "color", "bgcolor", "colored", and more -- but it is not so good if you want to match more specifically. As an example, using case-insensitive substring matching, "scroll" will match text including "scrollbar" or "Scroll Bar", but it will also match text including "autoscroll" or "scrolling".

Case-sensitive substring matching is the best choice when searching for an acronym whose letters can also be found in the middle of words, especially if you otherwise end up with many irrelevant bugs on the list. Be aware, though, that very common acronyms (e.g., html, css) may not be capitalized in bug summaries.

If you use the all words search type, each word you list must appear in the text to get a match. There can be no variations, but the search is case insensitive. For example, "cell border" will match text that contains both "cell" and "border", no matter what case, but if "cells" appears instead of "cell" or "bordering" instead of "border", no match will be found. If you find yourself scanning a long bug list for a second search word, the all words search type might work better.

The any words search type is good for synonyms. Again, only exact word-for-word matches are found, and the search is case-insensitive. So "center middle" will match any text containing any of "center", "Middle", "CENTER", etc., but not text containing "centered".
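
The word-based search types described above can be imitated outside Bugzilla to get a feel for them. The following is only a rough sketch: the function names are made up for this illustration, and it assumes that a "word" is a run of letters, digits, or underscores, which may not be exactly how a given Bugzilla installation splits text.

```python
import re

def words(text):
    """Split text into lowercase words (assumed: runs of letters, digits, underscore)."""
    return set(re.findall(r"[A-Za-z0-9_]+", text.lower()))

def substring(needle, text):
    """Case-insensitive substring: word boundaries do not matter."""
    return needle.lower() in text.lower()

def all_words(query, text):
    """Every word in the query must appear as a whole word."""
    return words(query) <= words(text)

def any_words(query, text):
    """At least one word in the query must appear as a whole word."""
    return bool(words(query) & words(text))

print(substring("scroll", "Autoscroll is broken"))         # True
print(all_words("cell border", "Border missing on cell"))  # True
print(all_words("cell border", "Borders on cells"))        # False
print(any_words("center middle", "Text is centered"))      # False
```

The last two lines show the "no variations" rule: "cells" is not "cell", and "centered" is not "center".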

The regular expression search type is the most versatile. As an example, "[Ss]croll.?[Bb]ar" will match variations on "scrollbar" without matching other text that happens to have the substring "scroll" in it.

Finally, the not (regular expression) search type allows you to specify that matching bugs cannot contain certain text.

The rest of this guide focuses on regular expression (regex) text searching because it gives the most versatility and precision. For those who aren't familiar with it, this guide includes a tutorial. If you are already familiar with regex searching, skip to the Real-life Examples section, which explains which search types work best for common text searching scenarios.

Regex Basics

To use regular expressions you create a pattern, which is a combination of literal characters, as in substring matching, and operators. The operators let you create constraints for the match that would be impossible with substring matching. This lets you create patterns that match the variations you want without widening the search so far that you get many irrelevant matches along with the ones you wanted.

Character and Position Matching

All alphanumeric characters, and some others, match themselves literally. Other characters and sequences of characters are operators, and do not match literally. To match a character used as an operator literally, precede it with "\". This applies to all characters, so if in doubt, use the backslash. For example, "\^" matches "^", and "\\", of course, matches "\".

 a  matches the single character "a"
 .  matches any character
 \.  matches only "." and no other character

The first set of operators match specific positions in the text:

 ^  matches the beginning of the text
 $  matches the end of the text
 [[:<:]]  matches the point between the last non-word character and the beginning of a word
 [[:>:]]  matches the point between the last character of a word and the following non-word character

The last two look odd but are very powerful. "[[:<:]]a" will match any "a" that is at the beginning of a word (defined as a contiguous sequence of alphanumeric characters and underscore), but no other "a". Similarly, "z[[:>:]]" will match any "z" at the end of a word, but no other "z".
Examples:

 zilla  matches the "zilla" in any text containing it, including "Bugzilla" or "mozilla.org"
 ^bar  matches "bar" but not "foobar" or "button bar"
 foo$  matches "foo" or "bugzap foo" but not "foobar"
 .  matches any single character (and thus the first character of any text with at least one)
 .n  matches "an", "on", "3n", etc.
 [[:<:]]tip  matches "tip" but not "tooltip" (no word characters before "tip")
 tip[[:>:]]  matches "tip" but not "multiple" (no word characters after "tip")
 [[:<:]]tip[[:>:]]  matches "tip" but not "tooltip" or "multiple"
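
If you want to experiment with position matching away from Bugzilla, the sketch below may help. It rests on an assumption: Python's "re" module does not understand "[[:<:]]" and "[[:>:]]", so the hypothetical helper "to_python" rewrites them into roughly equivalent Python constructs before matching. Results in MySQL could differ in corner cases.

```python
import re

def to_python(pattern):
    """Rewrite the word-boundary markers used in this guide into Python syntax."""
    return (pattern.replace("[[:<:]]", r"\b(?=\w)")     # boundary followed by a word character
                   .replace("[[:>:]]", r"\b(?<=\w)"))   # boundary preceded by a word character

def matches(pattern, text):
    """True if the pattern matches anywhere in the text."""
    return re.search(to_python(pattern), text) is not None

for pattern, text in [
    ("^bar", "button bar"),
    ("foo$", "bugzap foo"),
    ("[[:<:]]tip", "tooltip"),
    ("tip[[:>:]]", "multiple"),
    ("[[:<:]]tip[[:>:]]", "tip of the day"),
]:
    print(pattern, "|", text, "|", matches(pattern, text))
```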

Character Classes

A sequence of ordinary characters placed inside square brackets will match a single character if any of them match. A character class can be negated by prepending "^" inside the beginning square bracket. A range can be specified using "-" between the beginning and end of the range.

 [  begins a character class
 ]  ends a character class
 ^  used inside square brackets, it negates the character class
 -  used inside square brackets, it separates the beginning and end of a range

Regex patterns are case sensitive, so one of the uses of character classes is to match either upper or lower case for a letter. Examples:

 [abc]  matches one of "a", "b", or "c"
 [abc][abc]  matches the "ac" in "action", but not any part of "auction"
 [Mm]ozilla  matches "Mozilla" or "mozilla"
 Windows [^N2]  matches "Windows 95" or "Windows 98" or "Windows ME", but not "Windows NT" or "Windows 2000"
 [x-zX-Z]  matches one of "x", "y", "z", "X", "Y" or "Z"
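
The character class examples can be checked with a few lines of Python. This is a sketch that assumes Python's "re" module treats these simple classes the same way MySQL does; the helper name "found" is made up for the illustration.

```python
import re

def found(pattern, text):
    """Return the part of the text that matched, or None if nothing matched."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(found(r"[abc][abc]", "action"))         # ac
print(found(r"[abc][abc]", "auction"))        # None
print(found(r"[Mm]ozilla", "mozilla.org"))    # mozilla
print(found(r"Windows [^N2]", "Windows 95"))  # Windows 9
print(found(r"Windows [^N2]", "Windows NT"))  # None
```

Note that "[^N2]" still has to match one character, which is why the match for "Windows 95" includes the "9".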

Alternation

A "|" between any sequence of matching characters and another is a logical OR: if either part matches, the regex has matched.

 |  matches nothing on its own, but lets the regex match if either the pattern before or after matches

Examples:

 slider|thumb  matches any string containing either "slider" or "thumb"
 DULL|dull  matches "DULL" or "dull", but not "Dull"
 DULL|dull|Dull  matches any of "DULL", "dull", or "Dull"
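
A short Python sketch shows which alternative did the matching. It assumes Python's "re" module, and the helper name "which_alternative" is invented for the example.

```python
import re

def which_alternative(pattern, text):
    """Return the text matched by whichever alternative succeeded, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(which_alternative(r"slider|thumb", "scrollbar thumb jumps"))  # thumb
print(which_alternative(r"DULL|dull", "Dull colors"))               # None
print(which_alternative(r"DULL|dull|Dull", "Dull colors"))          # Dull
```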

Repetition Operators

The repetition operators give regex patterns much of their flexibility, by letting you specify that some parts of the pattern may or may not appear, or can appear more than once, while other parts of the pattern must appear exactly as specified.

 ?  matches zero or one of the preceding character or sequence
 *  matches zero or more (any number) of the preceding character or sequence
 +  matches one or more of the preceding character or sequence

Examples:

 colou?r   matches "color" or "colour"
 [Ww]in ?32   matches "Win32", "Win 32", "win32", or "win 32"
 app.*s   matches "apps", "apples", "applications", "application forms", etc.
 cho+se   matches "chose" or "choose" (or "choooooose") 
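
To see how much text a repetition operator takes in, you can print the matched text. This sketch assumes Python's "re" module, which handles these simple patterns the same way; "first_match" is a made-up helper name.

```python
import re

def first_match(pattern, text):
    """Return the matched text for the first match, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(first_match(r"colou?r", "colour"))            # colour
print(first_match(r"[Ww]in ?32", "win 32"))         # win 32
print(first_match(r"app.*s", "application forms"))  # application forms
print(first_match(r"cho+se", "choooooose"))         # choooooose
print(first_match(r"cho+se", "chse"))               # None
```

The last line shows the difference between "+" and "*": "cho+se" needs at least one "o", so "chse" does not match.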

Sequence Grouping

Any sequence of matching characters can be surrounded with parentheses to group it as a single unit. On its own this does little except force you to use "\(" to match "(", but combined with the repetition and alternation operators, and character classes, it can be very powerful.

 (  matches nothing on its own; delimits the beginning of a sequence
 )  matches nothing on its own; delimits the end of a sequence

Examples:

 win(dow)?  matches "win" or "window"; in "windscreen" it matches only the leading "win"
 (MacOS|Mac OS) ?8\.[^0]  matches "MacOS8.1" or "Mac OS 8.5", but not "Mac OS 8.0" or "Mac OS 8"
 (MSVC.*|msvc.*).(DLL|dll)  matches "MSVCRT.DLL" or "msvc40rt.dll", among others

One final note: regex patterns are tried in every position of the text until they match or there is no more text. That means that "abc" will match anywhere in the text, just as if it were ".*abc.*", so there is never any need to use ".*" at the beginning or end of a pattern.
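
The point about patterns being tried at every position can be seen in a small Python sketch. It assumes that "re.search" behaves like Bugzilla's regex search in this respect (it scans the whole text), and "matched_text" is a hypothetical helper.

```python
import re

def matched_text(pattern, text):
    """Return the matched portion of the text, or None if there is no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# "abc" and ".*abc.*" accept exactly the same texts; only the matched span differs.
print(matched_text(r"abc", "xxabcxx"))      # abc
print(matched_text(r".*abc.*", "xxabcxx"))  # xxabcxx

# Grouping combined with "?" and alternation.
print(matched_text(r"win(dow)?", "window"))                     # window
print(matched_text(r"win(dow)?", "windscreen"))                 # win
print(matched_text(r"(MacOS|Mac OS) ?8\.[^0]", "Mac OS 8.5"))   # Mac OS 8.5
print(matched_text(r"(MacOS|Mac OS) ?8\.[^0]", "Mac OS 8.0"))   # None
```

Because the pattern is tried everywhere, "win(dow)?" still finds the "win" inside "windscreen"; add word-boundary operators if you need to rule that out.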

Real-life Examples

Following are a number of types of text searches where substring matching on its own cannot give the best results. For each type, examples of possible searches are given exactly as they would be entered into the Bugzilla query page.

Synonyms

Often there are technical and non-technical names for the same UI elements. If you want to find bugs using any of them and no other limitations are needed, it is easiest to just use the Any Words search type. For example:

 Summary: 

Other times, the search needs to be more specific. For example, the scrollbar thumb is also known as a slider. The example below will match either "scrollbar thumb" or "scrollbar slider":

 Summary: 

Another way to do that search would be to search in both the summary and the description, on the assumption that key words mentioned in the summary will also appear in the description. The example below will match either "scrollbar thumb" or "scrollbar slider", and also text like "the thumb of the scrollbar":

 Summary: 
Description:

For a more complex example, consider searching for bugs about the middle mouse button, which some know by other names: "((center|centre|middle) (mouse|button))|button 2" will match text containing any of: "center mouse button", "button 2", "middle button", "centre mouse", among other variations:

 Summary (regular expression): ((center|centre|middle) (mouse|button))|button 2

Compound words

Rather than making up to three substring or word searches for compound words, it is easier to do a regex search that covers all of the possibilities. "mouse.?over" will match "mouse over", "mouse-over", or "mouseover":

 Summary (regular expression): mouse.?over

Similar words

Suppose you were looking for bugs about mousing actions. You might use the following regex pattern:

 Summary: 

Some words, like "colour" or "grey", are spelled differently by English speakers outside the U.S., many of whom enter bug reports using their native spelling. Simple regex patterns will match the spelling variants:

 Summary: 
 
 Summary:  or
 Summary: 

Varied compound words

The pattern "(left|right).?click" will match "left-click", "right click", "rightclick" or "left-clicked", among other variations:

 Summary (regular expression): (left|right).?click

Varied phrases

If a query using a specific phrase results in a short bug list, try using a regex pattern that allows for other wordings -- particularly if you found variants of that phrase in the comments added to the bugs on the list.

As an example, the pattern "[Ff]ind .n .*[Pp]age" will match "Find in this Page", "find on page", or "Find On The Page", among other variations.

 Summary (regular expression): [Ff]ind .n .*[Pp]age

The pattern "horiz.*scroll" will match "horizontal scrollbar", "horiz. scroll", or "horizontal scrolling", among other variations:

 Summary (regular expression): horiz.*scroll
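
To try patterns like these before running a real query, you can filter a handful of sample summaries. The sketch below uses Python's "re" module as a stand-in for Bugzilla's regex search; the summaries are invented for the illustration and are not real bug reports, and "bug_list" is a made-up helper name.

```python
import re

# Invented summaries for illustration only.
summaries = [
    "Find in this Page dialog loses focus",
    "find on page should wrap around",
    "Horizontal scrollbar missing in long tables",
    "horiz. scroll jumps when resizing",
    "Mouse-over text flickers",
    "Text is selected on right click",
]

def bug_list(pattern, candidates):
    """Return the summaries in which the pattern matches somewhere."""
    return [s for s in candidates if re.search(pattern, s)]

print(bug_list(r"[Ff]ind .n .*[Pp]age", summaries))
print(bug_list(r"horiz.*scroll", summaries))      # misses the capitalized "Horizontal"
print(bug_list(r"[Hh]oriz.*scroll", summaries))   # catches both spellings
print(bug_list(r"(left|right).?click", summaries))
```

The second and third calls are a reminder that regex patterns are case sensitive: a capital letter at the start of a summary is easy to miss.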

Case-insensitive Searching

There is no case-insensitive regex matching in any Bugzilla implementation that uses MySQL as its database engine. This isn't as much of a limitation as it might seem, as there is almost always another approach that can be used.

Although the form does not say so, both the "all words" and "any words" search types are case-insensitive, so the following example will find all bugs with both "table" and "border" in the summary, no matter what case or what order they appear in:

 Summary (all words): table border

You can also do a case-insensitive search in the summary for the word that may be in upper, lower, or mixed case, and search in the description for the other search words, like so:

 Summary: 
 Description: 

Note that the previous example will find bugs with either "embed" or "nest" in the description, but will not match variations like "embedding" or "nesting". To match those as well, use a regex pattern instead:

 Summary: 
 Description: 

If all the search words must appear in the summary, you can use the boolean charts to match a case-insensitive substring, as shown in the next section.

Finally, if you want to be sure, and you need to use regex patterns, you can always use character classes for every character, like so:

 Summary: 
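
Typing a character class for every letter by hand is tedious, so a small script can build the pattern for you. This is a sketch, not part of Bugzilla: the helper name "any_case" is invented, and it assumes plain ASCII search words.

```python
import re

def any_case(word):
    """Turn 'table' into '[Tt][Aa][Bb][Ll][Ee]' so the pattern ignores case."""
    parts = []
    for ch in word:
        if ch.isalpha():
            parts.append("[" + ch.upper() + ch.lower() + "]")
        elif ch.isalnum() or ch == " ":
            parts.append(ch)            # digits and spaces match themselves
        else:
            parts.append("\\" + ch)     # escape operators such as "." or "?"
    return "".join(parts)

pattern = any_case("table")
print(pattern)                                       # [Tt][Aa][Bb][Ll][Ee]
print(bool(re.search(pattern, "TABLE border bug")))  # True
```

Paste the generated pattern into the query page with the regular expression search type selected.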

Using the Boolean Charts

By using the Boolean charts feature of the query page, you can AND together more than one regex pattern. This is useful when you can't predict the order in which two patterns will match in the text.

For example, to find bugs about context menus on the scrollbar, not knowing whether the scrollbar will be mentioned first or not, specify one regex pattern in the Summary field and the other in the Boolean Chart:

 Summary: 
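
The effect of ANDing two patterns together can be imitated in Python. This is only a sketch of the idea, not of how Bugzilla builds its query; the two patterns and the helper name "matches_all" are assumptions chosen for the scrollbar example.

```python
import re

def matches_all(patterns, text):
    """True only if every pattern matches somewhere in the text, in any order."""
    return all(re.search(p, text) for p in patterns)

patterns = [r"[Ss]croll.?[Bb]ar", r"[Cc]ontext.?[Mm]enu"]  # hypothetical patterns

print(matches_all(patterns, "Context menu on scrollbar is empty"))  # True
print(matches_all(patterns, "Scroll bar has no context-menu"))      # True
print(matches_all(patterns, "Scrollbar thumb too small"))           # False
```

A single regex that allowed both orders would need each pattern written twice, which is why a second pattern in the Boolean chart is the easier route.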

Text searching as part of a query

As powerful as regex pattern matching and boolean charts are, neither one is the only tool you'll use. Text searching will normally be done as part of a query that includes other constraints, and it is easier and more efficient to constrain your search using other fields when you can, rather than trying to specify multiple constraints using only text matching.

For example, rather than creating a regex pattern to match variations on Windows operating system names, select each of them from the Operating System field, using ctrl-click or cmd-click, to constrain your search to bugs that afflict Windows. Or, to use a Mozilla example, rather than matching "table" as text, it is easier to constrain the Component to "HTMLTables". Setting the Product or Version may also help.

If you remember seeing a bug before and you are searching for it again, you might be able to narrow down a date range for its creation, and reduce the bug list to only those entered in that period:

 Where the field(s) [...] changed, dates [...] to [...], changed to value [...] (optional)

If you know that you added a comment to a bug or receive mail about it for any reason, specifying your e-mail address and role can dramatically shorten the bug list:

 Email: [...] matching as: Assigned To, Reporter, QA Contact, CC, Added comment (will match any of the selected fields)

Limitations

No GNU regex extensions

Bugzilla uses the matching facilities provided by the underlying database engine. bugzilla.mozilla.org and most other installations use MySQL, which implements POSIX-compatible regex -- GNU extensions such as "\w" for word characters are not available. Perl-only extensions are also not available. Regex searching may work differently at Bugzilla installations that use another database engine.

Database-specific (MySQL)

The information on regex matching in this guide is specific to Bugzilla installations using MySQL; some of it may not apply to Bugzilla installations using other database engines.

Regex * is not a wildcard

"*" matches nothing on its own. If you are used to searching on Windows with patterns like "*.html", you need to know that in regex patterns, "*" matches zero or more occurrences of the preceding character. The equivalent regex pattern would be ".*\.html", although "\.html" will work just as well. (Using ".html" would not, because the "." would match any character, including a space character.) Similarly, "inter*n" would not match "interaction" the way a wildcard pattern would; the "*" only allows any number of "r" characters, so the pattern matches text such as "inten" or "intern". The equivalent regex pattern would be "inter.*n".

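If you think in wildcards, a few lines of Python can do the translation into a regex pattern for you. This is a minimal sketch: the function name "wildcard_to_regex" is made up, and it only handles the "*" and "?" wildcards, escaping every other punctuation character with a backslash.

```python
import re

def wildcard_to_regex(wildcard):
    """Translate '*' to '.*' and '?' to '.', escaping other punctuation."""
    out = []
    for ch in wildcard:
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        elif ch.isalnum() or ch == " ":
            out.append(ch)
        else:
            out.append("\\" + ch)
    return "".join(out)

print(wildcard_to_regex("*.html"))   # .*\.html
print(wildcard_to_regex("inter*n"))  # inter.*n

# Used directly as a regex, the wildcard text behaves quite differently.
print(bool(re.search(r"inter*n", "interaction")))   # False
print(bool(re.search(r"inter.*n", "interaction")))  # True
```
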
Incomplete

Finally, this guide itself is limited; regex searching can get much more involved and powerful. Specifically, character classes like "[[:alnum:]]" can be very useful for some purposes. If you want to know more, consult the quick reference or a complete reference.

(Thanks to Terry Weissman for contributing to this document, and to Jon Robertson and Jesse Ruderman for reviewing it. Additional suggestions welcome.)