****** Generic Genome Browser: A Tutorial ******
***** Author: Lincoln Stein, 11 March 2008 *****
This is an extensive tutorial to take you through the main features and gotchas
of GBrowse. This tutorial assumes that you have successfully set up Perl, GD,
BioPerl and the other GBrowse dependencies. During most of the tutorial, we
will be using the "in-memory" GBrowse database (no relational database
required!). Later we will show how to set up a genome-sized database using the
berkeleydb and MySQL adaptors. For a tutorial that uses the older Bio::DB::GFF
adaptor, see Using_GBrowse_with_Bio::DB::GFF.
***** Table of Contents *****
   1. The_Basics
         1. The_Data_File
         2. Defining_Tracks
         3. Adding_Descriptions_to_a_Feature
         4. Adjusting_GBrowse_Name_Searches
         5. Linking
         6. Adding_Popup_Balloons_to_Tracks
               1. Customizing_Balloons
   2. Displaying_Common_Types_of_Features
         1. Multi-segmented_features
         2. Protein-Coding_Genes
               1. Simpler_Genes
         3. Reading_Frames
         4. Grouped_Features
         5. Quantitative_Data_(basic)
         6. Quantitative_Data_(advanced)
         7. DNA_and_3-frame_translations
         8. ESTs_and_Other_Alignments
               1. Adding_DNA_to_Alignments
         9. Trace_Data
   3. GBrowse_Enhancements
         1. Adding_a_"Region"_Panel
         2. Putting_Features_into_the_Overview_&_Regionview
         3. Semantic_Zooming
         4. Grouping_Tracks
         5. Grouping_Tracks_into_a_Table
         6. Using_Plugins
   4. Adding_Features_from_External_Sources
         1. Uploading_an_Annotation_File
         2. Sharing_an_Annotation_File
         3. Using_GBrowse_as_a_DAS_Server_or_Client
               1. Combining_Databases_with_DAS
               2. Exporting_DAS_Tracks_to_Ensembl_and_other_Genome_Browsers
               3. Running_GBrowse_off_DAS_Entirely
   5. Using_Other_Backends
         1. The_Berkeleydb_Backend
               1. The_bp_seqfeature_load.pl_script
         2. The_MySQL_Backend
         3. Other_Backends
   6. Conclusion
***** 1. The Basics *****
We will be working with simulated Volvox genome annotation data. The database
will be named "volvox" and GBrowse will be invoked with this URL:
     http://localhost/cgi-bin/gbrowse/volvox
These directories contain data files used during the tutorial:
  data_files
      DNA and features files to load into the local database.
  conf_files
      GBrowse configuration files for you to take and modify.
To introduce you to the system we will be using a file-based database which
allows GBrowse to run directly off text files. To prepare this database for
use, find the GBrowse databases directory which was created in your Apache web
server directory at the time of installation. It should be located at
/var/www/html/gbrowse/databases, but check to make sure.
Similarly, check that you can find the gbrowse.conf configuration directory. It
should be located at /etc/httpd/conf/gbrowse.conf and contain the configuration file
"yeast_chr1.conf."
Now you will change the permissions of the database and configuration
directories so that you can write to them without root privileges. This is only
an issue on Unix systems, and Windows users can safely ignore this step.
     % su
     Password: *********
     # chown my_user_name /var/www/html/gbrowse/databases
     # chown my_user_name /etc/httpd/conf/gbrowse.conf
     # exit
     %
(Be sure to replace "my_user_name" with your login name!)
Now look around inside the databases directory. There should be a single
subdirectory named "yeast_chr1." This subdirectory is where the example yeast
chromosome 1 data set is stored.
You will create an empty volvox subdirectory, and make it world writable. On
Unix systems:
     % cd /var/www/html/gbrowse/databases
     % mkdir volvox
     % chmod go+rwx volvox
     NOTE: The "%" sign in these examples is the command-line prompt. On
     Windows systems, the command-line prompt is something like C:\Program
     Files\Apache Group\Apache2\htdocs\gbrowse\databases>. Unix systems
     are more variable, but the prompt usually ends with a "%" or a "#".
     In all the examples in this tutorial, what you type is rendered in
     boldface, while prompts and command-line results are shown in medium
     typeface.
On Windows systems, use the file manager ("Explorer") to create a new folder
named "volvox." If you are using Windows NT, 2000 or XP, right click on the new
folder and grant write privileges to all.
You'll now put the first of several data files into the volvox database
directory. In the data_files subdirectory of this tutorial you will find the
file volvox1.gff3. Copy this into the volvox database directory. On Unix
systems:
     % cd /var/www/html/gbrowse
     % cp tutorial/data_files/volvox1.gff3 databases/volvox
On Windows systems, use Explorer to copy the file into the volvox database
directory.
Now we will need a GBrowse config file to tell GBrowse how to render this data
set. In the subdirectory conf_files, you will find a sample configuration file
named volvox.conf. Copy this into your GBrowse configuration directory
(/etc/httpd/conf/gbrowse.conf).
You should now be able to view the data set. Point your web browser at
http://localhost/cgi-bin/gbrowse/volvox and type "ctgA" into the search box.
The result is shown in Figure 1.
     [figures/basics1.gif]
     Figure 1: volvox1.gff3 data with volvox.conf config file.
**** If You are Having Problems... ****
If for some reason you get a blank page or an "Internal server error," there
are a couple of things to check. First, open the file volvox.conf with a text
editor ("Notepad" on Windows systems, emacs, pico or vi on Unix systems) and
confirm that the path to the volvox database directory in this section is
correct:
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor memory
		-dir     '/var/www/html/gbrowse/databases/volvox'
If the path contains a space, you must put single quotes around it, as shown in
the example above.
Next check that the volvox1.gff3 file exists inside the volvox database
directory and that it is readable by all users on your system. Similarly, check
that the volvox.conf configuration file is in the same directory as
yeast_chr1.conf, and that it is readable by all users on your system.
Microsoft Windows has an unpleasant tendency to add a ".txt" extension to files
without warning. If something seems to be wrong with the config or GFF file and
you can't figure out what, check that the file extension hasn't been modified.
To avoid this phenomenon, I suggest that you select "All File Types" from the
popup menu in the File Save dialog. You might also want to configure your
Folder display to show known file extensions.
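The file checks described above can also be scripted. The following is a minimal Python sketch and not part of GBrowse: the helper name check_gbrowse_file is hypothetical, and the two paths at the bottom are simply the default locations assumed in this tutorial.

```python
import os
import stat


def check_gbrowse_file(path):
    """Return a list of problems found with a GFF3 or config file.

    Hypothetical helper for illustration; not part of GBrowse.
    """
    problems = []
    if not os.path.isfile(path):
        problems.append("missing: " + path)
        # Windows editors sometimes append ".txt" without warning.
        if os.path.isfile(path + ".txt"):
            problems.append("found " + path + ".txt; remove the .txt extension")
        return problems
    mode = os.stat(path).st_mode
    # The web server usually runs as a different user, so the file
    # should be readable by "other" (a Unix-only check).
    if os.name == "posix" and not mode & stat.S_IROTH:
        problems.append("not world-readable: " + path)
    return problems


if __name__ == "__main__":
    for p in ("/var/www/html/gbrowse/databases/volvox/volvox1.gff3",
              "/etc/httpd/conf/gbrowse.conf/volvox.conf"):
        for problem in check_gbrowse_file(p) or ["ok: " + p]:
            print(problem)
```

An empty result means the file passed these particular checks. It does not prove that the web server can read it, so the Apache error log is still the next place to look.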
If you're still having no luck, check the bottom of the Apache server error log
for error messages. This file is located in various places depending on how
Apache is installed. Look for the file error_log, typically located in
/usr/local/apache/logs, C:\Program Files\Apache Group\Apache2\logs,
/var/log/www, or /var/log/httpd. The error message will usually point you in
the right direction.
     If this doesn't fix the problem, please stop the tutorial and send an
     e-mail to GBrowse support at gmod-gbrowse@lists.sourceforge.net.
     Someone will be happy to assist you.
**** 1.1. The Data File ****
Let's look at the data file we loaded in detail now. If you open the
volvox1.gff3 file in a text editor, you will see that it contains a series of
15 genome "features" that look like this:
     ctgA example contig 1     50000 . . . Name=ctgA
     ctgA example remark 1659  1984  . + . Name=f07;Note=This is an
     example
     ctgA example remark 3014  6130  . + . Name=f06;Note=This is another
     example
     ctgA example remark 4715  5968  . - . Name=f05;Note=Ok! Ok! I get the
     message.
     ctgA example remark 13280 16394 . + . Name=f08
     ...
Each feature has a "source" of "example", a type of "remark", and occupies a
short range (roughly 1.5k) on a contig named "ctgA." In addition to the
features themselves, there is an entry for the contig itself (type "contig").
This entry is needed to tell GBrowse what the length of ctgA is.
The load file uses a standard known as GFF3_(General_Feature_Format_version_3).
Each line of the file corresponds to a feature on the genome, and the nine
columns are separated by tabs.
The 9 columns are as follows:
   1. reference sequence
      This is the name of the feature that will be used to establish the
      coordinate system for the annotation. This is usually the name of a
      chromosome, a clone, or a contig. In our example, the reference sequence
      is "ctgA". A single GFF file can refer to multiple reference sequences.
   2. source
      The source of the annotation. This field describes how the feature was
      derived. In the example, the source is "example" for want of a better
      description. Many people use the source as a way of distinguishing
      between similar features that were derived by different methods, for
      example, gene calls derived from different prediction software. You can
      leave this column blank by replacing the source with a single dot (".").
   3. type
      This column describes the feature type. Although you can choose anything
      you like to describe the feature type, you are strongly encouraged to use
      well-recognized sequence ontology (SO) terms such as "gene",
      "repeat_region", "exon", and "CDS." You can find a list of the recognized
      SO terms at the_Sequence_Ontology_Project_web_site. For lack of a better
      name, the features in the volvox example are of type "remark."
   4. start position
      The position that the feature starts at, relative to the reference
      sequence. The first base of the reference sequence is position 1.
   5. end position
      The end of the feature, again relative to the reference sequence. End is
      always greater than or equal to start.
   6. score
      For features that have a numeric score, such as sequence similarities,
      this field holds the score. Score units are arbitrary, but most people
      use the expectation value for similarity features. You can leave it blank
      by replacing the column with a dot.
   7. strand
      For features that are strand-specific, this field is the strand on which
      the annotation resides. It is "+" for the forward strand, "-" for the
      reverse strand, or "." for annotations that are not stranded. If you are
      unsure of whether a feature is stranded, it won't hurt to use a "+" here.
   8. phase
      For CDS features that encode proteins, this field describes where the
      next codon starts. The phase is one of the integers 0, 1, or 2,
      indicating the number of bases that should be removed from the beginning
      of this feature in order to reach the first base of the next codon. In
      other words, a phase of "0" indicates that the next codon begins at the
      first base of the region described by the current line, a phase of "1"
      indicates that the next codon begins at the second base of this region,
      and a phase of "2" indicates that the next codon begins at the third base
      of this region. This information is used by the "cds" glyph to show how
      the reading frame changes across splice sites. For all other feature
      types, use a dot here.
   9. attributes
      A list of feature attributes in the format tag=value. Multiple tag=value
      pairs are separated by semicolons. URL escaping rules are used for tags
      or values containing the following characters: ",=;". Spaces are allowed
      in this field, but tabs must be replaced with the %09 URL escape.

      These tags have predefined meanings:
        ID
            Gives the feature a unique identifier. Useful when grouping
            features together (such as all the exons in a transcript).
        Name
            Display name for the feature. This is the name to be displayed to
            the user.
        Alias
            A secondary name for the feature. It is suggested that this tag be
            used whenever a secondary identifier for the feature is needed,
            such as locus names and accession numbers.
        Note
            A descriptive note to be attached to the feature. This will be
            displayed as the feature's description.
      Alias and Note fields can have multiple values separated by commas. For
      example:
           Alias=M19211,gna-12,GAMMA-GLOBULIN
      Other good stuff can go into the attributes field, as we shall see later.
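The phase column is the one that most often trips people up, so here is a small worked example. This is a sketch in Python rather than anything GBrowse itself runs: the function name cds_phases is hypothetical, and it assumes that the first CDS segment in the direction of translation begins on a complete codon.

```python
def cds_phases(segments, strand="+"):
    """Compute the GFF3 phase of each CDS segment of one transcript.

    segments: list of (start, end) pairs, 1-based and inclusive.
    Assumes the first segment in the direction of translation begins
    on a complete codon (phase 0).
    Returns (start, end, phase) tuples in the direction of translation.
    """
    # Translation runs left to right on "+", right to left on "-".
    ordered = sorted(segments, reverse=(strand == "-"))
    result = []
    coding_bases = 0  # bases in all earlier segments
    for start, end in ordered:
        # Bases left over from an incomplete codon must be completed
        # by the first bases of this segment; skipping them reaches
        # the next codon start.
        phase = (3 - coding_bases % 3) % 3
        result.append((start, end, phase))
        coding_bases += end - start + 1
    return result


print(cds_phases([(1, 4), (10, 14), (20, 25)]))
# prints [(1, 4, 0), (10, 14, 2), (20, 25, 0)]
```

The first segment is four bases long, one full codon plus one extra base, so the second segment must skip two bases to reach the next codon start and gets phase 2.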
It is very important to have a full-length entry (such as the one for ctgA) for
each reference sequence mentioned in the first column of the GFF3 file.
However, the reference sequence can have any source and type you choose.
Commonly used types are "clone", "chromosome" and "contig."
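To make the nine-column layout concrete, the sketch below splits one feature line and unpacks its attributes. It is written in Python for illustration only; GBrowse does its own parsing in Perl via BioPerl. The function name parse_gff3_line and the dictionary keys are our own choices, not part of any GBrowse interface.

```python
from urllib.parse import unquote

COLUMNS = ("reference", "source", "type", "start", "end",
           "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    feature = dict(zip(COLUMNS, fields))
    feature["start"] = int(feature["start"])
    feature["end"] = int(feature["end"])
    attributes = {}
    for pair in feature["attributes"].split(";"):
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        # Multiple values are comma-separated; unescape after splitting
        # so that an escaped comma (%2C) stays inside its value.
        attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    feature["attributes"] = attributes
    return feature


line = "ctgA\texample\tremark\t1659\t1984\t.\t+\t.\tName=f07;Note=This is an example"
feature = parse_gff3_line(line)
print(feature["type"], feature["start"], feature["end"])
print(feature["attributes"]["Note"])
```

Every attribute value is returned as a list, because Alias and Note may carry several comma-separated values.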
**** 1.2. Defining Tracks ****
Now we'll look at the configuration file in more detail. Using a text editor,
open the volvox.conf file from its location in the gbrowse.conf configuration
directory. (If you mess up, you can always copy a fresh version from
volvox.conf in the tutorial directory).
Ignore all the stuff in the top 90% of the file, and focus on the last little
bit, which starts with the line: ### TRACK CONFIGURATION ###:
     [ExampleFeatures]
     feature      = remark
     glyph        = generic
     stranded     = 1
     bgcolor      = blue
     height       = 10
     key          = Example Features
This is a "stanza" that describes one of the tracks displayed by GBrowse. The
track has an internal name of "ExampleFeatures" which you can use in the URL to
turn the track on. The internal name is enclosed by square brackets.
Following the track name are a series of options that configure the track. The
"feature" option indicates what feature type(s) to display inside the track.
It's currently set to display the "remark" feature type. The "glyph" option
specifies the shape of the rendered feature. The default is "generic", which is
a simple filled box, but there are dozens of glyphs to choose from. The
"stranded" option tells the generic glyph to try to display the strandedness of
the feature -- this is what creates the little arrow at the end of the box.
"bgcolor" and "height" control the background color and height of the glyph
respectively, and "key" assigns the track a human-readable label.
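If it helps to think of a stanza as a named set of option/value pairs, the sketch below reads the same stanza with Python's standard configparser module. This is only an analogy: GBrowse reads its configuration files with its own Perl code, and we are assuming nothing about that code beyond the stanza layout shown above.

```python
import configparser
import textwrap

STANZA = textwrap.dedent("""\
    [ExampleFeatures]
    feature      = remark
    glyph        = generic
    stranded     = 1
    bgcolor      = blue
    height       = 10
    key          = Example Features
""")

# interpolation=None keeps "$" and "%" in values literal.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(STANZA)

for track in parser.sections():
    options = parser[track]
    print(track, "->", options["glyph"], options["bgcolor"], options["key"])
# prints: ExampleFeatures -> generic blue Example Features
```

The name in square brackets becomes the track's internal name, and everything beneath it up to the next stanza belongs to that track.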
Let's experiment with changing the track definition. First, let's change the
color of the glyph. With your text editor, change the bgcolor option from blue
to "orange", save it, and reload the page. The features should change
immediately as shown in Figure 2.
     [figures/basic_conf1.gif]
     Figure 2: A Feature of a Different Color
Note: Many of the screenshots in this tutorial are from earlier versions of
GBrowse and may not look exactly the same as the current version.
Please experiment with other changes! Try changing the height to 5, the key to
"Skinny features" and the stranded option to 0 (which means "false"). Just by
changing a few options, you can create a very distinctive track.
Now let's try changing the glyph. One of the standard glyphs was designed to
show PCR primer pairs and is called "primers". Change "glyph = generic" to
"glyph = primers" and reload the page. Depending on other changes that you
might have made earlier, the result will look something like Figure 3.
     [figures/basic_conf2.gif]
     Figure 3: Using the primers Glyph
We'll see other examples of glyphs later on. To get a list of the most popular
glyphs and the options that are available for them, see the file
CONFIGURE_HOWTO.txt, located in the docs/ subdirectory of the GBrowse
distribution. Or for the gory and bleeding edge details, run the command:
      % perldoc Bio::Graphics::Panel
This produces copious documentation on the Perl interface to all the glyphs,
including some amazingly obscure ones, from which you should be able to deduce
the GBrowse equivalents.
**** 1.3. Adding Descriptions to a Feature ****
By default, GBrowse will display the name of the feature above its glyph
provided that there is sufficient space to do this. Optionally, you can also
attach some descriptive text to the feature. This text will be displayed below
the feature, and can also be searched.
You can place descriptions, notes and other comments into the ninth column of
the GFF load file. The example file volvox2.gff3 shows how this is done. An
excerpt from the top of the file looks like this:
     ctgA example polypeptide_domain 11911 15561 . + .
     Name=m11;Note=kinase
     ctgA example polypeptide_domain 13801 14007 . - . Name=m05;Note=helix
     loop helix
     ctgA example polypeptide_domain 14731 17239 . - .
     Name=m14;Note=kinase
     ctgA example polypeptide_domain 15396 16159 . + . Name=m03;Note=zinc
     finger
This defines several new features of type "polypeptide_domain". The ninth
column, in addition to giving each of the motifs a name, adds a "Note"
attribute to each feature. As described earlier, attributes are tag=value
pairs separated by semicolons.
The attribute named Note is automatically displayed and made searchable. To see
this work, add volvox2.gff3 to the volvox database. You can do this just by
copying the file into /var/www/html/gbrowse/databases/volvox so that the directory
contains both the original volvox1.gff3 and the new volvox2.gff3 files.
To display this newly-loaded data set, open up volvox.conf and add the
following new stanza to the config file:
     [Motifs]
     feature      = polypeptide_domain
     glyph        = span
     height       = 5
     description  = 1
     key          = Example motifs
This defines a new track whose internal name is "Motifs." The corresponding
feature type is "polypeptide_domain" and it uses the "span" glyph, a graphic
that displays a horizontal line capped by vertical endpoints. The height is set
to five pixels, and the human-readable key is set to "Example motifs." A new
option, "description", is a flag that tells GBrowse to display the Note
attribute, if any. Any non-zero value means true.
After updating the configuration file, you will need to reload the browser page
and turn on the "Example motifs" checkbox below the main image. The result is
shown in Figure 4.
     [figures/descriptions1.gif]
     Figure 4: Showing the Notes attribute
A copy of this config file is also available for you to use in volvox2.conf.
To show that GBrowse will search the notes for keyword matches, try typing in
"kinase." You will be presented with a list of all the motifs whose Note
attribute contains the word "kinase."
**** 1.4. Adjusting GBrowse Name Searches ****
GBrowse has a very flexible search feature. You can type in the name of a
reference sequence, such as "ctgA", and it will display the entire thing, or
you can type in a range in the format "ctgA:start..stop". Try "ctgA:5000..8000"
to see this at work.
In addition, GBrowse can search for features by name. Anything that has a Name
or Alias attribute in the GFF3 file can be searched for by name. For example,
try searching for "f10" or even "f1*". The only drawback to this is that you
may have name collisions. For example, some research communities distinguish
genes from their products using differences in capitalization, for example hga
and HGA. However, GBrowse's searches are case insensitive. To avoid name
collisions, you can give each type of feature a distinctive naming prefix, for
example "Gene:hga" and "Protein:HGA".
To illustrate how this works, have a look at volvox2b.gff3:
     ctgA example remark                             1000 2000 . . .
     Name=Remark
     ctgA example protein_coding_primary_transcript  1100 2000 . + .
     Name=Gene:hga
     ctgA example polypeptide                        1200 1900 . + .
     Name=Protein:HGA
     ctgA example protein_coding_primary_transcript  1600 3000 . - .
     Name=Gene:hgb
     ctgA example polypeptide                        1800 2900 . - .
     Name=Protein:HGB
Copy this file into the databases/volvox folder. Note that as you add new files
to the database folder, you may need to disable caching in order to see the new
features show up immediately. To do this, simply scroll down to the "Display
Settings" portion of the GBrowse display and unselect "Cache tracks."
Now add the following configuration stanza to volvox.conf to create a track
that displays both protein_coding_primary_transcript and polypeptide features:
     [NameTest]
     feature      = protein_coding_primary_transcript polypeptide
     glyph        = generic
     stranded     = 1
     bgcolor      = green
     height       = 10
     key          = Name test track
This stanza creates a new track with the internal name "NameTest" and the key
"Name test track." It displays features of type
"protein_coding_primary_transcript" and "polypeptide" using green generic
glyphs that are 10 pixels high. When you look at the data file, you'll see that
there are three things potentially named "HGA": a remark which uses the
unqualified name, a gene which uses the qualified name "Gene:hga", and a
polypeptide region which uses the qualified name "Protein:HGA." There is also a
protein_coding_primary_transcript named "Gene:hgb" and a protein named
"Protein:HGB." (Note, in this track we are using slightly awkward sequence
ontology terms, like "protein_coding_primary_transcript," rather than more
natural terms like "gene" in order to avoid these example features from
appearing in the real "gene" track that we create later on in this tutorial.)
To see how GBrowse searches for names, type "HGA" (either upper or lowercase)
in the search textbox and press "Search." Because the search term matches the
remark whose unqualified name is HGA, GBrowse will bring up the region between
1000..2000 and highlight the HGA remark.
Now search for "Protein:HGA." Because you searched with the qualified name,
GBrowse will find and highlight the protein feature.
Now try to search for "HGB." This search fails because HGB only exists in
qualified form in the database. You can still, however, search for "Gene:HGB"
or "Protein:Hgb" (capitalization doesn't matter). This may or may not be the
behavior that you desire. If you would like GBrowse to search through qualified
names when the user types the unqualified version, you can configure this
easily by adding the following line to volvox.conf under the [General] section:
     automatic classes = Gene Protein
This option directs GBrowse to search for the unqualified name first, followed
by names prefixed with "Gene:" and then names prefixed with "Protein:".
Whichever is found first will be displayed. Now searching for "HGB" will find
"Gene:hgb". Swapping the order of Gene and Protein on this line will cause the
"Protein:HGB" to be found.
Another way to approach this is to make liberal use of the Alias attribute. For
example:
     ctgA example remark                             1000 2000 . . .
     Name=Remark:HGA;Alias=hga
     ctgA example protein_coding_primary_transcript  1100 2000 . + .
     Name=Gene:hga;Alias=hga
     ctgA example polypeptide                        1200 1900 . + .
     Name=Protein:HGA;Alias=hga
     ctgA example protein_coding_primary_transcript  1600 3000 . - .
     Name=Gene:hgb;Alias=hgb
     ctgA example polypeptide                        1800 2900 . - .
     Name=Protein:HGB;Alias=hgb
This assigns the alias of "hga" to each of the three HGA features, and an alias
of "hgb" to each of the two HGB features. This keeps the identities of these
features distinct so that you can find particular ones by typing in the fully
qualified name ("Gene:hga"), but find all candidates when you type in the
unqualified name. For instance, when you search with "hga", GBrowse will now
offer you three matches:
     [figures/aliases.gif]
     Figure 5: Searching for aliases
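The two strategies in this section, automatic classes and aliases, can be sketched together in a few lines of Python. This is an illustration of the lookup order described above, not GBrowse's actual search code; the function names build_index and search are hypothetical, and wildcard searches such as "f1*" are not handled.

```python
from collections import defaultdict


def build_index(features):
    """Map each lowercased Name and Alias to the feature names carrying it."""
    index = defaultdict(list)
    for feature in features:
        for key in [feature["Name"]] + feature.get("Alias", []):
            index[key.lower()].append(feature["Name"])
    return index


def search(query, index, automatic_classes=()):
    """Try the query as typed, then with each class prefix, in order."""
    for candidate in [query] + [c + ":" + query for c in automatic_classes]:
        hits = index.get(candidate.lower())
        if hits:
            return hits
    return []


# Qualified names only (the remark is omitted here).
plain = build_index([{"Name": n} for n in
                     ("Gene:hga", "Protein:HGA", "Gene:hgb", "Protein:HGB")])
print(search("HGB", plain))                       # no match
print(search("HGB", plain, ["Gene", "Protein"]))  # Gene: is tried first
print(search("HGB", plain, ["Protein", "Gene"]))  # Protein: is tried first

# The same features with aliases added.
aliased = build_index([
    {"Name": "Remark:HGA", "Alias": ["hga"]},
    {"Name": "Gene:hga", "Alias": ["hga"]},
    {"Name": "Protein:HGA", "Alias": ["hga"]},
    {"Name": "Gene:hgb", "Alias": ["hgb"]},
    {"Name": "Protein:HGB", "Alias": ["hgb"]},
])
print(search("hga", aliased))
```

With automatic classes only the first class that matches is returned, whereas the alias approach returns every feature sharing the alias and leaves the choice to the user.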
**** 1.5. Linking ****
The next topic we'll cover in this tutorial is configuring GBrowse's outgoing
links. When the user clicks on a glyph in the details image, he will be taken
to another page by following a URL. The URL to follow is generated from the
link option. The default link option is located in the [TRACK DEFAULTS] section
of the config file; you can specify track-specific links by placing a link
option in one or more of the individual track stanzas.
The volvox.conf track defaults looks like this:
     [TRACK DEFAULTS]
     glyph         = generic
     height        = 10
     bgcolor       = lightgrey
     fgcolor       = black
     font2color    = blue
     label density = 25
     bump density  = 100
     # where to link to when user clicks in detailed view
     link          = AUTO
In this case, we've been using a special link URL of "AUTO." This generates an
automatic link to a helper script named "gbrowse_details." If you click on some
of the features in the current volvox page you'll get an idea of what this
script displays. Try clicking on a motif, a spliced transcript, the EDEN gene,
and an EST. When you click on the spliced transcript, notice that the content
of the "Gene" attribute is displayed. By adding attributes like this one, you
can build up a very modest web-browsable database of facts about your features.
We're going to override the default link rule for the motif track. There's
nothing sensible to link to, so we'll link to Google using first the motif's
name, and then the motif's description.
Go to the [Motifs] stanza in the volvox.conf config file and modify it so that
it looks like this:
     [Motifs]
     feature      = polypeptide_domain
     glyph        = span
     height       = 5
     description  = 1
     link         = http://www.google.com/search?q=$name
     key          = Example motifs
The only change we've made is to add a "link" option to the stanza, where the
value is a Google search URL. "$name" is a Perl variable. GBrowse will fill in
this variable with the name of the motif. Reload the page and click on a motif
to see that this works as advertised ("m01," "m02" and the other example motifs
are similar to the names for galactic clusters, so be prepared for some
astronomy hits).
It would be more sensible to link to the description of the motif, for example
"helix loop helix." Fortunately we can do that too. Just change the link option
to:
     link         = http://www.google.com/search?q=$description
There are a large number of possible variables that you can use inside link
rules. See the CONFIGURE_HOWTO document in the GBrowse distribution for the
full list. You can also construct links using Perl callbacks as described in
the section on displaying_ESTs. This gives you the ability to generate any
arbitrary URL.
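To see what variable substitution in a link rule amounts to, here is a minimal Python sketch. The function name expand_link is hypothetical, and the URL quoting of substituted values is our assumption; consult CONFIGURE_HOWTO for how GBrowse itself treats special characters.

```python
import re
from urllib.parse import quote_plus


def expand_link(template, feature):
    """Replace $name-style variables in a link template.

    feature: dict of variable name -> value, e.g. {"name": "m05"}.
    Unknown variables are left untouched. Values are URL-quoted here;
    whether GBrowse quotes them the same way is an assumption.
    """
    def replace(match):
        key = match.group(1)
        if key in feature:
            return quote_plus(str(feature[key]))
        return match.group(0)

    return re.sub(r"\$(\w+)", replace, template)


motif = {"name": "m05", "description": "helix loop helix"}
print(expand_link("http://www.google.com/search?q=$name", motif))
# prints http://www.google.com/search?q=m05
print(expand_link("http://www.google.com/search?q=$description", motif))
# prints http://www.google.com/search?q=helix+loop+helix
```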
If you want nothing to happen when the user clicks on a feature, just set link
to empty ("link = ").
The last thing we'll do is to change the behavior of the [Motifs] track so that:
   1. a new window pops up with the google search rather than replacing the
      contents of the current window
   2. when the user mouses over a motif, a hints box will appear telling him
      that clicking there will initiate a google search
These changes are easy:
     [Motifs]
     feature      = polypeptide_domain
     glyph        = span
     height       = 5
     description  = 1
     link         = http://www.google.com/search?q=$description
     link_target  = _blank
     title        = Search Google for $description.
     key          = Example motifs
There's now a link_target option. This contains the name of a browser window in
which to load the content when the user clicks on the feature. If there's no
window of that name, the browser will create a new window and give it the
desired name. Choose an ordinary name like "Google" if you want the Google
content to be loaded into the same window each time, or choose "_blank" as
we've done here in order to pop up a new fresh window each time the user
clicks.
The title option contains a bit of text that will be displayed whenever the
user hovers the mouse over the feature for a second or two. The same variable
substitution rules apply, so when the user mouses over feature "m06", a hints
window will pop up that says "Search Google for SUSHI repeat." Give it a try!
**** 1.6. Adding Popup Balloons to Tracks ****
GBrowse can display popup balloons when the user hovers over or clicks on a
feature. The balloons can display arbitrary HTML, either provided in the config
file, or fetched remotely via a URL. You can use this feature to create
multiple choice menus when the user clicks on the feature, to pop up images on
mouse hovers, or even to create little embedded query forms. See
http://mckay.cshl.edu/balloons.html for examples.
In the config file for the database you wish to modify, set "balloon tips" to
a true value:
          [GENERAL]
          ...
          balloon tips = 1
Then add "balloon hover" and/or "balloon click" options to the track
stanzas that you wish to add balloons to. You can also place these options in
[TRACK DEFAULTS] to create a default balloon.
"balloon hover" specifies HTML or a URL that will be displayed when the user
hovers over a feature. "balloon click" specifies HTML or a URL that will
appear when the user clicks on a feature. The HTML can contain images,
formatted text, and even controls. Examples:
      balloon hover = <h2>Gene $name</h2>
      balloon click = <h2>Gene $name</h2>
            <a href='http://www.google.com/search?q=$name'>Search Google</a><br>
            <a href='http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&term=$name'>Search NCBI</a><br>
For example, to add a balloon to the motifs track of the Volvox browser, add
"balloon tips = 1" near the top of the volvox.conf file, and then add balloon
hover and balloon click options like this:
     [Motifs]
     feature      = polypeptide_domain
     glyph        = span
     height       = 5
     description  = 1
     category     = Proteins
     balloon hover = <h2>Gene $name</h2>
     balloon click = <h2>Gene $name</h2>
            <a href='http://www.google.com/search?q=$name'>Search Google</a><br>
            <a href='http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&term=$name'>Search NCBI</a><br>
     key          = Example motifs
Alternatively, you can populate the balloon using data from an HTML page or
dynamic CGI script running on the same server as GBrowse. This uses AJAX; it
can often speed up page loading by reducing the amount of text that must be
downloaded by the client. To dynamically load the balloon contents from the
server, use a balloon hover or balloon click option like this:
      balloon click = /cgi-bin/get_gene_data?gene=$name
In this case, when the user clicks on the feature, it creates a balloon whose
content contains the HTML returned by the CGI script "get_gene_data". GBrowse
knows that this is a URL rather than the contents of the balloon by looking for
the leading slash. However, to reduce ambiguity, we recommend that you prefix
the URL with "url:" like so:
      balloon click = url:/cgi-bin/get_gene_data?gene=$name
This also allows you to refer to relative URLs:
      balloon click = url:../../get_gene_data?gene=$name
It is also possible to fill the balloon with content from a remote source.
Simply specify a full URL beginning with "http:", "https:" or "ftp:":
      balloon hover = http://www.wormbase.org/db/get?name=$name;class=gene
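The rules for telling a URL from inline HTML can be summarized in a short sketch. This Python function only restates the rules given above; the name classify_balloon is hypothetical, and the real decision is made inside GBrowse, possibly with additional cases.

```python
def classify_balloon(value):
    """Return ("url", target) or ("html", text) for a balloon option value.

    Follows the rules described above: an explicit "url:" prefix, a
    leading slash, or an http:, https: or ftp: scheme marks a URL;
    anything else is treated as inline HTML.
    """
    value = value.strip()
    if value.startswith("url:"):
        return ("url", value[len("url:"):])
    if value.startswith(("/", "http:", "https:", "ftp:")):
        return ("url", value)
    return ("html", value)


for example in ("<h2>Gene $name</h2>",
                "/cgi-bin/get_gene_data?gene=$name",
                "url:../../get_gene_data?gene=$name",
                "http://www.wormbase.org/db/get?name=$name;class=gene"):
    print(classify_balloon(example))
```

A relative URL such as ../../get_gene_data has neither a leading slash nor a scheme, which is why it needs the explicit "url:" prefix.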
Note that the balloon library uses an internal <iframe> to populate the balloon
with the content of external URLs. This means that vertical and horizontal
scrollbars will appear if the content is too large to be contained within the
balloon. If the formatting does not look right, you can design a custom balloon
of the proper size as described in the next section.
The usual option value substitution rules ($name, $start, etc) work with
balloons, as do callbacks. GBrowse will automatically escape single (') and
double (") quotes in the values returned by the "balloon hover" and
"balloon click" options so that you don't have to worry about them messing up
the HTML.
You might also wish to specify "titles are balloons" in the [GENERAL]
section:
      [GENERAL]
      titles are balloons = 1
This will generate balloons on all mouse hover events, using the content that
would otherwise have been placed in the built-in browser tooltip.
There is a limited amount of balloon customization that you can perform within
each [track] section. If you wish the balloon to be sticky (require the user to
press the close button) even if it is a hover balloon, then place this option
in the [track] section:
      balloon sticky = 1
Setting ``balloon sticky'' to 0 makes balloons disappear as soon as the mouse
leaves them, even if they were created by a mouse click event.
If you are displaying content from a remote web or FTP server and you do not
like the height of the balloon, you can adjust the height with the ``balloon
height'' option:
      balloon height = 400
*** 1.6.1. Customizing Balloons ***
GBrowse supports multiple balloons with different shapes, sizes, background
images and timing properties. There is one built-in balloon, named "balloon",
which should meet most people's needs. However, you can configure any number of
custom balloons.
To declare two new balloons, create a "custom balloons" option in the [GENERAL]
section:
      custom balloons = [blue_balloon]
                       images   =  /gbrowse/images/blue_balloons
                       maxWidth = 300
                       shadow   = 0

                       [wide_balloon]
                       maxWidth = 800
This creates two new balloons. The first, named "blue_balloon", will look for
its images and icons at the local URL /gbrowse/images/blue_balloons. It will
have a maximum width of 300 pixels, and will cast no shadow. The second, named
"wide_balloon", takes all the defaults of the default balloon, including the
location of its images in the directory /gbrowse/images/balloons, except that
it has a maximum width of 800 pixels. The various balloon options are described
in the GMOD wiki.
To use the blue balloon rather than the standard one, format the "balloon
hover" and/or "balloon click" options like this:
      balloon click = [blue_balloon] /cgi-bin/get_gene_data?gene=$name
The [blue_balloon] keyword tells GBrowse to use the blue balloon for clicks on
these features. The standard balloon is called "balloon", and so the following
two options are equivalent:
      balloon click = /cgi-bin/get_gene_data?gene=$name
      balloon click = [balloon] /cgi-bin/get_gene_data?gene=$name
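To make the equivalence of these two forms concrete, here is a small Python
sketch that splits an option value into a balloon name and its content,
falling back to the built-in "balloon" when no [name] prefix is given. The
helper split_balloon_option is hypothetical and written only for this
illustration; GBrowse's own parsing may differ in detail.

```python
import re

# Hypothetical helper for illustration; GBrowse's own parser may differ.
def split_balloon_option(value: str):
    """Split '[name] content' into (balloon name, content)."""
    match = re.match(r"\s*\[([^\]]+)\]\s*(.*)", value, re.DOTALL)
    if match:
        return match.group(1), match.group(2)
    return "balloon", value.strip()   # no prefix: use the built-in balloon

print(split_balloon_option("[blue_balloon] /cgi-bin/get_gene_data?gene=$name"))
print(split_balloon_option("[balloon] /cgi-bin/get_gene_data?gene=$name"))
print(split_balloon_option("/cgi-bin/get_gene_data?gene=$name"))
```

The last two calls return the same pair, which mirrors the statement that the
two option lines above are equivalent.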
The images for custom balloons reside in the default location of /gbrowse/
images/balloons, unless indicated otherwise using the ``images'' config option.
To use custom balloon images, point "images" to a web-accessible directory in
your document tree which contains the seven PNG images described in the
documentation.
These images must be named as listed below:
      balloon.png     down_right.png  up_right.png
      balloon_ie.png  down_left.png   up_left.png
      close.png
Tips for creating these images can be found here.
===============================================================================
***** 2. Displaying Common Types of Features *****
Now that you've seen the basics, we'll discuss techniques to display multi-part
features, genes, alignments, quantitative data and other special feature types.
**** 2.1. Multi-segmented features ****
Many features are discontinuous. Examples include spliced transcripts, and
gapped sequence similarity alignments, such as the alignment of cDNAs to the
genome. GBrowse can deal with such features easily provided that you take a
little care in setting them up.
The data file volvox3.gff3 contains a simulated data set of a series of gapped
nucleotide alignments. An excerpt from the file is here:
     ctgA example match 32329 32359 . + . ID=match-seg01;Name=seg01
     ctgA example match 26122 26126 . + . ID=match-seg02;Name=seg02
     ctgA example match 26497 26869 . + . ID=match-seg02;Name=seg02
     ctgA example match 27201 27325 . + . ID=match-seg02;Name=seg02
     ctgA example match 27372 27433 . + . ID=match-seg02;Name=seg02
     ctgA example match 27565 27565 . + . ID=match-seg02;Name=seg02
     ctgA example match 27813 28091 . + . ID=match-seg02;Name=seg02
     ctgA example match 28093 28201 . + . ID=match-seg02;Name=seg02
     ctgA example match 28329 28377 . + . ID=match-seg02;Name=seg02
     ctgA example match 28829 29194 . + . ID=match-seg02;Name=seg02
     ctgA example match  6885  7241 . - . ID=match-seg03;Name=seg03
     ctgA example match  7410  7737 . - . ID=match-seg03;Name=seg03
     ctgA example match  8055  8080 . - . ID=match-seg03;Name=seg03
     ctgA example match  8306  8999 . - . ID=match-seg03;Name=seg03
This file uses a new GFF3 attribute, "ID". The ID attribute is used to group
features together and to indicate when a single feature occupies multiple
discontinuous locations. In the case of a gapped alignment, each ungapped
segment is represented by a single GFF3 line. The segments of a single
alignment are then grouped together by using the same ID. For example,
"match-seg03" starts at position 6885 and ends at 8999. It has four
subsegments, one from 6885..7241, another from 7410..7737, and so forth.
The ID attribute is not the same as the Name attribute. If you give three lines
the same ID, they will be grouped together into a single displayed feature. If
you give three lines the same Name you will end up with three distinct features
that all happen to share the same name. Also note that except for the
coordinates and the score (which we'll discuss later) all columns for each of
the parts of a multisegmented feature should be the same. For example, you
can't have one part of a feature on the (+) strand and another part on the
(-) strand.
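The grouping rule can be checked with a few lines of code. The following Python
sketch (an illustration, not part of GBrowse) collects the lines of the excerpt
by their ID attribute and reports the overall extent and number of segments of
each feature. For readability it splits columns on whitespace; real GFF3 files
are tab-delimited.

```python
from collections import defaultdict

# A few lines from the volvox3.gff3 excerpt shown above.
GFF3 = """\
ctgA example match 32329 32359 . + . ID=match-seg01;Name=seg01
ctgA example match  6885  7241 . - . ID=match-seg03;Name=seg03
ctgA example match  7410  7737 . - . ID=match-seg03;Name=seg03
ctgA example match  8055  8080 . - . ID=match-seg03;Name=seg03
ctgA example match  8306  8999 . - . ID=match-seg03;Name=seg03
"""

def group_by_id(text):
    """Map each ID to the list of (start, end) segments that share it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        cols = line.split(None, 8)   # whitespace split; real GFF3 uses tabs
        attrs = dict(pair.split("=", 1) for pair in cols[8].split(";"))
        groups[attrs["ID"]].append((int(cols[3]), int(cols[4])))
    return groups

groups = group_by_id(GFF3)
for feature_id, segments in groups.items():
    start = min(s for s, _ in segments)
    end = max(e for _, e in segments)
    print(f"{feature_id}: {start}..{end} in {len(segments)} segment(s)")
```

Grouping by Name instead of ID would be a different operation: as described
above, lines that share only a Name remain separate features.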
Copy volvox3.gff3 into the volvox database directory. Then edit volvox.conf to
add the following track definition:
     [Alignments]
     feature      = match
     glyph        = segments
     key          = Example alignments
This is declaring a new track named "Alignments" which displays features of
type "match" using a glyph named "segments". The segments glyph is specialized
for displaying objects that have multiple similar subparts. Reload the page and
activate the "Example alignments" track. You should see a track similar to
Figure 6.
     [figures/segmented_features2.gif]
     Figure 6: Use the "segments" glyph to display discontinuous multipart
     features.
**** 2.2. Protein-Coding Genes ****
GBrowse can display protein-coding genes in various shapes and styles. The
easiest way to set this up is to use the Sequence Ontology's canonical
description of a gene along with the "gene" glyph. Take a look at the file
volvox4.gff3, which defines a gene named EDEN, and its three spliced forms
named EDEN.1, EDEN.2 and EDEN.3. Here are the contents of the file:
     ctgA example gene            1050 9000 . + . ID=EDEN;Name=EDEN;Note=protein kinase

     ctgA example mRNA            1050 9000 . + . ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Index=1
     ctgA example five_prime_UTR  1050 1200 . + . Parent=EDEN.1
     ctgA example CDS             1201 1500 . + 0 Parent=EDEN.1
     ctgA example CDS             3000 3902 . + 0 Parent=EDEN.1
     ctgA example CDS             5000 5500 . + 0 Parent=EDEN.1
     ctgA example CDS             7000 7608 . + 0 Parent=EDEN.1
     ctgA example three_prime_UTR 7609 9000 . + . Parent=EDEN.1

     ctgA example mRNA            1050 9000 . + . ID=EDEN.2;Parent=EDEN;Name=EDEN.2;Index=1
     ctgA example five_prime_UTR  1050 1200 . + . Parent=EDEN.2
     ctgA example CDS             1201 1500 . + 0 Parent=EDEN.2
     ctgA example CDS             5000 5500 . + 0 Parent=EDEN.2
     ctgA example CDS             7000 7608 . + 0 Parent=EDEN.2
     ctgA example three_prime_UTR 7609 9000 . + . Parent=EDEN.2

     ctgA example mRNA            1300 9000 . + . ID=EDEN.3;Parent=EDEN;Name=EDEN.3;Index=1
     ctgA example five_prime_UTR  1300 1500 . + . Parent=EDEN.3
     ctgA example five_prime_UTR  3000 3300 . + . Parent=EDEN.3
     ctgA example CDS             3301 3902 . + 0 Parent=EDEN.3
     ctgA example CDS             5000 5500 . + 1 Parent=EDEN.3
     ctgA example CDS             7000 7600 . + 1 Parent=EDEN.3
     ctgA example three_prime_UTR 7601 9000 . + . Parent=EDEN.3
GFF3 uses a three-tiered structure to represent the gene, descending from gene
to mRNA to CDS and UTR features. A gene has potentially many mRNAs, and each
mRNA has potentially several CDS and UTR features. To describe how the parts
fit together, we use ID and Parent attributes.
We start with a feature of type "gene" with the ID "EDEN". This has three
alternative splice forms named EDEN.1, EDEN.2 and EDEN.3. To tell GBrowse that
each of these splice forms is part of the same gene, we give each one a Parent
attribute of "EDEN" corresponding to the ID of the parent gene. Now consider
mRNA EDEN.1. It has a five_prime_UTR feature, a three_prime_UTR feature, and
four CDS features. To indicate that the CDS and UTR features belong to the
mRNA, we give the mRNA a unique ID of "EDEN.1" and give each of the subfeatures
a corresponding Parent attribute. This pattern repeats for each of the other
two splice forms. Note how the five_prime_UTR of EDEN.3 is split in two parts.
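If it helps to see the three tiers as a tree, the following Python sketch
rebuilds the hierarchy from the ID and Parent attributes and prints it with
indentation. It is a simplified illustration, not how GBrowse or BioPerl store
features; only the subfeatures of EDEN.1 are included to keep it short, and
columns are split on whitespace rather than tabs.

```python
from collections import defaultdict

# Abbreviated copy of volvox4.gff3 (subfeatures of EDEN.2 and EDEN.3 omitted).
GFF3 = """\
ctgA example gene 1050 9000 . + . ID=EDEN;Name=EDEN;Note=protein kinase
ctgA example mRNA 1050 9000 . + . ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Index=1
ctgA example five_prime_UTR 1050 1200 . + . Parent=EDEN.1
ctgA example CDS 1201 1500 . + 0 Parent=EDEN.1
ctgA example CDS 3000 3902 . + 0 Parent=EDEN.1
ctgA example CDS 5000 5500 . + 0 Parent=EDEN.1
ctgA example CDS 7000 7608 . + 0 Parent=EDEN.1
ctgA example three_prime_UTR 7609 9000 . + . Parent=EDEN.1
ctgA example mRNA 1050 9000 . + . ID=EDEN.2;Parent=EDEN;Name=EDEN.2;Index=1
ctgA example mRNA 1300 9000 . + . ID=EDEN.3;Parent=EDEN;Name=EDEN.3;Index=1
"""

def build_children(text):
    """Map each Parent ID (None for top-level features) to its children."""
    children = defaultdict(list)
    for line in text.splitlines():
        cols = line.split(None, 8)   # whitespace split; real GFF3 uses tabs
        attrs = dict(pair.split("=", 1) for pair in cols[8].split(";"))
        record = (cols[2], int(cols[3]), int(cols[4]), attrs.get("ID"))
        children[attrs.get("Parent")].append(record)
    return children

def show(children, parent=None, depth=0):
    """Print the feature tree, indenting one level per tier."""
    for ftype, start, end, fid in children[parent]:
        print("  " * depth + f"{ftype} {fid or ''} {start}..{end}".rstrip())
        if fid:
            show(children, fid, depth + 1)

children = build_children(GFF3)
show(children)
```

The gene appears at the top level, the three mRNAs one level below it, and the
CDS and UTR features of EDEN.1 one level below that.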
As before, we use "Name" to give the gene and its alternative splice forms a
human-readable name, and use Note to provide a description for the gene as a
whole (you can add notes to the individual mRNAs but they won't display by
default). The Index=1 attribute is a hint to the database to make the mRNAs
searchable by name. This lets users find the gene by searching for the mRNA
names ("EDEN.1") as well as by the gene name ("EDEN"). However, it is usually
unnecessary to do this. Also notice that we are using the Phase column for the
CDS features to describe how the CDS is translated into protein. See the
description of phase in the data file section.
This is the full way to describe genes. Simpler ways are described later in
this section.
     HINT: If you prefer not to distinguish between 5' and 3' UTRs, you
     can simply use "UTR" as the type. If you don't know where the UTRs
     are, just leave them blank. If you'd rather think in terms of exons
     and introns, then check out the so_transcript glyph.
Go ahead and add volvox4.gff3 to the database. Then add the following new
stanza to the bottom of volvox.conf:
     [Genes]
     feature            = gene
     glyph              = gene
     bgcolor            = peachpuff
     label_transcripts  = 1
     draw_translation   = 1
     category           = Genes
     key                = Protein-coding genes
The new Genes track associates "gene" features with the "gene" glyph, sets its
background color to peachpuff (yes, there really is a color by this name!),
turns on labels for the individual transcripts, and sets the human-readable
track name to "Protein-coding genes."
Upon reloading the page, turning on the new "Protein-coding genes" track, and
viewing the region around 1..10K, you'll see this:
     [figures/canonical_gene1.gif]
     Figure 7: The canonical gene
The gene glyph has a number of options that you can use to customize its
appearance:
 ___________________________________________________________________________
|Option Name     |Possible values        |Description                       |
|________________|_______________________|__________________________________|
|thin_utr        |0 (false), 1 (true)    |If true, makes UTRs half-height.  |
|utr_color       |a color name ("gray" by|Changes the UTR color.            |
|                |default)               |                                  |
|decorate_introns|0 (false), 1 (true)    |If true, puts little arrowheads on|
|                |                       |the introns to indicate direction |
|                |                       |of transcription.                 |
|________________|_______________________|__________________________________|
Using these options, we can make the track look like the UCSC Genome Browser
(Figure 8).
     [Transcripts]
     feature      = processed_transcript gene
     glyph        = processed_transcript
     height       = 8
     bgcolor      = black
     utr_color    = black
     thin_utr     = 1
     decorate_introns = 1
     description  = 1
     key          = Protein-coding genes
     [figures/canonical_gene3.gif]
     Figure 8: A UCSC Genome Browser lookalike
*** 2.2.1. Simpler Genes ***
If the full three-tiered representation of a gene bugs you, there are simpler
alternatives. To represent a typical predicted gene that only has a translated
region, you can represent the translation as a single CDS line for a single-
exon gene, or a series of linked lines for a spliced gene. data_files/
volvox4b.gff3 shows how to do this:
     ctgA predicted CDS 10000 11500 . + 0 Name=Apple1

     ctgA predicted CDS 13000 13800 . + 0 ID=cds-Apple2;Name=Apple2
     ctgA predicted CDS 15000 15500 . + 1 ID=cds-Apple2;Name=Apple2
     ctgA predicted CDS 17000 17200 . + 2 ID=cds-Apple2;Name=Apple2
This creates two linked CDS sets: a single-exon predicted gene called Apple1
and a three-exon gene called Apple2. Note that we use a common ID to tie the
three Apple2 exons together.
The corresponding stanza will look like this:
     [CDS]
     feature            = CDS:predicted
     glyph              = gene
     bgcolor            = white
     category           = Genes
     key                = Predicted genes
We are still using the gene glyph. However, be aware that this depends on a
recent (as of January 2008) enhancement to bioperl-live. If the gene does not
display properly for you, then either install a newer version of bioperl (for
example, the SVN "bioperl-live" version), or use the older "transcript" glyph
instead of the gene glyph.
The other thing to notice is that the feature is now qualified as
"CDS:predicted". This corresponds to a GFF3 type (column 3) of "CDS", and a
GFF3 source (column 2) of "predicted." In all previous examples, we used an
unqualified feature name, but in this case we don't want the CDS subfeatures
from the three-tier EDEN gene examples to be displayed in the predicted gene
track. Therefore we limit the features that are displayed in this track by
qualifying the feature type with its source using the syntax shown here.
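The effect of the type:source qualifier can be sketched as a simple filter on
columns 2 and 3 of each GFF3 line. The Python function below is a hypothetical
illustration of that rule (the name track_accepts is made up, and it assumes an
exact, case-sensitive comparison); it also assumes that a "feature" option may
list several selectors separated by spaces.

```python
# Hypothetical illustration of the type:source rule; not GBrowse code.
def track_accepts(feature_option: str, gff_type: str, gff_source: str) -> bool:
    """True if any selector in the "feature" option matches the GFF3 line."""
    for selector in feature_option.split():
        sel_type, _, sel_source = selector.partition(":")
        if sel_type == gff_type and sel_source in ("", gff_source):
            return True
    return False

# CDS lines of the predicted genes are shown in the [CDS] track ...
print(track_accepts("CDS:predicted", "CDS", "predicted"))   # True
# ... but the CDS subfeatures of the EDEN gene (source "example") are not.
print(track_accepts("CDS:predicted", "CDS", "example"))     # False
# An unqualified type matches any source.
print(track_accepts("match", "match", "example"))           # True
```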
The result is shown in Figure 9:
     [figures/predicted_genes.gif]
     Figure 9: Simpler genes using linked CDSs and the transcript glyph
The bottom six lines of volvox4b.gff3 show how to display a single transcript
that has both coding and non-coding regions.
     ctgA exonerate mRNA 17400 23000 . + . ID=rna-Apple3;Name=Apple3;Note=Predicted
     ctgA exonerate UTR  17400 17999 . + . Parent=rna-Apple3
     ctgA exonerate CDS  18000 18800 . + 0 Parent=rna-Apple3
     ctgA exonerate CDS  19000 19500 . + 1 Parent=rna-Apple3
     ctgA exonerate CDS  21000 21200 . + 2 Parent=rna-Apple3
     ctgA exonerate UTR  21201 23000 . + . Parent=rna-Apple3
To represent this transcript, we need to create a feature of type mRNA with a
unique ID, followed by several UTR and CDS subfeatures all linked to the mRNA
via their Parent attribute. In this example we use "UTR" for the UTR features,
although the more explicit "five_prime_UTR" and "three_prime_UTR" types will
also work. The "so_transcript" (Sequence Ontology transcript) glyph knows how
to display these correctly:
     [Transcript]
     feature            = mRNA:exonerate
     glyph              = so_transcript
     description        = 1
     bgcolor            = beige
     category           = Genes
     key                = Exonerate predictions
After making this addition to the configuration file, reload the page and turn
on "Exonerate predictions." You will see a display that is similar to the gene
track, but treats each transcript as a separate feature.
As with the previous example, this depends on a recent (as of January 2008)
change to bioperl. If it doesn't work for you, either update bioperl, or use
the older "processed_transcript" glyph.
**** 2.3. Reading Frames ****
Continuing with the example from section 2.2, the third exon of EDEN.1 is
shared with EDEN.3. But is the reading frame preserved? The "cds" glyph will
create a display that will visualize each CDS's reading frame.
To see this work, add the following stanza to the bottom of the configuration
file:
     [ReadingFrame]
     feature            = mRNA
     glyph              = cds
     ignore_empty_phase = 1
     category           = Genes
     key                = Frame usage
When you reload the page and turn this track on, you'll see a "musical staff"
representation of the frame usage (Figure 10). From this we can see that the
alternative splicing does in fact change the reading frame of the shared exon:
it has phase 0 in EDEN.1 but phase 1 in EDEN.3.
The "feature" option tells the glyph to take its data from the mRNA subfeatures
of the main gene features. Note that depending on which data adaptor you use,
you may need to specify the attribute "Index=1" for each of the mRNA
subfeatures in order for the glyph to find them inside the gene object.
However, this is usually unnecessary.
     [figures/cds1.gif]
     Figure 10: The "cds" glyph shows the reading frame using a musical
     staff notation
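The phase values in the CDS lines of volvox4.gff3 can be reproduced by hand. In
GFF3, the phase is the number of bases to skip at the start of a CDS segment
before the first complete codon begins. The following Python sketch applies
that definition to the plus-strand CDS coordinates of EDEN.1 and EDEN.3; it is
a worked check of the arithmetic, not code taken from the "cds" glyph.

```python
def next_phase(start, end, phase):
    """Phase of the following CDS segment, given this segment's phase."""
    length = end - start + 1
    leftover = (length - phase) % 3      # bases of the last, incomplete codon
    return (3 - leftover) % 3            # bases still needed to finish it

def phases(cds_segments, first_phase=0):
    """Phase of every segment of a plus-strand CDS, in 5' to 3' order."""
    result, phase = [], first_phase
    for start, end in cds_segments:
        result.append(phase)
        phase = next_phase(start, end, phase)
    return result

eden1 = [(1201, 1500), (3000, 3902), (5000, 5500), (7000, 7608)]
eden3 = [(3301, 3902), (5000, 5500), (7000, 7600)]

print("EDEN.1", phases(eden1))
print("EDEN.3", phases(eden3))
```

The exon at 5000..5500 comes out with phase 0 in EDEN.1 and phase 1 in EDEN.3,
which agrees with column 8 of the data file and with the frame shift visible
in the "cds" track.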
**** 2.4. Grouped Features ****
In some circumstances you may wish to group features together to create a
multipart feature. The gene object is actually just a special case of this. To
show you the general case, we'll create a feature of type "BAC", whose
subparts are of type "clone_start" and "clone_end" (possibly corresponding to a
BAC clone mapping experiment). Here is the GFF3 representation of this:
     ctgA example BAC         1000  20000 . . . ID=b101.2;Name=b101.2;Note=Fingerprinted BAC with end reads
     ctgA example clone_start 1000   1500 . + . Parent=b101.2
     ctgA example clone_end   19500 20000 . - . Parent=b101.2
As you can see, we've created a top-level feature of type "BAC" with two
children of type "clone_start" and "clone_end" respectively. The start and end
have opposite strands, indicating that they were sequenced off different
strands of the BAC. The three features are tied together using the ID and
Parent attributes that should be familiar to you from the gene examples.
This data lives in volvox5.gff3. Go ahead and add this into the database now.
To visualize this add the appropriate stanza to the bottom of volvox.conf:
     [Clones]
     feature      = BAC
     glyph        = segments
     bgcolor      = yellow
     connector    = dashed
     strand_arrow = 1
     description  = 1
     key          = Fingerprinted BACs
With this new track turned on, look at ctgA:1..24200. It will show that GBrowse
has correctly picked up and rendered the relationship between the whole BAC and
its two end reads (Figure 11). We have seen all these display options before
with the exception of the "connector" option. This controls the appearance of
the connecting line between subparts of a feature and can be one of "none",
"solid", "dashed", "hat" or "quill". Try them and see what happens! (Note, you
will have to change the strandedness of the BAC parent feature from "." to "+"
in order to see anything special happen with the quill connector.)
     [figures/custom_aggregators1.gif]
     Figure 11: Displaying a simple multipart feature
For your convenience, the configuration file with all the modifications made up
through this point of the tutorial can be found in volvox3.conf.
**** 2.5. Showing Quantitative Data (basic) ****
GBrowse can plot quantitative data such as alignment scores, confidence scores
from gene prediction programs, and microarray intensity data. The data can be
displayed either with glyphs that change color to indicate score levels (see
the "heterogeneous_segments", "graded_segments" and "redgreen_box" glyphs), or
using a general-purpose XY-plot glyph.
Congratulations, Affymetrix has built a tiling array for the volvox genome!
There's now a transcriptional profile for volvox, with an intensity reading
every 100 bp across all of ctgA. The simulated data for this is in the file
volvox6.gff3, an excerpt of which is shown here:
     ctgA affy microarray_oligo   1 100 281 . . Name=Expt1
     ctgA affy microarray_oligo 101 200 183 . . Name=Expt1
     ctgA affy microarray_oligo 201 300 213 . . Name=Expt1
     ctgA affy microarray_oligo 301 400 191 . . Name=Expt1
     ctgA affy microarray_oligo 401 500 288 . . Name=Expt1
     ctgA affy microarray_oligo 501 600 184 . . Name=Expt1
     ...
The file contains 500 features, each of which is exactly 100 bp long. The
features are of type "microarray_oligo" and of source "affy." Each one has a
score (column 6) between 0 and 1000, where higher scores mean more
transcriptional activity. This is the first time we've used the score column.
All of the 500 features share the same Name (column 9) of "Expt1". Sharing the
same name will allow us to group them together into a single transcriptional
profiling experiment. However, we do not give them the same ID for reasons that
are explained later. If we had multiple experiments to show, they would be
named Expt1, Expt2 and so on.
We would like to generate a line graph that shows the transcriptional profile
level across the current region. To do this, we need to group all members of
the same experiment together into a single graph, and then to assign the
"xyplot" glyph to the data. The following configuration stanza will do this:
     [TransChip]
     feature        = microarray_oligo
     glyph          = xyplot
     graph_type     = boxes
     fgcolor        = orange
     bgcolor        = orange
     height         = 50
     min_score      = 0
     max_score      = 1000
     scale          = right
     category       = Genes
     group_on       = display_name
     key            = Transcriptional Profile
The options shown here create a track named TransChip to display the
microarray_oligo features with the xyplot glyph. The "group_on" option groups
features that share the same display name (here "Expt1") into a single graph,
while the "graph_type", "height", "scale", "min_score", and "max_score" options
all configure various aspects of the xyplot glyph's appearance.
     You can read all about xyplot's options using perldoc Bio::Graphics::
     Glyph::xyplot
When you reload the page and turn on the Transcriptional Profile track, you
should see something like that shown in Figure 12.
     [figures/graph1.gif]
     Figure 12: A transcriptional profile rendered with the xyplot glyph
     Using the info that perldoc provides, play around with the xyplot
     options a bit. For example, see what happens when you change
     graph_type to "line."
**** 2.6. Quantitative Data (advanced) ****
The recipe in section 2.5 works well for several thousand data points, but if
you have
very dense data, such as that produced by genomic tiling arrays, then you will
want to use a specialized binary representation known as "wiggle" format. A
wiggle track consists of a file that contains all the quantitative data, and a
single feature in the database proper that points at that data file. Loading
this data is a three-step process:
   1. Create a WIG file.
   2. Convert the WIG file into the wiggle binary file and a gff3 line.
   3. Load the gff3 line.
WIG_format is a specialized format for describing quantitative data. It was
created by Jim Kent for use in the UCSC genome browser. Details on creating WIG
files are described at http://genome.ucsc.edu/goldenPath/help/wiggle.html.
WIG files are plain text files. They always begin with a "track" header, which,
at a minimum, looks like this:
     track type=wiggle_0 name="ArrayExpt1" description="20 degrees, 2 hr"
The "type" attribute is required, and must have a value of "wiggle_0". "name"
and "description" are optional, but suggested, and indicate the name and
description of the data series -- these will become the "Name" and "Note"
fields of the generated GFF3 feature. Following the track line comes the data
for one or more chromosomal regions. As described in the UCSC documentation,
there are three ways of formatting the data: (1) "Bed Format", (2)
"variableStep", and (3) "fixedStep" format. The first format is essentially the
same as GFF3 and does not give you any performance advantages over using
straight GFF3. variableStep format describes intervals of the genome that have
a fixed width, but begin at arbitrary locations, while fixedStep format
describes features of the genome that are evenly spaced and have a fixed width
(e.g. tiling array features).
For variableStep data, the format is:
      variableStep chrom=chr19 span=150
      59304701 10.0
      59304901 12.5
      59305401 15.0
      59305601 17.5
      59305901 20.0
      59306081 17.5
      59306301 15.0
      59306691 12.5
      59307871 10.0
The data is introduced by a line beginning with the keyword "variableStep", and
the arguments "chrom" and "span", which indicate the chromosome on which the
features are located, and the width of each feature, in base pairs. This is
followed by a series of two-element lines indicating the start position of each
feature, and its quantitative value. Values can be any sort of numeric data,
including integers, negative numbers and floating point.
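To see what such a block means in terms of genome coordinates, the following
Python sketch expands the first few lines of the example into (chromosome,
start, end, value) tuples. It is a minimal reader for illustration only and
handles a single variableStep block; per the UCSC documentation, a missing
"span" defaults to 1.

```python
WIG = """\
variableStep chrom=chr19 span=150
59304701 10.0
59304901 12.5
59305401 15.0
"""

def parse_variable_step(text):
    """Expand one variableStep block into (chrom, start, end, value) tuples."""
    lines = text.splitlines()
    params = dict(field.split("=", 1) for field in lines[0].split()[1:])
    span = int(params.get("span", 1))    # width of every feature, in bp
    features = []
    for line in lines[1:]:
        position, value = line.split()
        start = int(position)
        features.append((params["chrom"], start, start + span - 1, float(value)))
    return features

features = parse_variable_step(WIG)
for feature in features:
    print(feature)
```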
For fixedStep data, the format is:
     fixedStep chrom=chr19 start=59307401 step=300 span=200
     1000
      900
      800
      700
      600
      500
      400
      300
      200
      100
The data is introduced by a line beginning with the keyword "fixedStep", and
the arguments "chrom", "span", "start" and "step". The first two arguments are
the same as before, while "start" and "step" indicate the starting position of
the first feature, and the spacing between each feature. This is followed by a
numeric value for each step. In this case, we have described 10 features
beginning at position 59307401. Each feature begins 300 bp after the start of
the previous one and is 200 bp wide. In practice, this means that the first
200 bp of each interval is filled with known data, while information on the
last 100 bp is "missing."
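The arithmetic behind "start", "step" and "span" is easy to verify. This Python
sketch (again a minimal illustration that handles one fixedStep block, not a
full WIG parser) computes the coordinates of the ten features in the example.

```python
WIG = """\
fixedStep chrom=chr19 start=59307401 step=300 span=200
1000
 900
 800
 700
 600
 500
 400
 300
 200
 100
"""

def parse_fixed_step(text):
    """Expand one fixedStep block into (chrom, start, end, value) tuples."""
    lines = text.splitlines()
    params = dict(field.split("=", 1) for field in lines[0].split()[1:])
    first, step = int(params["start"]), int(params["step"])
    span = int(params.get("span", 1))
    features = []
    for index, line in enumerate(lines[1:]):
        start = first + index * step     # features are evenly spaced
        features.append((params["chrom"], start, start + span - 1, float(line)))
    return features

features = parse_fixed_step(WIG)
for chrom, start, end, value in features[:2]:
    print(chrom, start, end, value)
gap = features[1][1] - features[0][2] - 1
print("uncovered bases between features:", gap)
```

The first feature covers 59307401..59307600 and the second starts at 59307701,
leaving the 100 bp in between without data.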
To see how this works in practice, let us reformat our example microarray data
using the fixedStep version of WIG format. The complete data for this is in the
file volvox_microarray.wig. It begins like this:
     track type=wiggle_0 name="example" description="20 degrees, 2 hr"
     fixedStep chrom=ctgA start=1 step=100 span=100
     281
     183
     213
     191
     288
     ...
Compare this to the microarray data in section 2.5 (Showing Quantitative Data
(basic)), and you will see that the five entries shown from the WIG file
correspond to the first five features in the GFF3 file.
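The correspondence can be made explicit with a short conversion. The Python
sketch below turns the first five microarray_oligo lines from volvox6.gff3
into fixedStep text. It is only meant to show how the two formats relate (the
function name is made up, and it assumes the features are sorted, evenly
spaced and of equal width); for real data, use the prepared
volvox_microarray.wig file.

```python
GFF3 = """\
ctgA affy microarray_oligo   1 100 281 . . Name=Expt1
ctgA affy microarray_oligo 101 200 183 . . Name=Expt1
ctgA affy microarray_oligo 201 300 213 . . Name=Expt1
ctgA affy microarray_oligo 301 400 191 . . Name=Expt1
ctgA affy microarray_oligo 401 500 288 . . Name=Expt1
"""

def gff3_to_fixed_step(text, name, description):
    """Render evenly spaced GFF3 features as a fixedStep WIG block."""
    rows = [line.split(None, 8) for line in text.splitlines() if line.strip()]
    starts = [int(r[3]) for r in rows]
    spans = {int(r[4]) - int(r[3]) + 1 for r in rows}
    steps = {b - a for a, b in zip(starts, starts[1:])}
    if len(spans) != 1 or len(steps) != 1:
        raise ValueError("fixedStep needs evenly spaced, equal-width features")
    track = f'track type=wiggle_0 name="{name}" description="{description}"'
    header = (f"fixedStep chrom={rows[0][0]} start={starts[0]} "
              f"step={steps.pop()} span={spans.pop()}")
    return "\n".join([track, header] + [r[5] for r in rows])

wig = gff3_to_fixed_step(GFF3, "example", "20 degrees, 2 hr")
print(wig)
```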
We'll now create the binary file for the data using the wiggle2gff3.pl script.
We want it to live in the volvox database directory, so we have to specify this
path when creating it:
     % wiggle2gff3.pl --path=/var/www/html/gbrowse/databases/volvox \
                      volvox_microarray.wig > volvox_microarray.gff3
After this script runs, it will write out a line of GFF3 data, which we save to
volvox_microarray.gff3. This file will look like this:
     ##gff-version 3

     ctgA . microarray_oligo 1 50000 . . . Name=example;Note=20%20degrees%2C%202%20hr;wigfile=/var/www/html/gbrowse/databases/volvox/track001.ctgA.1200440492.wig
This file contains a single feature that spans the region indicated by the WIG
file. The feature has the indicated name and description, and has a new
attribute "wigfile" that points to the place where the quantitative data within
the region can be found. You are free to edit this file to change the source or
type. You can also set the source and type in wiggle2gff3.pl by passing it
--source and --type options on the command line. If you move the binary wiggle
file, please change the value of the "wigfile" attribute to indicate its new
location.
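Notice that the description in the Note attribute is URL-escaped, because
characters such as spaces and commas have to be encoded inside GFF3 attribute
values. The following Python sketch decodes column 9 of the generated line so
you can see the name, description and wigfile path in readable form. It is a
simplified attribute parser for illustration, not the one BioPerl uses.

```python
from urllib.parse import unquote

COLUMN9 = ("Name=example;Note=20%20degrees%2C%202%20hr;"
           "wigfile=/var/www/html/gbrowse/databases/volvox/"
           "track001.ctgA.1200440492.wig")

def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of decoded values."""
    attrs = {}
    for pair in column9.split(";"):
        key, _, value = pair.partition("=")
        attrs[key] = unquote(value)      # %20 -> space, %2C -> comma
    return attrs

attrs = parse_attributes(COLUMN9)
for key, value in attrs.items():
    print(f"{key:8} {value}")
```

The decoded Note reads "20 degrees, 2 hr", the description given in the track
line of the WIG file.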
You can now move this GFF3 file into /var/www/html/gbrowse/databases/volvox. At this
time, please also delete the volvox6.gff3 file to avoid seeing the same data
twice.
One last step is needed to make the data display properly, however. You must
set the glyph type to either "wiggle_xyplot" or "wiggle_density." These are the
only glyphs that recognize and properly format wiggle-style data. You can also
remove the min and max options, since the wiggle binary files store this
information internally and it is no longer needed.
In the config file, change the [TransChip] stanza to look like this:
     [TransChip]
     feature        = microarray_oligo
     glyph          = wiggle_xyplot
     graph_type     = boxes
     height         = 50
     scale          = right
     category       = Genes
     description    = 1
     key            = Transcriptional Profile
When you reload the page, the quantitative data should display correctly. You
might notice a speed improvement; this becomes much more noticeable on large
data sets.
Now, for some fun, change the [TransChip] section to use the "wiggle_density"
glyph. Also set the bgcolor to "blue" and delete the unneeded graph_type and
scale options.
     [TransChip]
     feature        = microarray_oligo
     glyph          = wiggle_density
     height         = 30
     bgcolor        = blue
     category       = Genes
     description    = 1
     key            = Transcriptional Profile
This is what the modified track will look like:
     [figures/wiggle_density.png]
     Figure 13: A transcriptional profile rendered with the wiggle_density
     glyph
**** 2.7. DNA and 3-frame translations ****
GBrowse can take advantage of DNA sequence data in several ways:
   1. It can display a GC content graph of the reference sequence at low
      magnifications and the DNA sequence itself at higher magnifications.
   2. It can display three and six-frame translations of the reference sequence
      DNA.
   3. It can display the protein translation of coding regions.
   4. It can display aligned nucleotide sequences, creating a poor man's
      multiple alignment.
So far we've been working with feature coordinates, but no actual DNA sequence
has
been loaded into the volvox database. We will again rebuild the database, this
time loading in a simulated DNA file in fasta format. Download the file
volvox.fa, and copy it into the volvox database directory. At this point in the
tutorial, when you do a directory listing of the volvox database directory
(with "ls" on unix systems, or "dir/w" on Windows systems) it should look like
this:
     % ls /var/www/html/gbrowse/databases/volvox/
     track001.ctgA.1202327456.wig  volvox2.gff3   volvox4.gff3  volvox.fa
     volvox1.gff3                  volvox3.gff3   volvox5.gff3
     volvox2b.gff3                 volvox4b.gff3  volvox7.gff3
     volvox_microarray.gff3
If you haven't done so already, please be sure that you have made the database
directory writeable by the web server user, either by making it world writeable
(as described at the beginning of this tutorial), or by changing the
directory's group ownership to match the Apache web server's group account (it
varies from system to system, but "nobody", "www", "apache" and "www-data" are
the most common possibilities). This is all you need to do to load the DNA. To
see that the DNA is indeed being loaded, add two new stanzas to the volvox.conf
configuration file:
     [DNA]
     glyph          = dna
     global feature = 1
     height         = 40
     do_gc          = 1
     gc_window      = auto
     fgcolor        = red
     axis_color     = blue
     strand         = both
     key            = DNA/GC Content

     [Translation]
     glyph          = translation
     global feature = 1
     height         = 40
     fgcolor        = purple
     start_codons   = 0
     stop_codons    = 1
     translation    = 6frame
     key            = 6-frame translation
The "DNA" track uses a specialized glyph called "dna". At low magnifications
(zoomed way out), this glyph draws a GC content plot. At high magnifications
(zoomed way in), this glyph draws the dna. Of the various options given in the
example stanza, the most important one is "global feature", which is set to a
true value (1). This tells GBrowse that the stanza doesn't correspond to a
specific feature type, but should be displayed globally. Other options control
whether to draw one or both strands, whether to draw the GC content histogram,
the window size to use when smoothing the histogram, and what colors to use.
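To make the idea of a windowed GC content plot concrete, here is a minimal sketch. It is written in Python rather than the Perl that GBrowse itself uses, and it is not the dna glyph's actual code: the function name gc_content_windows and the use of consecutive, non-overlapping windows are assumptions made purely for illustration.

```python
def gc_content_windows(sequence, window):
    """Return the GC fraction of each consecutive window of the sequence.

    Hypothetical helper: the real dna glyph picks its own window size when
    gc_window is set to "auto" and may smooth the plot differently.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    fractions = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        # The final window may be shorter than the others.
        fractions.append(gc / len(chunk))
    return fractions
```

A larger window gives a smoother but less detailed curve, which is the trade-off that the gc_window option controls.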
Similarly, the "Translation" track uses a glyph called "translation", which
draws three or six-frame conceptual translations. At low magnifications (zoomed
way out), this glyph draws little symbols indicating where start and stop
codons are. At high magnifications, the actual amino acid sequence comes into
view. Again, the most important option is "global feature", which is set to a
true value to tell GBrowse that the track isn't attached to a particular
feature type, but is to be generated automatically. Other options control the
height of the glyph, whether to draw start and/or stop codon symbols, and
whether to generate a 3frame or 6frame translation.
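If you are curious what a six-frame conceptual translation involves, the following Python sketch shows the general technique. It is an illustration only, not the translation glyph's implementation: the function names are hypothetical, and it assumes the standard genetic code with "*" standing for a stop codon and "X" for any codon containing an ambiguous base.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, offset):
    """Translate seq starting at offset 0, 1 or 2; unknown codons become X."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    reverse = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq, offset)
        frames[f"-{offset + 1}"] = translate(reverse, offset)
    return frames
```

A 3-frame translation is simply the three "+" frames; the "-" frames are obtained by translating the reverse complement.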
Figures 14a and 14b show the browser at low and high magnification, with both
tracks activated. Notice that the coding track ("cds" glyph) detects that the
DNA is available and generates the transcripts' protein translations
automatically!
     (14A)
     [figures/dna1.png]

     (14B)
     [figures/dna2.png]
     Figure 14: Viewing DNA/GC content and 6-frame translation. (a) low
     magnification; (b) high magnification
     If you happen to do a listing of the volvox database directory after
     adding the DNA file, you might notice that a new file named
     "directory.index" has appeared. This index file is created
     automatically by GBrowse in order to speed up access to the .fa file
     and to reduce memory requirements. If the database directory is not
     writable by the web server user, GBrowse will not be able to create
     this file, and the display will be somewhat slower whenever a DNA
     track is turned on.
**** 2.8. ESTs and Other Alignments ****
This section will lead you through creating a plausible EST track, and show you
how grouping of 5' and 3' EST reads works.
We'll start with a simple data set containing information on three pairs of EST
reads. You'll find this data set in volvox7.gff3. Here are the first few
alignments described in the data file:
     ctgA est EST_match 1050 1500 . + . ID=Match1;Name=agt830.5
     ctgA est EST_match 3000 3202 . + . ID=Match1;Name=agt830.5

     ctgA est EST_match 5410 5500 . - . ID=Match2;Name=agt830.3
     ctgA est EST_match 7000 7503 . - . ID=Match2;Name=agt830.3

     ctgA est EST_match 1050 1500 . + . ID=Match3;Name=agt221.5
     ctgA est EST_match 5000 5500 . + . ID=Match3;Name=agt221.5
     ctgA est EST_match 7000 7300 . + . ID=Match3;Name=agt221.5
     ...
What's going on here is the same as with the alignments shown in volvox3.gff3.
The first two alignments belong to a pair of EST reads named agt830.5 (the 5'
read) and agt830.3 (the 3' read). Each
of them matches the ctgA genome in two discontinuous regions because,
presumably, they cross a splice site. As in the earlier example, we represent
each EST as a single "EST_match" feature that spans several lines. The lines
are linked together by sharing the same ID attribute.
There are two other things to notice. One is that the source field (column 2)
is "est" and the type (column 3) is "EST_match." Either of these fields can be
used to distinguish the EST matches in this file from the generic "match"
matches used in the earlier example. The second item of interest is that the
strand field (column 7) is + for the 5' EST and - for the 3' EST, indicating
that the 3' EST aligned to the reverse complement of ctgA.
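The way that lines sharing an ID are assembled into one multi-segment feature can be sketched as follows. This is a Python illustration, not the BioPerl loader that GBrowse actually uses; the function names are hypothetical, and the sketch assumes the nine columns are separated by whitespace as they appear in this tutorial (real GFF3 files use tabs).

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split an attribute column such as 'ID=Match1;Name=agt830.5' into a dict."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    return attributes


def group_segments(gff_lines):
    """Map each feature ID to its name, strand and list of (start, end) segments."""
    features = defaultdict(lambda: {"name": None, "strand": None, "segments": []})
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#") or line == "...":
            continue
        # Split into at most nine columns so the attribute column stays intact.
        cols = line.split(None, 8)
        attrs = parse_attributes(cols[8])
        feature = features[attrs["ID"]]
        feature["name"] = attrs.get("Name")
        feature["strand"] = cols[6]
        feature["segments"].append((int(cols[3]), int(cols[4])))
    return dict(features)
```

Applied to the lines shown above, Match1 ends up with two segments on the plus strand and Match2 with two segments on the minus strand.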
Add this file to the volvox database directory, and add the following to the
configuration file:
     [EST]
     feature      = EST_match:est
     height       = 6
     glyph        = segments
     bgcolor      = orange
     key          = ESTs
This will give a display similar to that shown in Figure 15.
     [figures/multiple_alignments1.gif]
     Figure 15: A simple representation of EST matches.
For reasons described earlier, the feature option reads "EST_match:est" rather
than simply "match" in order to distinguish the EST matches from the example
matches that we loaded previously.
This display is OK, but it could be better. One problem is that the
relationship between the 5' and 3' EST read pairs is not shown. We'd like to
place the two members of the pair together on the same line, and connect them
with a dotted line to show that they are the two ends of the same cDNA clone.
An easy way to do this is to add a "group_pattern" option to the [EST] stanza:
     [EST]
     feature       = EST_match:est
     glyph         = segments
     height        = 6
     bgcolor       = orange
     group_pattern = /\.[53]$/
     key           = ESTs
The new group_pattern option tells GBrowse to use a Perl regular expression
pattern matching operation to find and group related EST matches based on their
names. It helps to understand how Perl regular expressions work, but basically
the pattern match breaks down this way:
       /            begin the pattern match
       \.           match a dot
       [53]         match either the digit 5 or the digit 3
       $            match the end of the string
       /            end the pattern match
What this is saying is to look for pairs of EST names that are similar except
for the terminal .5 or .3, and pair them. When we reload the page, we get
Figure 16.
     [figures/multiple_alignments2.gif]
     Figure 16: The group_pattern option allows EST pairs to be grouped
Here are regular expressions that will work for other common EST pairing
schemes:
     5' EST       3' EST       group_pattern
     ----------   ----------   --------------
     agt123f      agt123r      /[fr]$/
     agt123p      agt123q      /[pq]$/
     f.agt123     r.agt123     /^[fr]\./
     5.agt123     3.agt123     /^[53]\./
     agt123.for   agt123.rev   /\.(for|rev)$/
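If you want to try out a pattern before putting it into the configuration file, the pairing logic can be approximated outside GBrowse. The sketch below is in Python rather than Perl and rests on an assumption: that two reads belong together when their names are identical once the text matched by the pattern has been removed. The function name pair_reads is hypothetical.

```python
import re
from collections import defaultdict


def pair_reads(names, group_pattern):
    """Group read names that become identical once the pattern match is removed.

    group_pattern is the regular expression without the surrounding slashes,
    for example r"\.[53]$" for the /\.[53]$/ pattern used in this section.
    """
    pattern = re.compile(group_pattern)
    groups = defaultdict(list)
    for name in names:
        clone = pattern.sub("", name, count=1)
        groups[clone].append(name)
    return dict(groups)
```

A read whose name does not match the pattern simply ends up in a group of its own, which is a quick way to spot names that the pattern fails to cover.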
Another nice enhancement would be to give the 5' and 3' ESTs different colors
so as to distinguish one from another. This can be accomplished using a Perl
callback. Open up volvox.conf once more, and find the bgcolor option in the
[EST] track. Replace it with this (you may want to cut and paste from here in
order to avoid introducing any typos):
     bgcolor      = sub {
     		my $feature = shift;
     		my $name    = $feature->display_name;
     		if ($name =~ /\.5$/) {
     		   return 'red';
     		} else {
     		   return 'orange';
     		}
     	}
You'll need to know the basics of the Perl programming language in order to do
this type of thing yourself. Suffice it to say that instead of hard-coding the
color "orange" into the bgcolor option, we are asking GBrowse to run a Perl
subroutine each time it needs to render an EST. The subroutine is passed the
feature that is about to be drawn. It asks the feature for its human-readable
name (display_name) and assigns that name to a variable named $name. It then
performs a pattern match on the name to see if it ends in ".5". If the name
matches, the subroutine returns the color "red" to GBrowse. Otherwise it
returns the color "orange."
The effect is shown in Figure 17.
     [figures/multiple_alignments3.gif]
     Figure 17: Using a callback to distinguish 5' and 3' ESTs
*** 2.8.1. Adding DNA to Alignments ***
The last thing we'll do with the EST data set is to add DNA to the ESTs so that
at high magnification GBrowse will show the multiple alignment. This
information is also used by the "dump alignments" plugin to generate a text-
based multiple alignment.
     NOTE: Currently only nucleotide to nucleotide alignments can be
     displayed at the level of individual nucleotides (e.g. BLASTN, BLAT,
     Exonerate). Protein to nucleotide alignments, such as those produced
     by Genewise or BLASTX, are not supported at the residue level.
To make this work, we need to add two additional pieces of information to the
EST alignment data:
   1. The DNA sequences of the volvox ESTs.
   2. The alignment positions in EST coordinates.
In case the need for item (2) isn't immediately clear, consider this blow-up of
an alignment:
     ctgA      1050 gattgccattgaccttggccattggccaagctgaa 1084
                    |||||||||| ||||||| ||||||||||||||||
     agt830.5     1 gattgccattcaccttgggcattggccaagctgaa 35
What we currently have in the GFF file are the source genomic positions of the
alignments (in ctgA-relative coordinates). We need to add the target positions
in agt830.5-relative coordinates in order for GBrowse to fetch and display the
appropriate segments of the EST DNA.
The FASTA file ests.fa provides the DNA sequences for the six EST reads. The
GFF load file volvox8.gff3 contains the revised coordinates. If you look at this
file you'll see that it differs slightly from the previous load files:
     ctgA est EST_match 1050 1500 . + . ID=Match1;Name=agt830.5;Target=agt830.5 1 451
     ctgA est EST_match 3000 3202 . + . ID=Match1;Name=agt830.5;Target=agt830.5 452 654
     ctgA est EST_match 5410 5500 . - . ID=Match2;Name=agt830.3;Target=agt830.3 505 595
     ctgA est EST_match 7000 7503 . - . ID=Match2;Name=agt830.3;Target=agt830.3 1 504
     ctgA est EST_match 1050 1500 . + . ID=Match3;Name=agt221.5;Target=agt221.5 1 451
     ctgA est EST_match 5000 5500 . + . ID=Match3;Name=agt221.5;Target=agt221.5 452 952
     ctgA est EST_match 7000 7300 . + . ID=Match3;Name=agt221.5;Target=agt221.5 953 1253
     ...
The first eight columns are identical to what we've been using before, but the
ninth column follows a convention used for nucleotide to nucleotide and
protein to nucleotide alignments. There is now a special attribute, "Target",
that tells GBrowse the name of the EST sequence (found in a FASTA file), the
start position of the alignment in EST coordinates, and the end position of the
alignment in EST coordinates. For example, the first segment of the first
alignment, agt830.5, spans positions 1050 to 1500 in genome coordinates, and
positions 1 to 451 in EST sequence coordinates.
There is a subtlety here. Notice that for minus strand ESTs, the target
coordinates are not reversed; the start position is always less than the end
position. For example, for the first agt830.3 HSP, we are told that genomic
region 5410..5500 aligns to EST region 505..595. The strand field is used to
determine the direction of the alignment.
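Because each segment in this file is an ungapped match, the genomic span and the Target span of a line should have the same length, which makes for an easy sanity check when you prepare your own files. The Python sketch below is a hypothetical helper, not part of GBrowse or BioPerl; it assumes the "Target=name start end" layout shown above and ignores any optional fourth token.

```python
def parse_target(attribute_column):
    """Extract (name, start, end) from a 'Target=name start end' attribute."""
    for pair in attribute_column.split(";"):
        key, _, value = pair.strip().partition("=")
        if key == "Target":
            name, start, end = value.split()[:3]
            return name, int(start), int(end)
    return None


def spans_agree(genome_start, genome_end, attribute_column):
    """True when the genomic and target spans have the same length.

    Assumes an ungapped segment; alignments with indels would need a
    different check.
    """
    target = parse_target(attribute_column)
    if target is None:
        return False
    _, target_start, target_end = target
    return genome_end - genome_start == target_end - target_start
```

For the first agt830.3 segment, 5410..5500 and 505..595 both cover 91 bases, so the check passes regardless of the strand.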
Since this data file contains a revised version of volvox7.gff3, remove
volvox7.gff3 from the database directory and replace it with volvox8.gff3. Also
copy ests.fa into the database directory. If you perform a directory listing,
it should look like this:
     directory.index  volvox1.gff3   volvox2.gff3  volvox4b.gff3  volvox5.gff3  volvox8.gff3
     ests.fa          volvox2b.gff3  volvox3.gff3  volvox4.gff3   volvox6.gff3  volvox.fa
     NOTE: If you see doubled EST features after this point, make sure
     that you have removed volvox7.gff3. Another thing to watch out for is
     that a bug in the BioPerl layer (up through at least version 1.4)
     causes the EST DNA display to get messed up at this point on Windows
     systems. To fix the latter problem, go to the volvox database
     directory and remove the files directory.dir and directory.pag. These
     are automatically generated DNA file indexes that GBrowse builds, and
     they will be regenerated for you the next time you access a page.
We're not done with making configuration file changes, but volvox4.conf
contains all configuration file enhancements up to this point. If you like, you
can copy it over the live volvox.conf. It contains the following version of the
[EST] track:
     [EST]
     feature          = EST_match:est
     glyph            = segments
     height           = 6
     draw_target      = 1
     show_mismatch    = 1
     canonical_strand = 1
     bgcolor      = sub {
     		my $feature = shift;
     		my $name    = $feature->display_name;
     		if ($name =~ /\.5$/) {
     		   return 'red';
     		} else {
     		   return 'orange';
     		}
     	}
     group_pattern    = /\.[53]$/
     key              = ESTs
The key additions to this track configuration are the "draw_target",
"show_mismatch" and "canonical_strand" options. All three are true/false
flags, where 0 means false and 1 means true. draw_target tells the segments
glyph to draw the DNA sequence of the target ESTs when the magnification
allows. show_mismatch instructs the glyph to highlight mismatches between the
genome and the EST in pink. canonical_strand instructs the glyph to display the
plus strand sequence even when the EST matches the minus strand.
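The comparison behind mismatch highlighting is simple to sketch. The Python function below is an illustration of the idea only, not the segments glyph's code: its name is hypothetical, and it assumes two aligned strings of equal length with no gaps, such as the blow-up shown earlier in this section.

```python
def mismatch_positions(genome_seq, est_seq, genome_start):
    """Return the genomic coordinates at which two aligned strings differ.

    Both strings must cover the same ungapped stretch; genome_start is the
    genomic coordinate of their first base.
    """
    if len(genome_seq) != len(est_seq):
        raise ValueError("aligned segments must have the same length")
    return [
        genome_start + offset
        for offset, (g, e) in enumerate(zip(genome_seq.lower(), est_seq.lower()))
        if g != e
    ]
```

Feeding it the two sequences from the blow-up, with a genomic start of 1050, reports the two columns in which the alignment line shows a gap between the bars.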
To see this work, reload the page, turn on the EST track and search for region
"ctgA:1065..1165". This will show the aligned 5' ends of agt221.5, agt830.5 and
agt767.5 (Figure 18). Notice that one of the T's towards the beginning of
agt830.5 is highlighted to show that it doesn't match the corresponding genomic
base.
     [figures/adding_dna_to_alignments1.gif]
     Figure 18: Multiple alignments at the DNA level
If you don't see the EST sequence appearing, make sure that ests.fa is in the
volvox database directory and is world readable. If it still isn't working, you
may need to "touch" the file in order to update its modification date. This
tells GBrowse that it is new and needs to be reindexed. In Unix:
     % touch /var/www/html/gbrowse/databases/volvox/ests.fa
If you are still having problems, remove the directory.index file completely in
order to force reindexing.
**** 2.9. Trace Data ****
If you have sequence trace information (in SCF format) associated with the
reference sequence, this can be displayed in GBrowse using the trace glyph. To
use this glyph, you must have installed:
  The Staden io-lib package
      staden.sourceforge.net
  zlib
      www.zlib.net
  The Bio::SCF Perl module
      Available from CPAN
Note that at this time, it is not possible to use the trace glyph with Windows
servers, since we do not know of a version of the Staden io-lib package that
has been compiled for Windows.
The data file volvox9.gff3 contains an example trace entry.
     ctgA    example read   44401   45925   .       +       .       name trace; trace volvox_trace.scf
This aligns the full trace sequence to the reference sequence. The trace file
in this case is named "volvox_trace.scf", and it is located in
/var/www/html/gbrowse/databases/volvox_trace.scf.
Due to sequence quality, the first few bases of a trace file usually don't
align. Even so, these bases need to be included in the GFF file. For
instance, if bases 10-700 of the trace file align to bases 100-800 of
the reference sequence, the feature would be 90-800 to account for the first 10
bases (starting at base 0).
     NOTE: The trace glyph currently doesn't deal with insertions or
     deletions. If an indel occurs, the alignment after the indel will be
     off.
Copy this file into the volvox database directory. Then, to display the trace,
copy the following into the volvox.conf (or copy volvox5.conf over the current
volvox.conf file).
     [Traces]
     feature      = read
     glyph        = trace
     fgcolor      = black
     bgcolor      = orange
     strand_arrow = 1
     height       = 6
     description  = 1
     a_color      = green
     c_color      = blue
     g_color      = black
     t_color      = red
     show_border  = 1
     trace_height = 80
     trace_prefix = http://localhost/gbrowse/tutorial/data_files/
     key          = Traces
The fgcolor, bgcolor, strand_arrow and height options control the bar that shows the
location and directionality of the trace.
The trace_prefix option is important because it gives the path to the trace
files. This is prepended to the trace file name defined in the gff file. It can
be a direct path to the directory (eg "/usr/local/trace_files/") or a web
address (as above).
The a/c/g/t_color options allow configuration of the base colors. The
trace_height refers to the height of the trace itself. Play around with it to
find a height that you like.
If show_border is set to 1, a black box will be drawn around the trace.
After configuring the trace glyph, reload the browser page and enable traces.
Zoomed out you will see:
     [figures/trace1.png]
     Figure 19a: The trace glyph zoomed out.
Zooming in will show you the trace diagram:
     [figures/trace2.png]
     Figure 19b: The trace glyph zoomed in.
===============================================================================
***** 3. GBrowse Enhancements *****
In this section of the tutorial we'll discuss customizing the look and feel of
GBrowse by adding a regionview section, adding feature tracks to the regionview
and/or overview sections, configuring semantic zooming, and adding
functionality with plugins.
**** 3.1. Adding a "Region" Panel ****
The overview is the scale that appears at the top of the detailed image. It
always shows the entire reference sequence, whether it be a chromosome, a
contig or a clone. With larger genomes, you may want to supplement the overview
with a "region panel" that is intermediate in size between the overview panel
and the detail panel. The region panel can contain tracks of its own and is
useful for displaying features that are too numerous for the overview panel and
too large for the detail panel.
Open the volvox.conf configuration file and add the following line to the
[GENERAL] section. A good place is near the "max segment" and "default segment"
sections:
     # max and default segment sizes for detailed view
     max segment     = 50000
     default segment = 5000

     # size of the "region panel"
     region segment = 20000
Now when you reload the volvox page, you will see an intermediate panel labeled
"region", as shown in Figure 20:
     [figures/enhancements1.gif]
     Figure 20: The "region" panel shows a region intermediate in size
     between the overview and the detail panel.
You can declare region panel tracks in exactly the same way that you declare
overview tracks, by adding stanzas qualified by ":region":
     [TransChip:region]
     feature        = tprofile
     glyph          = xyplot
     graph_type     = boxes
     height         = 50
     min_score      = 0
     max_score      = 1000
     bgcolor        = blue
     scale          = right
     key            = Profile
Figure 21 shows what the region looks like with its "Profile" track turned on.
     [figures/enhancements2.gif]
     Figure 21: You can add any number of tracks to the region panel, just
     as you would for the overview panel.
**** 3.2. Putting Features into the Overview & Regionview ****
In many cases it is handy to add tracks directly to the overview and/or
regionview. These tracks can be turned on and off just like normal tracks, and
can serve as reference points for well-known genes, cytogenetic bands, or
genetic markers.
We will illustrate how to do this by placing a copy of the Motifs track into
the overview. Add the following to the bottom of the volvox.conf configuration
file:
     [Motifs:overview]
     feature      = polypeptide_domain
     glyph        = span
     height       = 5
     description  = 0
     label        = 1
     key          = Motifs
This stanza is identical to the [Motifs] track that we created earlier, except
that its name is qualified with ":overview". This tells GBrowse that this is
not an ordinary track to be placed in the detail image, but one that should be
placed in the overview.
We also want the overview motifs track to be displayed by default, so go to the
top of the configuration file, and modify the "default features" option to look
like this:
     # list of tracks to turn on by default
     default features = ExampleFeatures  Motifs:overview
Reload the page. Voilà! See Figure 22.
     [figures/overview1.gif]
     Figure 22: Any number of tracks can be placed in the overview
You can add as many tracks to the overview as you like. The main warning is
that if you add lots of features to the overview it can get pretty crowded in
there. Performance can also suffer, since each feature must be fetched and
rendered each time the overview is displayed.
To add a track to the region panel, simply replace ":overview" with ":region"
in the track stanza, as we did for the [TransChip:region] track in section 3.1:
     [Motifs:region]
     feature      = polypeptide_domain
     glyph        = span
     height       = 5
     description  = 0
     label        = 1
     key          = Motifs
**** 3.3. Semantic Zooming ****
One of the cooler features of GBrowse is its ability to support semantic
zooming. Semantic zooming is a feature in which objects show different levels
of detail depending on the level of magnification. We've already seen this
behavior in the "dna" and "segments" glyphs, which show the DNA sequence only
when there's sufficient room to display it.
GBrowse has several types of semantic zooming:
  glyph-based, automatic
      The dna and segments glyphs, and others that support semantic zooming out
      of the box. This happens automatically and can't be modified.
  semantic labeling
      When there's sufficient room, GBrowse will print the label and
      descriptions next to the glyphs. The threshold at which this happens is
      under your control.
  semantic bumping
      When there's sufficient room, GBrowse will "bump" features to prevent
      them from colliding on the screen. When this would cause the display to
      become too high, bumping is suppressed. This threshold is also under your
      control.
  semantic options
      You can set track configuration sections up so that when a preset size
      threshold is exceeded, one configuration replaces another.
The thresholds for labeling and bumping are set by configuration options named
"label density" and "bump density" respectively. The standard values can be
found in the defaults track named [TRACK DEFAULTS]. They are originally set so
that labels are suppressed when there are more than 25 features per track, and
bumping is suppressed when there are more than 100 features per track. You can
change these values globally by editing them in [TRACK DEFAULTS], or you can
add "label density" and/or "bump density" options to individual track
configuration sections in order to override the settings for specific tracks.
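The threshold logic just described amounts to two comparisons. Here is a small Python sketch of it; the function name display_flags is hypothetical and the code is an illustration of the rule rather than GBrowse's own implementation. The defaults of 25 and 100 are the original values mentioned above.

```python
def display_flags(feature_count, label_density=25, bump_density=100):
    """Decide whether a track with feature_count features is labeled and bumped.

    Labels are suppressed above label_density features and bumping is
    suppressed above bump_density features.
    """
    return {
        "label": feature_count <= label_density,
        "bump": feature_count <= bump_density,
    }
```

Passing different values for label_density or bump_density mimics what happens when you override the settings in an individual track stanza.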
The process of setting up semantic options is a bit more interesting. To
illustrate, we will create semantic zooming for the [Alignments] track
("Example Alignments"). We would like the track to shift from showing the
individual segments to showing solid rectangles when the user is zoomed out to
30K and beyond, and turn bumping off when the user is zoomed out to 45K and
beyond. The process is simple. Beneath the [Alignments] stanza, we add a stanza
qualified for zoomlevels of >= 30,000 and another stanza qualified for
zoomlevels of >= 45,000:
     [Alignments]
     feature      = match
     glyph        = segments
     key          = Example alignments

     [Alignments:30000]
     glyph        = box
     label        = 0

     [Alignments:45000]
     glyph        = box
     bump         = 0
     label        = 0
The format for semantic options is [Trackname:distance], where Trackname must
be the same as the non-qualified track, and distance is the length of the
region at which the semantic options will kick in. Only options that are
different from the non-qualified track need to be listed. According to the
configuration given above, when the user is looking at a region 30,000 bp or
longer, the glyph option will change to "box," which is a solid rectangle that
doesn't show any internal details. All other options, such as feature and key,
will be inherited from the [Alignments] track.
At 45,000 bp, the glyph is again set to box, and in addition the "bump" option
is set to zero, turning off collision control. Notice that options are
inherited from the unqualified track stanza, and not from the previous semantic
zoom level. If we had neglected to specify the glyph option in
[Alignments:45000], the glyph would have reverted to "segments."
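The inheritance rule is easy to get wrong, so here is a Python sketch of how the effective options for a region length can be worked out. It illustrates the behavior described above and is not GBrowse's code; the function name resolve_options and the dictionary layout are assumptions.

```python
def resolve_options(stanzas, track, region_length):
    """Return the effective options for a track at the given region length.

    stanzas maps a section name such as "Alignments" or "Alignments:30000"
    to a dict of options.
    """
    options = dict(stanzas[track])
    best_threshold, best_name = None, None
    for name in stanzas:
        base, _, distance = name.partition(":")
        if base != track or not distance.isdigit():
            continue
        threshold = int(distance)
        # Keep the largest threshold that the region length has reached.
        if threshold <= region_length and (
            best_threshold is None or threshold > best_threshold
        ):
            best_threshold, best_name = threshold, name
    if best_name is not None:
        # Overrides are layered on the unqualified stanza only, never on a
        # lower zoom level.
        options.update(stanzas[best_name])
    return options
```

With the three stanzas above, a 40,000 bp region picks up the options from [Alignments:30000], while a 50,000 bp region picks up only those from [Alignments:45000] on top of [Alignments].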
Make these changes to volvox.conf, turn on the "Example Alignments" track, and
view the contig at 20K, 40K and 50K. At 40K, you'll see the alignments lose
their internal structure and labels and be replaced by solid boxes (Figure 23).
At 50K, bumping is turned off and the boxes will begin to overlap.
     [figures/semantic_zooming1.gif]
     Figure 23: Semantically zoomed alignments at 40K
**** 3.4. Grouping Tracks ****
The bottom of the GBrowse window contains an expandable set of checkboxes that
allows the users to turn tracks on and off. By default, the tracks are grouped
into sections corresponding to tracks belonging to the overview panel, those
belonging to the region panel, tracks created by external (third-party)
annotations, and tracks created by plugins. All other tracks are grouped
together in a catch-all section named "General."
You can easily define new track groups to make navigation easier. To do so,
just add a "category" option to each of the track stanzas. This option defines
the name of the category. Tracks that belong to the same category will be
grouped together, regardless of the order in which the track definitions appear
in the configuration file. For example, we can place the [Motifs] and the
[Translation] tracks into a section named "Proteins" by modifying their stanzas
to look like this:
     [Motifs]
     feature      = polypeptide_domain
     glyph        = span
     height       = 5
     description  = 1
     category     = Proteins
     key          = Example motifs

     [Translation]
     glyph          = translation
     global feature = 1
     height         = 40
     fgcolor        = purple
     start_codons   = 0
     stop_codons    = 1
     category       = Proteins
     translation  = 6frame
     key          = 6-frame translation
In this way we can create sections named "Alignments," "Examples," "Genes" and
"Proteins" and assign the appropriate tracks to them. The Tracks control
section will look something like Figure 24:
     [figures/enhancements3.gif]
     Figure 24: Tracks grouped into named categories in the track list.
     The file volvox_final.conf contains the final configuration file with
     all the modifications we've made during the course of this tutorial.
     The data files volvox_all.gff3 and volvox_all.fa likewise contain the
     entirety of the feature and DNA data.
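The category grouping described in this section can be sketched in a few lines of Python. This is an illustration of the rule, not GBrowse's code; the function name group_by_category is hypothetical, and only the catch-all "General" section is modeled (the overview, region, external and plugin sections are left out).

```python
def group_by_category(tracks, default="General"):
    """Group track names by their "category" option.

    tracks maps a track name to its dict of options; tracks without a
    category fall into the default section.
    """
    groups = {}
    for name, options in tracks.items():
        groups.setdefault(options.get("category", default), []).append(name)
    return groups
```

The order of the stanzas affects only the order of the tracks within each section, not which section a track lands in.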
**** 3.5. Grouping Tracks into a Table ****
A further refinement is to display the tracks within a category as a table with
headings for the rows and columns (see Figure 25 for an example). This layout
is useful for data whose organization reflects the experimental design, as in
microarray or ChIP-on-chip experiments.
     [figures/categorytable.png]
     Figure 25: An example of a category table containing nine tracks,
     organized as 3 rows x 3 columns, each with a heading.
This was constructed by adding an option named "category tables" to the
[GENERAL] section. The first argument in this option refers to the category you
wish to add the table to, the second is a space separated list of column
headings, the third a space separated list of row headings.
     # category table configuration
     category tables = 'ArrayExpts' 'strain-A strain-B strain-C' 'temperature anaerobic aerobic'


It is then important that the stanzas within the category appear in
column-major order (see the example below and compare with
Figure 25). Stanza 1 is column 1/row 1, stanza 2 is column 1/row 2,
stanza 3 is column 1/row 3, stanza 4 is column 2/row 1, stanza 5 is
column 2/row 2, and so on. This means that each cell in the table must
have a stanza. Any surplus tracks within that category will be ignored;
for example, if there were a stanza 10, it would not be shown. Cells
that have no data can be disabled using the 'disabled = 1' option in
the stanza. To display the category table in Figure 25 you would use
the following configuration:

     [temp_strainA]
     category       = ArrayExpts
     feature        = temp_strainA_agg
     glyph          = xyplot
     bgcolor        = red
     neg_color      = green
     fgcolor        = black
     graph_type     = boxes
     height         = 80
     min_score      = -2.0
     max_score      = 2.0
     scale          = both
     key            = Temp strain A (1 expt)

     [anaerobic_strainA]
     category       = ArrayExpts
     feature        = anaerobic_strainA_agg
     glyph          = xyplot
     bgcolor        = red
     neg_color      = green
     fgcolor        = black
     graph_type     = boxes
     height         = 80
     min_score      = -2.0
     max_score      = 2.0
     scale          = both
     key            = Anaerobic Strain A (0 expt)
     disabled       = 1

     [aerobic_strainA]
     category       = ArrayExpts
     feature        = aerobic_strainA_agg
     glyph          = xyplot
     bgcolor        = red
     neg_color      = green
     fgcolor        = black
     graph_type     = boxes
     height         = 80
     min_score      = -2.0
     max_score      = 2.0
     scale          = both
     key            = Aerobic Strain A (0 expt)
     disabled       = 1


     [temp_strainB]
     category       = ArrayExpts
     feature        = temp_strainB_agg
     glyph          = xyplot
     bgcolor        = red
     neg_color      = green
     fgcolor        = black
     graph_type     = boxes
     height         = 80
     min_score      = -2.0
     max_score      = 2.0
     scale          = both
     key            = Temp strain B (2 expts)

     [anaerobic_strainB]
     category       = ArrayExpts
     feature        = anaerobic_strainB_agg
     glyph          = xyplot
     bgcolor        = red
     neg_color      = green
     fgcolor        = black
     graph_type     = boxes
     height         = 80
     min_score      = -2.0
     max_score      = 2.0
     scale          = both
     key            = Anaerobic Strain B (0 expt)
     disabled       = 1

     [aerobic_strainB]
     category       = ArrayExpts
     feature        = aerobic_strainB_agg
     glyph          = xyplot
     bgcolor        = red
     neg_color      = green
     fgcolor        = black
     graph_type     = boxes
     height         = 80
     min_score      = -2.0
     max_score      = 2.0
     scale          = both
     title          = blah
     key            = Aerobic strain B (3 expts)

     [temp_strainC]
     category       = ArrayExpts
     feature        = temp_strainC_agg
     glyph          = xyplot
     bgcolor        = red
     neg_color      = green
     fgcolor        = black
     graph_type     = boxes
     height         = 80
     min_score      = -2.0
     max_score      = 2.0
     scale          = both
     key            = Temp strain C (1 expt)

     [anaerobic_strainC]
     category       = ArrayExpts
     feature        = anaerobic_strainC_agg
     glyph          = xyplot
     bgcolor        = red
     neg_color      = green
     fgcolor        = black
     graph_type     = boxes
     height         = 80
     min_score      = -2.0
     max_score      = 2.0
     scale          = both
     key            = Anaerobic strain C (3 expts)

     [aerobic_strainC]
     category       = ArrayExpts
     feature        = aerobic_strainC_agg
     glyph          = xyplot
     bgcolor        = red
     neg_color      = green
     fgcolor        = black
     graph_type     = boxes
     height         = 80
     min_score      = -2.0
     max_score      = 2.0
     scale          = both
     key            = Aerobic strain C 3 (2 expts)
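The column-followed-by-row ordering can be checked with a short Python sketch that maps each stanza to its cell. It illustrates the rule stated above rather than GBrowse's own layout code, and the function name table_layout is hypothetical.

```python
def table_layout(stanzas, columns, rows):
    """Place stanzas into a {row: {column: stanza}} table, column by column.

    Stanzas beyond len(columns) * len(rows) are ignored, as described in
    the text.
    """
    table = {row: {} for row in rows}
    capacity = len(columns) * len(rows)
    for index, stanza in enumerate(stanzas[:capacity]):
        column = columns[index // len(rows)]
        row = rows[index % len(rows)]
        table[row][column] = stanza
    return table
```

For the nine stanzas above, the sixth stanza ([aerobic_strainB]) lands in the "strain-B" column and the "aerobic" row.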


If you need to have multiple category tables, simply use continuation
lines for the "category tables" option:

     # category table configuration
     category tables = 'ArrayExpts' 'strain-A strain-B strain-C' 'temperature anaerobic aerobic'
                       'CHiP-Chip'  'TFX1 ONE-CUT PHA4' '16-cell-stage 320-cell-stage adult'


**** 3.6. Using Plugins ****


 Another cool GBrowse feature is its ability to take advantage of
plugins, which are small modules of Perl code that extend GBrowse in
various ways. In this section, we will show how to activate two
popular plugins, RestrictionAnnotator and Aligner.  The
first generates a track of restriction sites.  The second dumps a
text-based multiple alignment of the current region on view.


To see these plugins at work, first make sure that the database files
are up to date with this position in the tutorial.  If you are in any
doubt, remove the current contents of the volvox database directory
and replace them with the files volvox_all.gff3 and volvox_all.fa.


Now find the option "plugins=" at the top of volvox.conf, and modify
it to activate the Aligner and RestrictionAnnotator plugins:

     plugins = Aligner RestrictionAnnotator
When you reload the page, you will see a new popup menu appear under the image
labeled "Dumps, searches and other operations." You will also see an automatic
track labeled "plugin:Restriction Sites" appear in the track list. When you
turn on this track, you will be presented with a restriction map (Figure 26).
You can then adjust which restriction sites are shown by selecting "Annotate
Restriction Sites" from the popup menu and pressing the "Configure" button.
     [figures/plugins1.gif]
     Figure 26: The RestrictionAnnotator Plugin
To see the Aligner at work, center your view on a region that contains the EST
alignments (for example, ctgA:1000..5000), select "Dump Alignments" from the
plugin popup menu, and press "Go". This will return a text-based multiple
alignment of the genome and the EST tracks.
The Aligner plugin has some additional configuration that you can perform.
We'll look at this now as an example of how to configure plugins. Open up
volvox.conf and add the following configuration section:
     ########################
     # Plugin configuration
     ########################

     [Aligner:plugin]
     alignable_tracks   = EST
     upcase_tracks      = CDS Motifs
     upcase_default     = CDS
It doesn't matter where the section goes, but it is probably a good idea to
place this towards the middle of the file after the [GENERAL] section (at the
top) and before the [TRACK DEFAULTS] section. Otherwise it is easy for you or
someone else maintaining the configuration file to mistake this for some sort
of track configuration.
Plugin configuration sections are distinguished from track configuration by
having names of the format PluginName:plugin. In this case, the three
configuration options are applied to the Aligner plugin. For the Aligner
plugin, the configuration options are:
 _____________________________________________________________________________
|Option__________|Description_________________________________________________|
|                |Space-delimited list of tracks to include in the multiple   |
|alignable_tracks|alignment. The genome is always included. If this option is |
|                |not present, then GBrowse will automatically include any    |
|________________|track_that_has_the_"draw_target"_option_set.________________|
|                |Space-delimited list of tracks that will be used to UPCASE  |
|                |the genomic DNA. This is very useful if you want to embed   |
|                |the positions of coding regions or other features inside the|
|upcase_tracks   |multiple alignment. Uppercasing will not be turned on by    |
|                |default. The user must press the "Configure" button, and    |
|                |select which of the uppercase tracks are to be activated    |
|________________|from_a_list_of_checkboxes.__________________________________|
|upcase_default  |A space-delimited list of tracks that will be uppercased by |
|________________|default_unless_the_user_turns_them_off_during_configuration.|
|                |A small integer indicating that the aligner should include  |
|ragged_default  |some unaligned bases from the end of each sequence. This is |
|                |useful for seeing the sequencing primer or cloning site in  |
|________________|ESTs._______________________________________________________|
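To make the naming convention concrete, here is a minimal Python sketch (not
part of GBrowse, which is written in Perl) that picks out sections named
PluginName:plugin from configuration text and splits their space-delimited
option values. The function name and the [ExampleTrack] stanza are made up for
illustration, and the parsing is deliberately simplified: it ignores
continuation lines.

```python
def parse_plugin_sections(conf_text):
    """Collect options from sections named 'PluginName:plugin'.

    Returns {plugin_name: {option: [space-delimited values]}}.
    """
    plugins = {}
    current = None  # option dict of the plugin section being read, if any
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            if name.endswith(":plugin"):
                current = plugins.setdefault(name[: -len(":plugin")], {})
            else:
                current = None  # a track or other section: ignore its options
            continue
        if current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.split()
    return plugins


SAMPLE = """
[Aligner:plugin]
alignable_tracks   = EST
upcase_tracks      = CDS Motifs
upcase_default     = CDS

# made-up track stanza, only here to show that it is ignored
[ExampleTrack]
glyph = generic
"""

plugins = parse_plugin_sections(SAMPLE)
print(plugins["Aligner"]["upcase_tracks"])
```

Only the [Aligner:plugin] section ends up in the result; the track stanza is
skipped because its name does not end in ":plugin".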
With the changes in place, select the aligner from the popup menu and press
Configure. Turn on uppercasing of the coding region track and see how it
affects the display (Figure 27).
     [figures/plugins2.gif]
     Figure 27: The Aligner plugin produces multiple alignments.
Plugin files live in /etc/httpd/conf/gbrowse.conf/plugins. To view a plugin's
documentation, change to that directory and run the perldoc command with the -F
("file") option on the plugin file:
     % perldoc -F Aligner.pm
Here's the list of plugins that come with the standard distribution:
 _____________________________________________________________________________
|Plugin______________|Description_____________________________________________|
|Aligner_____________|Dump_multiple_alignments________________________________|
|AlignTwoSequences   |Execute NCBI's bl2seq on the current view (requires the |
|____________________|bl2seq_executable)._____________________________________|
|AttributeHiliter    |Highlight (by colorizing) features whose attributes     |
|____________________|match_some_user-specified_values._______________________|
|                    |Allows the user to cut and paste a series of landmarks  |
|BatchDumper         |on the genome and dumps out all overlapping features    |
|____________________|using_a_variety_of_formats_(e.g._GenBank_format)________|
|Blat                |Plugin to align sequences against the genome using the  |
|____________________|BLAT_algorithm_(requires_BLAT_executable).______________|
|CMapDumper          |Produces files that can be read by the CMap_comparative |
|____________________|map_browser.____________________________________________|
|CreateBlastDB       |Creates a Blast-formatted database from a GBrowse       |
|____________________|database._______________________________________________|
|                    |Produce pretty-printed FASTA dumps of the current       |
|FastaDumper         |region, with selected features highlighted with colors  |
|____________________|or_font_styles._________________________________________|
|                    |Small demonstration of how to write a plugin that       |
|FilterTest          |filters features (makes them visible or invisible) based|
|____________________|on_arbitrary_criteria.__________________________________|
|GeneFinder          |Runs Phil Green's genefinder gene prediction program    |
|____________________|within_GBrowse_(requires_genefinder_executable).________|
|GFFDumper           |Dump out the current region in GFF format (redundant    |
|____________________|with_BatchDumper).______________________________________|
|OligoFinder         |Lets the user search for landmarks on the basis of      |
|____________________|unique_11-mers_or_greater.______________________________|
|PrimerDesigner      |Interactively design PCR primers (requires primer3      |
|____________________|executable).____________________________________________|
|ProteinDumper       |Dump translated protein sequences of the current region |
|____________________|in_various_formats______________________________________|
|                    |Small demonstration of how to connect a plugin to a gene|
|RandomGene          |prediction program. Doesn't actually predict genes, but |
|____________________|generates_simulated_ones._______________________________|
|RestrictionAnnotator|Creates_restriction_maps._______________________________|
|                    |Generate DNA spectrograms to highlight low complexity   |
|Spectrogram         |regions, repetitive regions, coding regions and other   |
|                    |regions with periodicity. Requires Math::FFT Perl       |
|____________________|module._________________________________________________|
|Submitter           |Helper plugin for the rubber-band select menu. See      |
|____________________|GBrowse_Rubber-band_selection.__________________________|
|test                |This dumps the current view in FASTA format, and is used|
|____________________|for_regression_testing_the_plugin_architecture._________|
===============================================================================
**** 4.1. Uploading an Annotation File ****
First, we'll look at how to upload private tracks to the browser. This method
is intended for users who wish to view their own data in the context of the
genome, and don't want to share the track with others.
Instead of using the artificial volvox data, we will now use some real genome
annotations from the C. elegans genome project. This is a region around C.
elegans cosmid C01F4. The core data that we'll be using is contained in the
files elegans_core.gff3 and elegans.fa.
Refer back to the beginning of the tutorial now and create a GBrowse database
directory named "elegans_core". Then copy elegans_core.gff3 and elegans.fa
into it. The configuration file to use is elegans_core.conf. Place it in
/etc/httpd/conf/gbrowse.conf/.
Confirm that you can browse the database. Figure 28 is a picture of the entire
data set with all core tracks turned on.
     [figures/third_party1.gif]
     Figure 28: The core C. elegans dataset.
We will now add some third-party annotations to the display. These are
contained in the files "elegans_acceptor.gff3", "elegans_expression.gff3",
"elegans_sts.gff3", "elegans_deletion.gff3", and "elegans_repeats.gff3":
 _____________________________________________________________________________
|elegans_acceptor.gff3  |Annotations of C. elegans spliced leader acceptor    |
|_______________________|sites._______________________________________________|
|elegans_expression.gff3|Positions assayed for gene expression level in C.    |
|_______________________|elegans_microarrays._________________________________|
|elegans_sts.gff3       |Primer pairs available for the region produced by the|
|_______________________|C._elegans_ORFeome_project.__________________________|
|elegans_deletion.gff3  |Deletion endpoints from a targeted gene knockout     |
|_______________________|project._____________________________________________|
|elegans_repeats.gff3   |Complex repetitive elements found using the          |
|_______________________|RepeatMasker_program.________________________________|
We can load each of these files to private storage located on the server using
the file upload feature. Copy these five files to your home directory where you
can find them easily. Go to the section marked Upload your own annotations and
choose the "Browse..." button. Select one of the annotation files, and then
press the "Upload" button to upload the file to the server. The annotations
contained in the file should now appear on the display. If you now do this for
all five of the annotation files, you will eventually get a display like that
shown in Figure 29.
     [figures/third_party2.gif]
     Figure 29: After uploading the five annotation files.
     NOTE: This upload function works even if the gbrowse you are
     uploading to is located on a remote server. The uploaded files are
     stored in a private directory on the server away from the main data
     set. Other users cannot see your data.
Although this display is functional, there is no difference between the
appearance of each of the tracks. Fortunately, we can customize the uploaded
files quite easily. Let us change the "elegans_sts.gff3" file so that the
primer pairs use the "primers" glyph. We can either do this by deleting the
uploaded file, making the appropriate modification to our local version and
then reuploading it, or by editing the file in place. We'll take the latter
course.
Scroll to the bottom of the browser window, find the uploaded file named
"elegans_sts.gff3", and choose "Edit File...".
     [figures/third_party3.gif]
     Figure 30: The uploaded files can be edited in place by clicking the
     "Edit File..." button.
This will take you to a simple text editor window. At the top of the window,
add the following configuration stanza:
     # edited elegans_sts.gff3 file
     [Orfeome Primers]
     feature = reagent
     glyph   = primers
     height  = 6
     key     = ORFeome project primer pairs
When you are done, press "Submit Changes..." and the display will be updated to
show the track with a more readable track name and the primers glyph.
If you like, you can customize each of the files. Here is a suggested set of
customizations:
     # for the file elegans_repeats.gff3
     [Repeats]
     feature = repeat
     bgcolor = white
     key     = Complex repeats

     # for the file elegans_acceptor.gff3
     [Acceptors]
     feature = trans-splice_acceptor_site
     glyph   = diamond
     bgcolor = red
     key     = Trans-splice Acceptors

     # for the file elegans_deletion.gff3
     [Deletions]
     feature = deletion
     glyph   = span
     key     = Gene knockouts

     # for the file elegans_expression.gff3
     [Expression]
     feature = microarray_oligo
     bgcolor = orange
     height  = 4
     key     = Microarray expression probe
With this combination of configurations, the display will now look as shown in
Figure 31:
     [figures/third_party4.gif]
     Figure 31: After customizing the annotation files.
     NOTE: Be aware of an important difference between the track
     configuration of the uploaded files and of the main GBrowse
     configuration files. In GBrowse, the [STANZA] heading is the symbolic
     name of the track, and particular feature types are
     added to the track using the feature= option. In uploaded files, the
     [STANZA] heading is the feature type itself. This means that each
     track can only contain one feature type. However, any uploaded GFF
     file can contain multiple feature types, and each feature type can
     have its own configuration stanza.
     The other important difference between the uploaded file
     configuration and the GBrowse main configuration is that for security
     reasons Perl subroutines are not allowed in the configuration
     sections of uploaded files. However links and link patterns are
     allowed.
There is no particular reason that the annotation sets were broken into
separate files. We could easily combine them into a single GFF file just as you
do for the core annotations.
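As a rough illustration of combining annotation files, the following Python
sketch merges several GFF3 texts into one, keeping each distinct "#" header
line only once. The function name is hypothetical, the RepeatMasker feature
line is made up for the example, and moving all header lines to the top is a
simplifying assumption that may not suit files relying on directives placed
between features.

```python
def combine_gff3(texts):
    """Merge GFF3 documents: distinct '#' lines first, then all feature lines."""
    headers = []   # distinct header/comment lines, in first-seen order
    features = []  # every feature line, in input order
    for text in texts:
        for line in text.splitlines():
            if not line.strip():
                continue  # drop blank lines
            if line.startswith("#"):
                if line not in headers:
                    headers.append(line)  # keep redundant headers only once
            else:
                features.append(line)
    return "\n".join(headers + features) + "\n"


# One real line from elegans_sts.gff3 and one made-up repeat feature.
STS_GFF3 = (
    "##gff-version 3\n"
    "C01F4\tOrfeome_project\treagent\t3319\t17668\t.\t+\t.\tName=mv_ZK783.1;amplified=0\n"
)
REPEATS_GFF3 = (
    "##gff-version 3\n"
    "C01F4\tRepeatMasker\trepeat\t100\t200\t.\t+\t.\tName=example_repeat\n"
)

combined = combine_gff3([STS_GFF3, REPEATS_GFF3])
print(combined, end="")
```

In practice you would read each file's contents from disk and write the result
to a new file before uploading it.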
**** 4.2. Sharing an Annotation File ****
Once you have an uploaded annotation file set up the way you like it, you might
want to share it with others. You can do this easily if you have access to an
anonymous FTP or web server (if you are reading this tutorial, it is fair to
assume that you do!)
To watch this in action, we will place one of the annotation files onto the
local web server and then load it from within the local GBrowse. This contrived
example doesn't make much sense until you realize that the same trick will work
when the GBrowse server and the web-accessible annotation file are on
separate machines halfway across the world.
We will demonstrate using the elegans_sts.gff3 file. Please use a version that
has been edited to place the [Orfeome Primers] configuration stanza at the top. Then
copy this file to the directory "/var/www/html". This will place it at the top of the
Web server document tree, but outside the location of GBrowse databases. Check
that the file is correctly installed on your web server by fetching this URL:
http://localhost/elegans_sts.gff3. If the file is correctly installed on the
Web server, you will see this:
     [Orfeome Primers]
     feature = reagent
     glyph   = primers
     height  = 6
     key     = ORFeome project primer pairs

     ##gff-version 3
     ##date Tue Feb 24 06:39:41 2004
     ##source gbrowse GFFDumper plugin
     ##NOTE: Selected features dumped.
     C01F4	Orfeome_project	reagent	3319	17668	.	+	.	Name=mv_ZK783.1;amplified=0
     C01F4	Orfeome_project	reagent	18584	20445	.	-	.	Name=mv_G_YK5686;amplified=1
     C01F4	Orfeome_project	reagent	24509	25425	.	-	.	Name=mv_ZK783.3;amplified=1
     C01F4	Orfeome_project	reagent	26525	33359	.	-	.	Name=mv_ZK783.4;amplified=0
     C01F4	Orfeome_project	reagent	38660	49506	.	+	.	Name=mv_C18H2.1;amplified=1
Now go back to your browser, and delete all the uploaded files. (This is to
prevent the list of tracks from getting too long!) You can do this by scrolling
to the bottom of the browser window and pressing "Delete File" for each of the
annotation files that you previously uploaded. This should return you to the
display of the core gene models and EST alignments that we began with.
Now we'll reload the STS annotations by using their URL. Scroll to the bottom
of the window, find the text field labeled "Enter Remote Annotation URL", type
in http://localhost/elegans_sts.gff3, and press "Update URLs." The "ORFeome
project primer pairs" track will reappear.
In order to make this process even simpler, you can create a popup menu
containing the URLs of frequently-accessed remote annotation files. To make
this more interesting, first copy the elegans_expression.gff3 file to the
"/var/www/html" directory in the way described earlier. Now elegans_sts.gff3 and
elegans_expression.gff3 will be available as the URLs
http://localhost/elegans_sts.gff3 and http://localhost/elegans_expression.gff3,
respectively.
Open up the GBrowse configuration file, "/etc/httpd/conf/gbrowse.conf/elegans_core.conf",
and insert the following lines right after the "plugins =" line:
     # remote GFF files to make available for optional loading
     remote sources = "ORFeome STSs"      http://localhost/elegans_sts.gff3
                      "Expression probes" http://localhost/elegans_expression.gff3
When you reload the web page, you will see a popup menu appear next to the
remote annotation URL textfield (Figure 32). The menu will contain options to
load "ORFeome STSs" and "Expression probes", and selecting a menu item will
have exactly the same effect as typing in the URL manually.
     [figures/third_party5.gif]
     Figure 32: The preset remote annotation URL popup menu.
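The value of "remote sources" is a sequence of alternating quoted labels and
URLs. The Python sketch below shows one way to read that structure, using
shell-style word splitting so that labels containing spaces stay together.
This is an illustration of the format only; the function name is hypothetical
and it does not claim to reproduce how GBrowse itself parses the option.

```python
import shlex


def parse_remote_sources(value):
    """Split a 'remote sources' value into (label, url) pairs."""
    tokens = shlex.split(value)  # honours double quotes around labels
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating labels and URLs")
    return list(zip(tokens[0::2], tokens[1::2]))


REMOTE_SOURCES = '''
"ORFeome STSs"      http://localhost/elegans_sts.gff3
"Expression probes" http://localhost/elegans_expression.gff3
'''

sources = parse_remote_sources(REMOTE_SOURCES)
for label, url in sources:
    print(f"{label}: {url}")
```

If a label is left unquoted, its words are split apart and the pairing no
longer lines up, which is why the labels are quoted in the configuration file.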
The neat thing about all this is that it works across the Internet. Send the
URL of the annotation files to your colleagues (being sure to replace
"localhost" with the hostname of your web server!) and they'll be able to load
this URL into any GBrowse that uses the same core annotations. You can also use
this mechanism within your laboratory or department to share annotation sets
without having to give everyone write access to the web server's /var/www/html
directory.
To remove a URL from the list of loaded URLs, just delete it from its text
field and reload.
**** 4.3. Using GBrowse as a DAS Server or Client ****
The Distributed Annotation System protocol (DAS; http://www.biodas.org) is a
system for exchanging genomic annotations across the Internet. It works
similarly to the idea of sharing the URLs of web-accessible GFF files, except
that it is designed to support large data sets. When a client application needs
to fetch just a subset of the data, such as a small piece of a chromosomal arm,
the DAS protocol allows only the relevant annotations to be retrieved, rather
than the whole data set.
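To give a feel for what "only the relevant annotations" means, here is a small
Python sketch that builds a request for one segment of a reference sequence.
The "features" command and the "segment=REF:start,stop" syntax come from the
DAS/1 specification rather than from this tutorial, and the function name and
the example range are made up for illustration. The data source URL is the one
set up later in this section.

```python
def das_features_url(source_url, ref, start, stop, types=()):
    """Build a DAS/1 'features' request for one region of a reference sequence."""
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    parts = [f"segment={ref}:{start},{stop}"]
    parts += [f"type={t}" for t in types]  # optional feature type filters
    return f"{source_url.rstrip('/')}/features?" + ";".join(parts)


url = das_features_url(
    "http://localhost/cgi-bin/das/elegans_core",  # data source used later on
    "C01F4", 1, 20000,                            # illustrative region
)
print(url)
```

A client asks for just the region on display, so the server never has to send
the whole data set.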
To take advantage of DAS functionality, you will have to install the Perl
Bio::Das module. This is available from CPAN (the Comprehensive Perl Archive
Network, http://www.cpan.org) or from the GMOD PPM repository. Unix users can
install Bio::Das with this command:
     % perl -MCPAN -e 'install Bio::Das'
Windows users can use the PPM tool:
     C:\Windows> ppm
     ppm> install Bio::Das
     ppm> quit
You may need to issue the command "rep add gmod http://www.gmod.org/ggb/ppm" if
PPM complains that it cannot find Bio::Das.
When you installed GBrowse, you also installed a CGI script that enables your
web server to act as a DAS server. The CGI script is named "/var/www/cgi-bin/das", and
it runs off the same configuration files as GBrowse itself. Only a very small
bit of extra configuration is required to enable full DAS server functionality.
In this part of the tutorial we will first turn on the DAS server, and then use
it to serve out annotations on the C. elegans database.
To start, open the elegans_core.conf configuration file and add the following
line to the configuration file. It can go anywhere before the start of the
track definition stanzas, but it is probably a good idea to place it towards
the top between "plugins" and "default features."
     # DAS reference server
     das mapmaster      = SELF
This line declares to the DAS system that our server is authoritative for the
coordinates of the current C. elegans genome example. This is appropriate if
you are setting up a genome for the first time. If, however, you want to
annotate against an existing set of genome coordinates, you should replace SELF
with the URL of the DAS reference server that serves that genome. For example,
release hg16 of the human genome at UCSC corresponds to the DAS URL
http://genome.cse.ucsc.edu/cgi-bin/das. A list of reference servers for various
model organisms can be found at http://www.biodas.org.
The next step is to go through the configured tracks and add a "das category"
to each of them. DAS uses the idea of the "category" of a feature in order to
filter sets of features by their purpose. Categories include:
 ____________________________________________________________________________
|transcription|features_that_have_to_do_with_RNA_transcription_______________|
|translation__|features_that_have_to_do_with_protein_translation_and_function|
|variation____|mutations,_deletions,_polymorphisms___________________________|
|structural___|contigs,_clones,_reads,_PCR_primers___________________________|
|repeat_______|repetitive_elements___________________________________________|
|experimental_|a_catch-all_for_experimental_data_____________________________|
|miscellaneous|anything_that_doesn't_fit_in_one_of_the_other_categories______|
Find the [Genes] stanza and modify it to have a das category of
"transcription" as shown here:
     [Genes]
     feature      = gene
     glyph        = gene
     height       = 8
     bgcolor      = blue
     description  = 1
     das category = transcription
     key          = Protein-coding genes
Similarly, modify the [Alignments] track to have a das category of
"similarity". You do not need to add a category to the DNA track, as it is
treated specially by DAS. You're all done! Be sure to save the configuration
file before you try the next step.
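When a configuration file has many tracks, it is easy to miss one. The
following Python sketch lists the track stanzas that have no "das category"
option. It is a simplified, hypothetical helper rather than a GBrowse tool: it
ignores continuation lines, skips the [GENERAL], [TRACK DEFAULTS] and plugin
sections, and the abbreviated sample configuration (including the "ESTs" key)
is made up for the example.

```python
def tracks_without_das_category(conf_text):
    """Return names of track stanzas lacking a 'das category' option."""
    stanzas = {}
    current = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            stanzas[current] = {}
        elif raw[:1].isspace():
            continue  # continuation of the previous option: ignored here
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            stanzas[current][key.strip()] = value.strip()
    non_tracks = {"GENERAL", "TRACK DEFAULTS"}
    return [
        name
        for name, options in stanzas.items()
        if name not in non_tracks
        and not name.endswith(":plugin")
        and "das category" not in options
    ]


SAMPLE = """
[GENERAL]
description = C. elegans Core Annotations

[Genes]
feature      = gene
glyph        = gene
das category = transcription
key          = Protein-coding genes

[Alignments]
feature = EST_match:BLAT_EST_BEST
key     = ESTs
"""

missing = tracks_without_das_category(SAMPLE)
print(missing)
```

Note that a check like this would also report the DNA track, which does not
need a category.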
Using a web browser fetch the URL http://localhost/cgi-bin/das/dsn. This will
return an XML document giving information about each of the data sources that
you have configured.
     <?xml version="1.0" standalone="yes"?>
     <!DOCTYPE DASDSN SYSTEM "http://www.biodas.org/dtd/dasdsn.dtd">
     <DASDSN>
        <DSN>
           <SOURCE id="elegans_core">elegans_core</SOURCE>
           <MAPMASTER>http://localhost/cgi-bin/das/elegans_core</MAPMASTER>
           <DESCRIPTION>C. elegans Core Annotations</DESCRIPTION>
        </DSN>
     </DASDSN>
This shows that there is one configured DAS source, the "elegans_core"
data set.
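If you would rather check the response from a script than by eye, a document
of this shape can be read with Python's standard XML parser. The sketch below
parses a copy of the response shown above; the function name is hypothetical,
and fetching the document from the server is left out so that the example
stays self-contained.

```python
import xml.etree.ElementTree as ET

# A copy of the "dsn" response shown above.
DSN_XML = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DASDSN SYSTEM "http://www.biodas.org/dtd/dasdsn.dtd">
<DASDSN>
   <DSN>
      <SOURCE id="elegans_core">elegans_core</SOURCE>
      <MAPMASTER>http://localhost/cgi-bin/das/elegans_core</MAPMASTER>
      <DESCRIPTION>C. elegans Core Annotations</DESCRIPTION>
   </DSN>
</DASDSN>
"""


def list_das_sources(xml_text):
    """Return one dict per <DSN> entry with its id, mapmaster and description."""
    root = ET.fromstring(xml_text)
    sources = []
    for dsn in root.findall("DSN"):
        source = dsn.find("SOURCE")
        sources.append({
            "id": source.get("id"),
            "mapmaster": dsn.findtext("MAPMASTER", default="").strip(),
            "description": dsn.findtext("DESCRIPTION", default="").strip(),
        })
    return sources


sources = list_das_sources(DSN_XML)
for entry in sources:
    print(entry["id"], entry["mapmaster"])
```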
Next test that the DAS "types" request is working. This request returns all the
feature types that the database knows about. Using a web browser fetch the URL
http://localhost/cgi-bin/das/elegans_core/types. This should return another
short document confirming that the "gene" and "EST_match:BLAT_EST_BEST" feature
types are available.
The final test that the DAS server is performing correctly is to browse to the
elegans_core database and to turn off all the tracks except for DNA/GC content.
This should give you an empty details panel. Now scroll down to the first empty
URL entry field and type in http://localhost/cgi-bin/das/elegans_core and press
"Update URLs." The page should now reload and display the gene models and the
EST alignments. However, the data is now not coming directly from the local
database, but from the database via the DAS protocol.
*** 4.3.1. Combining Databases with DAS ***
We can now use DAS to integrate the core gene model and EST alignment
annotations with the STSs, expression data, trans-splice acceptors and other
third party annotations. To do this, we will create a GBrowse database that
contains the third party annotations, but not the core data. This new database
will be used as a DAS source.
Create a new database directory called elegans_extra in the
"/var/www/html/gbrowse/databases" directory, and add to it a copy of the file
elegans_extra.gff3. This
GFF file is simply the result of concatenating together the individual
annotation files we looked at earlier (elegans_sts.gff3, etc), and removing the
redundant comment lines from the top of the file. Now copy the configuration
file elegans_extra.conf into the /etc/httpd/conf/gbrowse.conf/ directory. Have a look at
this config file, and note that it contains the appropriate "das mapmaster" and
"das category" configuration objects.
Once the config file is installed, confirm that you can browse the extra
annotations by fetching http://localhost/cgi-bin/gbrowse/elegans_extra.
Now we're ready to layer the extra annotations onto the core annotations using
DAS. Open up a browser window on the
http://localhost/cgi-bin/gbrowse/elegans_core database. Delete any URLs that
are already listed in the "Add remote annotations" area, and add the URL
"http://localhost/cgi-bin/das/elegans_extra". When you reload, the core
annotations will be shown on top, and the annotations from the elegans_extra
database will be shown in four tracks at the bottom of the display.
The power of this feature is that we can use it across the Internet to
integrate databases that are independently maintained. For example, try adding
the DAS URL http://dev.wormbase.org/db/seq/das/elegans_even_more, and see what
appears.
By default, when you enter a DAS URL, the system will load all the feature
types that the DAS server makes available. If this is not desirable, you can
limit the tracks by type and/or category. To find out what feature types a DAS
server supports, retrieve a URL like the following:
http://localhost/cgi-bin/das/elegans_extra/types. This will provide a list of
feature type names and their functional categories. From this we can see that
the elegans_extra database exports types of "repeat", "trans-splice_acceptor",
"Deletion_allele", and "Expression". Of course, we already knew this since we
set the database up ourselves!
Using this information, you can now limit the tracks retrieved from the DAS
server to just those that are of interest to you. In the "Add remote
annotations" text field, replace the current DAS URL with this one:
http://localhost/cgi-bin/das/elegans_extra?type=repeat. When you reload, you
will see only the repeat track and not the other three.
What if we want to see two of the four tracks? We just add additional type=
sections, separated by semicolons. To see both the "repeat" and "Expression"
tracks, we could request
http://localhost/cgi-bin/das/elegans_extra?type=repeat;type=Expression
(Figure 33).
     [figures/DAS1.gif]
     Figure 33: The C. elegans core annotations database with the "repeat"
     and "Expression" tracks superimposed on it using DAS.
To fetch features that match a particular category, we can add the category=
option to the URL. For example, to fetch only features that have to do with RNA
transcription, you can request
http://localhost/cgi-bin/das/elegans_extra?category=transcription.
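The URL patterns above are regular enough to generate. As a minimal sketch,
the Python function below assembles a DAS source URL with any number of type=
and category= filters joined by semicolons. The function name is hypothetical,
and no URL escaping is done, on the assumption that type and category names
contain only URL-safe characters as they do in this tutorial.

```python
def das_source_url(base, source, types=(), categories=()):
    """Build a DAS source URL with optional type= and category= filters."""
    url = f"{base.rstrip('/')}/{source}"
    filters = [f"type={t}" for t in types]
    filters += [f"category={c}" for c in categories]
    if filters:
        url += "?" + ";".join(filters)  # filters are separated by semicolons
    return url


BASE = "http://localhost/cgi-bin/das"

two_tracks = das_source_url(BASE, "elegans_extra", types=["repeat", "Expression"])
by_category = das_source_url(BASE, "elegans_extra", categories=["transcription"])
print(two_tracks)
print(by_category)
```

The two URLs printed are the same ones used in the examples above, so either
can be pasted into the remote annotations field.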
We can take advantage of this feature to add a menu of external DAS annotations
to the browser. Open "/etc/httpd/conf/gbrowse.conf/elegans_core.conf" and insert the
following section right after the "plugins =" line:
     # remote DAS data to make available for optional loading
     remote sources =
        "DAS mRNA features"         http://localhost/cgi-bin/das/elegans_extra?category=transcription
        "DAS protein features"      http://localhost/cgi-bin/das/elegans_extra?category=translation
        "DAS repeat features"       http://localhost/cgi-bin/das/elegans_extra?category=repeat
        "DAS variation features"    http://localhost/cgi-bin/das/elegans_extra?category=variation
        "DAS experimental features" http://localhost/cgi-bin/das/elegans_extra?category=experimental
When you reload, the page will now show a popup menu of pre-defined DAS sources
that users can choose. The DAS sources can be local, as shown here, or located
on one or more remote web sites.
*** 4.3.2. Exporting DAS Tracks to Ensembl and other Genome Browsers ***
GBrowse DAS tracks can be layered onto Ensembl and other DAS-aware genome
browsers. There are a couple of things to bear in mind:
   1. Only the tracks explicitly labeled with "das category" will be exported.
   2. The range of glyphs supported by Ensembl is more limited than GBrowse.
The second point is a gotcha. The DAS specification defines an official list of
recognized glyphs, but GBrowse has a larger number of glyphs. Because of this,
DAS-exported features may not look on Ensembl the way they look on GBrowse.
There are three workarounds for this:
  The das flatten option
      Set this option to flatten a multi-part feature, such as a gene, into a
      simpler "flat" structure that will display correctly on the Ensembl
      contig viewer. Also be sure to specify "grouping true" when you configure
      Ensembl for this DAS source.
  The das glyph option
      Set this option in an exported track stanza in order to force the glyph
      to a standard DAS glyph, such as "box". For example:
                 das glyph = box
  The das type option
      Ensembl and possibly other browsers treat certain feature types
      specially. In particular, if a feature has a type of "gene" then Ensembl
      will display it with angled introns. Set das type in a track stanza to
      force the reported type to one of these special values. Example:
                 das type = gene
*** 4.3.3. Running GBrowse off DAS Entirely ***
If you wish, you can even run GBrowse off a remote DAS server entirely and keep
no data locally (or just maintain private annotation tracks). This works by
replacing the local database adaptor that we have been using up to now
with an adaptor named "Bio::Das". However, because of a poorly characterized
interaction between the Bio::Das module and Perl 5.6, it is recommended that
you use Perl 5.8.1 or higher for this. Otherwise you may experience out of
memory errors.
To watch this in action, we will run GBrowse off the UCSC genome browser, which
exports its data in DAS format.
We will need a configuration file to do this. DAS-based configuration files are
almost identical to the ones we have been using up to now for local databases.
The main change is to replace the "db_adaptor" and "db_args" options with ones
appropriate for the DAS data source. For example, for the "hg16" human genome
database maintained at UCSC, the appropriate options will be:
     [GENERAL]
     description   = Human July 2003 Genome at UCSC
     db_adaptor    = Bio::Das
     db_args       = -source http://genome.cse.ucsc.edu/cgi-bin/das
     	        -dsn    hg16
Conveniently enough, recent versions of the GBrowse distribution include a
utility called "make_das_conf.pl" that will build a basic DAS browser
configuration file for you. This utility was installed for you when you
installed GBrowse. To run it, you will need to know the base URL of the DAS
server you're going to display. For our example, we'll use the UCSC DAS server
at http://genome.cse.ucsc.edu/cgi-bin/das.
This is a command-line utility. To find out the databases served by UCSC, type
in the following command at the Unix or Windows command line:
     % make_das_conf.pl http://genome.cse.ucsc.edu/cgi-bin/das
     The following DAS URLs are available at this server.  Please call the script again
     using one of the following URLs:

     http://genome.cse.ucsc.edu/cgi-bin/das/dm1
     	Fruitfly Jan. 2003 Genome at UCSC

     http://genome.cse.ucsc.edu/cgi-bin/das/hg13
     	Human Nov. 2002 Genome at UCSC

     http://genome.cse.ucsc.edu/cgi-bin/das/hg15
     	Human April 2003 Genome at UCSC

     http://genome.cse.ucsc.edu/cgi-bin/das/hg16
     	Human July 2003 Genome at UCSC

     http://genome.cse.ucsc.edu/cgi-bin/das/rn3
     	Rat Jun 2003 Genome at UCSC
     [... many many more ...]
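For the curious: DAS version 1 servers answer a "dsn" request with a small XML
document that lists their data sources. The sketch below pulls the source IDs
out of such a document with sed. The function name list_das_sources is made up
for illustration, the parsing assumes one SOURCE element per line, and nothing
here is taken from make_das_conf.pl itself.

```shell
# Hypothetical helper: print the id of every <SOURCE> element read from stdin.
# Assumes each SOURCE element sits on its own line of the DSN reply.
list_das_sources() {
    sed -n 's/.*<SOURCE[^>]* id="\([^"]*\)".*/\1/p'
}

# Possible use (needs network access and the curl program):
#   curl -s http://genome.cse.ucsc.edu/cgi-bin/das/dsn | list_das_sources
```

Each ID printed this way is the last path component of one of the URLs that
make_das_conf.pl lists.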
We're looking for the hg16 release, so we run make_das_conf.pl again, this time
with the hg16 data source name appended to the UCSC DAS server's URL:
     % make_das_conf.pl http://genome.cse.ucsc.edu/cgi-bin/das/hg16
     [GENERAL]
     description   = Human July 2003 Genome at UCSC
     db_adaptor    = Bio::Das
     db_args       = -source http://genome.cse.ucsc.edu/cgi-bin/das
     	        -dsn    hg16

     # examples to show in the introduction
     examples = 10 10_random 11 12 13 13_random 14 15 15_random
           16 17 17_random 18 18_random 19 19_random 1 1_random
           20 21 22 2 2_random 3 3_random 4 4_random 5 5_random
           6 6_random 7 7_random 8 8_random 9 9_random M
           Un_random X X_random Y

     das mapmaster = http://genome.cse.ucsc.edu:80/cgi-bin/das/hg16

     aggregators = ECgene{ECgene}
            affy10K{affy10K}
            affyGeno{affyGeno}
            affyRatio{affyRatio}
            affyTranscriptome{affyTranscriptome}
            affyU133{affyU133}
            affyU95{affyU95}
     [...much much more...]
If you tried this at the command line, you saw a lot of text scroll up your
screen and disappear forever. Run the command again, and this time redirect its
output into a new configuration file named "ucsc_hg16.conf":
     % make_das_conf.pl http://genome.cse.ucsc.edu/cgi-bin/das/hg16 > /etc/httpd/conf/gbrowse.conf/ucsc_hg16.conf
That should be all you need to do, unless you are behind a firewall that uses
an HTTP proxy. In this case, you will need to edit the "db_args" option in the
generated configuration file to include a -proxy option. This tells gbrowse to
fetch the remote data using the indicated proxy. For example:
     [GENERAL]
     description   = Human July 2003 Genome at UCSC
     db_adaptor    = Bio::Das
     db_args       = -source http://genome.cse.ucsc.edu/cgi-bin/das
                     -dsn    hg16
                     -proxy  http://my.proxy.address
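If you regenerate the file often, the edit can be scripted. The following is a
minimal sketch, not part of GBrowse: add_proxy is a hypothetical helper, and it
assumes that the -dsn argument sits on its own indented continuation line, as
in the example above.

```shell
# Hypothetical helper: add_proxy CONF_FILE PROXY_URL
# Prints the configuration with a "-proxy" continuation line added after the
# first "-dsn" continuation line. The original file is left untouched.
add_proxy() {
    awk -v proxy="$2" '
        { print }
        !done && /^[ \t]+-dsn[ \t]/ {
            print "                -proxy  " proxy
            done = 1
        }
    ' "$1"
}

# Possible use:
#   add_proxy ucsc_hg16.conf http://my.proxy.address > ucsc_hg16.conf.new
```

Writing to a new file rather than editing in place lets you compare the two
versions before replacing the original.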
Try browsing the new data source by requesting
http://localhost/cgi-bin/gbrowse/ucsc_hg16, and you should be able to browse
through a rudimentary version of the human genome display.
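If you regenerate the configuration later, bear in mind that shell redirection
overwrites the target file even when make_das_conf.pl fails. It can be safer to
write to a temporary file and install it only if it looks like a configuration.
A minimal sketch follows; install_conf is a hypothetical helper, and its only
sanity check is that the file is non-empty and contains a [GENERAL] stanza.

```shell
# Hypothetical helper: install_conf GENERATED_FILE DESTINATION
# Copies the file into place only if it is non-empty and has a [GENERAL] stanza.
install_conf() {
    src=$1
    dest=$2
    if [ ! -s "$src" ]; then
        echo "install_conf: $src is missing or empty" >&2
        return 1
    fi
    if ! grep -q '^[[:space:]]*\[GENERAL\]' "$src"; then
        echo "install_conf: no [GENERAL] stanza in $src" >&2
        return 1
    fi
    cp "$src" "$dest"
}

# Possible use:
#   make_das_conf.pl http://genome.cse.ucsc.edu/cgi-bin/das/hg16 > /tmp/ucsc_hg16.conf &&
#       install_conf /tmp/ucsc_hg16.conf /etc/httpd/conf/gbrowse.conf/ucsc_hg16.conf
```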
Once you have a basic configuration file for a remote DAS source, you can
pretty it up by changing track styles, key names, and so forth. Bear in mind
that make_das_conf.pl can only guess at the right landmarks to use in the list
of examples in the instructions, which feature types should be made the
defaults for searching, and how to aggregate multi-part features together. You
will almost certainly need to customize these options to meet your needs.
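Before customizing, it helps to see what the generated file contains. The
sketch below lists the stanza names in a configuration file. list_stanzas is a
made-up helper, and it assumes that every stanza header starts at the beginning
of a line.

```shell
# Hypothetical helper: print the name of each [STANZA] in a configuration file.
list_stanzas() {
    sed -n 's/^\[\([^]]*\)\].*$/\1/p' "$1"
}

# Possible use:
#   list_stanzas /etc/httpd/conf/gbrowse.conf/ucsc_hg16.conf
```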
===============================================================================
***** 5. Using Other Backends *****
Till now, we've been using the Bio::DB::SeqFeature::Store in-memory adaptor.
This adaptor is suitable for small databases, but does not scale well to
realistically-sized genomes. This section will show you how to create large
genome annotation databases using the Berkeleydb and MySQL adaptors. For a
full-featured genome database that includes annotations of gene structure and
function, as well as genetic maps, diversity information and phenotypic
information, be sure to check out the Chado_database, which is significantly
more feature-rich than those described here.
**** 5.1. The Berkeleydb Backend ****
The in-memory database is great for smaller data sets, and can handle GFF files
of up to about 20,000 features (more if you have lots of memory). For larger
data sets, however, you'll want to use a database management system. GBrowse
handles a number of DBMSs through its "database adaptor" system. This section
shows how to use the Bio::DB::SeqFeature::Store berkeleydb adaptor that comes
for free when you install BioPerl; this will enable you to create databases of
10 million or more features. The next section shows you how to set up a MySQL
relational database that will support even larger data sets. You may skip both
sections if you do not wish to set up a Berkeleydb or MySQL database at this
time.
The Berkeleydb database adaptor comes with BioPerl 1.51 or higher (still under
development at the time this tutorial was written). If you have an older
version of BioPerl, GBrowse will install the adaptor for you. As its name
implies, this adaptor uses the Berkeleydb database system
(http://www.sleepycat.com) to create indexed database files from GFF feature
files. The adaptor also requires the Perl DB_File interface to Berkeleydb. If
you are using a Linux or Mac OS X system, you almost certainly have both
Berkeleydb and DB_File already installed. If you are a Windows user of
ActiveState Perl, you should confirm that DB_File is installed by running the
following command:
     C:\> perl -MDB_File -e "print $DB_File::VERSION"
If this prints out a number, then you are golden. If you get an error, you
should reinstall DB_File by running the PPM tool:
     C:\> ppm
     PPM interactive shell (2.1) - type 'help' for available commands.
     PPM> install DB_File
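On Linux or Mac OS X the same check can be run from the shell. The sketch below
wraps it in a small function, have_perl_module (a name invented here), that
reports whether Perl can load a given module.

```shell
# Hypothetical helper: succeed if the named Perl module can be loaded.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if have_perl_module DB_File; then
    # Print the installed version, as in the Windows command above.
    perl -MDB_File -e 'print "DB_File $DB_File::VERSION\n"'
else
    echo "DB_File not found; install it from CPAN or your package manager" >&2
fi
```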
It is an extremely simple task to convert an existing in-memory database to use
the Berkeleydb database. We will now convert the Volvox example database to
Berkeleydb.
Take the most recent version of the volvox.conf configuration file, and edit
the top few lines so that they look like this:
     [GENERAL]
     description   = Volvox Berkeleydb Database
     db_adaptor    = Bio::DB::SeqFeature::Store
     db_args       = -adaptor berkeleydb
                     -dir     '/var/www/html/gbrowse/databases/volvox'
We made just two changes. First, we changed the description of the database to
"Volvox Berkeleydb Database" to distinguish it from the in-memory database.
Second, we changed the value of the -adaptor option from "memory" to
"berkeleydb".
Now reload the volvox page in your browser. There will be a slight delay as the
Berkeleydb adaptor constructs its indexes, and then the page will reappear. You
should now be able to browse and search the database exactly as before.
Depending on how fast the memory adaptor was to begin with, you may not notice
a speed improvement; however, with large GFF files, the performance improvement
will be very marked.
If you look in the volvox database directory, you will see a new subdirectory
named "index". This contains a set of index files that allow gbrowse to find
features quickly. They are automatically created and updated as needed when the
underlying GFF or FASTA files are changed.
If you get an "Internal Server Error" or similar message, check the server
error log file for messages that explain what went wrong. The most common
problem is that the volvox database directory is not writeable by the web
server user. As described earlier, this directory must be "world writeable" in
order to allow the web server to create and maintain the databases.
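A quick way to check the permissions from the shell is sketched below.
is_world_writable is a made-up helper name, and the -maxdepth option to find,
although widely available, is not part of POSIX.

```shell
# Hypothetical helper: succeed if the directory exists and has the
# "other users may write" permission bit set.
is_world_writable() {
    [ -n "$(find "$1" -maxdepth 0 -type d -perm -0002 2>/dev/null)" ]
}

db=/var/www/html/gbrowse/databases/volvox
if is_world_writable "$db"; then
    echo "$db is world writeable"
else
    echo "$db is missing or not world writeable; try: chmod a+w $db" >&2
fi
```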
*** 5.1.1. The bp_seqfeature_load.pl script ***
Although it is convenient to maintain the Berkeleydb indexes automatically,
this mechanism has a number of disadvantages. One disadvantage is that this
mechanism requires the database directory to be world writeable (or at least
writeable by the web user), which may not be acceptable in some installations.
Another disadvantage is that the indexing may take a long time, up to 10
minutes for a GFF database containing a million lines. Some web servers will
time out during this process. For large databases, it is better to explicitly
create the database index files using the bp_seqfeature_load.pl program.
bp_seqfeature_load.pl is a BioPerl utility that is described in more detail in
The_MySQL_Backend. It takes as its input a series of GFF and FASTA files and
creates the appropriate database files. To see how to use it, we will create a
fresh database directory. Go to the GBrowse database directory located at
/var/www/html/gbrowse/databases and create a new subdirectory called
"volvox_berkeley":
      % cd /var/www/html/gbrowse/databases
      % mkdir volvox_berkeley
On Windows systems you can use the file manager to create this new folder.
You do not have to make this directory world writeable, but it should be
readable and executable by the user that the web server runs as. Now enter the
tutorial data files directory (/var/www/html/gbrowse/tutorial/data_files) and load
the GFF and sequence files using the following command:
     % bp_seqfeature_load.pl -c -a berkeleydb -f \
           -d /var/www/html/gbrowse/databases/volvox_berkeley \
           volvox_all.fa volvox_all.gff3
     loading volvox_all.fa...


     Building object tree... 0.00s
     Loading bulk data into database... 0.01s
     load time:  0.02s
     loading volvox_all.gff3...


     Building object tree... 0.00s
     Loading bulk data into database... 0.00s
     load time:  0.08s
The arguments to bp_seqfeature_load.pl are:
-a berkeleydb                 Use the berkeleydb database adaptor.
-c                            Clear (initialize) the database.
-f                            Use the fast loading option.
-d /var/www/html/gbrowse/databases/volvox_berkeley
                              Load the data into the indicated database
                              directory.
volvox_all.fa volvox_all.gff3 The data files to load.
If all goes well, this will create the index files in
/var/www/html/gbrowse/databases/volvox_berkeley. If you look in that directory
now, you'll see a series of index files.
The last step is to modify the volvox.conf to point to this directory. Open it
in a text editor and modify the top part so that it looks like this:
     [GENERAL]
     description   = Volvox Berkeleydb Database
     db_adaptor    = Bio::DB::SeqFeature::Store
     db_args       = -adaptor berkeleydb
                     -dsn     '/var/www/html/gbrowse/databases/volvox_berkeley'
The change here is to replace the -dir argument with -dsn ("data source name").
This tells the Berkeleydb adaptor that pre-made index files can be found in the
indicated directory. It will not attempt to update the index files
automatically.
If you wish to update the indexes with new GFF or sequence data, run the
bp_seqfeature_load.pl script again. Using the -c flag will
reinitialize the indexes from scratch, erasing whatever was there before.
Without this flag, the provided GFF and/or sequence data will be incrementally
added to the indexes.
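The two modes can be wrapped in a small shell function. This is only a sketch:
load_bdb and the LOADER variable are names invented for this example, and the
flags are the ones used in the loading command shown earlier in this section.

```shell
# Hypothetical wrapper around bp_seqfeature_load.pl.
# Usage: load_bdb fresh|add DB_DIR FILE...
#   fresh  reinitialize the indexes (-c) before loading
#   add    add the files to the existing indexes
# LOADER may be set to override the loader program (useful for dry runs).
load_bdb() {
    mode=$1
    db_dir=$2
    shift 2
    case $mode in
        fresh) clear_flag=-c ;;
        add)   clear_flag= ;;
        *)     echo "load_bdb: mode must be 'fresh' or 'add'" >&2; return 2 ;;
    esac
    for f in "$@"; do
        [ -r "$f" ] || { echo "load_bdb: cannot read $f" >&2; return 1; }
    done
    # $clear_flag is deliberately unquoted so that it vanishes when empty.
    "${LOADER:-bp_seqfeature_load.pl}" $clear_flag -a berkeleydb -f -d "$db_dir" "$@"
}

# Possible use:
#   load_bdb fresh /var/www/html/gbrowse/databases/volvox_berkeley volvox_all.fa volvox_all.gff3
#   load_bdb add   /var/www/html/gbrowse/databases/volvox_berkeley new_features.gff3
```

Making the mode an explicit first argument means the indexes are never wiped
by accident; new_features.gff3 is a placeholder file name.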
**** 5.2. The MySQL Backend ****
The Bio::DB::SeqFeature::Store MySQL adaptor is an interface to the open source
MySQL database management system. Its performance is similar to that of the
Berkeleydb adaptor, but it has better provisions for error recovery and is safe
to use in environments where multiple users write to the database
simultaneously. In addition, the MySQL adaptor has been tested much more
extensively than the Berkeleydb adaptor and is highly recommended for
production environments. This section describes how to set up GBrowse to use
the MySQL adaptor. If you are not interested in this, you may skip to the next
section, Other backends.
First you'll have to install MySQL. Although it is installed by default in most
Linux systems, it will not be present on Windows or Macintosh OSX systems. Go
to www.mysql.com and follow the instructions to download and install the
database. Come back here when this is done.
Next, you'll need to install the Perl interface to MySQL. On a Windows system
using ActiveState Perl, use the ppm tool:
     C:\Windows> ppm
     ppm> install DBD::mysql
     ppm> quit
On a Unix, Linux or Mac OSX system, use the perl CPAN installer (this may need
to be done with root/superuser privileges):
     % perl -MCPAN -e shell
     cpan> install DBD::mysql
     cpan> quit
Now you're ready to create the MySQL version of the volvox database. First
you'll set up a new empty database named "volvox". Using the mysql command-line
tool, create the database, grant yourself read/write privileges, and grant the
"nobody" user read privileges:
     % mysql -uroot -p
     Enter password: *********

     mysql> create database volvox;
     Query OK, 1 row affected (0.04 sec)

     mysql> grant all privileges on volvox.* to lstein@localhost;
     Query OK, 0 rows affected (0.00 sec)

     mysql> grant select on volvox.* to nobody@localhost;
     Query OK, 0 rows affected (0.00 sec)

     mysql> quit
     Bye
Depending on how mysql was installed, you may not need to provide a password,
in which case just type "mysql -uroot" without the "-p" argument. When granting
privileges to yourself, replace "lstein" with your own login name. If you are
on a Windows system, you may be able to skip this step entirely.
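The same statements can be generated for any login name and piped into the
mysql client in one step. This is a sketch: volvox_grants is a hypothetical
helper, and on newer MySQL releases the two accounts may have to be created
first with CREATE USER, which is not shown here.

```shell
# Hypothetical helper: print the SQL used above for the given login name.
volvox_grants() {
    login=$1
    cat <<EOF
create database volvox;
grant all privileges on volvox.* to $login@localhost;
grant select on volvox.* to nobody@localhost;
EOF
}

# Possible use (prompts for the MySQL root password):
#   volvox_grants "$(id -un)" | mysql -uroot -p
```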
You'll now load the .gff and .fa files into this newly created database. There
are actually two steps needed. The first is to "initialize" the database with
all the data definitions needed to hold genomic feature data, and the second is
to actually load the data. Fortunately, both these steps are handled by the
same command-line tool, bp_seqfeature_load.pl, which is part of the BioPerl
suite.
Copy the files volvox_all.gff3 and volvox_all.fa to some convenient place. Then
run the following command from the command line:
     % bp_seqfeature_load.pl -c -d volvox volvox_all.fa -f volvox_all.gff3
     loading volvox_all.fa...


     Building object tree... 0.00s
     Loading bulk data into database... 0.00s
     load time:  0.02s
     loading volvox_all.gff3...


     Building object tree... 0.00s
     Loading bulk data into database... 0.02s
     load time:  0.23s
The arguments to bp_seqfeature_load.pl are:
-c                            clear (initialize) the database
-d volvox                     Load into the database named volvox
-f                            Use the fast loading algorithm.
volvox_all.fa volvox_all.gff3 The data files to load.
The MySQL database is all ready to go. Now, in order to tell GBrowse to start
using the MySQL database rather than the in-memory database, you need to make a
small change to the volvox.conf configuration file. Find the first few lines of
the file and change them to look like this:
     [GENERAL]
     description   = Volvox Example Database
     db_adaptor    = Bio::DB::SeqFeature::Store
     db_args       = -adaptor DBI::mysql
                     -dsn     volvox
                     -user    nobody
The -adaptor argument tells GBrowse to use the "DBI::mysql" database
adaptor, which is the BioPerl interface to MySQL databases. The -dsn argument
tells GBrowse to use the data source name "volvox". The -user argument tells
GBrowse to connect as the "nobody" user, to which we granted read privileges
earlier.
When you reload the web page, GBrowse will now be using MySQL. Depending on the
speed of your CPU and disk, you might notice that it seems a bit snappier than
the in-memory version. See CONFIGURE_HOWTO.txt for more information on
configuring GBrowse to use relational databases.
**** 5.3. Other backends ****
The Bio::DB::SeqFeature::Store database backend supports the three adaptors we
have already used in this tutorial. For more information, see the following
perldoc manual pages:
  perldoc Bio::DB::SeqFeature::Store::DBI::mysql
      The MySQL adaptor.
  perldoc Bio::DB::SeqFeature::Store::memory
      The in-memory adaptor.
  perldoc Bio::DB::SeqFeature::Store::berkeleydb
      The Berkeleydb adaptor.
Another set of adaptors, in the Bio::DB::Das family, lets GBrowse run on top of
the rich biological database schemas Chado and BioSQL:
  perldoc Bio::DB::Das::Chado
      An adaptor for PostgreSQL databases using the Chado schema (see the Chado
      home_page.)
  perldoc Bio::DB::Das::BioSQL
      An adaptor for PostgreSQL and MySQL databases using the BioSQL schema
      (see www.biosql.org).
  perldoc Bio::Das
      An adaptor for Distributed Annotation System genome annotation (version
      1). We discuss this in more detail under
      Using_GBrowse_as_a_DAS_Server_or_Client.
Lastly, there is an older family of adaptors that use the Bio::DB::GFF database
system. These are best suited for loading data stored in GFF version 2 files,
but will work, with limitations, with GFF3 files as well. These adaptors work
with a wider range of relational database backends.
  perldoc Bio::DB::GFF::Adaptor::dbi::mysql
      The MySQL adaptor.
  perldoc Bio::DB::GFF::Adaptor::dbi::oracle
      The Oracle adaptor.
  perldoc Bio::DB::GFF::Adaptor::dbi::pg
      The PostgreSQL adaptor.
  perldoc Bio::DB::GFF::Adaptor::dbi::biofetch
      An adaptor that will fetch data automatically from GenBank/EMBL and load
      it into a local MySQL database.
  perldoc Bio::DB::GFF::Adaptor::memory
      An adaptor for in-memory databases running off files.
===============================================================================
***** 6. Conclusion *****
This is just a short introduction to the many things that you can do with
GBrowse. Major features not discussed were:
    * multi-language support
    * third-party feature loading
    * the ability to view GenBank, chado, and biosql feature databases
    * advanced callbacks
All this information, and more, can be found in GBrowse_Configuration_HOWTO and
in the documentation for Bio::DB::SeqFeature::Store and Bio::Graphics.
Have fun!
===============================================================================
     Lincoln D. Stein, lstein@cshl.org
     Cold_Spring_Harbor_Laboratory
 Last modified: Wed Mar 26 09:35:21 EDT 2008