This document provides information on configuring the Generic Genome Browser (GBrowse), part of the Generic Model Organism Systems Database Project (http://www.gmod.org/).
This section describes how to create new annotation databases from scratch.
GBrowse is based around the GFF file format, which stands for ``Gene Finding Format'' and was invented at the Sanger Centre. The GFF format is a flat tab-delimited file, each line of which corresponds to an annotation, or feature. Each line has nine columns and looks like this:
Chr1 curated CDS 365647 365963 . + 1 Transcript "R119.7"
The 9 columns are as follows:
The group field is also used to store information about the target of sequence similarity hits, and miscellaneous notes. See the next section for a description of how to describe similarity targets.
The sequences used to establish the coordinate system for annotations can correspond to sequenced clones, clone fragments, contigs or super-contigs.
In addition to a group ID, the GFF format allows annotations to have a group class. This makes sure that all groups are unique even if they happen to share the same name. For example, you can have a GenBank accession named AP001234 and a clone named AP001234 and distinguish between them by giving the first one a class of Accession and the second a class of Clone.
You should use double-quotes around the group name or class if it contains white space.
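As a rough illustration of the nine-column layout, the following Python sketch splits one GFF line into its fields and extracts the class and name from the group column. It is not part of GBrowse or BioPerl. The names GffLine, parse_gff_line and group_class_and_name are hypothetical, and the field names score, strand and phase follow common GFF usage rather than anything stated above. It also assumes that the first eight columns contain no embedded white space.

```python
import shlex
from typing import NamedTuple, Optional


class GffLine(NamedTuple):
    """One annotation line; field names are illustrative."""
    ref: str
    source: str
    method: str
    start: int
    stop: int
    score: Optional[float]
    strand: Optional[str]
    phase: Optional[int]
    group: str


def _dot(value, convert):
    # A "." in a column means "no value".
    return None if value == "." else convert(value)


def parse_gff_line(line: str) -> GffLine:
    # Real files are tab-delimited; splitting on any white space also copes
    # with examples pasted with spaces. The ninth column keeps its spaces.
    cols = line.rstrip("\n").split(None, 8)
    if len(cols) < 8:
        raise ValueError("expected at least 8 columns, got %d" % len(cols))
    group = cols[8].strip() if len(cols) == 9 else ""
    return GffLine(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                   _dot(cols[5], float), _dot(cols[6], str),
                   _dot(cols[7], int), group)


def group_class_and_name(group: str):
    # Only the part before the first semicolon names the group.
    # shlex honours the double quotes around names that contain white space.
    tokens = shlex.split(group.split(";", 1)[0])
    if len(tokens) < 2:
        return None
    return tokens[0], tokens[1]
```

Applied to the sample line shown earlier, parse_gff_line() yields a start of 365647 and a stop of 365963, and group_class_and_name() returns the pair ("Transcript", "R119.7").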
The first 8 fields of the GFF format are easy to understand. The group field is a challenge. It is used in four distinct ways:
1. Using the Group field for simple features
For a simple feature that spans a single continuous range, choose a name and class for the object and give it a line in the GFF file that refers to its start and stop positions.
Chr3 giemsa heterochromatin 4500000 6000000 . . . Band 3q12.1
2. Using the Group field to group features that belong together
For a group of features that belong together, such as the exons in a transcript, choose a name and class for the object. Give each segment a separate line in the GFF file but use the same name for each line. For example:
IV     curated  exon   5506900 5506996 . + . Transcript B0273.1
IV     curated  exon   5506026 5506382 . + . Transcript B0273.1
IV     curated  exon   5506558 5506660 . + . Transcript B0273.1
IV     curated  exon   5506738 5506852 . + . Transcript B0273.1
These four lines refer to a biological object of class ``Transcript'' and name B0273.1. Each of its parts uses the method ``exon'', source ``curated''. Once loaded, the user will be able to search the genome for this object by asking the browser to retrieve ``Transcript:B0273.1''. The browser can also be configured to allow the Transcript: prefix to be omitted.
You can extend the idiom for objects that have heterogeneous parts, such as a transcript that has 5' and 3' UTRs:
IV     curated  mRNA   5506800 5508917 . + . Transcript B0273.1; Note "Zn-Finger"
IV     curated  5'UTR  5506800 5508999 . + . Transcript B0273.1
IV     curated  exon   5506900 5506996 . + . Transcript B0273.1
IV     curated  exon   5506026 5506382 . + . Transcript B0273.1
IV     curated  exon   5506558 5506660 . + . Transcript B0273.1
IV     curated  exon   5506738 5506852 . + . Transcript B0273.1
IV     curated  3'UTR  5506852 5508917 . + . Transcript B0273.1
In this example, there is a single feature with method ``mRNA'' that spans the entire range. It is grouped with subparts of type 5'UTR, 3'UTR and exon. They are all grouped together into a Transcript named B0273.1. Furthermore the mRNA feature has a note attached to it.
*NOTE* The subparts of a feature are in absolute (chromosomal or contig) coordinates. It is not currently possible to define a feature in absolute coordinates and then to load its subparts using coordinates that are relative to the start of the feature.
Some annotations do not need to be individually named. For example, it is probably not useful to assign a unique name to each ALU repeat in a vertebrate genome. For these, just leave the Group field empty.
3. Using the Group field to add a note
The group field can be used to add one or more notes to an annotation. To do this, place a semicolon after the group name and add a Note field:
Chr3 giemsa heterochromatin 4500000 6000000 . . . Band 3q12.1 ; Note "Marfan's syndrome"
You can add multiple Notes. Just separate them by semicolons:
Band 3q12.1 ; Note "Marfan's syndrome" ; Note "dystrophic dysplasia"
The Note should come AFTER the group type and name.
4. Using the Group field to add an alternative name
If you want the feature to be quickly searchable by an alternative name, you can add one or more Alias tags. A feature can have multiple aliases, and multiple features can share the same alias:
Chr3 giemsa heterochromatin 4500000 6000000 . . . Band 3q12.1 ; Alias MFX
Searches for aliases will be both faster and more reliable than searches for keywords in notes, since the latter rely on whole-text search methods that vary somewhat from DBMS to DBMS.
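The Note and Alias conventions described above can be sketched in a few lines of Python. This is only an illustration of the syntax, not the parser used by Bio::DB::GFF. The function name parse_group is hypothetical, and the sketch assumes that semicolons never occur inside a quoted value.

```python
import shlex


def parse_group(group: str):
    """Split a group field into (class, name, notes, aliases)."""
    parts = [p.strip() for p in group.split(";") if p.strip()]
    if not parts:
        return None, None, [], []
    head = shlex.split(parts[0])
    if len(head) != 2:
        raise ValueError("group must start with a class and a name")
    notes, aliases = [], []
    for part in parts[1:]:
        tokens = shlex.split(part)  # honours double-quoted values
        tag, value = tokens[0], " ".join(tokens[1:])
        if tag == "Note":
            notes.append(value)
        elif tag == "Alias":
            aliases.append(value)
    return head[0], head[1], notes, aliases
```

For a group field holding the Band 3q12.1 group, the two notes shown above and the alias MFX, this returns the class ``Band'', the name ``3q12.1'', both notes in order, and the single alias.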
Each reference sequence in the GFF table must itself have an entry. This is necessary so that the length of the reference sequence is known.
For example, if ``Chr1'' is used as a reference sequence, then the GFF file should have an entry for it similar to this one:
Chr1 assembly chromosome 1 14972282 . + . Sequence Chr1
This indicates that the reference sequence named ``Chr1'' has length 14972282 bp, method ``chromosome'' and source ``assembly''. In addition, as indicated by the group field, Chr1 has class ``Sequence'' and name ``Chr1''.
It is suggested that you use ``Sequence'' as the class name for all reference sequences, since this is the default class used by the Bio::DB::GFF module when no more specific class is requested. If you use a different class name, then be sure to indicate that fact with the ``reference class'' option (see below).
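Before loading a file it can be useful to check that every reference sequence has its own entry. The Python sketch below shows one way to do that. It is not a GBrowse tool: the function names are hypothetical, rows are simplified to (ref, start, stop, class, name) tuples, and the sketch assumes that a reference entry starts at position 1, as in the Chr1 example.

```python
def reference_lengths(rows, reference_class="Sequence"):
    """Map each self-describing reference sequence to its length."""
    lengths = {}
    for ref, start, stop, group_class, group_name in rows:
        # A reference entry names itself in the group field.
        if group_class == reference_class and group_name == ref and start == 1:
            lengths[ref] = stop
    return lengths


def missing_references(rows, reference_class="Sequence"):
    """Return reference names that are used but have no entry of their own."""
    known = reference_lengths(rows, reference_class)
    return sorted({row[0] for row in rows} - set(known))
```

If a different class name is used for reference sequences, it would be passed as reference_class, mirroring the ``reference class'' option.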
There are several cases in which an annotation indicates the relationship between two sequences. One common one is a similarity hit, where the annotation indicates an alignment. A second common case is a map assembly, in which the annotation indicates that a portion of a larger sequence is built up from one or more smaller ones.
Both cases are indicated by using the Target tag in the group field. For example, a typical similarity hit will look like this:
Chr1 BLASTX similarity 76953 77108 132 + 0 Target Protein:SW:ABL_DROME 493 544
Here, the group field contains the Target tag, followed by an identifier for the biological object. The GFF format uses the notation Class:Name for the biological object, and even though this is stylistically inconsistent, that's the way it's done. The object identifier is followed by two integers indicating the start and stop of the alignment on the target sequence.
Unlike the main start and stop columns, it is possible for the target start to be greater than the target end. The previous example indicates that the section of Chr1 from 76,953 to 77,108 aligns to the protein SW:ABL_DROME starting at position 493 and extending to position 544.
A similar notation is used for sequence assembly information as shown in this example:
Chr1        assembly Link   10922906 11177731 . . . Target Sequence:LINK_H06O01 1 254826
LINK_H06O01 assembly Cosmid 32386    64122    . . . Target Sequence:F49B2 6 31742
This indicates that the region of Chr1 between bases 10922906 and 11177731 is composed of LINK_H06O01 from bp 1 to bp 254826. The region of LINK_H06O01 between 32386 and 64122 is, in turn, composed of bases 6 to 31742 of cosmid F49B2.
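The Target notation can be checked mechanically. The following Python sketch (hypothetical function names, not part of BioPerl) splits a Target group into class, name, start and stop, and computes the length of a span so that the two sides of an assembly line can be compared. The class is taken to be everything before the first colon, so that a name such as SW:ABL_DROME keeps its own colon.

```python
def parse_target(group: str):
    """Parse 'Target Class:Name start stop' into its four parts."""
    tokens = group.split()
    if len(tokens) != 4 or tokens[0] != "Target":
        raise ValueError("not a Target group: %r" % group)
    target_class, _, name = tokens[1].partition(":")
    return target_class, name, int(tokens[2]), int(tokens[3])


def span(start: int, stop: int) -> int:
    """Number of positions covered, whichever end is larger."""
    return abs(stop - start) + 1
```

For the assembly example, span(10922906, 11177731) and span(1, 254826) are both 254826, and the cosmid line covers 31737 bases on each side. For the BLASTX hit, the 156 bases on Chr1 correspond to 52 residues of the protein.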
Use the BioPerl script utilities bulk_load_gff.pl, load_gff.pl or (if you are brave) fast_load_gff.pl to load the GFF file into the database. For example, if your database is a MySQL database on the local host named ``dicty'', you can load it into an empty database using bulk_load_gff.pl like this:
bulk_load_gff.pl -c -d dicty my_data.gff
To update existing databases, use either load_gff.pl or fast_load_gff.pl. The latter is somewhat experimental, so use with care.
The Bio::DB::GFF database (and only Bio::DB::GFF!) has a feature known as ``aggregators''. These are small software packages that recognize certain common feature types and convert them into complex biological objects. These aggregators make it possible to develop intelligent graphical representations of annotations, such as a gene that draws confirmed exons differently from predicted ones.
An aggregator typically creates a new composite feature with a different method than any of its components. For example, the standard ``alignment'' aggregator takes multiple alignments of method ``similarity'', groups them by their name, and returns a single feature of method ``alignment''.
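To make the idea concrete, here is a small Python sketch of what an ``alignment''-style aggregator does conceptually. It is not the Bio::DB::GFF implementation. Features are represented as plain dictionaries, the function name is hypothetical, and the choice to give the composite the minimum start and maximum stop of its parts is an assumption made for illustration.

```python
from collections import defaultdict


def aggregate_alignments(features):
    """Group 'similarity' features by name into composite 'alignment' features."""
    parts_by_name = defaultdict(list)
    for feature in features:
        if feature["method"] == "similarity":
            parts_by_name[feature["name"]].append(feature)

    composites = []
    for name, parts in parts_by_name.items():
        composites.append({
            "method": "alignment",  # a new method, distinct from the parts
            "name": name,
            "start": min(p["start"] for p in parts),
            "stop": max(p["stop"] for p in parts),
            "parts": sorted(parts, key=lambda p: p["start"]),
        })
    return composites
```

Features with other methods are simply ignored by this sketch, so they would continue to be displayed as ordinary features.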
The various aggregators are described in detail in the Bio::DB::GFF manual page. It is easy to write new aggregators, and also possible to define aggregators on the fly in the gbrowse configuration file. It is suggested that you use the sample GFF files from the yeast, drosophila and C. elegans projects to see what methods to use to achieve the desired results.
In addition to the standard aggregators that are distributed with BioPerl, GBrowse distributes several experimental and/or special-purpose aggregators:
Adding features to the compound feature, ``reftranscript'', can be done by adding to the ``part_names'' call (i.e. ``refCDS'').
nucleotide_match:waba_weak nucleotide_match:waba_strong nucleotide_match:waba_coding
It is strongly recommended that for mirroring C. elegans annotations, you use the ``processed_transcript'' aggregator in conjunction with the GFF3 files found at:
ftp://ftp.wormbase.org/pub/wormbase/genomes/elegans/genome_feature_tables/GFF3
IT IS NOT NECESSARY TO USE AGGREGATORS WITH THE CHADO, BIOSQL OR BIO::DB::SEQFEATURE::STORE (GFF3) DATABASES.
Each data source has a corresponding configuration file in the directory gbrowse.conf. Once you've created and loaded a new database, you should make a copy of one of the existing configuration files and modify it to meet your needs. The name of the new configuration file must follow the form:
sourcename.conf
where ``sourcename'' is a short word that describes the data source. You can use this name to select the data source when linking to the browser. Just construct a URL that uses ``sourcename'' as a virtual directory under cgi-bin/gbrowse:
http://your.site.org/cgi-bin/gbrowse/sourcename/
(Note: If you don't add the slash at the end, gbrowse will automatically do it for you, since the terminal slash is needed to work around an apparent bug in MSIE's cookie handling.)
It is suggested that you use the same name as the database, although this isn't a requirement. (If no ``source='' argument is given, gbrowse picks the first configuration file that occurs alphabetically; you can control this by placing numbers in front of the configuration file, as in ``01.yeast.conf''.)
The configuration file is divided into a number of sections, each one introduced by a [SECTION TITLE]. The [GENERAL] section contains settings that are applicable to the entire application. Other sections define tracks to display.
I suggest that you begin with one of the example configuration files provided with the distribution and modify it to suit your needs.
The [GENERAL] section consists of a series of name=value options. For example, the beginning of the yeast.conf sample configuration file looks like this:
[GENERAL]
description = S. cerevisiae (via SGD Nov 2001)
db_adaptor  = Bio::DB::GFF
db_args     = -adaptor dbi::mysql -dsn dbi:mysql:database=yeast;host=localhost
aggregators = transcript alignment
user        =
passwd      =
Each option is a single word or phrase, usually in lower case. This is followed by an equals sign and the value of the option. You can add whitespace around the equals sign in order to increase readability. If a value is very long, you can continue it on additional lines provided that you put a tab or other whitespace on the continuation lines. For example:
description = S. cerevisiae annotations via SGD Nov 2001, and
      converted using the process_sgd.pl script
Any lines that begin with a pound sign (#) are considered comments and ignored.
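The syntax rules above (bracketed section titles, name=value pairs, indented continuation lines and pound-sign comments) happen to be close to what Python's standard configparser module accepts, so a configuration file can be inspected with a few lines of Python. This is only a convenience sketch and is not the parser that GBrowse itself uses. The function name read_conf and the sample text are hypothetical, and edge cases may be handled differently by GBrowse.

```python
import configparser

SAMPLE = """\
[GENERAL]
description = S. cerevisiae annotations via SGD Nov 2001, and
    converted using the process_sgd.pl script
db_adaptor  = Bio::DB::GFF
# lines starting with a pound sign are comments
aggregators = transcript alignment
user        =
"""


def read_conf(text):
    """Return {section: {option: value}} for a GBrowse-style config text."""
    parser = configparser.ConfigParser(
        delimiters=("=",),        # only "=" separates name from value
        comment_prefixes=("#",),  # "#" starts a comment line
        interpolation=None,       # leave "%" and "$" in values untouched
    )
    parser.optionxform = str      # keep option names exactly as written
    parser.read_string(text)
    return {
        section: {
            # Join continuation lines (and any runs of white space)
            # with single spaces.
            name: " ".join(value.split())
            for name, value in parser[section].items()
        }
        for section in parser.sections()
    }


conf = read_conf(SAMPLE)
```

After this runs, conf["GENERAL"]["description"] holds the two description lines joined into one string, and conf["GENERAL"]["user"] is the empty string.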
During this discussion, you might want to follow along with one of the example configuration files.
The following [GENERAL] options are recognized:
db_args = -adaptor dbi::mysql -dsn dbi:mysql:database=<db_name>;host=<db_host>
replacing <db_name> and <db_host> with the database and database host of your choice. For MySQL databases running on the localhost, you can shorten this to just ``db_name''.
If the database requires you to log in with a user name and password, use the following db_args:
db_args = -adaptor dbi::mysql -dsn dbi:mysql:database=<db_name>;host=<db_host> -user <username> -pass <password>
replacing <username> and <password> with the appropriate values. In the example configuration files, we use a username of ``nobody'' and an empty password. This is appropriate if the database is configured to allow ``nobody'' to log in from the local machine without using a password.
To use the Oracle version of Bio::DB::GFF, use these arguments:
db_args = -adaptor dbi::oracle -dsn dbi:oracle:database=db_service
where db_service should be replaced with the name of the desired database service definition. See the documentation for the Perl DBD::Oracle database driver for more information about the -dsn format.
To use the in-memory version of Bio::DB::GFF, use these arguments:
db_args = -adaptor memory -dir /path/to/directory
The indicated directory should contain one or more GFF and FASTA files, distinguished by the filename extensions .gff and .fa respectively.
To disable the default aggregators, leave this setting blank, as in:
aggregators=
To activate the default aggregators of ``transcript,'' ``clone,'' and ``alignment,'' comment this setting out entirely:
# aggregators =
Do not use aggregators with Bio::DB::SeqFeature::Store, BioSQL, or Chado.
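The difference between a blank setting and a missing one is easy to get wrong, so here is a short Python sketch of the rule as described above. The function name and the dictionary representation of the [GENERAL] section are hypothetical.

```python
DEFAULT_AGGREGATORS = ["transcript", "clone", "alignment"]


def active_aggregators(general: dict) -> list:
    """Resolve the aggregator list from parsed [GENERAL] options."""
    if "aggregators" not in general:
        # Option commented out or absent: the defaults apply.
        return list(DEFAULT_AGGREGATORS)
    # Option present: use exactly what is listed, even if that is nothing.
    return general["aggregators"].split()
```

With this rule, an empty ``aggregators='' line yields an empty list, while a configuration that omits the option yields the three defaults.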
A good standard list of plugins is:
plugins = SequenceDumper FastaDumper RestrictionAnnotator
See the contents of conf/plugins and contrib/plugins for more plugins that you can install.
tmpimages = <tmpimages_url> <tmpimages_path>
Where <tmpimages_url> is the directory as it appears as a URL and <tmpimages_path> is the physical path to the directory as it appears to the filesystem. Usually the physical path is just the URL with the DocumentRoot configuration variable prepended to it, in which case only the URL is needed. However, if the URL is defined using an Alias directive, then the path argument is mandatory.
The tmpimages option is mandatory.
NOTE: The path argument is ignored if gbrowse is running under mod_perl, because mod_perl allows the URL to be translated into a physical directory programmatically.
default features = Genes ORFs tRNAs Centromeres:overview
The syntax for annotation plugins is slightly different. To activate an annotation plugin track by default, preface the plugin's name with ``plugin:''
default features = Genes ORFs Centromeres:overview plugin:RestrictionAnnotator
reference class = contig
Example:
initial landmark = Chr1
The max segment option sets an upper bound on the maximum size segment that will be displayed on the detailed view. Its value is in the selected units. Above this limit, the user will be prompted to select a smaller region on the birds-eye view. The default is 1,000,000 base pairs.
If the user tries to view a segment smaller than the min segment option, then the segment will be resized to be this size. The default is 20 bp.
zoom levels = 1000 2000 5000 10000 20000 40000 100000 200000
Note that all data sources will need to have this option defined in order for it to take effect across all databases.
You can freshen the cache and force cached copies to be ignored by touching the configuration file or by calling gbrowse with the CGI option nocache=1.
It is also possible to place an anonymous Perl subroutine here. The code will be invoked during preparation of the page and must return a string value to use as the header. See COMPUTED OPTIONS for details.
Example:
header = <h1>Welcome to the Volvox Sequence Page</h1>
It is also possible to place an anonymous Perl subroutine here. The code will be invoked during preparation of the page and must return a string value to use as the footer. See COMPUTED OPTIONS for details.
Example:
footer = <hr>
   <table width="100%">
   <TR>
   <TD align="LEFT" class="databody">
   For the source code for this browser, see the <a href="http://www.gmod.org">
   Generic Model Organism Database Project.</a> For other questions, send
   mail to <a href="mailto:lstein@cshl.org">lstein@cshl.org</a>.
   </TD>
   </TR>
   </table>
examples = II NPY1 NAB2 Orf:YGL123W
Example:
automatic classes = Symbol Gene Clone
When the user types in ``hb3'', the browser will search first for a feature named hb3 in the Sequence class, followed in turn by matching features in the Symbol, Gene and Clone classes. The search stops at the first match. If no class yields a match, the browser will proceed to a full text search of all the comment fields.
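The search order can be summarized in a few lines of Python. This is a sketch of the behavior described here, not GBrowse code. The function name find_landmark and the lookup callback (which is assumed to return a list of matching features for a given class and name) are hypothetical.

```python
def find_landmark(name, lookup, reference_class="Sequence", automatic_classes=()):
    """Try the reference class first, then each automatic class in order."""
    for feature_class in (reference_class, *automatic_classes):
        hits = lookup(feature_class, name)
        if hits:
            # Stop at the first class that yields a match.
            return feature_class, hits
    # Nothing found: the caller would fall back to a full text search.
    return None, []
```

With ``automatic classes = Symbol Gene Clone'', the call would pass ("Symbol", "Gene", "Clone") as automatic_classes.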
search_notes() method. By default this will search the text of all attributes, including such things as protein sequence. The Bio::DB::SeqFeature::Store database is a bit smarter about searching, and will only, by default, search attributes named ``Note''. You can expand the search by giving a list of attribute names to the ``search attributes'' option.
remote sources = "Menu Label 1" http://url1.host.com/etc/etc
                 "Menu Label 2" http://url2.host.com/etc/etc
instructions = "Type in the name of a contig or clone."
no search = 1
open      Show the section initially open
closed    Show the section initially collapsed
off       Do not show the section at all
For example ``instructions section = closed'' will initially show the instructions section in collapsed form when the user visits gbrowse for the first time. ``upload_tracks section = off'' will disable the uploads section entirely.
Note that turning off the details section will effectively disable gbrowse, but you might want to do this if you want to show the overview section only. Turning off the search section will also disable the navigation buttons. If you want to disable searching selectively, you should use the ``no search'' option instead.
Option   Where it goes
------   -------------
header   between the top and the instructions
html1    between the instructions and the navigation bar
html2    between the navigation bar and the overview
html3    between the overview and the detail view
html4    between the detail view and the data source panel
html5    between the data source panel and the track list
html6    between the track list and the annotation upload
footer   between the annotation upload and the bottom
These can be code references. One useful thing to do is to use the language translator to insert language-specific HTML. Here's an example provided by Marc Logghe:
html2 = sub {
          my $go = $main::CONFIG->tr('Go');
          return qq(
   <table width="800" border="0">
     <tr class="searchbody">
       <td align="left" colspan="3">
         <b>Dump:</b><input type="button" value="Assembly" onclick="window.open('gbrowse?plugin=AssemblyDumper;plugin_action=$go');">
         <input type="button" value="Reads" onclick="window.open('gbrowse?plugin=ReadDumper;plugin_action=$go');">
       </td>
     </tr>
   </table>
   );
  }
If you use a coderef for the html options, the subroutine is passed two arguments. The first argument is a Bio::Das::SegmentI object (see the manual page for Bio::DB::GFF::RelSegment for details). The second argument is a hashref containing the user's settings for the current page.
keystyle = between     Print the track labels between the tracks themselves.
keystyle = beneath     Print the track labels at the bottom of the detailed view.
The ``empty_tracks'' option controls what to do when a track has no features in it. Possible values are:
empty_tracks = key         Print just the key (the track label).
empty_tracks = suppress    Suppress the track completely.
empty_tracks = line        Draw a solid line across the track.
empty_tracks = dashed      Draw a dashed line across the track.
The default value is ``key.''
The only difference between the two options is the time that they are applied relative to the grid that shows base pair coordinates. The background option is invoked before the grid is drawn so that the grid appears on top of it. The postgrid option is invoked after the grid is drawn, so that anything the option draws appears on top of the grid. See http://sourceforge.net/mailarchive/message.php?msg_id=12116755 for an example of using this feature to show assembly gaps as vertical gray regions.
You can individually adjust the left and right padding using pad_left and pad_right, which, if present, will supersede image_padding.
The default is false.
Please see the DAS_HOWTO manpage for more information on using DAS with GBrowse.
proxy = http://myproxy.myorg.com:9000
The ``session driver'' option will be passed to CGI::Session->new() as the first argument. It specifies the driver, serializer and ID generator according to the syntax described in the CGI::Session manpage. The ``session args'' option will be passed to CGI::Session->new() as the third argument. It specifies additional parameters to be passed to the selected driver.
For example, here is how to create session data that is stored in the MySQL ``test'' database under a table named ``gbrowse_sessions.'' The session data will be stored in binary form by the Storable module:
session driver = driver:mysql;serializer:storable
session args   = DataSource test TableName gbrowse_sessions
See the CGI::Session documentation for information about setting up the MySQL table and appropriate permissions.
You might also want to read the CGI::Session::ID::salted_md5 manpage, which describes an ID generation algorithm that should be more secure (but slightly slower) than the default one.
You will not ordinarily need to use these settings, as the defaults seem to work well.
remember settings time = +3M # remember settings for 3 months
A user's settings, which include uploaded files, track options and plugin configuration, will be reset to the defaults if the user does not visit the site within the time specified.
The default value is 1 month.
See CGI for more information on the time format.
The default value is 3 months.
When you set ``msie hack'' to a true value, GBrowse will use the GET request when it detects MSIE in use. This will fix the ``Back'' button issue, but will put very long URLs in the Location box. It is your choice which of these is more annoying to your users.
The track defaults section specifies default values for each track. The following common options are recognized:
glyph height bgcolor fgcolor fontcolor font2color strand_arrow
These options control the default graphical settings for any annotation types that are not explicitly specified. See the section below on controlling the settings. Any of the options allowed in the [track] sections described below are allowed here.
The link option's value should be a URL containing one or more variables. Variables begin with a dollar sign ($), and are replaced at run time with the information relating to the selected annotation. Recognized variables include:
$name        The feature's name (group name)
$id          The feature's id (eg, PK from a database)
$class       The feature's class (group class)
$method      The feature's method
$source      The feature's source
$ref         The name of the sequence segment (chromosome, contig)
             on which this feature is located
$description The feature's description (notes)
$start       The start position of this feature, relative to $ref
$end         The end position of this feature, relative to $ref
$segstart    The left end of $ref displayed in the detailed view
$segend      The right end of $ref displayed in the detailed view
For example, the wormbase.conf file uses this link rule:
link = http://www.wormbase.org/db/get?name=$name;class=$class
At run time, if the user clicks on an EST named yk1234.5, this will generate the URL
http://www.wormbase.org/db/get?name=yk1234.5;class=EST
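The substitution itself is simple enough to mimic in Python with the standard string.Template class, which uses the same dollar-sign convention. This sketch only illustrates how a link rule expands and is not how GBrowse performs the substitution. The function name expand_link is hypothetical, and no URL escaping is attempted.

```python
from string import Template


def expand_link(rule: str, feature: dict) -> str:
    """Replace $variables in a link rule with values from the feature."""
    # safe_substitute leaves unknown variables in place instead of failing.
    return Template(rule).safe_substitute(feature)
```

Called with the wormbase.conf rule and a feature whose name is yk1234.5 and whose class is EST, this returns the URL shown above.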
It is possible to override the global link rule on a feature-by-feature basis. See the next section for details on this. It is also possible to declare a subroutine to compute the proper URL dynamically. See COMPUTED OPTIONS for details.
A special link type of AUTO will cause the feature to link to the gbrowse_details script, which summarizes information about the feature. The default is not to link at all.
link_target = _blank
The value uses the HTML targeting rules to name/create the window to receive the value of the link. The first time the link is accessed, a window with the specified name is created. The next time the user clicks on a link with the same target, that window will receive the content of the link if it is still present, or it will be created again if it has been closed. A target named ``_blank'' is special and will always create a new window.
The ``link_target'' option can also be computed dynamically. See COMPUTED OPTIONS for details.
Note that HTML characters such as ``<'', ``>'' and ``&'' are not automatically escaped from the title. This lets you do neat stuff, such as create popup menus, but also means that you need to be careful. The function CGI::escapeHTML() is available to properly escape HTML characters in dynamically-generated titles.
The special value ``AUTO'' causes a default description to appear describing the name, type and position of the feature. This is also assumed if the title option is missing or blank.
Any other [Section] in the configuration file is treated as a declaration of a track. The order of track sections will become the default order of tracks on the display (the user can change this later). Here is a typical track declaration from yeast.conf:
[Genes]
feature      = gene:sgd
glyph        = generic
bgcolor      = yellow
forwardcolor = yellow
reversecolor = turquoise
strand_arrow = 1
height       = 6
description  = 1
key          = Named gene
This track is named ``Genes''. You may use a short mnemonic if you prefer; this will make the URL shorter when the user bookmarks a view he or she likes. Track names can contain almost any character, including whitespace, but cannot contain the ``-'' or ``+'' signs because these are used to separate track names in the URL when bookmarking. [My Genes] is OK, but [My-Genes] is not.
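A configuration checker could enforce the naming rule with a one-line test. The Python sketch below is only an illustration; the function name is hypothetical and it checks nothing beyond the two reserved characters mentioned here.

```python
def is_valid_track_name(name: str) -> bool:
    """Track names must be non-empty and must not contain '-' or '+'."""
    return bool(name.strip()) and not any(ch in name for ch in "-+")
```

For example, it accepts ``My Genes'' and rejects ``My-Genes''.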
As in the general configuration section, the track declaration contains multiple name=value option pairs.
Valid options are as follows:
It is possible to omit the source. A feature of type ``gene'' will include all features whose methods are ``gene'', regardless of the source field. It is not possible to omit the method.
It is possible to have several feature types displayed on a single track. Simply provide the feature option with a space-delimited list of the features you want to include. For example:
feature = gene:sgd stRNA:sgd
This will include features of type ``gene:sgd'' and ``stRNA:sgd'' in the same track and display them in a similar fashion.
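The matching rule for the feature option can be sketched in Python as follows. This is an illustration of the rule as described, not the Bio::DB::GFF implementation. The function names are hypothetical, and the sketch assumes exact, case-sensitive comparison of methods and sources.

```python
def parse_feature_types(value: str):
    """Split 'gene:sgd stRNA:sgd' into [(method, source-or-None), ...]."""
    types = []
    for item in value.split():
        method, _, source = item.partition(":")
        types.append((method, source or None))
    return types


def track_includes(value: str, method: str, source: str) -> bool:
    """True if a feature with this method and source belongs on the track."""
    return any(
        m == method and (s is None or s == source)  # omitted source matches all
        for m, s in parse_feature_types(value)
    )
```

A track declared with ``feature = gene'' therefore accepts genes from every source, while ``feature = gene:sgd'' accepts only those whose source is sgd.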
Example:
remote feature = http://www.wormbase.org/cgi-bin/das/wormbase?type=mRNA
Example:
group_on = display_name
(this feature is under refinement and may change in the future)
category = Genes
Note that it is not possible to make subcategories. If all tracks are categorized, then the ``General'' category will not be displayed.
A large number of glyph-specific options are also recognized. These are described in the next section.
A large variety of glyphs are available, and more are being added as the Bio::Graphics module grows.
A list of the common glyphs and their options is provided by GBrowse itself. Click on the ``[Help]'' link in the section labeled ``Upload your own annotations''. This page also lists the valid foreground and background colors. Most of the glyphs are found in the BioPerl distribution, but a few are distributed directly with GBrowse.
The most popular glyph types are:
Glyph                   Description
-----                   -----------

generic                 a rectangle
allele_tower            allele found at a SNP position
arrow                   an arrow
anchored_arrow          a span with vertical bases |---------|. If one or
                        the other end of the feature is off-screen, the
                        base will be replaced by an arrow.
box                     another rectangle; doesn't show subparts of features
cds                     shows the reading frame of spliced transcripts;
                        used in conjunction with the "coding" aggregator.
diamond                 a point-like feature represented as a diamond
dna                     DNA and GC content
heterogeneous_segments  a multi-segmented feature in which each segment can
                        have a distinctive color. For Jim Kent's WABA
                        features, this works with the waba_alignment
                        aggregator.
idiogram                this takes specially-formatted feature data and
                        turns it into an idiogram of a Giemsa-stained
                        metaphase chromosome
processed_transcript    multi-purpose representation of a spliced mRNA,
                        including positions of UTRs
segments                a multi-segmented feature such as an alignment
span                    like anchored_arrow, except that the ends are
                        truncated at the edge of the panel, not turned
                        into an arrow
trace                   reads an SCF trace file and draws a graphic
                        representation
triangle                a point-like feature represented as a triangle
transcript              a gene model
transcript2             a slightly different representation of a gene model
translation             1-, 3- and 6-frame translations
wormbase_transcript     yet another gene model that can show UTR segments
                        (for features that conform to the WormBase gene
                        schema). Used in conjunction with the
                        "wormbase_gene" aggregator.
xyplot                  histograms and line plots
A more definitive list of glyph options can be found in the Bio::Graphics manual pages. Consult the manual pages for the following modules:
Glyph                   Manual Page
-----                   -----------

(common options for all) Bio::Graphics::Glyph
allele_tower            Bio::Graphics::Glyph::allele_tower
arrow                   Bio::Graphics::Glyph::arrow
anchored_arrow          Bio::Graphics::Glyph::anchored_arrow
box                     Bio::Graphics::Glyph::box
cds                     Bio::Graphics::Glyph::cds
crossbox                Bio::Graphics::Glyph::crossbox
diamond                 Bio::Graphics::Glyph::diamond
dna                     Bio::Graphics::Glyph::dna
dot                     Bio::Graphics::Glyph::dot
ellipse                 Bio::Graphics::Glyph::ellipse
extending_arrow         Bio::Graphics::Glyph::extending_arrow
generic                 Bio::Graphics::Glyph::generic
graded_segments         Bio::Graphics::Glyph::graded_segments
heterogeneous_segments  Bio::Graphics::Glyph::heterogeneous_segments
idiogram                Bio::Graphics::Glyph::idiogram
line                    Bio::Graphics::Glyph::line
primers                 Bio::Graphics::Glyph::primers
processed_transcript    Bio::Graphics::Glyph::processed_transcript
rndrect                 Bio::Graphics::Glyph::rndrect
ruler_arrow             Bio::Graphics::Glyph::ruler_arrow
segments                Bio::Graphics::Glyph::segments
span                    Bio::Graphics::Glyph::span
toomany                 Bio::Graphics::Glyph::toomany
trace                   Bio::Graphics::Glyph::trace
transcript              Bio::Graphics::Glyph::transcript
transcript2             Bio::Graphics::Glyph::transcript2
translation             Bio::Graphics::Glyph::translation
triangle                Bio::Graphics::Glyph::triangle
wormbase_transcript     Bio::Graphics::Glyph::wormbase_transcript
xyplot                  Bio::Graphics::Glyph::xyplot
The ``perldoc'' command is handy for reading the documentation from the Unix command line. For example:
perldoc Bio::Graphics::Glyph::primers
This will provide you with a summary of the options that apply to the ``primers'' glyph.
In the manual pages, the glyph options are presented the way they are called from Perl. For example, the documentation will tell you to use the -connect_color option to set the color to use when drawing the line that connects the two inward pointing arrows in the primer pair glyph. This translates to the configuration file as an option named ``connect_color''. For example:
[PCR Products]
glyph         = primers
connect_color = blue
When referring to colors, you can use a variety of color names such as ``blue'' and ``green''. To get the full list, cut and paste the following magic incantation into the command line:
perl -MBio::Graphics::Panel -e 'print join "\n",Bio::Graphics::Panel->color_names'
or see this URL:
http://www.wormbase.org/db/seq/gbrowse?help=annotation
Alternatively, you can use the #RRGGBB notation to specify the red, green and blue components of the color. Refer to any book on HTML for the details on using the notation.
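For readers unfamiliar with the notation, the following Python sketch decodes a #RRGGBB value into its red, green and blue components, each in the range 0 to 255. The function name is hypothetical and the sketch is unrelated to how Bio::Graphics parses colors internally.

```python
def parse_rrggbb(color: str):
    """Convert '#RRGGBB' into a (red, green, blue) tuple of integers."""
    if len(color) != 7 or not color.startswith("#"):
        raise ValueError("expected a color of the form #RRGGBB")
    # Each component is a two-digit hexadecimal number.
    return tuple(int(color[i:i + 2], 16) for i in (1, 3, 5))
```

For instance, #FF8000 decodes to full red, half green and no blue.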
You can make any set of tracks appear in the overview by creating a stanza with a title of the format [<label>:overview], where <label> is any unique label of your choice. The format of the stanza is identical to the others, but the indicated track will appear in the overview rather than as an option in the detailed view. For example, this stanza adds to the overview a set of features of method ``gene'', source ``framework'':
    [framework:overview]
    feature = gene:framework
    label   = 1
    glyph   = generic
    bgcolor = lavender
    height  = 5
    key     = Mapped Genes
Similarly, you can make a track appear in the region panel by appending ``:region'' to its name:
    [genedensity:region]
    feature    = gene_density
    glyph      = xyplot
    graph_type = boxes
    scale      = right
    bgcolor    = red
    fgcolor    = red
    height     = 20
    key        = Gene Density
Sometimes you will want to change the appearance of a track when the user has zoomed out or zoomed in beyond a certain level. To indicate this, create a set of ``length qualified'' stanzas of format [<label>:<zoom level>], where all stanzas share the same <label>, and <zoom level> indicates the minimum size of the region that the stanza will apply to. For example:
    [gene]
    feature  = transcript:curated
    glyph    = dna
    fgcolor  = blue
    key      = genes
    citation = example semantic zoom track

    [gene:500]
    feature = transcript:curated
    glyph   = transcript2

    [gene:100000]
    feature = transcript:curated
    glyph   = arrow

    [gene:500000]
    feature = transcript:curated
    glyph   = generic
This series of stanzas says to use the ``transcript2'' glyph when the segment being displayed is 500 bp or longer, to use the ``arrow'' glyph when the segment being displayed is 100,000 bp or longer, and the ``generic'' glyph when the region being displayed is 500,000 bp or longer. For all other segment lengths (1 to 499 bp), the ordinary [gene] stanza will be consulted, and the ``dna'' glyph will be displayed. The bare [gene] stanza is used to set all but the ``feature'' options for the other stanzas. This means that the fgcolor, key and citation options are shared amongst all the [gene:XXXX] stanzas, but the ``feature'' option must be repeated.
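The selection rule can be tried out with a few lines of stand-alone Perl. This is only a sketch of the rule as described above, not the code GBrowse itself uses; the %glyph_for table and the glyph_at_length() helper are made up for the illustration, with 0 standing in for the bare [gene] stanza.

```perl
use strict;
use warnings;

# Illustration only: mimic how a length-qualified stanza is chosen.
# Keys are the minimum region lengths from the [gene:XXXX] stanzas;
# 0 stands for the bare [gene] stanza.
my %glyph_for = (
    0      => 'dna',
    500    => 'transcript2',
    100000 => 'arrow',
    500000 => 'generic',
);

sub glyph_at_length {
    my ($length) = @_;
    # Pick the largest threshold that does not exceed the region length.
    my ($best) = grep { $_ <= $length } sort { $b <=> $a } keys %glyph_for;
    return $glyph_for{$best};
}

for my $len (250, 500, 99_999, 100_000, 2_000_000) {
    printf "%9d bp -> %s\n", $len, glyph_at_length($len);
}
```

Each threshold is a lower bound, so a 99,999 bp region still uses the ``transcript2'' glyph and the switch to ``arrow'' happens at exactly 100,000 bp.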
You can override any options in the length qualified stanzas. For example, if you want to change the color to red when displaying genes on segments between 500 and 99,999 bp, you can modify the [gene:500] stanza as follows:
    [gene:500]
    feature = transcript:curated
    glyph   = transcript2
    fgcolor = red
It is also possible to display different features at different zoom levels, although you should handle this potentially confusing feature with care.
If you wish to turn off a track entirely, you can use the ``hide'' flag to hide the track when the display exceeds a certain size:
    [6_frame_translation:50000]
    hide = 1
Some options can be computed at run time by using Perl subroutines as their values. These are known as ``callbacks.'' Currently this works with the values of the ``link'', ``title'', ``link_target'', ``header'' and ``footer'' options, and any glyph-specific option that appears in a track section.
You need to know the Perl programming language to take advantage of this. The general format of this type of option is:
    option name = sub {
            some perl code;
            some more perl code;
            even more perl code;
        }
The value must begin with the sequence ``sub {'' in order to be recognized as a subroutine declaration. After this, you can have one or more lines of Perl code followed by a closing brace. Continuation lines must begin with whitespace.
When the browser first encounters an option like this one, it will attempt to compile it into Perl runtime code. If successful, the compiled code will be stored for later use and invoked whenever the value of the option is needed. (Otherwise, an error message will appear in your server error log).
For options of type ``footer'' and ``header'', the subroutine is passed no arguments. It is expected to produce some HTML and return it as a string value.
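As a rough sketch, a footer callback can be prototyped outside the browser before it is pasted into the configuration file. The wording of the HTML below is invented for the example; only the calling convention (no arguments in, one HTML string out) comes from the description above.

```perl
use strict;
use warnings;

# Stand-in for the value of a "footer" option: takes no arguments
# and returns an HTML string. The wording is made up for illustration.
my $footer = sub {
    my $year = (localtime)[5] + 1900;
    return "<hr><p><i>Page generated in $year.</i></p>";
};

my $html = $footer->();
print "$html\n";
```

In the configuration file the same subroutine body would follow ``footer = sub {'', with every continuation line indented.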
For glyph-specific options, such as ``bgcolor'', the subroutine will be called at run time with five arguments consisting of the feature, the name of the option, the current part number of the feature, the total number of parts in this feature, and the glyph corresponding to the feature. Usually you will just look at the first argument. The return value is treated as the value of the corresponding option. For example, this bgcolor subroutine will call the feature's primary_tag() method, and return ``blue'' if it is an exon, ``orange'' otherwise:
    bgcolor = sub {
            my $feature = shift;
            return "blue" if $feature->primary_tag eq 'exon';
            return "orange";
        }
See the manual page for Bio::DB::GFF::Feature for information on how to interrogate the feature object.
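Because a callback is ordinary Perl, it can be exercised from the command line before it goes into the configuration file. The sketch below does this for the bgcolor callback shown above. MockFeature is a hypothetical stand-in that implements only primary_tag(); a real Bio::DB::GFF::Feature object offers many more methods.

```perl
use strict;
use warnings;

# Minimal stand-in for a feature object; only primary_tag() is provided.
package MockFeature;
sub new         { my ($class, %args) = @_; return bless {%args}, $class }
sub primary_tag { return $_[0]{primary_tag} }

package main;

# Same logic as the bgcolor callback above.
my $bgcolor = sub {
    my $feature = shift;
    return "blue" if $feature->primary_tag eq 'exon';
    return "orange";
};

for my $tag (qw(exon intron)) {
    my $f = MockFeature->new(primary_tag => $tag);
    print "$tag => ", $bgcolor->($f), "\n";
}
```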
For special effects, such as coloring the first and last exons differently, you may need access to all five arguments. Here is an example that draws the first and last parts of a feature in blue and the rest in red:
    sub {
        my ($feature, $option_name, $part_no, $total_parts, $glyph) = @_;
        return 'blue' if $part_no == 0;                 # zero-based indexing!
        return 'blue' if $part_no == $total_parts - 1;  # zero-based indexing!
        return 'red';
    }
See the Bio::Graphics::Panel manual page for more details.
Callbacks for the ``link'', ``title'', and ``link_target'' options have a slightly different call signature. They receive three arguments consisting of the feature, the Bio::Graphics::Panel object, and the Bio::Graphics::Glyph object corresponding to the current track within the panel:
    link = sub {
            my ($feature, $panel, $track) = @_;
            ... do something
        }
Ordinarily you will only need to use the feature object. The other arguments are useful to look up panel-specific settings such as the pixel width of the panel or the state of the ``flip'' setting:
    title = sub {
            my ($feature, $panel, $track) = @_;
            my $name = $feature->display_name;
            return $panel->flip ? "$name (flipped)" : $name;
        }
Named Subroutine References
---------------------------
If you use a version of BioPerl after April 15, 2003, you can also use references to named subroutines as option arguments. To use named subroutines, add an init_code section to the [GENERAL] section of the configuration file. init_code should contain nothing but subroutine definitions and other initialization routines. For example:
    init_code = sub score_color {
                  my $feature = shift;
                  if ($feature->score > 50) {
                    return 'red';
                  } else {
                    return 'green';
                  }
                }
                sub score_height {
                  my $feature = shift;
                  if ($feature->score > 50) {
                    return 10;
                  } else {
                    return 5;
                  }
                }
Then simply refer to these subroutines using the \&name syntax:
    [EST_ALIGNMENTS]
    glyph   = generic
    bgcolor = \&score_color
    height  = \&score_height
You can declare global variables in the init_code section if you use ``no strict 'vars';'' at the top of the section:
    init_code = no strict 'vars';
                $HEIGHT = 10;
                sub score_height {
                  my $feature = shift;
                  $HEIGHT++;
                  if ($feature->score > 50) {
                    return $HEIGHT*2;
                  } else {
                    return $HEIGHT;
                  }
                }
Due to the way the configuration file is parsed, there must be no empty lines in the init_code section. Either use comments to introduce white space, or ``use'' a .pm file to do anything fancy.
Subroutines that you define in the init_code section, as well as anonymous subroutines, will go into a package that changes unpredictably each time you load the page. If you need a predictable package name, you can define it this way:
    init_code = package My;
                sub score_height {
                  ....
                }

    [EST_ALIGNMENTS]
    height = \&My::score_height
The Bio::DB::GFF data model recognizes a single-level of ``grouping'' of features, but doesn't specify how to use the group information to correctly assemble the various individual components into a biological object. Aggregators are used to assemble this information. For example, let's say that you decide that your preferred ``transcript'' data model contains three subfeature types: a set of one or more features of method ``exon'', a single feature of method ``TSS'', and a single feature of method ``polyA''. Optionally, the data model could contain a single ``main subfeature'' that runs the length of the entire transcript. We might give this feature a method of ``primary_transc'' (for ``primary transcript.'')
In a GFF file, a three-exon transcript might be represented as follows:
    Chr1  confirmed  primary_transc  100  500  .  +  .  Transcript "ABC.1"
    Chr1  confirmed  TSS             100  100  .  +  .  Transcript "ABC.1"
    Chr1  confirmed  exon            100  200  .  +  .  Transcript "ABC.1"
    Chr1  confirmed  exon            250  300  .  +  .  Transcript "ABC.1"
    Chr1  confirmed  exon            400  500  .  +  .  Transcript "ABC.1"
    Chr1  confirmed  polyA           500  500  .  +  .  Transcript "ABC.1"
To aggregate this, you would like to create an aggregator named ``transcript'', whose ``main method'' is ``primary_transc'', and whose ``sub methods'' are ``TSS,'' ``exon,'' and ``polyA.''
The way to indicate this in the configuration file is to add a ``complex aggregator'' to the list of aggregators:
aggregator = transcript{TSS,exon,polyA/primary_transc}
The format of this value is ``aggregator_name{submethod1,submethod2,.../mainmethod}''.
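To make the format concrete, here is a small sketch that splits such a value into its three parts. It is an illustration of the syntax only, not the parser used by GBrowse or Bio::DB::GFF, and the parse_aggregator() helper is a made-up name.

```perl
use strict;
use warnings;

# Illustration only: split "name{sub1,sub2,.../main}" into its parts.
# This is not the parser that GBrowse or Bio::DB::GFF uses.
sub parse_aggregator {
    my ($spec) = @_;
    my ($name, $body) = $spec =~ /^(\w+)\{([^}]*)\}$/
        or return;
    my ($subs, $main) = split m{/}, $body, 2;
    my %parsed = (
        name        => $name,
        sub_methods => [ split /,/, $subs ],
        main_method => $main,    # undef when "/mainmethod" is left off
    );
    return \%parsed;
}

my $agg = parse_aggregator('transcript{TSS,exon,polyA/primary_transc}');
print "name: $agg->{name}\n";
print "subs: @{ $agg->{sub_methods} }\n";
print "main: $agg->{main_method}\n";
```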
You can now use the aggregator name as the argument of the ``feature'' option in a track section:
    [Transcripts]
    feature = transcript
    glyph   = segments
    bgcolor = wheat
    fgcolor = black
    height  = 10
    key     = Transcripts
If you do not have a main subfeature, leave off the ``/mainmethod''. For example:
aggregator = transcript{TSS,exon,polyA}
A few formatting notes. You are free to mix simple and complex aggregators in the ``aggregator'' option. For example, you can activate the standard ``clone'' and ``alignment'' aggregators as well as the new transcript aggregator with a line like this one:
aggregator = clone transcript{TSS,exon,polyA/primary_transc} alignment
If the complex aggregator contains whitespace or apostrophes, you must surround it with double-quotes, like this:
"transcript{TSS,5'UTR,3'UTR,exon,polyA/primary_transc}"
Be aware that some glyphs look for particular method names when rendering aggregated features. For example, the standard ``transcript'' glyph is closely tied to the ``transcript'' aggregator, and looks for submethods named ``intron'', ``exon'' and ``CDS'', and a main method named ``transcript.''
Here is the list of available predefined aggregators:
    alignment
    clone
    coding
    transcript
    none
    orf
    waba_alignment
    wormbase_gene
To view the documentation for any of these aggregators, run the command ``perldoc Bio::DB::GFF::Aggregator::aggregator_name'', where ``aggregator_name'' is the name of the aggregator.
gbrowse recognizes the concept of a ``group'' of related features that are connected by dotted lines. The canonical example is a pair of ESTs that are related by being from the two ends of the same cDNA clone. However, many feature databases, including the GFF database recommended for gbrowse, do not allow for arbitrary hierarchical grouping. To work around this, you may specify a feature name-based regular expression that will be used to trigger grouping.
It works like this. Say you are working with EST feature pairs and they follow the nomenclature 501283.5 and 501283.3, where the suffix is ``5'' or ``3'' depending on whether the read was from the 5' or 3' ends of the insert. To group these pairs by a dotted line, specify the ``group_pattern'' option in the appropriate track section:
group_pattern = /\.[53]$/
At render time, gbrowse will strip off this pattern from the names of all features in the EST track and group those that have a common base name. Hence 501283.5 and 501283.3 will be grouped together by a dotted line, because after the pattern is removed, they will share the same common name ``501283''.
This works for any embedded pattern, provided that stripping out the pattern results in related features sharing the same name. For example, if the convention were ``est.for.501283'' and ``est.rev.501283'', then this grouping pattern would have the desired effect:
group_pattern = /\.(for|rev)\./
Don't forget to escape regular expression meta-characters and to consider the various ways in which the regular expression might break. It is entirely possible to create an invalid regular expression, in which case gbrowse will crash until you comment out the offending option.
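One way to guard against a bad pattern is to try it on a handful of feature names before editing the configuration file. The sketch below imitates the strip-and-group rule described above and checks that a pattern compiles. Both helper names are invented for the example, and GBrowse's own grouping code may differ in detail.

```perl
use strict;
use warnings;

# Illustration of the grouping rule: delete the pattern from each name
# and bucket features by whatever remains. Not GBrowse's own code.
sub group_by_pattern {
    my ($pattern, @names) = @_;
    my %groups;
    for my $name (@names) {
        (my $base = $name) =~ s/$pattern//;
        push @{ $groups{$base} }, $name;
    }
    return \%groups;
}

# Check that a pattern compiles before putting it in the config file.
sub pattern_is_valid {
    my ($text) = @_;
    my $re = eval { qr/$text/ };
    return defined $re ? 1 : 0;
}

my $ests = group_by_pattern(qr/\.[53]$/, qw(501283.5 501283.3 777001.5));
for my $base (sort keys %$ests) {
    printf "%s: %s\n", $base, join(' ', @{ $ests->{$base} });
}

for my $candidate ('\.(for|rev)\.', '\.(for|rev\.') {
    printf "%-16s %s\n", $candidate,
        pattern_is_valid($candidate) ? 'ok' : 'invalid';
}
```

The second candidate is missing a closing parenthesis, which is the kind of mistake that would otherwise only show up when the track is rendered.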
If a track definition's ``link'' option (see section B2) is set to AUTO, the gbrowse_details script will be invoked when the user clicks on a feature contained within the track. This will generate a simple table of all feature information available in the database. This includes the user-defined tag/value attributes set in Column 9 of the GFF for that feature.
You can control, to some extent, the formatting of the tag value table by providing a configuration stanza with the following format:
    [feature_type:details]
    tag1 = formatting rule
    tag2 = formatting rule
    tag3 = formatting rule
``feature_type'' is the type of the feature you wish to control. For example, ``gene:sgd'' or simply ``gene''. You may also specify a feature_type of ``default'' to control the formatting for all features. ``tag1'', ``tag2'' and so forth are the tags that you wish to control the formatting of. The tags ``Name,'' ``Class'', ``Type'', ``Source'', ``Position'', and ``Length'' are valid for all features, while ``Target'' and ``Matches'' are valid for all features that have a target alignment. In addition, you can use the names of any attributes that you have defined. Tag names are NOT case sensitive, and you may use a tag named ``default'' to define a formatting rule that is general to all tags (more specific formatting rules will override less specific ones).
A formatting rule can be a string with (possible) substitution values, or a callback. If a string, it can contain one or more of the substitution variables ``$name'', ``$start'', ``$end'', ``$stop'', ``$strand'', ``$method'', ``$type'', ``$description'' and ``$class'', which are replaced with the corresponding values from the current feature. In addition, the substitution variable ``$value'' is replaced with the current value of the attribute, and the variable ``$tag'' is replaced with the current tag (attribute) name. HTML characters are passed through.
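The effect of the substitution can be previewed with a short script. This is a sketch of the idea only: expand_rule() is a hypothetical helper, the tag and value are invented, and the real gbrowse_details script may handle edge cases differently.

```perl
use strict;
use warnings;

# Illustration only: expand $name-style variables in a formatting rule.
# %vars holds made-up values for a single tag of a single feature.
sub expand_rule {
    my ($rule, %vars) = @_;
    # Replace $word with its value; leave unknown variables untouched.
    $rule =~ s/\$(\w+)/(exists $vars{$1}) ? $vars{$1} : "\$$1"/ge;
    return $rule;
}

my $rule = '<b>$tag</b>: <a href="http://www.google.com/search?q=$value">$value</a>';
print expand_rule($rule, tag => 'Note', value => 'kinase'), "\n";
```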
For example, here is a simple way to boldface the Type field, italicize the Length field, and turn the Notes into a Google search:
    [gene:details]
    Type   = <b>$value</b>
    Length = <i>$value</i>
    Note   = <a href="http://www.google.com/search?q=$value">$value</a>
If you provide a callback, the callback subroutine will be invoked with three arguments. WARNING: the three arguments are different from the ones passed to other callbacks, and consist of the tag value, the tag name, and the current feature:
    Note = sub {
            my ($value, $tag_name, $feature) = @_;
            do something....
        }
You can use this feature to format sequence attributes nicely. For example, if your features have a Translation attribute which contains their protein translations, then you are probably unsatisfied with the default formatting of these features. You can modify this with a callback that word-wraps the value into lines of at most 60 characters, and puts the whole thing in a <pre> section.
    [gene:details]
    Translation = sub {
            my $value = shift;
            $value =~ s/(\S{1,60})/$1\n/g;
            "<pre>$value</pre>";
        }
The formatting rule mechanism described in the previous section is the recommended way of creating a link out from the gbrowse_details page. However, an older mechanism is available for backward compatibility.
To use this legacy mechanism, create a stanza header named [TagName:DETAILS], where TagName is the name of the tag (attribute name) whose values you wish to turn into URLs, and where DETAILS must be spelled with capital letters. Put the option ``URL'' inside this stanza, containing a string to be transformed into the URL.
For example, to link to a local cgi script from the following GFF line:
IV curated exon 518 550 . + . Transcript B0273.1; local_id 11723
one might add the following stanza to the configuration file:
    [local_id:DETAILS]
    URL = http://localhost/cgi-bin/localLookup.cgi?tag=$tag;id=$value
The URL option's value should be a URL containing one or more variables. Variables begin with a dollar sign ($), and are replaced at run time with the information relating to the selected feature attribute. Recognized variables are:
    $tag     The "tag" of the tag/value pair
    $value   The "value" of the tag/value pair
The value of URL can also be an anonymous subroutine, in which case the subroutine will be invoked with a two-element argument list consisting of the name of the tag and its value. This example, provided by Cyril Pommier, will convert Dbxref tags into links to NCBI, provided that the value of the tag looks like an NCBI GI number:
    [Dbxref:DETAILS]
    URL = sub {
            my ($tag, $value) = @_;
            if ($value =~ /NCBI_gi:(.+)/) {
                return "http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi?term=$1";
            }
            return;
        }
With a little bit of additional effort, you can set one or more tracks up to display a density histogram of the features contained within the track. For example, the human data source in GBrowse demo (http://www.wormbase.org/db/seq/gbrowse/human) uses density histograms in the chromosomal overview. In addition, when the features in the SNP track become too dense to view, this track converts into a histogram. To see this in action, turn on the SNP track and then zoom out beyond 150K.
There are four steps for making histograms:
The first step is to generate the density data. Currently this is done by generating a GFF file containing a set of ``bin'' feature types. Use the bp_generate_histogram.pl script to do this. You will find it in bioperl under the scripts/Bio-DB-GFF directory.
Assuming that your database is named ``dicty'', you have a feature named SNP, and you wish to generate a density distribution across 10,000 bp bins, here is the command you would use:
bp_generate_histogram.pl -merge -d dicty -bin 10000 SNP >snp_density.gff
This is saying to use the ``dicty'' database (the -d option), to use 10,000 bp bins (the -bin option) and to count the occurrences of the SNP feature throughout the database. In addition, the -merge option says to merge all types of SNPs into a single bin. Otherwise they will be stratified by their source. The resulting GFF file contains a series of entries like these:
    Chr1  SNP  bin  1      10000  49  +  .  bin Chr1:SNP
    Chr1  SNP  bin  10001  20000  29  +  .  bin Chr1:SNP
What this is saying is that there are now a series of pseudo-features of type ``bin:SNP'' that occupy successive 10,000 bp regions of the genome. The score field contains the number of times a SNP was seen in that bin.
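To see where the scores come from, the following sketch counts a handful of invented SNP positions into 10,000 bp bins and prints lines in the same layout. It illustrates the binning arithmetic only and makes no claim about how bp_generate_histogram.pl is implemented.

```perl
use strict;
use warnings;

# Illustration only: count feature start positions in fixed-width bins
# and print GFF-style "bin" lines. The positions are invented, and this
# is not how bp_generate_histogram.pl is implemented.
my $bin_size   = 10_000;
my @snp_starts = (12, 4_500, 9_999, 10_001, 15_000, 27_300);

my %count;
for my $pos (@snp_starts) {
    my $bin = int(($pos - 1) / $bin_size);   # 1-based coordinates
    $count{$bin}++;
}

for my $bin (sort { $a <=> $b } keys %count) {
    my $start = $bin * $bin_size + 1;
    my $end   = ($bin + 1) * $bin_size;
    print join("\t", 'Chr1', 'SNP', 'bin', $start, $end,
               $count{$bin}, '+', '.', 'bin Chr1:SNP'), "\n";
}
```

Subtracting 1 before dividing keeps position 10,000 in the first bin (1 to 10000), matching the bin boundaries in the GFF lines above.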
You'll now load this file using load_gff.pl or fast_load_gff.pl:
load_gff.pl -d dicty snp_density.gff
The next step is to tell GBrowse how to use this information. You do this by creating a new aggregator for the SNP density information. Open the GBrowse configuration file and find the aggregators option. Add a new aggregator that looks like this:
aggregators = snp_density{bin:SNP}
This is declaring a new feature named ``snp_density'' that is composed of subparts of type bin:SNP.
The last step is to declare a track for the density information. You will use the ``xyplot'' glyph, which can draw a number of graphs, including histograms. To add the SNP density information as a static track in the overview, create a section like this one:
    [SNP:overview]
    feature    = snp_density
    glyph      = xyplot
    graph_type = boxes
    scale      = right
    bgcolor    = red
    fgcolor    = red
    height     = 20
    key        = SNP Density
This is declaring a new constant track in the overview named ``SNP Density.'' The feature is ``snp_density'', corresponding to the aggregator declared earlier. The glyph is ``xyplot'' using the graph type of ``boxes'' to generate a column graph.
To set up a track so that the histogram appears when the user zooms out beyond 100,000 bp but shows the detailed information at higher magnifications, generate two track sections like these:
    [SNPs]
    feature = snp
    glyph   = triangle
    point   = 1
    orient  = N
    height  = 6
    bgcolor = blue
    fgcolor = blue
    key     = SNPs

    [SNPs:100000]
    feature    = snp_density
    glyph      = xyplot
    graph_type = boxes
    scale      = right
The first track section sets up the defaults for the SNP track. SNPs are represented as blue triangles pointing North. The second track declaration declares that when the user zooms out to over 100K base pairs, GBrowse should display the snp_density feature using the xyplot glyph.
GBrowse is partially internationalized. End-users whose browsers are set to request a non-English language will see the GBrowse main and secondary screens in their preferred language, provided that GBrowse has the appropriate translation file.
Translation files are located in gbrowse.conf/languages/ and use the standard two-letter language abbreviations, such as ``fr'' for French, as well as the regional abbreviations, such as fr-CA for Canadian French. Currently there are translation files for French, Italian, and Japanese. If your favorite language isn't supported, you are encouraged to create a new translation file and contribute it to the GBrowse development effort. Please contact Lincoln Stein (lstein@cshl.org) for help in doing this.
If the end user does not specify a preferred language, GBrowse will default to ``en'' (English). You can change this by placing a ``language'' option in the configuration file somewhere inside the [GENERAL] section. For example, to make Japanese the default, create this entry:
language = ja
GBrowse will still use the end-user's preferred language in preference to the default if the preferred language is available.
Although GBrowse automatically changes the text and button language, it can't automatically translate the track labels. If you would like the track labels to localize, you will have to provide your own translations in the ``key'', ``citation'' and ``category'' options. The syntax is similar to that used for semantic zooming:
    [gene]
    glyph       = transcript
    feature     = transcript:curated
    height      = 10
    key         = Named Gene
    key:fr      = Gènes Nommés
    key:it      = I Geni dati un nome a
    key:es      = Los Genes denominados
    category    = Genes
    category:fr = Gènes
The option is followed by a colon and the two-letter language name to indicate that when the page is being displayed with this language, to use the indicated value of the option. The option without the colon is the default. You may enter accented and umlauted characters directly, as shown, or use the HTML entities. Non-English character sets, such as Japanese, should also work correctly, provided that the translation file indicates the correct character set to use.
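The fallback rule can be pictured with a short sketch. The %stanza hash simply copies a few lines of the [gene] stanza, and localized() is a hypothetical helper; GBrowse's own lookup code is not shown here.

```perl
use strict;
use warnings;

# Illustration only: choose "option:lang" when present, else the plain
# option. The hash mirrors a few lines of a [gene] stanza.
my %stanza = (
    'key'         => 'Named Gene',
    'key:fr'      => 'Gènes Nommés',
    'category'    => 'Genes',
    'category:fr' => 'Gènes',
);

sub localized {
    my ($option, $lang) = @_;
    my $specific = join ':', $option, $lang;
    return exists $stanza{$specific} ? $stanza{$specific} : $stanza{$option};
}

print localized('key', 'fr'), "\n";
print localized('key', 'ja'), "\n";   # no key:ja, falls back to the default
```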
HELP FILES:
The GBrowse help files are in English. Although there is support for internationalizing the help files, no one has done this yet. If you are industrious and wish to translate the help files into your favorite language, find the two help files where they are located in htdocs/gbrowse/. One is named general_help.html, while the other is named annotation_help.html. Translate them, and create new files with the language code appended as a suffix. For example, the French translation of annotation_help.html would be annotation_help.html.fr.
LIMITATIONS:
- Number formatting is not localized. For example, GBrowse will print large numbers using commas (e.g. 1,234,567) instead of periods, even when talking to a European browser.
- Although the HTML frame around the GBrowse genome image will use the appropriate character set, the overview and detail images themselves are limited to Latin alphabets. This is because of limited native character support in the GD library used by GBrowse. When a non-Latin character set is called for, such as Japanese, GBrowse will use Japanese for the frame, but English for the image.
- The rate at which the GBrowse team adds new features to the browser often outstrips the ability of volunteers to update the translation files. This means that new buttons and fields may be displayed in English on an otherwise correctly internationalized page.
You can restrict who has access to gbrowse by IP address, host name, domain or username and password. Restriction can apply to the database as a whole, or to particular annotation tracks.
To limit access to a whole database, you can use Apache's standard authentication and authorization. Gbrowse uses a URL of this form to select which database it is set to:
http://your.host/cgi-bin/gbrowse/your_database
where ``your_database'' is the name of the currently selected database. For example, the yeast database is http://your.host/cgi-bin/gbrowse/yeast.
To control access to the entire database, create a <Location> section in httpd.conf. The <Location> section should look like this:
    <Location /cgi-bin/gbrowse/your_database>
      Order deny,allow
      deny from all
      allow from localhost .cshl.edu .ebi.ac.uk
    </Location>
This denies access to everybody except for ``localhost'' and browsers from the domains .cshl.edu and .ebi.ac.uk. You can also limit by IP address, by username and password or by combinations of these techniques. See http://httpd.apache.org/docs/howto/auth.html for the full details.
You can also limit individual tracks to certain individuals or organizations. Unless the stated requirements are met, the track will not appear on the main screen or any of the configuration screens. To set this up, add a ``restrict'' option to the track you wish to make off-limits:
    [PROPRIETARY]
    feature  = etc
    glyph    = etc
    restrict = Order deny,allow
               deny from all
               allow from localhost .cshl.edu .ebi.ac.uk
The value of the restrict option is identical to the Apache authorization directives and can include any of the directives ``Order,'' ``Satisfy,'' ``deny from,'' ``allow from,'' ``require valid-user'' or ``require user.'' The only difference is that the ``require group'' directive is not supported, since the location of Apache's group file is not passed to CGI scripts. Note that username/password authentication must be turned on in httpd.conf and the user must have successfully authenticated himself in order for the username to be available.
As with other gbrowse options, restrict can be a code subroutine. The subroutine will be called with three arguments consisting of the host, ip address and authenticated user. It should return a true value to allow access to the track, or a false value to forbid it. This can be used to implement group-based authorization or more complex schemes.
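As a minimal sketch of group-based authorization, the callback below allows a track only for users listed in a small table. The %curators table and the user names are hypothetical; a real site would probably load its group membership from a file or database.

```perl
use strict;
use warnings;

# Hypothetical group table; a real site would load this from elsewhere.
my %curators = map { $_ => 1 } qw(alice bob);

# Shape of a "restrict" callback: host, IP address, authenticated user.
my $restrict = sub {
    my ($host, $ip, $user) = @_;
    return 0 unless defined $user;       # not authenticated
    return $curators{$user} ? 1 : 0;     # allow group members only
};

for my $user ('alice', 'mallory', undef) {
    my $verdict = $restrict->('example.org', '192.0.2.10', $user) ? 'allow' : 'deny';
    printf "%-8s %s\n", (defined $user ? $user : '(none)'), $verdict;
}
```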
Here is an example that uses the Text::GenderFromName to allow access if the user's name sounds female and forbids access if the name sounds male. (It might be useful for an X-chromosome annotation site.)
    restrict = sub {
            my ($host, $ip, $user) = @_;
            return unless defined $user;
            use Text::GenderFromName qw(gender);
            return gender($user) eq 'f';
        }
You should be aware that the username will only be defined if username authentication is turned on and the user has successfully authenticated himself against Apache's user database using the correct password. In addition, the hostname will only be defined if HostnameLookups has been turned on in httpd.conf. If it has not, you can convert the IP address into a hostname yourself using this piece of code:
    use Socket;
    $host = gethostbyaddr(inet_aton($ip), AF_INET);
Note that this may slow down the response time of gbrowse noticeably if you have a slow DNS name server.
Another thing to be aware of when restricting access to an entire database is that even though the database itself will not be accessible to unauthorized users, the name of the database will still be available from the popup ``Data Source'' menu. If you wish even the name to be suppressed from view by unauthorized users, add the following line to the [GENERAL] section of the configuration file of the database you wish to suppress:
restrict = require valid-user
The syntax described earlier for restricting access to tracks by hostname, IP address or username holds true for restricting the visibility of the database on the Data Source popup menu.
GBrowse can be tweaked to make it more suitable for displaying genetic and radiation hybrid maps.
The main issue is that the Bio::DB::GFF database expects coordinates to be positive integers, not fractions, but genetic and RH maps use floating point numbers. Working around this is a bit of an ugly hack. Before loading your data you must multiply all your coordinates by a constant power of 10 in order to convert them into integers. For example, if a genetic map uses Morgan units ranging from 0 to 1.80, you would multiply by 100 to create a map ranging from 0 to 180.
Create a GFF file containing the markers in modified coordinates and load it as usual. Now you must tell GBrowse to reverse these changes. Enter the following options into the [GENERAL] section of the configuration file:
    units        = M
    unit_divider = 100
These two options tell GBrowse to use ``M'' (Morgan) units, and to divide all coordinates by 100. GBrowse will automatically display the scale using the most appropriate units, so the displayed map will typically be drawn using cM units.
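The round trip can be checked with a few lines of Perl. This is a sketch with invented marker positions: the scaling would normally be applied while you prepare the GFF file, and the division is what GBrowse performs for you at display time.

```perl
use strict;
use warnings;

# Illustration only: scale map positions in Morgans to integers before
# loading, then divide by the same factor for display.
my $unit_divider = 100;
my @morgans      = (0, 0.25, 1.07, 1.80);

# Round rather than truncate, since products such as 1.15 * 100 can
# fall just below the intended integer in floating point.
my @stored = map { sprintf "%.0f", $_ * $unit_divider } @morgans;
my @shown  = map { $_ / $unit_divider } @stored;

print "stored: @stored\n";
print "shown:  @shown\n";
```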
If you wish to change the location of the gbrowse.conf configuration file directory, you must manually edit the gbrowse CGI script. Open the script in a text editor, and find this section:
    ###################################################################
    # Non-modperl users should change this variable if needed to point
    # to the directory in which the configuration files are stored.
    #
    use constant CONF_DIR => '/usr/local/apache/conf/gbrowse.conf';
    #
    ###################################################################
Change the definition of CONF_DIR to the desired location of the configuration files.
An alternative, for users of mod_perl only, is to add the GBrowseConf per-directory variable to the configuration for the directory in which the gbrowse script lives. This variable overrides the CONF_DIR value. For example:
    <Directory /usr/local/apache/cgi-perl>
      SetHandler perl-script
      PerlHandler Apache::Registry
      PerlSendHeader On
      Options +ExecCGI
      PerlSetVar GBrowseConf /etc/gbrowse.conf
    </Directory>
You may insert features from a DAS source into any named track. Create a stanza as usual but instead of specifying the feature type using the ``feature'' option, give the desired DAS URL using the ``remote feature'' option:
remote feature = http://dev.hapmap.org/cgi-perl/das/t2d_testing?type=ldblock
Because DAS sources specify the glyph and visualization options, most of the settings such as bgcolor will be ignored. However, the track key and citation options are honored.
You can use the same syntax to load a GFF file or a feature file in Gbrowse upload format into a track. Just provide a URL that returns the desired data.
You can also run GBrowse entirely off a single DAS source. To get this support, you must use Bio::Das version 0.90 or higher, available from http://www.biodas.org.
A sample [GENERAL] configuration section looks like this:
    [GENERAL]
    description = Das Example Database (dicty)
    db_adaptor  = Bio::Das
    db_args     = -source http://www.biodas.org/cgi-bin/das
                  -dsn dicty
The db_adaptor option must be set to ``Bio::Das''. The db_args option must contain a -source pointing to the base of the remote DAS server, and a -dsn pointing to the name of the annotation database.
The remainder of the configuration file should be configured as described earlier. The following short script will return a list of the feature types known to the remote DAS server. You can use the output of this script as the basis for the tracks to configure.
    #!/usr/bin/perl
    use strict;
    use Bio::Das;
    my $db = Bio::Das->new('http://localhost/cgi-bin/das' => 'dicty');
    print join "\n", $db->types;
Limitations:
The DAS implementation does not descend into subcomponents. For example, if the user requests features on a chromosome, but the remote DAS server has annotated genes using contig coordinates, then the genes will not appear on the chromosome.
The gbrowse_details script does not provide useful information because the DAS/1 protocol does not provide a way to retrieve attribute information on a named feature.
The BioMOBY project aims to design and deploy platforms that enable and simplify biological database interoperability.
To date, the MOBY-Services (MOBY-S) branch of the BioMOBY project has published a fairly stable API that is now being used by data providers worldwide to publish their data in an interoperable manner. A simple MOBY browser has been written for Gbrowse that allows the end-user to ``surf'' out of their Gbrowse view and begin exploring data related to the genomic features displayed in Gbrowse.
Configuration of the gbrowse_moby script does, at this time, require some VERY simple code-editing, and small modifications to your XX.organism.conf configuration file. These are described in detail below:
[ORIGIN]
link        = http://yoursite.com/cgi-bin/gbrowse_moby?source=$source&name=$name&class=$class&method=$method&ref=$ref&description=$description
feature     = origin:Sequence
glyph       = anchored_arrow
fgcolor     = orange
font2color  = red
linewidth   = 2
height      = 10
description = 1
key         = Definition line
link_target = _MOBY
AND/OR
[db_xref:DETAILS]
URL = http://yoursite.com/cgi-bin/gbrowse_moby?namespace=$tag;id=$value
Note that all you are doing in each case is to associate a mouse click on a particular feature type with an invocation of the gbrowse_moby script, passing a few of the common Gbrowse variables in the GET string.
The gbrowse_moby script will take information passed from a click on a Gbrowse feature, or a click on a configured DETAILS GFF attribute type, and initiate a MOBY browsing session with information from that link. Most information is discarded. The only useful information to MOBY is a ``namespace'' and an ``id'' within that namespace.
Generally speaking, namespaces in Gbrowse will have to be mapped to a namespace in the MOBY namespace ontology (which is derived from the Gene Ontology Database Cross-Reference Abbreviations list). Currently, this requires editing of the gbrowse_moby code, where a Perl hash named %source2namespace maps the GFF source (column 2) to a MOBY namespace:
$source2namespace{$source} = moby_namespace
cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/moby login
When prompted for a password, type ``cvs''.
cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/moby co moby-live
cvs update -dP
You will then need to enter the moby-live/Perl folder and run ``perl Makefile.PL; make; make install'' to install the MOBY libraries into your system.
* $source - converted into a MOBY namespace by parsing the 'source' GFF tag against the %source2namespace hash (see more detailed explanation in the examples below)
  $namespace - used verbatim as a valid MOBY namespace
* $name - used verbatim as a MOBY id interpreted in the namespace
* $id - used verbatim as a MOBY id interpreted in the namespace
* $class - this is the GFF column 9 class; used for the page title
  $objectclass - this should be a MOBY Class ontology term (becomes Class 'Object' by default, and this is usually correct)
  $object - contains the raw XML of a valid MOBY object
Note that you MUST at least pass a namespace-type variable (source/namespace) and an id-type variable (name/id) in order to have a successful MOBY call.
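The rule above can be sketched outside of gbrowse_moby. The following Python fragment is only an illustration and is not the script's actual code (gbrowse_moby is written in Perl). The function name resolve_moby_target is hypothetical, and the precedence given to ``namespace'' over ``source'' and to ``id'' over ``name'' when both are passed is an assumption made for the example.

```python
# Illustrative sketch only; gbrowse_moby itself is a Perl CGI script.
# SOURCE2NAMESPACE plays the role of the %source2namespace hash.
SOURCE2NAMESPACE = {
    "Genbank": "NCBI_Acc",  # GFF source (column 2) -> MOBY namespace
}


def resolve_moby_target(params, source2namespace=SOURCE2NAMESPACE):
    """Return (namespace, id) for a MOBY call, or raise ValueError."""
    # Namespace-type variable: 'namespace' is used verbatim, while
    # 'source' is translated through the mapping.
    # Assumption: 'namespace' wins if both are present.
    if params.get("namespace"):
        namespace = params["namespace"]
    elif params.get("source"):
        source = params["source"]
        if source not in source2namespace:
            raise ValueError("no MOBY namespace mapped for source %r" % source)
        namespace = source2namespace[source]
    else:
        raise ValueError("need a namespace-type variable (source or namespace)")

    # Id-type variable: 'name' and 'id' are both used verbatim.
    # Assumption: 'id' wins if both are present.
    moby_id = params.get("id") or params.get("name")
    if not moby_id:
        raise ValueError("need an id-type variable (name or id)")
    return namespace, moby_id
```

A call that passes only a name, with neither source nor namespace, is rejected. This corresponds to the requirement that both kinds of variable be present.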
A22344 Genbank origin 1000 2000 87 + .
You would set your configuration file as follows:
[ORIGIN]
link    = http://yoursite.com/cgi-bin/gbrowse_moby?source=$source&name=$name&class=$class
feature = origin:Genbank
and you would edit the gbrowse_moby script as follows:
my %source2namespace = (
    # GFF-source    MOBY-namespace
    'Genbank'    => 'NCBI_Acc',
);
This maps the GFF source tag ``Genbank'' to the MOBY namespace ``NCBI_Acc''.
A22344 Genbank origin 1000 2000 87 + . Locus CDC23
You would set your configuration file as follows:
[ORIGIN]
link    = http://yoursite.com/cgi-bin/gbrowse_moby?source=$source&name=$name&class=$class
feature = origin:Genbank
and you might also set a DETAILS call to handle the Locus Xref: (notice that we use the 'source' tag to force a translation of the foreign namespace into a MOBY namespace)
[db_xref:DETAILS]
URL = http://brie4.cshl.org:9320/cgi-bin/gbrowse_moby?source=$tag;id=$value
then to handle the mapping of Locus to YDB_Locus as well as the Genbank GFF source tag you would edit the source2namespace hash in gbrowse_moby to read:
my %source2namespace = (
    # GFF-source    MOBY-namespace
    'Genbank'    => 'NCBI_Acc',
    'Locus'      => 'YDB_Locus',
);
A22344 Genbank origin 1000 2000 87 + . NCBI_gi 118746
You would set your configuration file as follows:
[ORIGIN]
link    = http://yoursite.com/cgi-bin/gbrowse_moby?source=$source&name=$name&class=$class
feature = origin:Genbank
and you might also set a DETAILS call to handle the NCBI_gi Xref: (notice that we now use the 'namespace' tag to indicate that the tag is already a valid MOBY namespace)
[db_xref:DETAILS]
URL = http://brie4.cshl.org:9320/cgi-bin/gbrowse_moby?namespace=$tag;id=$value
Since there is no need to map the namespace portion, we now only need to handle the Genbank GFF source as before:
my %source2namespace = (
    # GFF-source    MOBY-namespace
    'Genbank'    => 'NCBI_Acc',
);
-The full listing of valid MOBY namespaces is available at:
http://mobycentral.cbr.nrc.ca/cgi-bin/types/Namespaces
-A useful mapping to make is to put the organism name into the Global_Keyword namespace. This will trigger discovery of MedLine searches for papers about that organism.
A selection of services are distributed with the Gbrowse package that will allow you to serve your underlying data using the BioMOBY Services architecture.
To enable these, simply do the following:
-The full listing of valid MOBY namespaces is available at:
http://mobycentral.cbr.nrc.ca/cgi-bin/types/Namespaces
Locus = TAIR_Locus
to this section of the config file. This will allow people who have TAIR_Locus identifiers in-hand to discover your service and request information about that locus from your database.
You may add as many Namespace->Class mappings as you wish; one per line.
perl register_moby_services.pl -register
As services are registered they will be added to a file: registeredMOBYServices.dat. This file is used to de-register your services if you wish to do so. To deregister, simply run:
perl register_moby_services.pl -clean
If your .dat file is not available, cleaning your services will be unsuccessful.
GBrowse provides a method to filter the contents of individual tracks based on information that can be obtained from feature attributes. For example, suppose you have run a BLAST search and added all hits as similarity features on an entry. In GBrowse, all those features can make the track crowded, so the administrator may decide to show only the top 5 BLAST hits. This is easily accomplished by adding the filter option to the configuration file. It might look like this:
[BLAST]
feature = blast
glyph   = segments
filter  = sub {
            my $feat = shift;
            (my $rank) = $feat->get_tag_values('rank');  # persistent Bio::SeqFeature::Generic features
            #(my $rank) = $feat->attributes('rank');     # Bio::DB::GFF::Feature
            $rank < 6;
          }
Another useful example is to show features coming from a plain GenBank file. When loaded into BioSQL the source becomes 'EMBL/GenBank/SwissProt'. Using the Bio::DB::Das::BioSQL adaptor you have to pass the source to the feature option. Because all the features then share the same source string, it can be difficult to tell them apart. This problem can be solved using the filter option. In the following example the features are distinguished by their primary_tag:
[REGION]
feature = EMBL/GenBank/SwissProt
filter  = sub {
            my $feat = shift;
            $feat->primary_tag =~ /region/i;
          }
key     = RefSeq Protein Domains

[SIGPEPTIDE]
feature = EMBL/GenBank/SwissProt
filter  = sub {
            my $feat = shift;
            $feat->primary_tag =~ /sig_peptide/i;
          }
key     = RefSeq Signal Peptide
This section describes the public CGI parameters recognized by GBrowse. By setting the parameters in the URL, you can get gbrowse to do various useful things:
http://www.your.site/cgi-bin/gbrowse/volvox
http://www.your.site/cgi-bin/gbrowse/yeast
http://www.your.site/cgi-bin/gbrowse/my_testing_database
These will correspond to config files named volvox.conf, yeast.conf and my_testing_database.conf respectively.
As noted earlier, you can place numbers in front of the configuration file names in order to adjust the order in which they appear in the data source menu.
NOTE: For obscure reasons involving Internet Explorer compatibility, gbrowse will add an extra slash to the end of the URL, resulting in URLs that look like:
http://www.your.site/cgi-bin/gbrowse/yeast/?q=NAB2
Don't worry about this. The URL works the same with and without the terminal slash.
http://www.your.site/cgi-bin/gbrowse/yeast?q=NAB2
This will have the same effect as typing ``NAB2'' into the gbrowse search box.
To go immediately to the multiple hits page (which shows hits on several overview panels), use multiple q arguments:
http://www.your.site/cgi-bin/gbrowse/yeast?q=NAB2;q=NPY1
Alternatively, you can use a single q parameter and separate each landmark name with a dash:
http://www.your.site/cgi-bin/gbrowse/yeast?q=NAB2-NPY1
The rules for specifying relative offsets and object classes are the same as in the main search field:
http://www.your.site/cgi-bin/gbrowse/yeast?q=Gene:NAB2:1..5000
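If you generate such links from a program, the three forms above can be produced by one small helper. This is a minimal sketch in Python (any language with a URL-escaping routine would do). The function name search_url and its single_q flag are hypothetical and are not part of GBrowse.

```python
from urllib.parse import quote


def search_url(base, landmarks, single_q=False):
    """Build a gbrowse search URL for one or more landmark names."""
    # Leave ':' unescaped so that class and range syntax such as
    # Gene:NAB2:1..5000 stays readable; '.' and '-' are never escaped.
    quoted = [quote(name, safe=":") for name in landmarks]
    if single_q:
        # One q parameter, landmark names separated by dashes.
        return base + "?q=" + "-".join(quoted)
    # One q parameter per landmark, separated by semicolons.
    return base + "?" + ";".join("q=" + q for q in quoted)
```

Note that the dash-separated form is ambiguous for landmark names that themselves contain a dash. In that case the multiple-q form is the safer choice.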
The ``end'' argument is a synonym for ``stop''.
http://www.your.site/cgi-bin/gbrowse/yeast?label=ORFs-tRNAs
To use the ``+'' character you may have to URL escape it:
http://www.your.site/cgi-bin/gbrowse/yeast?label=ORFs%2BtRNAs
All tracks not explicitly given by the label parameter will be closed (disabled).
http://www.your.site/cgi-bin/gbrowse/yeast?enable=ORFs-tRNAs
http://www.your.site/cgi-bin/gbrowse/yeast?disable=ORFs-tRNAs
When modifying track state, the ``label'' parameter is processed first, followed by the ``enable'' parameter and the ``disable'' parameter.
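The processing order matters when several of these parameters appear in the same URL. The following Python fragment is a sketch of the order described above, not GBrowse's own code. The function name apply_track_params and the track names ``Genes'' and ``CDS'' used in the usage note are hypothetical, and silently ignoring unknown track names is an assumption.

```python
def apply_track_params(all_tracks, enabled, label=None, enable=None, disable=None):
    """Return the set of open tracks after applying the URL parameters."""

    def split(value):
        # Track names within one argument are separated by dashes,
        # as in the example URLs above.
        return [name for name in value.split("-") if name]

    enabled = set(enabled)
    if label is not None:
        # 'label' is processed first: tracks not listed are closed.
        enabled = set(split(label))
    if enable is not None:
        # 'enable' then opens additional tracks.
        enabled |= set(split(enable))
    if disable is not None:
        # 'disable' is applied last, so it overrides the other two.
        enabled -= set(split(disable))
    # Assumption: names that are not configured tracks are ignored.
    return enabled & set(all_tracks)
```

For example, with tracks ORFs, tRNAs, Genes and CDS configured, passing label=ORFs-tRNAs together with enable=CDS and disable=ORFs leaves tRNAs and CDS open.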
h_feat=SKT5@blue
You may omit ``@color'', in which case the highlight will default to yellow. You can specify multiple h_feat arguments in order to highlight several features with distinct colors.
Passing an argument of h_feat=_clear_ will clear all feature highlighting.
h_region=Chr3:200000..250000@wheat
You may omit ``@color'' in which case the highlight will default to lightgrey. You can specify multiple h_region arguments in order to highlight multiple sequence ranges with different colors.
Passing an argument of h_region=_clear_ will clear all region highlighting.
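The ``target@color'' convention shared by h_feat and h_region can be illustrated with a short parser. This is a Python sketch of the rules as stated above, not the code GBrowse uses. The function name parse_highlight is hypothetical.

```python
def parse_highlight(arg, default_color):
    """Split 'target@color' into (target, color); None means 'clear all'."""
    if arg == "_clear_":
        # Special value: the caller should drop all highlights of this kind.
        return None
    target, sep, color = arg.rpartition("@")
    if not sep:
        # No '@color' part was given, so fall back to the default.
        return arg, default_color
    return target, color or default_color


# Defaults as documented: yellow for features, lightgrey for regions.
feature = parse_highlight("SKT5@blue", "yellow")
region = parse_highlight("Chr3:200000..250000", "lightgrey")
```

Here feature is the pair (SKT5, blue), and region falls back to lightgrey because no color was given.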
"general"     open the general help page
"citations"   open up the track description & citation page
"link_image"  open the page that describes how to generate an embedded image of the current view
"svg_image"   the page that describes how to generate SVGs
The ``id'' argument is used to associate the upload with a session. Pick some long, hard-to-guess number. This will be associated stably with the uploaded file(s). To see the upload information, provide the same number in the ``id'' argument every time you access gbrowse.
Each plugin may have its own set of URL arguments. A plugin's arguments are preceded by the plugin's name. For example, the FastaDumper plugin has a parameter named ``format'' which controls the output format. So to invoke this plugin and make the output plain text, one would provide the arguments:
http://www.your.site/cgi-bin/gbrowse/yeast?q=NUT21;plugin=FastaDumper;plugin_do=Go;FastaDumper.format=text
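The naming rule (plugin name, a dot, then the parameter name) is easy to apply mechanically. The following Python fragment is a sketch under that rule. The helper plugin_url is hypothetical, and which parameters a given plugin actually accepts still has to be checked in that plugin's source.

```python
from urllib.parse import quote


def plugin_url(base, landmark, plugin, action="Go", **plugin_args):
    """Build a URL that invokes a plugin on the given landmark."""
    parts = [
        "q=" + quote(landmark, safe=""),
        "plugin=" + quote(plugin, safe=""),
        "plugin_do=" + quote(action, safe=""),
    ]
    # Each plugin-specific argument is prefixed with the plugin's name.
    for key, value in sorted(plugin_args.items()):
        parts.append("%s.%s=%s" % (plugin, key, quote(str(value), safe="")))
    return base + "?" + ";".join(parts)


url = plugin_url("http://www.your.site/cgi-bin/gbrowse/yeast",
                 "NUT21", "FastaDumper", format="text")
```

With these arguments the helper reproduces the example URL shown above.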
Plugins tend not to be well documented, so you may have to read through the source code to figure out their arguments.
For further information, bug reports, etc, please consult the mailing lists at www.gmod.org. The main mailing list for gbrowse support is gmod-gbrowse@lists.sourceforge.net.
Have fun!
Lincoln Stein & the GMOD development team lstein@cshl.edu