Support for the chado database schema

The chado schema is a comprehensive database schema developed largely by developers at UC Berkeley and Harvard working on FlyBase. It is intended to be a generic database schema for model organism use. Its use with GBrowse is supported via a limited implementation of the Das interface from BioPerl: I implemented only the parts of the interface that I needed and nothing more.

The chado adaptor works through three perl modules included in this distribution:


These files are installed in the BioPerl infrastructure when 'make install' is run.

In addition to the standard chado schema, this adaptor requires a few additional views and functions. These are found in two files in the chado CVS or in a gmod distribution: sequence-gff-views.sql and sequence-gff-funcs.pgsql.

The easiest way to get these into the chado schema is to include them when building the schema from a gmod release during `perl Makefile.PL`. They are currently included by default when the schema is built this way. If you already have a chado instance and want to add these items, the easiest way is to cat the files to stdout and pipe that to a psql command:

  % cat sequence-gff-views.sql   | psql <chado-database-name>
  % cat sequence-gff-funcs.pgsql | psql <chado-database-name>
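
The same load can also be scripted so that it stops at the first SQL error instead of continuing past it. The following is only a sketch under stated assumptions: the DBNAME default and the DRY_RUN switch are hypothetical names introduced for this example, and the two files are assumed to be in the current directory. By default it only prints the psql commands it would run.

```shell
#!/bin/sh
# Sketch: load the two GFF bridge files, aborting on the first SQL error.
# DBNAME and DRY_RUN are assumptions made for this example.
DBNAME="${DBNAME:-chado}"
DRY_RUN="${DRY_RUN:-1}"

load_sql() {
    # -v ON_ERROR_STOP=1 makes psql stop at the first failing statement;
    # -f reads the SQL from the named file instead of stdin.
    if [ "$DRY_RUN" = "1" ]; then
        echo "psql -v ON_ERROR_STOP=1 -f $1 $DBNAME"
    else
        psql -v ON_ERROR_STOP=1 -f "$1" "$DBNAME"
    fi
}

for f in sequence-gff-views.sql sequence-gff-funcs.pgsql; do
    load_sql "$f" || exit 1
done
```

Setting DRY_RUN=0 and DBNAME to your chado database name would run the commands for real.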

A sample chado configuration file is included in contrib/conf_files/. Since chado uses the Sequence Ontology for its controlled vocabulary, this configuration file should work for most instances of chado once the database-specific parameters are set. Also, depending on what the 'reference type' is (usually something like 'chromosome' or 'contig'), the reference class line in the configuration will need to be modified to agree with your data.

After the tables are created, the user that is running Apache must be granted privileges to select on several tables. Usually that user is 'nobody', although on RedHat systems using an RPM-installed Apache the user is 'apache'. First create that user in Postgres, then grant select permissions in the psql shell:

  CREATE USER nobody;
  GRANT SELECT ON feature_synonym      TO nobody;
  GRANT SELECT ON synonym              TO nobody;
  GRANT SELECT ON feature_dbxref       TO nobody;
  GRANT SELECT ON dbxref               TO nobody;
  GRANT SELECT ON feature              TO nobody;
  GRANT SELECT ON featureloc           TO nobody;
  GRANT SELECT ON cvterm               TO nobody;
  GRANT SELECT ON feature_relationship TO nobody;
  GRANT SELECT ON cv                   TO nobody;
  GRANT SELECT ON gffatts              TO nobody;
  GRANT SELECT ON feature_cvterm       TO nobody;
  GRANT SELECT ON feature_gcontext     TO nobody;
  GRANT SELECT ON gcontext             TO nobody;
  GRANT SELECT ON featureprop          TO nobody;
  GRANT SELECT ON pub                  TO nobody;
  GRANT SELECT ON feature_pub          TO nobody;
  GRANT SELECT ON db                   TO nobody;
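
If the web server runs as a different user, the same statements can be generated rather than edited by hand. This is a minimal sketch, not part of the distribution: the WEBUSER variable and the gen_grants function are hypothetical names, and the table list is copied from the statements above. The user must still be created first with CREATE USER.

```shell
#!/bin/sh
# Sketch: print the GRANT statements for a chosen web server user.
# WEBUSER is an assumption; use 'apache' on RedHat systems with RPM Apache.
WEBUSER="${WEBUSER:-nobody}"

# Names copied from the GRANT statements listed above.
TABLES="feature_synonym synonym feature_dbxref dbxref feature featureloc
cvterm feature_relationship cv gffatts feature_cvterm feature_gcontext
gcontext featureprop pub feature_pub db"

gen_grants() {
    # $1 is the database user that should receive SELECT privileges.
    for t in $TABLES; do
        printf 'GRANT SELECT ON %s TO %s;\n' "$t" "$1"
    done
}

gen_grants "$WEBUSER"
```

Saved under a hypothetical name such as grants.sh, the output could be piped to psql in the same way as the SQL files, for example `WEBUSER=apache sh grants.sh | psql <chado-database-name>`.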

Creating a configuration file

The GBrowse configuration file for a chado database uses the same format as for any other data source, but a few points are specific to chado. A sample configuration file called 07.chado.conf is included in the contrib/conf_files directory of this distribution, and is installed in $HTDOCS/gbrowse/contrib/conf_files.

Two items specific to chado that must go into the configuration file:

Reference class
The reference class in the configuration file must be the Sequence Ontology Feature Annotation (SOFA) type of the foundation features in chado, such as 'chromosome', 'region' or 'contig', on which the other features in the database are located.

Aggregators
Aggregators must not be used with the chado adaptor, as they aren't needed and don't make sense in this context. They are used in Bio::DB::GFF to construct complex biological objects out of the flat data in GFF files, for example, attaching exons to their mRNA. In chado, this isn't necessary since the relationship between features is clearly defined in the feature_relationship table, and that information is automatically obtained by the chado adaptor.

You can add db_args to fine-tune some parameters of the adaptor:
  db_args      = -dsn yourConnectString
                 -enable_seqscan 0
                 -srcfeatureslice 1
                 -do2Level 1

enable_seqscan : set this to 0 to force the use of your indexes if your DBA hasn't set it globally. It triggers a 'set enable_seqscan=0' SQL command if set to 0, and nothing otherwise.

srcfeatureslice : toggles the use of featureloc_slice(srcfeat_id, int, int) instead of featureslice(int, int). featureloc_slice is part of the 0.01 version of chado.

do2Level : prefetches the direct children of a feature, thus avoiding subsequent queries. It provides a slight performance boost in most cases.


If you encounter any bugs or problems with this chado adaptor, please let me know.

Scott Cain 2005/03/10