Uncovering metabolic pathways relevant to phenotypic traits of microbial genomes
Here we present a method for uncovering metabolic pathways that are distinctive (relevant) for complex phenotypic traits of microbial genomes.
The method is based on a set of completely sequenced and (automatically) annotated genomes and on a collection of known metabolic pathways.
To reveal the representative metabolic features of phenotypically related microorganisms, our method first assesses the
metabolic complement of each genome in the form of a pathway profile. For this purpose, it uses the
score-based pathway prediction method described below.
In a second step, machine learning techniques are applied to select the set of metabolic pathways
most relevant for distinguishing genomes that show a certain phenotype from those that lack it (see below).
To demonstrate its potential, we applied our method to several microbial phenotypes. For these applications, we used a set of 266 genomes
that have been automatically annotated by the PEDANT system.
The pathway and reaction database BioPath was used as the source of
reaction and pathway information. The data used and the software implementing our method are freely available
for non-commercial use and can be downloaded free of charge from here.
The software and the data provided are briefly described in the
following. For a more detailed description of the method and its application please see our manuscript.
The complete archive containing all software and data also includes information on how to use both. For any problems with the
software or data provided, please contact us.
For each genome in a given list of genomes, the pathway prediction method calculates a score for each pathway of a pathway collection. The score ranges between 0 and 1 and describes the "completeness" of the respective pathway in the organism under consideration (1: the pathway is complete; 0: no enzyme of the pathway is available). In addition to the presence or absence of enzymes (given by EC numbers), the score takes into account how unique each enzyme is to the respective pathway. (For more details please see the manuscript.) For predicting, or more precisely scoring, the pathways of specific genomes, we provide:
- a Java archive (jar) (download)
- MySQL dumps of PEDANT's (version 2) automatic EC number annotations for 266 genomes (download)
- a MySQL dump of the BioPath pathway data (download)
To predict BioPath reference pathways for PEDANT genomes, please import the MySQL dumps into MySQL databases. The usage of the software is described here. The software can also predict pathways from your own (manual or automatic) EC number assignments. (For examples see the tar archive, which is provided for download.) The pathway profiles produced by the method can be converted to CSV format using the shell script
convertProfilesFileToCsv.csh, which is also provided. The resulting CSV file
can be used as input for the pathway selection procedure.
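The scoring idea described above can be illustrated with a minimal Python sketch. It is not the implementation in the provided Java archive, and the exact formula is given only in the manuscript. The sketch assumes that an enzyme's uniqueness weight is the inverse of the number of pathways in which its EC number occurs, and that the score is the weighted fraction of a pathway's enzymes found in the genome. All function names and the toy data are hypothetical.

```python
from collections import Counter


def uniqueness_weights(pathways):
    """Assumed weighting: 1 / (number of pathways that contain the EC number)."""
    counts = Counter(ec for ecs in pathways.values() for ec in set(ecs))
    return {ec: 1.0 / n for ec, n in counts.items()}


def pathway_score(pathway_ecs, genome_ecs, weights):
    """Weighted fraction of a pathway's enzymes that are annotated in the genome."""
    ecs = set(pathway_ecs)
    total = sum(weights[ec] for ec in ecs)
    if total == 0:
        return 0.0
    found = sum(weights[ec] for ec in ecs if ec in genome_ecs)
    return found / total


def pathway_profile(pathways, genome_ecs):
    """Score every pathway of the collection for one genome."""
    weights = uniqueness_weights(pathways)
    return {
        name: pathway_score(ecs, genome_ecs, weights)
        for name, ecs in pathways.items()
    }


# Hypothetical toy data: two pathways that share one enzyme.
pathways = {
    "pathway_A": ["1.1.1.1", "2.7.1.1", "4.1.2.13"],
    "pathway_B": ["1.1.1.1", "5.3.1.9"],
}
genome_ecs = {"1.1.1.1", "2.7.1.1"}

profile = pathway_profile(pathways, genome_ecs)
for name, score in sorted(profile.items()):
    print(f"{name}: {score:.2f}")
```

In this toy example the shared enzyme contributes less to each score than the pathway-specific enzymes, so a genome that carries only widely shared enzymes receives a low score for a given pathway.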
For selecting (and cross-checking) the metabolic pathways that are most distinctive for a specific phenotype,
we provide the Perl script relevantPathways.pl. This command line tool requires the following data as input
(an example file is provided within the software archive):
- pathway profiles of sequenced genomes (see above).
- binary (yes/no) phenotypic annotations for the genomes in the profiles file. (The phenotypic features of the 266 genomes used in our study are provided in a supplementary file attached to our manuscript.)
The program produces lists of pathways ranked by their relevance for the phenotype under consideration. The rankings are derived by several multivariate statistical methods (using WEKA). For estimating the significance of the association between highly ranked pathways and phenotypes, cross-check diagrams (derived by classification methods) are plotted by the script (using R). For more details please read our manuscript. The usage of the program is described here.
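To make the notion of a relevance ranking concrete, here is a minimal Python sketch. It is not what relevantPathways.pl does: the script relies on multivariate methods from WEKA, whereas this sketch swaps in a much simpler univariate stand-in, ranking pathways by the absolute difference in mean score between genomes with and without the phenotype. The function name, data layout, and toy values are assumptions made for illustration only.

```python
def rank_pathways(profiles, phenotype):
    """Rank pathways by |mean score (phenotype yes) - mean score (phenotype no)|.

    profiles:  genome name -> {pathway name: score in [0, 1]}
    phenotype: genome name -> True (yes) or False (no)
    """
    pos = [g for g in profiles if phenotype[g]]
    neg = [g for g in profiles if not phenotype[g]]
    if not pos or not neg:
        raise ValueError("need genomes both with and without the phenotype")

    pathways = sorted({p for scores in profiles.values() for p in scores})

    def mean(genomes, pathway):
        # A pathway missing from a profile is treated as score 0 (assumption).
        return sum(profiles[g].get(pathway, 0.0) for g in genomes) / len(genomes)

    ranked = [(p, abs(mean(pos, p) - mean(neg, p))) for p in pathways]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical toy data: four genomes, two pathways, one binary phenotype.
profiles = {
    "genome_1": {"pathway_A": 1.0, "pathway_B": 0.5},
    "genome_2": {"pathway_A": 0.9, "pathway_B": 0.4},
    "genome_3": {"pathway_A": 0.1, "pathway_B": 0.5},
    "genome_4": {"pathway_A": 0.0, "pathway_B": 0.6},
}
phenotype = {
    "genome_1": True,
    "genome_2": True,
    "genome_3": False,
    "genome_4": False,
}

ranking = rank_pathways(profiles, phenotype)
for pathway, difference in ranking:
    print(f"{pathway}: {difference:.2f}")
```

A univariate measure like this ignores interactions between pathways and gives no estimate of significance, which is why the actual tool combines multivariate rankings with classification-based cross-checks.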
The pathway predictions used in our study rely on:
- BioPath: resource for pathway and reaction data
- PEDANT: resource for automatically annotated genomic information
BioPath is a database of biochemical pathways that provides access to
metabolic transformations and cellular regulations derived from the
"Biochemical Pathways" wall chart.
BioPath pathway and reaction data used in our study can be downloaded
free of charge for academic research purposes
(download;
documentation and disclaimer).
Any use of the information and data for commercial purposes requires a
license through Molecular Networks GmbH, Erlangen, Germany.
For a detailed description of BioPath please refer to:
Reitz,M., Sacher,O., Tarkhov,A., Trümbach,D. and Gasteiger,J.
Enabling the exploration of biochemical pathways. Org. Biomol. Chem. 2, 3226-3237 (2004).
PubMed
The PEDANT genome database provides exhaustive standardized automatic analysis for a huge number of
genomic sequences by a large variety of established bioinformatics tools.
PEDANT EC number annotations used in our study can be downloaded
free of charge for academic research purposes
(download;
documentation and disclaimer).
Any use of the information and data for commercial purposes requires a
license through Biomax Informatics AG.
For a detailed description of PEDANT please refer to:
Riley et al., Nucleic Acids Res., 33(Database issue), D308-10 (2005).
PubMed and
Walter et al., Nucleic Acids Res., 37(Database issue), D408-11 (2009).
PubMed