BDVAL

News

http://bdval.org/ now provides easy access to the project home page. A user guide and the source code are provided on the main web site. This wiki will continue to offer project documentation.

Overview

BDVAL is an acronym for Biomarker Discovery and VALidation. BDVAL is an open source project for biomarker discovery in high-throughput datasets. BDVal can process microarray and proteomics datasets to discover and validate biomarkers. The program is distributed under the GNU General Public License (GPL).

  • BDVal directly supports many kinds of classifiers: it can train weka and libSVM classifiers.
  • BDVal supports various feature selection strategies and validation protocols: SVM weights (Support Vector Machine), recursive feature elimination, genetic algorithm wrappers, T-Test, Fold-Change, and any sensible combination of these strategies. Leave-one-out and stratified cross-validation with random repeats are both supported.
  • BDVal leverages biological information: gene lists and pathway information can be used for a priori feature selection or feature aggregation.
  • BDVal is a high-performance program: it takes advantage of multi-threaded machines transparently.
  • BDVal is highly portable: it runs on a laptop computer or a multi-processor SMP machine without recompilation (thanks to Java).
  • BDVal output is fully reproducible: all steps of discovery and validation are automated, and random seeds can be controlled. The program generates detailed validation statistics and detailed model information output.
  • BDVal is robust: the program has been used in the MAQC-II community evaluation of biomarker discovery approaches.
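The stratified cross-validation with random repeats and controlled seeds mentioned in the list above can be illustrated with a short sketch. This is not BDVal code: the function names, the round-robin assignment, and the way each repeat derives its seed are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed):
    """Assign each sample index to one of k folds with similar class proportions."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds


def repeated_stratified_folds(labels, k, repeats, seed):
    """Return one fold assignment per repeat (hypothetical scheme: seed + repeat number)."""
    return [stratified_folds(labels, k, seed + r) for r in range(repeats)]
```

Calling `stratified_folds` twice with the same labels, fold count, and seed returns the same assignment, which is the property that makes repeated validation runs comparable.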

The flexibility and modularity of the BDVAL software design allow various feature selection methods to be combined easily. Feature selection strategies can be automated by creating a "sequence" file that describes the strategy, then executing the sequence with a dataset and classification parameters.

Modularity

Download the software

BDVAL is distributed under the GNU General Public License (GPL). The latest version of BDVal can be downloaded following instructions provided on the BDVal installation page.

If you already know how to install BDVal, you can get the latest distribution directly from the BDVal download page.

Classifier support

BDVal supports SVM classifiers (implemented with libSVM) or weka classifiers. SVM classifiers are used by default. The user can pass training parameters to the classification engine with the BDVal --classifier-parameters option.

The libSVM classifier produces probability estimates by default. This option is computationally expensive because cross-validation is used to fit the decision values of the SVM to a sigmoid curve. Another problem is that libSVM generates these estimates with a random number generator seeded with the current time. This makes it impossible to reproduce the probabilities between two rounds of model training and can confuse model performance comparisons. For these reasons, we recommend disabling probability estimates when training with libSVM. To disable probability estimates, use the BDVal option --classifier-parameters probability=false
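To make the sigmoid fit concrete, the sketch below shows how an SVM decision value is mapped to a probability once the two sigmoid parameters are known. It uses the form 1 / (1 + exp(A·f + B)) from Platt's method. The function name and the parameter values in the usage lines are illustrative assumptions, and the expensive cross-validated fitting of A and B is deliberately left out.

```python
import math


def platt_probability(decision_value, a, b):
    """Map an SVM decision value f to a probability via 1 / (1 + exp(a * f + b))."""
    z = a * decision_value + b
    # Evaluate the sigmoid in a numerically stable way for large |z|.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# Hypothetical parameters; in practice a and b are fitted on held-out decision values.
a, b = -1.5, 0.0
for f in (-2.0, 0.0, 2.0):
    print(f, round(platt_probability(f, a, b), 3))
```

Because A and B are fitted on randomly chosen cross-validation folds, two training runs with different splits generally yield different parameters and therefore different probabilities for the same decision value.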

To use a weka classifier, use the --classifier and --classifier-parameters options. The following will cause BDVal to train weka's NaiveBayesUpdateable classifier:

  --classifier edu.cornell.med.icb.learning.weka.WekaClassifier 
  --classifier-parameters wekaClass=weka.classifiers.bayes.NaiveBayesUpdateable

Please note that weka classifiers cannot be used with svm-weight, iterative-svm-weight, and other modes that select features by ranking them by support vector machine weight. Genetic algorithm feature selection combined with t-test or fold-change or other statistics should be used to select features for weka classifiers.

Feature Selection Methods

BDVal supports various feature selection methods. Several methods can be combined into a feature selection strategy. For instance, it is common to filter features by T-test and fold-change. This can be achieved in BDVal by running the T-Test feature selection method with a confidence threshold, saving the resulting gene/feature list, and reducing the list further by running Fold-Change selection.
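The two-stage filter described above can be sketched outside of BDVal as follows. This is an illustration under stated assumptions, not BDVal's implementation: the function names and the data are hypothetical, the first stage thresholds the absolute Welch t statistic rather than a confidence level (to stay within the Python standard library), and the fold-change stage assumes positive, non-log-transformed values.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (assumes at least one has nonzero variance)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def fold_change(a, b):
    """Ratio of group means, oriented so that the result is >= 1."""
    ma, mb = mean(a), mean(b)
    return max(ma, mb) / min(ma, mb)


def select_by_t_test(features, min_abs_t):
    """Keep features whose |t| reaches the threshold."""
    return {name: groups for name, groups in features.items()
            if abs(welch_t(*groups)) >= min_abs_t}


def select_by_fold_change(features, min_fold):
    """Keep features whose fold change reaches the threshold."""
    return {name: groups for name, groups in features.items()
            if fold_change(*groups) >= min_fold}


# Hypothetical expression values: feature -> (class A samples, class B samples).
features = {
    "gene1": ([10.0, 11.0, 9.0, 10.5], [20.0, 21.0, 19.0, 22.0]),
    "gene2": ([10.0, 10.2, 9.8, 10.1], [10.9, 11.1, 10.8, 11.0]),
    "gene3": ([5.0, 15.0, 2.0, 18.0], [9.0, 11.0, 12.0, 8.0]),
}

after_t = select_by_t_test(features, min_abs_t=3.0)       # first stage
final = select_by_fold_change(after_t, min_fold=1.5)      # second stage on the saved list
print(sorted(after_t), sorted(final))
```

In this toy data, gene2 differs consistently between classes but only by a small ratio, so it survives the first stage and is dropped by the second; this is the situation in which combining the two filters matters.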

The following feature selection methods are supported:

It is also possible to filter features using the following methods:

Input Datasets

Various file formats including GEO, Iconix, TMM, Cologne and others are supported by BDVal. Details are provided here.

Configuration details including dataset specific information (i.e., cids and task files) can be found here.

Worked Example

A complete example using GEO dataset GSE8402 is available here.

For Developers

BDVAL is designed as an extensible framework that can incorporate new feature selection methods, classifiers, and other options over time. An example of adding a new mode to BDVAL is available here.

Additional Information
