BDVAL/Options



This page describes the various options and modes supported by BDVal.

Global Options

The following table describes the various options that can be given to the BDVal program.

Flag Arguments Required Description
(-m | --mode) mode yes Mode of execution, one of: cross-validation, leave-one-out, svm-weights, svm-weights-iterative, t-test, fold-change, kendal-tau, ga-wrapper, write-model, predict, stats, sequence, min-max, reformat, define-splits, execute-splits
(-i | --input) filename yes Name of the input file. This file contains the measurement data used to discover markers. Supported formats for the input datasets can be found here.
(-t | --task-list) task-list yes Name of the file that describes the classification tasks. This file is tab delimited, with one line per task. The first column is the input filename, the second is the name of the first condition, the third is the name of the second condition, the fourth is the number of samples in the first condition, and the fifth is the number of samples in the second condition.
(-p | --platform-filenames) platform-filenames yes Comma separated list of platform filenames. Supported formats for the platform files can be found here.
(-c | --conditions) conditions yes Specify the file with the mapping condition-name column-identifier (tab delimited, with one mapping per line)
--dataset-name dataset-name no The name of the dataset being run.
--dataset-root dataset-root no The root of the dataset being run.
(-l | --classifier) classifier no Fully qualified class name of the classifier implementation. (default: edu.cornell.med.icb.learning.libsvm.LibSvmClassifier)
(-a | --classifier-parameters) classifier-parameters no Comma separated list of parameters that will be passed to the classifier. Parameters vary from one classifier to the next. Check the documentation of the classifier and the source code to see which parameters can be set.
(-o | --output) filename no Name of the output file. If not specified, or "-", output will be to the console.
--overwrite-output boolean no When true and -o is specified, the output file will be over-written. (default: false)
--model-id model-id no The model-id, created in ExecuteSplitsMode (a hash of the options) (default: no_model_id)
(-g | --gene-lists) gene-lists no Name of the file that describes the gene lists. This file is tab delimited, with one line per gene list. First column is the name of the gene list. Second column (optional) is the name of the file which describes the gene list. If the file has only one column, the name of the gene list must be full (for the full array). If the name of the gene list is random, the second field indicates how many random probesets must be selected, and a third field indicates the random seed to use for probeset selection.
--gene-list gene-list no Filename for a single gene list.
--seed seed no Seed to initialize random generator.
--pathways pathways no Filename of the pathway description information. The pathway description information is a tab-delimited file with one line per pathway. The first field provides a pathway identifier. Subsequent fields on the line are Ensembl gene ids for the genes that belong to the pathway. When this option is provided, features are aggregated by pathway and computations are performed in aggregated feature space. Some aggregation algorithms may generate several aggregated features per pathway. When this option is active, the option --gene-to-probes must also be provided on the command line.
--pathway-aggregation-method pathway-aggregation-method no Indicate which method should be used to aggregate features for pathway runs. Two methods are available: PCA or average. PCA performs a principal component analysis for the probesets of each pathway. Average uses a single feature for each pathway, calculated as the average of the probeset signal in that pathway. (default: PCA)
--gene-to-probes gene2probes no Filename of the gene to probe description information. The gene to probe description information is a tab-delimited file with one gene-to-probe mapping per line. The first field is an Ensembl gene id. The second field is a probe id which measures expression of a transcript of the gene. Several lines may share the same gene id, indicating that multiple probe ids exist for the gene.
--floor floor no Specify a floor value for the signal. If a signal is lower than the floor, it is set to the floor. If no floor is provided, values are unchanged.
--two-channel-array two-channel-array no Indicate that the data is for a two channel array. This flag affects how the floor value is interpreted. For two channel arrays, values on the array are set to 1.0 if (Math.abs(oldValue-1.0) + 1) <= floorValue, whereas for one channel array the condition becomes: oldValue <= floorValue.
--scale-features scale-features no Indicate whether the features should be scaled to the range [-1, 1]. If false, no scaling occurs. If true (default), features are scaled.
--percentile-scaling percentile-scaling no Indicate whether feature scaling is done with percentile and median or full range and average. When percentiles are used, the range of each feature is determined as the range of the 20-80 percentile of the data and median is used instead of the mean. (default: false)
--normalize-features normalize-features no Indicate whether the feature vectors should be normalized. If false, no normalizing occurs. If true (default), features are normalized. (default: false)
--gene-features-dir gene-features-dir no The directory where gene features files will be read from (when specified in a -gene-lists.txt file).
--output-stats-from-gene-list n/a no
--rserve-port rserve-port no The Rserve port to use (default: -1)
--process-split-id process-split-id no Restricts execution to a split id. A split execution plan must be provided as well. The split id is used together with the split plan to determine which samples should be processed. Typical usage would be "--process-split-id 2 --split-plan theplan.txt --split-type training". This would result in the training samples that match split #2 in theplan.txt being used.
--split-plan split-plan no Filename for the split plan definition. See process-split-id.
--split-type split-type no Split type (e.g., training, test, or feature-selection). Must match a type listed in the split plan. See process-split-id.
--cache-dir cache-dir no Cache directory. Specify a directory where intermediate processed tables will be saved for faster access. (default: cache)
(-h | --help) n/a no Print help message
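
The five-column layout described for --task-list can be illustrated with a short Python sketch. This is not part of BDVal: the Task record, the parse_task_list function, and the sample file name and condition names are hypothetical, and the sketch assumes the file has no header line.

```python
import csv
import io
from typing import NamedTuple


class Task(NamedTuple):
    """One classification task (hypothetical record, not a BDVal type)."""
    input_filename: str
    first_condition: str
    second_condition: str
    first_count: int
    second_count: int


def parse_task_list(handle):
    """Read a tab-delimited task list with one task per line."""
    tasks = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 5:
            raise ValueError(f"expected 5 columns, found {len(row)}: {row}")
        name, first, second, n_first, n_second = row
        tasks.append(Task(name, first, second, int(n_first), int(n_second)))
    return tasks


# Made-up sample line: input file, two condition names, two sample counts.
sample = "dataset.tmm\tcancer\tnormal\t12\t8\n"
tasks = parse_task_list(io.StringIO(sample))
print(tasks[0])
```

Checking the column count up front makes a malformed line fail early rather than producing a task with shifted fields.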

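The interaction between --floor and --two-channel-array can be sketched as follows. This is a minimal Python illustration of the two conditions quoted in the table, not BDVal's actual implementation; the function name apply_floor and its parameters are hypothetical.

```python
def apply_floor(value, floor=None, two_channel=False):
    """Apply the floor rule described for --floor to a single signal value.

    Hypothetical helper: mirrors the conditions stated in the option table.
    """
    if floor is None:
        # No floor provided: values are unchanged.
        return value
    if two_channel:
        # Two-channel arrays: values close to 1.0 are reset to 1.0.
        if abs(value - 1.0) + 1.0 <= floor:
            return 1.0
        return value
    # One-channel arrays: values at or below the floor are set to the floor.
    if value <= floor:
        return floor
    return value


print(apply_floor(3.0, floor=10.0))                    # one-channel, below floor
print(apply_floor(1.2, floor=1.5, two_channel=True))   # two-channel, near 1.0
```

In the two-channel case the floor acts as a threshold on the distance from 1.0, so values on either side of 1.0 are treated symmetrically.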
Mode Options

BDVal supports the following modes of operation.

Options specific to a particular mode can be found here.
