BDVAL/Predict

From Icbwiki

This mode makes predictions for unlabeled samples. It can also be used to evaluate predictions on an independent test set: true labels can be provided as a cids file for the purpose of evaluation. The mode is implemented by org.bdval.Predict.

Prediction Table Files

Prediction table files are tab-delimited and contain one line for each prediction evaluated on a sample. Each line records a unique identifier for the split (splitId) and for the repeat (repeatId). These identifiers exactly match the partitions described in the split plan files.

| Identifier | Description |
|---|---|
| splitId | Split Id matching the split plan which was used in the evaluation. |
| splitType | Split type matching the split plan which was used in the evaluation. All predictions must be made in test folds, so every splitType value in a prediction table file should be "test". |
| repeatId | Repeat Id matching the split plan which was used in the evaluation. |
| modelFilenamePrefix | The name of the model which was evaluated to produce the prediction table. |
| sampleIndex | Index of the sample in the complete input file. Matches the split plan which was used in the evaluation. |
| sampleId | Identifier for the sample described in the split. Matches the split plan which was used in the evaluation. |
| predictedLabel | Label which the model predicted for the sample (numeric, output of the classifier). |
| predictedSymbolicLabel | Symbolic label predicted for the sample. A string which matches the symbolic class label used for training the model (e.g., "NLT", "LT"). |
| probabilityOfPredictedClass | Probability of the predicted class, according to the model, or the raw decision value of the classifier for non-probabilistic classifiers. For support vector machines, this is the distance from the trained hyperplane and has arbitrary units. |
| probabilityClass1 | Probability of the numeric class 1 for this sample, according to the model. Use this field only for probabilistic models. It is simply 1 - probabilityOfPredictedClass when predictedLabel = 0, and probabilityOfPredictedClass when predictedLabel = 1. Warning: this field is meaningless when a raw decision value is reported for probabilityOfPredictedClass. |
| trueLabel | True class label for the sample, according to the split plan. |
| numericTrueLabel | Numeric equivalent of the true sample label. |
| correct | Whether the prediction is correct. Values are "correct" or "incorrect". |
| modelNumFeatures | Number of features in the model (constant across the entire prediction table file). |
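
The column definitions above can be illustrated with a short Python sketch. It is not part of BDVAL. It assumes that the prediction table starts with a header line naming the columns as listed above (the page does not say whether such a header exists), and the sample rows are made up for illustration. The helper names `probability_class1` and `accuracy_by_repeat` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def probability_class1(predicted_label, probability_of_predicted_class):
    """Derive probabilityClass1 as described in the table above.

    Only meaningful for probabilistic models, not for raw decision values.
    """
    if predicted_label == 1:
        return probability_of_predicted_class
    return 1.0 - probability_of_predicted_class


def accuracy_by_repeat(table_file):
    """Return {repeatId: fraction of rows whose 'correct' column is "correct"}."""
    counts = defaultdict(lambda: [0, 0])  # repeatId -> [num correct, num rows]
    # Assumption: the first line is a header naming the columns.
    for row in csv.DictReader(table_file, delimiter="\t"):
        tally = counts[row["repeatId"]]
        tally[0] += row["correct"] == "correct"
        tally[1] += 1
    return {repeat: ok / total for repeat, (ok, total) in counts.items()}


# Made-up rows with a subset of the columns, for illustration only.
SAMPLE = (
    "splitId\tsplitType\trepeatId\tsampleId\tpredictedLabel\t"
    "probabilityOfPredictedClass\tcorrect\n"
    "1\ttest\t1\tS1\t1\t0.9\tcorrect\n"
    "2\ttest\t1\tS2\t0\t0.7\tincorrect\n"
    "1\ttest\t2\tS1\t0\t0.8\tcorrect\n"
)

if __name__ == "__main__":
    print(accuracy_by_repeat(io.StringIO(SAMPLE)))
    print(probability_class1(0, 0.7))
```

If the real files have no header line, the column names could be passed to `csv.DictReader` through its `fieldnames` argument instead.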

Mode Parameters

The following options are available in this mode:

| Flag | Arguments | Required | Description |
|---|---|---|---|
| --model | model | yes | Model filename prefix (a model consists of several files named with a common prefix). This model is used to predict the labels of the samples in the input file. |
| --pathway-components-dir | pathway-components-dir | no | Directory where pathway components will be stored. (default: pathway-components) |
| --test-samples | test-samples | no | Filename for the list of test sample ids. Path to a file with one line per test sample id. The input dataset will be filtered to keep only the samples in this list for prediction and performance calculation. |
| --print-stats | n/a | no | Print statistics instead of the detailed result table. |
| --true-labels | true-labels | no | True labels for this dataset, in the cids format. Providing true labels makes it possible to report evaluation measures on the test set. |
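
As a rough illustration of how the options above combine, the following Python sketch assembles the mode-specific arguments into a list. It uses only the flags documented in the table. The function name `build_predict_args` and the file names in the usage example are hypothetical, and the sketch deliberately does not show how BDVAL is launched or how the mode and input file are selected, since this page does not describe that.

```python
def build_predict_args(model, test_samples=None, true_labels=None,
                       print_stats=False, pathway_components_dir=None):
    """Assemble the predict-mode flags listed in the table above.

    Only --model is required; every other flag is added when requested.
    """
    args = ["--model", model]
    if pathway_components_dir is not None:
        args += ["--pathway-components-dir", pathway_components_dir]
    if test_samples is not None:
        args += ["--test-samples", test_samples]
    if print_stats:
        # --print-stats takes no argument.
        args.append("--print-stats")
    if true_labels is not None:
        args += ["--true-labels", true_labels]
    return args


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_predict_args("my-model",
                             test_samples="test-ids.txt",
                             true_labels="labels.cids",
                             print_stats=True))
```

The resulting list would be appended to whatever command line starts BDVAL in predict mode in a given installation.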