BDVAL/Configure


Before BDVAL can be used, some configuration is required.

R

BDVAL relies on RUtils, which in turn depends on R and Rserve. BDVAL also requires the ROCR package. Consult the R documentation for instructions on how to install optional packages into R.

ROCR

To install ROCR, type the following at the R command line:

 install.packages('ROCR')

Rserve

To install Rserve, type the following at the R command line:

 install.packages('Rserve',,'http://www.rforge.net/')

Note that if you do not specify the RForge URL, you will likely get an older version of the Rserve package.

To start Rserve, type the following at the R command line:

 library("Rserve")
 Rserve()
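
Before launching BDVAL it can be useful to confirm that something is accepting connections where Rserve is expected. The following Python sketch is only an illustration and is not part of BDVAL. It assumes Rserve runs on the local machine and listens on its default TCP port, 6311. The function name rserve_is_listening is hypothetical. The check only shows that the port accepts connections, not that the listener is actually Rserve.

```python
import socket


def rserve_is_listening(host="localhost", port=6311, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    6311 is the default Rserve port; pass a different port if your
    Rserve instance is configured otherwise (an assumption of this sketch).
    """
    try:
        # create_connection raises OSError if the connection is refused
        # or the timeout expires.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling rserve_is_listening() after running Rserve() in R should return True if the server started with its default settings.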

Multi-threading

BDVal is designed to execute tasks in parallel where possible through the use of the Parallel Java (PJ) library. By default, the Parallel Java library will attempt to use the same number of threads as there are CPUs on the host. To override this default, specify the number of threads on the Java command line like this:

 java -Dpj.nt=n ...

where "n" is the number of threads you wish to use.

The BDVal examples that use Ant generally set a default number of threads based on the operating system. When using the BDVal Ant examples and templates, you can specify the number of threads by setting the Ant property "num-threads", which is passed on to the Java command line. For example:

 ant -Dnum-threads=n -f prostate-example.xml

will set the number of threads to "n".

Note that on Windows operating systems a single thread is recommended. Cases have been reported where data from the Rserve process gets mangled when more than one thread is used at a time. This is not the case on Unix-based systems.
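
The thread recommendation above can be expressed as a small helper for wrapper scripts. This Python sketch is an illustration only: the function names recommended_threads and java_thread_flag are hypothetical, and the only BDVAL-related setting it uses is the pj.nt system property described above. It assumes that using one thread per CPU on non-Windows hosts is acceptable, which matches the Parallel Java default.

```python
import os
import platform


def recommended_threads(system=None, cpu_count=None):
    """Pick a thread count: 1 on Windows, otherwise one per CPU.

    Both arguments are optional overrides, mainly useful for testing;
    by default the values are detected from the running host.
    """
    system = system or platform.system()
    if system == "Windows":
        # A single thread is recommended on Windows because of the
        # reported Rserve data corruption with multiple threads.
        return 1
    return cpu_count or os.cpu_count() or 1


def java_thread_flag(n):
    """Format the Parallel Java thread-count property for the Java command line."""
    return f"-Dpj.nt={n}"
```

For example, java_thread_flag(recommended_threads()) yields a flag that can be placed on the Java command line.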

Logging

BDVAL uses the Commons Logging API to display informational messages; log4j is the recommended logging implementation. Consult the documentation of the respective frameworks if you need to configure custom logging.

It may be helpful to enable assertions by adding "-ea" to the java command line.

Dataset Specific Configuration

Condition Identifiers (cids) files

Condition identifiers (cids) files are used to group biological samples into distinct "classes". A cids file is a simple tab-delimited file with two entries per line. The first column is a string that defines the name of the class, and the second column is a biological sample to be included in that class. The name of the sample should map to a sample found in the input dataset. The first row of the file is considered to be a description and is therefore ignored. A cids file that corresponds to the sample input data file in tmm format is available here.

 class [tab] sample
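
To make the layout concrete, the following Python sketch reads cids lines and groups the sample names by class. It is an illustration only and is not BDVAL's own parser. The function name read_cids, the class names "Tumor" and "Normal", and the sample names are hypothetical. Skipping blank or incomplete lines is an assumption of this sketch.

```python
import csv
import io
from collections import defaultdict


def read_cids(lines):
    """Group sample names by class from tab-delimited cids lines."""
    classes = defaultdict(list)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # the first row is a description and is ignored
    for row in reader:
        if len(row) < 2 or not row[0].strip():
            continue  # assumption: skip blank or incomplete lines
        classes[row[0].strip()].append(row[1].strip())
    return dict(classes)


# Hypothetical file content: a description row, then class/sample pairs.
example = io.StringIO("class\tsample\nTumor\tS1\nTumor\tS2\nNormal\tS3\n")
groups = read_cids(example)
```

Here groups maps each class name to the list of samples assigned to it, in file order.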

Task files

Task files are used to define distinct groups of evaluations to be executed. Task files are expected to end with the extension ".tasks". A task file is a tab-delimited file with five columns. The first column is the name associated with the evaluation. The next two columns define the classes (as defined in the corresponding cids file) to be used. The fourth and fifth columns give the number of biological samples in the classes named in columns two and three, respectively. These last two columns also serve as a way to cross-check the entries in the cids file.

 name [tab] class1 [tab] class2 [tab] #_of_samples_in_class1 [tab] #_of_samples_in_class2
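
The cross-check between a task file and the class sizes can be sketched in Python as follows. This is an illustration only and is not how BDVAL itself validates its input. The function names read_tasks and check_tasks, the task name, and the class names are hypothetical. The sketch assumes the task file has no header row and that the class sizes have already been counted from the cids file.

```python
import io


def read_tasks(lines):
    """Parse tab-delimited task lines into (name, class1, class2, n1, n2) tuples."""
    tasks = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(
                f"line {line_no}: expected 5 columns, got {len(fields)}"
            )
        name, class1, class2, n1, n2 = (f.strip() for f in fields)
        tasks.append((name, class1, class2, int(n1), int(n2)))
    return tasks


def check_tasks(tasks, class_sizes):
    """Return a list of messages for counts that disagree with class_sizes."""
    problems = []
    for name, class1, class2, n1, n2 in tasks:
        for cls, expected in ((class1, n1), (class2, n2)):
            actual = class_sizes.get(cls)  # None if the class is unknown
            if actual != expected:
                problems.append(
                    f"{name}: class {cls} has {actual} samples, "
                    f"task file says {expected}"
                )
    return problems


# Hypothetical task line and class sizes.
tasks = read_tasks(io.StringIO("tumor-vs-normal\tTumor\tNormal\t2\t1\n"))
problems = check_tasks(tasks, {"Tumor": 2, "Normal": 1})
```

An empty problems list means every sample count in the task file matches the supplied class sizes.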