Elementolab/Split raw data


Revision as of 19:19, 16 October 2010

Back to Elementolab/ChIPseeqer_Tutorial

SPLIT RAW DATA (ALIGNED READS)

The first step in this analysis is to split the read files into one read file per chromosome. Note: split_raw_data_files.pl has been renamed to ChIPseeqerSplitReadFiles.

ChIPseeqer supports:

  • analyzing ChIPseq datasets with or without INPUT data.
  • a variety of read file formats, including bed, eland and mit (see Supported Formats).

IMPORTANT: ChIP reads and input DNA read files MUST be in different directories. We'll assume for this tutorial that these directories are named CHIP/ and INPUT/.

To run the tools directly from any folder, add both $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH variable. See How to set the CHIPSEEQERDIR variable.
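As a minimal sketch of that setup, the lines below set the variable and extend $PATH for the current shell session. The install location $HOME/ChIPseeqer is an assumption made for illustration only; substitute the folder where ChIPseeqer actually lives on your system.

```shell
# Assumption: ChIPseeqer was unpacked into $HOME/ChIPseeqer (hypothetical location).
export CHIPSEEQERDIR="$HOME/ChIPseeqer"

# Append the main folder and its SCRIPTS subfolder to the search path.
export PATH="$PATH:$CHIPSEEQERDIR:$CHIPSEEQERDIR/SCRIPTS"
```

To make the change permanent, the same two lines can be placed in your shell startup file.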

1. Based on the format of your data, type the command:

ChIPseeqerSplitReadFiles --format=sam --files="*.gz" 

The above command will split the reads from all .gz files in the current directory.

Alternatively, you can specify a directory. All files in this directory will be used and split.

ChIPseeqerSplitReadFiles --format=sam --datafolder=CHIP 

An output folder can also be specified:

ChIPseeqerSplitReadFiles --format=sam --files="*.gz" --outputfolder=CHIP

The above commands will create one file per chromosome in the CHIP directory, named reads.chr1, reads.chr2, etc.
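A quick way to confirm that the split produced output is to count the per-chromosome files. The helper below is only a sketch: the function name count_split_files is hypothetical and not part of ChIPseeqer, and it relies solely on the reads.chr* naming described above.

```shell
# Hypothetical helper (not part of ChIPseeqer): count reads.chr* files in a folder.
count_split_files() {
  n=0
  for f in "$1"/reads.chr*; do
    # If nothing matches, the pattern stays literal, so check that the file exists.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Report how many per-chromosome files exist in CHIP (0 if the folder is missing or empty).
echo "per-chromosome read files in CHIP: $(count_split_files CHIP)"
```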

--format=STR         format of your raw data. One of: bed, eland, exteland, mit, sidow, sam, export.
--datafolder=DIR     folder that contains your raw data.
--files="FILES"      files that contain your raw data. Keep the quotes ("") to prevent shell expansion (i.e. replacement of * by all matching files).
--outputfolder=DIR   folder in which the per-chromosome read files will be created.
--verbose=INT        verbose mode (prints out the number of reads).
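The effect of quoting can be seen with a small, generic shell experiment. This sketch does not call ChIPseeqer at all: count_args is a hypothetical helper that just reports how many arguments it received, and the two .gz files are empty placeholders created in a temporary folder.

```shell
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

# Temporary folder with two empty placeholder files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.gz" "$demo_dir/b.gz"

# Unquoted: the shell replaces *.gz with the matching file names before the call.
unquoted=$(cd "$demo_dir" && count_args *.gz)

# Quoted: the pattern reaches the program unchanged, as a single argument.
quoted=$(cd "$demo_dir" && count_args "*.gz")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
rm -r "$demo_dir"
```

With the quotes in place, the program receives the pattern itself and can decide which files it refers to.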

2. If you have input DNA reads, repeat Step 1 with --datafolder and --outputfolder pointing to your INPUT folder.
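Both steps can be sketched as a single loop over the two directories. The snippet assumes SAM-format reads and the CHIP/ and INPUT/ directory names used in this tutorial; split_cmd is a hypothetical helper that only builds the command line. If ChIPseeqerSplitReadFiles is not found on $PATH, the loop prints the commands instead of running them.

```shell
# Hypothetical helper: build the split command for one folder.
# $1 = folder that holds the raw reads; output is written to the same folder.
split_cmd() {
  echo "ChIPseeqerSplitReadFiles --format=sam --datafolder=$1 --outputfolder=$1"
}

for dir in CHIP INPUT; do
  cmd=$(split_cmd "$dir")
  if command -v ChIPseeqerSplitReadFiles >/dev/null 2>&1; then
    $cmd                      # tool is on PATH: run the split
  else
    echo "would run: $cmd"    # tool not found: only show the command
  fi
done
```

Change --format to match your data if your reads are not in SAM format.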
