Elementolab/Split raw data
SPLIT RAW DATA (ALIGNED READS)
The first step in this analysis is to split the read files into one read file per chromosome. Note: the script split_raw_data_files.pl has been renamed to ChIPseeqerSplitReadFiles.
ChIPseeqer supports:
- analyzing ChIPseq datasets with or without INPUT data.
- a variety of read file formats: bed, eland, sam and bam. (Supported Formats)
IMPORTANT: ChIP reads and input DNA read files MUST be in different directories. We'll assume for this tutorial that these directories are named CHIP/ and INPUT/.
To run the tools directly from any folder, add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your PATH variable. See How to set the CHIPSEEQERDIR variable.
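As a rough illustration of the PATH setup described above, the following sketch assumes ChIPseeqer was unpacked into $HOME/ChIPseeqer. That location is only an assumption; substitute your own install directory.

```shell
# Assumption: ChIPseeqer lives in $HOME/ChIPseeqer; adjust to your install.
export CHIPSEEQERDIR="$HOME/ChIPseeqer"

# Put both the main folder and its SCRIPTS subfolder on the search path.
export PATH="$CHIPSEEQERDIR:$CHIPSEEQERDIR/SCRIPTS:$PATH"

# Report whether the shell can now find the split tool.
command -v ChIPseeqerSplitReadFiles || echo "ChIPseeqerSplitReadFiles not found on PATH"
```

To make the setting permanent, the two export lines would typically go into your shell startup file.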
1. Based on the format of your data, type the command:
ChIPseeqerSplitReadFiles --format=sam --files="*.gz"
The above command will split reads from all .gz files in the current directory.
Alternatively, you can specify a directory. All files in this directory will be used and split.
ChIPseeqerSplitReadFiles --format=sam --datafolder=CHIP
An output folder can also be specified:
ChIPseeqerSplitReadFiles --format=sam --files="*.gz" --outputfolder=CHIP
The above commands will create files in the CHIP directory, one per chromosome, named reads.chr1, reads.chr2, etc.
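One way to check the result is to list the per-chromosome files that were written. The sketch below is not part of ChIPseeqer; list_split_files is a hypothetical helper name, and it only assumes the reads.chr* naming described above.

```shell
# Hypothetical helper: list the reads.chr* files in a folder with their sizes.
list_split_files() {
    dir="$1"
    found=0
    for f in "$dir"/reads.chr*; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
        found=$((found + 1))
        printf '%s\t%s bytes\n' "$f" "$(wc -c < "$f" | tr -d ' ')"
    done
    echo "$found per-chromosome read file(s) in $dir"
}

# Assumption: the split step wrote its output to CHIP/.
list_split_files CHIP
```

A count of zero, or a chromosome that is unexpectedly missing, suggests that the split step should be rerun or that the --format option does not match the data.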
- --format=STR defines the format of your raw data. This option can be one of the following: bed, eland, exteland, mit, sidow, sam, export.
- --datafolder=DIR defines the folder that contains your raw data.
- --files="FILES" directly specifies the files that contain your raw data. Use "" to avoid shell expansion (i.e. replacement of * by all matching files).
- --outputfolder=DIR defines the folder in which the per-chromosome read files will be created.
- --verbose=INT enables verbose mode (prints out the number of reads).
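The effect of quoting a file pattern can be seen with a small experiment that does not involve ChIPseeqer at all. The file names and the count_args helper below are made up for illustration.

```shell
# Work in a scratch directory so no real data is touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch a.sam.gz b.sam.gz          # hypothetical read files

# Helper that reports how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args *.gz)      # the shell replaces the pattern with the matching files
quoted=$(count_args "*.gz")      # the pattern is passed through as a single literal string
echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

With the quotes in place, the pattern reaches the program intact, so the program rather than the shell decides which files match.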
2. If you have input DNA reads, repeat Step 1 with the --datafolder and --outputfolder options pointing to your INPUT folder.
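Both steps can be sketched as a single loop over the two folders. This assumes the reads are in SAM format and that the folders are named CHIP and INPUT as in this tutorial. The split_dir wrapper is a hypothetical convenience function, not part of ChIPseeqer.

```shell
# Hypothetical wrapper: split the reads in one folder and write the
# per-chromosome files back into the same folder.
split_dir() {
    ChIPseeqerSplitReadFiles --format=sam --datafolder="$1" --outputfolder="$1"
}

for dir in CHIP INPUT; do
    if [ -d "$dir" ] && command -v ChIPseeqerSplitReadFiles >/dev/null 2>&1; then
        split_dir "$dir"
    else
        echo "skipping $dir (folder missing or ChIPseeqerSplitReadFiles not on PATH)"
    fi
done
```

If your data are in another format, change the --format value accordingly.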
