Elementolab/Split raw data
SPLIT RAW DATA (ALIGNED READS)
The first step in this analysis is to split the read files into one read file per chromosome. Note: split_raw_data_files.pl was renamed to ChIPseeqerSplitReadFiles.
ChIPseeqer supports:
- analyzing ChIPseq datasets with or without INPUT data.
- a variety of read file formats: bed, eland, sam and bam (see Supported Formats).
IMPORTANT: ChIP reads and input DNA read files MUST be in different directories. We'll assume for this tutorial that these directories are named CHIP/ and INPUT/.
To run the tools directly from any folder, add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH variable. See How to set the CHIPSEEQERDIR variable.
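As a minimal sketch of that setup, the lines below could go in your shell startup file. The install location $HOME/ChIPseeqer is only an assumption for illustration; use the directory where ChIPseeqer is actually installed.

```shell
# Hypothetical install location; replace with your own ChIPseeqer directory.
export CHIPSEEQERDIR="$HOME/ChIPseeqer"

# Make the tools and the helper scripts callable from any folder.
export PATH="$CHIPSEEQERDIR:$CHIPSEEQERDIR/SCRIPTS:$PATH"
```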
1. Based on the format of your data, type the command:
ChIPseeqerSplitReadFiles --format=sam --files="*.gz"
The above command will split reads from all .gz files in the current directory.
Alternatively, you can specify a directory. All files in this directory will be used and split.
ChIPseeqerSplitReadFiles --format=sam --datafolder=CHIP
An output folder can also be specified:
ChIPseeqerSplitReadFiles --format=sam --files="*.gz" --outputfolder=CHIP
The above commands will create files in the CHIP directory, named reads.chr1, reads.chr2, etc.
Options:
--format=STR Define the format of the aligned data. Supported formats: bam, sam, eland, exteland, export, bed.
--datafolder=DIR Define the directory with the aligned data files. All files in the directory will be merged and split.
--files="FILES" Directly specify the files that contain the aligned data. Use "" to avoid shell expansion (i.e., replacement of * by all matching files).
--outputfolder=DIR Define the directory for the output files (i.e., one reads file per chromosome).
--verbose=INT Verbose mode. Default is 0.
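After the split, it can be useful to confirm that the per-chromosome files were written. The helper below is a hypothetical convenience, not part of ChIPseeqer; it only assumes the reads.chr* naming described in this tutorial.

```shell
# Count the per-chromosome read files in a folder (default: CHIP).
# count_read_files is a hypothetical helper, not a ChIPseeqer tool.
count_read_files() {
    dir="${1:-CHIP}"
    n=0
    for f in "$dir"/reads.chr*; do
        # If nothing matches, the pattern stays literal and this test fails.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

echo "CHIP: $(count_read_files CHIP) per-chromosome read files"
```

A count of 0 suggests the split did not run or wrote its output elsewhere.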
2. If you have input DNA reads, repeat Step 1 with --datafolder and --outputfolder pointing to your INPUT folder.
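For example, with the INPUT/ directory assumed in this tutorial, the command could look like the sketch below. The sam format is an assumption; set it to the format of your own input read files. The variable names are only for illustration.

```shell
# Folder holding the input DNA reads (name assumed from this tutorial).
INPUT_DIR=INPUT
# Assumed format; change it to match your input read files.
FORMAT=sam

# Same split as in Step 1, reading from and writing to the INPUT folder.
cmd="ChIPseeqerSplitReadFiles --format=$FORMAT --datafolder=$INPUT_DIR --outputfolder=$INPUT_DIR"
echo "$cmd"

# Run the command only if the tool is on your PATH.
if command -v ChIPseeqerSplitReadFiles >/dev/null 2>&1; then
    $cmd
fi
```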
