Elementolab/Split raw data
Revision as of 19:18, 16 October 2010 by Ole2001
SPLIT RAW DATA
The first step in this analysis is to split the read files into one read file per chromosome. Note: split_raw_data_files.pl has been renamed to ChIPseeqerSplitReadFiles.
ChIPseeqer supports:
- analyzing ChIPseq datasets with or without INPUT data.
- a variety of read file formats, including bed, eland, mit and sam (see Supported Formats).
IMPORTANT: ChIP reads and input DNA read files MUST be in different directories. We'll assume for this tutorial that these directories are named CHIP/ and INPUT/.
To run the tools directly from any folder, add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH variable. See How to set the CHIPSEEQERDIR variable.
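As a minimal sketch of the PATH setup described above, assuming a bash-like shell and that ChIPseeqer is unpacked in $HOME/ChIPseeqer (a hypothetical location; substitute your own install path):

```shell
# Assumption: ChIPseeqer lives in $HOME/ChIPseeqer (hypothetical path; adjust as needed).
export CHIPSEEQERDIR="$HOME/ChIPseeqer"
# Prepend the main folder and its SCRIPTS subfolder so the tools can be run from anywhere.
export PATH="$CHIPSEEQERDIR:$CHIPSEEQERDIR/SCRIPTS:$PATH"
```

Adding these two lines to your shell startup file (for example ~/.bashrc) would make the setting persist across sessions.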
1. Based on the format of your data, type the command:
ChIPseeqerSplitReadFiles --format=sam --files="*.gz"
The above command will split reads from all .gz files in the current directory.
Alternatively, you can specify a directory. All files in this directory will be used and split.
ChIPseeqerSplitReadFiles --format=sam --datafolder=CHIP
An output folder can also be specified:
ChIPseeqerSplitReadFiles --format=sam --files="*.gz" --outputFolder=CHIP
The above commands will create files in the CHIP directory, named reads.chr1, reads.chr2 etc.
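A quick way to check the result is to list the per-chromosome files. The sketch below assumes the output went to a folder named CHIP and that the files follow the reads.chrN naming described above; it only lists files and does not inspect their contents.

```shell
# List the per-chromosome read files in CHIP/, if any exist.
count=0
for f in CHIP/reads.chr*; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
    echo "$f"
    count=$((count + 1))
done
echo "Found $count per-chromosome read file(s) in CHIP/"
```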
--format=STR        define the format of your raw data. This option can be one of the following: bed, eland, exteland, mit, sidow, sam, export.
--datafolder=DIR    define the folder that contains your raw data.
--files="FILES"     directly specify the files that contain your raw data. Use "" to avoid shell expansion (i.e. replacement of * by all matching files).
--outputfolder=DIR  define the folder in which the per-chromosome read files will be created.
--verbose=INT       verbose mode (prints out the number of reads).
2. If you have input DNA reads, repeat Step 1 with the --datafolder and --outputfolder options pointing to your INPUT folder.
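Both steps can be sketched as a single loop. This assumes the CHIP/ and INPUT/ folder names used in this tutorial and sam-formatted reads; the FORMAT variable and the loop itself are illustrative and not part of ChIPseeqer. Only options listed above are used.

```shell
# Illustrative wrapper: split ChIP reads, then input reads if an INPUT/ folder exists.
FORMAT=sam   # assumption: change to match your data (bed, eland, ...)
for dir in CHIP INPUT; do
    if [ -d "$dir" ]; then
        # Read from and write to the same folder, keeping ChIP and input reads separate.
        ChIPseeqerSplitReadFiles --format="$FORMAT" --datafolder="$dir" --outputfolder="$dir"
    else
        echo "Skipping $dir: folder not found"
    fi
done
```

Because each folder is both the data folder and the output folder, the ChIP and input read files stay in different directories, as required above.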
