Elementolab/Split raw data

From Icbwiki

Revision as of 19:18, 16 October 2010
Ole2001 (Talk | contribs)



SPLIT RAW DATA

The first step in this analysis is to split the read files into one read file per chromosome. Note: split_raw_data_files.pl has been renamed to ChIPseeqerSplitReadFiles.

ChIPseeqer supports:

  • analyzing ChIPseq datasets with or without INPUT data.
  • a variety of read file formats, including bed, eland and mit (see Supported Formats).

IMPORTANT: ChIP reads and input DNA read files MUST be in different directories. We'll assume for this tutorial that these directories are named CHIP/ and INPUT/.

To run the tools directly from any folder, add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH variable. See How to set the CHIPSEEQERDIR variable.
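
As a minimal sketch of that setup for a bash-like shell: the install location $HOME/ChIPseeqer is a placeholder, not a path taken from this tutorial, so replace it with the folder where ChIPseeqer actually lives on your machine.

```shell
# Hypothetical install location; replace with your own ChIPseeqer folder.
export CHIPSEEQERDIR="$HOME/ChIPseeqer"

# Append the main folder and its SCRIPTS subfolder to the search path.
export PATH="$PATH:$CHIPSEEQERDIR:$CHIPSEEQERDIR/SCRIPTS"

# Optional check: report whether the shell can now find the tool.
if command -v ChIPseeqerSplitReadFiles >/dev/null 2>&1; then
    echo "ChIPseeqerSplitReadFiles found"
else
    echo "ChIPseeqerSplitReadFiles not found; check CHIPSEEQERDIR"
fi
```

Adding the two export lines to your shell startup file (for example ~/.bashrc) makes the setting persist across sessions.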

1. Based on the format of your data, type the command:

ChIPseeqerSplitReadFiles --format=sam --files="*.gz" 

The above command will split the reads from all .gz files in the current directory.

Alternatively, you can specify a directory. All files in this directory will be used and split.

ChIPseeqerSplitReadFiles --format=sam --datafolder=CHIP 

An output folder can also be specified:

ChIPseeqerSplitReadFiles --format=sam --files="*.gz" --outputFolder=CHIP

With the output folder set to CHIP, the command creates one file per chromosome in the CHIP directory, named reads.chr1, reads.chr2, etc.
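
To check the result of the split, a small helper such as the one below can list the per-chromosome files with their line counts. This is only a sketch: list_split_files is a hypothetical helper name, not part of ChIPseeqer, and a line count equals a read count only for formats that store one read per line.

```shell
# Hypothetical helper (not a ChIPseeqer tool): print each reads.chr* file
# in a folder together with its number of lines.
list_split_files() {
    dir="$1"
    for f in "$dir"/reads.chr*; do
        # If the pattern matched nothing, report it and stop.
        [ -e "$f" ] || { echo "no reads.chr* files in $dir" >&2; return 1; }
        printf '%s\t%s\n' "$(basename "$f")" "$(wc -l < "$f" | tr -d ' ')"
    done
}

# Inspect the CHIP folder if it exists in the current directory.
if [ -d CHIP ]; then
    list_split_files CHIP || true
fi
```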

--format=STR         format of your raw data; one of: bed, eland, exteland, mit, sidow, sam, export.
--datafolder=DIR     folder that contains your raw data.
--files="FILES"      the files that contain your raw data. Quote the pattern ("") to prevent shell expansion (i.e. replacement of * by all matching files).
--outputfolder=DIR   folder in which the per-chromosome read files will be created.
--verbose=INT        verbose mode (prints out the number of reads).
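
The effect of quoting a pattern can be seen without ChIPseeqer at all. The sketch below uses a scratch folder and two empty files with made-up names (lane1.sam.gz and lane2.sam.gz) to show what a program receives in each case.

```shell
# Work in a scratch folder containing two empty, hypothetical read files.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
touch lane1.sam.gz lane2.sam.gz

# Unquoted: the shell replaces the pattern with the matching file names.
unquoted="$(echo *.gz)"

# Quoted: the pattern is passed through unchanged.
quoted="$(echo "*.gz")"

echo "unquoted: $unquoted"   # unquoted: lane1.sam.gz lane2.sam.gz
echo "quoted:   $quoted"     # quoted:   *.gz
```

With the quotes in place, the pattern itself reaches the tool, which can then resolve it on its own.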

2. If you have input DNA reads, repeat Step 1 with --datafolder and --outputfolder pointing to your INPUT folder.
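
Both steps can be combined in a short script. The sketch below assumes that the reads in CHIP/ and INPUT/ are both in sam format and that each folder is also used as its own output folder. It uses only the options listed above, with the --outputfolder spelling from that list. The split_folder helper and the DRY_RUN switch are illustrative additions, not part of ChIPseeqer.

```shell
# Dry run by default: print the commands instead of running them.
# Set DRY_RUN=0 once ChIPseeqerSplitReadFiles is on your PATH.
DRY_RUN="${DRY_RUN:-1}"
FORMAT=sam   # assumption: both folders hold reads in sam format

# Hypothetical helper: split all read files in folder $1, writing the
# per-chromosome files back into the same folder.
split_folder() {
    if [ "$DRY_RUN" = "1" ]; then
        echo ChIPseeqerSplitReadFiles --format="$FORMAT" --datafolder="$1" --outputfolder="$1"
    else
        ChIPseeqerSplitReadFiles --format="$FORMAT" --datafolder="$1" --outputfolder="$1"
    fi
}

for dir in CHIP INPUT; do
    split_folder "$dir"
done
```

Keeping the ChIP and input runs in separate folders, as required above, prevents the reads.chr* files of one run from overwriting those of the other.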
