Elementolab/Split raw data




SPLIT RAW DATA (ALIGNED READS)

The first step in this analysis is to split the aligned read files into one read file per chromosome. Note: the script formerly named split_raw_data_files.pl has been renamed ChIPseeqerSplitReadFiles.
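To make the idea of per-chromosome splitting concrete, here is a minimal sketch that uses standard awk on a made-up BED file. It only illustrates routing each read to a file named after its chromosome (reads.chr1, reads.chr2, ...). It is not how ChIPseeqerSplitReadFiles works internally: the real tool also parses the other supported formats and may write a different per-line layout.

```shell
#!/bin/sh
# Illustration only: a simplified stand-in, NOT the ChIPseeqer implementation.
# Create a tiny BED file (chrom, start, end, name, score, strand) with made-up reads.
printf 'chr1\t100\t136\tr1\t0\t+\nchr2\t500\t536\tr2\t0\t-\nchr1\t900\t936\tr3\t0\t+\n' > example.bed

# Send each line to reads.<chromosome>, i.e. reads.chr1, reads.chr2, ...
awk -F'\t' '{ print > ("reads." $1) }' example.bed

# Count the reads written for each chromosome.
wc -l reads.chr1 reads.chr2
```

With this input, reads.chr1 receives two reads and reads.chr2 receives one.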

ChIPseeqer supports:

  • analyzing ChIPseq datasets with or without INPUT data.
  • a variety of read file formats: bed, eland, sam and bam (see Supported Formats).

IMPORTANT: ChIP reads and input DNA read files MUST be in different directories. We'll assume for this tutorial that these directories are named CHIP/ and INPUT/.
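A minimal sketch of the directory layout assumed in this tutorial. The read file names in the comments are hypothetical placeholders; substitute your own files.

```shell
#!/bin/sh
# Create the two directories this tutorial assumes: one for ChIP reads, one for input DNA reads.
mkdir -p CHIP INPUT

# Then move your aligned read files into the matching directory.
# The file names below are hypothetical placeholders:
#   mv chip_reads.sam.gz CHIP/
#   mv input_reads.sam.gz INPUT/
```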

To run the tools from any folder, add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your PATH variable. See How to set the CHIPSEEQERDIR variable.
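For a POSIX-style shell such as sh or bash, the setting might look like the sketch below. The install location $HOME/ChIPseeqer is a hypothetical placeholder; the linked page remains the reference for setting CHIPSEEQERDIR.

```shell
#!/bin/sh
# Hypothetical install location: replace with the directory where ChIPseeqer is installed.
CHIPSEEQERDIR="$HOME/ChIPseeqer"
export CHIPSEEQERDIR

# Put the main directory and its SCRIPTS subdirectory on the search path.
PATH="$CHIPSEEQERDIR:$CHIPSEEQERDIR/SCRIPTS:$PATH"
export PATH
```

These settings last only for the current shell session; to keep them, add the same lines to your shell startup file (for bash, typically ~/.bashrc).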

1. Run the following command, setting --format to match the format of your data:

ChIPseeqerSplitReadFiles --format=sam --files="*.gz" 

The above command splits the reads from all .gz files in the current directory.

Alternatively, you can specify a directory. All files in this directory are merged and split.

ChIPseeqerSplitReadFiles --format=sam --datafolder=CHIP 

An output folder can also be specified:

ChIPseeqerSplitReadFiles --format=sam --files="*.gz" --outputfolder=CHIP

The above commands create one file per chromosome in the CHIP directory, named reads.chr1, reads.chr2, etc.

Options:

--format=STR         Define the format of the aligned data. Supported formats: bam, sam, eland, exteland, export, bed.
--datafolder=DIR     Define the directory with the aligned data files. All files in the directory will be merged and split.
--files="FILES"      Directly specify the files that contain the aligned data. Quote the pattern to prevent shell expansion (i.e., replacement of * by all matching files).
--outputfolder=DIR   Define the directory for the output files (i.e., one reads file per chromosome).
--verbose=INT        Verbose mode. Default is 0.

2. If you have input DNA reads, repeat step 1 with --datafolder and --outputfolder pointing to your INPUT folder.
