Elementolab/ChIPseeqer Species
From Icbwiki
Revision as of 20:23, 5 May 2011 by Eug2002 (previous revision: 20:52, 19 April 2011, also by Eug2002)
SUPPORTED SPECIES
The species currently supported in ChIPseeqer are:
- Human (assembly hg18) - Default in all ChIPseeqer programs
- Mus musculus (assembly mm9)
- Drosophila melanogaster (assembly dm3)
- Saccharomyces cerevisiae
If your favorite organism is not in the list and you would like us to add it, please contact us.
To run ChIPseeqer on a species other than human, use the option:
-chrdata STR
where STR is the path to the .chrdata file of the organism (default is hg18):
DATA/mm9.chrdata for mouse,
DATA/dm3.chrdata for Drosophila, or
DATA/sacser.chrdata for Saccharomyces cerevisiae
Other programs in the ChIPseeqer framework also need to know which species to use. For these you specify:
- the genome (e.g., --genome=hg18) or
- the database (e.g., --db=AceView) or
- the gene annotation files (e.g., annotation=DATA/mm9/refGene.txt.mm9.20APR2010)
These programs are:
- ChIPseeqerAnnotate
- ChIPseeqerSummaryPromoters
- ChIPseeqerFindClosestGenes
- ChIPseeqerFindDistalPeaks
- ChIPseeqerDensityMatrix
- ChIPseeqerPlotAverageReadDensityInGenes
- ChIPseeqerPlotAveragePeaksNumberInGenes
- ChIPseeqerCreateRandomRegions
HOW TO ADD A NEW SPECIES
To add a new species we need to:
1. Estimate the mappability of the genome and produce a .chrdata file for the new organism.
- The .chrdata file for the new organism will look like this:
chr10 129993255 0.840
chr11 121843856 0.866
chr12 121257530 0.797
chr13 120284312 0.816
chr14 125194864 0.781
chr15 103494974 0.842
chr16 98319150 0.838
chr17 95272651 0.812
chr18 90772031 0.837
chr19 61342430 0.830
The columns are, in order: chromosome_name, chromosome_size, chromosome_mappability.
- To create this file, follow the steps on the Mappability page (http://icb.med.cornell.edu/wiki/index.php/Elementolab/Mappability).
IMPORTANT: If the name of the species is x, make sure that you name the file x.chrdata
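The three-column layout described in step 1 can be sanity-checked before the file is placed in the DATA folder. The following is a minimal sketch and not part of ChIPseeqer: the function name check_chrdata is hypothetical, and it assumes that columns are separated by whitespace and that mappability is a fraction between 0 and 1, as in the example rows.

```shell
# check_chrdata FILE   (hypothetical helper, not shipped with ChIPseeqer)
# Returns 0 if every non-blank line has exactly three fields:
# a name, an integer size, and a mappability value no larger than 1.
# Otherwise it prints the offending lines and returns 1.
check_chrdata() {
    awk '
        NF == 0 { next }    # ignore blank lines
        NF != 3 ||
        $2 !~ /^[0-9]+$/ ||
        $3 !~ /^[0-9]*\.?[0-9]+$/ ||
        ($3 + 0) > 1 {
            printf "line %d looks malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

For a species named x, it could be called as check_chrdata x.chrdata before copying the file.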
2. Download the gene annotation files
- Add a new folder for the new species x in $CHIPSEEQERDIR/DATA
cd $CHIPSEEQERDIR/DATA
mkdir x
- Download the gene annotation file (e.g., from the UCSC Table browser) into the new folder
- Run the script:
make_gene_annotation_files.pl --annotation=$CHIPSEEQERDIR/DATA/x/file
This creates all the files needed by the ChIPseeqer framework.
IMPORTANT: If the database is refSeq, Ensembl, AceView, or UCSCGenes, rename all the annotation files that were created so that they are named after the database. For refSeq, for example:
refSeq
refSeq.EXONS
refSeq.GENEPARTS
refSeq.INTRONS
refSeq.new
refSeq.NM2ORF
refSeq.oneperTSS
refSeq.TSS_TES
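The renaming can be scripted. The sketch below is an illustration only: the function name rename_annotation_files is hypothetical, and it assumes that the generated files share the name of the downloaded annotation file as a common prefix (the file itself plus the same name followed by suffixes such as .EXONS), which this page does not state explicitly. Check the actual file names in the species folder before relying on it.

```shell
# rename_annotation_files DIR OLD_PREFIX DB   (hypothetical helper)
# Renames DIR/OLD_PREFIX and DIR/OLD_PREFIX.* to DIR/DB and DIR/DB.*,
# keeping whatever suffix follows the old prefix.
rename_annotation_files() {
    dir=$1
    old=$2
    db=$3
    for f in "$dir/$old" "$dir/$old".*; do
        [ -e "$f" ] || continue          # skip patterns that matched nothing
        suffix=${f#"$dir/$old"}          # "" or ".EXONS", ".TSS_TES", ...
        mv -- "$f" "$dir/$db$suffix" || return 1
    done
}
```

For example, rename_annotation_files "$CHIPSEEQERDIR/DATA/x" file refSeq would turn file.EXONS into refSeq.EXONS under that assumption.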
- The ChIPseeqer scripts will automatically use the new species files, provided that you specify:
--genome=x
--db=refSeq (or AceView, Ensembl, UCSCGenes)
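Before running the scripts with the new --genome and --db values, it may help to confirm that the files from steps 1 and 2 are where they are expected. This is a rough sketch, not a ChIPseeqer tool: check_species_files is a hypothetical name, the paths follow the layout described on this page ($CHIPSEEQERDIR/DATA/x.chrdata and $CHIPSEEQERDIR/DATA/x/), and the suffix list is the one shown for refSeq, which is assumed here to apply to the other databases as well.

```shell
# check_species_files SPECIES DB   (hypothetical helper)
# Reports any expected file that is missing under $CHIPSEEQERDIR/DATA.
# Returns 0 if everything is present, 1 otherwise.
check_species_files() {
    species=$1
    db=$2
    data="$CHIPSEEQERDIR/DATA"
    missing=0
    # Step 1: the mappability file must be named SPECIES.chrdata
    [ -f "$data/$species.chrdata" ] || { echo "missing: $data/$species.chrdata"; missing=1; }
    # Step 2: the renamed annotation files (suffix list taken from the refSeq example)
    for suffix in "" .EXONS .GENEPARTS .INTRONS .new .NM2ORF .oneperTSS .TSS_TES; do
        f="$data/$species/$db$suffix"
        [ -f "$f" ] || { echo "missing: $f"; missing=1; }
    done
    return "$missing"
}
```

With CHIPSEEQERDIR set, check_species_files x refSeq prints one line per missing file and returns a non-zero status if any are absent.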
