Elementolab/Tools tutorial

From Icbwiki

  • Convert SRA files to FASTQ format:
  1. Download and compile the SRA Toolkit from [here].
  2. Run fastq-dump, for example:
${PATH_TO_SRA_BIN_DIR}/fastq-dump -F -A SRR000299 -D SRR000299.sra -O output_directory/
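To convert several runs in one go, the command above can be wrapped in a small script. This is only a sketch: the function names are hypothetical, the flags are copied unchanged from the command above, and it assumes each run was downloaded as ACCESSION.sra in the current directory. Flag meanings can differ between SRA Toolkit releases, so check fastq-dump --help for the installed version.

```python
import os
import subprocess


def fastq_dump_command(sra_bin_dir, accession, output_dir):
    """Build the fastq-dump call from the tutorial for one accession.

    The flags (-F, -A, -D, -O) are taken as-is from the tutorial command;
    nothing new is added here.
    """
    return [
        os.path.join(sra_bin_dir, "fastq-dump"),
        "-F",
        "-A", accession,
        "-D", accession + ".sra",  # assumption: the file is named ACCESSION.sra
        "-O", output_dir,
    ]


def convert_all(sra_bin_dir, accessions, output_dir):
    """Run fastq-dump once per accession and stop at the first failure."""
    for accession in accessions:
        command = fastq_dump_command(sra_bin_dir, accession, output_dir)
        subprocess.run(command, check=True)
```

For example, convert_all(os.environ["PATH_TO_SRA_BIN_DIR"], ["SRR000299"], "output_directory/") reproduces the single command shown above.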


  • LiftOver
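Chain files for converting from hg19 are available at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/

Example input in BED format (chromosome, start, end, name):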
chr1	10327	10328	rs112750067
chr1	10440	10441	rs112155239
chr1	10469	10470	rs117577454
chr1	10492	10493	rs55998931
chr1	10519	10520	rs62636508
chr1	10533	10534	rs114315702
chr1	10583	10584	rs58108140
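
Convert the coordinates from hg19 to hg18 (arguments: input BED file, chain file, output file, file for unmapped records):
liftOver dbsnp132_20101103.vcf.bed hg19ToHg18.over.chain dbsnp132_20101103.vcf.bed.hg18 unmapped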
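After a run it is worth checking how many records ended up in the unmapped file and why. The sketch below assumes the usual layout of that file, where each rejected record is preceded by a comment line starting with "#" that gives the reason (for example "#Deleted in new"). The function name is hypothetical.

```python
from collections import Counter


def summarize_unmapped(lines):
    """Count unmapped BED records by the reason on the preceding '#' line."""
    reasons = Counter()
    current_reason = "unknown"  # used if a record appears before any '#' line
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Comment line: remember the reason for the record(s) that follow.
            current_reason = line.lstrip("#").strip()
        else:
            # Record line: attribute it to the most recent reason.
            reasons[current_reason] += 1
    return reasons
```

Typical use would be to open the file named as the last liftOver argument ("unmapped" in the command above) and pass the file handle to summarize_unmapped.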


  • LASSO
http://www-stat.stanford.edu/~tibs/lasso.html
  • MyScanACE (an ElementoLab multi-purpose multi-format re-implementation of ScanACE)
MyScanACE -z PfalciparumGenomic_PlasmoDB-6.0_cleaned.fasta  -jb TTCTAGAA_PF13_0267_pwm.txt -g 0.138 -c 2.0 
MyScanACE -z PfalciparumGenomic_PlasmoDB-6.0_cleaned.fasta  -jb TTCTAGAA_PF13_0267_pwm.txt -g 0.138 -c 2.0 -output gmod -mn TCTAGAA_PF13_0267 

Options:

-z   sequences to scan
-g   G+C background
-jb  PBM-type weight matrix
-j   JASPAR type weight matrix
-i   ScanACE type weight matrix
-c   cutoff in standard deviations (2.0 means the cutoff is the average motif score minus 2.0 standard deviations)
-output   type of output; gmod = GMOD-type track with motif affinities as scores
-mn  motif name
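The -c rule can be illustrated with a few lines of arithmetic. This is a sketch of the formula as described above, not MyScanACE's actual code: which scores enter the average, and whether the population or sample standard deviation is used, are assumptions here (the population form is used below), and the function names are hypothetical.

```python
import statistics


def score_cutoff(reference_scores, c):
    """Return mean(reference_scores) - c * standard deviation.

    reference_scores: motif scores used to define the average (assumption:
    the tutorial does not say which sites these are).
    c: the value passed to -c, e.g. 2.0.
    """
    mean = statistics.mean(reference_scores)
    std = statistics.pstdev(reference_scores)  # population standard deviation
    return mean - c * std


def passes_cutoff(score, reference_scores, c):
    """True if a candidate site scores at or above the cutoff."""
    return score >= score_cutoff(reference_scores, c)
```

With a larger -c the cutoff drops, so more (weaker) sites are reported; a smaller -c keeps only sites close to the average motif score.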
  • Cluster 3.0

Request an interactive node on the panda cluster:

qrsh -l h_vmem=16g -l h_rt=8:00:00 -now yes

Download Cluster 3.0 either from the official website or from http://physiology.med.cornell.edu/faculty/elemento/lab/files/cluster-1.42.tar.gz

Installation (command-line version only, without the X11 interface):

./configure --without-x
make
make install

Hierarchical clustering of the genes using Pearson correlation (-g 2) and complete linkage (-m m):

cluster -f expression_normalized_rowavg_ordered.txt  -g 2 -m m
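For readers unfamiliar with the two choices, the sketch below shows what they mean for a pair of gene clusters: the distance between two expression profiles is taken as 1 minus their Pearson correlation, and complete linkage scores two clusters by their most distant pair of members. This is an illustration in plain Python with hypothetical function names, not the Cluster 3.0 implementation; it ignores missing values and fails on a constant profile (zero variance).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pearson_distance(x, y):
    """Distance in [0, 2]: 0 for perfectly correlated profiles."""
    return 1.0 - pearson(x, y)


def complete_linkage(cluster_a, cluster_b):
    """Complete linkage: the largest pairwise distance between the clusters."""
    return max(pearson_distance(x, y) for x in cluster_a for y in cluster_b)
```

Because complete linkage uses the worst-matching pair, two clusters are merged only when all of their members are reasonably well correlated.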

K-means clustering of the genes using Pearson correlation (-g 2), with 131 clusters (-k 131) and 10 runs (-r 10):

cluster -f expression_normalized_rowavg_ordered.txt -k 131 -r 10 -g 2

To mean-center (-cg a) and normalize (-ng) each gene before clustering, add:

-cg a -ng
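The effect of these two options on a single gene (one row of the expression file) can be sketched as follows: subtract the row mean, then rescale the row so that the sum of its squared values is 1. This is an illustration with a hypothetical function name, not Cluster 3.0 code, and it does not handle missing values.

```python
import math


def center_and_scale(row):
    """Mean-center one gene's values, then scale them to unit magnitude."""
    mean = sum(row) / len(row)
    centered = [value - mean for value in row]
    magnitude = math.sqrt(sum(value * value for value in centered))
    if magnitude == 0:
        # Constant row: nothing to scale after centering.
        return centered
    return [value / magnitude for value in centered]
```

After this transformation genes are compared by the shape of their expression profile rather than by their absolute expression level.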