Elementolab/Tools tutorial

  • Convert SRA files to FASTQ format:
  1. Download and compile the SRA Toolkit from [here]
  2. Run something like:
${PATH_TO_SRA_BIN_DIR}/fastq-dump -F -A SRR000299 -D SRR000299.sra -O output_directory/
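When several runs need converting, the command above can be wrapped in a small loop. The following is only a sketch: the function name sra_to_fastq and the DRY_RUN switch are made up for illustration, and it assumes each SRR*.sra file sits in the current directory and that fastq-dump accepts the same flags as in the command above.

```shell
# Print (DRY_RUN=1) or run one fastq-dump call per accession.
# Usage: sra_to_fastq BIN_DIR OUT_DIR ACCESSION...
# sra_to_fastq and DRY_RUN are hypothetical helpers, not part of SRA Toolkit.
sra_to_fastq() {
    bin_dir=$1
    out_dir=$2
    shift 2
    for acc in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Only show the command that would be executed.
            echo "${bin_dir}/fastq-dump -F -A ${acc} -D ${acc}.sra -O ${out_dir}/"
        else
            "${bin_dir}/fastq-dump" -F -A "$acc" -D "${acc}.sra" -O "${out_dir}/"
        fi
    done
}
```

Setting DRY_RUN=1 before calling, e.g. sra_to_fastq "${PATH_TO_SRA_BIN_DIR}" output_directory SRR000299, prints the commands without running them, which is a cheap way to check paths first.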

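  • LiftOver (convert coordinates between genome assemblies, here hg19 to hg18)
The input is a BED file, e.g. the first lines of dbsnp132_20101103.vcf.bed:
chr1	10327	10328	rs112750067
chr1	10440	10441	rs112155239
chr1	10469	10470	rs117577454
chr1	10492	10493	rs55998931
chr1	10519	10520	rs62636508
chr1	10533	10534	rs114315702
chr1	10583	10584	rs58108140
Arguments are, in order: input BED file, chain file, output BED file, file receiving the records that could not be mapped
liftOver dbsnp132_20101103.vcf.bed hg19ToHg18.over.chain dbsnp132_20101103.vcf.bed.hg18 unmapped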
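After a run it helps to check how many records were converted and how many were dropped. A minimal sketch, assuming the output and unmapped files named in the command above; liftover_summary is a hypothetical helper name, and the only format assumption is that liftOver writes a '#' reason line before each record in the unmapped file.

```shell
# Summarize a liftOver run: count converted vs. unmapped records.
# Usage: liftover_summary MAPPED_BED UNMAPPED_FILE
# Example: liftover_summary dbsnp132_20101103.vcf.bed.hg18 unmapped
liftover_summary() {
    # Converted records: every non-empty line of the output BED file.
    mapped=$(grep -c . "$1" || true)
    # Skip the '#' reason lines so that only the failed records are counted.
    unmapped=$(grep -v '^#' "$2" | grep -c . || true)
    echo "mapped=${mapped} unmapped=${unmapped}"
}
```

A large unmapped count usually means the chain file does not match the assembly of the input coordinates.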

  • MyScanACE (an ElementoLab multi-purpose multi-format re-implementation of ScanACE)
MyScanACE -z PfalciparumGenomic_PlasmoDB-6.0_cleaned.fasta  -jb TTCTAGAA_PF13_0267_pwm.txt -g 0.138 -c 2.0 
MyScanACE -z PfalciparumGenomic_PlasmoDB-6.0_cleaned.fasta  -jb TTCTAGAA_PF13_0267_pwm.txt -g 0.138 -c 2.0 -output gmod -mn TCTAGAA_PF13_0267 


Options:
-z       sequences to scan
-g       G+C background
-jb      PBM-type weight matrix
-j       JASPAR-type weight matrix
-i       ScanACE-type weight matrix
-c       standard deviation cutoff (2.0 means the cutoff is the average motif score minus 2.0 standard deviations)
-output  type of output; gmod = GMOD-type track with motif affinities as scores
-mn      motif name
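To make the -c rule concrete, the arithmetic "average score minus c standard deviations" can be reproduced on any list of scores. This is only an illustration of the formula, not MyScanACE's own code: score_cutoff is a hypothetical helper, and using the population standard deviation is an assumption.

```shell
# Illustrate the -c rule: cutoff = mean(score) - c * sd(score).
# Usage: score_cutoff C < scores.txt   (one score per line on stdin)
# Uses the population standard deviation (an assumption; MyScanACE may
# compute the deviation differently).
score_cutoff() {
    awk -v c="$1" '
        NF { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n == 0) exit 1
            mean = sum / n
            var = sumsq / n - mean * mean
            if (var < 0) var = 0      # guard against rounding error
            printf "%.3f\n", mean - c * sqrt(var)
        }'
}
```

For example, scores 1 to 5 have mean 3 and population standard deviation of about 1.414, so with c = 2.0 the cutoff is about 0.172; a larger c lowers the cutoff and reports more sites.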

  • Cluster 3.0

Get a node on the panda cluster

qrsh -l h_vmem=16g -l h_rt=8:00:00 -now yes

Download Cluster 3.0 either from the official web site or from http://physiology.med.cornell.edu/faculty/elemento/lab/files/cluster-1.42.tar.gz

Configure the build without X11 support (command-line version only)

./configure --without-x

Hierarchical clustering of the genes using Pearson correlation (-g 2) and complete linkage (-m m)

cluster -f expression_normalized_rowavg_ordered.txt  -g 2 -m m

K-means clustering of the genes using Pearson correlation (-g 2), with 131 clusters (-k 131) and 10 runs of the algorithm (-r 10)

cluster -f expression_normalized_rowavg_ordered.txt -k 131 -r 10 -g 2

To mean-center (-cg a) and normalize (-ng) each gene before clustering, add the following to either command

-cg a -ng
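
Once a k-means run has finished, a quick sanity check is to count how many genes ended up in each cluster. The sketch below assumes the gene assignments are written to a tab-delimited .kgg file (for the command above, presumably expression_normalized_rowavg_ordered_K_G131.kgg) with one header line, the gene ID in column 1 and the cluster number in column 2. Both the file name and the helper kgg_cluster_sizes are assumptions for illustration.

```shell
# Count how many genes were assigned to each k-means cluster.
# Usage: kgg_cluster_sizes FILE.kgg
# Assumes a tab-delimited file with one header line, gene ID in column 1
# and cluster number in column 2.
kgg_cluster_sizes() {
    awk -F '\t' 'NR > 1 && NF >= 2 { count[$2]++ }
        END { for (k in count) printf "%s\t%d\n", k, count[k] }' "$1" |
        sort -n
}
```

The output has one line per cluster (cluster number, then gene count), sorted by cluster number. Many very small clusters may suggest that k is too large for the data.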
