Elementolab/Tools tutorial
- Convert SRA files to FASTQ format:
- Download and compile the SRA Toolkit from [here]
- Then run fastq-dump, for example:
${PATH_TO_SRA_BIN_DIR}/fastq-dump -F -A SRR000299 -D SRR000299.sra -O output_directory/
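When several runs need converting, the same command can be generated once per accession. The following is a minimal sketch: the helper name sra_batch_cmds is hypothetical, the flags are copied unchanged from the command above (they may differ between SRA Toolkit versions), and each accession.sra file is assumed to sit in the current directory.

```shell
# Hypothetical helper: read accessions from stdin (one per line) and print
# one fastq-dump command per accession. Nothing is executed here.
# $1 = directory containing the fastq-dump binary, $2 = output directory
sra_batch_cmds() {
    bin_dir="$1"
    out_dir="$2"
    while read -r acc; do
        # Skip empty lines in the accession list
        [ -n "$acc" ] || continue
        printf '%s/fastq-dump -F -A %s -D %s.sra -O %s\n' \
            "$bin_dir" "$acc" "$acc" "$out_dir"
    done
}

# Usage (accessions.txt is a placeholder file name):
#   sra_batch_cmds "${PATH_TO_SRA_BIN_DIR}" output_directory/ < accessions.txt        # inspect
#   sra_batch_cmds "${PATH_TO_SRA_BIN_DIR}" output_directory/ < accessions.txt | sh   # run
```

Printing the commands first makes it easy to check them before piping the list to sh.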
- LiftOver
Chain files for converting from hg19 to other assemblies are available at:
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/
Input in BED format:
chr1 10327 10328 rs112750067
chr1 10440 10441 rs112155239
chr1 10469 10470 rs117577454
chr1 10492 10493 rs55998931
chr1 10519 10520 rs62636508
chr1 10533 10534 rs114315702
chr1 10583 10584 rs58108140
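Convert the hg19 coordinates to hg18 (arguments: input BED file, chain file, output file, file for records that could not be converted):
liftOver dbsnp132_20101103.vcf.bed hg19ToHg18.over.chain dbsnp132_20101103.vcf.bed.hg18 unmapped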
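After the run it is worth checking how many records were converted. The following is a small sketch, not part of liftOver itself: the function name liftover_summary is hypothetical, and it relies on the unmapped file containing comment lines that start with "#" (the reason for the failure) alongside the original records.

```shell
# Hypothetical helper: count converted and unconverted records.
# $1 = converted output file, $2 = unmapped file (as passed to liftOver)
liftover_summary() {
    # Every non-empty line of the output file is one converted record
    mapped=$(awk 'NF { n++ } END { print n + 0 }' "$1")
    # In the unmapped file, skip the '#' comment lines and count the records
    failed=$(awk '!/^#/ && NF { n++ } END { print n + 0 }' "$2")
    printf 'mapped=%s unmapped=%s\n' "$mapped" "$failed"
}

# Usage with the file names from the command above:
#   liftover_summary dbsnp132_20101103.vcf.bed.hg18 unmapped
```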
- LASSO
http://www-stat.stanford.edu/~tibs/lasso.html
- MyScanACE (an ElementoLab multi-purpose multi-format re-implementation of ScanACE)
Basic scan with a PBM-type weight matrix:
MyScanACE -z PfalciparumGenomic_PlasmoDB-6.0_cleaned.fasta -jb TTCTAGAA_PF13_0267_pwm.txt -g 0.138 -c 2.0
Same scan, written as a GMOD-type track with a motif name:
MyScanACE -z PfalciparumGenomic_PlasmoDB-6.0_cleaned.fasta -jb TTCTAGAA_PF13_0267_pwm.txt -g 0.138 -c 2.0 -output gmod -mn TCTAGAA_PF13_0267
Options:
-z        sequences to scan
-g        G+C background
-jb       PBM-type weight matrix
-j        JASPAR-type weight matrix
-i        ScanACE-type weight matrix
-c        standard deviation cutoff (2.0 means the cutoff is the average motif score minus 2.0 standard deviations)
-output   type of output; gmod = GMOD-type track with motif affinities as scores
-mn       motif name
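The arithmetic behind the -c option can be illustrated as follows. This is only a sketch of the formula stated above (cutoff = average score minus c standard deviations). How MyScanACE obtains the motif scores internally is not described here, the function name score_cutoff is hypothetical, and the use of the sample standard deviation (n - 1) is an assumption.

```shell
# Hypothetical helper: compute mean - c * sd for a list of scores.
# $1 = number of standard deviations (the -c value); scores on stdin, one per line
score_cutoff() {
    awk -v c="$1" '
        { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n < 2) { print "need at least two scores" > "/dev/stderr"; exit 1 }
            mean = sum / n
            # Sample variance (n - 1 in the denominator): an assumption, see above
            var = (sumsq - n * mean * mean) / (n - 1)
            if (var < 0) var = 0   # guard against rounding error
            printf "%.3f\n", mean - c * sqrt(var)
        }'
}

# Usage with made-up scores 10, 12 and 14 (mean 12, sample sd 2):
#   printf '10\n12\n14\n' | score_cutoff 2.0
```

With -c 2.0, a site would be reported only if its score is at least the printed value; a larger -c lowers the cutoff and reports more sites.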
- Cluster 3.0
Get a node on the panda cluster:
qrsh -l h_vmem=16g -l h_rt=8:00:00 -now yes
Download Cluster 3.0 either from the official web site or from http://physiology.med.cornell.edu/faculty/elemento/lab/files/cluster-1.42.tar.gz
Installation (command-line version only, without X11 support):
./configure --without-x
make
Hierarchical clustering of the genes using Pearson correlation (-g 2) and complete linkage (-m m):
cluster -f expression_normalized_rowavg_ordered.txt -g 2 -m m
K-means clustering of the genes using Pearson correlation (-g 2), with 131 clusters (-k 131) and 10 runs (-r 10):
cluster -f expression_normalized_rowavg_ordered.txt -k 131 -r 10 -g 2
To center the genes on their mean and scale them, add:
-cg a -ng
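To make the effect of these two options concrete, here is a minimal sketch of the same transformation done by hand: each gene (row) has its mean subtracted, then is rescaled so that the sum of its squared values is 1. It is an illustration, not Cluster's own code. The function name center_and_scale_rows is hypothetical, and the input is assumed to be a tab-delimited table with one header line, the gene ID in the first column, no extra annotation columns and no missing values.

```shell
# Hypothetical helper: center each row on its mean, then scale it to unit length.
# Reads a tab-delimited table on stdin and writes the transformed table to stdout.
center_and_scale_rows() {
    awk 'BEGIN { FS = "\t"; OFS = "\t" }
         NR == 1 { print; next }          # pass the header line through unchanged
         {
             n = NF - 1
             sum = 0
             for (i = 2; i <= NF; i++) sum += $i
             mean = sum / n
             sumsq = 0
             for (i = 2; i <= NF; i++) { v[i] = $i - mean; sumsq += v[i] * v[i] }
             norm = sqrt(sumsq)
             line = $1
             # Rows with zero variance are written as zeros to avoid dividing by 0
             for (i = 2; i <= NF; i++)
                 line = line OFS sprintf("%.4f", (norm > 0) ? v[i] / norm : 0)
             print line
         }'
}

# Usage (the output file name is a placeholder):
#   center_and_scale_rows < expression_normalized_rowavg_ordered.txt > centered_scaled.txt
```

A gene with values 1, 2, 3 becomes -0.7071, 0.0000, 0.7071: the mean (2) is removed and the remaining vector is divided by its length, the square root of 2.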