GeneTools HiC

From Icbwiki

Extracting Hi-C interactions

Extract interactions from SAM alignment files

./geneModel pairplot -f1 s1_1_sequence.txt.sam.gz -f2 s1_2_sequence.txt.sam.gz -o output/s1_intra.bed -x output/s1_inter.sif -log output/s1_hic.log

 Usage: ./geneModel <command> [options]
       pairplot                 generates a junction plot (BED format) for paired-end reads
               -f1              SAM alignment file for the first end of each read pair
               -f2              SAM alignment file for the second end of each read pair
               -outfile         output INTRA-chromosomal interactions (BED file)
               -xfile           output (cross) inter-chromosomal interactions (SIF style; one interaction per line)
               -logfile         output summary statistics in a log file
               --all            use all reads (instead of just the unique reads)
               -maxdist         max distance between two ends of a read pair
                                (maxdist default: no filtering; show all read pairs)
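The intra/inter split performed by pairplot can be pictured with the short Python sketch below. It assumes each read pair has already been reduced to one (chromosome, position) tuple per end, applies the maxdist filter only to same-chromosome pairs, and ignores the uniqueness filter controlled by --all. classify_pairs is a hypothetical helper, not part of geneModel.

```python
def classify_pairs(pairs, maxdist=None):
    """Split read pairs into intra- and inter-chromosomal interactions.

    pairs: iterable of ((chrom1, pos1), (chrom2, pos2)), one tuple per end.
    maxdist: drop intra-chromosomal pairs whose ends are farther apart than
    this; None keeps every pair (the documented -maxdist default).
    """
    intra, inter = [], []
    for (c1, p1), (c2, p2) in pairs:
        if c1 != c2:
            # inter-chromosomal: distance is undefined, so no maxdist filter here
            inter.append((c1, p1, c2, p2))
        elif maxdist is None or abs(p2 - p1) <= maxdist:
            lo, hi = sorted((p1, p2))  # order ends so the pair reads as an interval
            intra.append((c1, lo, hi))
    return intra, inter


pairs = [(("chr1", 500), ("chr1", 100)),
         (("chr1", 100), ("chr2", 900)),
         (("chr3", 0), ("chr3", 5_000_000))]
intra, inter = classify_pairs(pairs, maxdist=1_000_000)
print(intra)  # [('chr1', 100, 500)]
print(inter)  # [('chr1', 100, 'chr2', 900)]
```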

Annotate Hi-C interactions with HindIII sites

1. intra-chromosomal interactions:
sort -k1,1 -k2,2n output/s1_intra.bed > output/s1_intra.bed_sort   # NOTE: the input must be pre-sorted by chromosome (here also by start position)
./geneModel hic -cmd enz -input output/s1_intra.bed_sort > output/s1_intra.bed_annot

2. inter-chromosomal interactions:
./geneModel hic -cmd enz -i output/s1_inter.sif --inter > output/s1_inter_enz.sif
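One way to picture the HindIII annotation step is to match each read end to its closest restriction site. The Python sketch below assumes exactly that rule (the annotation columns geneModel actually writes are not documented on this page) and uses binary search over a sorted, non-empty list of site positions for one chromosome. nearest_site is a hypothetical helper and the site positions are toy values.

```python
import bisect

def nearest_site(sites, pos):
    """Return the site in `sites` (sorted ascending, non-empty) closest to `pos`.
    Ties go to the upstream site."""
    i = bisect.bisect_left(sites, pos)
    # only the sites immediately before and at/after pos can be the closest
    candidates = sites[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda s: abs(s - pos))


hind3_chr1 = [100, 500, 900]  # toy site positions, not real hg18 coordinates
print(nearest_site(hind3_chr1, 480))   # 500
print(nearest_site(hind3_chr1, 1000))  # 900
```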


Downstream Analyses

Generate contact matrices

1. intra-chromosomal
./geneModel hic -cmd mtx -i output/s1_intra.bed_annot -out intmtx/ -res 1000000

2. inter-chromosomal
./geneModel hic -cmd mtx -i output/s1_inter_enz_real.sif -outdir intmtx/ -res 1000000 --inter

Usage: ./geneModel <command> [options]
       HiC                      analysis related to Hi-C data
          -cmd mtx              generate an interaction (contact) matrix per chromosome
               -input           input BED or SIF file
               -outdir          output directory to store the contact matrices
               -res             resolution (bin size, in bp) of the generated interaction matrices
               --inter          specify that the input file is an INTER-chromosomal interaction file
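The mtx command bins interactions into per-chromosome contact matrices. The Python sketch below illustrates that binning, assuming -res is the bin size in base pairs and storing each symmetric matrix sparsely as a dictionary. The on-disk matrix format written to -outdir is not modeled, and contact_matrix is a hypothetical name.

```python
from collections import defaultdict

def contact_matrix(interactions, res):
    """Count intra-chromosomal interactions into symmetric binned matrices.

    interactions: iterable of (chrom, pos1, pos2); res: bin size in bp.
    Returns {chrom: {(bin_i, bin_j): count}} holding both (i, j) and (j, i).
    """
    mats = defaultdict(lambda: defaultdict(int))
    for chrom, p1, p2 in interactions:
        i, j = p1 // res, p2 // res
        mats[chrom][(i, j)] += 1
        if i != j:
            mats[chrom][(j, i)] += 1  # mirror off-diagonal contacts
    return mats


m = contact_matrix([("chr1", 100, 2_500_000), ("chr1", 10, 20)], res=1_000_000)
print(dict(m["chr1"]))  # {(0, 2): 1, (2, 0): 1, (0, 0): 1}
```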

Retrieve intra-chromosomal interactions involving specific region(s)

./geneModel hic -cmd extract -input output/sample001_intra.bed_annot -reg1 chr3:188,879,419-188,983,99 \
                             [ -reg2 chr3:91,984,420-94,896,854 ] -len 50 > output/extracted_int_region.bed_annot
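Region strings such as chr3:91,984,420-94,896,854 contain thousands separators. The Python sketch below parses them and keeps interactions with one end in the first region and, when a second region is given, the other end in it. It mirrors the -reg1/-reg2 description only under that assumption, treats coordinates as inclusive, does not model -len, and all function names are hypothetical.

```python
import re

def parse_region(text):
    """Parse 'chr3:91,984,420-94,896,854' into ('chr3', 91984420, 94896854)."""
    m = re.fullmatch(r"([^:]+):([\d,]+)-([\d,]+)", text.strip())
    if m is None:
        raise ValueError(f"malformed region: {text!r}")
    chrom, start, end = m.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))

def in_region(chrom, pos, region):
    rchrom, start, end = region
    return chrom == rchrom and start <= pos <= end

def extract(interactions, reg1, reg2=None):
    """Keep (chrom, pos1, pos2) interactions with one end in reg1 and,
    if reg2 is given, the other end in reg2 (either orientation)."""
    kept = []
    for chrom, p1, p2 in interactions:
        for a, b in ((p1, p2), (p2, p1)):
            if in_region(chrom, a, reg1) and (reg2 is None or in_region(chrom, b, reg2)):
                kept.append((chrom, p1, p2))
                break
    return kept


print(parse_region("chr3:91,984,420-94,896,854"))  # ('chr3', 91984420, 94896854)
```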

Compare Hi-C samples

./geneModel hic -cmd compare-lanes -f1 output/sample001_intra.bed_annot -f2 output/sample002_intra.bed_annot > output/comp_samples_1_2_intra.txt
./geneModel hic -cmd compare-lanes -f1 output/sample001_inter_enz_real.sif -f2 output/sample002_inter_enz_real.sif --inter > output/comp_samples_1_2_inter.txt

Merge Hi-C interactions

./geneModel hic -cmd merge-lanes -f1 output/sample001_intra.bed_annot -f2 output/sample002_intra.bed_annot > output/merged_samples_1_2_intra.bed_annot
./geneModel hic -cmd merge-lanes -f1 output/sample001_inter_enz_real.sif -f2 output/sample002_inter_enz_real.sif --inter > output/merged_samples_1_2_inter_enz_real.sif
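Assuming merge-lanes pools the interactions of two samples (lanes) into one file, and that both inputs are already sorted by chromosome and start, the pooled output can be produced in one streaming pass with heapq.merge, which keeps the result sorted without re-sorting. This Python sketch is illustrative only: geneModel may also deduplicate or re-annotate entries, which is not modeled, and merge_sorted_lanes is a hypothetical name.

```python
import heapq

def merge_sorted_lanes(lane1, lane2):
    """Merge two interaction lists, each already sorted by (chrom, start),
    into one list with the same ordering, without re-sorting."""
    return list(heapq.merge(lane1, lane2, key=lambda rec: (rec[0], rec[1])))


s1 = [("chr1", 10, 50), ("chr2", 5, 9)]
s2 = [("chr1", 20, 30), ("chr3", 1, 2)]
print(merge_sorted_lanes(s1, s2))
# [('chr1', 10, 50), ('chr1', 20, 30), ('chr2', 5, 9), ('chr3', 1, 2)]
```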

Interactions between promoters/gene parts and user-provided intervals/peaks

This command counts the number of Hi-C interactions linking promoters (or other gene parts) to specified peaks. Promoters are defined by the RefSeq gene data file, whereas the ChIP-seq peak file can define any type of region (e.g. cancer-related loci or copy number variations).

./geneModel hic -cmd link
Options:
               -hic                 HiC intra-chromosomal interaction file
               -chip                ChIP-seq peak file
               -ref                 RefSeq gene data file
               -output              Output HiC interaction frequency file
               -window              Window size (each window is centered on a TSS or the midpoint of a peak)
               -type                Specify the part of the gene used as the reference
                                       0: promoter (within window/2 of the TSS)
                                       1: gene body (the gene body plus up to window/2 beyond the TSS or TES)
               -index               Report custom identifiers taken from the index-th column of the ChIP-seq peak file
               --verbose            Print promoter-peak information per HiC interaction
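The -window and -type options define which genomic interval stands in for each gene. The Python sketch below restates those two definitions, assuming RefSeq-style transcript coordinates (txStart < txEnd, with the TSS at txStart on the + strand and at txEnd on the - strand). reference_region is a hypothetical helper, and clipping the start at 0 is a safeguard added here.

```python
def reference_region(tx_start, tx_end, strand, window, gene_type=0):
    """Return (start, end) of the reference region for one gene.

    gene_type 0: promoter, within window/2 of the TSS.
    gene_type 1: gene body extended by window/2 beyond both TSS and TES.
    """
    half = window // 2
    if gene_type == 0:
        tss = tx_start if strand == "+" else tx_end  # strand-aware TSS
        return max(tss - half, 0), tss + half
    if gene_type == 1:
        return max(tx_start - half, 0), tx_end + half
    raise ValueError("gene_type must be 0 or 1")


print(reference_region(10_000, 20_000, "-", 5000, gene_type=0))  # (17500, 22500)
```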

Annotation with ChIP-seq data

Annotate whether each end of an interaction overlaps any ChIP-seq peak

./geneModel hic -cmd annot -hic output/s1_intra.bed_annot -chip data/ChiP_peaks.txt -window 5000 -tag some_txfactor --num -o output/s1_chip.bed_annot

Usage: ./geneModel <command> [options]
       HiC                      analysis related to Hi-C data
          -cmd annot[ate]       annotate the interaction BED file
               -hic             Hi-C interaction file
               -chip            ChIP-seq peak file
               --num            ChIP-seq file uses numeric chromosome ids (e.g. 1 rather than chr1)
               -window          window size; a Hi-C read end overlaps a peak if it falls in a window of this size centered on the peak center
               -tag             tag used in the output annotation column to indicate an overlap with ChIP
               -output          output file
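Reading -window as "an end overlaps a peak if it lies within window/2 of the peak center", annotation reduces to a range query per read end. The Python sketch below assumes that reading, uses sorted per-chromosome peak centers, and marks non-overlapping ends with "."; the output layout is illustrative, not geneModel's file format, and the function names are hypothetical.

```python
import bisect

def near_peak(centers, pos, half):
    """True if any sorted peak center lies in [pos - half, pos + half]."""
    i = bisect.bisect_left(centers, pos - half)
    return i < len(centers) and centers[i] <= pos + half

def annotate_ends(interactions, peaks, window, tag):
    """Append one overlap label per end to (chrom, pos1, pos2) interactions.
    peaks: {chrom: sorted list of peak centers}."""
    half = window / 2
    annotated = []
    for chrom, p1, p2 in interactions:
        centers = peaks.get(chrom, [])
        labels = [tag if near_peak(centers, p, half) else "." for p in (p1, p2)]
        annotated.append((chrom, p1, p2, *labels))
    return annotated


peaks = {"chr1": [10_000, 50_000]}
print(annotate_ends([("chr1", 11_000, 30_000)], peaks, 5000, "some_txfactor"))
# [('chr1', 11000, 30000, 'some_txfactor', '.')]
```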

Count the number of interactions between promoters and ChIP-seq peaks

./geneModel hic -cmd link -hic output/s1_intra.bed_annot -chip data/ChiP_peaks.txt -window 5000 -o output/s1_promoters_chip_5000bp.txt

Usage: ./geneModel <command> [options]
       HiC                      analysis related to Hi-C data
          -cmd link             count the number of HiC interactions between promoters and peaks
               -hic             HiC intra-chromosomal interaction file
               -chip            ChIP-seq peak file
               -ref             RefSeq gene data file
               -output          output HiC interaction frequency file
               -window          window size (each window is centered on a TSS or the midpoint of a peak)
               -index           report custom identifiers specified in the index-th column of ChIP-seq peak file
               --verbose        print promoter-peak information per HiC interaction
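Given promoter windows and peak windows centered on peak midpoints, the counting done by -cmd link can be sketched as below. This Python version rests on several assumptions: windows are inclusive (name, chrom, start, end) tuples, an interaction links a pair when one end falls in the promoter window and the other in the peak window, each interaction counts at most once per pair, and linear scans are used for clarity rather than speed. count_links and in_windows are hypothetical names.

```python
from collections import Counter

def in_windows(windows, chrom, pos):
    """Names of all (name, chrom, start, end) windows containing pos (inclusive)."""
    return [name for name, c, s, e in windows if c == chrom and s <= pos <= e]

def count_links(interactions, promoters, peaks):
    """Count (chrom, pos1, pos2) interactions joining a promoter window to a
    peak window; returns Counter{(promoter, peak): n}."""
    counts = Counter()
    for chrom, p1, p2 in interactions:
        linked = set()  # count each interaction at most once per pair
        for a, b in ((p1, p2), (p2, p1)):
            for prom in in_windows(promoters, chrom, a):
                for peak in in_windows(peaks, chrom, b):
                    linked.add((prom, peak))
        counts.update(linked)
    return counts


promoters = [("GENE1", "chr1", 7_500, 12_500)]  # TSS 10,000 with -window 5000
peaks = [("peakA", "chr1", 47_500, 52_500)]     # peak midpoint 50,000
interactions = [("chr1", 10_000, 50_000), ("chr1", 51_000, 9_000), ("chr1", 10_000, 30_000)]
print(count_links(interactions, promoters, peaks))  # Counter({('GENE1', 'peakA'): 2})
```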

Extract interactions between promoters and ChIP-seq peaks for visualization

./geneModel hic -cmd extract --link -i output/s1_intra.bed_annot -chip output/s1_promoters_chip_5000bp.txt -window 50000 -o output/s1_promoters_chip_5000bp.bed

Usage: ./geneModel <command> [options]
       HiC                      analysis related to Hi-C data
          -cmd ext[ract]        extract a subset of Hi-C interactions
               --link           if specified, use the following options to extract Hi-C interactions between specified promoter-peak pairs
               -intfile         input HiC BED file
               -chipfile        promoter-peak pairs
               -outfile         output file containing the extracted interactions
               -window          window size (each window is centered on the TSS or the peak summit)

Summarize overlap between ChIP-seq binding sites and Hi-C interaction data

./geneModel hic -cmd sumchip
Options:
               -i                   Input intra- or inter-chromosomal interaction file
               -track               ChIP-seq binding site track
               --inter              Specify if the input file is an INTER-chromosomal interaction file
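sumchip is described here only as summarizing overlap. One plausible summary, assumed here rather than taken from geneModel, is the number of interaction ends falling inside a binding site out of all ends. Writing every interaction as (chrom1, pos1, chrom2, pos2) lets a single Python function handle both intra- and inter-chromosomal input; summarize_overlap is a hypothetical name.

```python
def summarize_overlap(interactions, sites):
    """Return (ends inside a site, total ends).

    interactions: (chrom1, pos1, chrom2, pos2); intra-chromosomal pairs simply
    repeat the chromosome. sites: (chrom, start, end), inclusive coordinates.
    """
    def overlaps(chrom, pos):
        return any(c == chrom and s <= pos <= e for c, s, e in sites)

    hit = total = 0
    for c1, p1, c2, p2 in interactions:
        for chrom, pos in ((c1, p1), (c2, p2)):
            total += 1
            hit += overlaps(chrom, pos)  # True counts as 1
    return hit, total


sites = [("chr1", 100, 200), ("chr2", 50, 60)]
print(summarize_overlap([("chr1", 150, "chr1", 500), ("chr1", 10, "chr2", 55)], sites))  # (2, 4)
```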

Calculate a histogram from a ChIP-seq or Hi-C file and output it in Circos format

./geneModel hic -cmd hist
Options:
               -i                   Input file (e.g. ChIP-seq binding sites)
               --hic                Input data is a Hi-C interaction file
               -sep                 Field separator in output (default = space)
               -res                 Resolution of histogram
               -scale               Scale down the histogram values by [double] fold
               --circos             Use Circos-style chromosome identifiers (e.g. hs1)
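Circos histogram tracks are plain-text lines of the form "chromosome start end value". The Python sketch below formats binned values that way, with the --circos renaming of chr1 to hs1 and a configurable separator as with -sep. Writing bin ends as exclusive coordinates is a choice made here, and the helper names are hypothetical.

```python
def to_circos_id(chrom):
    """Map 'chr1' or '1' to the Circos-style identifier 'hs1'."""
    return "hs" + (chrom[3:] if chrom.startswith("chr") else chrom)

def histogram_lines(counts, res, sep=" ", circos=False):
    """Format {(chrom, bin_index): value} as 'chrom start end value' lines.
    Bins are [bin_index * res, (bin_index + 1) * res); the end is written exclusive."""
    lines = []
    for (chrom, b), value in sorted(counts.items()):
        name = to_circos_id(chrom) if circos else chrom
        fields = (name, b * res, (b + 1) * res, value)
        lines.append(sep.join(str(f) for f in fields))
    return lines


for line in histogram_lines({("chr1", 0): 3, ("chrX", 1): 5}, 1_000_000, circos=True):
    print(line)
# hs1 0 1000000 3
# hsX 1000000 2000000 5
```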

Misc

Generate HindIII restriction enzyme sites on hg18

./geneModel enz -type hind3 | uniq > output/hind3.bg
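HindIII recognizes the palindromic sequence AAGCTT and cuts between the first two bases (A^AGCTT). Because the site reads the same on both strands, scanning the forward strand finds every site. The Python sketch below performs that scan on an in-memory sequence; reading the hg18 FASTA files and the exact columns written to hind3.bg are not covered, and hindiii_sites is a hypothetical helper.

```python
import re

HINDIII_SITE = "AAGCTT"  # HindIII recognition sequence; cut is A^AGCTT

def hindiii_sites(seq):
    """Return 0-based start positions of every HindIII site in `seq`.
    Lower-case (soft-masked) bases are included by upper-casing first."""
    return [m.start() for m in re.finditer(HINDIII_SITE, seq.upper())]


print(hindiii_sites("ccAAGCTTggaagcttA"))  # [2, 10]
```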

Generate the density/histogram of ChIP-seq peaks on the genome at user-specified resolution

./geneModel hic -res 1000000 -scale 1000 -cmd hist -i data/TF_targets.txt > output/TF_targets_hist.txt

-res        resolution (bin size)
-scale      multiply the frequency by this scalar (to normalize between datasets)

The input ChIP-seq file has three columns (chromosome, start, end) specifying the peak regions, e.g.:

 chrX 6705768 6705943
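The density computation can be sketched in Python for the three-column input shown above. The sketch assigns each peak to the bin containing its midpoint and multiplies each count by scale, following the -scale description in this section; both the midpoint rule and returning a dictionary are assumptions, and peak_density is a hypothetical name.

```python
from collections import Counter

def peak_density(lines, res, scale=1.0):
    """Count peaks per (chrom, bin) from 'chrom start end' lines, binning each
    peak by its midpoint; each count is multiplied by `scale`."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        counts[(chrom, ((start + end) // 2) // res)] += 1
    return {key: n * scale for key, n in counts.items()}


print(peak_density(["chrX 6705768 6705943"], res=1_000_000))  # {('chrX', 6): 1.0}
```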