Fusion TBAG

Usage

RNA-seq reads that cannot be aligned directly to the human genome or the junction genome, and that do not have putative polyA tails, are divided into two shorter reads and aligned again to the human genome. Fusion TBAG finds the reads that come from putative junction genes (the mRNA contains sequence from different exons of the same gene) and the reads that come from fusion genes (the mRNA contains sequence from two different genes). It reports the fusion gene and junction gene information together with their read counts. The output also includes a Circos-friendly fusion input file, fusion_link.txt, that can be used to visualize fusion genes.
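
The distinction between junction reads and fusion reads can be sketched in a few lines of Python. This is only an illustration of the definitions above, not Fusion TBAG's actual code: the `Hit` structure and the `classify_pair` function are hypothetical names, and the exon numbers used in the example are assumed for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Annotation of one half of a split read (hypothetical structure)."""
    gene: str
    exon: int


def classify_pair(first: Hit, second: Hit) -> str:
    """Label a split read from the annotations of its two halves."""
    if first.gene != second.gene:
        return "fusion"    # halves come from two different genes
    if first.exon != second.exon:
        return "junction"  # same gene, but different exons
    return "neither"       # same gene and same exon


# Illustrative inputs; the exon numbers are assumptions.
print(classify_pair(Hit("GLS", 1), Hit("AQR", 1)))
print(classify_pair(Hit("UBE2O", 3), Hit("UBE2O", 2)))
```

A read whose two halves land in the same exon of the same gene is neither a fusion nor a junction under these definitions, which is why the sketch has a third label.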

The command for Fusion TBAG takes the fusion1.sam and fusion2.sam files, the AceView annotation, and the output locations under the sample directory (sample/):

$ ./fus sample/sample_fusion1.sam sample/sample_fusion2.sam AceView/ sample/sample_fusfus_output sample/sample_fusjun_output

Output Format

The output of Fusion TBAG contains "sample_fusfus_output", "sample_fusjun_output" and "fusion_link.txt".

fusion_count.txt format:
fusion          fusion_count chr_1   strand_1  start_1     stop_1      count_1     range_1                  chr_2  strand_2  start_2   stop_2    count_2 range_2
GLS_1~~AQR_1    1            chr2    +         191453793   191536218   1           191453892,191453892 (0)  chr15  -         32942360  33049331  1       33049204,33049204 (0)
junction_count.txt format:
junction         junction_count  chr_1   strand_1 start_1   stop_1      count_1 range_1                chr_2    strand_2  start_2   stop_2    count_2   range_2
UBE2O_3~~UBE2O_2 1               chr17   -        71906148  71912992    1       71909771,71909771 (0)  chr17    -         71906513  71913249  1         71910309,71910309 (0)

Each line of the output contains:

Column  Field                          Description
1       fusion / junction              The composition of the fusion or junction RNA-seq reads (e.g. GLS_1~~AQR_1)
2       fusion_count / junction_count  The number of reads mapped to the feature
3       chr_1                          The reference chromosome of the first part of the read
4       strand_1                       The reference strand of the first part of the read
5       start_1                        The 1-based start position of the exon that the first part of the read aligned to
6       stop_1                         The 1-based stop position of the exon that the first part of the read aligned to
7       count_1                        The number of reads mapped to the feature after removing duplicates
8       range_1                        The locus and length of the region where the first part of the read hits the same gene and same exon
9       chr_2                          The chromosome of the second part of the read
10      strand_2                       The strand of the second part of the read
11      start_2                        The 1-based start position of the second part of the read when aligned to the genome
12      stop_2                         The 1-based stop position of the second part of the read when aligned to the genome
13      count_2                        The read count of the second part of the read
14      range_2                        The locus and length of the region where the second part of the read hits the same gene and same exon
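
As a rough illustration of the 14-column layout, the following Python sketch parses one data row into named fields. It assumes the columns are whitespace-separated exactly as in the examples above, so that each range field spans two tokens (the coordinates and the parenthesized number). The `parse_row` function and the generic `feature` and `feature_count` keys are hypothetical names, not part of Fusion TBAG.

```python
from typing import Dict, List

COLUMNS: List[str] = [
    "feature", "feature_count",
    "chr_1", "strand_1", "start_1", "stop_1", "count_1", "range_1",
    "chr_2", "strand_2", "start_2", "stop_2", "count_2", "range_2",
]
INT_FIELDS = ["feature_count", "start_1", "stop_1", "count_1",
              "start_2", "stop_2", "count_2"]


def parse_row(line: str) -> Dict[str, object]:
    """Parse one data row of a fusion or junction count table."""
    tokens = line.split()
    if len(tokens) != 16:
        raise ValueError(f"expected 16 whitespace-separated tokens, got {len(tokens)}")
    # Each range field is two tokens, e.g. "191453892,191453892" and "(0)".
    fields = (tokens[0:7] + [" ".join(tokens[7:9])]
              + tokens[9:14] + [" ".join(tokens[14:16])])
    row: Dict[str, object] = dict(zip(COLUMNS, fields))
    for name in INT_FIELDS:
        row[name] = int(str(row[name]))
    # The two halves of the feature name are joined by "~~".
    row["parts"] = str(row["feature"]).split("~~")
    return row


sample = ("GLS_1~~AQR_1 1 chr2 + 191453793 191536218 1 191453892,191453892 (0) "
          "chr15 - 32942360 33049331 1 33049204,33049204 (0)")
row = parse_row(sample)
print(row["parts"], row["chr_1"], row["chr_2"], row["feature_count"])
```

If the real files turn out to be tab-delimited, splitting on tabs would be simpler and would keep each range field in a single column.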


The fusion_link.txt format:

seqdup1 hs8     676581          677199          thickness=2
seqdup1 hs3     12601507        12692260        thickness=2
seqdup2 hs19    13059803        13068778        thickness=2
seqdup2 hs17    21128562        21156186        thickness=2
seqdup3 hs4     88031633        88076377        thickness=4
seqdup3 hs7     5533304         5534876         thickness=4

Each line contains:

Column  Description
1       Visualization pair number
2       Homo sapiens chromosome number (with the hs prefix)
3       The exon start position
4       The exon stop position
5       Line thickness, which indicates the read count of the fusion gene: thickness=2 for a read count of 1; thickness=4 for a read count of 2 to 20; thickness=6 for a read count of 21 or more
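
The thickness rule can be written as a small Python function. This is a minimal sketch of the mapping described above, not the tool's own code: `thickness_for` and `link_pair` are hypothetical names, the single-space delimiter is an assumption, and the read count of 2 in the example is an illustrative value.

```python
from typing import List, Tuple

End = Tuple[str, int, int]  # (chromosome, exon start, exon stop)


def thickness_for(read_count: int) -> int:
    """Map a fusion read count to the link thickness."""
    if read_count < 1:
        raise ValueError("read_count must be at least 1")
    if read_count == 1:
        return 2
    if read_count <= 20:
        return 4
    return 6


def link_pair(pair_id: str, end_a: End, end_b: End, read_count: int) -> List[str]:
    """Build the two lines that describe one fusion link."""
    thickness = thickness_for(read_count)
    return [
        f"{pair_id} {chrom} {start} {stop} thickness={thickness}"
        for chrom, start, stop in (end_a, end_b)
    ]


lines = link_pair("seqdup3", ("hs4", 88031633, 88076377),
                  ("hs7", 5533304, 5534876), read_count=2)
for line in lines:
    print(line)
```

Each fusion occupies two consecutive lines that share the same pair identifier, one line per side of the link.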