Fusion TBAG
Usage
RNA-seq reads that cannot be aligned directly to the human genome or the junction genome, and that do not have putative polyA tails, are divided into two shorter reads and aligned to the human genome again. Fusion TBAG identifies reads that come from putative junction genes (the mRNA contains sequence from different exons of the same gene) and reads that come from fusion genes (the mRNA contains sequence from two different genes). It reports the fusion gene and junction gene information together with their read counts. The output also includes a Circos-friendly link file, fusion_link.txt, that can be used to visualize fusion genes.
The Fusion TBAG command takes the two input files fusion1.sam and fusion2.sam, the AceView directory for annotation, and the output paths in the sample directory (sample/):
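The distinction between junction and fusion features can be sketched in a few lines of Python. This is only an illustration, not Fusion TBAG's own code: the function name `classify_feature` is hypothetical, and it assumes that feature names have the form `GENE_N~~GENE_N`, as in the sample output under Output Format, where the part before the last underscore is the gene name and the suffix identifies the exon.

```python
def classify_feature(name: str) -> str:
    """Classify a feature name such as 'GLS_1~~AQR_1'.

    Assumption: each side of '~~' is '<gene>_<exon index>'.
    """
    left, right = name.split("~~")           # the two parts of the read
    gene_1, exon_1 = left.rsplit("_", 1)     # split on the LAST underscore
    gene_2, exon_2 = right.rsplit("_", 1)
    if gene_1 != gene_2:
        return "fusion"                      # two different genes
    if exon_1 != exon_2:
        return "junction"                    # same gene, different exon
    return "same_exon"                       # neither case applies


print(classify_feature("GLS_1~~AQR_1"))
print(classify_feature("UBE2O_3~~UBE2O_2"))
```

With the two sample features, the first is classified as a fusion (GLS and AQR are different genes) and the second as a junction (both parts come from UBE2O).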
$ ./fus sample/sample_fusion1.sam sample/sample_fusion2.sam AceView/ sample/sample_fusfus_output sample/sample_fusjun_output
Output Format
The Fusion TBAG output contains "sample_fusfus_output", "sample_fusjun_output" and "fusion_link.txt".
fusion_count.txt format:
fusion fusion_count chr_1 strand_1 start_1 stop_1 count_1 range_1 chr_2 strand_2 start_2 stop_2 count_2 range_2
GLS_1~~AQR_1 1 chr2 + 191453793 191536218 1 191453892,191453892 (0) chr15 - 32942360 33049331 1 33049204,33049204 (0)
junction_count.txt format:
junction junction_count chr_1 strand_1 start_1 stop_1 count_1 range_1 chr_2 strand_2 start_2 stop_2 count_2 range_2
UBE2O_3~~UBE2O_2 1 chr17 - 71906148 71912992 1 71909771,71909771 (0) chr17 - 71906513 71913249 1 71910309,71910309 (0)
Each line of the output contains:
Column | Field | Description |
---|---|---|
1 | fusion / junction | The composition of the fusion or junction supported by the RNA-seq reads |
2 | fusion_count / junction_count | The number of reads mapped to the feature |
3 | chr_1 | The reference chromosome of the first part of the read |
4 | strand_1 | The reference strand of the first part of the read |
5 | start_1 | The 1-based start position of the exon that the first part of the read aligned to |
6 | stop_1 | The 1-based stop position of the exon that the first part of the read aligned to |
7 | count_1 | The number of reads mapped to the feature after removing duplicates |
8 | range_1 | The locus and length of the region where the first part of the read hits the same gene and the same exon |
9 | chr_2 | The chromosome of the second part of the read |
10 | strand_2 | The strand of the second part of the read |
11 | start_2 | The 1-based start position of the second part of the read when aligned to the genome |
12 | stop_2 | The 1-based stop position of the second part of the read when aligned to the genome |
13 | count_2 | The number of reads for the second part of the read |
14 | range_2 | The locus and length of the region where the second part of the read hits the same gene and the same exon |
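As a rough illustration of the 14-column layout, the following Python sketch parses one data line of fusion_count.txt or junction_count.txt. It is not part of Fusion TBAG: the names `Side`, `Record` and `parse_count_line` are hypothetical, and the sketch assumes that fields are separated by whitespace and that each range field ends in a parenthesised number such as `(0)`, as in the sample lines above.

```python
import re
from typing import NamedTuple


class Side(NamedTuple):
    chrom: str
    strand: str
    start: int      # 1-based
    stop: int       # 1-based
    count: int
    range_: str     # kept as text, e.g. "191453892,191453892 (0)"


class Record(NamedTuple):
    feature: str
    total_count: int
    first: Side
    second: Side


def parse_count_line(line: str) -> Record:
    """Parse one data line (not the header) into a Record."""
    fields = []
    for token in line.split():
        # Assumption: a "(n)" token belongs to the preceding range field.
        if re.fullmatch(r"\(\d+\)", token) and fields:
            fields[-1] += " " + token
        else:
            fields.append(token)
    if len(fields) != 14:
        raise ValueError(f"expected 14 columns, got {len(fields)}")

    def side(f):
        return Side(f[0], f[1], int(f[2]), int(f[3]), int(f[4]), f[5])

    return Record(fields[0], int(fields[1]), side(fields[2:8]), side(fields[8:14]))


sample = ("GLS_1~~AQR_1 1 chr2 + 191453793 191536218 1 191453892,191453892 (0) "
          "chr15 - 32942360 33049331 1 33049204,33049204 (0)")
record = parse_count_line(sample)
print(record.feature, record.first.chrom, record.second.chrom)
```

The header line is not a data line: passing it to `parse_count_line` raises `ValueError`, so a caller reading a whole file would skip the first line.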
The fusion_link.txt format:
seqdup1 hs8 676581 677199 thickness=2
seqdup1 hs3 12601507 12692260 thickness=2
seqdup2 hs19 13059803 13068778 thickness=2
seqdup2 hs17 21128562 21156186 thickness=2
seqdup3 hs4 88031633 88076377 thickness=4
seqdup3 hs7 5533304 5534876 thickness=4
Each line contains:
Column | Description |
---|---|
1 | Visualization pair number |
2 | Homo sapiens chromosome number |
3 | The exon start locus |
4 | The exon stop locus |
5 | Line thickness, which indicates the read count of the fusion gene. thickness=2: read count = 1; thickness=4: read count = 2 to 20; thickness=6: read count of 21 or more |
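The thickness rule in column 5 can be written as a small Python function, shown here together with a helper that formats the two lines of one link pair. This is a sketch under stated assumptions, not Fusion TBAG's implementation: the function names are hypothetical, and the `chr` to `hs` renaming is inferred from the sample files (chromosomes appear as `chr2` in the count files and as `hs8` in fusion_link.txt).

```python
def thickness_for_count(read_count: int) -> int:
    """Map a fusion read count to the link thickness described in the table."""
    if read_count < 1:
        raise ValueError("read count must be at least 1")
    if read_count == 1:
        return 2
    if read_count <= 20:
        return 4
    return 6


def circos_chrom(chrom: str) -> str:
    """Assumed renaming: 'chr8' becomes 'hs8'; other names pass through."""
    return "hs" + chrom[3:] if chrom.startswith("chr") else chrom


def link_lines(pair_id, end_1, end_2, read_count):
    """Format both ends of one fusion as two lines sharing the same pair id.

    Each end is a (chromosome, start, stop) tuple.
    """
    thickness = thickness_for_count(read_count)
    return [
        f"{pair_id} {circos_chrom(chrom)} {start} {stop} thickness={thickness}"
        for chrom, start, stop in (end_1, end_2)
    ]


for line in link_lines("seqdup1", ("chr8", 676581, 677199),
                       ("chr3", 12601507, 12692260), read_count=1):
    print(line)
```

Called with the coordinates of the first sample pair and a read count of 1, the helper reproduces the two `seqdup1` lines shown above.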