Genome TBAG


Usage

Genome TBAG provides annotation for RNA-seq reads that can be aligned directly to the genome. The AceView database is used for annotation.

The command for Genome TBAG is:

$ ./beg sample/sample_gdna.sam AceView/

Output Format

Genome TBAG produces 5 output files:

Output  File name              Description
1       xxx_gdna_summary.out   Summary of the sequencing and alignment to the genome
2       xxx_gdna_gene.out      Annotation of the RNA-seq reads by gene name
3       xxx_gdna_exon.out      Annotation of the RNA-seq reads by exon
4       xxx_gdna_bedgraph.out  Bases per kilobase per million reads for each chromosome
5       xxx_gdna_tars.out      Transcriptionally active regions (TARs) detected in the RNA-seq reads that fall outside the annotation
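
As a small illustration of the naming scheme in the table above, the following Python sketch lists the five file names one might expect for a given input. It assumes that the "xxx_gdna" prefix is simply the input SAM file name without its ".sam" extension and that the files are written to a directory of your choosing; neither is stated on this page, and the helper name expected_outputs is hypothetical.

```python
from pathlib import Path

# Suffixes taken from the output table above.
OUTPUT_KINDS = ("summary", "gene", "exon", "bedgraph", "tars")


def expected_outputs(sam_path, out_dir="."):
    """Return the five output paths expected for one input SAM file.

    Assumption: the "xxx_gdna" prefix equals the SAM file name without
    its ".sam" extension (e.g. "sample_gdna.sam" -> "sample_gdna").
    """
    prefix = Path(sam_path).stem
    return [Path(out_dir) / f"{prefix}_{kind}.out" for kind in OUTPUT_KINDS]


if __name__ == "__main__":
    for path in expected_outputs("sample/sample_gdna.sam"):
        print(path)
```

A check like this can be useful after a run to confirm that all five files were produced before starting downstream analysis.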


Each line of "xxx_gdna_gene.out" and "xxx_gdna_exon.out" contains:

Column  Field       Description
1       Feature_ID  The gene name (_gene.out) or the gene name and exon number (_exon.out)
2       chr         Chromosome
3       start_pos   The 1-based start position of the feature in the genome
4       stop_pos    The 1-based stop position of the feature in the genome
5       size        _gene.out: the summed length of the exons; _exon.out: the length of the exon
6       adj_size    The size minus the length of the overlap with other genes
7       counts      The number of bases that align to the feature
8       neg_counts  Base counts aligned to the anti-sense strand
9       pos_counts  Base counts aligned to the sense strand
10      ref_strand  Reference strand
11      3'_counts   Base counts at the 3' end of the feature (20% of the feature bases)
12      5'_counts   Base counts at the 5' end of the feature (20% of the feature bases)
13      bpkm1       Base reads Per Kilobase per Million reads, normalized by (1) feature size and (2) number of bases aligned to the genome
14      bpkm2       Base reads Per Kilobase per Million reads, normalized by (1) feature size and (2) number of bases aligned to the exome
15      bpkm3       Base reads Per Kilobase per Million reads, normalized by (1) feature adj_size and (2) number of bases aligned to the genome
16      bpkm4       Base reads Per Kilobase per Million reads, normalized by (1) feature adj_size and (2) number of bases aligned to the exome
17      coverage    The percentage of the feature's length covered by aligned bases
18      adj_cov     The percentage of the feature's adjusted length covered by aligned bases
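
The 18 columns above can be read into named fields with a short Python sketch like the one below. It rests on several assumptions that this page does not confirm: that the files are tab-delimited, that they have no header line, and that counts and positions are integers while the bpkm and coverage columns are plain decimal numbers. The bpkm helper shows one plausible reading of "per kilobase per million" (counts divided by feature size in kilobases and by total aligned bases in millions); it is an illustration, not necessarily the exact formula Genome TBAG uses. The names FeatureRecord, parse_feature_line and bpkm are hypothetical.

```python
from typing import NamedTuple


class FeatureRecord(NamedTuple):
    """One line of a _gene.out or _exon.out file (field order as documented)."""
    feature_id: str
    chrom: str
    start_pos: int
    stop_pos: int
    size: int
    adj_size: int
    counts: int
    neg_counts: int
    pos_counts: int
    ref_strand: str
    three_prime_counts: int
    five_prime_counts: int
    bpkm1: float
    bpkm2: float
    bpkm3: float
    bpkm4: float
    coverage: float
    adj_cov: float


# Converter for each column, in file order (assumed types, see lead-in).
_COLUMN_TYPES = (str, str, int, int, int, int, int, int, int, str,
                 int, int, float, float, float, float, float, float)


def parse_feature_line(line):
    """Parse one tab-delimited line into a FeatureRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(_COLUMN_TYPES):
        raise ValueError(f"expected 18 columns, got {len(fields)}")
    return FeatureRecord(*(convert(value)
                           for convert, value in zip(_COLUMN_TYPES, fields)))


def bpkm(counts, size, total_aligned_bases):
    """Assumed normalization: bases per kilobase of feature per million aligned bases."""
    return counts / (size / 1_000) / (total_aligned_bases / 1_000_000)


if __name__ == "__main__":
    # Made-up example line, not real Genome TBAG output.
    example = "\t".join([
        "GENE1", "chr1", "100", "2099", "2000", "1800", "500", "100", "400",
        "+", "50", "60", "25.0", "30.0", "27.8", "33.3", "80.0", "75.0",
    ])
    record = parse_feature_line(example)
    print(record.feature_id, bpkm(record.counts, record.size, 10_000_000))
```

Under this reading, bpkm1 and bpkm2 would pass size as the second argument, and bpkm3 and bpkm4 would pass adj_size; the third argument would be the number of bases aligned to the genome or to the exome, respectively.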


Each line of "xxx_gdna_tars.out" contains:

Column  Field     Description
1       chr       Chromosome of the sequence
2       start     The 1-based start position of the sequence in the genome
3       stop      The 1-based stop position of the sequence in the genome
4       span      The length of the transcriptionally active region
5       count     The read counts of the transcriptionally active region
6       overfold  The overfold of the TAR
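
A minimal Python sketch for reading the TAR table and keeping only the longer or better-supported regions might look like this. As before, it assumes tab-delimited lines without a header, which this page does not state; the names Tar, parse_tar_line and filter_tars, and the thresholds, are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Tar(NamedTuple):
    """One line of a _tars.out file (field order as documented above)."""
    chrom: str
    start: int
    stop: int
    span: int
    count: float
    overfold: float


def parse_tar_line(line):
    """Parse one tab-delimited line into a Tar record."""
    chrom, start, stop, span, count, overfold = line.rstrip("\n").split("\t")
    return Tar(chrom, int(start), int(stop), int(span),
               float(count), float(overfold))


def filter_tars(lines, min_span=0, min_count=0.0):
    """Yield TARs whose span and count both meet the given minimums."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        tar = parse_tar_line(line)
        if tar.span >= min_span and tar.count >= min_count:
            yield tar


if __name__ == "__main__":
    # Made-up example lines, not real Genome TBAG output.
    example_lines = [
        "chr1\t100\t299\t200\t35\t4.5\n",
        "chr2\t10\t59\t50\t3\t1.2\n",
    ]
    for tar in filter_tars(example_lines, min_span=100):
        print(tar)
```

To apply it to a real file, pass an open file object as lines, for example filter_tars(open("xxx_gdna_tars.out"), min_span=100).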