Genome TBAG
From Icbwiki
Usage
Genome TBAG annotates RNA-seq reads that can be aligned directly to the genome. The AceView database is used for annotation.
The command for Genome TBAG takes the SAM alignment file and the AceView directory as arguments:
$./beg sample/sample_gdna.sam AceView/
Output Format
Genome TBAG produces five output files:
Output | File name | Description |
---|---|---|
1 | xxx_gdna_summary.out | Summary of the sequencing and alignment to genome |
2 | xxx_gdna_gene.out | Annotation by gene name for the RNA-seq reads |
3 | xxx_gdna_exon.out | Annotation by exon for the RNA-seq reads |
4 | xxx_gdna_bedgraph.out | Base reads per kilobase per million reads for each chromosome |
5 | xxx_gdna_tars.out | Transcriptionally active regions (TARs) detected in the RNA-seq reads that fall outside the annotation |
Each line of "xxx_gdna_gene.out" and "xxx_gdna_exon.out" contains:
Column | Field | Description |
---|---|---|
1 | Feature_ID | The gene name ("xxx_gdna_gene.out") or the gene name and exon number ("xxx_gdna_exon.out") |
2 | chr | The chromosome number |
3 | start_pos | The 1-based start position of the feature in the genome |
4 | stop_pos | The 1-based stop position of the feature in the genome |
5 | size | _gene.out: the summed length of the exons; _exon.out: the length of the exon |
6 | adj_size | The size minus the length of the overlap with other genes |
7 | counts | The number of bases that align to the feature |
8 | neg_counts | The base counts aligned to the antisense strand |
9 | pos_counts | The base counts aligned to the sense strand |
10 | ref_strand | The reference strand |
11 | 3'_counts | The base counts at the 3' end of the feature (20% of the feature bases) |
12 | 5'_counts | The base counts at the 5' end of the feature (20% of the feature bases) |
13 | bpkm1 | Base reads per kilobase per million reads, normalized by (1) feature size and (2) the number of bases aligned to the genome |
14 | bpkm2 | Base reads per kilobase per million reads, normalized by (1) feature size and (2) the number of bases aligned to the exome |
15 | bpkm3 | Base reads per kilobase per million reads, normalized by (1) feature adj_size and (2) the number of bases aligned to the genome |
16 | bpkm4 | Base reads per kilobase per million reads, normalized by (1) feature adj_size and (2) the number of bases aligned to the exome |
17 | coverage | The percentage of the feature's length covered by aligned bases |
18 | adj_cov | The percentage of the feature's adjusted length covered by aligned bases |
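The column layout above can be read programmatically. The following is a minimal Python sketch and is not part of Genome TBAG. It assumes the files are whitespace-delimited with no header line (the delimiter is not stated on this page), and the names `FEATURE_FIELDS`, `parse_feature_line`, and `bpkm` are hypothetical. The `bpkm` helper assumes the usual RPKM-style normalization (counts divided by feature size in kilobases and by total aligned bases in millions); the exact formula used by Genome TBAG may differ.

```python
# Column names for xxx_gdna_gene.out / xxx_gdna_exon.out, in file order.
FEATURE_FIELDS = [
    "Feature_ID", "chr", "start_pos", "stop_pos", "size", "adj_size",
    "counts", "neg_counts", "pos_counts", "ref_strand",
    "3'_counts", "5'_counts", "bpkm1", "bpkm2", "bpkm3", "bpkm4",
    "coverage", "adj_cov",
]

# Columns kept as text; every other column is converted to float.
TEXT_FIELDS = {"Feature_ID", "chr", "ref_strand"}


def parse_feature_line(line):
    """Split one line into a dict keyed by the documented field names.

    Assumption: fields are separated by whitespace and contain none themselves.
    """
    values = line.split()
    if len(values) != len(FEATURE_FIELDS):
        raise ValueError(
            f"expected {len(FEATURE_FIELDS)} columns, got {len(values)}"
        )
    record = {}
    for name, value in zip(FEATURE_FIELDS, values):
        record[name] = value if name in TEXT_FIELDS else float(value)
    return record


def bpkm(counts, size, total_aligned_bases):
    """Assumed RPKM-style formula: counts / (size / 1e3) / (total / 1e6)."""
    return counts * 1e9 / (size * total_aligned_bases)
```

Under these assumptions, passing `size` with the genome-aligned base total would correspond to bpkm1, and passing `adj_size` with the exome-aligned total would correspond to bpkm4.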
Each line of "xxx_gdna_tars.out" contains:
Column | Field | Description |
---|---|---|
1 | chr | The chromosome of the sequence |
2 | start | The 1-based start position of the sequence in the genome |
3 | stop | The 1-based stop position of the sequence in the genome |
4 | span | The length of the transcriptionally active region |
5 | count | The read counts of the transcriptionally active region |
6 | overfold | The overfold of the TAR |
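As an illustration of working with the six TAR columns, the sketch below parses lines into named records and keeps regions at or above a minimum span. It is not part of Genome TBAG. It assumes whitespace-delimited lines with no header, integer positions and spans, and numeric `count` and `overfold` values; `Tar`, `parse_tar_line`, and `filter_tars` are hypothetical names.

```python
from typing import NamedTuple


class Tar(NamedTuple):
    """One row of xxx_gdna_tars.out, using the documented column order."""
    chrom: str
    start: int      # 1-based start position
    stop: int       # 1-based stop position
    span: int       # length of the region
    count: float    # read counts in the region
    overfold: float


def parse_tar_line(line):
    """Parse one line; assumes exactly six whitespace-separated fields."""
    chrom, start, stop, span, count, overfold = line.split()
    return Tar(chrom, int(start), int(stop), int(span),
               float(count), float(overfold))


def filter_tars(lines, min_span):
    """Return the TARs whose span is at least min_span, skipping blank lines."""
    tars = (parse_tar_line(line) for line in lines if line.strip())
    return [tar for tar in tars if tar.span >= min_span]
```

Because `filter_tars` accepts any iterable of lines, an open file handle for a TAR output file can be passed to it directly.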