TiSimilarity/AtaxiaTelangiectasia


We use TiSimilarity to search the AT locus, where the gene that causes ataxia telangiectasia (AT) is located. The locus was mapped to a region of 850 kb by Imai T et al. in 1995 [1], and the ATM gene was identified at this locus [2].

A recent publication studied gene expression changes upon ionizing radiation in lymphoblasts from patients with AT [3]. The authors identified a list of genes differentially regulated between controls and AT patients. The gene list was captured from Table 1 of that publication to produce a list of 99 human HGNC identifiers. The list was mapped to Ensembl release 44 to yield the following file (excerpt):

gene-list-through-biomart.txt

Ensembl Gene ID    Ensembl Transcript ID    EMBL ID     EntrezGene ID    RefSeq DNA ID
ENSG00000184674    ENST00000248935          Z84718      2952             NM_000853
ENSG00000184674    ENST00000248935          X79389      2952             NM_000853
ENSG00000184674    ENST00000248935          AF435971    2952             NM_000853
...

Extract transcript IDs:

cut -f 2 gene-list-through-biomart.txt | sort -u > gene-list-249-transcript-ids.txt

Imai T et al. reported that the AT locus lies between markers D11S1819 and D11S1818. These markers map to chromosome 11 at the following positions (bp):

D11S1819 107081716-107081849
D11S1818 108286633-108286798
Interval 107081716-108286798

We use these positions to retrieve candidate genes and transcripts from BioMart. We retrieve 13 candidate genes (21 transcripts), listed in the file locus-chr11-candidates.txt:

Ensembl Gene ID Ensembl Transcript ID
ENSG00000170290 ENST00000305991
ENSG00000110660 ENST00000265836
ENSG00000110660 ENST00000375682
ENSG00000179331 ENST00000320578
ENSG00000166266 ENST00000299351
ENSG00000075239 ENST00000265838
ENSG00000075239 ENST00000299355
ENSG00000075239 ENST00000375674
ENSG00000149308 ENST00000278612
ENSG00000149311 ENST00000278616
ENSG00000149311 ENST00000299392
ENSG00000149311 ENST00000389512
ENSG00000149311 ENST00000389513
ENSG00000149311 ENST00000389511
ENSG00000166323 ENST00000344283
ENSG00000178202 ENST00000323468
ENSG00000178202 ENST00000375648
ENSG00000110723 ENST00000265843
ENSG00000178105 ENST00000322536
ENSG00000206967 ENST00000384240
ENSG00000200855 ENST00000363985
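The gene and transcript counts quoted above can be checked directly from the candidate file. The following is a minimal sketch, assuming the file is tab-delimited with a single header line; demo-candidates.txt is a hypothetical three-row excerpt used only to keep the example self-contained.

```shell
# Hypothetical miniature of locus-chr11-candidates.txt (tab-delimited, one header line).
printf 'Ensembl Gene ID\tEnsembl Transcript ID\n' > demo-candidates.txt
printf '%s\t%s\n' \
  ENSG00000170290 ENST00000305991 \
  ENSG00000110660 ENST00000265836 \
  ENSG00000110660 ENST00000375682 >> demo-candidates.txt

# Skip the header, then count distinct values in each column.
n_genes=$(tail -n +2 demo-candidates.txt | cut -f 1 | sort -u | wc -l | tr -d ' ')
n_transcripts=$(tail -n +2 demo-candidates.txt | cut -f 2 | sort -u | wc -l | tr -d ' ')
echo "genes=$n_genes transcripts=$n_transcripts"
```

Run against the full locus-chr11-candidates.txt, the same two pipelines should report the 13 genes and 21 transcripts listed above.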

TiSimilarity locus screen

To screen candidates in the locus by tissue expression similarity to the gene list identified by Innes CL and colleagues, we construct a restriction file that combines the transcript IDs from the locus with those from the expression gene list:

cut -f 2 locus-chr11-candidates.txt >locus-chr11-transcript-ids.txt
cat gene-list-249-transcript-ids.txt locus-chr11-transcript-ids.txt | sort -u | grep -v Ensembl > restrict-AT.txt

TiSimilarity is then run with options:

TiSimilarity -b human-counts-data --input H:\projects\ataxia\ataxiatelangectasia\locus-chr11-transcript-ids.txt 
             --scorer edu.mssm.crover.tools.tissue.similarity.scorers.ExpressionConfidenceScorer 
             -o H:\projects\ataxia\ataxiatelangectasia\locus-chr11-screen.txt  
             --restrict H:\projects\ataxia\ataxiatelangectasia\restrict-AT.txt -k 100000

Scoring and annotating candidates

The previous step produces a file that gives the tisim score of each candidate with respect to each transcript in the gene list.

InsightfulMiner can be used to sum individual scores for each candidate into an aggregate score. Such a pipeline is shown below.

Image:AtaxiaTLocusScreenScoreAccumulationPipelin.jpg
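The same aggregation could also be approximated on the command line. This is only a sketch and not the InsightfulMiner pipeline itself: the layout of the TiSimilarity output is not documented here, so the example assumes a tab-delimited file with the candidate ID in column 1, the gene-list transcript ID in column 2, and the tisim score in column 3. The file demo-pairs.txt, its identifiers, and its scores are hypothetical toy values.

```shell
# Hypothetical per-pair scores: candidate ID, gene-list transcript ID, tisim score.
printf '%s\t%s\t%s\n' \
  candA listT1 10 \
  candA listT2 -4 \
  candB listT1 3 \
  candB listT2 5 > demo-pairs.txt

# Sum the scores of each candidate (column 1), then rank the candidates
# from the highest to the lowest aggregate score.
awk -F '\t' '{ sum[$1] += $3 }
     END { for (id in sum) printf "%s\t%s\n", id, sum[id] }' demo-pairs.txt \
  | sort -k2,2nr > demo-score-sum.txt

cat demo-score-sum.txt
```

If the real output uses different columns, only the field numbers in the awk program need to change.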

The result of the screen is reproduced below:

queryID	score.sum	geneID	geneName	description
ENST00000299351	4304	ENSG00000166266	CUL5	Cullin-5 (CUL-5) (Vasopressin-activated calcium-mobilizing receptor) (VACM-1). [Source:Uniprot/SWISSPROT;Acc:Q93034]
ENST00000278616	4086	ENSG00000149311	ATM	Serine-protein kinase ATM (EC 2.7.11.1) (Ataxia telangiectasia mutated)  (A-T, mutated). [Source:Uniprot/SWISSPROT;Acc:Q13315]
ENST00000299392	4086	ENSG00000149311	ATM	Serine-protein kinase ATM (EC 2.7.11.1) (Ataxia telangiectasia mutated) (A-T, mutated). [Source:Uniprot/SWISSPROT;Acc:Q13315]
ENST00000322536	1931	ENSG00000178105	DDX10	Probable ATP-dependent RNA helicase DDX10 (EC 3.6.1.-) (DEAD box protein 10). [Source:Uniprot/SWISSPROT;Acc:Q13206]
ENST00000265836	-533	ENSG00000110660	SLC35F2	solute carrier family 35, member F2 [Source:RefSeq_peptide;Acc:NP_059985]
ENST00000323468	-591	ENSG00000178202	KDELC2	KDEL motif-containing protein 2 precursor.  [Source:Uniprot/SWISSPROT;Acc:Q7Z4H8]
ENST00000265838	-2541	ENSG00000075239	ACAT1	Acetyl-CoA acetyltransferase, mitochondrial precursor (EC 2.3.1.9) (Acetoacetyl-CoA thiolase) (T2). [Source:Uniprot/SWISSPROT;Acc:P24752]
ENST00000389512	-2791	ENSG00000149311	ATM	Serine-protein kinase ATM (EC 2.7.11.1) (Ataxia telangiectasia mutated) (A-T, mutated). [Source:Uniprot/SWISSPROT;Acc:Q13315]
ENST00000389513	-2791	ENSG00000149311	ATM	Serine-protein kinase ATM (EC 2.7.11.1) (Ataxia telangiectasia mutated) (A-T, mutated). [Source:Uniprot/SWISSPROT;Acc:Q13315]
ENST00000278612	-5091	ENSG00000149308	NPAT	nuclear protein, ataxia-telangiectasia locus [Source:RefSeq_peptide;Acc:NP_002510]
ENST00000375674	-5795	ENSG00000075239	ACAT1	Acetyl-CoA acetyltransferase, mitochondrial precursor (EC 2.3.1.9) (Acetoacetyl-CoA thiolase) (T2). [Source:Uniprot/SWISSPROT;Acc:P24752]
ENST00000389511	-6491	ENSG00000149311	ATM	Serine-protein kinase ATM (EC 2.7.11.1) (Ataxia telangiectasia mutated) (A-T, mutated). [Source:Uniprot/SWISSPROT;Acc:Q13315]
ENST00000299355	-8670	ENSG00000075239	ACAT1	Acetyl-CoA acetyltransferase, mitochondrial precursor (EC 2.3.1.9) (Acetoacetyl-CoA thiolase) (T2). [Source:Uniprot/SWISSPROT;Acc:P24752]
ENST00000375682	-9355	ENSG00000110660	SLC35F2	solute carrier family 35, member F2 [Source:RefSeq_peptide;Acc:NP_059985]
ENST00000305991	-11610	ENSG00000170290	SLN	Sarcolipin. [Source:Uniprot/SWISSPROT;Acc:O00631]
ENST00000265843	-12080	ENSG00000110723	EXPH5	Slp homolog lacking C2 domains b (Exophilin-5). [Source:Uniprot/SWISSPROT;Acc:Q8NEV8]
ENST00000375648	-14889	ENSG00000178202	KDELC2	KDEL motif-containing protein 2 precursor. [Source:Uniprot/SWISSPROT;Acc:Q7Z4H8]
ENST00000344283	-18576	ENSG00000166323	C11orf65	Uncharacterized protein C11orf65. [Source:Uniprot/SWISSPROT;Acc:Q8NCR3]
ENST00000320578	-19296	ENSG00000179331	RAB39	Ras-related protein Rab-39A (Rab-39). [Source:Uniprot/SWISSPROT;Acc:Q14964]

In this screen, two transcripts for ATM appear in the top three candidates. The top candidate is Cullin-5 (CUL5).
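The ranking can be re-derived from the result table with standard tools. The sketch below copies six rows of the table (queryID, score.sum, and geneName only) into a hypothetical file, demo-screen.txt, sorts them by aggregate score, and counts how many of the top three belong to ATM.

```shell
# Six rows taken from the result table above: queryID, score.sum, geneName.
printf '%s\t%s\t%s\n' \
  ENST00000265836 -533 SLC35F2 \
  ENST00000299392 4086 ATM \
  ENST00000322536 1931 DDX10 \
  ENST00000299351 4304 CUL5 \
  ENST00000389512 -2791 ATM \
  ENST00000278616 4086 ATM > demo-screen.txt

# Rank by score.sum (numeric, descending) and keep the gene names of the top three.
top3=$(sort -k2,2nr demo-screen.txt | head -n 3 | awk '{ print $3 }' | paste -sd ' ' -)

# Count the ATM transcripts among the top three.
n_atm=$(sort -k2,2nr demo-screen.txt | head -n 3 | awk '$3 == "ATM"' | wc -l | tr -d ' ')

echo "top3: $top3 (ATM transcripts: $n_atm)"
```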

In the ataxia protein-protein interaction screen done by Vidal and colleagues [4], ATM used as a bait identified no partner. However, the same screen linked CUL1 and CUL7 to known genetic modifiers of ataxia (see Supplementary Table S4). The pertinent lines of this table are reproduced below.

GeneSymbol_A	EntrezGeneID_A	GeneSymbol_B	EntrezGeneID_B	Interaction source	Baits vs. Preys	OMIM disease_A	OMIM disease_B
CUL1	8454	UBQLN1	29979	interlog	baits	None	None
UBE3A	7337	CUL7	9820	literature	baits	Angelman syndrome, 105830 (3)	None

Mutations in ATM have been confirmed in many studies, including in animal models [5], as the cause of ataxia telangiectasia, so the significance of this observation is unclear.
