SigPath

org.sigpath.datamodel
Interface BioPolymer

All Superinterfaces:
Chemical, Collectable, MacroMolecule, SigPathEntity
All Known Subinterfaces:
DNA, Protein, RNA
All Known Implementing Classes:
BioPolymerImpl, ProteinImpl

public interface BioPolymer
extends MacroMolecule

Represents a biopolymer (Protein, DNA or RNA). The following filtering should be implemented. [FIXME: implement this filtering.]

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters. An example sequence in FASTA format is:

    >gi|532319|pir|TVFV2E|TVFV2E envelope protein
    ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
    QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
    HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
    MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
    TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
    APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
    LAAVEAQQQMLKLTIWGVK

Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped to upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see the amino acid codes below). Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for an unknown nucleic acid residue or X for an unknown amino acid residue).
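The FASTA rules above (a ">" description line, then sequence lines, lower-case mapped to upper-case, no digits) can be sketched as a small parser. This is an illustration only. `FastaRecord` and `parse` are hypothetical names, not part of SigPath. Where the text says digits should be removed or replaced, this sketch rejects them rather than guessing a replacement letter.

```java
/**
 * Hypothetical helper (not part of SigPath): splits one FASTA record into
 * its description line and its sequence, following the rules above.
 */
final class FastaRecord {
    final String description; // text after '>' on the first line
    final String sequence;    // sequence lines joined and upper-cased

    private FastaRecord(String description, String sequence) {
        this.description = description;
        this.sequence = sequence;
    }

    static FastaRecord parse(String text) {
        String[] lines = text.split("\\r?\\n");
        // The description line is marked by '>' in the first column.
        if (lines.length == 0 || !lines[0].startsWith(">")) {
            throw new IllegalArgumentException("FASTA record must start with '>'");
        }
        String description = lines[0].substring(1).trim();
        StringBuilder seq = new StringBuilder();
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            for (int j = 0; j < line.length(); j++) {
                char c = line.charAt(j);
                if (Character.isDigit(c)) {
                    // Digits should have been removed or replaced by N or X.
                    throw new IllegalArgumentException("digit in sequence on line " + (i + 1));
                }
                if (!Character.isWhitespace(c)) {
                    seq.append(Character.toUpperCase(c)); // lower-case maps to upper-case
                }
            }
        }
        return new FastaRecord(description, seq.toString());
    }
}
```

For example, `FastaRecord.parse(">gi|1 test\nacgt\nNNAC")` yields the description `gi|1 test` and the sequence `ACGTNNAC`.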
The nucleic acid codes supported are:

    A --> adenosine           M --> A C (amino)
    C --> cytidine            S --> G C (strong)
    G --> guanine             W --> A T (weak)
    T --> thymidine           B --> G T C
    U --> uridine             D --> G A T
    R --> G A (purine)        H --> A C T
    Y --> T C (pyrimidine)    V --> G C A
    K --> G T (keto)          N --> A G C T (any)
                              -     gap of indeterminate length

For those programs that use amino acid query sequences (BLASTP and TBLASTN), the accepted amino acid codes are:

    A  alanine                     P  proline
    B  aspartate or asparagine     Q  glutamine
    C  cysteine                    R  arginine
    D  aspartate                   S  serine
    E  glutamate                   T  threonine
    F  phenylalanine               U  selenocysteine
    G  glycine                     V  valine
    H  histidine                   W  tryptophan
    I  isoleucine                  Y  tyrosine
    K  lysine                      Z  glutamate or glutamine
    L  leucine                     X  any
    M  methionine                  *  translation stop
    N  asparagine                  -  gap of indeterminate length
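The nucleic acid table can be read as a mapping from each ambiguity code to the set of concrete bases it stands for. The sketch below encodes that table directly. `IupacNucleotides`, `isValid` and `matches` are hypothetical names for illustration and are not part of SigPath. As in the table, N expands to A G C T only, so it does not match U.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of SigPath): the nucleic acid codes from
 * the table above, each mapped to the concrete bases it may represent.
 */
final class IupacNucleotides {
    private static final Map<Character, String> BASES = new HashMap<Character, String>();
    static {
        BASES.put('A', "A");
        BASES.put('C', "C");
        BASES.put('G', "G");
        BASES.put('T', "T");
        BASES.put('U', "U");
        BASES.put('R', "GA");   // purine
        BASES.put('Y', "TC");   // pyrimidine
        BASES.put('K', "GT");   // keto
        BASES.put('M', "AC");   // amino
        BASES.put('S', "GC");   // strong
        BASES.put('W', "AT");   // weak
        BASES.put('B', "GTC");
        BASES.put('D', "GAT");
        BASES.put('H', "ACT");
        BASES.put('V', "GCA");
        BASES.put('N', "AGCT"); // any
    }

    /** True for any code in the table, including the '-' gap; case-insensitive. */
    static boolean isValid(char code) {
        char c = Character.toUpperCase(code);
        return c == '-' || BASES.containsKey(c);
    }

    /** True if the ambiguity code may stand for the given concrete base. */
    static boolean matches(char code, char base) {
        String expansion = BASES.get(Character.toUpperCase(code));
        return expansion != null && expansion.indexOf(Character.toUpperCase(base)) >= 0;
    }
}
```

For example, `matches('R', 'A')` is true because R (purine) covers A and G, while `matches('Y', 'A')` is false.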

Version:
$Revision: 5008 $
Author:
Fabien Campagne

Method Summary
 long getLength()
          Returns the length of the sequence of this biopolymer.
 String getResidueCodes()
          Gets the residue codes for this biopolymer.
 String getResidueCodes(long start, long end)
          Gets a segment of the residue codes for this biopolymer.
 void setResidueCodes(String residue_codes)
          Sets the residue codes for this biopolymer.
 
Methods inherited from interface org.sigpath.datamodel.Chemical
contains, getBecomes, getComesFrom, getOrganism, involvedIn, isBackgroundInformation, setInvolvedIn, setInvolvedIn, setInvolvedIn, setOrganism
 
Methods inherited from interface org.sigpath.datamodel.SigPathEntity
addAlias, addAliases, addReview, addUserComment, changeAlias, getAliases, getAliasesCollection, getAliasesIterator, getChangeLog, getCombinedStringLength, getComments, getDescription, getExternalReferences, getForwardReferences, getLiteratureReferences, getName, getReviews, getSpid, getUserComments, isExportable, removeAlias, removeAliases, removeReviews, setAliases, setAliasesCollection, setChangeLog, setComments, setDescription, setExportable, setName, setSpid
 
Methods inherited from interface org.sigpath.datamodel.lset.Collectable
getCollections
 

Method Detail

getResidueCodes

String getResidueCodes()
Gets the residue codes for this biopolymer. Residue codes are encoded with IUB/IUPAC amino acid and nucleic acid codes.


getResidueCodes

String getResidueCodes(long start,
                       long end)
Gets a segment of the residue codes for this biopolymer. Residue codes are encoded with IUB/IUPAC amino acid and nucleic acid codes.

Parameters:
start - Start offset within the total sequence for the segment to be returned. (Inclusive.)
end - End offset within the total sequence for the segment to be returned. (Inclusive.)
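Because both `start` and `end` are inclusive, a segment contains `end - start + 1` residues. The sketch below shows that arithmetic on a plain `String`. Note that this page does not say whether offsets are 0-based or 1-based. The sketch *assumes* 0-based offsets. `SegmentSketch` is a hypothetical name and does not reproduce the actual SigPath implementation.

```java
/**
 * Hypothetical sketch of an inclusive segment lookup, in the spirit of
 * getResidueCodes(start, end).
 * ASSUMPTION: offsets are 0-based; the SigPath documentation does not say.
 */
final class SegmentSketch {
    static String segment(String residues, long start, long end) {
        if (start < 0 || end < start || end >= residues.length()) {
            throw new IndexOutOfBoundsException(
                "start=" + start + ", end=" + end + ", length=" + residues.length());
        }
        // String.substring excludes its end index, so add 1 to include 'end'.
        return residues.substring((int) start, (int) (end + 1));
    }
}
```

Under this assumption, `segment("ACGTACGT", 2, 4)` returns `GTA`, which is three residues (4 - 2 + 1).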

getLength

long getLength()
Returns the length of the sequence of this biopolymer. The length is cached in this implementation so that this operation is quick.
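Caching the length mainly helps when fetching the residue codes themselves is expensive, for example when they are loaded lazily from a persistent store. That is an assumption; this page does not say how `BioPolymerImpl` stores sequences. The hypothetical `LazySequence` below keeps the length next to a lazily loaded sequence, so `getLength()` never triggers a load. The `Supplier` is only a stand-in for whatever expensive fetch a real implementation might perform.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not SigPath code): the length is stored separately,
 * so asking for it does not load the full residue string.
 */
final class LazySequence {
    private final Supplier<String> loader; // stand-in for an expensive fetch
    private final long cachedLength;       // recorded when the sequence was saved
    private String residues;               // null until first requested
    int loadCount = 0;                     // counts loader calls, for illustration

    LazySequence(long cachedLength, Supplier<String> loader) {
        this.cachedLength = cachedLength;
        this.loader = loader;
    }

    long getLength() {
        return cachedLength; // no call to the loader
    }

    String getResidueCodes() {
        if (residues == null) {
            residues = loader.get(); // load once, then reuse
            loadCount++;
        }
        return residues;
    }
}
```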


setResidueCodes

void setResidueCodes(String residue_codes)
Sets the residue codes for this biopolymer. Characters in residue_codes that are not IUB/IUPAC amino acid or nucleic acid codes are filtered out and are not stored.
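The filtering described for `setResidueCodes` can be sketched as follows. Each character is upper-cased, and the character is kept only if it belongs to the chosen alphabet. The two alphabet strings are transcribed from the code tables in the class description. `ResidueFilter` and its constants are hypothetical names, and the sketch does not claim to match how SigPath actually filters. In particular, whether SigPath chooses the alphabet per subtype (Protein vs. DNA/RNA) is an assumption.

```java
/**
 * Hypothetical sketch (not SigPath code): upper-case the input and drop any
 * character outside the given IUB/IUPAC alphabet.
 */
final class ResidueFilter {
    // Amino acid codes from the class description, plus '*' (stop) and '-' (gap).
    static final String AMINO_ACIDS = "ABCDEFGHIKLMNPQRSTUVWXYZ*-";
    // Nucleic acid codes from the class description, plus '-' (gap).
    static final String NUCLEIC_ACIDS = "ACGTURYKMSWBDHVN-";

    static String filter(String raw, String alphabet) {
        StringBuilder kept = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = Character.toUpperCase(raw.charAt(i)); // lower-case is accepted
            if (alphabet.indexOf(c) >= 0) {
                kept.append(c); // only valid codes are stored
            }
        }
        return kept.toString();
    }
}
```

For example, filtering `"acg t1N-"` against `NUCLEIC_ACIDS` yields `ACGTN-`. The space and the digit are dropped.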


Copyright © 2002-2005 Institute for Computational Biomedicine, All Rights Reserved.