Elementolab/ChIPseeqer Formats

SUPPORTED FORMATS

  • SAM (-format sam) (the SAM format supported by default is the one produced by BWA, which carries the XT:A:U tag; see the note below for the Bowtie SAM format)
SRR030735.399990        16      Pf3D7_09        1163673 37      36M     =       1163673 0       GTATATATATATAATATATTTATTTTATATATGATA    @:IFI>IIIIICIIIIIIIIIIIIIIIIIIIIIIHI    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:36
SRR030735.399961        16      Pf3D7_09        12052   0       36M     =       12052   0       AACATAGGTCTTAACTTGACTAACATAGCTCTTAGT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    XT:A:R  NM:i:0  X0:i:2  X1:i:1  XM:i:0  XO:i:0  XG:i:0  MD:Z:36
SRR030735.399937        16      Pf3D7_09        994204  37      36M     =       994204  0       GTTATAAGTTAGCTTTCTTATATGTTGATGATAATA    "%$>)/3"'%5,,&/+-.88(?*3/HII8CIIII4I    XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:7T28
SRR030735.399934        16      Pf3D7_09        376817  37      36M     =       376817  0       TCTTTTATGTATATATTATACCAAAAGTAGTTATAT    CIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:36
SRR030735.399927        0       Pf3D7_09        198557  37      36M     =       198557  0       AGGATTAAAACTACGAATAGCAAAAAGTAAATATGG    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:36
SRR030735.399909        16      Pf3D7_09        1196135 37      36M     =       1196135 0       TTCTGAAAAGAAATTTAAAATTTGTGTGTTAAAGTT    5DI7IIIEIIIIIIIIIIIIIIIIIIIIIIIIIIII    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:36
SRR030735.399865        0       Pf3D7_09        1424111 37      36M     =       1424111 0       AGATATCAAAAAAACAACATTAGAATTTATATATAA    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIA    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:36
SRR030735.399823        16      Pf3D7_09        370286  37      36M     =       370286  0       AAATATATTTCATATATTTCATATATATATATATAT    EIIFIIII9IIIBIII9IBIIIIIIIIIIIIIIIII    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:36
SRR030735.399798        16      Pf3D7_09        409008  37      36M     =       409008  0       ACATTACTTCTTTTTATAATTTGCTTAGTTATTACA    $)5'//,$(3$+0$.I&I5%,+DI+4I<)0IIIIII    XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:0T35
SRR030735.399774        16      Pf3D7_09        1105584 0       36M     =       1105584 0       TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    XT:A:R  NM:i:0  X0:i:3582       XM:i:0  XO:i:0  XG:i:0  MD:Z:36
SRR030735.399720        0       Pf3D7_09        1210095 0       36M     =       1210095 0       AAGTAAAAAAAAAAAAAAAAAAAAGAAAAAAAAAAA    IIIIIIIIIIIIIIIIIIIIIIII'IIIIIIIIIII    XT:A:R  NM:i:1  X0:i:25 X1:i:489        XM:i:1  XO:i:0  XG:i:0  MD:Z:24A11
SRR030735.399707        16      Pf3D7_09        509386  37      36M     =       509386  0       TCGATTAGCTATATTTAAAAATATTATGTAATATTG    =I:IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:36

NOTE: The SAM format supported by default is the one produced by BWA. To use SAM files produced by Bowtie, use -format sam in ChIPseeqerSplitReads and -format bowtiesam in ChIPseeqer.bin.
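To make the role of the XT:A:U tag concrete, here is a minimal Python sketch that parses one BWA SAM line from the sample above and checks whether BWA marked the read as uniquely mapped. The function names are hypothetical and this is not ChIPseeqer's own parser. It assumes whitespace-separated fields, which holds for the sample but not for SAM files whose optional tags contain spaces.

```python
# Illustrative sketch only; parse_sam_line and is_unique_bwa_hit are
# hypothetical helper names, not part of ChIPseeqer.
def parse_sam_line(line):
    """Split one SAM alignment line into a few mandatory fields plus its tags."""
    fields = line.rstrip("\n").split()
    flag = int(fields[1])
    tags = {}
    for item in fields[11:]:                 # optional TAG:TYPE:VALUE fields
        name, typ, value = item.split(":", 2)
        tags[name] = int(value) if typ == "i" else value
    return {
        "read": fields[0],
        "chrom": fields[2],
        "pos": int(fields[3]),               # 1-based leftmost position
        "mapq": int(fields[4]),
        "strand": "-" if flag & 16 else "+", # flag bit 0x10 = reverse strand
        "seq": fields[9],
        "tags": tags,
    }


def is_unique_bwa_hit(record):
    """True when BWA tagged the read XT:A:U (unique), False for XT:A:R etc."""
    return record["tags"].get("XT") == "U"
```

In the sample, the reads tagged XT:A:R (for example the poly-T read with X0:i:3582) would fail this check, while the XT:A:U reads would pass.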

  • BAM

For BAM files produced by BWA, use:

1. --format=bam in ChIPseeqerSplitReads
2. -format bam in ChIPseeqer.bin

For BAM files produced by CASAVA1.8, use:

1. --format=bam in ChIPseeqerSplitReads
2. This will produce .fa files (e.g., reads.chr1.fa). To rename these files (from reads.chr1.fa to reads.chr1), do the following:
 Download the script: http://physiology.med.cornell.edu/faculty/elemento/lab/CS_files/renameChroms.sh
 Move it into the directory containing the split reads.
 Run: ./renameChroms.sh
3. -format bam in ChIPseeqer.bin
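The renaming in step 2 can also be sketched in Python. This is an equivalent written from the description above (reads.chr1.fa becomes reads.chr1), not a transcription of renameChroms.sh, whose contents are not shown here. The function name and the reads.*.fa pattern are assumptions.

```python
from pathlib import Path


# Illustrative sketch only; strip_fa_suffix is a hypothetical helper and
# is not the renameChroms.sh script itself.
def strip_fa_suffix(directory):
    """Rename every reads.*.fa file in `directory` by dropping the final .fa."""
    renamed = []
    for path in sorted(Path(directory).glob("reads.*.fa")):
        target = path.with_suffix("")   # reads.chr1.fa -> reads.chr1
        if target.exists():
            continue                    # never overwrite an existing file
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

Calling strip_fa_suffix(".") from inside the directory with the split reads would rename the files in place and return the new names.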

  • ELAND FORMAT
73     AGAGACGATCACAAACATTAAAATCCAAAACAAAAT    U0    1    0    0    chr10    34054332    R
96     GCTAAAAAAATTAGTACGAAATAGTAGGAAGGTATT    U0    1    0    0    chr10    60810246    R
132    GCCCTAGTAGGGGAAAGAGAAACCAGTCGAGTTTTT    U0    1    0    0    chr10    16182814    F
135    GGCCAAACTCACGTTTTCTTTTTTTCTTTTCCAACA    U0    1    0    0    chr10    116275567   F
144    AAAACATTTTAATTATATTTTGCCTTTACCTTTTTC    U0    1    0    0    chr10    4221285     R
150    GGATGACAGCAATTAAGCCATCCTCATAGCAATCCT    U0    1    0    0    chr10    118856670   F
180    GAAATTTAAGTCTTCATTGTGTTCAAAAAAACTTCT    U0    1    0    0    chr10    22945579    F
181    GAGCTCAACTTTTACTTCAGCTGTTATTAACCTTCA    U0    1    0    0    chr10    70329397    F
187    AATAATAATAATTTTTGGTACTTTTATAATATGTAT    U0    1    0    0    chr10    118192374   F
188    ATCATTCCCAAACCTTAAATTATCTATATAATTTTC    U0    1    0    0    chr10    49200435    R
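As a reading aid for the ELAND sample above, here is a small Python sketch that splits one line into named fields. The field names follow the usual ELAND column interpretation (read id, sequence, match code, three hit counts, chromosome, position, strand). They are assumptions for illustration and not ChIPseeqer's internal names.

```python
# Illustrative sketch only; parse_eland_line and its field names are
# hypothetical, chosen to label the nine columns of the sample.
def parse_eland_line(line):
    """Parse one 9-column ELAND line into a dictionary."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    read_id, seq, match_code, n0, n1, n2, chrom, pos, strand = fields
    if strand not in ("F", "R"):
        raise ValueError(f"unexpected strand code: {strand}")
    return {
        "read_id": read_id,
        "seq": seq,
        "match_code": match_code,                  # e.g. U0
        "hit_counts": (int(n0), int(n1), int(n2)),
        "chrom": chrom,
        "pos": int(pos),
        "strand": "+" if strand == "F" else "-",   # F = forward, R = reverse
    }
```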
  • EXTENDED ELAND FORMAT
>HWUSI-EAS107:8:1:230:334#0/1	ACCAAAAACAAACAACTACATCAAATGTCTAAAGAA	1:0:0	chr12.fa:56030696F36
>HWUSI-EAS107:8:1:230:257#0/1	AATTTGTCAAAATAGTGTTACTGAAAAAATATATAC	NM	-
>HWUSI-EAS107:8:1:230:1610#0/1	ACATTGGGGGTCATCTTGGGGTCTCATTTAACAGAT	NM	-
>HWUSI-EAS107:8:1:230:751#0/1	ATGCTACATTTATTGCCCATTCTAATCCAGTCTGAT	1:0:0	chr3.fa:175417739F36
>HWUSI-EAS107:8:1:230:106#0/1	CGTCCTAGGCATTCTGGAGACTGCGGCATGAGAATC	NM	-
>HWUSI-EAS107:8:1:230:717#0/1	GTTTATTGAGCTGCACAATTTCTGTTGTGTGTACCT	1:0:0	chr1.fa:228116138R36
>HWUSI-EAS107:8:1:230:294#0/1	GTTCTCTGCCTGGCACACAAAAGACCCTCCTGACAC	20:1:0	-
>HWUSI-EAS107:8:1:230:2007#0/1	GGAAGCTGCGGAGGTGGGGAAGTGCTGTGGTTCCTC	0:1:0	chr19.fa:18063231R8A27
>HWUSI-EAS107:8:1:230:419#0/1	CAGCAAAAGGATCCCACAGCCAAGAAGTGTGTTCCT	1:0:0	chr7.fa:131803144R36
>HWUSI-EAS107:8:1:230:1476#0/1	ACACAGCGAGCAAAGCCTTCCATACTCGTTAACATG	0:0:1	chr18.fa:43512028R3G18G11G1
>HWUSI-EAS107:8:1:230:2026#0/1	CAGATAGCAATTATGCTTATCTAATCCTACATGCCC	1:73:109	chr10.fa:121638200F36
>HWUSI-EAS107:8:1:230:57#0/1	CATGTCCCCTTGGCAAGTGTTCCTCTAGAGCTGATG	1:0:0	chr6.fa:12286968R36
>HWUSI-EAS107:8:1:230:1892#0/1	GGCCCCCTCGTCCGGGTTAATGGGNCGGNTACCCCC	NM	-
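The last column of the extended ELAND sample packs reference file, position, strand and a match descriptor into one token (for example chr19.fa:18063231R8A27). The Python sketch below shows one way to unpack it. The function name, the field names and the stripping of the .fa suffix are assumptions. The sketch handles only single-hit rows like those shown and returns None for rows with no reported position.

```python
import re

# Matches a single-hit position field such as "chr19.fa:18063231R8A27".
HIT = re.compile(
    r"^(?P<ref>[^:,]+):(?P<pos>\d+)(?P<strand>[FR])(?P<descriptor>\w*)$"
)


# Illustrative sketch only; parse_extended_eland_line is a hypothetical helper.
def parse_extended_eland_line(line):
    """Parse one 4-column extended ELAND line; return None if no single hit."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}")
    read_id, seq, counts, location = fields
    match = HIT.match(location)
    if match is None:
        return None   # "-" rows (e.g. NM), or multi-hit lists not handled here
    ref = match.group("ref")
    return {
        "read_id": read_id.lstrip(">"),
        "seq": seq,
        "counts": counts,                                   # e.g. "1:0:0"
        "chrom": ref[:-3] if ref.endswith(".fa") else ref,  # drop ".fa"
        "pos": int(match.group("pos")),
        "strand": "+" if match.group("strand") == "F" else "-",
        "descriptor": match.group("descriptor"),            # e.g. "36", "8A27"
    }
```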
  • BED FORMAT
chr10    95452398    95452422    U0    0    +
chr10    88011870    88011894    U0    0    -
chr10    106079691   106079715   U0    0    -
chr10    43505655    43505679    U0    0    -
chr10    95563447    95563471    U0    0    +
chr10    16677710    16677734    U0    0    -
chr10    95102448    95102472    U0    0    +
chr10    124882669   124882693   U0    0    +
chr10    46403867    46403891    U0    0    -
  • MIT FORMAT

(similar to "bed"; we call it "mit" because we downloaded datasets in this format from the Broad Institute's website).

chr1    57042343        57042379        +       203P4.5.23      0
chr1    27285992        27286028        -       203P4.5.31      0
chr1    162710401       162710437       -       203P4.5.33      0
chr1    54300993        54301029        -       203P4.5.39      0
chr1    222672009       222672045       -       203P4.5.82      0
chr1    115541485       115541521       -       203P4.5.89      0
chr1    194850681       194850717       -       203P4.5.95      0
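Comparing the two samples, the "mit" lines carry the same kind of information as the "bed" lines but in a different column order (strand comes fourth, followed by a read name and a score). The following Python sketch reorders one "mit" line into the column order of the BED sample above. The function name and the interpretation of the last two columns as name and score are assumptions made for illustration.

```python
# Illustrative sketch only; mit_to_bed is a hypothetical helper, and the
# name/score labels for the last two columns are assumptions.
def mit_to_bed(line):
    """Reorder chrom,start,end,strand,name,score into chrom,start,end,name,score,strand."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    chrom, start, end, strand, name, score = fields
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand}")
    return "\t".join([chrom, start, end, name, score, strand])
```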
  • SIDOW et al. FORMAT

(similar to "eland"; the script that splits read files outputs these reads in conventional "eland" format).

CACCGGAAACGGAAATTCACCGTCC 0 0 chr10:75160261 F
GCTCCACGCGAGGGTCCTGCCGGAA 0 0 chr10:16519152 F
CATTATTGTGACCATAAATTGTATT 0 0 chr10:81942940 R
GGTTACCCCGGGACAAATCCGGGGG 0 0 chr10:99195862 F
TCCAAACACGCGCCCCCTATTTCTC 0 0 chr10:18980527 F
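For orientation, a line in this format can be unpacked as in the Python sketch below. The meaning of the two numeric columns is not documented on this page, so the sketch keeps them as raw strings. The function and field names are hypothetical, and this is not the splitting script itself.

```python
# Illustrative sketch only; parse_sidow_line is a hypothetical helper.
def parse_sidow_line(line):
    """Parse 'SEQ n n chrom:pos F|R' into a dictionary."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}")
    seq, col2, col3, location, strand = fields
    chrom, pos = location.rsplit(":", 1)   # "chr10:75160261" -> chr10, 75160261
    if strand not in ("F", "R"):
        raise ValueError(f"unexpected strand code: {strand}")
    return {
        "seq": seq,
        "extra": (col2, col3),             # undocumented columns, kept as-is
        "chrom": chrom,
        "pos": int(pos),
        "strand": "+" if strand == "F" else "-",
    }
```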
  • EXPORT

(The script that splits read files outputs these reads in conventional "eland" format.)


>HWI-EAS159	20A9RAAXX	7	1	15	247     AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	WW\WWWZLWWZZZZ^UZZ^Wa^W\WWO\\W\ZaZZZ	255:255:255	Y
>HWI-EAS159	20A9RAAXX	7	1	16	103	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAA	ZWZZa\WWW\WWZWZWWWZWZWWZLZWZWWaQHQWZ	255:255:255	Y
>HWI-EAS159	20A9RAAXX	7	1	16	755	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAAAA	ZWWLWZWW^aLWZW\WWZLZWLWL\ZWWOOLW^W\L	255:255:255	Y
>HWI-EAS159	20A9RAAXX	7	1	17	953	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGA    WWWWWZWZ^^^ZWaWZ\WZZWWWZZ\LZZWZWWZZW	255:255:255	Y


  • We have plans to support reads in BioHDF format (http://www.hdfgroup.org/), as well as interfaces to Oracle databases.