  • What do the .summary.binary and .binary files contain? What's the difference between the two files?

These two files contain exactly the same information: binarized expression profiles. "Binarized" means that, for a given motif, all clusters in which the motif is over-represented are merged into cluster 1, and all other clusters are merged into cluster 0.

They are intermediate files and not meant to be used for any further analysis.
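
As an illustration of the binarization described above, here is a minimal Python sketch. It is not FIRE's own code and says nothing about the actual layout of the .binary files; the function name, the gene names and the cluster indices are all hypothetical.

```python
def binarize_profile(gene_to_cluster, enriched_clusters):
    """Return a 0/1 profile for one motif.

    gene_to_cluster   -- dict mapping gene name to its cluster index
    enriched_clusters -- clusters where the motif is over-represented
    (Both inputs are hypothetical stand-ins for FIRE's internal data.)
    """
    enriched = set(enriched_clusters)
    # Genes in an enriched cluster go to cluster 1, all others to cluster 0.
    return {gene: int(cluster in enriched)
            for gene, cluster in gene_to_cluster.items()}


# Made-up expression profile: four genes spread over clusters 0, 3 and 5.
profile = {"geneA": 0, "geneB": 3, "geneC": 5, "geneD": 3}

# Suppose the motif is over-represented in cluster 3 only.
binary = binarize_profile(profile, enriched_clusters=[3])
print(binary)
```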

  • What's in the .profiles file?

The .profiles file contains all occurrences of all motifs, together with their position, orientation, context, etc. No filter is applied, so every occurrence is listed.

  • What's in the .motifreport file?

The .motifreport file ranks genes by how likely they are to be targets of whichever factor binds the motif. The basic idea is that a motif occurrence scores well if it falls in a gene that is itself in a cluster where the motif is over-represented (pa), if it lies at the preferred distance from the TSS (pd), and if it has the preferred orientation (po). Entries are therefore sorted by the sum (pa+pd+po).
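
The ranking idea can be sketched in a few lines of Python. This is only an illustration under assumptions: pa, pd and po are treated here as plain numeric scores, and the record layout, gene names and values are invented, not read from a real .motifreport file.

```python
def rank_occurrences(occurrences):
    """Sort records by pa + pd + po, best first.

    Each record is a dict with hypothetical keys 'gene', 'pa', 'pd', 'po'.
    """
    return sorted(occurrences,
                  key=lambda o: o["pa"] + o["pd"] + o["po"],
                  reverse=True)


# Made-up records; 1 means the criterion is satisfied, 0 means it is not.
records = [
    {"gene": "geneA", "pa": 1, "pd": 0, "po": 0},
    {"gene": "geneB", "pa": 1, "pd": 1, "po": 1},
    {"gene": "geneC", "pa": 0, "pd": 1, "po": 1},
]

ranked = rank_occurrences(records)
for rec in ranked:
    print(rec["gene"], rec["pa"] + rec["pd"] + rec["po"])
```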

  • Where can I find weight matrices for FIRE motifs?

FIRE does not produce weight matrices, but consensus sequences, e.g. [AGT]CCC[AT]A[ACT]. These are essentially simplified Perl regular expressions. In the paper, we show weight matrices, but they are trivially derived from the regular expressions ([AT] => 0.5 A, 0.5 T, etc.).
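
The derivation is simple enough to sketch in Python. The following is an illustration rather than the code used for the paper: each position gets equal weight for every nucleotide it allows. The function name is hypothetical, and treating "." as "any nucleotide" is an assumption about the motif syntax.

```python
import re

ALPHABET = "ACGT"


def regex_to_weight_matrix(motif):
    """Turn a motif such as [AGT]CCC[AT]A[ACT] into a list of columns.

    Each column is a dict mapping A, C, G, T to a weight; allowed
    nucleotides share the weight equally ([AT] => 0.5 A, 0.5 T).
    """
    # One token per motif position: a bracketed class, a letter, or '.'.
    tokens = re.findall(r"\[[ACGT]+\]|[ACGT.]", motif)
    if "".join(tokens) != motif:
        raise ValueError("unsupported motif syntax: %r" % motif)

    matrix = []
    for token in tokens:
        # Assumption: '.' stands for any of the four nucleotides.
        allowed = set(ALPHABET) if token == "." else set(token.strip("[]"))
        weight = 1.0 / len(allowed)
        matrix.append({base: (weight if base in allowed else 0.0)
                       for base in ALPHABET})
    return matrix


matrix = regex_to_weight_matrix("[AGT]CCC[AT]A[ACT]")
for column in matrix:
    print(" ".join("%s=%.2f" % (base, column[base]) for base in ALPHABET))
```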

  • How can I scan other sequences using the FIRE motifs?

We provide a tool in the FIRE distribution to scan your own sequences. It is located at $FIREDIR/PROGRAMS/genregexp.

Basic usage is:

genregexp -re "[AGT]CCC[AT]A[ACT]" -fastafile yourseq.seq
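
For readers who want to see what such a scan involves, here is a rough Python sketch of matching a motif on both strands. It is not genregexp and does not reproduce its options or output format; the function names, the 0-based coordinates and the example sequence are assumptions made for illustration.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def scan(seq, motif):
    """Return sorted (start, strand, site) hits; start is 0-based on the + strand."""
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(%s))" % motif, re.IGNORECASE)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    # Scan the reverse complement and map positions back to the + strand.
    for m in pattern.finditer(reverse_complement(seq)):
        site = m.group(1)
        hits.append((len(seq) - m.start() - len(site), "-", site))
    return sorted(hits)


# Made-up FASTA content; with a real file, pass open("yourseq.seq") instead.
fasta_lines = [">seq1", "TTGCC", "CAAAGG"]
for name, seq in read_fasta(fasta_lines):
    for start, strand, site in scan(seq, "[AGT]CCC[AT]A[ACT]"):
        print(name, start, strand, site)
```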

