Goby

From Icbwiki

(Difference between revisions)
Revision as of 18:07, 3 February 2010
Fabien Campagne (Talk | contribs)
(User mailing list)
Current revision
Fabien Campagne (Talk | contribs)

Line 1: Line 1:
-Goby is a next-gen data management framework designed to facilitate the implementation of efficient next-gen data analysis pipelines. This wiki provides some documentation about the tool. Documentation is still sparse, but we are actively writing more. Feel free to ask if you have specific questions not answered here.
+Goby is a next-gen data management framework designed to facilitate the implementation of efficient next-gen data analysis pipelines. Please see the official project page at [http://goby.campagnelab.org http://goby.campagnelab.org].
- +
-==File formats==+
- +
-Goby defines and uses several compressed file formats that are time and space efficient. These formats include:+
- +
-* '''compact reads''', an alternative to FASTA/FASTQ, which is fast to parse, unambiguous, compact, and chunkable. Chunkability means that a very large file can be processed in independent chunks without traversing the entire file: only the chunk of interest needs to be read. This property is leveraged by GobyWeb to support parallel alignments.+
-* '''compact alignments''', an alternative to Elan text format, MAQ, or SAM. Goby alignments are chunkable, compact, unambiguous, and fast to parse.+
-* '''counts''', a representation of the histogram of read count along a reference sequence, at single base pair resolution. This representation is highly space efficient. Each count transition (positions where the value of the count changes along the histogram) is generally encoded in about 13 bits. +
-* '''count archives''', an archive of counts, one histogram per reference sequence in an alignment. Archives can store histogram data for a complete genome. They are very space efficient, with only about 20Mb needed to store a histogram of reads aligned against the human genome at base pair resolution. In contrast, a wiggle plot stored at 20bp resolution needs about 45Mb.+
- +
-==Utilities==+
-In addition to these file formats, Goby provides a few utilities that implement common next-gen data computations. Each utility is implemented in a Goby mode. Each mode can be run from the command line as follows:+
- +
- java -jar goby.jar -m <mode-name>+
- +
-Help is built into Goby. To access detailed usage instructions, use the --help/-h flag, as in +
- +
- java -jar goby.jar --help (lists all modes)+
- java -jar goby.jar --help --mode <mode-name> (provides mode specific usage info)+
- +
-See also below for a list of the current Goby modes.+
- +
-==Common analyses==+
-Goby is designed to make common analyses simple, and more complex analyses possible. More complex analyses usually require programming directly with the framework, in Java, or scripting a sequence of modes with the language of your choice. +
- +
-Three common applications are currently supported:+
-# Creation of wiggle plots from next-gen data (see quick demo on the project home page, and this [[Goby/Example|detailed walk-through]]).+
-# Calculation of differential expression statistics across groups of samples (see help for mode alignment-to-annotation-counts, and [[Goby/DE|quick demo here]]).+
-# Filtering out [[Goby/Non-redundant|redundant reads]].+
- +
-Details about each mode will be available here:+
- +
-{{:Goby/Modes}}+
- +
-==For (Java) developers==+
-See our source distribution on the [http://icbtools.med.cornell.edu/goby/download.html download page] and the published [http://icbtools.med.cornell.edu/javadocs/goby Goby Javadocs].+
- +
-Goby is a next-gen data management framework developed in the [http://icb.med.cornell.edu/research/labs/campagne/index.xml Campagne laboratory], at the Weill Medical College of Cornell University.+
- +
-== Credits ==+
- +
-The following people have contributed or are contributing to the development of Goby:+
- +
-# Jaaved Mohammed (Tri-I student in Computational Biology and Medicine, rotation Jan 2010)+
-# Nyasha Chambwe (Tri-I student in Computational Biology and Medicine, testing and using Goby for RNA-Seq projects)+
-# Stuart Andrews (post-doc, testing, color-space support with bwa) +
-# Xutao Deng (CTSC biomarker analyst, now at City of Hope, efficient annotation and base pair count algorithms)+
-# Kevin Dorff and Marko Srdanovic (software developers, support, testing, documentation, release preparation) +
-# Fabien Campagne (lab head, compressed formats and overall project architecture)+
- +
- +
-== User mailing list ==+
- +
-Subscribe to the Goby user mailing list to receive notifications about future software releases and new features. See details on the [http://icbtools.med.cornell.edu/goby/download.html download page].+

Current revision

Goby is a next-gen data management framework designed to facilitate the implementation of efficient next-gen data analysis pipelines. Please see the official project page at http://goby.campagnelab.org.
