QtClustering
Overview
Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait - often proximity according to some defined distance measure.
Java implementation of the Quality Threshold clustering algorithm
A collection of clustering algorithms and tools written in Java has been developed at the ICB and is available as a library called "QtClustering". This is free software distributed under the GNU General Public License.
Algorithms Implemented
- QT clustering algorithm. QT stands for Quality Threshold, the maximum diameter allowed for a cluster. See Wikipedia and Heyer LJ et al 1999 for the article where the algorithm was reported and tested on microarray data.
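The following is a minimal sketch of the QT procedure as described by Heyer et al.: grow one candidate cluster around every remaining point, keep the largest candidate, remove its points, and repeat. It assumes one-dimensional points and absolute difference as the distance measure. The class QtSketch and its methods are hypothetical names chosen for illustration and are not the QtClustering library API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone sketch; NOT the QtClustering library API. */
class QtSketch {

    /** Distance between two 1-D points (assumption: any distance measure could be used). */
    static double distance(double a, double b) {
        return Math.abs(a - b);
    }

    /**
     * Partitions the points so that no cluster's diameter exceeds maxDiameter.
     * Returns clusters as lists of indices into points, in the order they were extracted.
     */
    static List<List<Integer>> cluster(double[] points, double maxDiameter) {
        List<Integer> remaining = new ArrayList<Integer>();
        for (int i = 0; i < points.length; i++) {
            remaining.add(i);
        }
        List<List<Integer>> clusters = new ArrayList<List<Integer>>();
        while (!remaining.isEmpty()) {
            // Build a candidate cluster seeded at each remaining point; keep the largest.
            List<Integer> best = new ArrayList<Integer>();
            for (int seed : remaining) {
                List<Integer> candidate = grow(seed, remaining, points, maxDiameter);
                if (candidate.size() > best.size()) {
                    best = candidate;
                }
            }
            clusters.add(best);
            remaining.removeAll(best);
        }
        return clusters;
    }

    /** Grows a candidate by repeatedly adding the point that keeps the diameter smallest. */
    static List<Integer> grow(int seed, List<Integer> pool, double[] points, double maxDiameter) {
        List<Integer> members = new ArrayList<Integer>();
        members.add(seed);
        double diameter = 0.0;
        while (true) {
            int bestPoint = -1;
            double bestDiameter = Double.POSITIVE_INFINITY;
            for (int p : pool) {
                if (members.contains(p)) {
                    continue;
                }
                // Diameter the candidate would have if p joined it.
                double d = diameter;
                for (int m : members) {
                    d = Math.max(d, distance(points[p], points[m]));
                }
                if (d < bestDiameter) {
                    bestDiameter = d;
                    bestPoint = p;
                }
            }
            // Stop when no point is left or the best addition would exceed the threshold.
            if (bestPoint < 0 || bestDiameter > maxDiameter) {
                return members;
            }
            members.add(bestPoint);
            diameter = bestDiameter;
        }
    }

    public static void main(String[] args) {
        double[] points = {1.0, 1.2, 1.1, 5.0, 5.1, 9.0};
        System.out.println(cluster(points, 0.5));
    }
}
```

With a threshold of 0.5, the three values near 1 form one cluster, the two values near 5 form a second, and 9.0 is left as a singleton. Because a candidate is grown for every remaining point on every pass, this naive form is computationally expensive on large data sets.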
Software Requirements
- Java Development Kit 1.5+ (note: versions 1.4 and below are not supported)
- Apache Commons Lang
- Apache log4j
- fastutil Fast and compact type-specific collections for Java
- MG4J (Managing Gigabytes for Java)
- Parallel Java Library (PJ)
- Apache Ant (if building library from source code)
Getting the library
The clustering library is distributed as precompiled jar files and also in source code form. Both distribution types are described in the following sections.
Binary Distribution
The binary distribution of the clustering library contains two jar files described as follows:
- qtclustering.jar
- includes all the external classes needed to run (i.e., fastutil)
- qtclustering-core.jar
- includes only the project classes; to use it in your own projects, fastutil.jar must also be on the classpath
Source Distribution
The source distribution of the clustering library contains the Java source code along with supporting files that are used to compile and test the package.
Building
Note that this section is meant only for those with the source distribution or Subversion access. Users of the binary distribution should skip this section.
Compiling and packaging
The target used to build the clustering package is called "jar". Executing ant jar will produce a file called "qtclustering.jar" in the <install-dir>.
Running JUnit Tests
The clustering library is built using Ant and a build.xml file located in the <install-dir>. The default target will compile the source and run the JUnit tests.
Subversion Access from the ICB local environment
This project's Subversion repository can be checked out with the following command:
svn co https://pbtech-vc.med.cornell.edu/public/svn/icb/trunk/icb-commons/qtclustering
Browse the clustering package in the Subversion repository.
Documentation
The clustering Javadoc API is available here and is also included with the binary distribution.
More Information
CruiseControl Test Results
Cluster analysis page at Wikipedia.
Contact. Email feedback and questions to icb at med.cornell.edu.
