The ClusterFinder™ Software

[Figure: ClusterFinder workflow. Panels: Peak characterization (artifact, control, experimental); Massive Data Reduction; Normalization and Identification]

The distinctive IROA patterns described in the section “The IROA Peaks” are used to interpret the resulting composite spectra, because they make it possible to discriminate the origin of every peak in the composite: 12C-derived molecules, 13C-derived molecules, artifacts, and derivatives of exogenously applied compounds.

The IROA peaks are all mathematically calculable, and each set of carbon isotopomers (12C and 13C) reliably and accurately accounts for the other. This provides a redundant quality-control checkpoint and makes it possible to write software algorithms that interpret the analytical results of the composite sample.
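Because each carbon position is 13C with a fixed probability, the relative abundance of the isotopologue carrying k 13C atoms in an n-carbon molecule follows a binomial distribution, P(k) = C(n, k) p^k (1 − p)^(n − k). This is what makes the IROA peaks calculable. The sketch below assumes 5% 13C labeling on the experimental side and 95% 13C on the control side (these enrichment levels are assumptions, not values stated in this section) and ignores the isotopes of other elements such as H, N, O and S; `carbon_envelope` and `is_mirror` are hypothetical helper names. It illustrates the redundancy described above: the 95% envelope is the 5% envelope read backwards.

```python
from math import comb


def carbon_envelope(n_carbons: int, p13: float) -> list[float]:
    """Relative abundances of isotopologues M+0 ... M+n (M+k carries k 13C atoms).

    Assumes every carbon is independently 13C with probability p13 and
    ignores the isotopes of all other elements.
    """
    return [comb(n_carbons, k) * p13**k * (1 - p13) ** (n_carbons - k)
            for k in range(n_carbons + 1)]


def is_mirror(low: list[float], high: list[float], tol: float = 1e-12) -> bool:
    """True if the high-enrichment envelope equals the low one reversed."""
    return len(low) == len(high) and all(
        abs(a - b) <= tol for a, b in zip(low, reversed(high)))


n = 6                               # e.g. a six-carbon metabolite such as glucose
light = carbon_envelope(n, 0.05)    # assumed 5% 13C experimental label
heavy = carbon_envelope(n, 0.95)    # assumed 95% 13C control label

print([round(x, 4) for x in light])
print(is_mirror(light, heavy))      # True
```

Each envelope sums to 1, and reversing one reproduces the other, so either half of a cluster can be checked against the half predicted from its partner.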

The software package ClusterFinder™ reduces complex raw data to concise, high-value information in the following steps:

Step 1: Characterize all peaks according to source (artifact, experimental (12C), control (13C), or standard).

Step 2: Remove all artifacts.

Step 3: Align and pair all remaining peaks across all scans.

Step 4: Normalize and identify all pairs.

Step 5: Determine the relative 12C/13C ratios of analytes in each sample.

Step 6: Determine the statistical variance of the sample ratios.

Any experimental analyte whose ratio deviates significantly (by two or more standard deviations) from the average ratio marks a point where the biochemistry was altered. For instance, if the average 12C/13C ratio across all analytes is 1 (1:1), but some analytes have ratios of 10 (10:1) or 0.1 (1:10), then those outliers from the general population are the analytes most strongly affected by the stressor.
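The two-standard-deviation criterion can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the statistics used by ClusterFinder or the IROA data portal: it swaps in log2 ratios instead of raw ratios, so that 10:1 and 1:10 lie the same distance from 1:1, and `flag_outliers` plus the toy ratios are hypothetical.

```python
from math import log2
from statistics import mean, stdev


def flag_outliers(ratios: dict[str, float], n_sd: float = 2.0) -> list[str]:
    """Return the analytes whose log2(12C/13C) ratio lies at least n_sd
    standard deviations from the mean log2 ratio of all analytes."""
    logs = {name: log2(r) for name, r in ratios.items()}
    mu = mean(logs.values())
    sd = stdev(logs.values())
    return sorted(name for name, v in logs.items() if abs(v - mu) >= n_sd * sd)


# Toy ratios (illustrative only): twenty analytes near 1:1 plus two shifted ones.
background = {f"analyte_{i}": r
              for i, r in enumerate([0.9, 0.95, 1.0, 1.05, 1.1] * 4)}
ratios = {**background, "compound_A": 10.0, "compound_B": 0.1}

print(flag_outliers(ratios))  # ['compound_A', 'compound_B']
```

Working on the log scale is a design choice: on the raw scale, 10 lies 9 units above 1 while 0.1 lies only 0.9 below it, so two equally strong changes would not be treated alike.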

To achieve Step 1, ClusterFinder performs a scan-by-scan analysis of the complete dataset and identifies all IROA peaks by their extended isotopic envelopes. Step 2, the removal of all non-IROA peaks, yields a much simpler dataset, which can then be aligned by mass across adjacent scans in Step 3. The mirror symmetry of the 12C and 13C halves of each IROA cluster (called the “IROA smile”) allows each isotopic peak to be correctly associated with its 12C or 13C base peak.

In Step 4, artifacts and non-biological compounds carry no IROA signature and have already been removed, so normalization can use the total intensity, after baseline correction, of the components common to all samples; this avoids the influence of matrix- and solvent-related intensities. Because isotopic dilution distributes each molecule over several masses, each base peak must be summed with its associated isotopic peaks to calculate its area accurately. Once the summed areas are available, Step 5 simply calculates their ratio. On the whole, because the isotopic patterns carry so much information, the tasks the software performs are well defined and not overly difficult.

Step 6 is carried out on the IROA data portal, which provides additional high-quality interpretation of the IROA dataset. The resulting report summarizes all aspects of the experiment and its data, together with the likely interpretations based on the experimental design provided. The portal provides basic statistics (regressions, variances, etc.), Principal Component Analysis, Random Forest, Self-Organizing Maps, data factorizations, volcano and other hybrid plots, Hierarchical Clustering, Correlation Analysis, metabolic mappings, enrichment, and a variety of graph-based analyses. Finally, it presents summary plots of the distribution of all significant compounds.
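Steps 4 and 5 can be sketched as follows. This assumes each IROA cluster has already been paired (Step 3) and stored as two lists of baseline-corrected intensities: the 12C base peak with its isotopic peaks, and the 13C base peak with its isotopic peaks. The data layout, the helper names (`summed_area`, `normalization_factors`, `normalized_areas`, `light_heavy_ratios`), and the choice to divide both halves of a cluster by one sample-wide factor are assumptions for illustration, not ClusterFinder's implementation.

```python
# Hypothetical layout: compound -> (12C-side intensities, 13C-side intensities),
# each list holding the baseline-corrected base peak followed by its isotopic peaks.
Sample = dict[str, tuple[list[float], list[float]]]


def summed_area(peaks: list[float]) -> float:
    """Area of one half of an IROA cluster: the base peak plus its isotopic peaks.

    Isotopic dilution spreads each molecule over several masses, so the base
    peak alone would underestimate the amount present.
    """
    return sum(peaks)


def normalization_factors(samples: dict[str, Sample]) -> dict[str, float]:
    """Step 4 (sketch): total summed area of the compounds found in every sample."""
    common = set.intersection(*(set(compounds) for compounds in samples.values()))
    return {
        name: sum(summed_area(compounds[c][0]) + summed_area(compounds[c][1])
                  for c in common)
        for name, compounds in samples.items()
    }


def normalized_areas(samples: dict[str, Sample]) -> dict[str, dict[str, tuple[float, float]]]:
    """Divide both summed areas of every compound by its sample's factor."""
    factors = normalization_factors(samples)
    return {
        name: {c: (summed_area(light) / factors[name], summed_area(heavy) / factors[name])
               for c, (light, heavy) in compounds.items()}
        for name, compounds in samples.items()
    }


def light_heavy_ratios(compounds: Sample) -> dict[str, float]:
    """Step 5: 12C/13C ratio of the summed areas for each paired compound."""
    return {c: summed_area(light) / summed_area(heavy)
            for c, (light, heavy) in compounds.items()
            if summed_area(heavy) > 0}


# Toy numbers for illustration only (not measured data).
samples = {
    "S1": {"compound_X": ([100.0, 10.0], [50.0, 5.0]),
           "compound_Y": ([40.0, 4.0, 1.0], [45.0, 4.0, 1.0])},
    "S2": {"compound_X": ([200.0, 20.0], [100.0, 10.0]),
           "compound_Y": ([90.0, 9.0, 1.0], [45.0, 4.5, 0.5])},
}

print(normalization_factors(samples))     # {'S1': 260.0, 'S2': 480.0}
print(light_heavy_ratios(samples["S1"]))  # {'compound_X': 2.0, 'compound_Y': 0.9}
```

Because both halves of a cluster come from the same composite sample, dividing them by the same sample-wide factor leaves the Step 5 ratio unchanged in this sketch; the normalized areas matter when comparing absolute levels between samples.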

The ClusterFinder software has two modes of use.

Global: When both the 12C and 13C media are used, the software finds all IROA peaks regardless of their source. This is a fully unbiased analysis: compounds of biological origin are identified whether they are present in both samples or in only one (control or experimental), and artifacts are simultaneously identified and removed from the dataset.

Phenotypic or Targeted: The software also finds all IROA-labeled compounds whose IROA peak is present in one sample but not the other. This allows a complex targeted analysis in which the 13C IROA peaks in the control sample are used to identify their associated natural-abundance peaks. This mode of analysis is called "Phenotypic Analysis" (described in another section).
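In the targeted mode, a 13C IROA peak from the control sample is used to locate its natural-abundance partner. Because 13C is heavier than 12C by about 1.00335 Da, the all-12C form of an n-carbon compound should appear n × 1.00335 / z below its all-13C form. The sketch below assumes the carbon count n is already known, uses a 5 ppm matching tolerance chosen only for illustration, and introduces hypothetical helper names (`predicted_12c_mz`, `carbon_count`, `find_partner`); it is not ClusterFinder's actual matching logic.

```python
from typing import Optional

# Mass difference between 13C and 12C in unified atomic mass units (Da).
C13_C12_DELTA = 1.00335483507


def predicted_12c_mz(mz_13c: float, n_carbons: int, charge: int = 1) -> float:
    """m/z expected for the all-12C form of a compound whose all-13C form
    was observed at mz_13c."""
    return mz_13c - n_carbons * C13_C12_DELTA / abs(charge)


def carbon_count(mz_12c: float, mz_13c: float, charge: int = 1) -> int:
    """Carbon number implied by the spacing of a paired 12C/13C base peak."""
    return round((mz_13c - mz_12c) * abs(charge) / C13_C12_DELTA)


def find_partner(mz_13c: float, n_carbons: int, observed_mz: list[float],
                 tol_ppm: float = 5.0, charge: int = 1) -> Optional[float]:
    """Closest observed m/z within tol_ppm of the predicted 12C partner, or None."""
    target = predicted_12c_mz(mz_13c, n_carbons, charge)
    tol = target * tol_ppm * 1e-6
    hits = [mz for mz in observed_mz if abs(mz - target) <= tol]
    return min(hits, key=lambda mz: abs(mz - target)) if hits else None


peaks = [199.9, 200.0, 200.1]            # toy m/z values from an experimental scan
print(find_partner(206.0201, 6, peaks))  # 200.0
print(carbon_count(200.0, 206.0201))     # 6
```

In the global mode, where both halves are present, `carbon_count` runs the same relation in reverse and reads the carbon number directly from the spacing of the paired base peaks.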

The ClusterFinder software is optimized and packaged with Excelsior JET and distributed free as part of a collaboration.

Hardware Requirements:

  1. Intel Pentium III/800 MHz or higher (or compatible); dual-core processor or higher
  2. 4 GB RAM minimum

Software Requirements:

  1. Windows XP (x86 and x64)
  2. Windows 7 SP1 (x86 and x64)
  3. Windows Server 2008 R2 SP1 (x64)
  4. Windows Server 2008 SP2 (x86 and x64)
  5. Windows 8
  6. Windows Server 2012

The computer must be network-accessible.