SIGNATURE

A Workbench for Gene Expression Signature Analysis.

This is the front page for SIGNATURE, a platform for gene expression signature analysis. It lists the modules currently available and gives brief examples of their use. More detailed descriptions of each module are available in the help files in GenePattern. For a description of SIGNATURE, or to cite it in a publication, please see:

Chang JT, Gatza ML, Lucas JE, Barry W, Vaughn P, and Nevins JR (2011). "SIGNATURE: A Workbench for Gene Expression Signature Analysis." BMC Bioinformatics 12:443.

The source code and data files for SIGNATURE are available publicly from: http://www.bioinformatics.org/signature/.

The following modules are available through the GenePattern interface:


ScoreSignatures

This module predicts the activation of signaling pathways in a gene expression data set. The data must have been collected on the Affymetrix U133A array and normalized twice, once with RMA and once with MAS5.

    Example: Predict pathway activation in 19 breast cancer cell lines.

  1. In the GenePattern interface, select the ScoreSignatures module under the SIGNATURE category.

  2. For the rma_expression_file parameter, upload an RMA-normalized gene expression data set. The data should have been collected on an Affymetrix U133A array.
    [Sample: http://genepattern.genome.duke.edu/signature/breast_19.rma.gz]

  3. For the mas5_expression_file parameter, upload a MAS5-normalized gene expression data set. It should contain the same samples as the rma_expression_file data set, normalized with MAS5 instead of RMA.
    [Sample: http://genepattern.genome.duke.edu/signature/breast_19.mas5.gz]

  4. Click Run.

  5. After processing (which may take an hour), see the REPORT.html file for a summary of the analysis.
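
Because a run may take an hour, it can be worth confirming beforehand that the two uploads contain the same samples. The following is a minimal sketch of such a check, not part of SIGNATURE itself. It assumes each file is gzipped, tab-delimited text with a single header row whose first column holds the probe set IDs and whose remaining columns are sample names; this layout is an assumption, and files in GCT or PCL format carry extra header rows or annotation columns that would require adjusting the slicing. The function names are hypothetical.

```python
import csv
import gzip


def read_sample_names(path):
    """Return the sample names from the header row of a gzipped, tab-delimited file.

    Assumption: the first row is a header and its first column labels the
    probe set IDs, so every later column is a sample name.
    """
    with gzip.open(path, "rt", newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    return header[1:]


def compare_samples(rma_path, mas5_path):
    """Return (only_in_rma, only_in_mas5) as two sorted lists of sample names."""
    rma = set(read_sample_names(rma_path))
    mas5 = set(read_sample_names(mas5_path))
    return sorted(rma - mas5), sorted(mas5 - rma)
```

If the layout assumption holds, calling compare_samples on the RMA and MAS5 files returns two empty lists when both files contain the same samples.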

CreateSignature

This module helps you create a new signature to predict pathway activation. You must provide one gene expression data set in which your pathway is off and another in which it is on.

    Example: Create a signature for ER activation.

  1. In the GenePattern interface, select the CreateSignature module under the SIGNATURE category.

  2. For the train0 parameter, upload a data set that contains samples where the pathway of interest is off. For this file, we typically use between 5 and 10 samples.
    [Sample: http://genepattern.genome.duke.edu/signature/er.l2.mas5.train0.gz]

  3. For the train1 parameter, upload a data set that contains samples where the pathway of interest is on. For this file, we typically use between 5 and 10 samples.
    [Sample: http://genepattern.genome.duke.edu/signature/er.l2.mas5.train1.gz]

  4. For the test parameter, upload a data set that contains samples where the pathway activity should be predicted. Leave this empty if you do not have any samples where you want to predict pathway activation.
    [Sample: http://genepattern.genome.duke.edu/signature/breast_19.mas5.gz]

  5. The values for the rest of the parameters depend on the signature used. For the example here of an ER signature, we recommend the following values:
    Parameter                        Value
    num_genes                        125
    num_metagenes                    2
    apply_quantile_normalization     yes
    apply_shiftscale_normalization   yes

    More help for these and other parameters is available in the documentation for the module in the GenePattern interface.

  6. Click Run.

  7. After processing (which will take a few minutes), see the REPORT.html file for a summary of the analysis.
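
For record keeping, the recommended ER settings and the typical training set sizes from the steps above can be written down in a small script. This is only an illustrative sketch: the dictionary and function names are hypothetical, nothing here calls GenePattern, and the size range of 5 to 10 is the typical range mentioned above rather than a hard requirement of the module.

```python
# Recommended values for the ER example above; other signatures may need
# different settings.
ER_SIGNATURE_PARAMS = {
    "num_genes": 125,
    "num_metagenes": 2,
    "apply_quantile_normalization": "yes",
    "apply_shiftscale_normalization": "yes",
}


def check_training_sizes(n_train0, n_train1, low=5, high=10):
    """Return a list of warnings for training sets outside the typical size range."""
    warnings = []
    for name, count in (("train0", n_train0), ("train1", n_train1)):
        if not low <= count <= high:
            warnings.append(
                f"{name} has {count} samples; the typical range is {low} to {high}"
            )
    return warnings
```

An empty list means both training sets fall within the typical range.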

BFRMNormalize

This module will normalize multiple data sets, reducing variation from batch effects. Data sets must be collected on Affymetrix arrays and include the Affymetrix control probe sets (i.e. those that start with AFFX-).

    Example: Normalize three breast cancer data sets collected at different centers.

  1. In the GenePattern interface, select the BFRMNormalize module under the SIGNATURE category.

  2. Set the num_factors parameter to 15, the default. This controls the degree of normalization. Larger values remove more of the differences between data sets, at the risk of also removing signal. Smaller values remove fewer differences and may leave more noise. You can try a range of values from 2 to 50.

  3. Set the file parameters to the data sets that you want to normalize together. In this example, we will normalize three breast cancer data sets.
    [Sample 1: http://genepattern.genome.duke.edu/signature/GSE1561.rma.gz]
    [Sample 2: http://genepattern.genome.duke.edu/signature/GSE4922_SINGAPORE.rma.gz]
    [Sample 3: http://genepattern.genome.duke.edu/signature/GSE6596.rma.gz]

  4. Click Run.

  5. After processing (which will take a few hours), see the REPORT.html file for a summary of the analysis.
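
Since the module requires the Affymetrix control probe sets and a run takes a few hours, a quick check for AFFX- rows before uploading may save time. The sketch below is not part of SIGNATURE. It assumes each data set is gzipped, tab-delimited text with one header row and the probe set ID in the first column; that layout is an assumption, and the function names are hypothetical.

```python
import csv
import gzip


def count_control_probes(path, prefix="AFFX-"):
    """Count data rows whose probe set ID (first column) starts with the prefix."""
    count = 0
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the assumed single header row
        for row in reader:
            if row and row[0].startswith(prefix):
                count += 1
    return count


def files_without_controls(paths):
    """Return the paths of the files that contain no control probe sets."""
    return [path for path in paths if count_control_probes(path) == 0]
```

Any file returned by files_without_controls is missing the control probe sets under the stated layout assumption and is worth inspecting before it is submitted.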

FindSubtypes

This module will group a set of samples into subtypes using a mixture model. This is typically applied on pathway predictions made with the ScoreSignatures module.

    Example: Divide breast cancer samples into subtypes.

  1. In the GenePattern interface, select the FindSubtypes module under the SIGNATURE category.

  2. Set the dataset parameter to the file of pathway predictions that you wish to analyze. This is typically a probabilities.pcl file generated by the ScoreSignatures module.
    [Sample: http://genepattern.genome.duke.edu/signature/probabilities_GSE1561.pcl]

  3. Set the penalty parameter. This number should be less than 0. Higher values (closer to 0) yield more subtypes, and lower (more negative) values yield fewer.

  4. Click Run.

  5. After processing (which will take a few minutes), the predictions.png file contains the probability that each sample is in a subtype.

PredictSubtypes

This module uses a subtype model created by FindSubtypes to predict those subtypes for the samples in a data set.

    Example: Predict the breast cancer subtypes on a breast cancer data set.

  1. In the GenePattern interface, select the PredictSubtypes module under the SIGNATURE category.

  2. Set the dataset parameter to the file of pathway predictions that you wish to analyze. This is typically a probabilities.pcl file generated by the ScoreSignatures module.
    [Sample: http://genepattern.genome.duke.edu/signature/probabilities_GSE6596.pcl]

  3. Set the subtype_model parameter to the subtype model. This should be the subtype_model.zip file that was previously created by FindSubtypes.
    [Sample: http://genepattern.genome.duke.edu/signature/subtype_model.zip]

  4. Click Run.

  5. After processing (which will take a few minutes), the predictions.png file contains the probability that each sample from your data set is in a subtype defined by the subtype_model.

BFRMFactor

This module performs a Bayesian latent factor decomposition on a gene expression data set. You can find factors over all genes in the data set or, optionally, focus on a gene set of interest, termed the nucleus genes. If a nucleus is provided, the module performs an iterative evolutionary search in which each iteration adds genes and factors to the model. The search stops once it reaches a pre-specified maximum number of genes or factors.

    Example: Find the E2F factors in a data set.

  1. In the GenePattern interface, select the BFRMFactor module under the SIGNATURE category.

  2. Set the dataset parameter to the gene expression data set that you wish to analyze.
    [Sample: http://genepattern.genome.duke.edu/signature/nci60.u133a.rma.gz]

  3. Set the filter_mean and filter_var parameters each to 0.25, to exclude the genes whose mean expression is in the lowest 25% and the genes whose variance is in the lowest 25%.

  4. Set the nucleus_file to the file that contains the names of the genes to model.
    [Sample: http://genepattern.genome.duke.edu/signature/rbe2f_nucleus.txt]

  5. Set start_factors to 1, so that the evolutionary search will start with 1 latent factor.

  6. Set max_genes to 750, so that the evolutionary search will stop once it has reached 750 genes. (Ignore the max_factors parameter.)

  7. Click Run.

  8. After processing (which may take an hour), see the factors.png file for a heatmap that shows the factors across your data set.
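
To illustrate what the filter_mean and filter_var settings in step 3 do, the sketch below applies a comparable filter to a small in-memory data set. It is not the module's code. It assumes that a gene is dropped if it falls in the lowest fraction by mean or in the lowest fraction by variance; the module may combine the two filters differently or handle ties differently. The function names and the dictionary layout are hypothetical.

```python
import statistics


def lowest_fraction(scores, fraction):
    """Return the set of gene names with the lowest `fraction` of scores."""
    n_drop = int(len(scores) * fraction)
    ranked = sorted(scores, key=scores.get)
    return set(ranked[:n_drop])


def filter_genes(expression, filter_mean=0.25, filter_var=0.25):
    """Drop genes in the lowest fraction by mean or by variance.

    `expression` maps each gene name to its list of expression values.
    Assumption: a gene is removed if either filter selects it.
    """
    means = {gene: statistics.mean(values) for gene, values in expression.items()}
    variances = {gene: statistics.pvariance(values) for gene, values in expression.items()}
    dropped = lowest_fraction(means, filter_mean) | lowest_fraction(variances, filter_var)
    return {gene: values for gene, values in expression.items() if gene not in dropped}
```

With four genes and both fractions set to 0.25, each filter selects one gene, so up to two genes are removed.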

BFRMProject

This module will project (calculate the scores of) the factors in a factor model onto another data set. This is useful if a factor model has already been created, and you wish to evaluate the scores of those factors in another data set.

    Example: Calculate the scores of the E2F factors in another data set.

  1. In the GenePattern interface, select the BFRMProject module under the SIGNATURE category.

  2. Set the dataset parameter to the gene expression data set where you wish to project the factors.
    [Sample: http://genepattern.genome.duke.edu/signature/GSE1561.rma.gz]

  3. Set the bfrm_model parameter to the factor model. This should be the bfrm_model.zip file that was created in a previous BFRMFactor analysis.
    [Sample: http://genepattern.genome.duke.edu/signature/bfrm_model.zip]

  4. Click Run.

  5. After processing (which may take an hour), see the factors.png file for a heatmap showing the factors across your new data set.

For questions, email the Microarray Core Facility at microarray@duke.edu.

Last updated: 21 March 2012.