Our current system for identifying differentially expressed transcripts relies on using the EdgeR Bioconductor package. We have a protocol and scripts described below for identifying differentially expressed transcripts and clustering transcripts according to expression profiles. This process is somewhat interactive, and described are automated approaches as well as manual approaches to refining gene clusters and examining their corresonding expression patterns.

Note
We recommend generating a single Trinity assembly based on combining all reads across all samples as inputs. Then, reads are aligned separately back to the single Trinity assembly for downstream analyses of differential expression. If you decide to assemble each sample separately, then you'll have difficulty comparing the results across the different samples due to differences in assemlbed transcript lengths and contiguity.
Note
If you have biological replicates, align each replicate set of reads to the Trinity contigs independently. See the DESeq protocol below for analyzing differential expression leveraging biological replicates.

First, join the RSEM-estimated abundance values for each of your samples by running:

TRINITY_RNASEQ_ROOT/util/RSEM_util/merge_RSEM_counts_single_table.pl  sampleA.RSEM.isoform.results sampleB.RSEM.isoform.results ... > all.counts.matrix

Edit the column headers in the matrix file to your liking, since this is how the samples will be named in the downstream analysis steps.

Identifying Differentially Expressed Transcripts

Run EdgeR (no biological replicates)

In case you need to install edgeR, visit the edgeR Bioconductor site.

Using the all.counts.matrix file created above, perform TMM (trimmed mean of M-values) normalization and identify differentially expressed transcripts resulting from pairwise comparisons among the samples like so:

TRINITY_RNASEQ_ROOT/Analysis/DifferentialExpression/run_EdgeR.pl --matrix all.counts.matrix --transcripts Trinity.fasta --output edgeR_results_dir

If you have only a single reference sample that you want the other samples to be compared to, as opposed to the all-vs-all comparisons, indicate the reference sample's column heading with: —reference ref_column_name as it exists in the all.counts.matrix file.

example_edgeR_MA_plot

The TMM and length-normalized (FPKM) expression values are provided in a file: edgeR_results_dir/Trinity.fasta.normalized.FPKM, which can be examined using additional methods described below.

Run DESeq (leveraging biological replicates)

In case you need to install DESeq, visit the DESeq Bioconductor site.

Create a text file samples_described.txt that describes the relationship between samples and biological replicates like so:

conditionA   condA-rep1
conditionA   condA-rep2
conditionB   condB-rep1
conditionB   condB-rep2
conditionC   condC-rep1
conditionC   condC-rep2

Where condA-rep1, condA-rep2, condB-rep1, etc…, are all column names in the all.counts.matrix generated earlier (see top of page). Your sample names that group the replicates are user-defined here.

Now, run DESeq for each pair of conditions like so:

TRINITY_RNASEQ_ROOT/Analysis/DifferentialExpression/run_DESeq_all_vs_all.pl 'samples_described.txt' 'all.counts.matrix'

Analyzing Differentially Expressed Transcripts

An initial step in analyzing differential expression is to extract those transcripts that are most differentially expressed (most significant P-values and fold-changes) and to cluster the transcripts according to their patterns of differential expression across the samples. To do this, you can run the following from within the edgeR output directory

TRINITY_RNASEQ_ROOT/Analysis/DifferentialExpression/analyze_diff_expr.pl --matrix transcript_read_counts.RAW.normalized.FPKM -P 1e-3 -C 2
If you used DESeq to call differentially expressed transcripts, then
first, perform the TMM normalization:
TRINITY_RNASEQ_ROOT/Analysis/DifferentialExpression/run_EdgeR.pl --matrix all.counts.matrix --transcripts Trinity.fasta --just_TMM --output TMM
and then run the analysis script, including the --DESeq option:
TRINITY_RNASEQ_ROOT/Analysis/DifferentialExpression/analyze_diff_expr.pl --matrix TMM/matrix.TMM_normalized.FPKM -P 1e-3 -C 2 --DESeq

heatmap

The above is mostly just a visual reference. To more seriously study and define your gene clusters, you will need to interact with the data as described below. The clusters and all required data for interrogating and defining clusters is all saved with an R-session, locally with the file all.RData. This will be leveraged as described below.

Automatically defining a K-number of Gene Clusters

Run the command below to automatically split the data set into a set of $num_clusters (similar to k-means clustering).

TRINITY_RNASEQ_ROOT/Analysis/DifferentialExpression/define_clusters_by_cutting_tree.pl -K $num_clusters

To plot the mean-centered expression patterns for each cluster, visit that directory and run:

TRINITY_RNASEQ_ROOT/Analysis/DifferentialExpression/plot_expression_patterns.pl subcluster_*

This will generate a summary image file: my_cluster_plots.pdf, as shown below:

expression_profiles_for_clusters

Manually Defining Gene Clusters

Manually defining your clusters is the best way to organize the data to your liking. This is an interactive process. Fire up R from within your output directory, being sure it contains the all.RData file, and enter the following commands: R

load("all.RData")
source("TRINITY_RNASEQ_ROOT/Analysis/DifferentialExpression/R/manually_define_clusters.R")
manually_define_clusters(hc_genes, centered_data)

This should yield a display containing the hierarchically clustered genes, as shown below:

expression_hcl_tree

Now, manually define your clusters from left to right (order matters here, so you can decipher the results later!) by clicking on the branch vertical branch that defines the clade of interest. After clicking on the branch, it will be drawn with a red box around the selected clade, as shown below:

manually_selected_hcl_clusters_from_tree

Right click with the mouse (or double-touch a touchpad) to exit from cluster selection.

The clusters as selected will be written to a subdirectory manually_defined_clusters_$count_clusters, and exist in a format similar to the automated-selection of clusters described above. Likewise, you can generate plots of the expression patterns for each cluster using the plot_expression_patterns.pl script.