After downloading the software, simply type make in the base installation directory. This should build Inchworm and Chrysalis, both written in C++. Butterfly should not require any special compilation, as it's written in Java and already provided as portable precompiled software.
Trinity has been tested and is supported on both Mac OSX and Linux.
Trinity is run via the Trinity.pl script in the base installation directory.
Usage info is as follows:
##################################################################
#
# Required:
#
# --seqType <string> :type of reads: (fq or fa)
#
# If paired reads:
#
# --left <string> :left reads
# --right <string> :right reads
#
# If unpaired reads:
#
# --single <string> :single reads
#
# --output <string> :name of directory for output (will be created if it doesn't already exist)
#
#
# if strand-specific data, set:
#
# --SS_lib_type <string> :if paired: RF or FR, if single: F or R
#
#
#
# Butterfly options:
#
# --run_butterfly :executes butterfly commands. Do not set this if you want to spawn them on a computing grid.
#
# --bfly_opts :parameters to pass through to butterfly (see butterfly documentation) default: "--edge-thr=0.16"
#
# --bflyHeapSpace :java heap space setting for butterfly (default: 1000M) => yields command java -Xmx1000M -jar Butterfly.jar ... $bfly_opts
#
# Misc:
#
# --CPU <int> :number of CPUs to use, default: 2
#
# --min_contig_length <int> :minimum assembled contig length to report (def=300)
#
# --paired_fragment_length <int> :size of a read pair insert (def=300)
#
# --jaccard_clip :option, set if you have paired reads and you expect high gene density with UTR overlap (use FASTQ input files for reads).
#
# --run_ALLPATHSLG_error_correction :runs the read error correction process built into ALLPATHSLG.
# (requires ALLPATHSLG to be installed, and installation directory indicated
# by the env variable 'ALLPATHSLG_BASEDIR')
#
#
#############################################################################################################
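As a rough illustration of how the options above combine, the following sketch assembles a paired-end Trinity.pl command line without running it. The file names reads_1.fq and reads_2.fq, the output directory trinity_out, and the helper trinity_command are hypothetical; only the flags are taken from the usage listing.

```python
import shlex


def trinity_command(left, right, output_dir, seq_type="fq", cpu=2, run_butterfly=True):
    """Build a Trinity.pl argument list for paired reads (nothing is executed)."""
    args = [
        "./Trinity.pl",
        "--seqType", seq_type,
        "--left", left,
        "--right", right,
        "--output", output_dir,
        "--CPU", str(cpu),
    ]
    if run_butterfly:
        # Leave this flag off when the Butterfly commands go to a computing grid.
        args.append("--run_butterfly")
    return args


# Hypothetical input and output names, for illustration only.
cmd = trinity_command("reads_1.fq", "reads_2.fq", "trinity_out", cpu=4)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell in the base installation directory, or the list can be handed to subprocess.run once the paths point at real files.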
Note: Trinity performs best with strand-specific data, in which case sense and antisense transcripts can be resolved.
If you have strand-specific data, specify the library type. There are four library types:
- Paired reads:
  - RF: first read of the fragment pair is sequenced as antisense (reverse), and the second read is in the sense strand (forward); typical of the dUTP sequencing method.
  - FR: first read of the fragment pair is sequenced as sense (forward), and the second read is in the antisense strand (reverse).
- Unpaired (single) reads:
  - F: the single read is in the sense (forward) orientation.
  - R: the single read is in the antisense (reverse) orientation.
By setting the --SS_lib_type parameter to one of the above, you are indicating that the reads are strand-specific. By default, reads are treated as not strand-specific.
Whether you use FASTQ or FASTA formatted input files, be sure to keep the reads oriented as they are reported by Illumina if the data are strand-specific, because Trinity will orient the sequences according to the specified library type. If the data are not strand-specific, no worries: the reads will be parsed in both orientations.
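To make the library types concrete, here is a minimal sketch of what orienting a read to the sense strand could look like. It is only an interpretation of the definitions above, not Trinity's actual code: the function names are hypothetical, and F and R are assumed to mean a sense (forward) and an antisense (reverse) single read, by analogy with the paired types.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# For each library type: must the (first, second) read be reverse-complemented
# to read as the sense strand? Single-read types only use the first slot.
_FLIP = {
    "RF": (True, False),
    "FR": (False, True),
    "F": (False, False),
    "R": (True, True),
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_sense(seq, lib_type, is_first_read=True):
    """Orient one read to the sense strand under the assumed library-type meaning."""
    flip_first, flip_second = _FLIP[lib_type]
    flip = flip_first if is_first_read else flip_second
    return reverse_complement(seq) if flip else seq


print(to_sense("AACG", "RF", is_first_read=True))
print(to_sense("AACG", "RF", is_first_read=False))
```

This is why the input reads should be left exactly as the sequencer reported them: any flipping done beforehand would be applied a second time.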
When Trinity completes, it will create a Trinity.fasta output file in the Trinity_dir/ output directory (or output directory you specify). Expression values are roughly computed for transcripts based on the number of reads incorporated into assembled transcripts. We are actively working on improved computations for transcript expression levels that resolve ambiguities in read mappings.
If your transcriptome RNA-Seq data are derived from a gene-dense compact genome, such as a fungal genome, where transcripts may often overlap in UTR regions, you can minimize fusion transcripts by using the --jaccard_clip option if you have paired reads. Trinity will examine the consistency of read pairings and fragment transcripts at positions that have little read-pairing support. In expansive genomes of vertebrates and plants, this is unnecessary and not recommended. In compact fungal genomes, it is highly recommended. In addition to requiring paired reads, you must also have the Bowtie short read aligner installed. As part of this analysis, reads are aligned to the Inchworm contigs using Bowtie, read pairings are examined across the Inchworm contigs, and contigs are clipped at positions of low pairing support. These clipped Inchworm contigs are then fed into Chrysalis for downstream processing. Be sure that your read names end with "/1" and "/2" so that read name pairings are properly recognized.
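Before using the jaccard clipping option, it may help to confirm that the read names carry the "/1" and "/2" suffixes. The sketch below is one hypothetical way to check; it assumes plain four-line FASTQ records and takes the read name to be the first whitespace-delimited token after the "@". The function names and the in-memory sample are illustrative only.

```python
import io


def fastq_names(handle):
    """Yield the read name of each record, assuming four lines per FASTQ record."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 0:
            fields = line[1:].split()
            yield fields[0] if fields else ""


def names_missing_suffix(handle, suffix):
    """Return the read names that do not end with the expected suffix."""
    return [name for name in fastq_names(handle) if not name.endswith(suffix)]


# Tiny in-memory stand-in for a left-reads file; the second name lacks "/1".
left_reads = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read2\nACGT\n+\nIIII\n")
print(names_missing_suffix(left_reads, "/1"))
```

For real data, pass an open file handle for the left reads with "/1" and for the right reads with "/2"; an empty list means every name has the expected suffix.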
The Inchworm and Chrysalis steps can be memory intensive. A basic recommendation is to have 1G of RAM per 1M pairs of Illumina reads. Simpler transcriptomes (lower eukaryotes) require less memory than more complex transcriptomes such as from vertebrates. Butterfly requires less memory and can be executed in parallel on a computing grid. The Chrysalis step can sometimes enter a deep recursion, in which case the stack memory can exceed default limits. Before running Trinity, set the stacksize to unlimited (or as high as you can). See FAQ for more details.
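If Trinity is launched from a Python wrapper rather than directly from a shell (where the usual route is the shell's own ulimit builtin), the stack limit can be raised in the wrapper before starting the run. The sketch below is an assumption-laden illustration for Unix-like systems: the helper name is hypothetical, and it simply raises the soft limit as far as the hard limit allows, which may or may not be unlimited on a given machine.

```python
import resource


def raise_stack_limit():
    """Raise the soft stack limit to the hard limit and return the resulting limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    try:
        resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
    except (ValueError, OSError):
        # The platform refused the change; the existing limit stays in place.
        pass
    return resource.getrlimit(resource.RLIMIT_STACK)


soft, hard = raise_stack_limit()
# resource.RLIM_INFINITY denotes "unlimited".
print("unlimited" if soft == resource.RLIM_INFINITY else soft)
```

Processes started afterwards from the same Python process, for example with subprocess.run, inherit the raised limit.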
If you do not have a computing grid, but you have a multi-core server, be sure to set the --run_butterfly parameter, and indicate the number of Butterfly processes to run in parallel with the --CPU parameter. If you have access to a computing grid, do not set these parameters. Instead, a butterfly_commands file will be generated in the Trinity_dir/Chrysalis output directory. Run these commands in parallel on your computing grid. Most Butterfly jobs require minimal memory (<1G), but some read-rich graphs can require up to 10G of RAM or more.
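How the generated commands are dispatched depends on your grid scheduler, which this guide does not prescribe. As a local stand-in, the following sketch runs the commands from such a file a few at a time. It assumes the file holds one shell command per line, which is an assumption about the file layout; the function name is hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_commands_in_parallel(commands_path, workers=2):
    """Run each non-empty line of a commands file as a shell command.

    Returns a list of (command, exit code) pairs in file order.
    """
    with open(commands_path) as handle:
        commands = [line.strip() for line in handle if line.strip()]

    def run(command):
        return command, subprocess.run(command, shell=True).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))
```

A call such as run_commands_in_parallel("Trinity_dir/Chrysalis/butterfly_commands", workers=4) would then report which commands exited with a non-zero status. Keep the memory note above in mind when choosing the number of workers, since a few read-rich graphs need far more RAM than the rest.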
Our experience is that the entire process can require about 1 hour per million pairs of reads in the current implementation (see FAQ). We're striving to improve upon both memory and time requirements.
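The two rules of thumb in this section (about 1G of RAM and about 1 hour per million read pairs) can be turned into a quick back-of-the-envelope estimate. The sketch below reads "1G" as one gigabyte; the function and constant names are hypothetical, and actual requirements vary with transcriptome complexity.

```python
RAM_GB_PER_MILLION_PAIRS = 1.0   # basic recommendation from this guide
HOURS_PER_MILLION_PAIRS = 1.0    # approximate, per the current implementation


def estimate_resources(read_pairs):
    """Return rough RAM (gigabytes) and runtime (hours) for a number of read pairs."""
    millions = read_pairs / 1_000_000
    return {
        "ram_gb": millions * RAM_GB_PER_MILLION_PAIRS,
        "hours": millions * HOURS_PER_MILLION_PAIRS,
    }


# Example: a hypothetical data set of 40 million read pairs.
print(estimate_resources(40_000_000))
```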
The Trinity software distribution includes sample data in the sample_data/ directory. Simply run the runMe.sh shell script in the sample_data/ directory to execute the Trinity assembly process with provided paired strand-specific Illumina data derived from mouse. Running Trinity on the sample data requires ~2G of RAM.
The Butterfly --edge-thr value is a major determinant of the complexity of compacted graphs and the sensitivity for the detection of alternative splice variants. The default value of 0.16 indicates that incoming or outgoing graph edges that have less than 16% of the weight of all incoming or outgoing edges are pruned from the graph. In the context of alternative splicing, this means that portions of isoforms branching off with less read support are not resolved. In the Trinity paper (to be published shortly), this value was set at 0.05 for improved sensitivity, though a handful of transcript graphs are processed very slowly at this setting. For further improved runtime performance and to capture only the most strongly supported alternatively spliced isoforms, set this value to 0.26. Set this via the --bfly_opts "--edge-thr=0.26" parameter of the Trinity.pl wrapper.
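The effect of the threshold can be illustrated with a toy example. The sketch below is a simplified reading of the rule described above, not Butterfly's actual implementation: it takes the edges on one side of a single node (all incoming or all outgoing) and drops those carrying less than the given fraction of the total weight. The function name, edge labels, and weights are hypothetical.

```python
def prune_edges(edge_weights, edge_thr=0.16):
    """Keep the edges whose weight is at least edge_thr of the total weight."""
    total = sum(edge_weights.values())
    return {
        edge: weight
        for edge, weight in edge_weights.items()
        if weight >= edge_thr * total
    }


# Hypothetical read support for three branches leaving one node.
outgoing = {"branch_A": 70, "branch_B": 20, "branch_C": 10}

for thr in (0.05, 0.16, 0.26):
    print(thr, sorted(prune_edges(outgoing, thr)))
```

With these weights, all three branches survive at 0.05, branch_C is pruned at the default 0.16, and only branch_A remains at 0.26, which mirrors the trade-off between sensitivity and graph complexity described above.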
Visit the Advanced Guide to Trinity for more information regarding Trinity behavior, intermediate data files, and file formats.
Additional questions, comments, etc?