Hi All!
I've been working for a few months on a command-line de novo transcriptome assembly pipeline called TFLOW (Transcriptome-Flow). I wanted to see if anyone might be interested in using/testing it and giving me some feedback on whether it is useful!
It can be downloaded here: http://www.github.com/fsugenomics/TFLOW/
The TFLOW framework supports a few different assembly pipes at this point, and is designed to be modular so different pipe segments can be inserted.
The main Trinity_Pipe is based on Trinity for primary sequence assembly but builds on Trinity by providing several auxiliary features. These include:
- Read File Parsing (where applicable)
- External Trimmomatic Read Trimming, for maximum flexibility, accessibility, and reproducibility,
- Trinity Assembly, with any desired parameters passed through,
- CAP3 Assembly on Trinity output, which prepares single-tissue assemblies for combination into a multi-organism/multi-tissue transcriptome,
- Automatic Statistical Analysis on Trinity and CAP3 Outputs (Total, min-len, max-len, N50, etc…)
- Automated analysis to determine the number of genes recaptured from two benchmarking gene databases via BLAST homology:
  - CEGMA (Core Eukaryotic Gene Database)
  - BUSCO (Further Benchmarking Genes that are Species-Subset Specific)
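For anyone unfamiliar with the statistics listed above, here is a minimal Python sketch of how the total, min-len, max-len, and N50 values can be computed from an assembly FASTA. This is an illustration only and is not taken from the TFLOW source; the function names `fasta_lengths` and `assembly_stats` are hypothetical.

```python
def fasta_lengths(lines):
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Return count, total, min, max, and N50 for a list of sequence lengths."""
    if not lengths:
        raise ValueError("no sequences given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    # N50: walking from the longest sequence down, the length of the sequence
    # at which the running sum first reaches half of the total assembly size.
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "count": len(ordered),
        "total": total,
        "min_len": ordered[-1],
        "max_len": ordered[0],
        "n50": n50,
    }
```

For a real file, something like `assembly_stats(fasta_lengths(open("Trinity.fasta")))` would work, where the file name is only a placeholder.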
To combine multiple tissues, a similar CAP3 Pipeline is used within the TFLOW Framework:
- CAP3 (to combine individual tissue transcriptomes)
- Statistical Analysis
- CEGMA and BUSCO Gene Recapture Analyses
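To give a rough idea of what a gene recapture analysis involves, here is a small Python sketch that counts how many benchmark genes have at least one BLAST hit, using standard 12-column tabular BLAST output (`-outfmt 6`). It assumes the benchmark genes were used as the queries against the assembly and that a hit counts if its e-value is at or below a cutoff; both are my assumptions for illustration, not a description of how TFLOW scores hits. The function names and the default cutoff are hypothetical.

```python
def recaptured_genes(blast_lines, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit at or below max_evalue.

    Expects standard tabular BLAST output (-outfmt 6), whose columns are:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore
    """
    found = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular record
        query_id = fields[0]
        evalue = float(fields[10])
        if evalue <= max_evalue:
            found.add(query_id)
    return found


def recapture_counts(blast_lines, benchmark_ids, max_evalue=1e-5):
    """Return (number of benchmark genes recaptured, number of benchmark genes)."""
    benchmark_ids = set(benchmark_ids)
    found = recaptured_genes(blast_lines, max_evalue) & benchmark_ids
    return len(found), len(benchmark_ids)
```

The two numbers returned can then be reported as a fraction, for example "recaptured 1 of 3 benchmark genes".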
The pipeline is designed to be easily accessible while still exposing the full breadth of features of each component segment through advanced parameter passthrough.
The Trimmomatic read trimming parameters default to a minimum quality threshold of Phred 30 for each read, Illumina adapter trimming, and a minimum read length of 75 bp, but all Trimmomatic, Trinity, and CAP3 settings can be easily changed as desired for a particular project.
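As a rough illustration of those defaults, here is a Python sketch that builds a paired-end Trimmomatic command line. This is not TFLOW's actual code: the function name, the output file naming, the `SLIDINGWINDOW:4:30` step used to express the quality threshold, and the `2:30:10` adapter-clipping values are all my assumptions. Only the step names (`ILLUMINACLIP`, `SLIDINGWINDOW`, `MINLEN`) and the paired-end argument order come from Trimmomatic itself.

```python
def trimmomatic_pe_command(jar, reads_1, reads_2, out_prefix, adapters,
                           min_quality=30, min_length=75, extra_steps=()):
    """Build an argument list for a paired-end Trimmomatic run.

    Hypothetical helper: the defaults mirror the values described above,
    but the exact trimming steps TFLOW uses are an assumption.
    """
    # Trimmomatic PE expects four outputs: paired and unpaired reads
    # for each of the two input files.
    outputs = [
        f"{out_prefix}_1P.fastq",
        f"{out_prefix}_1U.fastq",
        f"{out_prefix}_2P.fastq",
        f"{out_prefix}_2U.fastq",
    ]
    steps = [
        # Adapter clipping; 2:30:10 are illustrative threshold values.
        f"ILLUMINACLIP:{adapters}:2:30:10",
        # Assumed way of applying the Phred 30 quality threshold.
        f"SLIDINGWINDOW:4:{min_quality}",
        # Drop reads shorter than the minimum length after trimming.
        f"MINLEN:{min_length}",
    ]
    # extra_steps lets a caller pass additional trimming steps through unchanged.
    return ["java", "-jar", jar, "PE", reads_1, reads_2,
            *outputs, *steps, *extra_steps]
```

The returned list can be handed to `subprocess.run(...)`, and changing `min_quality`, `min_length`, or `extra_steps` is the kind of override that parameter passthrough makes possible.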
If you use different assembly or analysis steps in your transcriptome assembly process, I would be very interested in hearing what they are! TFLOW has been designed to work with modular “segments,” so I would like to create and include modules that fit whatever process is needed for a particular type of work.
Please let me know what you think!