edge 09-21-2011 09:25 AM

Minimum short read required for transcriptome assembly
I have Illumina short reads, 2X50bp, around 14Gb of data right now.
I am just curious whether there is any parameter or formula for calculating the minimum number of short reads a transcriptome assembler needs in order to assemble comprehensive transcripts.
e.g. must we have at least 1Mb of Illumina short reads in order to assemble a transcript?

Do we also need to consider the coverage and depth of the data when calculating the minimum number of short reads required for transcriptome assembly?

Many thanks for advice.

westerman 09-21-2011 11:31 AM

Ah, I should have noted that you are a "Senior Member" and thus undoubtedly already know more about sequencing than many of us. My response below was more aimed towards the many new people we get on SeqAnswers thus it may not be applicable to you. Wish I did have more than a rough guide on an actual formula to use.



Originally Posted by edge (Post 51906)
Do we need consider coverage and depth of data...

Yes you do. In particular, for a non-normalized transcriptome or a sample that has not been rRNA-depleted, you need to be concerned with picking up low-expression genes.

You do not give enough information for us to make an intelligent decision for your particular case (e.g., we would need information on the organism you are sequencing, the complexity of the genes for the organism, if your sequence sample is normalized or not, etc.). However, we can play around with some very rough numbers.

Let us assume that your sample is completely normalized. In other words each transcript (gene) is present once and only once in your sample. Assume a complex eukaryotic organism. Then our numbers could look like:

100,000 genes at 1000 bases each ... equals a sequence space of 100 Mbase

Desire 30x sequencing coverage ... means we need 3 Gbase of sequence.

Your 14 GB will do quite nicely.

On the other hand, let us assume that you do not have a normalized sample. Then some genes will be present thousands of times, others only once. I am sure that there is some graph out there that describes this behavior and provides a multiplication factor, but I'll make a wild guess that this increases the effective sequence space by at least a factor of 10. Thus you would need 30 Gbase of sequence.

The numbers above are very, very rough so do not base your research off of them. The numbers are more meant as a way to say "... it depends ..."
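
The back-of-the-envelope arithmetic above can be written out as a small Python sketch. The function name `required_bases` and its parameters are made up for illustration; the gene count, gene length, 30x coverage and factor of 10 are the rough guesses from this post, not measured values.

```python
def required_bases(n_transcripts, mean_length, coverage, abundance_factor=1):
    """Rough number of sequenced bases needed for a target coverage.

    abundance_factor is a hypothetical fudge factor for non-normalized
    samples (1 = fully normalized, every transcript present once).
    """
    sequence_space = n_transcripts * mean_length  # total bases to cover
    return sequence_space * coverage * abundance_factor


# Rough guesses from the post: 100,000 genes of 1000 bases each, 30x coverage.
normalized = required_bases(100_000, 1_000, 30)
# Wild-guess factor of 10 for a non-normalized sample.
non_normalized = required_bases(100_000, 1_000, 30, abundance_factor=10)

print(f"normalized:     {normalized / 1e9:.0f} Gbase")
print(f"non-normalized: {non_normalized / 1e9:.0f} Gbase")
```

Changing any of the inputs (fewer genes, longer transcripts, a different fudge factor) moves the answer proportionally, which is the "it depends" point in numbers.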

tbanks 09-21-2011 12:22 PM

The following publication shows a number of simulations on transcriptome assembly and the effects of coverage and sequencing technology. It's a bit dated now but should help you out. I believe they also have some online software so you can do your own rough simulation.

Wall PK, Leebens-Mack J, Chanderbali AS, Barakat A, Wolcott E, Liang H, Landherr L, Tomsho LP, Hu Y, Carlson JE, Ma H, Schuster SC, Soltis DE, Soltis PS, Altman N, dePamphilis CW. Comparison of next generation sequencing technologies for transcriptome characterization. BMC Genomics. 2009 Aug 1;10:347.

edge 09-22-2011 12:18 AM

Many thanks, westerman.

I have an RNA-seq human lung sample, 2X100bp paired-end reads, with a total file size of 14GB right now.
I plan to map my RNA-seq data against a transcriptome database downloaded from NCBI.
After that, I plan to cluster the short reads according to the transcript group they map to.
My problem is determining the minimum number of read pairs per transcript group to use as a cut-off for assembly.
From the mapping results, some of the transcript groups are mapped by only about a thousand read pairs.

Thanks for any advice.
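
One way to turn a read-pair count into something comparable across transcripts is to convert it to an approximate depth, as in the Python sketch below. This is only an illustration: the function names, the transcript lengths and the 30x threshold (borrowed from westerman's rough figure) are assumptions, and the calculation assumes every base of both mates maps inside the transcript.

```python
def transcript_depth(read_pairs, read_length, transcript_length):
    """Approximate mean depth: total mapped bases / transcript length."""
    mapped_bases = read_pairs * 2 * read_length  # two mates per pair
    return mapped_bases / transcript_length


def passes_cutoff(read_pairs, read_length, transcript_length, min_depth=30.0):
    """True if the approximate depth reaches the (hypothetical) threshold."""
    return transcript_depth(read_pairs, read_length, transcript_length) >= min_depth


# A thousand 2X100bp read pairs on hypothetical 2 kb and 10 kb transcripts.
for length in (2_000, 10_000):
    depth = transcript_depth(1_000, 100, length)
    print(f"{length} bp transcript: {depth:.0f}x, "
          f"passes 30x cut-off: {passes_cutoff(1_000, 100, length)}")
```

The same thousand read pairs give very different depths depending on transcript length, so a cut-off expressed as depth may be easier to reason about than one expressed as a raw read-pair count.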

mruizm 08-25-2013 10:16 PM

Minimum depth of coverage in transcriptome assembly
Hi everyone, I have 4.46 GB of data from sequencing transcripts from various tissues, as Illumina MiSeq paired-end reads. I have assembled all these reads and found that the mean depth of coverage is 27.9X (depth of coverage = efficiency of sequencing / efficiency of assembly).
My question is: what is the minimum depth of coverage needed to obtain robust information from the assembled transcriptome in a de novo transcriptome analysis?

Best regards!
