**Brian Bushnell** · 01-14-2015, 11:17 PM

For RNA-seq data of a eukaryote, if you have overlapping paired reads, the best way to get the insert size distribution is by merging the reads based on overlap, as that will not be affected by introns. You can do that with BBMerge like this:

bbmerge.sh in1=read1.fq in2=read2.fq ihist=ihist_merging.txt loose

It is best in this case to NOT quality-trim, as that may eliminate some of the longer inserts. Trimming adapters is fine.

If the reads are not overlapping, I recommend BBMap, which calculates insert size in such a way that reads spanning introns still yield the correct insert size:

bbmap.sh ref=reference.fasta in1=read1.fq in2=read2.fq ihist=ihist_mapping.txt out=mapped.sam maxindel=200000

For either program, you can cap the input at, say, 1M read pairs with the flag "reads=1000000". That makes it stop early, which is useful if you don't need the sam file, since 1 million pairs is plenty to calculate an insert size distribution.
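
Editor's sketch: once one of the commands above has written an ihist file, the mean and standard deviation asked about in this thread can be computed from it. The Python below is a minimal sketch, not part of BBTools. It assumes the file consists of header lines starting with "#" followed by two whitespace-separated columns (insert size, count); the function names `parse_ihist` and `summarize` are hypothetical.

```python
import math

def parse_ihist(lines):
    """Return {insert_size: count} from histogram lines, skipping '#' headers.

    Assumed layout: '#'-prefixed header lines, then 'size<TAB>count' rows.
    """
    hist = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        size, count = int(fields[0]), int(float(fields[1]))
        if count > 0:
            hist[size] = hist.get(size, 0) + count
    return hist

def summarize(hist):
    """Return (mean, population standard deviation, mode) of the insert sizes."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("histogram is empty")
    mean = sum(s * c for s, c in hist.items()) / total
    var = sum(c * (s - mean) ** 2 for s, c in hist.items()) / total
    mode = max(hist, key=hist.get)
    return mean, math.sqrt(var), mode

# Tiny made-up histogram, for illustration only.
example = ["#InsertSize\tCount", "100\t1", "200\t2", "300\t1"]
mean, sd, mode = summarize(parse_ihist(example))
print(f"mean={mean:.1f} sd={sd:.1f} mode={mode}")
```

For a real run, pass an open file handle instead of the example list, for instance `parse_ihist(open("ihist_merging.txt"))`.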

**pkstarstorm05** · 01-14-2015, 11:38 PM

Reply-insert size calculation

Hi Brian,

Sorry, I should have clarified that - I'm working on mouse tissue.

So if I'm understanding this correctly - you're suggesting to run BBMap or BBMerge just to estimate the insert size, and then use the estimated insert size in Tophat? (Very clever if so!)

I'm not at all sure if we have overlapping reads or not. I'm assuming that we do - the output from the RNA-seq was ~30 million pairs of reads per read set, so there are bound to be overlapping reads. How do I tell whether the reads are overlapping? (i.e. is this something special about the RNA-seq run or anything?)

Thanks for the clarification and help!

Cheers,
Paul

**Brian Bushnell** · 01-15-2015, 09:00 AM

Hi Paul,

BBMerge will tell you the insert size distribution in under a minute if you cap it at 1M reads and they are, in fact, overlapping. You'll have to graph the histogram to determine whether the mode was captured or whether the inserts were too long for merging. If you have 2x150bp reads, BBMerge can capture insert sizes up to around 288bp with default settings. If the graph is still ascending when it hits a sudden, steep dropoff at 288bp, and/or you only get, say, 20% of your reads to merge, then the inserts were too long (or the reads too short) for that approach. You know the mode is captured if the graph goes up, peaks, and then starts going back down well before the sharp dropoff to zero just before 2x(read length).
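
Editor's sketch: the visual check described above (rise, peak, then decline well before the merge ceiling) can be approximated in code. This is only a rough heuristic in Python and does not replace looking at the graph. The function name `mode_captured` and the thresholds `margin` and `tail_fraction` are illustrative assumptions, not BBMerge settings; `max_mergeable=288` is the approximate ceiling quoted above for 2x150bp reads.

```python
def mode_captured(hist, max_mergeable=288, margin=30, tail_fraction=0.5):
    """Heuristic: True if the histogram peaks and declines before the ceiling.

    hist is {insert_size: count}. The mode must sit at least `margin` bases
    below `max_mergeable`, and the average count in the last `margin` bases
    before the ceiling must be at most `tail_fraction` of the peak count.
    """
    if not hist:
        return False
    mode = max(hist, key=hist.get)
    peak = hist[mode]
    # Counts in the window just below the merge ceiling.
    tail = [hist.get(s, 0)
            for s in range(max_mergeable - margin + 1, max_mergeable + 1)]
    tail_mean = sum(tail) / len(tail)
    return mode <= max_mergeable - margin and tail_mean <= tail_fraction * peak

# Made-up shapes: one peaking at 180bp, one still rising at the ceiling.
peaked = {s: max(0, 1000 - 10 * abs(s - 180)) for s in range(50, 289)}
ascending = {s: s for s in range(50, 289)}
print(mode_captured(peaked), mode_captured(ascending))
```

A low merge rate (the 20% example above) is a separate signal that this sketch does not look at; BBMerge reports it in its own output.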

BBMap is able to calculate the distribution for arbitrarily long inserts, but takes longer and of course requires a reference. You can use the insert size distribution to feed into Tophat, or you can just use BBMap's sam file directly; it is a splice-aware aligner that does not need insert size as a parameter (since it auto-detects it while running) and is substantially faster and more sensitive than Tophat anyway. To do that, and produce output with cufflinks-compatible tags, you'd also need to add the flags "xstag=unstranded intronlen=10 ambig=random" (unless your data is strand-specific, in which case you could use 'firststrand' or 'secondstrand').
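
Editor's sketch: to show how the extra flags combine with the mapping command given earlier in the thread, the Python below assembles the argument list without running anything. The helper name `bbmap_command` is hypothetical; every flag and file name comes from the posts above.

```python
import shlex

def bbmap_command(ref, read1, read2, out_sam, strand="unstranded"):
    """Build a bbmap.sh argument list for cufflinks-compatible RNA-seq output."""
    if strand not in ("unstranded", "firststrand", "secondstrand"):
        raise ValueError(f"unexpected xstag value: {strand}")
    return [
        "bbmap.sh",
        f"ref={ref}", f"in1={read1}", f"in2={read2}", f"out={out_sam}",
        "maxindel=200000",                      # from the earlier mapping command
        f"xstag={strand}", "intronlen=10", "ambig=random",
    ]

cmd = bbmap_command("reference.fasta", "read1.fq", "read2.fq", "mapped.sam")
print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```

For strand-specific data, pass `strand="firststrand"` or `strand="secondstrand"` instead of the default.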

How to correctly estimate RNA-seq mean insert size and standard distribution

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News