**Brian Bushnell** · 08-24-2016, 12:45 PM

I would expect most analyses to be better with paired data, though I have never personally used MISO. You can calculate insert-size distributions quite rapidly with BBMerge. How long are your reads, and what kind of organism are they from?

**fpbarthel** · 08-24-2016, 01:35 PM

Hi Brian,

Does BBMerge insert-size distribution calculation work efficiently with transcriptome mapped reads, where reads may span exons that are kilobases apart?

At this point I only have miniBAMs for a single gene, generated from the large BAMs stored on an external drive. These miniBAMs may have as few as 20 reads. All data are human (aligned to hg19). Read lengths are 48 or 75, depending on the sample.

Thanks!

**Brian Bushnell** · 08-24-2016, 02:10 PM

Oh, that could be a bit of a problem... BBMerge does not care about the presence of introns, but it does require the reads to overlap or nearly overlap (so those reads are probably too short for a 250bp average insert size). However, as long as you map to the transcriptome, an insert size calculated from mapping will be unaffected by introns. There will still be a bit of uncertainty due to differential splicing, but I think you'll still get a pretty accurate value.
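To put numbers on both points: the overlap between mates is the sum of the read lengths minus the insert size, and a mapping-based insert size is just the distance from the leftmost to the rightmost mapped base of the pair. The bash sketch below uses the 48bp and 75bp read lengths and the ~250bp insert from this thread; `mate_overlap` and `fragment_span` are hypothetical helpers, and the 3,000bp intron and the coordinates are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: overlap between the two mates of a fragment.
# overlap = len(read1) + len(read2) - insert_size; a negative result is the
# number of unsequenced bases between the mates.
mate_overlap() {
  local len1=$1 len2=$2 insert=$3
  echo $(( len1 + len2 - insert ))
}

# Hypothetical helper: fragment length from the leftmost mapped base to the
# rightmost mapped base of a pair (1-based, inclusive coordinates).
fragment_span() {
  local left_start=$1 right_end=$2
  echo $(( right_end - left_start + 1 ))
}

# Read lengths from this thread, with a ~250bp average insert.
for readlen in 48 75; do
  ov=$(mate_overlap "$readlen" "$readlen" 250)
  if (( ov > 0 )); then
    echo "${readlen}bp reads: mates overlap by ${ov}bp"
  else
    echo "${readlen}bp reads: no overlap, gap of $(( -ov ))bp"
  fi
done

# Made-up pair covering transcript positions 1001..1250. Placing the left mate
# at genomic position 1001, a hypothetical 3,000bp intron inside the fragment
# pushes the right mate's genomic end out by 3,000bp.
echo "transcript span: $(fragment_span 1001 1250)bp"
echo "genomic span:    $(fragment_span 1001 $(( 1250 + 3000 )))bp"
```

Running it reports a 154bp gap for 48bp reads and a 100bp gap for 75bp reads, and a 250bp span in transcript coordinates versus 3250bp in genomic coordinates, which is why mapping to the transcriptome rather than the genome matters here.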

For example, if you convert a large bam to fastq, and the reads are in their original order or name-sorted, you can run BBMap like this:

bbmap.sh ref=transcriptome.fasta in=reads.fq reads=1m ihist=ihist.txt interleaved

That will map the first 1 million pairs to the transcriptome and calculate the insert size distribution, which should only take a minute or so per bam file.
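Since the goal is a best guess at the mean insert size and its standard deviation, here is a minimal bash/awk sketch that reduces a histogram to those two numbers. It assumes a two-column layout (insert size, count) with `#`-prefixed header lines; that layout is an assumption, so check the actual `ihist.txt` first, and if it already reports summary statistics, those are the ones to use. `hist_stats` and `toy_hist.txt` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count-weighted mean and population standard deviation
# of a histogram with two whitespace-separated columns (insert size, count).
# Lines starting with '#' are treated as headers and skipped (assumed layout).
hist_stats() {
  awk '
    /^#/ || NF < 2 { next }           # skip header and blank lines
    { n += $2; s += $1 * $2; ss += $1 * $1 * $2 }
    END {
      if (n == 0) { print "no data"; exit 1 }
      mean = s / n
      var = ss / n - mean * mean      # population variance
      if (var < 0) var = 0            # guard against rounding error
      printf "mean=%.1f stddev=%.1f n=%d\n", mean, sqrt(var), n
    }' "$1"
}

# Toy histogram: one pair at 200bp, two at 250bp, one at 300bp.
printf '#InsertSize\tCount\n200\t1\n250\t2\n300\t1\n' > toy_hist.txt
hist_stats toy_hist.txt
```

For the toy histogram this prints `mean=250.0 stddev=35.4 n=4`. Weighting by the count column avoids expanding the histogram back into one line per pair.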

If BBTools and samtools are installed, you can do the conversion like this:

reformat.sh in=x.bam out=x.fq reads=2m

...which will just convert the first 2 million reads (1 million pairs) of the bam file, so you don't have to convert the whole thing. Again, though, the bam file must be unsorted (original order) or name-sorted.
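The ordering requirement exists because taking "the first N reads" only yields pairs when mates sit next to each other; in a coordinate-sorted BAM they generally do not. A quick sanity check on the converted file is sketched below in bash/awk. It assumes 4-line FASTQ records and mate names that are identical apart from an optional `/1` or `/2` suffix; `check_interleaved` and `toy.fq` are hypothetical names, not part of BBTools.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare the names of records 1&2, 3&4, ... in an
# interleaved FASTQ with 4 lines per record, and count mismatched pairs.
check_interleaved() {
  awk '
    NR % 4 == 1 {
      name = substr($1, 2)            # drop "@" and anything after whitespace
      sub(/\/[12]$/, "", name)        # drop a trailing /1 or /2 mate suffix
      rec++
      if (rec % 2 == 1) { prev = name }
      else if (name != prev) { bad++ }
    }
    END { printf "pairs=%d mismatched=%d\n", int(rec / 2), bad + 0 }' "$1"
}

# Toy file: r1 is a proper pair, the second "pair" mixes r2 and r3.
printf '@r1/1\nACGT\n+\nIIII\n@r1/2\nTTGA\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n@r3/2\nAATT\n+\nIIII\n' > toy.fq
check_interleaved toy.fq
```

On the toy file this prints `pairs=2 mismatched=1`. A nonzero mismatch count on real data suggests the BAM was coordinate-sorted and should be name-sorted before conversion.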

MISO: Best-guessing insert size and stddev versus running single-ended for PE data

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News