SEQanswers

03-18-2010, 04:01 PM   #1
adrian
Member
 
Location: baltimore

Join Date: Oct 2009
Posts: 89
insert size

Dear group,
I need some help understanding the concept of insert size.

I have targeted exome sequencing data from a paired-end run with 76 bp reads. There are lots of duplicates in the SAM file, so I should run rmdup with the insert size specified correctly.

I was told by the technician that the insert size is between 150 and 300 bp.

When I look at the 9th field of the SAM file, which is the inferred insert size, I see values ranging from 0 to more than 100,000.

Since the experiment was done with an insert size of 150-300 bp, and the BWA-inferred insert sizes span such a wide range, what number should I give rmdup? Heng Li recommends always using the correct insert size. Given the 150-300 bp range from the technician and the much wider range in the SAM file, which insert size should I select to remove duplicates and call SNPs?

Or should I split the reads into bins by insert size range and call SNPs in each bin?

Also, what does an inferred insert size of '0' mean, and what does a value like 345,039 mean?

thanks
Adrian
03-18-2010, 05:55 PM   #2
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

Many if not all of the very large insert sizes are probably due to reads falsely aligned to repeats (LINEs, SINEs, satellites, etc.); take a few of them and check whether both reads actually align to non-repeat DNA.
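
One way to pull out a handful of those pairs for inspection is sketched below in Python. This is not from the original post: the function name `outlier_pairs`, the 300 bp cutoff, and the limit of 10 records are illustrative assumptions. It only reads the standard tab-separated SAM columns (read name, reference, position, MAPQ, mate position, inferred insert size) and does not itself decide whether a region is a repeat; the coordinates it returns still have to be checked against a repeat annotation or a genome browser.

```python
def outlier_pairs(sam_lines, max_expected=300, limit=10):
    """Return up to `limit` records whose inferred insert size (SAM column 9)
    is larger than `max_expected`. Both defaults are illustrative assumptions."""
    hits = []
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        isize = int(fields[8])            # column 9: inferred insert size
        if isize > max_expected:          # positive value: leftmost read of the pair
            hits.append({
                "read": fields[0],
                "chrom": fields[2],
                "pos": int(fields[3]),
                "mapq": int(fields[4]),
                "mate_pos": int(fields[7]),
                "isize": isize,
            })
            if len(hits) >= limit:
                break
    return hits
```

Passing an open SAM file handle as `sam_lines` works, since the function only iterates over lines. Only positive sizes are tested, so each pair is reported once, through its leftmost read.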

Also, take a sample from your SAM file (ideally random) and look at the size distribution. You may see a long tail of weird sizes, but you'll probably see most of the counts in a distribution around what the technician said. It's a worthwhile check on the library in any case, and it's easy to generate the data table with a little bit of Perl (or even UNIX shell commands).
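
The post suggests Perl or shell; the same data table can be sketched in Python as below. The function name `insert_size_histogram`, the 50 bp bin width, and the `sample_fraction` parameter are assumptions made for illustration, not part of the original advice. The sketch counts each pair once by keeping only positive values of column 9, and skips records where the value is 0, meaning no insert size was inferred.

```python
import random
from collections import Counter

def insert_size_histogram(sam_lines, bin_width=50, sample_fraction=1.0, seed=0):
    """Bin inferred insert sizes (SAM column 9) into `bin_width`-bp bins.

    Returns a dict mapping the start of each bin to the number of pairs in it.
    `sample_fraction` below 1.0 keeps a random subset of pairs.
    """
    rng = random.Random(seed)
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        isize = int(fields[8])            # column 9: inferred insert size
        if isize <= 0:                    # negative: mate already counted; 0: no size inferred
            continue
        if rng.random() > sample_fraction:
            continue                      # random subsampling
        counts[(isize // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))
```

Called on an open SAM file handle, the result can be printed one bin per line. For a library like the one described, most counts would be expected in the bins covering roughly 150-300 bp, with a thin tail at much larger sizes.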
All times are GMT -8.