SEQanswers

Old 05-29-2009, 11:34 PM   #1
anyone1985
Member
 
Location: Shanghai, China

Join Date: Mar 2009
Posts: 67
How to determine the insert length?

I have some Solexa paired-end data, but my colleague forgot to tell me the insert length. How can I determine the insert length of the data? Note that I have no reference genome.
Old 06-01-2009, 06:50 AM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

The easiest way would be to go back to your colleague and get the insert length. But if you really insist on doing this the hard way, then I would suggest doing a de novo assembly using the reads as if they were fragments (i.e., not as paired ends). This should give you some decent-sized contigs. Then map the paired ends onto the contigs. From this you should be able to figure out how far apart the paired ends that do map are. After you obtain the numbers that define the range, you can then do a new assembly, but this time as a 'paired end' instead of a 'fragment' assembly.
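
The "map the pairs back and measure" step can be sketched roughly in Python. This is only an illustration under assumptions that are not stated in the thread: the paired reads have been aligned to the contigs by some aligner that writes SAM output, and the function name `insert_size_stats` and its optional `max_size` cutoff are made up for the example. It reads the standard SAM columns (FLAG, RNAME, RNEXT, TLEN) and keeps only pairs whose two mates map to the same contig.

```python
import statistics


def insert_size_stats(sam_lines, max_size=None):
    """Summarise observed insert sizes from SAM alignment lines.

    Hypothetical helper: assumes standard 11+ column SAM records in which
    TLEN (column 9) holds the signed observed template length.
    """
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):           # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:               # not a complete alignment record
            continue
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        tlen = int(fields[8])
        if not flag & 0x1:                 # read is not part of a pair
            continue
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue                       # unmapped read/mate, secondary, supplementary
        if rnext != "=" and rnext != rname:
            continue                       # mates map to different contigs
        if tlen <= 0:
            continue                       # count each pair once (leftmost mate only)
        if max_size is not None and tlen > max_size:
            continue                       # optional guard against misassembly outliers
        sizes.append(tlen)
    if len(sizes) < 2:
        raise ValueError("need at least two usable pairs")
    return {
        "n": len(sizes),
        "mean": statistics.mean(sizes),
        "sd": statistics.stdev(sizes),
        "min": min(sizes),
        "max": max(sizes),
    }
```

Any iterable of lines works, so an open file handle on the SAM output can be passed in directly. Pairs that span two contigs are dropped, so with short contigs the estimate will tend to be biased toward smaller inserts.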
Old 06-01-2009, 06:58 PM   #3
anyone1985
Member
 
Location: Shanghai, China

Join Date: Mar 2009
Posts: 67

Thank you, I think I know what I should do.
Old 06-01-2009, 08:49 PM   #4
system7
Junior Member
 
Location: California

Join Date: Apr 2008
Posts: 5

Quote:
Originally Posted by anyone1985
I have some Solexa paired-end data, but my colleague forgot to tell me the insert length. How can I determine the insert length of the data? Note that I have no reference genome.
The summary.htm file from Pipeline should have that info in it at the bottom of the file.
Old 06-02-2009, 08:15 PM   #5
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Quote:
Originally Posted by westerman
The easiest way would be to go back to your colleague and get the insert length.
The problem with this is that the DNA fragment selection step is inexact. You may be aiming for 250 bp, but the average is 220 say, with a standard deviation of 30.

Quote:
But if you really insist on doing this the hard way, then I would suggest doing a de novo assembly using the reads as if they were fragments (i.e., not as paired ends). This should give you some decent-sized contigs. Then map the paired ends onto the contigs. From this you should be able to figure out how far apart the paired ends that do map are. After you obtain the numbers that define the range, you can then do a new assembly, but this time as a 'paired end' instead of a 'fragment' assembly.
This is good advice. If you have a close reference sequence, you can use that instead of de novo contigs. I usually use MAQ to align a SUBSET of the reads in paired-end mode, and MAQ itself will print out the mean and s.d. of the insert size.
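
As a rough illustration of taking a SUBSET of the reads while keeping the two mates of each pair together, here is a small Python sketch. It is not part of MAQ. The function names, the fixed seed, and the use of reservoir sampling are all choices made for this example, and it assumes plain four-line FASTQ records with the two files listing the pairs in the same order.

```python
import itertools
import random


def read_fastq_records(lines):
    """Yield FASTQ records as lists of four lines (assumes unwrapped FASTQ)."""
    it = iter(lines)
    while True:
        record = list(itertools.islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def sample_read_pairs(fq1_lines, fq2_lines, n_pairs, seed=0):
    """Pick n_pairs read pairs uniformly at random in a single pass.

    Hypothetical helper using reservoir sampling, so the total number of
    pairs does not need to be known in advance. Mates are kept in sync by
    walking both files in lockstep; zip() stops at the shorter file.
    """
    rng = random.Random(seed)
    reservoir = []
    pairs = zip(read_fastq_records(fq1_lines), read_fastq_records(fq2_lines))
    for i, pair in enumerate(pairs):
        if i < n_pairs:
            reservoir.append(pair)         # fill the reservoir first
        else:
            j = rng.randint(0, i)          # inclusive on both ends
            if j < n_pairs:
                reservoir[j] = pair        # replace with probability n_pairs/(i+1)
    return reservoir
```

The sampled records can then be written to two smaller FASTQ files and aligned in paired-end mode. Sampling at random rather than taking the first reads in the file avoids depending on whatever order the reads happen to be stored in.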

And as another poster said, if this is Illumina GA Pipeline, the Summary HTML files contain an estimate of the insert size which it obtains by using ELAND to map the reads to the reference genome specified in the gerald.cfg file.
Old 06-03-2009, 10:13 AM   #6
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Quote:
Originally Posted by Torst
The problem with this is that the DNA fragment selection step is inexact. You may be aiming for 250 bp, but the average is 220 say, with a standard deviation of 30.
Well, yes, this could be a problem: if your colleague only gives you a single number, then you have problems. I always ask for a minimum and maximum insert length, knowing that those numbers are also uncertain. Also, sometimes you can have a mixture of libraries with different insert sizes, e.g. averages of 500 bp, 3 kb, and 20 kb. Then one needs to know not only the range but also which reads correspond to which library.
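
To make the link between a mean/s.d. estimate and a minimum/maximum range concrete, here is a tiny Python sketch. It assumes the insert sizes are roughly normally distributed, which a real library may not be, and both the function name and the default of three standard deviations are arbitrary choices for illustration.

```python
def insert_size_range(mean, sd, n_sd=3.0):
    """Return a (low, high) insert size window of mean +/- n_sd * sd.

    Hypothetical helper: the lower bound is clamped at zero because an
    insert cannot have negative length.
    """
    low = max(0.0, mean - n_sd * sd)
    high = mean + n_sd * sd
    return low, high
```

With the numbers quoted above (mean 220, s.d. 30), this gives a window of 130 to 310 bp. For a mixture of libraries, the calculation would have to be done separately for each library.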

Quote:
And as another poster said, if this is Illumina GA Pipeline, the Summary HTML files contain an estimate of the insert size which it obtains by using ELAND to map the reads to the reference genome specified in the gerald.cfg file.
Ah, but the original poster said he did not have a reference genome.

It was an interesting theoretical question: how does one figure out insert sizes when only given paired ends? A question that I am glad I do not have to answer in practice!
Old 08-16-2010, 04:59 PM   #7
ashrafi_h
Junior Member
 
Location: Davis

Join Date: Jan 2010
Posts: 7
How do you use MAQ to determine the insert size?

Quote:
Originally Posted by Torst
The problem with this is that the DNA fragment selection step is inexact. You may be aiming for 250 bp, but the average is 220 say, with a standard deviation of 30.



This is good advice. If you have a close reference sequence, you can use that instead of de novo contigs. I usually use MAQ to align a SUBSET of the reads in paired-end mode, and MAQ itself will print out the mean and s.d. of the insert size.

And as another poster said, if this is Illumina GA Pipeline, the Summary HTML files contain an estimate of the insert size which it obtains by using ELAND to map the reads to the reference genome specified in the gerald.cfg file.
Hi, if you do not have a reference sequence, how do you use MAQ to determine the insert size? Could you please give a sample command line?

Thanks
Old 08-16-2010, 05:00 PM   #8
ashrafi_h
Junior Member
 
Location: Davis

Join Date: Jan 2010
Posts: 7

Quote:
Originally Posted by system7
The summary.htm file from Pipeline should have that info in it at the bottom of the file.
Where is this summary.htm that people are talking about?

All times are GMT -8.