SEQanswers

Old 04-19-2011, 11:21 PM   #1
suludana
Member
 
Location: Spain

Join Date: May 2008
Posts: 61
Question Overlap of paired-end reads

Hi,
we are starting to analyze a PE 100 run of a resequencing project. Unfortunately the library is too short and the majority of paired-end reads overlap. In this study we are interested in SNPs, INDELs and big rearrangements.
What would be the best option: trim the 100 bp reads down to 75 bp (for example), or use the 100 bp overlapping reads as they are? Are the advantages of PE lost if the reads overlap?
Thanks in advance for your help.
Old 04-20-2011, 05:11 AM   #2
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156
Default

Quote:
Originally Posted by suludana View Post
Hi,
we are starting to analyze a PE 100 run of a resequencing project. Unfortunately the library is too short and the majority of paired-end reads overlap. In this study we are interested in SNPs, INDELs and big rearrangements.
What would be the best option: trim the 100 bp reads down to 75 bp (for example), or use the 100 bp overlapping reads as they are? Are the advantages of PE lost if the reads overlap?
They should work fine as is, since you're using them for alignment.

If it were for de novo assembly, I'd suggest merging them into longer single-end reads (at least the ones which have a single strong overlap), but I'm not sure there's any advantage to this for alignment.

You should definitely check for adapter sequence though: there is a fine line between a 'clean' overlapped read and a read that runs into the 'opposite' adapter at the end.
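
To make the 'fine line' concrete, here is a minimal sketch of the geometry, assuming both mates have the same length and are sequenced inward from opposite ends of the fragment. The function name and the returned keys are hypothetical, chosen only for illustration; this is not an adapter trimmer.

```python
def overlap_geometry(fragment_len: int, read_len: int = 100) -> dict:
    """Describe how two inward-facing mates of read_len cover one fragment.

    Assumes both mates have identical length and start at opposite
    ends of the fragment (hypothetical helper, for illustration only).
    """
    # Bases covered by both mates; cannot exceed the fragment itself.
    overlap = max(0, min(fragment_len, 2 * read_len - fragment_len))
    # If the fragment is shorter than a read, the read runs past the
    # end of the fragment and into the adapter on the opposite side.
    adapter_bases = max(0, read_len - fragment_len)
    # Unsequenced bases between the mates (the 'gap' of a classic PE library).
    gap = max(0, fragment_len - 2 * read_len)
    return {"overlap": overlap, "adapter_bases": adapter_bases, "gap": gap}


if __name__ == "__main__":
    for frag in (300, 150, 100, 80):
        print(frag, overlap_geometry(frag))
```

With 100 bp reads, a 150 bp fragment gives a clean 50 bp overlap, a 100 bp fragment is fully covered by both mates, and anything shorter than 100 bp means each mate ends in adapter sequence that should be removed before alignment.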
Old 04-20-2011, 05:38 AM   #3
niceday
Member
 
Location: cambridge

Join Date: Apr 2010
Posts: 68
Default

For rearrangements it is better to have a gap between the PE reads: the bigger the gap, the bigger the fragment that has both ends sequenced. If you were looking for bigger rearrangements, mate pairs would be a better bet.
Old 04-20-2011, 05:39 AM   #4
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224
Default

Definitely don't trim them. If you're looking for SNPs, any that you find that lie within the overlapping region will have been sequenced twice for that fragment, improving your accuracy.
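
As a rough illustration of why the second look helps, here is a small sketch that combines the Phred qualities of two agreeing base calls. It assumes the two sequencing errors are independent, which is an assumption of this example and not something established in the thread; errors introduced before sequencing (for example during library preparation) are shared by both mates and are not reduced this way. The function names are hypothetical.

```python
import math


def phred_to_prob(q: float) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def prob_to_phred(p: float) -> float:
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)


def both_calls_wrong_phred(q1: float, q2: float) -> float:
    """Phred-scaled probability that BOTH overlapping calls are errors.

    Assumes independent sequencing errors in the two mates
    (an assumption of this sketch).
    """
    return prob_to_phred(phred_to_prob(q1) * phred_to_prob(q2))


if __name__ == "__main__":
    print(both_calls_wrong_phred(20, 30))
```

Under that assumption, the chance that both mates are wrong at the same position is the product of the two error probabilities, so the Phred scores simply add.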
Old 04-20-2011, 05:59 AM   #5
honey
Senior Member
 
Location: Pittsburgh

Join Date: Feb 2010
Posts: 151
Default No problem with overlap

I will also second (or rather third) that approach: there is no need to trim; rather, the overlap will increase accuracy.
Old 04-20-2011, 06:03 AM   #6
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Quote:
Originally Posted by niceday View Post
For rearrangements it is better to have a gap between the PE reads: the bigger the gap, the bigger the fragment that has both ends sequenced. If you were looking for bigger rearrangements, mate pairs would be a better bet.
All true. However, the gap needs to be in the actual library and not artificially created in the data set by throwing away part of the reads, which is what the original poster is suggesting.

I suppose that there might be some software which will work better with shortened reads, but I would be concerned about getting false positives due to potentially poorer mapping with the shorter reads.
Old 04-20-2011, 06:17 AM   #7
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156
Default

Quote:
Originally Posted by niceday View Post
For rearrangements it is better to have a gap between the PE reads: the bigger the gap, the bigger the fragment that has both ends sequenced. If you were looking for bigger rearrangements, mate pairs would be a better bet.
Agreed, longer pairs would help (though mate pairs are a pain on several levels).

But given that the library is already sequenced, I think the best results with this data will come from using it as is.
Old 04-20-2011, 06:58 AM   #8
suludana
Member
 
Location: Spain

Join Date: May 2008
Posts: 61
Default

Thanks for all your comments.
If I were looking only at SNPs, I would have no doubt: I would use the overlapping 100 bp reads. But I am also interested in big INDELs and rearrangements. In this case a real PE (with a gap between reads) would be better, right?
Old 04-20-2011, 07:23 AM   #9
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156
Default

Quote:
Originally Posted by suludana View Post
But I am interested also in big INDELs and rearrangements. In this case a real PE (with a gap between reads) would be better, right?
Yes, you'll need to resequence with longer paired-end fragments or even mate pairs (if you have to find nasty rearrangements bordered by long repeats).
Old 04-26-2011, 11:59 PM   #10
suludana
Member
 
Location: Spain

Join Date: May 2008
Posts: 61
Default

Thanks for your comments, but my question is: Do I lose the advantages of the PE if the reads overlap?
Old 04-27-2011, 12:14 AM   #11
honey
Senior Member
 
Location: Pittsburgh

Join Date: Feb 2010
Posts: 151
Default

What is the advantage of PE?
It depends on whether your questions can be answered. The standard PE benefits will still be there. However, can you accurately study rearrangements?
Old 04-27-2011, 01:02 AM   #12
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156
Default

Quote:
Originally Posted by suludana View Post
Thanks for your comments, but my question is: Do I lose the advantages of the PE if the reads overlap?
It's not so much a question of overlap as a question of paired distance.

Longer PE distance increases the likelihood of a given read pair spanning a given re-arrangement. More spanning read pairs 'disagreeing' when aligned to the reference within a specific area increases the confidence that something 'interesting' is going on there.

In the simple case, you'll see a pile of 'unhappy' pairs (which should span the region of the re-arrangement), and a lack of alignment / low agreement with consensus at the borders of the re-arrangement. Without PE, you'd just get the latter. And if it's a repeat rich region, you might not even get that - hence the importance of having long paired data as an indicator in such regions.
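
A minimal sketch of that 'pile of unhappy pairs' idea, using plain Python objects rather than any real alignment library. The `AlignedPair` class, the field names, and the thresholds are all hypothetical; the span is approximated as the distance between the leftmost aligned positions of the two mates, and orientation is ignored for brevity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AlignedPair:
    """Hypothetical, simplified view of one aligned read pair."""
    chrom1: str
    pos1: int  # leftmost aligned position of mate 1
    chrom2: str
    pos2: int  # leftmost aligned position of mate 2


def is_unhappy(pair: AlignedPair, expected_span: int, tolerance: int) -> bool:
    """True if the mates map to different chromosomes or at an odd distance."""
    if pair.chrom1 != pair.chrom2:
        return True
    span = abs(pair.pos2 - pair.pos1)
    return abs(span - expected_span) > tolerance


def unhappy_pileup(pairs, expected_span, tolerance, window=1000):
    """Count unhappy pairs per (chromosome, window) bin of mate 1."""
    counts = Counter()
    for pair in pairs:
        if is_unhappy(pair, expected_span, tolerance):
            counts[(pair.chrom1, pair.pos1 // window)] += 1
    return counts


if __name__ == "__main__":
    pairs = [
        AlignedPair("chr1", 1000, "chr1", 1150),  # as expected
        AlignedPair("chr1", 2000, "chr1", 9000),  # too far apart
        AlignedPair("chr1", 2100, "chr2", 500),   # different chromosomes
    ]
    print(unhappy_pileup(pairs, expected_span=150, tolerance=100))
```

Windows with many unhappy pairs are the candidates worth inspecting. A longer paired distance means more pairs span any given breakpoint, so the pile grows and the signal is easier to separate from mapping noise.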
Old 04-27-2011, 08:11 AM   #13
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 505
Default

Identification of indels by many variant callers requires that the novel junction lie in the unsequenced region, so that the end reads can be accurately mapped to the reference. All indel-containing reads that overlap will not meet this criterion. An anchored split-read mapper, such as Pindel, will be required for the overlapping reads. Alternatively, you can create an artificial 'gap' by aligning some portion (say, the first 50bp) of each end, then screening the data for ends that map aberrantly (too far apart, or to different chromosomes). As a bonus, the novel junction will be present in the gap, so you can identify it at base-pair resolution.
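
The 'artificial gap' step could be sketched as follows: keep only the first 50 bp of every read before alignment. This assumes plain four-line FASTQ records (no wrapped sequence lines), and the function name and the `keep` parameter are hypothetical. Alignment and the screen for aberrantly mapping ends would be done afterwards with whatever aligner you use.

```python
def trim_fastq_lines(lines, keep=50):
    """Yield (header, seq, plus, qual) records trimmed to the first `keep` bases.

    Assumes strict 4-line FASTQ records; an incomplete trailing
    record is silently dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups lines into records.
    for header, seq, plus, qual in zip(stripped, stripped, stripped, stripped):
        # Trim sequence and quality string together so they stay in sync.
        yield header, seq[:keep], plus, qual[:keep]


if __name__ == "__main__":
    example = [
        "@read1/1\n",
        "ACGT" * 25 + "\n",  # 100 bp read
        "+\n",
        "I" * 100 + "\n",
    ]
    for record in trim_fastq_lines(example):
        print("\n".join(record))
```

Both mate files would be trimmed the same way, so a pair of 100 bp reads from a 150 bp fragment becomes two 50 bp ends separated by a 50 bp gap that contains any novel junction.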
All times are GMT -8.