#1 |
Member
Location: Raleigh, NC Join Date: Feb 2010
Posts: 30
Hi,
I have a paired-end (2x75) Illumina data set that might have overlap at the ends. The fragment size selected was 240 bp, and after subtracting the adapter/primer sequences about 120 bp of insert was left, so the two 75 bp mates overlap by about 30 bp. My questions are:
1) Is this going to affect TopHat alignment? How should the -m option be specified?
2) When counting coverage, my intuition is that the overlapping bases might be counted twice even though they appear in the library only once. Is there any way to get around this?
3) Is this going to affect Cufflinks transcript assembly and quantitation?
Thanks for your help!
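For readers following the arithmetic in this post, here is a minimal sketch of how the overlap and the (negative) inner distance follow from the insert length and the read length. The function names are illustrative only and are not part of TopHat or any other tool mentioned in this thread.

```python
def mate_overlap(insert_len, read_len):
    """Bases covered by both mates of a pair (0 if the mates do not overlap)."""
    # The overlap can never exceed the insert itself or a single read
    return max(0, min(2 * read_len - insert_len, insert_len, read_len))


def inner_distance(insert_len, read_len):
    """Gap between the inner ends of the mates; negative when they overlap."""
    return insert_len - 2 * read_len


# Numbers from the post: ~120 bp insert, 2x75 reads
print(mate_overlap(120, 75))    # 30
print(inner_distance(120, 75))  # -30
```

With these numbers a 120 bp insert sequenced as 2x75 gives 30 overlapping bases, which is the same thing as an inner distance of -30.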
#2 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
I don't know how TopHat reacts to it, but I can already tell you that Bowtie won't like it, and hence TopHat will fail, too.
I'm currently working with a similar data set and noticed that Bowtie fails to find an alignment for an overlapping paired read (and so does Eland). I ended up aligning the two ends separately and then stitching things together manually. Of course, this is not an ideal solution.
Simon
#3 | |
Senior Member
Location: SEA Join Date: Nov 2009
Posts: 203
samtools merge?
__________________
http://kevin-gattaca.blogspot.com/ |
#4 | |
Senior Member
Location: SEA Join Date: Nov 2009
Posts: 203
Since there is a 30 bp overlap, they should assemble into a single read quite nicely, so you end up with 120 bp single-end (SE) data.
__________________
http://kevin-gattaca.blogspot.com/ |
#5 | |
Member
Location: Raleigh, NC Join Date: Feb 2010
Posts: 30
My alignment did not seem to have much of a problem. Here is a sample of the first few alignments. It appears that the two reads were processed separately, but I am not sure about that.
HWUSI-EAS787_0001:5:70:1610:809#AAATAG 99 chr1 5312 255 81M = 5366 0 GCGAGGAAAGAAATGCACTAAGTAAAAAACTTAGTCATTTTTTAAAGAGAATTAAAATGAAGTCCAATTCCTTTGAGTTAC HGHHI HHHGHHHGGGHHHHHHHHIHHHGHFHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHEHHFHEHGHHG NM:i:0
HWUSI-EAS787_0001:5:70:1610:809#AAATAG 147 chr1 5366 255 81M = 5312 0 AAATGAAGTCCAATTCCTTTGAGTTACAAATTTACAATCACTACTCAGTAATTAAAACTATTCAGTTATAGTGAACTGATT IHFHH IHBGHHHHHGHHFEHHHHHHHHHHHHHHHHHHHHEHHGHHHHHHHHHHHHGGHHHHHHHHHHIHHHHHHGHHHHHH NM:i:0
HWUSI-EAS787_0001:5:30:1504:1763#TTGTCG 163 chr1 5822 255 81M = 5860 0 CCAGAGCCCACAGCTTACTTTTGGTGGTACCCATCCTAAGGGTCTGGGCAAACATATAACGATAAATGTCCATCATTATAA HHGHH GGFHHHHHHHHHEHHHHHHHHHHHEHHGHDEGHHHHHBBBGGG7FHH2HEHBHH0FHEFHC+?6><CC-CEDDBA@ NM:i:0
HWUSI-EAS787_0001:5:30:1504:1763#TTGTCG 83 chr1 5860 255 81M = 5822 0 AGGGTCTGGGCAAACATATAACGATAAATGTCCATCATTATAATATCACACAGAGTAGTTTCACTGCCCTGAAACTCTTTT G@CBF HE?G=HHGIHHHHGHGHBHGHHHEGHDHHGHHFFHHHHHHHHHHGHHGHGFHCHHGHHHHFHHHHHHHHHHHHHHH NM:i:0
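As a rough check on how much the mates in the records above overlap (relevant to the double-counting question about coverage), the overlap on the reference can be worked out from the POS and CIGAR fields alone. This is only a sketch: the helper names are made up for illustration, and treating each alignment as one continuous interval overstates the overlap for spliced alignments, because an `N` gap is counted as covered.

```python
import re


def ref_span(pos, cigar):
    """Return the 1-based inclusive reference interval covered by an alignment."""
    # M, D, N, = and X consume reference bases; I, S, H and P do not
    length = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
                 if op in "MDN=X")
    return pos, pos + length - 1


def pair_overlap(mate1, mate2):
    """Number of reference positions covered by both mates (0 if none)."""
    start1, end1 = ref_span(*mate1)
    start2, end2 = ref_span(*mate2)
    return max(0, min(end1, end2) - max(start1, start2) + 1)


# (POS, CIGAR) of the two pairs shown above
print(pair_overlap((5312, "81M"), (5366, "81M")))  # 27
print(pair_overlap((5822, "81M"), (5860, "81M")))  # 43
```

Subtracting this number from the summed lengths of the two mates gives the count of distinct bases the fragment contributes, which is one way to avoid counting the shared bases twice.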
#6 |
Member
Location: Raleigh, NC Join Date: Feb 2010
Posts: 30
I think this is a decent solution. Many of my reads suffer from bad quality at the ends, though. Can you recommend any tools that could do this job? Thanks!
#7 |
Senior Member
Location: SEA Join Date: Nov 2009
Posts: 203
The only tool I know of that can do this is phrap, but I am not sure how long it would take when applied to this many reads.
__________________
http://kevin-gattaca.blogspot.com/ |
#8 | |
Senior Member
Location: Boston, MA Join Date: Nov 2008
Posts: 212
#9 |
Junior Member
Location: Madison, WI Join Date: Jul 2008
Posts: 6
Here are more details about Wen's run, which was 2x75.
The minimum fragment size, including flanking adapters, is 150 bp. Thus fragments with the smallest insert could be diagrammed like this, with 32 bases of overlapping cDNA:
[adapter:59][cDNA 32][adapter:59]
o~~~~~~~~~~~> (with 43bp of adapter)
<~~~~~~~~~~~~o
I am assuming, however, that reads this short would fail to map because of the high proportion of adapter-derived sequence embedded in the reads. These considerations lead me to the following questions:
1) Does a negative inner distance of, for example, -30 reflect an expected mean of 30 bp of overlap, or does it specify a maximum amount of overlap? After all, most of Wen's reads don't overlap, and the overlap could be as high as a full 75 bp for a 193 bp fragment. If I were to calculate the actual mean inner distance, taking overlaps as negative distances, the overall mean might well turn out to be positive.
2) If we were to trim the adapters, this would invariably lead to a distribution of read lengths rather than a uniform 75 bases. Can Bowtie and TopHat deal with unequal read lengths, or is this likely to be a problem?
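To make the first question concrete, here is a small sketch of how a mean inner distance could be computed when overlaps are treated as negative distances. The adapter and read lengths come from the post; the list of fragment sizes is hypothetical and chosen only to show that the mean can be positive even when some pairs overlap.

```python
ADAPTER_BP = 59 + 59  # two flanking adapters, as in the diagram in this post
READ_LEN = 75         # 2x75 run


def inner_distance(fragment_len):
    """Inner distance for one fragment; negative values mean the mates overlap."""
    insert_len = fragment_len - ADAPTER_BP
    return insert_len - 2 * READ_LEN


# Hypothetical fragment sizes (adapters included), for illustration only
fragments = [193, 240, 300, 350, 400]
distances = [inner_distance(f) for f in fragments]
mean_distance = sum(distances) / float(len(distances))

print(distances)  # [-75, -28, 32, 82, 132]
print(mean_distance)
```

In this made-up sample two of the five pairs overlap, yet the mean inner distance is above zero, which is the situation the question describes.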
#10 |
Junior Member
Location: Madison, WI Join Date: Jul 2008
Posts: 6
Here is how the diagram from my previous posting should look (with dots replacing whitespace). Sorry for the confusion.
[adapter:59][cDNA 32][adapter:59]
.............................o~~~~~~~~~~~> (with 43bp of adapter)
...........<~~~~~~~~~~~~o
#11 | |
Member
Location: california Join Date: Jul 2009
Posts: 24
#12 | |
Senior Member
Location: Boston, MA Join Date: Nov 2008
Posts: 212
To answer a previous question - TopHat will not handle reads of different lengths gracefully, so if you make "virtual" long reads from overlapping mates, make sure to trim the products down to a uniform length. |
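A minimal sketch of the trimming step suggested here, assuming the merged reads are held as (name, sequence, quality) tuples. The function name and the example records are hypothetical; dropping reads shorter than the target length is one possible policy, not something the post prescribes.

```python
def trim_to_uniform(reads, length):
    """Yield (name, seq, qual) cut to `length`; shorter reads are dropped."""
    for name, seq, qual in reads:
        if len(seq) >= length:
            yield name, seq[:length], qual[:length]


# Hypothetical merged reads of unequal length
merged = [("pair1", "ACGTACGTAC", "IIIIIIIIII"),
          ("pair2", "ACGTAC", "IIIIII")]

print(list(trim_to_uniform(merged, 8)))
# [('pair1', 'ACGTACGT', 'IIIIIIII')]
```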
#13 |
Junior Member
Location: Gainesville, FL Join Date: Sep 2009
Posts: 8
I had to edit this post. I wrote a program that assembles overlapping paired ends from Illumina. It used to be public but now it's private because I want to do a paper on it.
If you want a copy, you can e-mail me and I'll send it to you. I tested it on 1.5 million reads that overlap by ~25 bp, and it assembled about 78% into larger contigs, which can then be de novo assembled. In the overlapping region, it chooses the nucleotide with the best quality score (if there is a discrepancy). If there is a discrepancy and the quality scores are the same, it chooses the appropriate ambiguous nucleotide.
Last edited by ACTGangster; 07-24-2010 at 06:26 PM. Reason: makebettered
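The consensus rule described in this post can be sketched roughly as follows. This is not the poster's program: the overlap search (longest suffix/prefix match within a mismatch budget), the `min_overlap` and `max_mismatch_frac` defaults, and all function names are assumptions made for illustration. Quality strings are compared character by character, which works because higher ASCII characters encode higher scores. The sketch also ignores inserts shorter than one read, where the mates would run into adapter.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
# IUPAC codes for two-base ambiguities
IUPAC = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
         frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M"}


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_overlap(left, right, min_overlap=10, max_mismatch_frac=0.1):
    """Longest suffix of `left` matching a prefix of `right`, or 0 if none."""
    for n in range(min(len(left), len(right)), min_overlap - 1, -1):
        mismatches = sum(1 for a, b in zip(left[-n:], right[:n]) if a != b)
        if mismatches <= max_mismatch_frac * n:
            return n
    return 0


def merge_pair(seq1, qual1, seq2, qual2):
    """Merge two mates given as sequenced; return (seq, qual) or None."""
    # Bring mate 2 onto the same strand as mate 1
    right_seq, right_qual = revcomp(seq2), qual2[::-1]
    n = find_overlap(seq1, right_seq)
    if n == 0:
        return None
    bases, quals = [], []
    for a, qa, b, qb in zip(seq1[-n:], qual1[-n:], right_seq[:n], right_qual[:n]):
        if a == b:
            bases.append(a)
            quals.append(max(qa, qb))
        elif qa != qb:
            # Discrepancy: keep the base with the higher quality score
            bases.append(a if qa > qb else b)
            quals.append(max(qa, qb))
        else:
            # Discrepancy with equal qualities: use an ambiguity code
            bases.append(IUPAC.get(frozenset((a, b)), "N"))
            quals.append(qa)
    seq = seq1[:-n] + "".join(bases) + right_seq[n:]
    qual = qual1[:-n] + "".join(quals) + right_qual[n:]
    return seq, qual


# Toy example: a 14 bp fragment read as two 12 bp mates (10 bp overlap)
mate1 = "AACCGGTTACGT"
mate2 = revcomp("CCGGTTACGTAG")  # mate 2 as sequenced, from the opposite strand
print(merge_pair(mate1, "I" * 12, mate2, "I" * 12))
# ('AACCGGTTACGTAG', 'IIIIIIIIIIIIII')
```

Taking the longest acceptable overlap first is a simple choice that can be fooled by repetitive sequence; a real tool would likely score candidate overlaps more carefully.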
#14 |
(Jeremy Leipzig)
Location: Philadelphia, PA Join Date: May 2009
Posts: 116
I uploaded a python script I wrote for this to SVAR:
http://code.google.com/p/standardize.../mergePairs.py |
#15 |
Junior Member
Location: Gainesville, FL Join Date: Sep 2009
Posts: 8
I open-sourced my Stitch program, as I do not plan on writing a paper on it specifically.
http://github.com/audy/stitch
It runs on as many cores as you have. I did 20 million reads in 40 minutes on a 16-core Mac Pro.
#16 |
Member
Location: usa Join Date: May 2010
Posts: 18
I am trying to use stitch but got the error below. Any suggestions?
$ stitch
Traceback (most recent call last):
  File "/usr/bin/stitch", line 7, in ?
    sys.exit(
  File "/usr/lib/python2.4/site-packages/setuptools-0.6c11-py2.4.egg/pkg_resources.py", line 318, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.4/site-packages/setuptools-0.6c11-py2.4.egg/pkg_resources.py", line 2221, in load_entry_point
    return ep.load()
  File "/usr/lib/python2.4/site-packages/setuptools-0.6c11-py2.4.egg/pkg_resources.py", line 1954, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "/usr/lib64/python2.4/site-packages/PIL/__init__.py", line 1, in ?
    #
  File "build/bdist.linux-x86_64/egg/stitch/stitch.py", line 13, in ?
ImportError: No module named multiprocessing
#17 |
Junior Member
Location: Gainesville, FL Join Date: Sep 2009
Posts: 8
ImportError: No module named multiprocessing
What version of python are you using? What operating system? |
#18 |
Member
Location: usa Join Date: May 2010
Posts: 18
@ACTGangster
using python2.4 on centos5.5 |
#19 |
Junior Member
Location: Gainesville, FL Join Date: Sep 2009
Posts: 8
You need Python 2.6 or greater; the multiprocessing module that stitch imports is not part of Python 2.4.
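The `multiprocessing` module joined the standard library in Python 2.6, which is consistent with the ImportError reported under Python 2.4. A quick way to check an interpreter before installing is sketched below; the helper name is made up for illustration.

```python
import sys


def has_multiprocessing():
    """Return True if this interpreter can import the multiprocessing module."""
    try:
        import multiprocessing  # in the standard library since Python 2.6
        return True
    except ImportError:
        return False


print("Python %d.%d" % sys.version_info[:2])
print("multiprocessing available: %s" % has_multiprocessing())
```

Running this with the same interpreter that launches `stitch` shows whether that interpreter is new enough.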
#20 |
Member
Location: usa Join Date: May 2010
Posts: 18
Another error, this time with python2.7:
$ sudo python2.7 setup.py install
Traceback (most recent call last):
  File "setup.py", line 9, in <module>
    setup(
NameError: name 'setup' is not defined