SEQanswers

Old 09-25-2012, 01:25 PM   #1
karenr
Junior Member
 
Location: Delaware, USA

Join Date: Sep 2012
Posts: 4
Merging non-overlapping paired end reads

I'm looking to assemble some paired-end reads, but I'm having some problems getting the ends back together before assembly. I have 105 bp reads, from either end of 300 bp fragments - so there's 90 bp of 'space' in between, with no overlap. I've seen lots of programs dealing with overlapping reads, but is there anything out there that will account for that 'gap'?

Thanks!
Old 09-25-2012, 01:30 PM   #2
ugolino
Member
 
Location: maryland, usa

Join Date: Oct 2011
Posts: 14

I don't understand what use those reads will be to you once you merge them. The positional information, i.e. the 90 bp in between, is also critical to the assembly. Why do you want to merge them?
Old 09-25-2012, 01:37 PM   #3
karenr
Junior Member
 
Location: Delaware, USA

Join Date: Sep 2012
Posts: 4

That's the idea - I'd like to preserve the positional information, but from what I understand, most assembly programs require a single input file, hence the merging. So I'd like to put the ends back together, with that 90 bp unsequenced portion in the middle.
Old 09-25-2012, 01:41 PM   #4
ugolino
Member
 
Location: maryland, usa

Join Date: Oct 2011
Posts: 14

This is a Perl script from Velvet's contrib folder. It will do what you need as long as the reads in the two separate files are paired, i.e. in the same order in both files.

shuffleSequences_fastq.pl

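#!/usr/bin/perl
use strict;
use warnings;

# Interleave two paired FASTQ files (4 lines per record) into a single file.
my ($filenameA, $filenameB, $filenameOut) = @ARGV;
die "Usage: $0 reads_1.fastq reads_2.fastq interleaved.fastq\n"
    unless defined $filenameOut;

open my $FILEA, '<', $filenameA or die "Cannot open $filenameA: $!";
open my $FILEB, '<', $filenameB or die "Cannot open $filenameB: $!";
open my $OUTFILE, '>', $filenameOut or die "Cannot open $filenameOut: $!";

while (my $line = <$FILEA>) {
    # Copy one 4-line record from the first file...
    print $OUTFILE $line;
    for (1 .. 3) {
        $line = <$FILEA>;
        print $OUTFILE $line;
    }
    # ...followed by its mate from the second file.
    for (1 .. 4) {
        $line = <$FILEB>;
        print $OUTFILE $line;
    }
}

close $_ for $FILEA, $FILEB, $OUTFILE;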
Old 09-26-2012, 08:27 AM   #5
karenr
Junior Member
 
Location: Delaware, USA

Join Date: Sep 2012
Posts: 4

That puts the two files together in that it orders the paired ends together - say, end 1, then its pair, end 2, then its pair, end 3, then its pair, etc. But wouldn't that still lose the positional information?
Old 09-26-2012, 09:36 AM   #6
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,147

Quote:
Originally Posted by karenr
I'm looking to assemble some paired-end reads, but I'm having some problems getting the ends back together before assembly. I have 105 bp reads, from either end of 300 bp fragments - so there's 90 bp of 'space' in between, with no overlap. I've seen lots of programs dealing with overlapping reads, but is there anything out there that will account for that 'gap'?

Thanks!
karenr,

One would never do something like this with paired data. The first and most important reason is that your library IS NOT made up of 300 bp fragments. It is made up of a population of fragments with an AVERAGE size of 300 bp. The distribution of sizes may be narrow or wide depending on the particulars of the library preparation protocol. You cannot know a priori the size of any individual fragment that produced a pair of reads, and hence would have no way to determine how large a gap to insert between them.

All short read mappers or de novo assemblers understand that the distance between read pairs will fall within a distribution. Some programs expect you to provide an insert size average and insert size standard deviation as command line parameters when you launch the program, and some will determine the distribution empirically from a sample of your data. You will need to read the documentation of the software you plan to use.

The other matter is whether the software expects paired reads to be supplied as two separate files or as a single file. Again this is program specific and you need to read the documentation. [The script posted by ugolino above is actually the shuffleSequences_fastq.pl script from the velvet package which is intended to create a single, interleaved read file for input to velvet from two separate input files.]
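
To make the point about the size distribution concrete, here is a minimal Python sketch. The fragment lengths are made-up illustrative values, not data from this thread, and `gap_between_mates` is a hypothetical helper name. It shows that with 105 bp reads the unsequenced gap changes with each fragment's actual length, so a fixed 90 bp spacer would only be right for a fragment of exactly 300 bp.

```python
from statistics import mean, stdev

READ_LENGTH = 105  # read length from the original question


def gap_between_mates(fragment_length, read_length=READ_LENGTH):
    """Unsequenced bases between two mates; negative means the mates overlap."""
    return fragment_length - 2 * read_length


# Hypothetical fragment lengths (bp) for a library that averages 300 bp.
fragment_lengths = [250, 280, 300, 320, 350]

gaps = [gap_between_mates(n) for n in fragment_lengths]

print("mean fragment length:", mean(fragment_lengths))
print("standard deviation:", round(stdev(fragment_lengths), 1))
print("gap per fragment:", gaps)
```

The mean and standard deviation printed here are the kind of values some programs ask for on the command line; which parameters a given program accepts is something to check in its documentation.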
Old 12-12-2016, 03:22 PM   #7
dhtaft
Junior Member
 
Location: Davis, CA

Join Date: Dec 2016
Posts: 3

I know this thread has been inactive for a long time, but I am in a situation where I need to "merge" non-overlapping paired-end reads and preserve the positional information. Basically, I have a genus-specific bacterial PCR of a region where I should be able to sort out the species present from the sequence information. Unfortunately, the amplicon is 630 bp, and I'm using Illumina MiSeq 250 bp PE for sequencing. I'd like to take the forward read, insert 180 Ns, and then take the reverse read prior to aligning to my database... Unfortunately, I haven't found any good ways to do this, and am a bit limited in my programming skills. Does anyone have any suggestions on how I could do this?

Thanks
Old 12-12-2016, 05:28 PM   #8
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

You can do that with the BBMap package like this:

Code:
fuse.sh in1=r1.fq in2=r2.fq pad=130 out=fused.fq fusepairs
It will automatically reverse-complement read 2. Given that you stated 2x250 and 630bp insert, I'm assuming that the pad amount should be (630-2*250)=130bp, even though you mentioned "180 Ns", so adjust that as necessary.

Even though I wrote a tool for this specific purpose, it seems like kind of an odd use-case. What will you do with the merged reads?
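
For readers who want to see what this kind of fusing amounts to, here is a rough Python sketch of the idea: reverse-complement read 2 and join it to read 1 with a run of Ns in between. This is an illustration under stated assumptions, not how fuse.sh is implemented; `reverse_complement` and `fuse_pair` are hypothetical names, and quality scores are ignored entirely.

```python
# Complement table for DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fuse_pair(read1, read2, pad):
    """Join read 1 and the reverse complement of read 2 with `pad` Ns between."""
    if pad < 0:
        raise ValueError("pad must be non-negative; the mates overlap")
    return read1 + "N" * pad + reverse_complement(read2)


# Toy example: read 2 is reported on the opposite strand, so it is flipped
# before being appended.
fused = fuse_pair("ACGTAC", "GGATTC", pad=4)
print(fused)
```

With two 250 bp reads and a pad of 130, the fused sequence is 630 bases long, matching the amplicon length mentioned above.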
Old 12-13-2016, 09:16 AM   #9
dhtaft
Junior Member
 
Location: Davis, CA

Join Date: Dec 2016
Posts: 3

Thanks! There are barcodes and primers I have to trim before considering my insert size, and those take ~50 base pairs that I discard.

I'm trying to sort out the percentages of different species using genus-specific primers. The forward read lets me sort out a chunk of the different species, but makes a tangled mess of a different chunk. The reverse read is able to sort out the species that are a tangled mess from the forward read, but on its own can't separate everything either. I am hoping that by joining the two reads, I'll be able to sort out the full set of species I'm interested in... But I was worried that just concatenating the two reads would give very odd results when comparing to my reference database, because of the large gap and the high penalty my aligner gives to long gaps.
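
The trimming arithmetic can be sketched in Python as follows. It assumes the roughly 50 discarded bases are split evenly, 25 per read, and that the remaining 225 bases of each read lie inside the 630 bp amplicon; both are assumptions made for illustration, and `pad_length` is a hypothetical helper. Under those assumptions the pad comes out at 180 rather than 130.

```python
def pad_length(amplicon_length, read1_covered, read2_covered):
    """Number of Ns needed between the mates so the fused read spans the amplicon."""
    pad = amplicon_length - read1_covered - read2_covered
    if pad < 0:
        raise ValueError("reads overlap; no padding is needed")
    return pad


AMPLICON = 630         # amplicon length from the thread
READ_LENGTH = 250      # MiSeq 2x250
TRIMMED_PER_READ = 25  # assumption: ~50 bp of barcode/primer, split evenly

untrimmed_pad = pad_length(AMPLICON, READ_LENGTH, READ_LENGTH)
trimmed_pad = pad_length(
    AMPLICON, READ_LENGTH - TRIMMED_PER_READ, READ_LENGTH - TRIMMED_PER_READ
)
print(untrimmed_pad, trimmed_pad)
```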
Old 12-16-2016, 06:02 PM   #10
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Of course, there are always other aligners that don't give large penalties to large gaps. But yes, that sounds like a potentially good solution. Simple concatenation without reverse-complementation would be a very bad idea, but as long as your aligner does not penalize Ns, your approach sounds fine. I'd be interested in hearing your results.