SEQanswers

Old 10-05-2012, 06:26 AM   #1
LizBent
Member
 
Location: Guelph, Ontario, Canada

Join Date: Jan 2012
Posts: 31
Concatenating paired end reads when there are missing reads

Hi all,

I am looking for a tool that can take Illumina FASTQ paired end reads (already trimmed and quality filtered, so that not all the read 1 sequences are paired with read 2 sequences, and vice versa), and concatenate them (by taking the reverse complement of read 2 and attaching it to the end of read 1). These reads do not overlap. I'm concatenating them so that when I do database searches (e.g. BLAST), I have more information to use to determine what organism my amplicons came from (this is metagenomics work).

Does anyone have such a tool they would be willing to share? I have zero programming experience and our bioinformatician left months ago.

I've looked at ill2fastq.pl, but it seems to be designed for working with only pairs of reads, and can't handle unpaired reads.
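
For reference, the operation described above (reverse-complement read 2 and attach it to the end of read 1) can be sketched in a few lines of Python. This is only an illustration of the idea, not an existing tool; the function names are made up for this example, and it assumes the reads contain only A, C, G, T and N.

```python
# Translation table for complementing DNA bases.
# Assumption: reads contain only A, C, G, T, N (upper or lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def concatenate_pair(read1, read2):
    """Attach the reverse complement of read 2 to the end of read 1."""
    return read1 + reverse_complement(read2)


# Toy example with made-up sequences.
print(concatenate_pair("ACGTT", "GGCA"))
```

Handling reads whose mate was filtered out needs an extra bookkeeping step on top of this, since the two files can no longer be walked through line by line in lockstep.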

Last edited by LizBent; 10-05-2012 at 06:27 AM. Reason: incomplete
Old 10-05-2012, 06:44 AM   #2
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381

So what you need to do is:
1. Interlace your fastq files, grouping your paired ends by seq ID, i.e. the cluster coordinates shared by read /1 and read /2.
2. De-interlace the paired reads.
3. Stitch together these two sequences.

This can easily be done on a local instance of Galaxy (I don't think the web portal has the interlacer tool installed).

However, seeing as your paired ends come from opposite ends of the sequence, I don't see how stitching them together will help you in a BLAST search. You are creating an artificial sequence, and GenBank sequences are individual "real" fragments.
If I were you I wouldn't concatenate the PEs. Use FASTX sequence collapser in Galaxy and batch BLAST your unique reads individually.
Old 10-05-2012, 07:09 AM   #3
LizBent
Member
 
Location: Guelph, Ontario, Canada

Join Date: Jan 2012
Posts: 31

Hi Jackie, actually, if you BLAST the two ends of a sequence (with or without an artificial gap in the middle), you get better matches than if you BLAST just one end at a time. It is possible you'd get a nonsense window where the two ends meet, but the best overall matches would be for the longer ends that match real sequences, so those are the hits that will come out on top.

As for the solution you describe, I was rather hoping to find a script that would allow me to keep track of unpaired read1 and read2 sequences so I can use them as well.
Old 10-08-2012, 09:17 AM   #4
jbrwn
Member
 
Location: Denver, CO

Join Date: Mar 2011
Posts: 37

do you know how to use the command line at all? post the first read name from each fastq and i'll try to help you out. my solution will require python.
Old 10-08-2012, 09:57 AM   #5
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381

The solution I posted tracks unpaired reads.
Otherwise, check out http://sfg.stanford.edu/quality.html
Their PECombiner.sh has a bug in it, though they may have updated it on the site since.
If they have not, ask the authors to send you the working script.
Old 10-09-2012, 03:26 PM   #6
jbrwn
Member
 
Location: Denver, CO

Join Date: Mar 2011
Posts: 37

uses a good amount of memory since it's storing one fq in a dict, but seems to work:

https://gist.github.com/3861828

edit: you didn't say anything about preserving the quals, so this prints a fasta.
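
In case the link goes stale, here is a rough sketch of the approach described above: hold one fastq in a dict keyed by read name, stream the other file against it, and print a fasta. This is not the code in the gist, just an illustration under some assumptions: records are plain 4-line FASTQ, mates are named either `@id/1` and `@id/2` or `@id 1:...` and `@id 2:...`, and the names `pair_key`, `read_fastq`, `concatenate_pairs` and the `spacer` argument (for the artificial gap mentioned earlier in the thread) are invented for this example.

```python
import io

# Assumption: reads contain only A, C, G, T, N (upper or lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def pair_key(header):
    """Reduce a FASTQ header to a name shared by both mates.

    Drops the leading '@', anything after the first whitespace,
    and a trailing '/1' or '/2' if present.
    """
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def read_fastq(handle):
    """Yield (pair_key, sequence) from a 4-line-per-record FASTQ handle.

    Stops at end of file or at the first blank header line.
    """
    while True:
        header = handle.readline().strip()
        if not header:
            break
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line, ignored because output is FASTA
        yield pair_key(header), seq


def concatenate_pairs(fq1, fq2, spacer=""):
    """Return (joined, only_read1, only_read2) dicts of name -> sequence."""
    read1 = dict(read_fastq(fq1))  # the whole read 1 file is held in memory
    joined, only_read2 = {}, {}
    for key, seq2 in read_fastq(fq2):
        if key in read1:
            # pop() removes the mate, so whatever remains in read1 is unpaired
            joined[key] = read1.pop(key) + spacer + reverse_complement(seq2)
        else:
            only_read2[key] = seq2  # kept in its original orientation
    return joined, read1, only_read2


# Toy data standing in for two real files opened with open(path).
demo1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nTTTT\n+\nIIII\n")
demo2 = io.StringIO("@r1/2\nGGCA\n+\nIIII\n@r3/2\nCCCC\n+\nIIII\n")
joined, only1, only2 = concatenate_pairs(demo1, demo2)
for label, reads in (("joined", joined), ("read1_only", only1), ("read2_only", only2)):
    for name, seq in reads.items():
        print(f">{name} {label}\n{seq}")
```

Unpaired reads are reported separately rather than dropped, so they can still be searched on their own. As with the dict approach in general, memory use grows with the size of the first file.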

Last edited by jbrwn; 10-09-2012 at 03:28 PM.
Old 10-10-2012, 02:17 PM   #7
LizBent
Member
 
Location: Guelph, Ontario, Canada

Join Date: Jan 2012
Posts: 31

Thanks so much, I will try it.