Seqanswers Leaderboard Ad

**JackieBadger** · 10-05-2012, 05:44 AM

So what you need to do is:
1. Interlace your fastq files, grouping your paired ends by seq ID, i.e. the cluster coordinates for read /1 and /2.
2. De-interlace the paired reads.
3. Stitch together these two sequences.

This can easily be done on a local instance of Galaxy (I don't think the web portal has the interlacer tool installed).

However, seeing as your paired ends come from opposite ends of the sequence, I don't see how stitching them together will help you in a BLAST search. You would be creating an artificial sequence, whereas GenBank sequences are individual "real" fragments.
If I were you I wouldn't concatenate the PEs. Use the FASTX sequence collapser in Galaxy and batch BLAST your unique reads individually.
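
For anyone without Galaxy handy, the collapsing idea (keep one copy of each distinct read and remember how often it occurred) can be roughly sketched in Python. This is only an illustration of the concept, not the FASTX tool itself; the function name `collapse_reads` and the `rank-count` header format are made up for this example.

```python
from collections import Counter


def collapse_reads(sequences):
    """Collapse identical sequences into (header, sequence) pairs.

    The header is "<rank>-<count>", a naming scheme chosen for this
    sketch. Most abundant sequences come first; ties keep the order
    in which the sequences were first seen.
    """
    counts = Counter(sequences)
    return [
        (f"{rank}-{n}", seq)
        for rank, (seq, n) in enumerate(counts.most_common(), start=1)
    ]


if __name__ == "__main__":
    # Print the unique reads as FASTA, ready for a batch BLAST.
    for header, seq in collapse_reads(["ACGT", "TTTT", "ACGT"]):
        print(f">{header}\n{seq}")
```

The counts in the headers let you see afterwards how many raw reads each BLAST hit represents.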

**LizBent** · 10-05-2012, 06:09 AM

Hi Jackie- actually, if you BLAST the two ends of a sequence (with or without an artificial gap in the middle), you get better matches than if you BLAST just one end at a time. It is possible you'd get a nonsense window where the two ends meet, but the best overall matches would be for the longer ends that match real sequences, so those are the hits that will come out on top.

As for the solution you describe, I was rather hoping to find a script that would allow me to keep track of unpaired read1 and read2 sequences so I can use them as well.

**jbrwn** · 10-08-2012, 08:17 AM

do you know how to use the command line at all? post the first read name from each fastq and i'll try to help you out. my solution will require python.

**JackieBadger** · 10-08-2012, 08:57 AM

The solution I posted tracks unpaired reads.
Otherwise, check out http://sfg.stanford.edu/quality.html
Their PECombiner.sh has a bug in it, though they may have updated it on the site.
If they have not, ask the authors to send you the working script.

**jbrwn** · 10-09-2012, 02:26 PM

uses a good amount of memory since it's storing one fq in a dict, but seems to work:

join paired-end or print unique single-end reads

https://gist.github.com/3861828


edit: you didn't say anything about preserving the quals, so this prints a fasta.
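
For readers who cannot open the gist, here is a minimal sketch of the same general approach: hold one fastq in a dict, stream the other, join the pairs, and emit the leftover singletons as fasta. It is not the gist's code. It assumes plain 4-line fastq records and read names that differ only by a trailing `/1` or `/2`; the function names, the `_joined` / `_unpaired` suffixes, the `N` gap, and the optional reverse-complement of read 2 are all choices made for this illustration.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().strip()
        if not header:  # end of file (or a blank line)
            break
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored (output is FASTA)
        # Drop the '@', any description after whitespace, and a /1 or /2 suffix.
        name = header[1:].split()[0]
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]
        yield name, seq


def revcomp(seq):
    """Reverse-complement a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def join_pairs(r1_handle, r2_handle, gap="N" * 10, revcomp_r2=False):
    """Return FASTA records as (name, sequence) tuples.

    Pairs are concatenated as read1 + gap + read2; reads whose mate is
    missing are kept and labelled as unpaired.
    """
    read1 = dict(read_fastq(r1_handle))  # the whole read 1 file sits in memory
    records = []
    for name, seq2 in read_fastq(r2_handle):
        if revcomp_r2:
            seq2 = revcomp(seq2)
        seq1 = read1.pop(name, None)
        if seq1 is None:
            records.append((name + "/2_unpaired", seq2))
        else:
            records.append((name + "_joined", seq1 + gap + seq2))
    # Whatever is still in the dict never found a mate in the read 2 file.
    for name, seq1 in read1.items():
        records.append((name + "/1_unpaired", seq1))
    return records


if __name__ == "__main__":
    r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nGGGG\n+\nIIII\n")
    r2 = io.StringIO("@a/2\nTTTT\n+\nIIII\n@c/2\nCCCC\n+\nIIII\n")
    for name, seq in join_pairs(r1, r2, gap="NN"):
        print(f">{name}\n{seq}")
```

With real files you would pass `open("reads_1.fastq")` and `open("reads_2.fastq")` (hypothetical file names) instead of the `StringIO` objects. Memory use grows with the size of the read 1 file, as noted above.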

**LizBent** · 10-10-2012, 01:17 PM

Thanks so much, I will try it


Concatenating paired end reads when there are missing reads
