
**dawe** · 12-09-2010, 12:26 PM

Originally posted by Protaeus View Post

In some examples that I've read for using bwa to analyze paired-end data, a FASTQ file for each member of the pair is included (in other words, R1.fastq and R2.fastq). Will bwa handle paired-end data that is in a single FASTQ file? The reads are denoted with /1 and /2.

AFAIK no, it won't. You may separate reads into two different files, I guess with

Code:

$ grep -A2 ^@*1 filein.fq > reads_1.fq
$ grep -A2 ^@*2 filein.fq > reads_2.fq

d

**kmcarr** · 12-09-2010, 02:24 PM

Originally posted by dawe View Post

AFAIK no, it won't. You may separate reads into two different files, I guess with

Code:

$ grep -A2 ^@*1 filein.fq > reads_1.fq
$ grep -A2 ^@*2 filein.fq > reads_2.fq

d

Not quite. First, FASTQ records are four lines long, so you have to collect the matched line and the three lines that follow it (-A3). Your regular expression means "match zero or more "@" at the beginning of a line, followed by a 1 (or 2)". You need to specify an "@" followed by zero or more of any character (.*). You are also not anchoring the 1 or 2 to the end of the line. Finally, you need to enclose the regular expression in quotes. To get what you intended it should be:

Code:

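# --no-group-separator (GNU grep) suppresses the "--" lines that -A prints between non-adjacent matches
$ grep --no-group-separator -A3 '^@.*1$' filein.fq > reads_1.fq
$ grep --no-group-separator -A3 '^@.*2$' filein.fq > reads_2.fq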

There is however a hidden gotcha in this method. @, 1 and 2 are valid characters for the quality string if the FASTQ is Sanger (or Illumina prior to 1.5). This means that your grep could match a quality string and then write it and the next three lines as a FASTQ block. This will cause whatever program was trying to parse this to puke (from personal experience).

In a random FASTQ file of ~20 million reads I found 511 quality strings that were matched by these grep patterns. An incredibly small fraction to be sure, but you only need one to screw up your FASTQ file.
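
To see how exposed a particular file is, one rough option is to count the quality lines (every fourth line) that the patterns above would mistake for a header. This is only a sketch: the function name count_quality_hits is made up for illustration, and it assumes strictly four-line records with no wrapped sequence lines.

```shell
# Count quality lines that start with "@" and end in 1 or 2,
# i.e. lines the header grep would wrongly treat as a read name.
# Assumes every FASTQ record is exactly four lines long.
count_quality_hits() {
    # $1 = FASTQ file to inspect
    awk 'NR % 4 == 0 && /^@.*[12]$/ { n++ } END { print n + 0 }' "$1"
}
```

Called as `count_quality_hits filein.fq`, it prints a single number; anything above zero means the plain grep approach would corrupt the output.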

**maubp** · 12-09-2010, 03:14 PM

For the reasons kmcarr gives (and other issues like this), personally I'd use a simple script using Biopython, BioPerl or similar rather than grep.
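
For what it's worth, a record-based split does not need a full Biopython or BioPerl script; a few lines of awk can do it by only ever inspecting the first line of each four-line record, so quality strings are never mistaken for headers. The sketch below is an illustration under assumptions, not a tested pipeline: the function name split_pairs is hypothetical, records are assumed to be exactly four lines, and read names are assumed to end in /1 or /2.

```shell
# Split a FASTQ file holding both mates into two files, one record at a time.
# Only header lines (line 1 of every 4) decide where a record goes.
split_pairs() {
    # $1 = input FASTQ, $2 = output for /1 reads, $3 = output for /2 reads
    awk -v out1="$2" -v out2="$3" '
        NR % 4 == 1 {                          # header line of a record
            if ($0 ~ /\/1$/)      out = out1
            else if ($0 ~ /\/2$/) out = out2
            else {
                print "unexpected header: " $0 > "/dev/stderr"
                exit 1
            }
        }
        { print > out }                        # all 4 lines follow the header
    ' "$1"
}
```

Usage would be `split_pairs filein.fq reads_1.fq reads_2.fq`. The mates come out in the same order they appear in the input, so the two files stay in sync only if the input lists every pair completely.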

**dawe** · 12-09-2010, 03:28 PM

Originally posted by maubp View Post

For the reasons kmcarr gives (and other issues like this), personally I'd use a simple script using Biopython, BioPerl or similar rather than grep.

I wrote the wrong grep expression, my bad. Indeed, I used to grep for @XXXX, where XXXX is my machine ID, for most of these operations... Also, bwa doesn't use quality for alignment (so it will work with -A1 or -A3).
Nevertheless, I believe grep is much faster than any BioPerl/Biopython script.

d

**barak** · 11-09-2013, 11:42 PM

Hi. Just found this post in the GATK forum: http://gatkforums.broadinstitute.org...o-fastq-format
Essentially, you can use BWA with interleaved BAM files containing info from both pairs. I know that was not exactly the question, but it is related, and hopefully will save time for some (as with my case).


paired end fastq format in bwa
