SEQanswers

Old 11-04-2010, 03:51 AM   #1
JayM
Junior Member
 
Location: Cape Town, South Africa

Join Date: Nov 2010
Posts: 4
Splitting concatenated PE fastq file into two files for the respective reads

I have a fastq file in which read1 and read2 were split, then concatenated and shuffled into a single file. However, I need separate read1 and read2 files for bwa alignment. Does anyone know how to do this?
I don't have the GERALD file; this processed fastq file came from a core facility, so I'm stuck at this point. Can anyone help?

Last edited by JayM; 11-04-2010 at 04:24 AM.
Old 11-04-2010, 10:58 AM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

You might be able to grep it out. Something like:

grep -A 3 pattern_that_is_only_in_read_1_sample_name combined_file.fq > read1.fq
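
The same name-based idea can be sketched without grep. This is only an illustration under two assumptions: every record is exactly 4 lines, and read names end in `/1` or `/2` (check the header lines of your own file first). It also avoids the `--` separator lines that GNU grep prints between non-adjacent `-A` groups. The helper name `split_by_suffix` and the file names in the usage comment are hypothetical.

```ruby
# Sketch: route each 4-line FASTQ record to one of two outputs based on
# the suffix of its name line. Assumes names end in "/1" or "/2" and that
# no sequence or quality string is wrapped over several lines.
def split_by_suffix(input, out1, out2)
  input.each_slice(4) do |record|
    name = record.first.chomp
    if name.end_with?("/1")
      out1 << record.join
    elsif name.end_with?("/2")
      out2 << record.join
    else
      raise ArgumentError, "unexpected read name: #{name}"
    end
  end
end

# Usage (hypothetical file names):
# File.open("combined_file.fq") do |fin|
#   File.open("read1.fq", "w") do |r1|
#     File.open("read2.fq", "w") do |r2|
#       split_by_suffix(fin, r1, r2)
#     end
#   end
# end
```

Unlike a line-counting split, this does not depend on the order of the records, but it fails loudly on any name that carries neither suffix.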
Old 11-04-2010, 11:40 AM   #3
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323

Assuming single.

Code:
$ cat ./single.fq | ruby -ne 'BEGIN{@i=0} ; @i+=1; puts $_  if @i.to_s =~ /[1234]/; @i = 0 if @i == 8' > one.fq && cat single.fq | ruby -ne 'BEGIN{@i=0} ; @i+=1; puts $_  if @i.to_s =~ /[5678]/; @i = 0 if @i == 8' > two.fq
Also, you may want to check the fastx package. It should include that feature.
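
A single-pass variant of the same line-counting idea, as a sketch rather than a replacement for the one-liner above. It makes the same assumption: the file strictly alternates 4 lines of read1 with 4 lines of read2. `split_interleaved` is a hypothetical helper name, and the file names in the usage comment follow the ones used above.

```ruby
# Sketch: lines 1-4 of every 8-line block go to out1, lines 5-8 to out2.
# Assumes strictly alternating read1/read2 records of 4 lines each.
def split_interleaved(input, out1, out2)
  input.each_with_index do |line, idx|
    # idx is 0-based, so positions 0..3 within each block of 8 are read1.
    (idx % 8 < 4 ? out1 : out2) << line
  end
end

# Usage:
# File.open("single.fq") do |fin|
#   File.open("one.fq", "w") do |one|
#     File.open("two.fq", "w") do |two|
#       split_interleaved(fin, one, two)
#     end
#   end
# end
```

Reading the input once instead of twice mainly matters for large files; the output should be the same as long as the alternation assumption holds.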
__________________
-drd
Old 11-05-2010, 12:13 AM   #4
JayM
Junior Member
 
Location: Cape Town, South Africa

Join Date: Nov 2010
Posts: 4

Quote:
Originally Posted by drio
Assuming single.

Code:
$ cat ./single.fq | ruby -ne 'BEGIN{@i=0} ; @i+=1; puts $_  if @i.to_s =~ /[1234]/; @i = 0 if @i == 8' > one.fq && cat single.fq | ruby -ne 'BEGIN{@i=0} ; @i+=1; puts $_  if @i.to_s =~ /[5678]/; @i = 0 if @i == 8' > two.fq
Also, you may want to check the fastx package. It should include that feature.
I take it 'Assuming single' here means assuming a single [input] file containing both read1 and read2.
Old 11-05-2010, 02:34 AM   #5
JayM
Junior Member
 
Location: Cape Town, South Africa

Join Date: Nov 2010
Posts: 4

Quote:
Originally Posted by swbarnes2
You might be able to grep it out. Something like:

grep -A 3 pattern_that_is_only_in_read_1_sample_name combined_file.fq > read1.fq
But how do you grep for read1 and not read2 in a paired-end fastq, given that the two names are essentially identical except for one character at the end, and there are millions of such pairs in the file?
I'm just trying to work out which pattern that could be.
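
One way to look for a candidate pattern, sketched under the assumption of 4-line records: tally the final character of every name line. If exactly two values turn up in equal numbers (for example `1` and `2`), that character is a plausible read1/read2 marker. `tally_name_endings` is a hypothetical helper name.

```ruby
# Sketch: count how often each final character occurs on FASTQ name lines.
# Assumes every record is exactly 4 lines, so name lines are lines 1, 5, 9, ...
def tally_name_endings(input)
  counts = Hash.new(0)
  input.each_slice(4) do |record|
    name = record.first.chomp
    counts[name[-1]] += 1 unless name.empty?
  end
  counts
end

# Usage (hypothetical file name):
# File.open("combined_file.fq") { |fin| p tally_name_endings(fin) }
```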
Old 11-05-2010, 02:58 AM   #6
JayM
Junior Member
 
Location: Cape Town, South Africa

Join Date: Nov 2010
Posts: 4

Quote:
Originally Posted by drio
Assuming single.

Code:
$ cat ./single.fq | ruby -ne 'BEGIN{@i=0} ; @i+=1; puts $_  if @i.to_s =~ /[1234]/; @i = 0 if @i == 8' > one.fq && cat single.fq | ruby -ne 'BEGIN{@i=0} ; @i+=1; puts $_  if @i.to_s =~ /[5678]/; @i = 0 if @i == 8' > two.fq
Also, you may want to check the fastx package. It should include that feature.
Wow! Thanks, it worked, and a spot check of the respective reads seems to confirm a clean split into read1 and read2.
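
A spot check like this could also be automated. The sketch below assumes 4-line records and that the two names of a pair differ only by a trailing `/1` or `/2`; both are assumptions about the file, not something confirmed in this thread. `read_names` and `pairs_match?` are hypothetical helper names.

```ruby
# Sketch: check that two FASTQ inputs hold the same reads in the same order.
# Assumes 4-line records and mate names that differ only by a trailing /1 or /2.
def read_names(input)
  input.each_slice(4).map { |record| record.first.chomp.sub(%r{/[12]\z}, "") }
end

def pairs_match?(input1, input2)
  read_names(input1) == read_names(input2)
end

# Usage (file names as in the one-liner above):
# File.open("one.fq") do |a|
#   File.open("two.fq") { |b| puts pairs_match?(a, b) }
# end
```

This keeps all read names in memory, so for very large files a streaming comparison would be the more careful choice.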

Tags
split fastq

All times are GMT -8.