SEQanswers

Old 08-07-2017, 05:26 AM   #1
cement_head
Senior Member
 
Location: Oxford, Ohio

Join Date: Mar 2012
Posts: 185
How to "chop" a large FASTQ PE file in half?

Hello,

I hesitate to use the following terms: "split", "partition", "trim" - because they all have special connotations.

What I'd like to do is take a large FASTQ file of PE reads and cut it into two approximately equal halves. However, I want to do it in such a way that neither output file ends up with only one mate of a read pair. In other words, I want to ensure that no PE reads are separated when the file is halved.

For example, if I have a file with 11 PE reads, I do NOT want 5 reads in one file and 6 reads in another.

Will FASTQ Splitter work for what I want?

- Andor
Old 08-07-2017, 05:49 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,495

Quote:
Originally Posted by cement_head View Post
Hello,

For example, if I have a file with 11 PE reads, I do NOT want 5 reads in one file and 6 reads in another.

- Andor
How would you split a PE dataset otherwise? I assume your PE reads are in two separate files (R1/R2) and the split would have to split both files?
Old 08-07-2017, 07:29 AM   #3
cement_head
Senior Member
 
Location: Oxford, Ohio

Join Date: Mar 2012
Posts: 185

Quote:
Originally Posted by GenoMax View Post
How would you split a PE dataset otherwise? I assume your PE reads are in two separate files (R1/R2) and the split would have to split both files?
No, they are in ONE file, PE reads, RAW FASTQ.
Old 08-07-2017, 08:11 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,495

So your paired-end reads are interleaved? Why not use "split -n 2" to divide the original into two parts? If you have an odd number of fastq records then using an explicit "split -l ((n+1)/2*4)" may be better (n = number of fastq records).
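
For anyone who would rather script this than work out the line count by hand, here is a rough Python sketch of the same idea. It is not from the original reply and rests on some assumptions: the file is uncompressed, every record is exactly 4 lines (no wrapped sequences), and the reads are strictly interleaved (R1 then R2), so one pair takes 8 lines. Keep in mind that GNU "split -n 2" divides by byte count, so a cut computed in whole pairs is the safer route. The function names and file paths are made up for illustration.

```python
import itertools

# Assumption: interleaved PE, 4 lines for R1 followed by 4 lines for R2.
LINES_PER_PAIR = 8


def lines_in_first_half(total_lines):
    """Return the line count for the first output file, rounded up to a whole pair."""
    if total_lines % LINES_PER_PAIR:
        raise ValueError("line count is not a multiple of 8; is the file really interleaved PE?")
    pairs = total_lines // LINES_PER_PAIR
    # With an odd number of pairs the first file gets the extra pair.
    return (pairs + 1) // 2 * LINES_PER_PAIR


def split_interleaved_fastq(src, out_a, out_b):
    """Write the first half of the pairs to out_a and the remainder to out_b."""
    # First pass: count lines so the cut point can be placed on a pair boundary.
    with open(src) as fh:
        total = sum(1 for _ in fh)
    cut = lines_in_first_half(total)
    # Second pass: stream the lines out without loading the file into memory.
    with open(src) as fh, open(out_a, "w") as a, open(out_b, "w") as b:
        a.writelines(itertools.islice(fh, cut))
        b.writelines(fh)
    return cut, total - cut
```

Calling split_interleaved_fastq("reads.fastq", "half_1.fastq", "half_2.fastq") on a file with 11 pairs (88 lines) would put 6 pairs in the first file and 5 in the second, with both mates of every pair kept together.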

Last edited by GenoMax; 08-07-2017 at 08:28 AM.

Tags
fastq splitter, partition, pe reads, split
