SEQanswers

Old 08-15-2013, 08:49 AM   #1
angerusso
Member
 
Location: US

Join Date: Oct 2011
Posts: 47
How to randomly select 20M reads out of a FASTQ file

Hi All,

I have an RNA-Seq dataset from which I need to randomly select 20 million reads, about 5 times over. The whole file is about 200 million reads.

What is a way to do this? Does anyone have a script to share? Thanks.
Old 08-15-2013, 09:01 AM   #2
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

Googling/DuckDucking might have turned up the answer you are looking for.

Regardless, check this thread : http://seqanswers.com/forums/showthread.php?t=16505

Are your reads paired ?

Last edited by Richard Finney; 08-15-2013 at 09:09 AM.
Old 08-15-2013, 09:09 AM   #3
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

There are a few ways, some mentioned on this site and some over on biostars. One of those ought to work for you.
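For illustration, here is a minimal Python sketch of one such approach. It is not taken from any of the posts referred to above. It assumes plain-text (uncompressed) FASTQ with exactly 4 lines per record and R1/R2 files that are already in sync; the function names and file paths are made up for the example. It picks record indices at random, then writes the same indices from both files so mates stay paired.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-tuples of lines (assumes 4 lines per record)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def sample_pairs(r1_path, r2_path, out1_path, out2_path, n, seed):
    """Write n randomly chosen read pairs; returns the number of pairs written."""
    # First pass: count the records in R1 (R2 is assumed to have the same count).
    with open(r1_path) as fh:
        total = sum(1 for _ in fh) // 4
    # Choose which record indices to keep, without replacement.
    keep = set(random.Random(seed).sample(range(total), min(n, total)))
    written = 0
    # Second pass: stream both files together and keep the chosen indices.
    with open(r1_path) as r1, open(r2_path) as r2, \
            open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        pairs = zip(read_fastq(r1), read_fastq(r2))
        for i, (rec1, rec2) in enumerate(pairs):
            if i in keep:
                o1.writelines(rec1)
                o2.writelines(rec2)
                written += 1
    return written
```

Calling it five times with five different seeds, e.g. sample_pairs("R1.fastq", "R2.fastq", "sub1_R1.fastq", "sub1_R2.fastq", 20_000_000, seed=1), would give five independent subsamples. Note that a set of 20 million indices takes a fair amount of memory in Python, so a dedicated tool may be the more practical choice for files this large.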
Old 08-15-2013, 09:14 AM   #4
angerusso
Member
 
Location: US

Join Date: Oct 2011
Posts: 47

Yes, my data is paired-end. Another complication is that the two files of the pair are of unequal size.

du -s command gives:
*_R1* = 64850642
*_R2* = 48640554

Quote:
Originally Posted by Richard Finney
Googling/DuckDucking might have turned up the answer you are looking for.

Regardless, check this thread : http://seqanswers.com/forums/showthread.php?t=16505

Are your reads paired ?
Old 08-15-2013, 09:17 AM   #5
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

You're going to want to resync them before you do anything else. Google "paired-end fastq sync" for a plethora of solutions.
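As a rough idea of what such a sync step does, here is a minimal Python sketch (not one of the solutions that search turns up). It assumes plain-text FASTQ with 4 lines per record, read IDs that match between mates once any trailing "/1" or "/2" is dropped, and that both files keep the surviving reads in the same relative order. All function names are hypothetical. It keeps only the reads whose ID appears in both files.

```python
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-tuples of lines (assumes 4 lines per record)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def read_key(header):
    """Reduce a header line to an ID shared by both mates."""
    key = header.split()[0]           # drop the Casava 1.8 style comment field
    if key.endswith(("/1", "/2")):    # drop an older style mate suffix
        key = key[:-2]
    return key


def ids_in(path):
    """Collect the set of read IDs present in one FASTQ file."""
    with open(path) as fh:
        return {read_key(rec[0]) for rec in read_fastq(fh)}


def filter_to(path, out_path, shared):
    """Copy only the records whose ID is in `shared`; return how many were kept."""
    kept = 0
    with open(path) as fh, open(out_path, "w") as out:
        for rec in read_fastq(fh):
            if read_key(rec[0]) in shared:
                out.writelines(rec)
                kept += 1
    return kept


def sync_pairs(r1_path, r2_path, out1_path, out2_path):
    """Write the reads common to both files; returns (kept_r1, kept_r2)."""
    shared = ids_in(r1_path) & ids_in(r2_path)
    return (filter_to(r1_path, out1_path, shared),
            filter_to(r2_path, out2_path, shared))
```

Holding every read ID in memory is the simple choice here, not the frugal one; for a couple of hundred million reads the ID sets alone can run to many gigabytes.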
Old 08-15-2013, 12:02 PM   #6
angerusso
Member
 
Location: US

Join Date: Oct 2011
Posts: 47

So I ran the following perl script:

https://bitbucket.org/jesseerdmann/f...ync-2-files.pl

and it says: "passed full check" using "quick" which means the two files are in SYNC.

"QUICK CHECK enabled
Casava 1.8 read id style
PASSED full check"

But before I randomly select reads from the two files (following your Google links), shouldn't I make them equal in size? As R1 is bigger than R2, even though they are in sync, I assume they are in sync only for the reads that are common to both. Am I right?

Quote:
Originally Posted by dpryan
You're going to want to resync them before you do anything else. Google "paired-end fastq sync" for a plethora of solutions.
Old 08-15-2013, 12:10 PM   #7
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

I've never seen that perl script, so I can't say that it works correctly. If you follow the instructions from this thread on biostars (Pierre's comment first, followed by Steffi's), you'll get two synchronized files of the same size.
Old 08-15-2013, 12:13 PM   #8
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

I'll add that this sort of issue, a different number of reads in the two paired-end files, usually only crops up when the mates of a pair are trimmed separately. If that's the case here and you're the one who did the trimming, your life will be easier if you use a different trimmer next time (trimmomatic and trim_galore are common choices).
Old 08-15-2013, 12:24 PM   #9
angerusso
Member
 
Location: US

Join Date: Oct 2011
Posts: 47

Ignore my previous message. My files are the same size when I use the "du -b" command.
Old 08-15-2013, 12:26 PM   #10
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

As long as they're also the same when you use "wc -l", things are OK.
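The same check can be sketched in Python, assuming plain-text FASTQ with 4 lines per record (the function names are made up for the example). It counts lines in each file, much as "wc -l" would, and reports whether the two counts agree and are a multiple of 4.

```python
def count_lines(path):
    """Count the lines in a file without loading it into memory."""
    with open(path, "rb") as fh:
        return sum(1 for _ in fh)


def same_read_count(r1_path, r2_path):
    """Return (ok, reads_r1, reads_r2), where ok means the counts match."""
    n1 = count_lines(r1_path)
    n2 = count_lines(r2_path)
    # A valid 4-line FASTQ has a line count divisible by 4.
    ok = n1 == n2 and n1 % 4 == 0
    return ok, n1 // 4, n2 // 4
```

Matching line counts only show that the files hold the same number of reads. They do not prove that read N in R1 is the mate of read N in R2, so comparing the read IDs pairwise is a stricter test.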