Thread | Thread Starter | Forum | Replies | Last Post |
Split Large FASTQ file in small FASTQ files with user defined number of reads Windows | deepbiomed | Bioinformatics | 3 | 04-04-2013 08:14 AM |
question about random select certain number of reads from ChIP-seq bed file | yaolij@gmail.com | Bioinformatics | 1 | 09-15-2012 03:13 PM |
How to replace select reads in a bam file? | Heisman | Bioinformatics | 8 | 01-02-2012 03:49 PM |
extracting reads from a large fastq file | Wallysb01 | Bioinformatics | 23 | 08-08-2011 02:43 PM |
Does bowtie randomly select a match from two equally valid alignments? | Kennels | Bioinformatics | 1 | 03-24-2011 10:12 PM |
#1 |
Member
Location: US Join Date: Oct 2011
Posts: 47
Hi All,
I have RNA-seq data from which I need to randomly select 20 million reads, about 5 times. The whole file has about 200 million reads. What is a good way to do this? Does anyone have a script to share? Thanks.
#2 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
Googling/DuckDucking might have turned up the answer you are looking for.
Regardless, check this thread: http://seqanswers.com/forums/showthread.php?t=16505
Are your reads paired?
Last edited by Richard Finney; 08-15-2013 at 09:09 AM.
#3 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
There are a few ways, some mentioned on this site and some over on Biostars. One of those ought to work for you.
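One common approach is reservoir sampling (Algorithm R), which draws a uniform random sample of k reads in a single pass without knowing the total read count in advance. Below is a minimal Python sketch, assuming plain, uncompressed 4-line FASTQ records (no wrapped sequence lines); `read_fastq` and `reservoir_sample` are hypothetical helper names, not an existing tool. It keeps all k sampled records in memory, so for 20 million reads it will likely need several gigabytes of RAM or more.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, TextIO, Tuple

# (header, sequence, "+" line, qualities), with newlines kept
Record = Tuple[str, ...]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    while True:
        rec = tuple(islice(handle, 4))
        if not rec:
            return
        if len(rec) != 4 or not rec[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield rec


def reservoir_sample(records: Iterable[Record], k: int, seed: int) -> List[Record]:
    """Algorithm R: uniform random sample of k records in one pass."""
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)  # fill the reservoir with the first k records
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = rec  # record i is kept with probability k/(i+1)
    return reservoir
```

To get five subsamples, reopen the file and call `reservoir_sample(read_fastq(fh), 20_000_000, seed)` with five different seeds, writing each returned record with `out.writelines(rec)`. The sampled records do not come out in their original file order.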
#4 | |
Member
Location: US Join Date: Oct 2011
Posts: 47
Yes, my data is paired-end. Another complication is that the two files are of unequal size.
The du -s command gives: *_R1* = 64850642, *_R2* = 48640554
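For paired-end data the same reads have to be picked from both files. Once R1 and R2 are in sync (same number of records, mates at the same positions), one way is to choose k record positions at random and stream both files in lockstep, copying the records at those positions. A minimal Python sketch under those assumptions; `subsample_pairs` is a hypothetical helper, and `n_pairs` must be the number of records in each file (for example the `wc -l` count divided by 4). Only the k chosen positions are held in memory, not the reads themselves.

```python
import random
from itertools import islice
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, ...]]:
    """Yield 4-line FASTQ records (newlines kept); assumes no line wrapping."""
    while True:
        rec = tuple(islice(handle, 4))
        if not rec:
            return
        if len(rec) != 4 or not rec[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield rec


def subsample_pairs(r1: TextIO, r2: TextIO, out1: TextIO, out2: TextIO,
                    n_pairs: int, k: int, seed: int) -> int:
    """Copy k randomly chosen mate pairs (same positions in R1 and R2).

    Assumes the inputs are in sync and each holds exactly n_pairs records.
    Returns the number of pairs written.
    """
    rng = random.Random(seed)
    # k distinct positions drawn uniformly from 0..n_pairs-1, sorted into file order
    chosen = sorted(rng.sample(range(n_pairs), k))
    written = 0
    for i, (rec1, rec2) in enumerate(zip(read_fastq(r1), read_fastq(r2))):
        if written == k:
            break  # every chosen pair has been copied
        if i == chosen[written]:
            out1.writelines(rec1)
            out2.writelines(rec2)
            written += 1
    return written
```

Running it five times with different seeds gives five 20-million-pair subsamples whose R1 and R2 outputs stay in sync, because both files are read position by position.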
#5 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
You're going to want to resync them before you do anything else. Google "paired-end fastq sync" for a plethora of solutions.
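A simple way to resync is to keep only the reads whose name appears in both files. The minimal Python sketch below assumes that mates were only dropped, not reordered, and that the header text before the first space identifies the pair (as in Casava 1.8 headers), optionally ending in /1 or /2 (older Illumina style). `resync` and `read_name` are hypothetical helpers. The sketch holds one file's read names in a Python set, which for a couple of hundred million reads needs a lot of memory; approaches that sort both files by read name and then join them trade that memory for disk space.

```python
from itertools import islice
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, ...]]:
    """Yield 4-line FASTQ records (newlines kept); assumes no line wrapping."""
    while True:
        rec = tuple(islice(handle, 4))
        if not rec:
            return
        if len(rec) != 4 or not rec[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield rec


def read_name(header: str) -> str:
    """Pair identifier: header text before the first whitespace, without
    the leading '@' and any trailing /1 or /2."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def resync(r1_path: str, r2_path: str, out1_path: str, out2_path: str) -> int:
    """Write only reads whose mate is present in the other file.

    Returns the number of pairs kept.
    """
    with open(r2_path) as r2:
        names2 = {read_name(rec[0]) for rec in read_fastq(r2)}
    kept = set()
    with open(r1_path) as r1, open(out1_path, "w") as out1:
        for rec in read_fastq(r1):
            name = read_name(rec[0])
            if name in names2:
                out1.writelines(rec)
                kept.add(name)
    del names2  # free memory before the second pass over R2
    with open(r2_path) as r2, open(out2_path, "w") as out2:
        for rec in read_fastq(r2):
            if read_name(rec[0]) in kept:
                out2.writelines(rec)
    return len(kept)
```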
#6 |
Member
Location: US Join Date: Oct 2011
Posts: 47
|
![]()
So I ran the following Perl script:
https://bitbucket.org/jesseerdmann/f...ync-2-files.pl
and with "quick" mode it reports that the two files passed the full check, which means they are in sync:
"QUICK CHECK enabled Casava 1.8 read id style PASSED full check"
But before I randomly select reads from the two files, shouldn't I make them the same size, as the results of your Google search suggest? Since R1 is bigger than R2, I assume that even though they are in sync, they are only in sync for the reads the two files have in common. Am I right?
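Two files can only be fully in sync if they hold the same number of records. A complete check therefore has to confirm two things: that each record's read name matches its mate, and that neither file still has records left once the other ends. Below is a minimal Python sketch of such a check, written independently of the Perl script above. It assumes the same header conventions as before: the name is the text before the first space, with an optional /1 or /2 suffix. `check_sync` is a hypothetical helper.

```python
from itertools import islice, zip_longest
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, ...]]:
    """Yield 4-line FASTQ records (newlines kept); assumes no line wrapping."""
    while True:
        rec = tuple(islice(handle, 4))
        if not rec:
            return
        if len(rec) != 4 or not rec[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield rec


def read_name(header: str) -> str:
    """Header text before the first whitespace, minus '@' and any /1 or /2."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def check_sync(r1: TextIO, r2: TextIO) -> int:
    """Return the number of pairs if the files are in sync, else raise ValueError."""
    n = 0
    for rec1, rec2 in zip_longest(read_fastq(r1), read_fastq(r2)):
        if rec1 is None or rec2 is None:
            longer = "R1" if rec2 is None else "R2"
            raise ValueError(f"{longer} has extra records after {n} pairs")
        if read_name(rec1[0]) != read_name(rec2[0]):
            raise ValueError(f"read names differ at pair {n + 1}")
        n += 1
    return n
```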
#7 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
I've never seen that perl script, so I can't say that it works correctly. If you follow the instructions from this thread on biostars (Pierre's comment first, followed by Steffi's), you'll get two synchronized files of the same size.
#8 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
I'll add that this issue of paired-end files having different numbers of reads usually only crops up when the two mates of a pair are trimmed separately. If that's the case here and you did the trimming yourself, your life will be easier if you use a different trimmer next time (trimmomatic and trim_galore are common choices).
#9 |
Member
Location: US Join Date: Oct 2011
Posts: 47
Ignore my previous message. My files are the same size when I use the "du -b" command.
#10 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
As long as they're also the same when you use "wc -l", things are OK.
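Equal sizes from `du -b` do not by themselves guarantee equal read counts, since reads in the two files can have different lengths (for example after trimming). The line count is the more direct check because a 4-line FASTQ file has exactly four lines per read. Below is a minimal Python equivalent of dividing `wc -l` by 4, assuming uncompressed, non-wrapped FASTQ; `count_reads` is a hypothetical helper. One small difference: `wc -l` counts newline characters, so it reports one line fewer than this function if the file's last line has no trailing newline.

```python
def count_reads(path: str) -> int:
    """Number of records in an uncompressed 4-line FASTQ file."""
    with open(path, "rb") as fh:
        n_lines = sum(1 for _ in fh)  # counts lines, including an unterminated last one
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Comparing `count_reads` on R1 and R2 should give the same number for synchronized files.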
Tags |
random fastq |