Old 01-05-2012, 12:31 PM   #1
Junior Member
Location: USA

Join Date: Dec 2011
Posts: 5
Illumina Paired End FASTQ

Hi guys,

I'm very new to this. I have a paired-end data set from a HiSeq 1000. I want to take the first 10,000 or 100,000 reads out of ~40 million to use for tests, rather than putting all 40 million reads through the tests.

What is the easiest way to generate files containing only the first 100,000 reads?

kjsalimian
Old 01-05-2012, 01:18 PM   #2
Senior Member
Location: San Diego

Join Date: May 2008
Posts: 912

I'm not really familiar with HiSeq data formats, but there are probably some kind of coordinates, maybe by tile. If the data below from the checked answer is correct, you could use grep to get only the entries with '[sequencer name]_[run]:[lane]:10:'. That'd be fairly random.

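Or simpler, head -400000 to get the top 100k reads? FASTQ uses four lines per read, so 400,000 lines is 100,000 reads. Run it on both files of the pair so the mates stay matched up.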

The reads at the very beginning are probably from the edge of the flow cell and will include more low-quality reads. Pulling from the middle might get you more good reads.
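
In case it helps, here is a minimal Python sketch of the head idea that also covers pulling from the middle. It assumes plain four-line FASTQ records and that both files of the pair list their mates in the same order. The function names (open_fastq, copy_reads, subset_pair) and the file names in the usage comment are made up for illustration.

```python
import gzip
from itertools import islice


def open_fastq(path, mode="rt"):
    """Open a FASTQ file as text, handling .gz by file extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def copy_reads(src, dst, n_reads, skip_reads=0):
    """Copy n_reads records from src to dst after skipping skip_reads records.

    Assumes every record is exactly four lines (no wrapped sequences).
    """
    start = 4 * skip_reads
    stop = start + 4 * n_reads
    with open_fastq(src) as fin, open_fastq(dst, "wt") as fout:
        fout.writelines(islice(fin, start, stop))


def subset_pair(r1, r2, out1, out2, n_reads, skip_reads=0):
    """Take the same slice from both files so mates stay paired.

    Assumes R1 and R2 list their reads in the same order.
    """
    copy_reads(r1, out1, n_reads, skip_reads)
    copy_reads(r2, out2, n_reads, skip_reads)


# Hypothetical usage: 100,000 pairs starting 10 million reads into the files.
# subset_pair("sample_1.fastq", "sample_2.fastq",
#             "test_1.fastq", "test_2.fastq",
#             n_reads=100_000, skip_reads=10_000_000)
```

With skip_reads=0 this does the same thing as head -400000 on each file.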
swbarnes2
Old 01-05-2012, 01:19 PM   #3
Devon Ryan
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Have a look at this thread. You'll often find that the first batch of reads in a file is crap (at least from some machines), so you're better off selecting reads at random.
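
For what it's worth, one way to do the random selection is reservoir sampling over read pairs, which needs only a single pass and never holds more than k pairs in memory. This is a rough sketch, not necessarily what the linked thread recommends. It assumes uncompressed four-line FASTQ records and that both files list their mates in the same order. The names read_records and sample_pairs are invented here.

```python
import random


def read_records(handle):
    """Yield FASTQ records as tuples of four lines; stop at end of file."""
    while True:
        rec = tuple(handle.readline() for _ in range(4))
        if not rec[0]:
            return
        yield rec


def sample_pairs(r1_path, r2_path, out1_path, out2_path, k, seed=None):
    """Write k randomly chosen read pairs and return the number written.

    Uses reservoir sampling (Algorithm R), so every pair has the same
    chance of being kept. Assumes R1 and R2 are in the same order.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(r1_path) as f1, open(r2_path) as f2:
        pairs = zip(read_records(f1), read_records(f2))
        for i, pair in enumerate(pairs):
            if i < k:
                reservoir.append(pair)
            else:
                # Keep the new pair with probability k / (i + 1).
                j = rng.randint(0, i)
                if j < k:
                    reservoir[j] = pair
    # Both outputs are written from the same list, so mates stay in sync.
    with open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        for rec1, rec2 in reservoir:
            o1.writelines(rec1)
            o2.writelines(rec2)
    return len(reservoir)


# Hypothetical usage:
# sample_pairs("sample_1.fastq", "sample_2.fastq",
#              "rand_1.fastq", "rand_2.fastq", k=100_000, seed=42)
```

The output pairs are not in their original file order. Passing a seed makes the selection repeatable.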
dpryan

All times are GMT -8.