SEQanswers

01-05-2012, 11:31 AM   #1
kjsalimian
Junior Member
 
Location: USA

Join Date: Dec 2011
Posts: 5
Illumina Paired End FASTQ

Hi guys,

I'm very new to this. I have a paired end data set from HiSeq1000. I want to take the first 10,000 or 100,000 reads out of ~40mil reads to use for tests rather than putting the entire 40 mil reads through the tests.

What is the easiest way to generate files containing only the first 100,000 reads?

Thanks.
01-05-2012, 12:18 PM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

I'm not really familiar with HiSeq data formats, but there are probably some kind of coordinates in the read names, maybe by tile. If the data below from the checked answer is correct, you could use grep to get only the entries with '[sequencer name]_[run]:[lane]:10:'. That'd be a fairly random subset.
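
A rough Python sketch of the same idea, in case grep on its own is awkward (it only returns the matching header line, not the other three lines of the record). It assumes plain 4-line FASTQ records with no line wrapping; `filter_by_header` and the pattern in the usage note are made-up names for illustration, not anything HiSeq-specific.

```python
from itertools import islice


def filter_by_header(lines, pattern):
    """Yield every line of each 4-line FASTQ record whose header contains pattern.

    Assumption: each record is exactly 4 lines (header, sequence, '+', quality).
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return  # end of input (or a truncated trailing record)
        if pattern in record[0]:
            yield from record
```

You would call it on an open file, e.g. `out.writelines(filter_by_header(src, "MACHINE_7:1:10:"))`, where that pattern is only a placeholder for whatever your read names actually look like. For paired-end data, run it with the same pattern on both files; as long as the mates share the same name prefix and the files are in the same order, the outputs should stay in sync.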

Or simpler, head -400000 on each of the two files to get the top 100k reads (FASTQ uses 4 lines per read)?

The reads at the very beginning are probably from the edge of the flow cell, and will include more poor-quality reads. Pulling from the middle might get you more good reads.
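
Here is a minimal Python sketch of both options (first N reads, or N reads starting somewhere in the middle). It assumes uncompressed FASTQ with exactly 4 lines per read; the function names and the file names in the usage note are hypothetical.

```python
from itertools import islice


def take_reads(lines, n_reads, skip_reads=0):
    """Return an iterator over the lines of n_reads records, after skipping skip_reads.

    Assumption: every FASTQ record is exactly 4 lines.
    """
    start = 4 * skip_reads
    return islice(lines, start, start + 4 * n_reads)


def subset_fastq(in_path, out_path, n_reads, skip_reads=0):
    """Write the selected block of reads from in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(take_reads(src, n_reads, skip_reads))
```

For paired-end data, call `subset_fastq` once per file with the same `n_reads` and `skip_reads`, e.g. `subset_fastq("reads_1.fastq", "test_1.fastq", 100000, skip_reads=10000000)` and the same for `reads_2.fastq`. Since both files list the mates in the same order, taking the same line range from each keeps the pairs matched.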


http://biostar.stackexchange.com/que...d-naming-tiles
01-05-2012, 12:19 PM   #3
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

Have a look at this thread. You'll often find that the first batch of reads in a file are crap (at least from some machines), so you're better off randomly selecting them.
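
One way to do the random selection while keeping mates together is reservoir sampling over the two files read in lockstep. This is only a sketch under some assumptions (4-line records, both files in the same read order), not necessarily what the linked thread does; `read_records` and `sample_pairs` are names made up here.

```python
import random
from itertools import islice


def read_records(lines):
    """Yield 4-line FASTQ records as tuples of lines."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if len(record) < 4:
            return  # end of input (or a truncated trailing record)
        yield record


def sample_pairs(lines1, lines2, n_pairs, seed=0):
    """Pick n_pairs read pairs uniformly at random in a single pass.

    Uses reservoir sampling (Algorithm R), so only n_pairs pairs are held
    in memory. Assumes mate i in lines1 matches mate i in lines2.
    """
    rng = random.Random(seed)
    reservoir = []
    pairs = zip(read_records(lines1), read_records(lines2))
    for i, pair in enumerate(pairs):
        if i < n_pairs:
            reservoir.append(pair)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n_pairs:
                reservoir[j] = pair
    return reservoir
```

Each element of the result is `(record_from_file_1, record_from_file_2)`, so you can write the first to one output with `writelines` and the second to the other. The sampled pairs do not come back in file order, which should not matter for a test subset as long as both outputs are written in the same order.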