SEQanswers





02-11-2011, 01:13 PM   #1
kerhard
Member
 
Location: Oakland

Join Date: Feb 2011
Posts: 27
Hello and a Question: 50 or 100 bp reads?

Greetings all,

I'm a 'senior' grad student at UCB working on a maize genetics/epigenetics project. I've prepared a couple of libraries that we are planning to have sequenced here on one of our campus facility's nice new HiSeq 2000 machines! I'm validating them right now by small-scale cloning, but from the size of most of the inserts, it looks like they are exactly what we expected, so all systems are go.

I'm quite new to this whole deep sequencing technique, but I'm very excited to start the learning process of how to analyze these data sets! On advice from this excellent post (http://seqanswers.com/forums/showthr...good+computers), which explains my situation exactly, I am slowly but surely working through the Unix and Perl for Biologists primer (http://korflab.ucdavis.edu/Unix_and_Perl/). Hopefully I'll have at least a novice understanding of programming by the time we get our reads.

But more importantly, a question: Should I get 50 or 100 bp reads for these libraries?

Here are some details and issues that we are dealing with:

The libraries were prepared using the small RNA adapters, so they will have to be sequenced as single reads. Our main goal is to compare the two libraries, which represent two biological samples (WT vs. mutant), quantitatively, so getting fairly deep coverage is important to our analysis. However, we are working with the highly repetitive maize genome, so we also want to maximize the number of reads we can unambiguously map to the genome. In fact, reads that contain repetitive sequence AND unique sequence (e.g., the insertion site of a transposon or other repeat into a unique genomic region) may be of particular interest, so capturing as many of these sites as possible would be super. I'm guessing that longer reads would help in this respect.

From the Bioanalyzer traces for the libraries, it looks like the most *abundant* inserts are ~75 and ~56 bp, i.e., that's where the peaks are. The insert size range is ~30-230 bp though (I cut out the 100-300 bp region of the gel). Does the range really matter here? What percentage of the reads we get can we expect to come from the 75 bp and 56 bp inserts? And for the larger inserts that we capture, can we expect decent enough coverage to be able to compare the two libraries at a particular region?
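To make the insert-size question concrete, here is a minimal sketch of how much of a single read is insert and how much runs into the 3' adapter, for the insert sizes mentioned above. It assumes the read starts at the first base of the insert and that whatever follows the insert is adapter; `read_composition` is a hypothetical helper written for illustration, not part of any existing tool.

```python
def read_composition(insert_len, read_len):
    """Split a single-end read into insert-derived and adapter-derived bases.

    Assumption (for illustration only): sequencing starts at the first base
    of the insert, and every cycle past the end of the insert reads adapter.
    """
    insert_bases = min(insert_len, read_len)
    adapter_bases = read_len - insert_bases
    return insert_bases, adapter_bases


# Insert sizes taken from the post: range ends (~30, ~230) and peaks (~56, ~75).
for insert_len in (30, 56, 75, 230):
    for read_len in (50, 100):
        insert_bases, adapter_bases = read_composition(insert_len, read_len)
        print(
            f"insert {insert_len:>3} bp, read {read_len:>3} bp: "
            f"{insert_bases} insert bases, {adapter_bases} adapter bases"
        )
```

Under these assumptions, a 50 bp read never covers a 56 bp or 75 bp insert end to end, while a 100 bp read covers both fully and spends the remaining cycles on adapter.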

I would probably just go with 100 bp reads by default, but I'm wondering: in people's experience, is coverage significantly reduced when read length is increased?

It looks like there are many programs out there which recognize and trim adapter sequences from Illumina reads, for the reads that sequence INTO the 3' adapters. So it seems like that wouldn't be TOO big of a problem.
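As a rough illustration of what such trimming involves, here is a minimal sketch using exact matching only. The adapter string is a made-up placeholder (not the actual small RNA adapter used for these libraries), and `trim_3prime_adapter` and `min_overlap` are hypothetical names; real trimming programs typically also tolerate sequencing errors, which this does not.

```python
def trim_3prime_adapter(read, adapter, min_overlap=5):
    """Remove a 3' adapter from a read using exact matching only.

    Handles two cases:
      1. the full adapter appears somewhere in the read;
      2. the read ends partway through the adapter (a prefix of the adapter
         of at least `min_overlap` bases sits at the 3' end of the read).
    Reads with no detectable adapter are returned unchanged.
    """
    # Case 1: the whole adapter is present; keep everything before it.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]

    # Case 2: only the start of the adapter fits before the read ends.
    # Try the longest possible overlap first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]

    return read


# Placeholder sequences, invented for this example.
ADAPTER = "ACGTTGCAAGCT"
INSERT = "GGGGCCCCTTTTAAAA"

print(trim_3prime_adapter(INSERT + ADAPTER + "TTT", ADAPTER))  # full adapter in read
print(trim_3prime_adapter(INSERT + ADAPTER[:6], ADAPTER))      # read ends inside adapter
print(trim_3prime_adapter(INSERT, ADAPTER))                    # no adapter present
```

The `min_overlap` cutoff is a trade-off: a short overlap catches more partial adapters but also clips genuine insert bases that happen to match the start of the adapter.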

Any advice/help on this would be much appreciated!




All times are GMT -8.

