SEQanswers

Old 06-02-2010, 11:25 AM   #1
Gen2007
Member
 
Location: St. Louis

Join Date: Jun 2010
Posts: 10
Novoalign options

Hello Group,

I've been trying to find the optimal parameters for aligning bisulfite-treated 76 bp PE reads to a small reference that includes repeats. I'm confused by the option that sets the fragment length and standard deviation. My question is: what exactly does the fragment length refer to? Is it the distance between the mapped mates, or does it include the reads?

The library I am using was size-selected for adapter-ligated fragments of 250-300 bp (131-181 bp once the adapters are subtracted). Given 76 bp reads, some mates could overlap. If the fragment length refers to the distance between the aligned reads, then this value could be negative. Anyone want to clarify?
Thanks a lot!
Old 09-27-2010, 07:13 PM   #2
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126

Hi, just found your post. The fragment length mean and standard deviation refer to outer coordinates. It doesn't matter if you set them a bit higher than the real values.
Colin
Old 09-28-2010, 09:06 AM   #3
Gen2007

So by outer coordinates you mean the most distal coordinates of an adapter-ligated molecule, i.e. the average length of the library that you would see on a gel after PCR amplification?

For the library I mentioned, I used 150 bp as the mean fragment length since I assumed it was the distance between reads, and the alignment worked pretty well. Is it more computationally expensive to use a larger fragment length setting?
Old 09-28-2010, 06:42 PM   #4
sparks

It's the length of the DNA fragment as mapped by the aligner, so it doesn't include any adapters. It's not the length of the gap between the two read alignments.
If the size you measured on the gel includes PCR adapters, you need to subtract them.
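
To make the distinction concrete, here is a minimal Python sketch of the arithmetic, using the numbers from the first post (76 bp reads, 131-181 bp fragments after subtracting adapters). The function names and the 1-based inclusive coordinate convention are assumptions made for illustration; they are not part of Novoalign.

```python
def fragment_length(left_start, right_end):
    """Outer-coordinate fragment length: leftmost mapped base of one mate
    to rightmost mapped base of the other (1-based, inclusive; assumed)."""
    return right_end - left_start + 1


def inner_gap(fragment_len, read_len):
    """Gap between the two read alignments; negative when the mates overlap."""
    return fragment_len - 2 * read_len


# A pair spanning reference positions 100..230 is a 131 bp fragment.
print(fragment_length(100, 230))  # 131

# 76 bp reads on fragments from the library described above.
for frag in (131, 152, 181):
    print(frag, inner_gap(frag, 76))  # gaps: -21, 0, 29
```

So the fragment length stays positive even when the mates overlap; only the inner gap goes negative, and that is not the quantity the option asks for.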

It doesn't really affect computation, as Novoalign adjusts to the fragment lengths it sees, but setting it too short may mean reads don't get paired properly.
It's not a major issue, as the range for pairs is from 0 to mean + 6 standard deviations, so it's usually enough.
You can also run Novoalign on a few thousand reads and check the reported fragment length distribution. Use the -# option to limit the number of reads processed, e.g. -# 2K will map 2000 reads and stop.
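
As a rough illustration of the pairing range, the Python sketch below computes the 0 to mean + 6 standard deviations window and estimates a mean and standard deviation from a handful of fragment lengths. The standard deviation of 10 and the list of observed lengths are made-up values for the example, not output from a real run, and the function names are hypothetical.

```python
from statistics import mean, stdev


def pairing_range(frag_mean, frag_sd, n_sd=6):
    """Range of fragment lengths accepted as a pair: 0 to mean + n_sd * sd."""
    return 0, frag_mean + n_sd * frag_sd


def in_pairing_range(fragment_len, frag_mean, frag_sd):
    """True if a fragment of this length falls inside the pairing range."""
    low, high = pairing_range(frag_mean, frag_sd)
    return low <= fragment_len <= high


# Mean of 150 as used in the thread; the SD of 10 is an assumed value.
print(pairing_range(150, 10))
print(in_pairing_range(181, 150, 10))

# Illustrative fragment lengths, standing in for what a short pilot run
# might report; use the real distribution from your own run instead.
observed = [131, 140, 150, 160, 181]
print(round(mean(observed), 1), round(stdev(observed), 1))
```

Estimating the two values from a pilot run and then rounding them up a little follows the advice above that a setting that is too short is the riskier choice.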
Old 09-28-2010, 08:34 PM   #5
Gen2007

Great, that makes a lot of sense. I actually meant "reads" instead of "adapters." Of course in Illumina sequencing the read begins before the adapter since the sequencing primer anneals to the adapter... I'm glad you caught that. I appreciate all the help. Thanks a lot!
Tags
novoalign, options

All times are GMT -8.