SEQanswers

Old 12-06-2012, 03:20 AM   #1
drdna
Member
 
Location: Kentucky

Join Date: May 2012
Posts: 76
Newbler outputting garbage

I am wondering if anyone else has seen the following problem. I am using RNA-seq to identify viral sequences among transcripts from an infected plant. Assembly of a single chip of Ion Torrent data was taking far too long (> 1 week, but that's a separate issue), so I decided to use BLAST to pull out the reads matching short pieces of reference sequence from the virus in question. The matching reads were then assembled with Newbler 2.8. Three contigs were produced. The largest was 678 bp long, contained most of the reads going into the assembly (3,700 of 4,500) and was reported as high quality (most bases were Phred 64). The problem is that none of the resulting contigs match the viral sequences used to identify the constituent reads. Nor do any of the constituent reads match the assembly!

Most of the reads had very good matches to the reference (< 3 mismatches), and assembly with phrap (using reads generated with sffinfo) produced contigs with lower reported quality values but which DO match the viral references. So Newbler is outputting assemblies that it reports as high quality but which are, in fact, complete garbage.
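
For anyone wanting to run the same sanity check on their own assembly, here is a minimal sketch of one way to test whether a contig is supported by its input reads at all. It is not what was used above (that was BLAST): it simply counts how many of the contig's k-mers occur in at least one read on either strand. The function names and the default k=21 are assumptions chosen for illustration, and the sequences are expected as plain strings already parsed from FASTA.

```python
def reverse_complement(seq):
    """Return the reverse complement; any non-ACGT base becomes N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def contig_support(contig, reads, k=21):
    """Fraction of the contig's distinct k-mers found in at least one read.

    Both strands of every read are considered. A value near 0 means the
    contig shares essentially nothing with the reads it was supposedly
    built from.
    """
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
        read_kmers |= kmers(reverse_complement(read), k)
    contig_kmers = kmers(contig, k)
    if not contig_kmers:
        return 0.0
    return len(contig_kmers & read_kmers) / len(contig_kmers)
```

A real contig should score close to 1.0 against its own reads, even allowing for sequencing errors, so a score near zero would point to the kind of problem described above.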

Last edited by drdna; 12-06-2012 at 05:15 AM.
Old 12-10-2012, 04:29 AM   #2
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Maybe you are giving newbler too many reads? How many bases are you putting in, and how many do you expect in your contigs? In other words, what is the coverage of the raw reads you feed into newbler?
Old 12-10-2012, 09:52 AM   #3
drdna
Member
 
Location: Kentucky

Join Date: May 2012
Posts: 76

Coverage is approx. 50-fold, so not overly high. The main problem is that the contigs produced have ZERO similarity to the input reads. I could understand it if too much coverage produced too many contigs, or redundancy in the assembly due to experimentally induced errors, but no similarity whatsoever?
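
For reference, fold coverage here just means total input bases divided by the expected length of the assembled sequence. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and the read lengths would come from your own read files.

```python
def fold_coverage(read_lengths, expected_length):
    """Raw fold coverage: total read bases / expected assembly length (bp)."""
    if expected_length <= 0:
        raise ValueError("expected_length must be a positive number of bases")
    return sum(read_lengths) / expected_length


# Illustrative numbers only, not taken from this thread:
# 50 reads of 100 bases each over a 100 bp target.
example = fold_coverage([100] * 50, 100)
```

The read lengths and target length in the example are placeholders, not the actual values from this data set.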
Old 12-17-2012, 11:37 PM   #4
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Well, then I don't know either. I'd stick to your phrap results...
Old 12-18-2012, 06:23 AM   #5
yaximik
Senior Member
 
Location: Oregon

Join Date: Apr 2011
Posts: 205

Both gsMapper and gsAssembler v2.7 are buggy, and 454 is either unable to fix them or has no will/resources to do so. I gave up on 2.7 and went back to 2.5p1.