SEQanswers

Old 04-22-2011, 08:42 AM   #1
flobpf
Member
 
Location: USA

Join Date: Apr 2010
Posts: 76
Split mate pair 454 for newbler

Hi,

I have a bunch of 454 3kb mate pair files in SFF format which I'd like to use to get a Newbler assembly. However, I want Newbler to use only those reads which have a linker sequence. Is it possible to provide such parameters to Newbler?

Thanks
Old 04-27-2011, 11:39 PM   #2
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

The short answer is 'no'. Newbler checks each read for the presence of the linker and splits the reads that have one, but it still uses the reads without a linker as shotgun reads.

One way to achieve what you want would be to do a regular newbler assembly, extract the IDs of the reads containing the linker from 454PairStatus.txt (only reads with linkers are listed there), put these IDs in a text file, and use the -fi option with this file to have newbler assemble only those reads.
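
The ID-extraction step could be sketched in Python roughly as follows. This is a minimal sketch, not something from the Newbler documentation: it assumes 454PairStatus.txt is tab-delimited, starts with a single header line, and carries the read ID in its first column. The function name template_ids and the file names in the usage comment are made up for illustration.

```python
from typing import Iterable, List


def template_ids(lines: Iterable[str], has_header: bool = True) -> List[str]:
    """Collect unique read IDs from the first tab-separated column.

    Assumption: the first non-blank line is a header (skipped when
    has_header is True) and the first column holds the read ID.
    """
    seen = set()
    ids: List[str] = []
    header_pending = has_header
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if header_pending:
            header_pending = False  # drop the header line once
            continue
        read_id = line.split("\t")[0]
        if read_id not in seen:
            seen.add(read_id)
            ids.append(read_id)
    return ids


# Hypothetical usage (paths are placeholders):
# with open("454PairStatus.txt") as src, open("linker_reads.txt", "w") as out:
#     out.write("\n".join(template_ids(src)) + "\n")
```

The resulting text file has one ID per line and is what would then be handed to the -fi option.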
Old 04-28-2011, 06:58 AM   #3
flobpf
Member
 
Location: USA

Join Date: Apr 2010
Posts: 76

Hi Flxlex,

Thanks for your response. That is what I'm attempting to do. However, my Newbler run is taking a really long time (100 hours, 30 GB, and still only 4% complete!!). I have 7 plates of 3kb mate pairs.

I was thinking of the following alternate approach. Would that work??
1) Use sff_extract to identify "linkered" sequences
2) Split them into .f and .r based on linkers and quality-clip sequences
3) Generate FASTQ files from only the seq with .f and .r
4) Convert FASTQ to FASTA
5) Use FASTA as input to Newbler.

Would be glad to know if that'd work. Also, is there a way to speed up my Newbler run? I'm using the steps mentioned in your post here and here:


Thanks for your help!

Last edited by flobpf; 04-28-2011 at 07:00 AM. Reason: added another link.
Old 04-28-2011, 07:46 AM   #4
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,148

To me that method seems overly complicated, and you would lose one of the advantages of Newbler, namely performing its alignments in "flow-space" rather than "base-space". The major problem now seems to be getting Newbler to complete the first assembly so that you can generate a list of reads which are truly paired to pass to a second Newbler assembly. I would suggest an alternate method of identifying the paired reads.

1. Dump FASTA format sequence files from your SFF files using the Roche sffinfo tool.

2. Using your favorite nucleotide pattern matching program (cross_match, SSAHA2, fuzznuc (EMBOSS)) search the FASTA files for reads containing the PE linker sequence.

3. Save the list of accessions for reads with the PE linker to a text file.

4. Use this text file with the -fi option as described above.
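
Steps 2 and 3 could be sketched in Python along these lines. It is only a rough illustration: it uses plain exact substring matching on both strands, which is a simplification swapped in for the tools named above (cross_match, SSAHA2 and fuzznuc can tolerate mismatches, and 454 reads often carry homopolymer errors, so exact matching will miss some linkers). The function names and the file names in the usage comment are hypothetical, and the linker sequence has to be supplied by you.

```python
from typing import Iterable, Iterator, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a nucleotide string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (accession, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""  # accession = first token
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def reads_with_linker(lines: Iterable[str], linker: str) -> List[str]:
    """Return accessions whose sequence contains the linker on either strand."""
    fwd = linker.upper()
    rev = reverse_complement(fwd)
    hits = []
    for name, seq in read_fasta(lines):
        seq = seq.upper()
        if fwd in seq or rev in seq:
            hits.append(name)
    return hits


# Hypothetical usage (file names are placeholders, LINKER is your linker):
# with open("reads.fna") as src, open("linker_reads.txt", "w") as out:
#     out.write("\n".join(reads_with_linker(src, LINKER)) + "\n")
```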

This is really just a modification of the method you are currently trying but using, perhaps, a faster method of identifying the paired reads.

I am a little surprised, though, by how long Newbler is taking, and if a significant fraction of your reads are truly paired (i.e. you won't be eliminating the majority of your input reads), it may still stump Newbler.
Old 04-28-2011, 07:59 AM   #5
flobpf
Member
 
Location: USA

Join Date: Apr 2010
Posts: 76
Thanks!

Ah Kevin.

Thanks. That's actually way simpler. Will do it that way.
Old 05-01-2011, 11:25 PM   #6
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Hi,

If you have an assembly running, you will notice that after the phase where newbler reads all the sequence files, there is a 454ReadStatus.txt file. This file can be used, even if the assembly is not yet finished, to get at the reads with the linker: these will be marked _left and _right. Saves you from having to do the mapping of the linker yourself...
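
A rough Python sketch of pulling those reads out of 454ReadStatus.txt is given here. It assumes the accession is the first whitespace-separated field on each line and that split reads show up with a _left or _right suffix, as described above. The function name and file names are made up, and whether -fi wants the ID with or without the suffix is something to check against your own run.

```python
from typing import Iterable, List

_SUFFIXES = ("_left", "_right")


def linkered_read_ids(lines: Iterable[str]) -> List[str]:
    """Return unique base read IDs for accessions ending in _left or _right.

    Assumption: the accession is the first whitespace-separated field.
    """
    seen = set()
    ids: List[str] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        accession = fields[0]
        for suffix in _SUFFIXES:
            if accession.endswith(suffix):
                base = accession[: -len(suffix)]  # strip the half marker
                if base and base not in seen:
                    seen.add(base)
                    ids.append(base)
                break
    return ids


# Hypothetical usage (paths are placeholders):
# with open("454ReadStatus.txt") as src, open("linker_reads.txt", "w") as out:
#     out.write("\n".join(linkered_read_ids(src)) + "\n")
```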

About the assembly speed: have you tried using more CPUs (with the -cpu flag) and the -large option?
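
For illustration, a small Python helper that puts the options from this thread (-cpu, -large, -fi) on one command line might look like the sketch below. The runAssembly executable name, the -o output-directory flag and the argument order are assumptions on my part and should be checked against your Newbler version; the helper name and all paths are hypothetical.

```python
from typing import List, Optional, Sequence


def build_run_assembly_cmd(
    sff_files: Sequence[str],
    out_dir: str,
    cpus: int,
    id_file: Optional[str] = None,
    large: bool = True,
) -> List[str]:
    """Build an argument list for a Newbler run (nothing is executed here)."""
    cmd = ["runAssembly", "-o", out_dir, "-cpu", str(cpus)]  # assumed names
    if large:
        cmd.append("-large")
    if id_file:
        cmd += ["-fi", id_file]  # restrict the assembly to the listed read IDs
    cmd += list(sff_files)
    return cmd


# Hypothetical usage:
# import subprocess
# subprocess.run(build_run_assembly_cmd(["plate1.sff"], "asm_out", 8,
#                                       id_file="linker_reads.txt"), check=True)
```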
Old 05-02-2011, 10:21 AM   #7
flobpf
Member
 
Location: USA

Join Date: Apr 2010
Posts: 76
Solved!

Hi Flxlex,

Thanks for your response. I did provide it with the -cpu option and the -large option. That made all the difference, and my assembly finished at blazing speed.

Tags
454, mate pair, newbler

All times are GMT -8.