SEQanswers

Old 01-05-2011, 06:12 AM   #1
avtsanger
Junior Member
 
Location: Cambridge, UK

Join Date: Jun 2010
Posts: 8
Calculating the number of contigs in a scaffold file

I am trying to calculate the number of contigs in a scaffold file, i.e. a consensus sequence in which the contigs are separated by runs of n's. I have been working on an assembly generated by Newbler and have closed some of the gaps computationally or experimentally. I need to know how many contigs are left in each scaffold. Could anyone point me in the right direction?
Old 01-05-2011, 06:28 AM   #2
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356

The file 454Scaffolds.txt generated by Newbler has the information you need.

See http://contig.wordpress.com/2010/03/...-file/#more-56 for more information.
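As a rough sketch of how that file could be used: this assumes 454Scaffolds.txt follows the AGP layout (whitespace-separated, scaffold name in column 1, component type in column 5, with W marking a contig and N marking a gap). The function name is made up for illustration, so check the column layout against your own file before relying on it.

```shell
# Hypothetical helper: count contig (type W) lines per scaffold in an
# AGP-style file. Assumes column 1 = scaffold name, column 5 = component type.
contigs_per_scaffold_agp() {
  awk '$5 == "W" { n[$1]++ }
       END { for (s in n) print s "\t" n[s] }' "$@" | sort
}
```

Called as `contigs_per_scaffold_agp 454Scaffolds.txt`, it prints one line per scaffold with the scaffold name and its contig count, separated by a tab.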
Old 01-05-2011, 07:07 AM   #3
avtsanger
Junior Member
 
Location: Cambridge, UK

Join Date: Jun 2010
Posts: 8

I should clarify: the assembly was imported into gap4 and worked on by joining contigs to the scaffold consensus and closing gaps computationally or experimentally. I can save out the updated consensus files, but these will still contain n's because of the scaffold sequence that I joined in. I need a way of calculating the number of contigs, i.e. the number of sequences separated by n's, in this file.
Old 01-05-2011, 09:40 AM   #4
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Assuming you are on a Unix-type system, one answer is to use the 'tr' command along with 'sed' and 'wc'. First replace each FASTA header line with a single 'n'. Then get rid of the newlines. Then squeeze each run of 'n's down to a single character. Finally delete all the non-n characters and count the remaining n's. Each gap contributes one 'n' and each header contributes one more, so the count is the number of gaps plus one per scaffold, which is the number of contigs.

sed -e 's/>.*/n/' scaffold.fasta | tr -d '\n' | tr -s 'n' | tr -d 'acgt' | wc -c

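The above assumes the sequence contains only the characters a, c, g, t and n, all in lower case; uppercase letters or ambiguity codes would survive the final 'tr -d' and inflate the count. I suspect there are as many other answers as there are people on this bulletin board.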
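Since the original question asked for the count in each scaffold, here is a minimal per-scaffold sketch along the same lines. It is only one possible approach, not part of the one-liner above: it treats any run of n or N as a gap, counts every other run of characters as a contig, and strips a trailing carriage return from each line. The function name is hypothetical.

```shell
# Hypothetical helper: print "<scaffold name><TAB><contig count>" for each
# record in a FASTA file. A contig is any maximal run of non-N characters.
count_contigs_per_scaffold() {
  awk '
    # Turn every run of n/N into one space, then count the remaining pieces.
    function contigs(s,   parts) {
      gsub(/[Nn]+/, " ", s)
      return split(s, parts, " ")
    }
    { sub(/\r$/, "") }                 # tolerate Windows line endings
    /^>/ {
      if (seen) print name "\t" contigs(seq)
      name = substr($1, 2)             # first word of the header, minus ">"
      seq = ""; seen = 1
      next
    }
    { seq = seq $0 }                   # join wrapped sequence lines
    END { if (seen) print name "\t" contigs(seq) }
  ' "$@"
}
```

Run it as `count_contigs_per_scaffold scaffold.fasta`. Because the sequence lines are joined before counting, a gap that wraps across a line break is still counted once.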
Old 01-05-2011, 08:57 PM   #5
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Quote:
Originally Posted by westerman
I suspect there are as many other answers as there are people on this bulletin board.
Perhaps, but yours is hard to beat for shortness...
Old 01-06-2011, 12:43 AM   #6
avtsanger
Junior Member
 
Location: Cambridge, UK

Join Date: Jun 2010
Posts: 8

Thanks. That's great and gives me the answer I was looking for.