SEQanswers: Calculating the number of contigs in a scaffold file

 01-05-2011, 06:12 AM #1 avtsanger Junior Member   Location: Cambridge, UK Join Date: Jun 2010 Posts: 8 Calculating the number of contigs in a scaffold file I am trying to calculate the number of contigs in a scaffold file, i.e. consensus sequence split into pieces by runs of n's. I have been working on an assembly generated by Newbler and have closed some of the gaps computationally or experimentally. I need to know how many contigs are left in each scaffold. Could anyone point me in the right direction?
 01-05-2011, 06:28 AM #2 nickloman Senior Member   Location: Birmingham, UK Join Date: Jul 2009 Posts: 356 The file 454Scaffolds.txt generated by Newbler has the information you need. See http://contig.wordpress.com/2010/03/...-file/#more-56 for more information.
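For scaffolds still in their original Newbler form, a minimal sketch of pulling per-scaffold contig counts out of 454Scaffolds.txt might look like this. It assumes the file uses the tab-separated AGP layout (scaffold name in column 1, component type in column 5, with N or U marking a gap row); please check that against your own file before relying on it. The function name count_contigs_agp is hypothetical.

```shell
# Hypothetical helper: count contigs per scaffold in an AGP-style file.
# Assumption: tab-separated AGP columns, where column 1 is the scaffold
# name and column 5 is the component type ("N" or "U" marks a gap row;
# anything else, e.g. "W", is treated as a contig).
count_contigs_agp() {
    awk -F'\t' '
        /^#/   { next }                              # skip AGP comment lines
        NF < 5 { next }                              # skip blank or malformed lines
        !($1 in n) { order[++k] = $1; n[$1] = 0 }    # remember first-seen order
        $5 != "N" && $5 != "U" { n[$1]++ }           # count non-gap components
        END { for (i = 1; i <= k; i++) print order[i] "\t" n[order[i]] }
    ' "$1"
}
```

For example, count_contigs_agp 454Scaffolds.txt prints one tab-separated line per scaffold: its name and its contig count.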
 01-05-2011, 07:07 AM #3 avtsanger Junior Member   Location: Cambridge, UK Join Date: Jun 2010 Posts: 8 I should clarify: the assembly was imported into gap4 and worked on by joining contigs to the scaffold consensus and closing gaps computationally or experimentally. I can save out the updated consensus files, but these will still contain n's from the scaffold sequence that I joined in. I need a way of calculating the number of contigs, i.e. the number of sequences separated by n's, in this file.
 01-05-2011, 09:40 AM #4 westerman Rick Westerman   Location: Purdue University, Indiana, USA Join Date: Jun 2008 Posts: 1,104 Assuming you are on a Unix-type system, one answer is to use the 'tr' command along with 'sed' and 'wc'. First replace each FASTA header with a single 'n', so that the start of every scaffold counts like a gap. Then get rid of the newlines. Then squeeze each run of 'n's down to a single character. Finally delete all non-n characters and count the remaining n's. That number is the number of gaps plus one per scaffold, i.e. the total number of contigs.

sed -e 's/>.*/n/' scaffold.fasta | tr -d '\n' | tr -s 'n' | tr -d 'acgt' | wc -c

The above assumes the sequence contains only acgtn in lower case. I suspect there are as many other answers as there are people on this bulletin board.
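Since the original question asked for the count in each scaffold rather than one total, here is a minimal sketch of a per-scaffold variant, written as a shell function wrapping awk. It is not from the thread: the function name count_contigs_fasta is hypothetical, it treats every character other than n/N as sequence, and it takes the scaffold ID to be the header text up to the first space. Unlike the tr pipeline it accepts upper- or lower-case input.

```shell
# Sketch: count contigs per scaffold in a (possibly multi-line) FASTA file.
# A contig is taken to be a maximal run of non-N characters, so leading,
# trailing and consecutive N's never produce empty contigs.
# Case-insensitive: both 'n' and 'N' are treated as gap characters.
count_contigs_fasta() {
    awk '
        function report() {
            if (name != "") print name "\t" count
        }
        /^>/ {
            report()                 # emit the previous scaffold, if any
            name = substr($1, 2)     # scaffold ID: header up to the first space
            count = 0; in_contig = 0
            next
        }
        {
            seq = toupper($0)
            # in_contig carries over between lines, so a contig wrapped
            # across several FASTA lines is still counted only once
            for (i = 1; i <= length(seq); i++) {
                if (substr(seq, i, 1) == "N") { in_contig = 0 }
                else if (!in_contig) { in_contig = 1; count++ }
            }
        }
        END { report() }
    ' "$1"
}
```

For example, count_contigs_fasta scaffold.fasta prints one tab-separated line per scaffold (ID, contig count); piping that into awk '{ s += $2 } END { print s }' gives the total across all scaffolds.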
 01-05-2011, 08:57 PM #5 flxlex Moderator   Location: Oslo, Norway Join Date: Nov 2008 Posts: 415 Quote: Originally Posted by westerman: "I suspect there are as many other answers as there are people on this bulletin board." Perhaps, but yours is hard to beat for brevity...

 01-06-2011, 12:43 AM #6 avtsanger Junior Member   Location: Cambridge, UK Join Date: Jun 2010 Posts: 8 Thanks. That's great and gives me the answer I was looking for.