  • Calculating the number of contigs in a scaffold file

    I am trying to calculate the number of contigs in a scaffold file, i.e. a consensus sequence in which the contigs are separated by runs of n's. I have been working on an assembly generated by Newbler and have closed some of the gaps computationally or experimentally. I need to know how many contigs are left in each scaffold. Could anyone point me in the right direction?

  • #2
    The file 454Scaffolds.txt generated by Newbler has the information you need.

    See http://contig.wordpress.com/2010/03/...-file/#more-56 for more information.



    • #3
      I should clarify: the assembly was imported into gap4 and worked on by joining contigs to the scaffold consensus and closing gaps computationally or experimentally. I can save out the updated consensus files, but these will still contain n's because of the scaffold sequence that I joined in. I need a way to calculate the number of contigs, i.e. the number of sequences separated by n's, in this file.



      • #4
        Assuming you are on a Unix-type system, one answer is to use the 'tr' command along with 'sed' and 'wc'. First replace each fasta header with a single 'n'. Then get rid of the newlines. Then reduce each run of 'n's to a single character. Finally delete all non-n's and count up the remaining n's. That number is the number of gaps plus one (the 'n' that replaced the header supplies the extra one), and thus the number of contigs.

        sed -e 's/>.*/n/' scaffold.fasta | tr -d '\n' | tr -s 'n' | tr -d 'acgt' | wc -c

        The above assumes only acgtn in lower case. I suspect there are as many other answers as there are people on this bulletin board.
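If you want a count per scaffold rather than one total, and want to tolerate upper-case N, something along these lines might work. This is only a sketch, not a tested tool: `count_contigs` is a made-up function name, it assumes a plain FASTA file and a POSIX-style `awk`, and it treats any run of N/n (even a single one) as a gap, so ambiguous bases written as N inside a contig would be counted as gaps too.

```shell
# Hypothetical helper: print "<scaffold name><TAB><contig count>" per record.
# A contig is taken to be a maximal run of non-N characters (case-insensitive).
count_contigs() {
    awk '
        function report(    parts, n) {
            if (!have) return
            # Drop leading/trailing N runs so they do not create empty pieces.
            sub(/^[Nn]+/, "", seq)
            sub(/[Nn]+$/, "", seq)
            # Split the remaining sequence on runs of N; each piece is a contig.
            n = (seq == "") ? 0 : split(seq, parts, "[Nn]+")
            print name "\t" n
        }
        # Header line: flush the previous record, then start a new one.
        /^>/ { report(); have = 1; name = substr($1, 2); seq = ""; next }
        # Sequence line: join lines so gaps spanning line breaks stay one run.
        { seq = seq $0 }
        END { report() }
    ' "$1"
}
```

Calling `count_contigs scaffold.fasta` should print one line per scaffold, using the first word of each header as the name.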



        • #5
          Originally posted by westerman:
          I suspect there are as many other answers as there are people on this bulletin board.
          Perhaps, but yours is hard to beat for shortness...



          • #6
            Thanks. That's great and gives me the answer I was looking for.
