Seqanswers

  • #16
    Yes, sort the lengths in descending order and then calculate N50, e.g.

    the sum of all scaffolds is 1901 and the biggest contig is 1000, so in this case your N50 is 1000, because that one contig alone already covers more than 50% of the total contig size ...

    For N90 you calculate it accordingly, using 90% of the total instead of 50% ..
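The rule described above can be written out as a short derivation. The symbols are introduced only for this illustration: ℓ_1 ≥ ℓ_2 ≥ … ≥ ℓ_n are the contig (or scaffold) lengths sorted in descending order, T is their sum and x is the percentage (50 for N50, 90 for N90). The numbers are the nine scaffold lengths edge lists in post #22, which add up to the 1901 mentioned here.

```latex
T = \sum_{i=1}^{n} \ell_i, \qquad
k^{*} = \min\Big\{ k : \sum_{i=1}^{k} \ell_i \ge \tfrac{x}{100}\, T \Big\}, \qquad
N_x = \ell_{k^{*}}

% N50 with the lengths from post #22 (sorted: 1000, 500, 200, 60, 50, 30, 26, 20, 15)
T = 1901, \quad 0.5\,T = 950.5, \quad \ell_1 = 1000 \ge 950.5 \;\Rightarrow\; N_{50} = 1000

% N90 with the same lengths
0.9\,T = 1710.9, \quad 1000 + 500 + 200 = 1700 < 1710.9, \quad 1700 + 60 = 1760 \ge 1710.9 \;\Rightarrow\; N_{90} = 60
```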

    cheers,
    Sven



    • #17
      Hi sklages,

      Thanks a lot for your explanation.
      I fully understand N50 and N90 now.
      Thanks a lot again ^^



      • #18
        Originally posted by edge
        Hi sklages,
        Thanks for your suggestion.
        I'm facing some problems when I try to work out the N50, N90, minimum contig size, maximum contig size, etc. from the *.contig file in "projectname_d_result".
        The figures I get don't match those in *_info_assembly.txt.
        Do you have any idea how the N50, N90, minimum contig size and maximum contig size in *_info_assembly.txt are calculated?

        I calculate N50 using *_info_contigstats.txt (which gives the same results as using the contigs.fasta file).
        This matches the N50 reported in *_info_assembly.txt (section "All Contigs"!).

        Btw, the padded FASTA output contains the sequences with pads (if there are any pads). In the unpadded sequence all pads have been removed; that is the version usually used for further analysis (but this depends on what you are doing).
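To see how pads inflate lengths, here is a minimal Python sketch that reads FASTA text and reports each contig's padded and unpadded length. It assumes pads are written as `*` (check your own padded file before relying on that); the helper names `read_fasta` and `unpad` and the two-record input are hypothetical, for illustration only.

```python
# Minimal sketch: compare padded vs. unpadded contig lengths.
# ASSUMPTION: pads are written as '*' in the padded FASTA.
PAD = "*"


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def unpad(seq, pad=PAD):
    """Remove pad characters, leaving only the real bases."""
    return seq.replace(pad, "")


# Hypothetical two-contig input, for illustration only.
example = """>contig1
ACGT*ACGT
AC*GT
>contig2
GGCC
"""

for header, seq in read_fasta(example):
    print(header, "padded:", len(seq), "unpadded:", len(unpad(seq)))
```

Here contig1 is 14 characters padded but 12 bases unpadded, which is why length statistics computed from a padded file come out larger.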

        cheers,
        Sven



        • #19
          Hi sklages,

          You are right.
          Regarding your sentence "I calculate N50 using *_info_contigstats.txt (which gives the same results as using the contigs.fasta file)": shouldn't we use *_out.unpadded.fasta instead of the contigs.fasta file to calculate N50?
          When I use the contigs.fasta file to calculate N50, it doesn't match *_info_contigstats.txt, but *_out.unpadded.fasta does.
          It seems the contigs.fasta file is the same as *_out.padded.fasta, with only slightly different headers, right?
          Besides that, in *_info_assembly.txt, section "All Contigs": I tried to use *_info_contigstats.txt, the contigs.fasta file and *_out.unpadded.fasta to reproduce "Total consensus", "Number of contigs", etc. from that section.
          Unfortunately, they don't match the figures in section "All Contigs".
          Do you have any idea what causes this?
          It seems all the figures I get from these files (*_info_contigstats.txt, contigs.fasta, *_out.unpadded.fasta) are lower than the figures in section "All Contigs".
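A minimal Python sketch for recomputing simple per-assembly figures (number of contigs, total length, smallest and largest contig) from a list of contig lengths, e.g. lengths taken from the unpadded FASTA. The `min_length` cutoff is a hypothetical knob added only so you can test whether a length filter explains a discrepancy; it is not a claim about how MIRA builds its "All Contigs" section, and "total" here is simply the sum of lengths, which may or may not be what MIRA calls "Total consensus". The input reuses the scaffold lengths from post #22.

```python
# Minimal sketch: basic length statistics for a set of contigs.
# ASSUMPTION: min_length is a hypothetical cutoff for experimentation only;
# it does not describe how MIRA builds its "All Contigs" section.


def length_summary(lengths, min_length=0):
    """Count, total, min and max of the lengths that are >= min_length."""
    kept = [n for n in lengths if n >= min_length]
    if not kept:
        return {"contigs": 0, "total": 0, "min": None, "max": None}
    return {
        "contigs": len(kept),
        "total": sum(kept),
        "min": min(kept),
        "max": max(kept),
    }


# Scaffold lengths from the list in post #22.
lengths = [20, 60, 1000, 15, 30, 26, 50, 500, 200]
print(length_summary(lengths))
# {'contigs': 9, 'total': 1901, 'min': 15, 'max': 1000}
print(length_summary(lengths, min_length=50))
# {'contigs': 5, 'total': 1810, 'min': 50, 'max': 1000}
```

Note that the largest contig is unaffected by such a cutoff while the count, total and minimum all change.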


          Originally posted by sklages
          I calculate N50 using *_info_contigstats.txt (which gives the same results as using the contigs.fasta file).
          cheers,
          Sven



          • #20
            Originally posted by edge
            Hi sklages,

            [...]Besides that, in *_info_assembly.txt, section "All Contigs": I tried to use *_info_contigstats.txt, the contigs.fasta file and *_out.unpadded.fasta to reproduce "Total consensus", "Number of contigs", etc. from that section.
            Unfortunately, they don't match the figures in section "All Contigs".
            Do you have any idea what causes this?
            It seems all the figures I get from these files (*_info_contigstats.txt, contigs.fasta, *_out.unpadded.fasta) are lower than the figures in section "All Contigs".

            Well, you are right. I just checked this as well. N50 and "largest contig" are "correct" (in the sense that they match how I calculated them); all the other numbers differ slightly from what I counted ...

            No idea why ..
            Sven



            • #21
              Yup...
              It seems we are facing the same problem.
              Maybe we need to ask other users to share their experience.
              I will let you know if I find a way to make all the figures match *_info_contigstats.txt.



              • #22
                Originally posted by edge
                Hi BENM,
                If I have a long list like this:
                scaff_123 20
                scaff_223 60
                scaff_122 1000
                scaff_125 15
                scaff_23 30
                scaff_13 26
                scaff_230 50
                scaff_153 500
                scaff_173 200

                Based on column two, do you have any idea how to calculate N50 and N90 from this long list?
                I need to sort the list in descending order before I calculate N50 and N90, right?
                Thanks again for your help.

                hi edge,

                I have written a Perl script to compute length and GC-content statistics for FASTA/FASTQ files; I hope it can help you.
                Attached Files
                Last edited by BENM; 12-15-2009, 01:59 AM.
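For the two-column list in edge's question, here is a Python sketch of the procedure Sven described: take the second column as the length, sort in descending order and accumulate until x% of the total is reached. It is not BENM's attached Perl script (which isn't reproduced here); `parse_two_column` and `n_stat` are hypothetical names. The integer comparison `running * 100 >= total * x` avoids floating-point rounding right at the threshold.

```python
# Minimal sketch: N50/N90 from a "name length" list (length in column two).


def parse_two_column(text):
    """Return the integer in the second whitespace-separated column of each line."""
    lengths = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            lengths.append(int(fields[1]))
    return lengths


def n_stat(lengths, x):
    """Return Nx: the length at which the running total of the lengths,
    taken in descending order, first reaches x% of the overall total."""
    if not lengths:
        raise ValueError("no lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Same as running / total >= x / 100, but without floats.
        if running * 100 >= total * x:
            return length
    raise AssertionError("unreachable for 0 < x <= 100")


scaffolds = """scaff_123 20
scaff_223 60
scaff_122 1000
scaff_125 15
scaff_23 30
scaff_13 26
scaff_230 50
scaff_153 500
scaff_173 200
"""

lengths = parse_two_column(scaffolds)
print("N50:", n_stat(lengths, 50))  # N50: 1000
print("N90:", n_stat(lengths, 90))  # N90: 60
```

With this list the first three scaffolds sum to 1700, just short of 90% of 1901 (1710.9), so the fourth-largest scaffold (60) is the one that crosses the N90 threshold.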



                • #23
                  Hi BENM,

                  Thanks a lot for your script.
                  I have just run it.
                  It worked excellently and fast.
                  sklages and I are facing the same problem with MIRA's output.
                  Still looking for the solution.
                  Thanks again for your explanation and script.



                  • #24
                    hi Edge,

                    The "calengc.pl" script had a bug in the GC-content statistics; it has been corrected.
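The thread never says what the GC bug was, so the following is only a hedged Python sketch of one way to compute GC content, not a reconstruction of `calengc.pl`. Two choices visibly change the result: which characters count in the denominator (here only A/C/G/T, so Ns and `*` pads are ignored) and whether a file-level figure pools base counts or averages per-sequence fractions. `base_counts`, `gc_content` and `pooled_gc` are hypothetical names.

```python
# Minimal sketch of GC-content statistics (not a reconstruction of calengc.pl).
# ASSUMPTION: only A/C/G/T count; N, '*' pads and other IUPAC codes are ignored.


def base_counts(seq):
    """Return (gc, at) counts for one sequence, case-insensitively."""
    s = seq.upper()
    return s.count("G") + s.count("C"), s.count("A") + s.count("T")


def gc_content(seq):
    """GC fraction of one sequence, or None if it has no A/C/G/T."""
    gc, at = base_counts(seq)
    return gc / (gc + at) if gc + at else None


def pooled_gc(seqs):
    """GC fraction over all sequences, pooling the base counts."""
    gc_total = at_total = 0
    for seq in seqs:
        gc, at = base_counts(seq)
        gc_total += gc
        at_total += at
    total = gc_total + at_total
    return gc_total / total if total else None


print(gc_content("ACGTNN"))  # 0.5
print(pooled_gc(["GGGG", "AT"]))
```

For the sequences "GGGG" and "AT", pooling gives 4/6 (about 0.67), while averaging the per-sequence fractions (1.0 and 0.0) gives 0.5, so two scripts can disagree without either being buggy.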



                    • #25
                      Hi BENM,
                      Thanks for the reminder. Could you tell me what exactly the bug in the GC-content statistics of "calengc.pl" was?
                      Are you an expert Perl programmer?
                      The Perl script you wrote seems to handle data quite fast.



                      • #26
                        Hi Edge,

                        I don't think I am an expert Perl programmer, just a junior learner.
                        It is the same as with Python, C/C++ or any other programming language: I think the most important thing is the algorithm, the idea behind it. Perl is just easy to write.
                        There was also a bug in the length statistics. Sorry for my carelessness.



                        • #27
                          I really appreciate your script.
                          I prefer Perl too.
                          Besides that, I find awk and sed quite useful sometimes as well.
                          Could you tell me what the bugs in the GC-content and length statistics were?



                          • #28
                            Hi BENM,

                            Did you post the corrected 'calengc.pl' script?

                            Originally posted by BENM
                            hi Edge,

                            The "calengc.pl" script had a bug in the GC-content statistics; it has been corrected.
