  • Making sense of a low-coverage plant genome

    In our lab we have de novo assembled whole-genome shotgun (WGS) data from a non-model plant (one paired-end library, 2x101 bp, insert size 240 bp); the assembly N50 is around 1 kbp. The estimated genome size is around 2 Gb. From read mapping, we estimate the coverage at around 5x. I would like ideas on how to get a publication out of this data.
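As a reminder of what the quoted N50 of about 1 kbp measures, here is a minimal Python sketch of the usual definition: sort contigs by length, longest first, and report the length at which the running total first reaches half of all assembled bases. The helper names (`fasta_lengths`, `n50`) and the file name in the usage comment are hypothetical; assembly QC tools such as QUAST report this value directly.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all


# Hypothetical usage (file name is a placeholder):
# with open("contigs.fa") as fh:
#     print(n50(fasta_lengths(fh)))
```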

    I have a few ideas for making use of this low-coverage genome:

    1. Variant calling: finding SNPs and assessing heterozygosity and homozygosity (samtools, GATK)
    2. Finding microsatellites (MISA, etc.)
    3. Finding repeats using RepeatMasker
    4. Extracting and assembling the mitochondrial and chloroplast genomes
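For item 2, a dedicated tool such as MISA is the right choice, but the core idea of perfect-microsatellite detection is simple enough to sketch: for each motif length, look for a motif that repeats in tandem at least a minimum number of times, skipping motifs that are themselves repeats of a shorter motif (so a poly-A run is not also reported as an "AA" dinucleotide repeat). This is a simplified illustration, not MISA's algorithm: it ignores compound and imperfect repeats, it is far too slow in pure Python for a whole 2 Gb assembly, and the thresholds in the hypothetical `MIN_REPEATS` table are assumptions chosen for illustration.

```python
# Minimum number of tandem copies required for each motif length (1-6 bp).
# These thresholds are illustrative assumptions, not any tool's official defaults.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit):
    """True if `unit` is not itself a tandem copy of a shorter motif (e.g. 'ATAT')."""
    k = len(unit)
    return all(unit != unit[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Find perfect microsatellites; returns (start, motif_length, motif, copies)."""
    seq = seq.upper()
    hits = []
    for k, min_rep in sorted(min_repeats.items()):
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            copies = 1
            # Extend while the next k bases repeat the same motif.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            # Keep only long-enough runs of primitive motifs made of A/C/G/T (no N).
            if copies >= min_rep and is_primitive(unit) and set(unit) <= set("ACGT"):
                hits.append((i, k, unit, copies))
                i += copies * k  # skip past this repeat
            else:
                i += 1
    return hits


# Toy example: an (AC)6 repeat flanked by non-repetitive bases.
print(find_ssrs("GG" + "AC" * 6 + "TT"))  # -> [(2, 2, 'AC', 6)]
```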

    Please let me know if you have any other ideas, or know of related papers that have made use of a low-coverage genome like this.

  • #2
    Hi bioman1,

    If I were you, I would start with annotation, that is, finding the coding sequences in your assembled genome. The number of genes found will give you an idea of how good your assembly is. You can then run a gene ontology analysis on those genes.



    • #3
      It does not make a lot of sense to me to try to publish (or spend lots of time on) such a low-coverage assembly. It would be much more cost-effective, and more useful to the rest of the world, if you generated more coverage, and hence a better assembly, before going forward with further analysis.



      • #4
        I agree with Brian, actually. I would not trust SNPs and indels called at such low coverage in the absence of a reference genome.



        • #5
          In your original post on the quality metrics of your assembly (http://seqanswers.com/forums/showthread.php?t=45673) we already discussed that your data is not good enough for publication. If the backbone of your analysis (i.e. the genome reference) is not in adequate shape, how can any downstream analysis (#1-3) be?
          You might have sufficient coverage to assemble the mitochondrial or chloroplast genome, but unless they are extremely unusual, I doubt that this alone will suffice for a publication.



          • #6
            You would not be able to call heterozygosity with any accuracy. Think of a region with 5X read depth (your average). This means you are sampling the two chromosomes (if diploid) with 5 reads. What is the chance of never sampling a particular chromosome? It is 0.5^5, or about 3%, which gives about a 6% chance of missing one or the other. You also couldn't call a SNP from just 1 read, and one of the two chromosomes would be covered by only 1 read about 30% of the time.

            At 3X coverage you miss a chromosome 25% of the time, and even in the best case one chromosome gets 1 read and the other 2, so one allele is only ever seen in a single read and you would never be able to call a SNP.
            Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
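The arithmetic in #6 follows from a simple binomial model: at a heterozygous site with a fixed depth of d reads, each read is assumed to come from either homolog with probability 0.5, so the number of reads from one homolog is Binomial(d, 0.5). The sketch below reproduces the 5X and 3X figures. It assumes a diploid, a fixed depth, no sequencing errors and no mapping bias; the function names are hypothetical. Real depth varies from site to site around the average, so some sites will fare worse than these fixed-depth figures suggest.

```python
from math import comb


def prob_hap_reads(depth, k):
    """P(a given homolog receives exactly k of `depth` reads), Binomial(depth, 0.5)."""
    return comb(depth, k) * 0.5 ** depth


def prob_min_reads(depth, m):
    """P(the less-sampled of the two homologs receives exactly m reads)."""
    return sum(prob_hap_reads(depth, k)
               for k in range(depth + 1)
               if min(k, depth - k) == m)


for depth in (5, 3):
    print(f"depth {depth}: "
          f"specific homolog missed {prob_hap_reads(depth, 0):.1%}, "
          f"either homolog missed {prob_min_reads(depth, 0):.1%}, "
          f"one homolog seen only once {prob_min_reads(depth, 1):.1%}")
```

At 3X the last two probabilities sum to 100%: every site either misses a homolog entirely or sees one of them in only a single read, which is the point made above.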



            • #7
              Originally posted by bioman1
              The estimated genome size is around 2 Gb. Through read mapping we found coverage around 5x.
              If you know the genome size, a more accurate estimate of coverage could be obtained by simply dividing the total number of bases sequenced by the genome size, rather than trying to infer it from mapping. This is really a minor point, but it may make some difference.
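The raw-bases estimate suggested in #7 is just total sequenced bases divided by genome size. Below is a minimal sketch, assuming plain (or gzip-decompressed) FASTQ with the standard four lines per record and no wrapped sequence lines; the function names and file names are hypothetical. This gives raw sequencing depth; depth after adapter trimming, quality filtering and duplicate removal will be somewhat lower.

```python
def fastq_base_count(lines):
    """Sum sequence lengths in FASTQ text, assuming strict 4-line records."""
    total = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            total += len(line.strip())
    return total


def mean_depth(total_bases, genome_size):
    """Expected average depth: total sequenced bases / genome size."""
    return total_bases / genome_size


# Back-of-envelope check: 2x101 bp pairs needed for 5X on a ~2 Gb genome.
pairs_needed = 5 * 2_000_000_000 / (2 * 101)
print(f"about {pairs_needed / 1e6:.1f} million read pairs")  # prints: about 49.5 million read pairs

# Hypothetical usage on the actual reads (file names are placeholders):
# import gzip
# with gzip.open("reads_R1.fastq.gz", "rt") as r1, gzip.open("reads_R2.fastq.gz", "rt") as r2:
#     bases = fastq_base_count(r1) + fastq_base_count(r2)
# print(mean_depth(bases, 2_000_000_000))
```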

              As others have said, anything close to 5X is way too low for producing an assembly, but you still have plenty of data for exploring a number of interesting questions. For example, you have more than enough coverage for assembling the organelle genomes and for describing repeat properties in the genome (I can offer specific advice for each of these tasks if that is of interest).



              • #8
                Thank you all for the suggestions. We have budget constraints; we can pursue further funding by producing one publication with the available data. I am open to any kind of suggestion.
                SES, please let me know your advice on organelle genome assembly and identifying repeat properties.



                • #9
                  Originally posted by bioman1
                  Thank you all for the suggestions. We have budget constraints; we can pursue further funding by producing one publication with the available data. I am open to any kind of suggestion.
                  SES, please let me know your advice on organelle genome assembly and identifying repeat properties.
                  I recommend trying Chloro for chloroplast genome assembly, and the same program can be used for mitochondrial genomes given a database (just a FASTA file) of mitochondrial genomes to screen against. Transposome is a tool for identifying repeats from WGS reads, so the input would be your unassembled sequence reads. Please let me know if you have questions about either tool; email or a private message would probably be more appropriate, since this is getting a bit off topic for this thread.



                  • #10
                    Thanks, SES. I will try them and contact you if I run into any difficulties.
