  • Identifying collapsed repeats and other misassemblies

    I have been using R to browse and graph the alignmentinfo file produced by Newbler.
    Graphs of consensus depth have revealed some interesting features. For example, read depth has a mean of 21, yet there are peaks in the graphs where read depth climbs to over 400.
    Examining the contigs where this occurs reveals that they are very short (always less than 300 bp).
    I am curious as to how I should interpret these features. I am assuming that they are collapsed repeats.

    regards

    Brian
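
The check described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the input layout (a line starting with `>` opens a contig, every other row is tab-separated with the depth at index `depth_col`), the five-fold cutoff, and the function names are all hypothetical and should be adjusted to the actual file. The median is used as the baseline because a handful of very deep contigs would inflate a mean.

```python
from collections import defaultdict
from statistics import mean, median


def contig_depths(lines, depth_col=4):
    """Collect per-position depths for each contig.

    Assumed layout (check against your own file): a line starting with '>'
    opens a contig; every other row is tab-separated with the depth value
    at index depth_col.
    """
    depths = defaultdict(list)
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            continue
        fields = line.split("\t")
        if current is None or len(fields) <= depth_col:
            continue  # column header or short row
        try:
            value = float(fields[depth_col])
        except ValueError:
            continue  # non-numeric row
        depths[current].append(value)
    return depths


def flag_candidates(depths, fold=5.0, max_len=300):
    """Return (baseline, flagged contigs).

    A contig is flagged when it has fewer than max_len rows (used here as
    its length) and its mean depth is at least fold times the median depth
    over all positions. Both cutoffs are illustrative, not recommendations.
    """
    baseline = median(d for ds in depths.values() for d in ds)
    flagged = [
        (name, len(ds), mean(ds))
        for name, ds in depths.items()
        if len(ds) < max_len and mean(ds) >= fold * baseline
    ]
    return baseline, flagged
```

Calling `flag_candidates(contig_depths(open(path)))` on the real file would list the short, deep contigs worth inspecting by hand.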

  • #2
    Originally posted by coldturkey
    I am curious as to how I should interpret these features. I am assuming that they are collapsed repeats.
    It helps to BLAST those things quickly at the NCBI.

    Most of the time they are repeats: rRNA (in bacteria), pseudogenes, SINEs, LINEs etc.

    Sometimes it's also some weird kind of contamination ... e.g., be wary of herpes simplex virus sequence in your bacterial 454 sequencing project.

    B.


    • #3
      Thanks BaCH,

      Why HSV in particular or is it just an example?


      • #4
        Originally posted by coldturkey
        Why HSV in particular or is it just an example?
        Just an example I have seen. Other examples include sequences from bacteria that carry "gingivitis" in the name, and so on. I suspect that in these cases contamination occurred because someone breathed onto the culture medium, or onto the sample during preparation, or at some other point ... the high-throughput sequencing machines will really sequence everything.

        I've seen this in data from at least three different sources (US and Europe). I'm sure I'd find more if I really searched for it.

        Though the rates are usually pretty low and can be easily filtered out in bacterial sequencing projects, I wonder whether the instrument vendors should update the workflow recommendations toward higher "standards" (like wearing masks when preparing the DNA) when working with eukaryotic samples (plants excepted).

        B.


        • #5
          So, given that these reads are all on small contigs (under 300 bp), is it safe to exclude them from the assembly?


          • #6
            I check with BLAST against the nucleotide collection of the NCBI. If they clearly don't belong to the organism one is analysing, discard. If there's a remote possibility, keep (but perhaps annotate as dubious).
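
The discard / keep / dubious rule above could be applied to tabular BLAST output roughly as follows. This is a sketch, not a recommended pipeline: it assumes three tab-separated columns (query ID, percent identity, subject title, as `-outfmt "6 qseqid pident stitle"` would produce), it assumes the first row per query is the top hit, and the keyword match and the 90% identity cutoff are hypothetical stand-ins for a human looking at the hits.

```python
def triage(contig_ids, blast_lines, expected_keyword, min_identity=90.0):
    """Label each contig 'keep', 'discard', or 'dubious' from its top hit.

    Assumptions (all illustrative): blast_lines holds tab-separated rows of
    query ID, percent identity, subject title, with the best hit for each
    query listed first.
    """
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        qseqid, pident, stitle = fields[0], float(fields[1]), fields[2]
        best.setdefault(qseqid, (pident, stitle))  # keep only the first row

    calls = {}
    for cid in contig_ids:
        if cid not in best:
            calls[cid] = "dubious"  # no hit at all: keep, but annotate
            continue
        pident, stitle = best[cid]
        if expected_keyword.lower() in stitle.lower():
            calls[cid] = "keep"  # top hit looks like the target organism
        elif pident >= min_identity:
            calls[cid] = "discard"  # strong hit to something else
        else:
            calls[cid] = "dubious"  # weak foreign hit: remote possibility
    return calls
```

A title keyword is a crude proxy for "belongs to the organism"; anything labelled dubious still deserves a manual look.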


            • #7
              thanks again

              Brian


              • #8
                Originally posted by coldturkey
                So given that these reads are all on small contigs (under 300bp) is it safe to exclude them from the assembly?

                Small contigs are useful for BLAST QC.

                But even if the small contigs are not contamination, they are less trustworthy and of poorer quality.

                We do QC on small contigs, but in the end we usually discard them and use the large contigs for downstream work.
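
Separating the two sets is straightforward. Below is a minimal sketch that splits FASTA records at a length cutoff, so the small contigs can go to BLAST QC and the large ones to downstream work. The 300 bp default mirrors the figure mentioned in this thread and is not a general recommendation; the function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the record name
            name, chunks = line[1:].split()[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_length(records, min_len=300):
    """Split records into (large, small) lists at min_len bases."""
    large, small = [], []
    for name, seq in records:
        (large if len(seq) >= min_len else small).append((name, seq))
    return large, small
```

For example, `large, small = split_by_length(read_fasta(open(path)))` keeps both lists in memory, which is fine for a typical contig file.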


                • #9
                  By the way, I believe that in highly repetitive regions, particularly short tandem repeat regions, Newbler would either not assemble the reads, label them as repeat reads, or put them into lots of small contigs.

                  But the chance of contamination in small contigs is also high. Small traces of other species usually end up in small contigs, since there is not enough of them to form big contigs.


                  • #10
                    amosvalidate might help you analyze the problematic regions in your assembly.


                    • #11
                      Yeah, I tried amosvalidate, but I couldn't get it to identify my mate pairs. I was then told that amosvalidate did not support 454 mate pairs at the moment and that I should skip this part of the validation.
