  • Identifying collapsed repeats and other misassemblies

    I have been using R to browse and graph the AlignmentInfo file produced by Newbler.
    Graphs of consensus depth have revealed some interesting features. For example, read depth has a mean of 21, yet there are peaks in the graphs where it climbs to over 400.
    Examining the contigs where this occurs shows that they are very short (always less than 300 bp).
    I am curious as to how I should interpret these features. I am assuming that they are collapsed repeats.
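A minimal sketch of the same depth check, written in Python for illustration (the poster used R). It assumes the AlignmentInfo file has a `>` line naming each contig, followed by tab-separated per-position rows with read depth in the fourth column (`depth_col=3`); that layout is an assumption, so check it against your own file's header. The baseline is the median depth rather than the mean, because the 400x peaks would otherwise inflate the baseline they are compared against. The function names and the `min_fold`/`max_len` thresholds are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def parse_alignment_info(lines, depth_col=3):
    """Collect per-position read depths for each contig.

    ASSUMED layout (check your Newbler version): a line starting with '>'
    names a contig; each following tab-separated line is one consensus
    position with read depth in column `depth_col` (0-based).
    """
    depths = defaultdict(list)
    contig = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            contig = line[1:].split()[0]
            continue
        fields = line.split("\t")
        if contig is None or len(fields) <= depth_col:
            continue  # file header or short line
        try:
            depths[contig].append(int(fields[depth_col]))
        except ValueError:
            continue  # non-numeric depth field
    return dict(depths)


def flag_high_depth_contigs(depths, min_fold=5.0, max_len=300):
    """Flag short contigs whose mean depth is far above the median depth.

    fold = contig mean depth / assembly-wide median depth is a rough
    copy-number estimate: ~20x the baseline suggests ~20 collapsed copies.
    Thresholds are illustrative, not Newbler defaults.
    """
    all_depths = [d for ds in depths.values() for d in ds]
    baseline = median(all_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    flagged = []
    for contig, ds in depths.items():
        fold = mean(ds) / baseline
        if fold >= min_fold and len(ds) <= max_len:
            flagged.append((contig, len(ds), round(fold, 1)))
    # Highest estimated copy number first
    return sorted(flagged, key=lambda t: t[2], reverse=True)
```

A contig at roughly 400x against a baseline near 20x would come out with a fold of about 20, which fits a collapsed repeat present in many copies.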

    regards

    Brian

  • #2
    Originally posted by coldturkey:
    "I am curious as to how I should interpret these features. I am assuming that they are collapsed repeats."

    It helps to BLAST those things quickly at the NCBI.

    Most of the time they are repeats: rRNA (in bacteria), pseudogenes, SINEs, LINEs, etc.

    Sometimes it's also some weird kind of contamination; e.g., be wary of herpes simplex virus sequence in your bacterial 454 sequencing project.
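To BLAST the suspicious contigs, their sequences first have to be pulled out of the assembly's contig FASTA. A minimal sketch, assuming you already have the list of contig names to check (for instance the short, high-depth ones); `read_fasta` and `to_fasta` are hypothetical helper names. The output can be pasted into the NCBI web BLAST form.

```python
def read_fasta(lines):
    """Parse FASTA text into a dict of name -> sequence (order preserved)."""
    seqs = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep the ID, drop the description
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def to_fasta(seqs, names, width=60):
    """Render the selected contigs as FASTA text, wrapped at `width`."""
    out = []
    for n in names:
        s = seqs[n]
        out.append(">" + n)
        out.extend(s[i:i + width] for i in range(0, len(s), width))
    return "\n".join(out) + "\n"
```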

    B.

    • #3
      Thanks BaCH,

      Why HSV in particular, or is it just an example?

      • #4
        Originally posted by coldturkey:
        "Why HSV in particular or is it just an example?"

        Just an example I have seen. Other examples include sequences from bacteria that carry "gingivitis" in their name, and so on. I suspect that in these cases the contamination occurred because someone breathed onto the culture medium, during sample preparation, or at some other point; high-throughput sequencing machines really will sequence everything.

        I've seen this in data from at least three different sources (US and Europe). I'm sure I'd find more if I really searched for it.

        Though the rates are usually pretty low and can easily be filtered out in bacterial sequencing projects, I wonder whether the instrument vendors should update their workflow recommendations toward higher "standards" (like wearing masks when preparing the DNA) when working with eukaryotic samples (plants excepted).

        B.

        • #5
          So, given that these reads are all on small contigs (under 300 bp), is it safe to exclude them from the assembly?

          • #6
            I check them with BLAST against the NCBI nucleotide collection. If they clearly don't belong to the organism one is analysing, discard them. If there's a remote possibility that they do, keep them (but perhaps annotate them as dubious).
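The keep/discard rule above can be written down as a small decision function. This is a hypothetical sketch, assuming each contig's BLAST result has already been reduced to its best hit (organism name, percent identity, and fraction of the contig covered); `BestHit`, `triage`, and the 95% identity / 90% coverage cut-offs are illustrative choices, not values from the post. "dubious" means keep but annotate.

```python
from typing import NamedTuple, Optional


class BestHit(NamedTuple):
    organism: str   # organism named in the top database hit
    pident: float   # percent identity of that alignment
    qcov: float     # fraction of the contig covered by it (0-1)


def triage(hit: Optional[BestHit], target: str,
           min_pident: float = 95.0, min_qcov: float = 0.9) -> str:
    """Return 'keep', 'discard' or 'dubious' (keep, but annotate)."""
    if hit is None:
        return "dubious"   # no evidence either way
    if target.lower() in hit.organism.lower():
        return "keep"      # best hit is the organism being sequenced
    if hit.pident >= min_pident and hit.qcov >= min_qcov:
        return "discard"   # strong, near full-length match to something else
    return "dubious"       # weak foreign hit: remote possibility, keep
```

Conserved sequences such as rRNA can best-hit a related species at high identity, so a "discard" from a rule like this is worth a second look before the contig is actually dropped.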

            • #7
              thanks again

              Brian

              • #8
                Originally posted by coldturkey:
                "So given that these reads are all on small contigs (under 300bp) is it safe to exclude them from the assembly?"

                Small contigs are useful for BLAST QC.

                But even if the small contigs are not contamination, they are less trustworthy and of poorer quality.

                We do QC on small contigs, but in the end we usually discard them and use the large contigs for downstream work.
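The workflow described here (QC the small contigs, carry the large ones forward) amounts to a length split. A minimal sketch, assuming contigs are held as a name-to-sequence dict; `split_by_length`, `summarize`, and the 500 bp cut-off are hypothetical choices to adjust for your own project.

```python
def split_by_length(seqs, min_large=500):
    """Partition contigs into (large, small) by sequence length.

    `min_large` is an illustrative cut-off, not a fixed standard.
    Large contigs go downstream; small ones go to BLAST QC.
    """
    large = {n: s for n, s in seqs.items() if len(s) >= min_large}
    small = {n: s for n, s in seqs.items() if len(s) < min_large}
    return large, small


def summarize(seqs):
    """Return (number of contigs, total bases) for a contig set."""
    return len(seqs), sum(len(s) for s in seqs.values())
```

Reporting `summarize` for both sets shows how much sequence the small contigs hold before they are discarded.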

                • #9
                  By the way, I believe that in highly repetitive regions, particularly short tandem repeats, Newbler will either leave them unassembled, label the reads as repeats, or scatter them across lots of little contigs.

                  But the chance of contamination in small contigs is also high: small traces of other species usually end up in small contigs, because there are not enough of those reads to form big contigs.
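One quick way to tell whether a small contig is the short tandem repeat kind described above, rather than a contaminant, is to measure how much of it consists of tandem copies of a short unit. A minimal sketch using exact matching only (dedicated tools such as Tandem Repeats Finder also tolerate mismatches); `tandem_repeat_fraction`, `max_period`, and `min_len` are hypothetical names and values.

```python
def tandem_repeat_fraction(seq, max_period=6, min_len=12):
    """Fraction of `seq` lying in exact short tandem repeats.

    For each period p, a run of positions where seq[k] == seq[k + p]
    of length L means the stretch from the run's start spans L + p
    bases with period p. Stretches of at least `min_len` bases (and at
    least two copies of the unit) are marked as repeat.
    """
    seq = seq.upper()
    n = len(seq)
    covered = [False] * n
    for p in range(1, max_period + 1):
        i = 0
        while i < n - p:
            if seq[i] != seq[i + p]:
                i += 1
                continue
            start = i
            while i < n - p and seq[i] == seq[i + p]:
                i += 1
            span = (i - start) + p  # periodic stretch: start .. i + p - 1
            if span >= max(min_len, 2 * p):
                for j in range(start, i + p):
                    covered[j] = True
    return sum(covered) / n if n else 0.0
```

A contig that is mostly tandem repeat and sits at high depth points toward a repeat; a contig with little repeat content and low depth fits the "small traces of other species" explanation better and is a stronger BLAST candidate.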

                  • #10
                    amosvalidate might help you analyze the problematic regions in your assembly.

                    • #11
                      Yeah, I tried amosvalidate, but I couldn't get it to identify my mate pairs. I was then told that amosvalidate did not support 454 mate pairs at the moment and that I should skip this part of the validation.
