
  • Bowtie alignments not matching 100%

    I'm trying to decide if our data shows that we require the depth of HiSeq or if we could do MiSeq.

    What I did was concatenate all the contigs into one FASTA file, which I used to build a Bowtie index.

    Then, I tried aligning the raw reads to this index using Bowtie.

    Unfortunately, I found that only 75% of my "genome" was covered. Is there something I'm missing?

  • #2
    How many times do you need to roll a pair of dice to be guaranteed to see every combination at least once? When you know why the answer is "infinite times", you'll know why you shouldn't expect 100% coverage (this greatly oversimplifies things, of course).

    The goal isn't 100% coverage, but an average coverage of some fold (10x, 4x, whatever).
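To put rough numbers on the dice analogy: under the idealized Lander-Waterman model (reads landing uniformly at random, edge effects ignored), the expected fraction of bases covered at least once at mean depth c is 1 - e^(-c). The sketch below is only that model, not a description of any real library prep, and the function names are hypothetical.

```python
import math


def expected_covered_fraction(mean_depth):
    """Expected fraction of bases hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-mean_depth)


def depth_for_fraction(fraction):
    """Mean depth at which the model predicts the given covered fraction."""
    if not 0.0 <= fraction < 1.0:
        # Exactly 100% is never reached in this model, only approached.
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - fraction)


# Print the model's prediction for a few mean depths.
for depth in (1, 4, 10):
    print(f"{depth}x -> {expected_covered_fraction(depth):.4%} covered")
```

In this model 75% breadth corresponds to a mean depth of only about 1.4x, so if the real depth is much higher than that, random sampling alone probably does not explain the missing 25%.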



    • #3
      I'm not sure I understand. You're saying if I map back the raw reads to the sequence created from those same raw reads, I should not expect that they will cover the entire sequence?



      • #4
        Maybe I wasn't clear in my explanation (I apologize, I'm new to all of this). What I'm trying to do is map back the raw reads that created the sequence, to see what the coverage is. Unfortunately, I am seeing that they only cover 75% of the sequence they were used to create, regardless of the depth of coverage.
        Last edited by sewellh; 06-13-2014, 03:25 PM.



        • #5
          It's important to place ambiguously-mapped reads randomly or to all possible locations if you want to analyze coverage. What's your mapping command line?

          But as dpryan said, what's most important is that you get high enough coverage for whatever your purpose is. You can estimate coverage with a k-mer counter, without even assembling. What are you trying to do, and how were the contigs generated?
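As a rough illustration of the k-mer idea (a toy sketch, not a replacement for a dedicated k-mer counter): count every k-mer in the reads, take the most common multiplicity as the k-mer depth, and convert it to base depth with the usual relation base_depth = kmer_depth * L / (L - k + 1) for read length L. The function names are hypothetical, and the sketch skips reverse-complement handling and holds everything in memory, so it only suits small inputs.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in an iterable of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def modal_kmer_depth(counts, min_multiplicity=2):
    """Most common k-mer multiplicity, skipping rare (likely erroneous) k-mers."""
    histogram = Counter(counts.values())
    solid = {m: n for m, n in histogram.items() if m >= min_multiplicity}
    if not solid:
        raise ValueError("no k-mers at or above min_multiplicity")
    # Ties on frequency are broken in favor of the higher multiplicity.
    return max(solid, key=lambda m: (solid[m], m))


def base_depth_from_kmer_depth(kmer_depth, read_length, k):
    """Convert k-mer depth to per-base depth for reads of a given length."""
    return kmer_depth * read_length / (read_length - k + 1)
```

For example, a k-mer depth of 40 with 100 bp reads and k = 21 corresponds to a base depth of 50x under this relation.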



          • #6
            I didn't do the original assembly, but the contigs were generated via the SPAdes assembler.

            To map the raw reads back to the assembled sequence I used the following:

            bowtie2 -p 2 -x DscP-kaster -1 KM01_R1.fastq -2 KM01_R2.fastq -S KM01_bowtie.sam
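One way to turn an alignment like this into a breadth-of-coverage number is sketched below. It assumes the SAM file has been sorted and converted to BAM, and that per-base depth was written out with `samtools depth -aa` (samtools is not mentioned in this thread, so that step is an assumption; the `-aa` form is used so that zero-depth positions and contigs with no reads at all are included). The function name is hypothetical.

```python
def breadth_of_coverage(depth_lines, min_depth=1):
    """Fraction of reference positions with depth >= min_depth.

    depth_lines: iterable of tab-separated lines (contig, position, depth),
    the layout samtools depth produces for a single input file.
    """
    covered = 0
    total = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        depth = int(fields[2])
        total += 1
        if depth >= min_depth:
            covered += 1
    if total == 0:
        raise ValueError("no positions found in input")
    return covered / total
```

Passing an open file handle for the depth output works, since the function only iterates over lines. Raising `min_depth` shows how much of the assembly is covered at, say, 10x rather than merely touched by one read.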



            • #7
              Originally posted by sewellh
              Maybe I wasn't clear in my explanation (I apologize, I'm new to all of this). What I'm trying to do is map back the raw reads that created the sequence, to see what the coverage is. Unfortunately, I am seeing that they only cover 75% of the sequence they were used to create, regardless of the depth of coverage.
              I suspect this means that 25% of your contigs (which, from what I gather, came from a de novo SPAdes assembly of your reads) are incorrect, or at least represent the reads more poorly than the other contigs. The 25% figure is high, but I am not surprised that some of your contigs are poor targets for back-mapping reads.

              If you have not already done so, I suggest looking only at the long contigs; 500+ bases is my usual cutoff. That will get rid of the outliers and improve your back-mapping.
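A length cutoff like this can be applied with a few lines of Python. This is a minimal sketch with hypothetical function names, assuming a plain FASTA file of contigs; the 500-base default mirrors the cutoff mentioned above. The filtered contigs would then need to be written back out and the Bowtie 2 index rebuilt before re-mapping.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(lines, min_length=500):
    """Keep only contigs whose sequence is at least min_length bases long."""
    return [(n, s) for n, s in read_fasta(lines) if len(s) >= min_length]
```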


              Looking at one of my recent bacterial projects, I find reads mapping to 100% of the 500+ bp contigs. Some of the contigs have a very low number of reads back-mapping, but at least all were found. This is at around 200x coverage.

              Looking at an avian project (where my cutoff was 200 bp contigs) with about 15x coverage, around 98% of the contigs have reads back-mapped to them.

              This was using Bowtie2. BWA would be similar.
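A per-contig check like the percentages above could be computed roughly as follows. This is a sketch under assumptions, not necessarily how the numbers in this post were produced: it assumes the alignments are in a sorted, indexed BAM and that `samtools idxstats` output (contig name, length, mapped count, unmapped count, tab-separated) has been saved to a text file. The function name is hypothetical.

```python
def contigs_with_reads(idxstats_lines, min_length=0):
    """Return (contigs with >= 1 mapped read, contigs considered).

    Contigs shorter than min_length are ignored, as is the trailing
    '*' line that idxstats uses for unplaced reads.
    """
    total = 0
    with_reads = 0
    for line in idxstats_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0] == "*":
            continue
        length, mapped = int(fields[1]), int(fields[2])
        if length < min_length:
            continue
        total += 1
        if mapped > 0:
            with_reads += 1
    return with_reads, total
```

Dividing the first value by the second gives the fraction of contigs that attracted at least one read, with `min_length=500` or `min_length=200` matching the cutoffs described above.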
