  • Samtools flagstat - low % reads mapping

    Hi,

    I'm working with RNA-Seq data and using bowtie and tophat to align 65bp PE reads to a reference genome. My reads were sequenced from X. laevis and I'm attempting to map them first to X. tropicalis (the X. laevis genome is still a draft version).

    After trimming and filtering my reads I am left with 31*2 = 62M reads, but running samtools flagstat on my accepted_hits.bam file shows that only 12M reads have mapped in total. I'm completely confused about why the number of reads mapping is so low. I've tried fine-tuning the options in tophat (-r value, -N value) and using differently trimmed reads, but have seen little improvement on the ~20% mapping rate.

    In addition, almost none of my reads pair properly (samtools flagstat 'properly paired' = 0.01%).

    Any help would be hugely appreciated,

    Thanks
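
    As a quick sanity check on the numbers in this post, here is a minimal sketch of the mapping-rate arithmetic. The function name is hypothetical, and it assumes each mapped read is counted once; accepted_hits.bam may list a multi-mapped read more than once, in which case the true rate would be lower still.

```python
def mapping_rate(mapped_reads, total_reads):
    """Return the percentage of input reads that mapped."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


# Figures quoted in the post: 31M pairs = 62M reads, 12M mapped.
total = 31_000_000 * 2
mapped = 12_000_000
print(f"{mapping_rate(mapped, total):.1f}% of reads mapped")
```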

  • #2
    How have you trimmed your reads? Have you looked for adaptor sequence in your reads?

    • #3
      I've trimmed the reads using fastq_quality_trimmer, fastq_quality_filter and fastx_trimmer.

      One of the problems I've had is that the RNA fragment size is ~130 bp (post adapter removal) and my 100bp reads therefore overlap considerably. I've been using fastx_trimmer to cut the reads to 65bp to ensure no overlap - but they don't seem to be pairing properly in mapping.

      I haven't checked for adapters - I ran the .txt files through fastqc and there were no over-represented sequences.

      N
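
      To make the overlap arithmetic in this post concrete: tophat's -r option takes the expected inner distance between mates, which is the fragment length minus twice the read length. The sketch below uses illustrative function names and the approximate 130 bp fragment size quoted above. It treats the fragment size as a single value, whereas a real library has a spread of sizes, so fragments shorter than average can still overlap.

```python
def mate_inner_distance(fragment_len, read_len):
    """Inner distance between mates; negative means the mates overlap."""
    return fragment_len - 2 * read_len


def mate_overlap(fragment_len, read_len):
    """Number of bases covered by both mates (0 if they do not overlap)."""
    return max(0, 2 * read_len - fragment_len)


for read_len in (100, 65):
    print(
        f"read length {read_len}: "
        f"inner distance {mate_inner_distance(130, read_len)}, "
        f"overlap {mate_overlap(130, read_len)}"
    )
```

      With 100 bp reads the mates overlap by 70 bp; at 65 bp the inner distance is exactly 0, which leaves no margin for fragments shorter than 130 bp.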

      • #4
        That's what I thought.

        Even at 65 bp you may still have overlap and/or adaptor sequence.

        Is it critical that you have paired end data? I had a similar situation with some paired end data. I simply dispensed with the second set of reads and treated the data as single end reads. With that amount of overlap, it's probably going to be impossible for tophat to get the insert size right.

        Also try adaptor trimming with a trimmer that can handle variable lengths of adaptor sequence. I have used cutadapt with great success. Then try realigning without your second set of reads and you should get better results.

        Otherwise... make a new library.
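
        To illustrate what "variable lengths of adaptor sequence" means in practice, here is a minimal sketch of 3' adaptor removal that handles both a full adaptor inside the read and a partial adaptor at the read end. It is not cutadapt's algorithm (cutadapt also tolerates mismatches and indels); it uses exact matching only, and the function name, the min_overlap default and the adaptor sequence are placeholders chosen for illustration.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter from a read using exact matching.

    Handles a complete adapter anywhere in the read, or a partial
    adapter (a prefix of the adapter) at the very end of the read.
    """
    # Case 1: the whole adapter is present; cut from its first occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]

    # Case 2: the read ends partway into the adapter. Try the longest
    # adapter prefix first and stop at the minimum accepted overlap.
    longest = min(len(read), len(adapter) - 1)
    shortest = max(min_overlap, 1)
    for k in range(longest, shortest - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]

    # No adapter found; return the read unchanged.
    return read


ADAPTER = "AGATCGGAAGAGC"  # placeholder; use your library's adapter
print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTT", ADAPTER))
print(trim_3prime_adapter("ACGTACGTAGATC", ADAPTER))
```

        When applying this to FASTQ records, the quality string has to be cut to the same length as the trimmed sequence.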

        • #5
          If your reads are overlapping significantly you may want to try FLASH as an alternative, to stitch the two ends together.

          Updated citation:

          Tanja Magoč and Steven L. Salzberg. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics (2011) 27(21): 2957-2963.
          Last edited by GenoMax; 11-01-2012, 08:04 AM.
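
          For readers unfamiliar with the idea of stitching overlapping mates, here is a minimal sketch. It is not FLASH's algorithm: it simply reverse-complements the second read and accepts the longest suffix/prefix overlap under a mismatch threshold. The function names and both default thresholds are assumptions for illustration, and it does not handle fragments shorter than the read length.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge two mates into one sequence if their ends overlap.

    read2 is reverse-complemented so that both reads are on the same
    strand, then the longest acceptable overlap between the end of
    read1 and the start of read2 is used. Returns None if no overlap
    passes the thresholds.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    r2 = reverse_complement(read2)
    for olen in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail = read1[-olen:]
        head = r2[:olen]
        mismatches = sum(1 for a, b in zip(tail, head) if a != b)
        if mismatches <= max_mismatch_frac * olen:
            # Keep read1 in full and append the non-overlapping part of r2.
            return read1 + r2[olen:]
    return None


# Toy example: a 44 bp fragment sequenced with 30 bp reads from each end.
fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACGGCATGCAATTGGCC"
mate1 = fragment[:30]
mate2 = reverse_complement(fragment[-30:])
print(merge_pair(mate1, mate2, max_mismatch_frac=0.0))
```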

          • #6
            I ran the 65bp trimmed reads through FLASH (http://genomics.jhu.edu/software/FLASH/index.shtml) to confirm that, post trim, there's no overlap.

            As I understand it, bowtie and tophat map the two reads of a pair independently, so I would expect that dispensing with half of my reads would result in the same % of mapped reads. Maybe I'm wrong though?

            My primary concern is that the % of reads mapped is so low. I'm less concerned about the pairing of the reads (I'm interested in differential expression rather than resolving isoforms etc.) but can't help feeling that the two are linked...
