  • GAII low number of mapped reads

    Hi everyone,

    I tried a rather ambitious experiment in which I tried barcoding several samples of human DNA using a homemade barcodes, target selecting for a few genes by microarray followed by sequencing on the illumina GAII. I used 100bp paired end reads with an index cycle. I could parse my barcodes just fine but when I tried mapping my reads, I got a very low number that mapped back to the human genome (60%) and only 25% to my targeted region. I tried using both ELAND and BWA default settings for paired end reads (actually I added the -q15 in BWA). Is there anything I can do to "salvage" this experiment? Are there different parameters in BWA and Illumina that I could try or is my read quality just that bad. What is odd is that when I look at the quality score of my reads, I don't think they are that bad so I'm confused as to why so few would map back. Any help would be greatly appreciated!!

    Cheers,
    Ali

  • #2
    Have you done any QC on your data to see if there are obvious biases or quality problems?

    Have you trimmed adapters off your reads? At 100bp you might be getting a reasonable portion of your library reading through into adapter, and this will mess up your ability to map your reads.
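To make the read-through problem concrete, here is a minimal sketch of 3' adapter trimming. It assumes exact matches only (no sequencing errors), and the adapter string in the usage note is an illustrative assumption; with homemade barcodes you would substitute your own adapter sequence. The function name and `min_overlap` parameter are hypothetical. A dedicated trimming tool that tolerates mismatches is the better choice for real data.

```python
def trim_adapter(read, adapter, min_overlap=5):
    """Remove a 3' adapter from a read using exact matching only."""
    # Case 1: the full adapter occurs inside the read (short insert).
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends part-way into the adapter. Try the longest
    # adapter prefix first and stop at min_overlap to limit chance hits.
    longest = min(len(adapter), len(read)) - 1
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    # No adapter detected: return the read unchanged.
    return read
```

For example, `trim_adapter("ACGTACGTAGATCGG", "AGATCGGAAGAGC")` strips the trailing 7 bases that match the start of the adapter and leaves the 8-base insert.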


    • #3
      Originally posted by simonandrews View Post
      Have you done any QC on your data to see if there are obvious biases or quality problems?

      Have you trimmed adapters off your reads? At 100bp you might be getting a reasonable portion of your library reading through into adapter, and this will mess up your ability to map your reads.

      I've looked with FastQC and it does seem that my quality scores begin to drop off toward the middle of the read. Trimming by quality score in BWA does help, but I still have a lot of reads that don't map. My guess is that I have a library prep issue?
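For reference, the 3' quality trimming behind BWA's -q option can be sketched roughly as below. This is an illustration of the running-sum rule as I understand it from the BWA manual, not BWA's own code. The function name, the `min_len` floor of 35 bases, and the input being already-decoded integer Phred scores are all assumptions made for the example.

```python
def trimmed_length(quals, threshold=15, min_len=35):
    """Return how many 5' bases to keep after 3' quality trimming.

    quals: list of integer Phred scores, one per base (already decoded).
    """
    keep = len(quals)
    if threshold < 1:
        return keep
    running = 0   # sum of (threshold - q) walking in from the 3' end
    best = 0      # largest running sum seen so far
    for i in range(len(quals) - 1, min_len - 1, -1):
        running += threshold - quals[i]
        if running < 0:
            break              # good bases now outweigh the bad tail
        if running > best:
            best = running
            keep = i           # cut just before position i
    return keep
```

With 40 bases at Q30 followed by 10 bases at Q2, this keeps the first 40 bases. A read that is Q30 throughout is left untrimmed. Note that trimming of this kind only removes a poor-quality tail; it does nothing about adapter or contaminant sequence.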


      • #4
        If your reads are of decent quality but are failing to map, that's going to be due to one of:
        1. Your library is contaminated with DNA from a different source (E. coli etc.)
        2. Your library is partially contaminated with adapters or some part of your vector
        3. Your sequences come from repetitive sequence, which doesn't allow them to map uniquely


        You say you're getting 60% of your reads mapping, so the library isn't a complete disaster, so it's just a case of figuring out where the rest went.

        If you have contamination from another DNA source you could try to screen for it. We routinely put all of our libraries through a screen to see if they contain what they should.

        If you have partial contamination with adapter or improperly removed barcodes then you should see this in your FastQC reports. Such biases would show up either in the per-base sequence content plot or the Kmer plots. Any non-insert sequence still in your library would mess up your mapping efficiency.
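As a rough illustration of what a per-base sequence content check looks for, the sketch below tabulates base fractions at each read position. It is not FastQC's implementation, and the function name is hypothetical. In a random library the fractions should be fairly flat along the read; a position dominated by one base (for example at the start of every read) would suggest leftover barcode or adapter.

```python
from collections import Counter

def per_base_content(reads):
    """Return, for each read position, a dict of base -> fraction."""
    n_pos = max((len(r) for r in reads), default=0)
    counts = [Counter() for _ in range(n_pos)]
    for read in reads:
        for i, base in enumerate(read):
            counts[i][base] += 1
    # Normalise each position by the number of reads covering it.
    return [
        {base: n / sum(col.values()) for base, n in col.items()}
        for col in counts
    ]
```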

        If your sequences aren't mapping uniquely, but could map well in many places, then you should be able to alter your mapping parameters to see this. I don't use BWA personally but I'm sure there will be an option to return a hit even if a sequence could have mapped in many places with high identity. This won't necessarily help your downstream analysis, but it will at least let you know why your sequences wouldn't map.
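One way to check this from existing output: SAM files written by BWA's aln/samse/sampe pipeline carry an XT:A tag on mapped reads (U for unique, R for repeat, as described in the BWA manual). The sketch below tallies that tag alongside unmapped reads (SAM flag 0x4). The function name is hypothetical, and it assumes plain-text SAM lines as input; other aligners may not emit XT at all, in which case reads are counted under "no_XT".

```python
from collections import Counter

def tally_xt(sam_lines):
    """Count reads by BWA XT:A tag value, plus unmapped reads."""
    tally = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue                      # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:          # flag bit 0x4: read unmapped
            tally["unmapped"] += 1
            continue
        # Optional tags start at the 12th column.
        xt = next(
            (f[5:] for f in fields[11:] if f.startswith("XT:A:")),
            "no_XT",
        )
        tally[xt] += 1
    return tally
```

A large "R" count relative to "U" would point to repetitive sequence rather than contamination as the reason reads fail a unique-mapping filter.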

        If all else fails, what we've done before is to remove from our library all of the sequences which we were able to map successfully and then do an assembly of whatever is left (we used Velvet). This has worked well for us on a couple of occasions to identify sources of contamination which we'd been unable to identify in any other way.
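The first step of that approach, pulling out the reads that did not map, can be sketched as follows. This assumes plain-text SAM input and relies only on the standard SAM flag bits (0x4 unmapped, 0x40 first in pair, 0x80 second in pair). The function name and the /1 and /2 name suffixes are illustrative choices, not something the assembler requires.

```python
def unmapped_to_fastq(sam_lines):
    """Yield FASTQ records for every unmapped read in SAM text lines."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue                      # skip blank and header lines
        f = line.rstrip("\n").split("\t")
        flag = int(f[1])
        if not flag & 0x4:
            continue                      # read mapped: leave it out
        name = f[0]
        if flag & 0x40:
            name += "/1"                  # first read of a pair
        elif flag & 0x80:
            name += "/2"                  # second read of a pair
        # Columns 10 and 11 hold the sequence and quality string.
        yield f"@{name}\n{f[9]}\n+\n{f[10]}\n"
```

Typical use would be writing the records to a file, for example `out.writelines(unmapped_to_fastq(open("aln.sam")))` with a hypothetical `aln.sam`, and passing that file to the assembler. Contigs that come out can then be searched against a sequence database to identify where they came from.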
