
  • GAII low number of mapped reads

    Hi everyone,

    I tried a rather ambitious experiment: I barcoded several samples of human DNA using homemade barcodes, enriched for a few target genes by microarray capture, and sequenced on the Illumina GAII. I used 100bp paired-end reads with an index cycle. I could parse my barcodes just fine, but when I tried mapping my reads, only a low fraction mapped back to the human genome (60%), and only 25% mapped to my targeted region. I tried both ELAND and BWA with default settings for paired-end reads (actually, I added -q15 in BWA). Is there anything I can do to "salvage" this experiment? Are there different parameters in BWA or the Illumina pipeline that I could try, or is my read quality just that bad? What is odd is that when I look at the quality scores of my reads, I don't think they are that bad, so I'm confused as to why so few would map back. Any help would be greatly appreciated!!

    Cheers,
    Ali

  • #2
    Have you done any QC on your data to see if there are obvious biases or quality problems?

    Have you trimmed adapters off your reads? At 100bp you might be getting a reasonable portion of your library reading through into adapter, and this will mess up your ability to map your reads.
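
As a rough illustration of the adapter-trimming idea, here is a minimal Python sketch that clips a 3' adapter (including a partial copy at the very end of the read) from a read and its quality string. The adapter sequence, the function names, and the overlap and mismatch thresholds are all assumptions chosen for illustration; substitute the adapter actually used in your library prep, and prefer an established trimming tool for real data.

```python
# Placeholder adapter: replace with the sequence from your own library prep.
ADAPTER = "AGATCGGAAGAGC"


def find_adapter_start(read, adapter, min_overlap=5, max_mismatch_rate=0.1):
    """Return the index where the adapter starts, or len(read) if not found.

    Every start position is tested; near the 3' end only the part of the
    adapter that still fits in the read is compared (partial adapter).
    """
    n = len(read)
    for start in range(n - min_overlap + 1):
        overlap = min(len(adapter), n - start)
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= max_mismatch_rate * overlap:
            return start
    return n


def trim_record(seq, qual, adapter=ADAPTER):
    """Trim the sequence and its quality string at the adapter start."""
    cut = find_adapter_start(seq, adapter)
    return seq[:cut], qual[:cut]
```

Trimming the quality string together with the sequence keeps the FASTQ record consistent. Reads that become very short after trimming are usually best discarded before mapping.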



    • #3
      Originally posted by simonandrews:
      Have you done any QC on your data to see if there are obvious biases or quality problems?

      Have you trimmed adapters off your reads? At 100bp you might be getting a reasonable portion of your library reading through into adapter, and this will mess up your ability to map your reads.
      I've looked with FastQC, and it does seem that my quality scores begin to drop off toward the middle of the read. Trimming by quality score in BWA does help, but I still have a lot of reads that don't map. My guess is that I have a library prep issue?



      • #4
        If you have decent-quality reads that are failing to map, that's going to be due to one of:
        1. Your library is contaminated with DNA from a different source (E. coli etc.)
        2. Your library is partially contaminated with adapters or some part of your vector
        3. Your sequences come from repetitive sequence which doesn't allow them to map uniquely


        You say you're getting 60% of your reads mapping, so the library isn't a complete disaster; it's just a case of figuring out where the rest went.

        If you have contamination from another DNA source, you could try to screen for it. We routinely put all of our libraries through a screen to see if they contain what they should.
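
To make the screening idea concrete, here is a toy Python sketch that assigns reads to candidate sources by counting shared k-mers. This is a swapped-in simplification, not the screen described in the post: a real screen would normally map a subsample of reads against full reference genomes with an aligner. The function names, the reference dictionary, and the `k` and `min_shared` defaults are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_reads(reads, references, k=8, min_shared=3):
    """Count reads per reference; a read 'hits' a reference when it shares
    at least min_shared k-mers with either strand of that reference."""
    index = {}
    for name, seq in references.items():
        seq = seq.upper()
        index[name] = kmers(seq, k) | kmers(revcomp(seq), k)

    hits = Counter()
    for read in reads:
        read_kmers = kmers(read.upper(), k)
        matched = [n for n, ref in index.items()
                   if len(read_kmers & ref) >= min_shared]
        if not matched:
            hits["no_hit"] += 1
        elif len(matched) == 1:
            hits[matched[0]] += 1
        else:
            hits["multiple"] += 1
    return hits
```

A large "no_hit" fraction after screening against the expected genome, likely contaminants, and adapter or vector sequences suggests that the leftover reads need a different approach, such as assembly.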

        If you have partial contamination with adapter or improperly removed barcodes, then you should see this in your FastQC reports. Such biases would show up either in the per-base sequence content plot or in the k-mer plots. Any non-insert sequence still in your library would mess up your mapping efficiency.
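
For readers who want to see what a per-base sequence content check measures, here is a minimal Python sketch that computes base fractions at each read position and flags positions dominated by a single base. It is an illustration of the idea only, not FastQC's implementation; the function names and the 0.9 threshold are assumptions.

```python
from collections import Counter


def per_base_content(reads):
    """Return one dict per read position with the fraction of A/C/G/T/N."""
    counts = []
    for read in reads:
        for i, base in enumerate(read.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1

    profile = []
    for column in counts:
        total = sum(column.values())
        profile.append({b: column[b] / total for b in "ACGTN"})
    return profile


def biased_positions(profile, threshold=0.9):
    """0-based positions where one base makes up at least `threshold`."""
    return [i for i, fractions in enumerate(profile)
            if max(fractions.values()) >= threshold]
```

Leftover barcode or adapter bases tend to show up as a run of flagged positions at the start or end of the reads, whereas genuine insert sequence should look roughly mixed at every position.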

        If your sequences aren't mapping uniquely, but could map well in many places, then you should be able to alter your mapping parameters to see this. I don't use BWA personally, but I'm sure there will be an option to return a hit even if a sequence could have mapped in many places with high identity. This won't necessarily help your downstream analysis, but it will at least let you know why your sequences wouldn't map.
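
One way to see how many reads fall into each group is to tally the alignment records directly. The Python sketch below classifies SAM text lines as unmapped, ambiguous, or unique. It assumes the aligner reports a mapping quality of 0 for reads that could be placed equally well in several locations, which is how BWA output is commonly interpreted; check your aligner's documentation before relying on that. The function names are hypothetical.

```python
from collections import Counter

FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def classify_sam_line(line):
    """Return 'unmapped', 'multi' or 'unique' for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    if flag & FLAG_UNMAPPED:
        return "unmapped"
    if mapq == 0:           # assumption: MAPQ 0 means ambiguous placement
        return "multi"
    return "unique"


def summarize_sam(lines):
    """Count primary records per class, skipping header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t", 2)[1])
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue        # count each read once
        counts[classify_sam_line(line)] += 1
    return counts
```

If most of the missing 40% turns out to be in the "multi" class, the problem is repetitive sequence rather than contamination; if it is in the "unmapped" class, screening or assembly of those reads is the next step.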

        If all else fails, what we've done before is to remove from our library all of the sequences which we were able to map successfully and then do an assembly of whatever is left (we used Velvet). This has worked well for us on a couple of occasions to identify sources of contamination which we'd been unable to identify in any other way.
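
The first half of that approach, pulling out the reads that did not map so they can be handed to an assembler, can be sketched in Python as follows. This is a minimal illustration that works on SAM text lines; the function name is hypothetical, and the `/1` and `/2` read-name suffixes are a convention assumed here, not something the assembler necessarily requires.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_fastq_records(sam_lines):
    """Yield FASTQ records (as strings) for unmapped primary SAM records."""
    for line in sam_lines:
        if line.startswith("@"):
            continue                      # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if not flag & 0x4 or flag & 0x900:
            continue                      # mapped, secondary or supplementary
        name, seq, qual = fields[0], fields[9], fields[10]
        if flag & 0x10:
            # Stored reverse-complemented: restore the original orientation.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        if flag & 0x40:
            name += "/1"                  # first read of a pair
        elif flag & 0x80:
            name += "/2"                  # second read of a pair
        yield f"@{name}\n{seq}\n+\n{qual}\n"
```

Writing the yielded records to a file gives a FASTQ of leftover reads. Contigs assembled from them can then be searched against a sequence database to identify where they came from.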
