SEQanswers




06-20-2011, 05:05 PM   #1
aligenie
Member
 
Location: San Diego

Join Date: Feb 2011
Posts: 13
GAII low number of mapped reads

Hi everyone,

I tried a rather ambitious experiment: I barcoded several samples of human DNA using homemade barcodes, target-selected a few genes by microarray, and then sequenced on the Illumina GAII. I used 100bp paired-end reads with an index cycle. I could parse my barcodes just fine, but when I tried mapping my reads, only a low fraction mapped back to the human genome (60%) and only 25% mapped to my targeted region. I tried both ELAND and BWA with default settings for paired-end reads (actually, I added -q 15 in BWA). Is there anything I can do to "salvage" this experiment? Are there different parameters in BWA or the Illumina pipeline that I could try, or is my read quality just that bad? What is odd is that when I look at the quality scores of my reads, I don't think they are that bad, so I'm confused as to why so few would map back. Any help would be greatly appreciated!!

Cheers,
Ali
06-20-2011, 11:06 PM   #2
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Have you done any QC on your data to see if there are obvious biases or quality problems?

Have you trimmed adapters off your reads? At 100bp you might be getting a reasonable portion of your library reading through into adapter, and this will mess up your ability to map your reads.
06-21-2011, 01:13 PM   #3
aligenie
Member
 
Location: San Diego

Join Date: Feb 2011
Posts: 13

Quote:
Originally Posted by simonandrews View Post
Have you done any QC on your data to see if there are obvious biases or quality problems?

Have you trimmed adapters off your reads? At 100bp you might be getting a reasonable portion of your library reading through into adapter, and this will mess up your ability to map your reads.
I've looked with FastQC, and it does seem that my quality scores begin to drop off toward the middle of the read. Trimming by quality score in BWA does help, but I still have a lot of reads that don't map. My guess is that I have a library prep issue?
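
For anyone wondering what quality trimming with -q actually does to a read, here is a rough Python sketch of a BWA-style 3'-end trim. It is an illustration of the idea, not BWA's own code: the function names are made up, and the minimum kept length of 35 is an assumption about BWA's built-in floor. Walking in from the 3' end, it keeps a running sum of (threshold - quality) and cuts the read where that sum peaks.

```python
def phred_scores(qual_string, offset=33):
    """Convert a FASTQ quality string to integer Phred scores.

    offset=33 is the Sanger encoding; older Illumina pipelines wrote
    Phred+64, so pass offset=64 for such files (assumption: you know
    which encoding your FASTQ files use).
    """
    return [ord(ch) - offset for ch in qual_string]


def bwa_style_trim(quals, threshold=15, min_len=35):
    """Return how many bases to keep from the 5' end of the read.

    quals is a list of Phred scores in 5'->3' order. min_len is a
    floor below which the read is never trimmed (assumed value).
    """
    running = 0          # running sum of (threshold - quality)
    best = 0             # largest running sum seen so far
    keep = len(quals)    # default: keep the whole read
    for pos in range(len(quals) - 1, min_len - 2, -1):
        running += threshold - quals[pos]
        if running < 0:
            break        # good bases now outweigh bad ones: stop
        if running > best:
            best = running
            keep = pos   # cut just before this position
    return keep


if __name__ == "__main__":
    # Toy read: 40 good bases followed by 10 poor ones.
    toy = [30] * 40 + [2] * 10
    print(bwa_style_trim(toy, threshold=15))
```

If the quality offset is wrong for the file, the scores all look much higher than they are and a threshold of 15 trims almost nothing, so it is worth confirming the encoding before concluding that trimming does not help.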
06-21-2011, 11:55 PM   #4
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

If you have decent quality reads and they're still failing to map, that's going to be due to one of:
  1. Your library is contaminated with DNA from a different source (E. coli etc.)
  2. Your library is partially contaminated with adapters or some part of your vector
  3. Your sequences come from repetitive sequence, which doesn't allow them to map uniquely

You say you're getting 60% of your reads mapping, so the library isn't a complete disaster; it's just a case of figuring out where the rest went.

If you have contamination from another DNA source, you could try to screen for it. We routinely put all of our libraries through a screen to see if they contain what they should.

If you have partial contamination with adapter or improperly removed barcodes, then you should see this in your FastQC reports. Such biases would show up in either the per-base sequence content plot or the Kmer plots. Any non-insert sequence still in your library would mess up your mapping efficiency.
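
As a quick cross-check on the FastQC plots, something like the following Python sketch can estimate what fraction of reads contain the start of an adapter. The function names are hypothetical, the adapter string in the example is only a placeholder (substitute the adapter and barcode sequences actually used in the prep), and it only finds exact matches, so it will undercount reads with sequencing errors in the adapter.

```python
def find_adapter_start(read, adapter, min_match=12):
    """Return the index where the adapter begins in the read, or -1.

    Only the first min_match bases of the adapter are searched for,
    and only as an exact match (assumption: this is good enough for
    a rough estimate; real trimmers allow mismatches).
    """
    seed = adapter[:min_match]
    return read.find(seed)


def adapter_fraction(reads, adapter, min_match=12):
    """Fraction of reads that contain the adapter seed anywhere."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for r in reads
               if find_adapter_start(r, adapter, min_match) >= 0)
    return hits / len(reads)


if __name__ == "__main__":
    # Placeholder adapter; replace with the sequence from your own prep.
    adapter = "AGATCGGAAGAGC"
    toy_reads = [
        "ACGTACGTAGATCGGAAGAGCTT",   # insert, then adapter read-through
        "ACGTACGTACGTACGTACGT",      # no adapter
        "AGATCGGAAGAGCACACGTC",      # adapter from the first base
    ]
    print(adapter_fraction(toy_reads, adapter))
```

The position returned by find_adapter_start also shows how long the real insert is, which helps tell short-insert read-through apart from adapter dimers (adapter at or near position 0).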

If your sequences aren't mapping uniquely, but could map well in many places, then you should be able to alter your mapping parameters to see this. I don't use BWA personally, but I'm sure there will be an option to return a hit even if a sequence could have mapped in many places with high identity. This won't necessarily help your downstream analysis, but it will at least let you know why your sequences wouldn't map.
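
One way to see how the reads split between unique, repetitive and unmapped is to tally the SAM output. The Python sketch below assumes the alignments came from BWA's aln/samse/sampe route, which writes an XT:A tag (U for unique, R for repeat); other aligners may not write this tag, in which case reads land in the "mapped_no_xt" bin. The function names are made up for illustration.

```python
from collections import Counter


def classify_sam_line(line):
    """Classify one SAM alignment line as unmapped/unique/repeat/other."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:                      # SAM flag bit: read is unmapped
        return "unmapped"
    for tag in fields[11:]:             # optional tags start at column 12
        if tag.startswith("XT:A:"):
            code = tag[5:]
            return {"U": "unique", "R": "repeat"}.get(code, "other")
    return "mapped_no_xt"


def tally_sam(lines):
    """Count alignment classes over an iterable of SAM lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue                    # skip header and blank lines
        counts[classify_sam_line(line)] += 1
    return counts


if __name__ == "__main__":
    toy = [
        "@SQ\tSN:chr1\tLN:1000",
        "r1\t0\tchr1\t100\t37\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tXT:A:U\tX0:i:1",
        "r2\t0\tchr1\t200\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tXT:A:R\tX0:i:57",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    ]
    for name, n in sorted(tally_sam(toy).items()):
        print(name, n)
```

A large "repeat" bin would point to cause 3 in the list above, while a large "unmapped" bin points back to contamination or adapter problems.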

If all else fails, what we've done before is to remove from our library all of the sequences which we were able to map successfully and then do an assembly of whatever is left (we used Velvet). This has worked well for us on a couple of occasions to identify sources of contamination which we'd been unable to identify in any other way.
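
The "remove what mapped, assemble the rest" step could be sketched in Python along these lines, assuming the mapping results are in SAM format. The function names are hypothetical; the output is plain FASTQ records that could then be handed to an assembler. Reads flagged as reverse-strand are flipped back to their original orientation, since SAM stores those reverse-complemented.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def unmapped_to_fastq(sam_lines):
    """Yield FASTQ records for every unmapped read in a SAM stream."""
    for line in sam_lines:
        if line.startswith("@"):
            continue                        # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue                        # not a full alignment line
        flag = int(fields[1])
        if not flag & 0x4:
            continue                        # read mapped: leave it out
        name, seq, qual = fields[0], fields[9], fields[10]
        if flag & 0x10:                     # stored on the reverse strand
            seq = reverse_complement(seq)
            qual = qual[::-1]
        yield "@%s\n%s\n+\n%s\n" % (name, seq, qual)


if __name__ == "__main__":
    toy = [
        "@SQ\tSN:chr1\tLN:1000",
        "r1\t0\tchr1\t100\t37\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    ]
    for record in unmapped_to_fastq(toy):
        print(record, end="")
```

This sketch does not distinguish the two mates of a pair, so for paired-end data the read names would need a /1 or /2 suffix (or separate output files) before assembly.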