SEQanswers

Old 11-05-2014, 04:40 AM   #1
zinky
Member
 
Location: china

Join Date: Dec 2011
Posts: 48
Question Human whole-genome sequencing data analysis with low mapping rate

Hi everyone,
I received human WGS data for four samples a few days ago, to identify variants as well as CNVs. After the QC and mapping steps of my analysis workflow, I found that the mapping rate of every sample is very low, as listed below:

samples mapping rate

sample1_H04C3ALXX_L4 57.62%
sample1_H04C3ALXX_L5 8.67%
sample1_H04C3ALXX_L6 13.68%
sample1_H04C3ALXX_L7 26.78%
sample1_H04C3ALXX_L8 28.19%

sample2_H04C3ALXX_L1 2.49%
sample2_H04C3ALXX_L2 2.17%
sample2_H04C3ALXX_L3 31.80%
sample2_H04C3ALXX_L4 32.57%
sample2_H04C3ALXX_L5 31.81%
sample2_H04C3ALXX_L6 31.63%
sample2_H04C3ALXX_L7 31.87%
sample2_H04C3ALXX_L8 31.81%

sample3_H04B1ALXX_L3 4.36%
sample3_H04B1ALXX_L4 59.36%
sample3_H04B1ALXX_L5 2.49%
sample3_H04B1ALXX_L6 3.21%

sample4_H04C3ALXX_L5 27.06%
sample4_H04C3ALXX_L6 26.67%
sample4_H04C3ALXX_L7 27.52%
sample4_H04C3ALXX_L8 27.79%
sample4_H04C3ALXX_L1 14.82%
sample4_H04C3ALXX_L2 13.96%
sample4_H04C3ALXX_L3 24.75%
sample4_H04C3ALXX_L4 24.75%

The mapping software was BWA, version 0.7.10-r789.

To figure out why the mapping rate was so low, I randomly picked 1,000 unmapped reads and ran a BLAST search against the nt database. I kept the best hit for each read, and most of the hits are human clone sequences such as:

Human DNA sequence from clone RP3-376K6, complete sequence
Homo sapiens Chromosome 16 BAC clone CIT987SK-A-926E7, complete sequence
Homo sapiens chromosome 18, clone RP11-529J17, complete sequence
Homo sapiens chromosome 18, clone CTD-2504O24, complete sequence
...
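
The sampling step described above can be sketched as follows. This is only a minimal sketch, assuming the alignments are in an uncompressed SAM file such as file.sam and that GNU `shuf` is available; the function name `sample_unmapped_fasta` is made up for illustration.

```shell
# Pick n random unmapped reads from a SAM file and print them as FASTA.
# Assumption: input is plain-text SAM (not BAM); unmapped reads have flag bit 0x4 set.
sample_unmapped_fasta() {
    sam="$1"   # e.g. file.sam
    n="$2"     # number of reads to sample, e.g. 1000
    awk -F '\t' '
        /^@/ { next }                              # skip SAM header lines
        int($2 / 4) % 2 == 1 {                     # flag 0x4 set => read is unmapped
            mate = (int($2 / 128) % 2 == 1) ? 2 : 1   # flag 0x80 set => second read of the pair
            print $1 "/" mate "\t" $10             # read name with mate suffix, then sequence
        }
    ' "$sam" | shuf -n "$n" | awk -F '\t' '{ print ">" $1; print $2 }'
}
```

For example, `sample_unmapped_fasta file.sam 1000 > unmapped_1000.fa` would write a FASTA file that can then be used as the BLAST query.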

So my questions are:
What are these sequences (CDS or genomic sequence)?
Are my samples contaminated?


What causes the extremely low mapping rates of these lanes:
sample2_H04C3ALXX_L1 2.49%
sample2_H04C3ALXX_L2 2.17%
sample3_H04B1ALXX_L5 2.49%
sample3_H04B1ALXX_L6 3.21%
Is it the samples or the software?

Any comment will be greatly appreciated, thank you very much!

Last edited by zinky; 11-05-2014 at 04:43 AM.
Old 11-05-2014, 08:34 AM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

It would help if you ran FastQC and posted the output, along with your QC steps and mapping command line. As it stands, the reason could be anything.
Old 11-05-2014, 05:59 PM   #3
zinky
Member
 
Location: china

Join Date: Dec 2011
Posts: 48
Default

I used NGS QC Toolkit for QC, and the result shows that more than 80% of the reads pass the filter as high-quality reads, so I went on to the mapping step. My mapping command lines are:
bwa aln -t 5 genome.fa file_1.fastq > file_1.fastq.sai
bwa aln -t 5 genome.fa file_2.fastq > file_2.fastq.sai
bwa sampe -A -a 600 -r '@RG\tID:noID\tPL:ILLUMINA\tLB:noLB\tSM:"file"' genome file_1.fastq.sai file_2.fastq.sai file_1.fastq file_2.fastq > file.sam
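
A mapping rate for the resulting file.sam can be tallied from the SAM flags. The following is only a minimal sketch, assuming an uncompressed SAM file and counting each read once (secondary and supplementary records are skipped); the function name `sam_mapping_rate` is made up for illustration, and other tools may define the rate slightly differently.

```shell
# Print the percentage of reads in a SAM file that are mapped.
# Assumption: input is plain-text SAM; the rate is mapped reads / all primary records.
sam_mapping_rate() {
    awk -F '\t' '
        /^@/ { next }                        # skip SAM header lines
        int($2 / 256) % 2 == 1 { next }      # skip secondary alignments (flag 0x100)
        int($2 / 2048) % 2 == 1 { next }     # skip supplementary alignments (flag 0x800)
        { total++ }                          # one primary record per read
        int($2 / 4) % 2 == 0 { mapped++ }    # flag 0x4 unset => read is mapped
        END {
            if (total > 0) printf "%.2f%%\n", 100 * mapped / total
            else print "NA"
        }
    ' "$1"
}
```

Usage would be `sam_mapping_rate file.sam`, run once per lane so that the per-lane rates can be compared.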
Old 11-05-2014, 06:07 PM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

You may have short inserts and thus high adapter contamination. You can get an insert size distribution with BBMerge, like this:

bbmerge.sh in1=file_1.fastq in2=file_2.fastq ihist=ihist.txt

If a lot of reads have insert sizes shorter than read length, that will indicate adapter contamination which needs to be removed (e.g. with BBDuk).
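
To put a number on "a lot of reads", the histogram can be summarized with a few lines of awk. This is a minimal sketch under an assumption about the file layout: that ihist.txt has header lines starting with `#`, followed by tab-separated rows of insert size and count. The function name `short_insert_fraction` is made up for illustration. Keep in mind that the histogram only covers pairs that could be merged, so the fraction is relative to those pairs.

```shell
# Fraction of pairs in an insert-size histogram whose insert is shorter than the read length.
# Assumption: '#'-prefixed header lines, then "insert_size<TAB>count" rows.
short_insert_fraction() {
    ihist="$1"      # e.g. ihist.txt
    read_len="$2"   # read length in bp, e.g. 150
    awk -F '\t' -v len="$read_len" '
        /^#/ || NF < 2 { next }              # skip header and malformed lines
        { total += $2 }                      # all pairs with a reported insert size
        $1 + 0 < len + 0 { short += $2 }     # pairs with insert shorter than the read
        END {
            if (total > 0) printf "%.4f\n", short / total
            else print "NA"
        }
    ' "$ihist"
}
```

For example, `short_insert_fraction ihist.txt 150` prints the fraction for 150 bp reads; the read length has to be supplied by hand, since it is not stated in this thread.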

Also, I don't recommend bwa aln, particularly in recent versions of bwa. You will achieve higher speed and accuracy with bwa mem or BBMap, which can also generate some useful diagnostic plots (such as mhist).
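
For reference, the three bwa aln / bwa sampe steps collapse into a single bwa mem call. This is a minimal sketch, assuming the reference has already been indexed with `bwa index` and reusing the file names and read group from the earlier post; the wrapper function `map_pair_with_bwa_mem` is made up for illustration.

```shell
# Map one pair of FASTQ files with bwa mem and write SAM output.
# Assumption: "$ref" has already been indexed with `bwa index`.
map_pair_with_bwa_mem() {
    ref="$1"   # e.g. genome.fa
    r1="$2"    # e.g. file_1.fastq
    r2="$3"    # e.g. file_2.fastq
    out="$4"   # e.g. file.sam
    # -t: number of threads; -R: read group header line (bwa mem converts \t to tabs)
    bwa mem -t 5 -R '@RG\tID:noID\tPL:ILLUMINA\tLB:noLB\tSM:file' \
        "$ref" "$r1" "$r2" > "$out"
}
```

A call would look like `map_pair_with_bwa_mem genome.fa file_1.fastq file_2.fastq file.sam`. bwa mem estimates the insert size distribution from the data, so no equivalent of the `-a 600` setting is given here.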

But I still recommend you post FastQC results.
Old 11-05-2014, 06:39 PM   #5
zinky
Member
 
Location: china

Join Date: Dec 2011
Posts: 48
Default

Thanks for your suggestion. I asked the sequencing staff and was told the insert size is 350 bp, so I set the -a parameter to 600 to tolerate larger insert sizes, aiming to improve the mapping rate. Before that, I had also used FastQC to assess read quality. The QC report was good, which suggested no index contamination (green k-mer distribution and green overrepresented sequences) and high sequencing quality.
PS: I don't know why my pictures cannot be uploaded here.

So I wonder whether the sample was mixed with non-human DNA, as I mentioned above (actually, I don't know what those sequences are).
Also, I will try the tools you suggested. Thanks, Brian.
Tags
wgs
