Hi, everyone
I received human WGS data for four samples a few days ago and want to identify variants as well as CNVs. After the QC and mapping steps of my analysis workflow, I found that the mapping rate of every sample is very low, as listed below:
samples mapping rate
sample1_H04C3ALXX_L4 57.62%
sample1_H04C3ALXX_L5 8.67%
sample1_H04C3ALXX_L6 13.68%
sample1_H04C3ALXX_L7 26.78%
sample1_H04C3ALXX_L8 28.19%
sample2_H04C3ALXX_L1 2.49%
sample2_H04C3ALXX_L2 2.17%
sample2_H04C3ALXX_L3 31.80%
sample2_H04C3ALXX_L4 32.57%
sample2_H04C3ALXX_L5 31.81%
sample2_H04C3ALXX_L6 31.63%
sample2_H04C3ALXX_L7 31.87%
sample2_H04C3ALXX_L8 31.81%
sample3_H04B1ALXX_L3 4.36%
sample3_H04B1ALXX_L4 59.36%
sample3_H04B1ALXX_L5 2.49%
sample3_H04B1ALXX_L6 3.21%
sample4_H04C3ALXX_L5 27.06%
sample4_H04C3ALXX_L6 26.67%
sample4_H04C3ALXX_L7 27.52%
sample4_H04C3ALXX_L8 27.79%
sample4_H04C3ALXX_L1 14.82%
sample4_H04C3ALXX_L2 13.96%
sample4_H04C3ALXX_L3 24.75%
sample4_H04C3ALXX_L4 24.75%
The reads were mapped with BWA, version 0.7.10-r789.
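For anyone who wants to check the rates independently, here is a minimal sketch of how a per-lane mapping rate can be computed from SAM records. It is not necessarily how the numbers above were produced; the function name `mapping_rate` is made up for illustration, and the input is assumed to be plain SAM text (for example the output of `samtools view -h`). It counts only primary records, using the standard SAM FLAG bits 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary).

```python
def mapping_rate(sam_lines):
    """Return the percentage of primary SAM records that are mapped.

    sam_lines: any iterable of SAM text lines (hypothetical input,
    e.g. an open file handle on a .sam file).
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped += 1
    return 100.0 * mapped / total if total else 0.0
```

Calling it once per lane file, e.g. `mapping_rate(open("lane.sam"))` with a hypothetical file name, should give a figure comparable to the "mapped" percentage reported by common alignment statistics tools, as long as they also count primary records only.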
To figure out why the mapping rate is so low, I randomly picked 1000 unmapped reads and ran BLAST against the nt database. I kept the best hit for each read, and most of the hits are human clone sequences such as:
Human DNA sequence from clone RP3-376K6, complete sequence
Homo sapiens Chromosome 16 BAC clone CIT987SK-A-926E7, complete sequence
Homo sapiens chromosome 18, clone RP11-529J17, complete sequence
Homo sapiens chromosome 18, clone CTD-2504O24, complete sequence
...
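The sampling step described above can be sketched roughly as follows. This is an illustration under assumptions, not the exact procedure used for the BLAST check: the function names `sample_unmapped` and `to_fasta` are hypothetical, the input is assumed to be plain SAM text, and reservoir sampling is used so the whole file never has to be held in memory.

```python
import random


def sample_unmapped(sam_lines, k=1000, seed=0):
    """Reservoir-sample up to k unmapped reads from SAM text lines.

    Returns a list of (read_name, sequence) tuples. Names get a /1 or /2
    suffix for paired reads so the two mates stay distinguishable.
    """
    rng = random.Random(seed)
    reservoir = []
    seen = 0  # number of unmapped reads encountered so far
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        if not flag & 0x4:
            continue  # keep only reads flagged as unmapped
        name = fields[0]
        if flag & 0x40:
            name += "/1"  # first mate of a pair
        elif flag & 0x80:
            name += "/2"  # second mate of a pair
        seen += 1
        record = (name, fields[9])
        if len(reservoir) < k:
            reservoir.append(record)
        else:
            # Replace a random slot with probability k / seen
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = record
    return reservoir


def to_fasta(records):
    """Format (name, sequence) tuples as FASTA text for a BLAST query."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

The FASTA text returned by `to_fasta(sample_unmapped(open("lane.sam")))`, again with a hypothetical file name, can then be written to a file and used as the BLAST query.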
So my questions are:
What are these sequences (CDS or genomic sequence)?
Are my samples contaminated?
What causes the extremely low mapping rates of the following lanes: the samples or the software?
sample2_H04C3ALXX_L1 2.49%
sample2_H04C3ALXX_L2 2.17%
sample3_H04B1ALXX_L5 2.49%
sample3_H04B1ALXX_L6 3.21%
Any comment will be greatly appreciated, thank you very much!