#1 |
Member
Location: china Join Date: Dec 2011
Posts: 48
Hi, everyone
I received human WGS data for four samples a few days ago and want to identify variants as well as CNVs. After the QC and mapping steps of my analysis workflow, I found that the mapping rate of every sample is very low, as listed below:

samples | mapping rate
sample1_H04C3ALXX_L4 | 57.62%
sample1_H04C3ALXX_L5 | 8.67%
sample1_H04C3ALXX_L6 | 13.68%
sample1_H04C3ALXX_L7 | 26.78%
sample1_H04C3ALXX_L8 | 28.19%
sample2_H04C3ALXX_L1 | 2.49%
sample2_H04C3ALXX_L2 | 2.17%
sample2_H04C3ALXX_L3 | 31.80%
sample2_H04C3ALXX_L4 | 32.57%
sample2_H04C3ALXX_L5 | 31.81%
sample2_H04C3ALXX_L6 | 31.63%
sample2_H04C3ALXX_L7 | 31.87%
sample2_H04C3ALXX_L8 | 31.81%
sample3_H04B1ALXX_L3 | 4.36%
sample3_H04B1ALXX_L4 | 59.36%
sample3_H04B1ALXX_L5 | 2.49%
sample3_H04B1ALXX_L6 | 3.21%
sample4_H04C3ALXX_L5 | 27.06%
sample4_H04C3ALXX_L6 | 26.67%
sample4_H04C3ALXX_L7 | 27.52%
sample4_H04C3ALXX_L8 | 27.79%
sample4_H04C3ALXX_L1 | 14.82%
sample4_H04C3ALXX_L2 | 13.96%
sample4_H04C3ALXX_L3 | 24.75%
sample4_H04C3ALXX_L4 | 24.75%

The mapping software was BWA, version 0.7.10-r789.

To figure out why the mapping rate was so low, I randomly picked 1000 unmapped reads and ran a BLAST search against the nt database. I kept the best hit for each read, and most of the aligned sequences are human clone fragments such as:

Human DNA sequence from clone RP3-376K6, complete sequence
Homo sapiens Chromosome 16 BAC clone CIT987SK-A-926E7, complete sequence
Homo sapiens chromosome 18, clone RP11-529J17, complete sequence
Homo sapiens chromosome 18, clone CTD-2504O24, complete sequence
...

So my questions are: What are these sequences (CDS or genomic sequence)? Are my samples contaminated? And what causes the extremely low mapping rate of the lanes below, the samples or the software?

sample2_H04C3ALXX_L1 | 2.49%
sample2_H04C3ALXX_L2 | 2.17%
sample3_H04B1ALXX_L5 | 2.49%
sample3_H04B1ALXX_L6 | 3.21%

Any comments will be greatly appreciated. Thank you very much!

Last edited by zinky; 11-05-2014 at 05:43 AM.
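For anyone who wants to reproduce the two checks described in this post (the per-lane mapping rate and the pool of unmapped reads used for BLAST), here is a minimal sketch using only awk on an uncompressed SAM file. It assumes the standard SAM layout, where bit 0x4 of the FLAG column marks an unmapped read; the function names `mapping_rate` and `unmapped_to_fasta` are made up for illustration and are not part of BWA or any other tool. It counts alignment records rather than unique reads, so the number may differ slightly from what other tools report.

```shell
#!/bin/sh
# Sketch only: assumes an uncompressed SAM file with the standard column layout.
# FLAG is column 2; bit 0x4 set means the read is unmapped.

# Print the percentage of alignment records that are mapped (0x4 not set).
mapping_rate() {
    awk -F '\t' '
        /^@/ { next }                                   # skip SAM header lines
        { total++; if (int($2 / 4) % 2 == 0) mapped++ } # test bit 0x4 without bitwise ops
        END { if (total > 0) printf "%.2f\n", 100 * mapped / total }
    ' "$1"
}

# Write unmapped records as FASTA (read name, then sequence), e.g. for a BLAST search.
unmapped_to_fasta() {
    awk -F '\t' '
        /^@/ { next }
        int($2 / 4) % 2 == 1 { print ">" $1; print $10 }
    ' "$1"
}
```

Possible usage: `mapping_rate file.sam` prints a single percentage, and `unmapped_to_fasta file.sam > unmapped.fa` produces a FASTA file from which a random subset can be drawn for BLAST.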
#2 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
It would help if you run FastQC and post the output, as well as your QC steps, and mapping command line. As it stands, the reason could be anything.
#3 |
Member
Location: china Join Date: Dec 2011
Posts: 48
I used NGS QC Toolkit for QC, and the result shows that more than 80% of the reads passed as high-quality filtered reads, so I went on to the mapping step. My mapping command lines are:
bwa aln -t 5 genome.fa file_1.fastq > file_1.fastq.sai
bwa aln -t 5 genome.fa file_2.fastq > file_2.fastq.sai
bwa sampe -A -a 600 -r '@RG\tID:noID\tPL:ILLUMINA\tLB:noLB\tSM:"file"' genome file_1.fastq.sai file_2.fastq.sai file_1.fastq file_2.fastq > file.sam
#4 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
You may have short inserts and thus high adapter contamination. You can get an insert size distribution with BBMerge, like this:
bbmerge.sh in1=file_1.fastq in2=file_2.fastq ihist=ihist.txt

If a lot of reads have insert sizes shorter than read length, that will indicate adapter contamination which needs to be removed (e.g. with BBDuk). Also, I don't recommend bwa aln, particularly in recent versions of bwa. You will achieve higher speed and accuracy with bwa mem or BBMap, which can also generate some useful diagnostic plots (such as mhist). But I still recommend you post FastQC results.
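To turn the insert-size histogram into a single number, something like the following could work. This is a rough sketch, not part of BBTools: it assumes `ihist.txt` consists of `#`-prefixed header lines followed by tab-separated rows of insert size and count, so check your own file first. The function name `short_insert_fraction` and the read-length argument are hypothetical. Keep in mind that the histogram only covers pairs that could be merged, so the result is a percentage of merged pairs, not of all pairs.

```shell
#!/bin/sh
# Sketch only: assumes '#'-prefixed header lines, then "insert_size<TAB>count" rows.

# $1 = insert-size histogram file, $2 = read length in bp (supplied by the user).
# Prints the percentage of histogram pairs with insert size below the read length.
short_insert_fraction() {
    awk -F '\t' -v readlen="$2" '
        /^#/ { next }                                   # skip header/summary lines
        NF >= 2 { total += $2; if ($1 < readlen) n_short += $2 }
        END { if (total > 0) printf "%.1f\n", 100 * n_short / total }
    ' "$1"
}
```

Possible usage: `short_insert_fraction ihist.txt 150`, with 150 replaced by the actual read length of the run. A high percentage would point towards the adapter contamination described above.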
#5 |
Member
Location: china Join Date: Dec 2011
Posts: 48
Thanks for your suggestion. I asked the sequencing staff and was told the insert size is 350 bp, so I set the -a parameter to 600 to tolerate larger insert sizes, aiming to improve the mapping rate. Before that, I also used FastQC to assess read quality. The QC report was good: it suggested no index contamination (the k-mer distribution and overrepresented sequences checks were both green) and high sequencing quality.
PS: I don't know why my pictures cannot be uploaded here. So I wonder whether the samples were mixed with non-human DNA, as I mentioned above (actually, I don't know what those sequences are). I will also try the tools you suggested. Thanks, Brian.