  • Human whole-genome sequencing data analysis with low mapping rate

    Hi everyone,
    A few days ago I received human WGS data for four samples, which I want to use to identify variants as well as CNVs. After the QC and mapping steps of my analysis workflow, I found that the mapping rate of every sample is very low, as listed below:

    sample_flowcell_lane   mapping rate

    sample1_H04C3ALXX_L4 57.62%
    sample1_H04C3ALXX_L5 8.67%
    sample1_H04C3ALXX_L6 13.68%
    sample1_H04C3ALXX_L7 26.78%
    sample1_H04C3ALXX_L8 28.19%

    sample2_H04C3ALXX_L1 2.49%
    sample2_H04C3ALXX_L2 2.17%
    sample2_H04C3ALXX_L3 31.80%
    sample2_H04C3ALXX_L4 32.57%
    sample2_H04C3ALXX_L5 31.81%
    sample2_H04C3ALXX_L6 31.63%
    sample2_H04C3ALXX_L7 31.87%
    sample2_H04C3ALXX_L8 31.81%

    sample3_H04B1ALXX_L3 4.36%
    sample3_H04B1ALXX_L4 59.36%
    sample3_H04B1ALXX_L5 2.49%
    sample3_H04B1ALXX_L6 3.21%

    sample4_H04C3ALXX_L5 27.06%
    sample4_H04C3ALXX_L6 26.67%
    sample4_H04C3ALXX_L7 27.52%
    sample4_H04C3ALXX_L8 27.79%
    sample4_H04C3ALXX_L1 14.82%
    sample4_H04C3ALXX_L2 13.96%
    sample4_H04C3ALXX_L3 24.75%
    sample4_H04C3ALXX_L4 24.75%

    The reads were mapped with BWA, version 0.7.10-r789.
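
For reference, a per-lane mapping rate like the ones listed above is usually the share of reads whose SAM FLAG does not have the 0x4 (unmapped) bit set. The following is only a minimal sketch of that calculation, assuming a plain-text SAM file and counting each read once by skipping secondary and supplementary records. The function name `mapping_rate` is illustrative and not part of the original workflow.

```python
from typing import Iterable

UNMAPPED = 0x4         # SAM flag bit: read is unmapped
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def mapping_rate(sam_lines: Iterable[str]) -> float:
    """Return the percentage of primary SAM records that are mapped."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read only once
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return 100.0 * mapped / total if total else 0.0
```

It could be called as `mapping_rate(open("file.sam"))`; other tools may count secondary alignments differently, so small discrepancies are possible.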

    To figure out why the mapping rate is so low, I randomly picked 1000 unmapped reads and ran a BLAST search against the nt database. I kept the best hit for each read, and most of the aligned sequences are human clone fragments such as:

    Human DNA sequence from clone RP3-376K6, complete sequence
    Homo sapiens Chromosome 16 BAC clone CIT987SK-A-926E7, complete sequence
    Homo sapiens chromosome 18, clone RP11-529J17, complete sequence
    Homo sapiens chromosome 18, clone CTD-2504O24, complete sequence
    ...
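
The random pick of 1000 unmapped reads described above can be done in a single pass with reservoir sampling. This is a rough sketch, not the poster's actual method: it assumes the unmapped reads have already been written to a 4-line-per-record FASTQ file, and the names `read_fastq`, `reservoir_sample` and `to_fasta` are made up for illustration.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (read name, sequence)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it, "").strip()
        next(it, None)  # '+' separator line
        next(it, None)  # quality line
        yield header.strip()[1:], seq  # drop the leading '@'


def reservoir_sample(records: Iterable[Record], k: int, seed: int = 0) -> List[Record]:
    """Uniformly sample up to k records in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec  # replace with probability k / (i + 1)
    return sample


def to_fasta(records: Iterable[Record]) -> str:
    """Format records as FASTA text, ready to use as a BLAST query."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

For example, `to_fasta(reservoir_sample(read_fastq(open("unmapped.fastq")), 1000))` would give the query text, where `unmapped.fastq` is a placeholder file name.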

    So my questions are:
    What are these sequences (CDS or genomic sequence)?
    Are my samples contaminated?


    What causes the extremely low mapping rates of these lanes:
    sample2_H04C3ALXX_L1 2.49%
    sample2_H04C3ALXX_L2 2.17%
    sample3_H04B1ALXX_L5 2.49%
    sample3_H04B1ALXX_L6 3.21%
    Is it the samples or the software?

    Any comment will be greatly appreciated, thank you very much!
    Last edited by zinky; 11-05-2014, 05:43 AM.

  • #2
    It would help if you run FastQC and post the output, as well as your QC steps, and mapping command line. As it stands, the reason could be anything.


    • #3
      I used NGS QC Toolkit for QC, and the result shows that more than 80% of the reads pass as high-quality filtered reads, so I went on to the mapping step. My mapping command lines are:
      bwa aln -t 5 genome.fa file_1.fastq > file_1.fastq.sai
      bwa aln -t 5 genome.fa file_2.fastq > file_2.fastq.sai
      bwa sampe -A -a 600 -r '@RG\tID:noID\tPL:ILLUMINA\tLB:noLB\tSM:"file"' genome file_1.fastq.sai file_2.fastq.sai file_1.fastq file_2.fastq > file.sam


      • #4
        You may have short inserts and thus high adapter contamination. You can get an insert size distribution with BBMerge, like this:

        bbmerge.sh in1=file_1.fastq in2=file_2.fastq ihist=ihist.txt

        If a lot of reads have insert sizes shorter than read length, that will indicate adapter contamination which needs to be removed (e.g. with BBDuk).
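
To put a number on "a lot of reads", the histogram file can be summarised with a few lines of Python. This is a sketch under assumptions: it expects `ihist.txt` to contain `#`-prefixed header lines followed by whitespace-separated insert-size and count columns, and `read_length` must be supplied by the user since the thread does not state it. The function name is illustrative.

```python
from typing import Iterable


def short_insert_fraction(ihist_lines: Iterable[str], read_length: int) -> float:
    """Fraction of histogram pairs whose insert size is below read_length."""
    short = 0
    total = 0
    for line in ihist_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' header lines
        parts = line.split()
        if len(parts) < 2:
            continue  # ignore malformed rows
        size, count = int(parts[0]), int(parts[1])
        total += count
        if size < read_length:
            short += count  # these pairs read through into adapter
    return short / total if total else 0.0
```

Note that the fraction is relative to the pairs present in the histogram, so it says nothing about pairs that are missing from it.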

        Also, I don't recommend bwa aln, particularly in recent versions of bwa. You will achieve higher speed and accuracy with bwa mem or BBMap, which can also generate some useful diagnostic plots (such as mhist).
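
As a rough illustration of the switch to bwa mem, the three aln/sampe commands collapse into a single call. The sketch below only builds the argument list; the helper name `bwa_mem_command` and the read-group values are placeholders, and it assumes `genome.fa` has already been indexed.

```python
from typing import List


def bwa_mem_command(
    reference: str,
    fq1: str,
    fq2: str,
    rg_id: str,
    sample: str,
    library: str = "noLB",
    threads: int = 5,
) -> List[str]:
    """Build the argument list for a paired-end bwa mem run."""
    # bwa expands the literal "\t" sequences in -R itself, so the string is
    # kept raw and no real tab characters are inserted here.
    read_group = rf"@RG\tID:{rg_id}\tPL:ILLUMINA\tLB:{library}\tSM:{sample}"
    return ["bwa", "mem", "-t", str(threads), "-R", read_group, reference, fq1, fq2]
```

The list can be passed to `subprocess.run(cmd, stdout=out_handle, check=True)` with `out_handle` opened on the target SAM file, because bwa mem writes alignments to standard output.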

        But I still recommend you post FastQC results.


        • #5
          Thanks for your suggestion. I asked the sequencing staff and was told the insert size is 350 bp, so I set the -a parameter to 600 to tolerate larger insert sizes, hoping to improve the mapping rate. Before that, I also used FastQC to assess read quality. The QC report was good: it suggested no index contamination (the k-mer content and overrepresented sequences modules were both green) and high sequencing quality.
          PS: I don't know why my pictures cannot be uploaded here.

          So I suspect the sample was mixed with non-human DNA, as I mentioned above (actually, I don't know what those sequences are).
          Also, I will try the tools you suggested. Thanks, Brian.
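
When report images will not upload, the plain-text `summary.txt` that FastQC writes inside each report archive can be pasted instead. Below is a minimal sketch for listing the modules that did not pass, assuming the usual three tab-separated columns (status, module name, file name). The function names are illustrative.

```python
from typing import Dict, Iterable, List


def parse_fastqc_summary(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FastQC module name to its status (PASS, WARN or FAIL)."""
    statuses: Dict[str, str] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            statuses[parts[1]] = parts[0]  # column 1: status, column 2: module
    return statuses


def flagged_modules(statuses: Dict[str, str]) -> List[str]:
    """Return the modules whose status is anything other than PASS."""
    return [module for module, status in statuses.items() if status != "PASS"]
```

Running `flagged_modules(parse_fastqc_summary(open("summary.txt")))` on each FASTQ file's summary gives a short list that is easy to post in a reply.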
