Seqanswers

  • Exome quality

    Hello,
    I'm trying to analyze my first exome sequencing results and am having some problems.
    I ran one cloud analysis via WEP's site (almost the default pipeline I had been reading about here: FastQC, NGS QC Toolkit, BWA, SAMtools, Picard, GATK, NGSrich, ANNOVAR and Wesparser). Although the software reported 200x coverage (I believe it only considered the number of nucleotides sequenced divided by the size of the exome, ~6,000,000), some of the other statistics do not show the same thing.
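The 200x figure matches a simple "bases sequenced divided by target size" estimate. A minimal Python sketch of that arithmetic, assuming hypothetical inputs (read count, read length, target size, and fractions of reads that are mapped, on target and non-duplicate); it illustrates why the naive mean can be much higher than the coverage actually usable over the exons:

```python
def naive_mean_coverage(num_reads: int, read_length: int, target_size: int) -> float:
    """Total sequenced bases divided by target size (what a raw '200x' figure measures)."""
    return num_reads * read_length / target_size


def effective_mean_coverage(
    num_reads: int,
    read_length: int,
    target_size: int,
    mapped_fraction: float,
    on_target_fraction: float,
    non_duplicate_fraction: float,
) -> float:
    """Same estimate, counting only reads that map, fall on target and are not duplicates.

    The three fractions are hypothetical inputs; in practice they would come from
    alignment, enrichment and duplicate-marking reports.
    """
    usable_reads = num_reads * mapped_fraction * on_target_fraction * non_duplicate_fraction
    return usable_reads * read_length / target_size


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the arithmetic.
    reads, length, target = 12_000_000, 100, 6_000_000
    print(naive_mean_coverage(reads, length, target))
    print(effective_mean_coverage(reads, length, target, 0.95, 0.60, 0.80))
```

Even the effective mean says nothing about uniformity: a high mean can coexist with many exons below 30x, which is what per-exon statistics measure.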

    At first glance the FastQC report (file1.pdf, attached) looked good: high Phred scores, many reads, etc. However, it raised two flags: GC content and sequence duplication levels. What is the real impact of the second one?
    The "Performance of Sample Enrichment" report (file2.pdf, attached) says that only 36.75% of the exons have coverage above 30x. The same file contains a table listing several genes that were not covered. I would like some help: if these results are correct, maybe the analysis was done wrong... In summary, what can I do? By my calculations, ~8% of all genes were not covered. Given all this, I'm concerned about how reliable my results are.
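The two summary numbers in that report (share of exons above 30x, share of genes with no coverage) can be derived from per-exon mean coverage. A minimal sketch, assuming a hypothetical list of (gene, exon, mean coverage) rows; NGSrich's own definitions may differ (for example, in how it decides that a gene is "not covered"), so this is illustrative only:

```python
from collections import defaultdict

# Hypothetical per-exon record: (gene, exon_id, mean_coverage).
ExonCoverage = tuple[str, str, float]


def fraction_exons_at_least(rows: list[ExonCoverage], threshold: float) -> float:
    """Fraction of exons whose mean coverage is >= threshold."""
    if not rows:
        return 0.0
    passing = sum(1 for _, _, cov in rows if cov >= threshold)
    return passing / len(rows)


def uncovered_genes(rows: list[ExonCoverage]) -> set[str]:
    """Genes in which every exon has zero mean coverage (one possible definition)."""
    max_cov: dict[str, float] = defaultdict(float)
    for gene, _, cov in rows:
        max_cov[gene] = max(max_cov[gene], cov)
    return {gene for gene, cov in max_cov.items() if cov == 0.0}


if __name__ == "__main__":
    # Hypothetical gene names and values.
    rows = [
        ("GENE_A", "e1", 45.0),
        ("GENE_A", "e2", 12.0),
        ("GENE_B", "e1", 0.0),
        ("GENE_B", "e2", 0.0),
        ("GENE_C", "e1", 31.0),
    ]
    print(fraction_exons_at_least(rows, 30), uncovered_genes(rows))
```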

    I appreciate your attention.
    Thanks,
    Felipe

  • #2
    Hey famarques,

    File 1:
    Your sequencing FastQC report is pretty good.
    You need not worry at all.
    All bases have quality scores above 30 except the last few.
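For reference, a Phred quality score Q corresponds to an error probability p = 10^(-Q/10), so "above 30" means roughly one miscalled base per 1,000 or better. A small Python sketch of the conversion in both directions:

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred quality q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (20, 30, 40):
        print(q, phred_to_error_prob(q))
```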

    --> 1) Regarding GC content, that is OK; the symbol is just a warning.
    --> 2) Your duplication levels: you need not worry about those either. FastQC's duplication estimate is only approximate (by default it only tracks sequences that first appear in the first 100,000 reads of the file), so its warning is not conclusive on its own.

    --> When you perform the analysis, after the SAMtools step you can remove duplicates using either samtools rmdup
    or Picard MarkDuplicates, etc.
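Both samtools rmdup and Picard MarkDuplicates group reads that share the same alignment position and orientation and keep one representative per group. The sketch below is a simplified, single-end illustration of that idea only: it ignores mate information, soft-clipping and library IDs, which the real tools handle, and the Read record and its fields are hypothetical. Here the kept read is the one with the highest summed base quality, similar in spirit to Picard's default.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical, simplified single-end alignment record."""
    name: str
    chrom: str
    start: int            # leftmost aligned position (0-based)
    end: int              # position after the last aligned base
    reverse: bool         # True if aligned to the reverse strand
    base_qualities: list[int]


def five_prime_key(read: Read) -> tuple[str, int, bool]:
    """Group key: chromosome, 5' alignment position and strand.

    For reverse-strand reads the 5' end is the rightmost aligned base.
    """
    pos = read.end - 1 if read.reverse else read.start
    return (read.chrom, pos, read.reverse)


def mark_duplicates(reads: list[Read]) -> set[str]:
    """Return names of reads flagged as duplicates.

    Within each group sharing a 5' key, the read with the highest summed base
    quality is kept and all others are flagged.
    """
    groups: dict[tuple[str, int, bool], list[Read]] = {}
    for read in reads:
        groups.setdefault(five_prime_key(read), []).append(read)

    duplicates: set[str] = set()
    for group in groups.values():
        best = max(group, key=lambda r: sum(r.base_qualities))
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


def duplicate_rate(reads: list[Read]) -> float:
    """Fraction of reads flagged as duplicates."""
    return len(mark_duplicates(reads)) / len(reads) if reads else 0.0
```

Grouping on the 5' end rather than the leftmost coordinate matters for reverse-strand reads, whose leftmost position shifts if the read is trimmed at its 3' end.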


    File 2:
    I really don't have much idea about that.
    By the way, could you let me know how you got those statistics? Which tool did you use?


    • #3
      Hello vishnuamaram,
      Thanks for the reply.

      File 2:
      Those statistics and metrics were produced with NGSrich (0.7.8) from the filtered BAM files.
      The main problem is: why were so many genes not covered, given that my sequence quality was good and I had a large number of reads?
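One possible reason genes show no coverage despite plenty of reads is that they (or their exons) are not part of the capture kit's target regions, so nothing is enriched there. A minimal sketch for checking that, assuming hypothetical half-open, 0-based (chrom, start, end) intervals for gene exons and capture targets (for example, parsed from BED files); this is a diagnostic idea, not something NGSrich or WEP is known to report:

```python
from collections import defaultdict

Interval = tuple[str, int, int]  # (chrom, start, end), half-open and 0-based, as in BED


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def genes_without_targets(
    gene_exons: dict[str, list[Interval]], targets: list[Interval]
) -> set[str]:
    """Genes none of whose exons overlap any capture target.

    Uses a simple per-chromosome scan: fine for a sketch, slow genome-wide
    (an interval tree or a sorted sweep would be used there).
    """
    by_chrom: dict[str, list[Interval]] = defaultdict(list)
    for t in targets:
        by_chrom[t[0]].append(t)

    missing: set[str] = set()
    for gene, exons in gene_exons.items():
        if not any(overlaps(e, t) for e in exons for t in by_chrom.get(e[0], [])):
            missing.add(gene)
    return missing
```

Genes returned here would be expected to show no coverage regardless of sequencing depth; genes that are targeted but still uncovered point to other causes.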

      I got them from a cloud analysis at http://epigen.hpc.cineca.it/wep/index.php, which offers an exome analysis pipeline.

      Look at how they describe their tool:

      "The WEP resource performs a complete whole-exome sequencing pipeline and provides easy access through interface to intermediate and final results.

      The pipeline is composed of several steps:
      Verification of input integrity, quality checks, read trimming and primer contamination removal;
      Gapped alignment;
      BAM conversion, sorting and indexing;
      Duplicates removal, as they result as PCR amplification bias;
      A local realignment around known IN-DELs position, carried on to delete the other artifacts;
      Quality score recalibration to refine some oddness caused by sequencing and mapping on quality scores;
      Variants (SNV and DIP) calling from the filtered mapping data obtained from the previous steps;
      Association of as many annotation as possible to the variant list (i.e. annotation stored in database like dbSNP, 1000 Genomes Project, etc.);
      Data post processing: raw outputs are parsed and stored into custom databases to allow cross-linking and intersections, statistics and much more.
      Through our tool a user can perform the whole analysis without knowing the underlying hardware and software architecture, dealing with both paired and single end data. The interface provides an easy and intuitive access for data submission and user-friendly web pages for annotated variant visualization.

      Non-IT mastered users can access through WEP to the most updated and tested whole exome sequencing algorithms, ad-hoc tuned to maximize the quality of variants called while minimizing artifacts and false positives."
      Thanks,
      Felipe
