SEQanswers

08-29-2013, 11:16 AM #1
famarques
Junior Member
 
Location: Brasília, Brazil

Join Date: Mar 2013
Posts: 2
Exome quality

Hello,
I'm trying to analyze the results of my first exome sequencing run and am having some problems.
I ran a cloud analysis via the WEP site, using a pipeline close to the default one I had been reading about here (FastQC, NGS QC Toolkit, BWA, SAMtools, Picard, GATK, NGSrich, ANNOVAR and Wesparser). Although the software reported 200x coverage (I believe it simply divided the number of sequenced nucleotides by the size of the exome, ~6,000,000), some of the other statistics do not agree.
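
To see why a headline 200x can disagree with per-exon statistics, here is a minimal Python sketch, assuming the pipeline's figure is simply total sequenced bases divided by target size. The read count, read length, on-target fraction and duplicate fraction are hypothetical illustration values, not numbers from this run; only the ~6,000,000 target size comes from the post.

```python
"""Back-of-the-envelope exome coverage estimates (hypothetical inputs)."""


def naive_mean_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Total sequenced bases divided by the target size.

    This ignores off-target reads and PCR duplicates, so it overstates
    the depth actually available for variant calling.
    """
    return n_reads * read_length / target_size


def usable_mean_coverage(
    n_reads: int,
    read_length: int,
    target_size: int,
    on_target_fraction: float,
    duplicate_fraction: float,
) -> float:
    """Mean depth after discarding duplicates and off-target reads."""
    usable_reads = n_reads * (1.0 - duplicate_fraction) * on_target_fraction
    return usable_reads * read_length / target_size


if __name__ == "__main__":
    # Hypothetical run: 12 M reads of 100 bp over a ~6 Mb target.
    print(naive_mean_coverage(12_000_000, 100, 6_000_000))  # 200.0
    print(usable_mean_coverage(12_000_000, 100, 6_000_000, 0.6, 0.2))
```

The second function shows how removing duplicates and off-target reads lowers the usable depth. Even that number is only a mean: it says nothing about how evenly the depth is spread across exons, which is what the per-exon statistics measure.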

My first impression of the FastQC report (file1.pdf, attached) was good: high Phred scores, many reads, etc. However, it flagged two modules: GC content and sequence duplication levels. What is the real impact of the second one?
The "Performance of Sample Enrichment" report (file2.pdf, attached) shows that only 36.75% of the exons have coverage above 30x. The same file contains a table listing several genes that were not covered. I would like some help: if these results are correct, the analysis may have been done wrongly. In short, what can I do? By my calculations, ~8% of all genes were not covered. Given all this, I'm concerned about how reliable my results are.
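
A figure like "36.75% of exons above 30x" comes from per-exon depth rather than from the overall mean. The minimal sketch below assumes you already have one (gene, mean depth) pair per targeted exon, for example exported from the enrichment report; the function name, input format and gene names are hypothetical, and NGSrich's own definitions may differ (for instance in whether the threshold is strict or what counts as "not covered").

```python
"""Summarise per-exon depth the way an enrichment report might (hypothetical input)."""
from collections import defaultdict
from typing import Iterable, List, Tuple


def coverage_summary(
    exons: Iterable[Tuple[str, float]], threshold: float = 30.0
) -> Tuple[float, List[str], float]:
    """Return (fraction of exons above threshold, uncovered genes, fraction of genes uncovered).

    `exons` holds one (gene, mean_depth) pair per targeted exon. A gene counts
    as uncovered here only if every one of its exons has zero mean depth.
    """
    exons = list(exons)
    if not exons:
        raise ValueError("no exons supplied")
    above = sum(1 for _, depth in exons if depth > threshold)
    best_depth = defaultdict(float)  # highest exon depth seen per gene
    for gene, depth in exons:
        best_depth[gene] = max(best_depth[gene], depth)
    uncovered = sorted(gene for gene, depth in best_depth.items() if depth == 0)
    return above / len(exons), uncovered, len(uncovered) / len(best_depth)


if __name__ == "__main__":
    demo = [("GENE_A", 45.0), ("GENE_A", 12.0), ("GENE_B", 0.0), ("GENE_C", 31.0)]
    print(coverage_summary(demo))
```

In the demo, half the exons pass 30x even though the mean depth is about 22x, and one gene in three is entirely uncovered, which is why a high mean and a low "exons above 30x" figure can coexist.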

I appreciate your attention.
Attached files:
file1.pdf (PDF, 305.8 KB)
file2.pdf (PDF, 1.69 MB)
__________________
Thanks,
Felipe
08-29-2013, 09:54 PM #2
vishnuamaram
Member
 
Location: India

Join Date: Jun 2013
Posts: 42

Hey famarques,

file 1:
Your FastQC report is pretty good; you need not worry at all.
The base quality scores are above 30 at every position except the last few.

--> 1) GC content: that is OK. The symbol is just a warning.
--> 2) Duplication levels: you need not worry about these either. With its default settings, FastQC cannot give fully reliable values for sequence duplication, since it estimates duplication from only a subset of the reads.

--> When you run the analysis, once you reach the samtools step you can remove duplicates using either samtools rmdup or Picard MarkDuplicates.
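
As a rough illustration of what duplicate removal does, the sketch below groups single-end reads by chromosome, 5' position and strand and keeps the highest-quality copy in each group. It is a simplified approximation under stated assumptions (single-end reads, unique read names, a hypothetical `Read` type), not the actual algorithm of Picard MarkDuplicates or samtools rmdup, which also handle paired-end reads, clipping and other details omitted here.

```python
"""Simplified duplicate marking for single-end reads (not Picard's exact algorithm)."""
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Read:
    name: str          # read names are assumed to be unique
    chrom: str
    five_prime: int    # unclipped 5' position, already adjusted for strand
    reverse: bool
    qual_sum: int      # sum of base qualities, used to choose the representative


def mark_duplicates(reads: Sequence[Read]) -> List[str]:
    """Return the names of reads flagged as duplicates.

    Reads sharing (chrom, 5' position, strand) are treated as copies of one
    fragment; the copy with the highest base-quality sum is kept.
    """
    best: Dict[Tuple[str, int, bool], Read] = {}
    for read in reads:
        key = (read.chrom, read.five_prime, read.reverse)
        if key not in best or read.qual_sum > best[key].qual_sum:
            best[key] = read
    kept = {read.name for read in best.values()}
    return [read.name for read in reads if read.name not in kept]


if __name__ == "__main__":
    reads = [
        Read("r1", "chr1", 100, False, 300),
        Read("r2", "chr1", 100, False, 350),
        Read("r3", "chr1", 100, True, 200),
        Read("r4", "chr2", 500, False, 310),
    ]
    dups = mark_duplicates(reads)
    print(dups, len(dups) / len(reads))  # ['r1'] 0.25
```

Reads r1 and r2 start at the same position on the same strand, so only the higher-quality r2 is kept; r3 shares the position but lies on the opposite strand, so it is not treated as a duplicate.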


file 2:
I don't really have much idea about that one.
By the way, could you let me know how you got those statistics? Which tool did you use?
08-30-2013, 04:53 AM #3
famarques
Junior Member
 
Location: Brasília, Brazil

Join Date: Mar 2013
Posts: 2

Hello vishnuamaram,
Thanks for the reply.

File 2:
Those statistics and metrics were generated with NGSrich (0.7.8) from the filtered BAM files.
The main question is: why were so many genes not covered, given that my sequence quality was good and I had a large number of reads?

I got them from a cloud analysis at http://epigen.hpc.cineca.it/wep/index.php, which offers an exome analysis pipeline.

This is how they describe their tool:

"The WEP resource performs a complete whole-exome sequencing pipeline and provides easy access through interface to intermediate and final results.

The pipeline is composed of several steps:
Verification of input integrity, quality checks, read trimming and primer contamination removal;
Gapped alignment;
BAM conversion, sorting and indexing;
Duplicates removal, as they result as PCR amplification bias;
A local realignment around known IN-DELs position, carried on to delete the other artifacts;
Quality score recalibration to refine some oddness caused by sequencing and mapping on quality scores;
Variants (SNV and DIP) calling from the filtered mapping data obtained from the previous steps;
Association of as many annotation as possible to the variant list (i.e. annotation stored in database like dbSNP, 1000 Genomes Project, etc.);
Data post processing: raw outputs are parsed and stored into custom databases to allow cross-linking and intersections, statistics and much more.
Through our tool a user can perform the whole analysis without knowing the underlying hardware and software architecture, dealing with both paired and single end data. The interface provides an easy and intuitive access for data submission and user-friendly web pages for annotated variant visualization.

Non-IT mastered users can access through WEP to the most updated and tested whole exome sequencing algorithms, ad-hoc tuned to maximize the quality of variants called while minimizing artifacts and false positives."
__________________
Thanks,
Felipe

Tags
exome analysis, exome quality, hiseq




All times are GMT -8.

