**vishnuamaram** · 08-29-2013, 09:54 PM

Hey famarques,

file 1:-
Your sequencing FastQC report is pretty good.
You need not worry at all.
All base quality scores are above 30, except for the last few positions.
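If you want to check the "above 30 except the last few" point yourself, here is a minimal sketch of how a per-position mean quality can be computed from FASTQ quality lines. It assumes Phred+33 encoding; the function name and the toy quality strings are made up for illustration, and FastQC itself reports more than a simple mean.

```python
from statistics import mean

def per_position_mean_quality(quality_strings, offset=33):
    """Return the mean Phred score at each read position.

    quality_strings: iterable of FASTQ quality lines (Phred+33 assumed).
    Reads may differ in length; each position averages the reads that reach it.
    """
    columns = []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(columns):
                columns.append([])
            columns[i].append(ord(ch) - offset)
    return [mean(col) for col in columns]

# Toy data (hypothetical): 'I' = Q40, '5' = Q20, '#' = Q2
quals = ["IIII5", "IIII#", "IIII5"]
means = per_position_mean_quality(quals)

# 1-based read positions whose mean quality drops below 30
low = [i + 1 for i, m in enumerate(means) if m < 30]
print(means)
print(low)
```

In the toy data only the last position falls below 30, which is the usual pattern at the 3' end of reads.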

--> 1) Regarding GC content, that is OK. The symbol is just a warning.
--> 2) Regarding your duplication levels, you need not worry. By default, FastQC cannot give a 100% reliable estimate of sequence duplication.

--> When you run the analysis, once you get to the samtools stage, you can remove duplicates using either samtools rmdup
or Picard MarkDuplicates, etc.
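To give an idea of what duplicate removal does, here is a simplified sketch in plain Python. It is not the samtools or Picard implementation: it only groups single-end reads by chromosome, start position and strand, and keeps the read with the highest summed base quality. Real tools also take mate positions and clipping into account. The read records and the function name are hypothetical.

```python
from collections import defaultdict

def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    reads: list of dicts with keys name, chrom, pos, strand, quals.
    Reads sharing (chrom, pos, strand) are treated as one duplicate group;
    the read with the highest summed base quality is kept.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read["chrom"], read["pos"], read["strand"])].append(read)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: sum(r["quals"]))
        for read in group:
            if read is not best:
                duplicates.add(read["name"])
    return duplicates

# Toy reads (hypothetical): r1 and r2 start at the same place on the same strand
reads = [
    {"name": "r1", "chrom": "chr1", "pos": 100, "strand": "+", "quals": [30, 30]},
    {"name": "r2", "chrom": "chr1", "pos": 100, "strand": "+", "quals": [40, 40]},
    {"name": "r3", "chrom": "chr1", "pos": 100, "strand": "-", "quals": [20, 20]},
    {"name": "r4", "chrom": "chr2", "pos": 100, "strand": "+", "quals": [10]},
]
dups = mark_duplicates(reads)
print(sorted(dups))
```

Here r1 is flagged because r2 sits at the same position and strand with better qualities; r3 and r4 are kept since nothing else shares their position and strand.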

file 2:-
I really don't have much of an idea about that.
By the way, could you let me know how you got those statistics? Which tool did you use?

**famarques** · 08-30-2013, 04:53 AM

Hello vishnuamaram,
Thanks for the reply.

File 2:
Those statistics and metrics were generated with NGSrich (0.7.8) from the filtered BAM files.
The main problem is: why were so many genes not covered, given that I had good sequence quality as well as a large number of reads?
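One way to look at this independently of NGSrich is to compute, for each target region, the fraction of bases covered by at least some minimum depth. The sketch below is only an illustration under simple assumptions: half-open integer intervals on a single chromosome, a non-empty target, and made-up coordinates. It does not reproduce how NGSrich calculates its metrics.

```python
def covered_fraction(target, reads, min_depth=1):
    """Fraction of target bases covered by at least min_depth reads.

    target: (start, end) half-open interval, assumed non-empty.
    reads:  list of (start, end) half-open intervals on the same chromosome.
    """
    t_start, t_end = target
    depth = [0] * (t_end - t_start)
    for r_start, r_end in reads:
        # Only count the part of the read that overlaps the target
        for pos in range(max(r_start, t_start), min(r_end, t_end)):
            depth[pos - t_start] += 1
    covered = sum(1 for d in depth if d >= min_depth)
    return covered / len(depth)

# Toy example (hypothetical coordinates): a 100 bp target and two reads
target = (100, 200)
reads = [(90, 150), (140, 160)]
at_1x = covered_fraction(target, reads)
at_2x = covered_fraction(target, reads, min_depth=2)
print(at_1x, at_2x)
```

Targets that come out at or near zero even at 1x are the ones worth checking against the capture kit's target list and the reference build used for alignment.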

I got them from a cloud analysis at http://epigen.hpc.cineca.it/wep/index.php. They have a pipeline for exome analysis.

Here is how they describe their tool:

"The WEP resource performs a complete whole-exome sequencing pipeline and provides easy access through interface to intermediate and final results.

The pipeline is composed of several steps:
Verification of input integrity, quality checks, read trimming and primer contamination removal;
Gapped alignment;
BAM conversion, sorting and indexing;
Duplicates removal, as they result as PCR amplification bias;
A local realignment around known IN-DELs position, carried on to delete the other artifacts;
Quality score recalibration to refine some oddness caused by sequencing and mapping on quality scores;
Variants (SNV and DIP) calling from the filtered mapping data obtained from the previous steps;
Association of as many annotation as possible to the variant list (i.e. annotation stored in database like dbSNP, 1000 Genomes Project, etc.);
Data post processing: raw outputs are parsed and stored into custom databases to allow cross-linking and intersections, statistics and much more.
Through our tool a user can perform the whole analysis without knowing the underlying hardware and software architecture, dealing with both paired and single end data. The interface provides an easy and intuitive access for data submission and user-friendly web pages for annotated variant visualization.

Non-IT mastered users can access through WEP to the most updated and tested whole exome sequencing algorithms, ad-hoc tuned to maximize the quality of variants called while minimizing artifacts and false positives."

Exome quality
