SEQanswers

12-13-2015, 06:59 AM   #1
khushal
Junior Member
 
Location: INDIA

Join Date: Nov 2015
Posts: 6
RAW and Processed data

I have 2x150 bp paired-end mouse transcriptome data sequenced on an Illumina NextSeq 500.
The vendor gave me a QC report for both the raw and the processed data.
In the raw data, the minimum and maximum read lengths are both 151 bp, whereas in the processed data (processed by the vendor with NGS QC Toolkit) the minimum and maximum read lengths are 50 bp and 151 bp respectively. How is that possible? My confusion is: shouldn't the raw data contain all the possible read lengths from 50 to 151 bp?
Second, I ran FastQC on both the raw and the processed data. In the sequence length distribution module of FastQC, the raw data gave a normal result, whereas the processed data triggered a warning.

Should I do downstream processing with the raw or the processed data?
12-13-2015, 10:14 AM   #2
arthurmelo
Member
 
Location: Durham, NH, US

Join Date: Jul 2012
Posts: 19

Normally, all reads in raw sequencing data have the same length, which explains why the minimum and maximum read lengths are equal. The raw dataset is generally processed with a bioinformatics tool such as Trimmomatic to remove nucleotides with low base-call quality, based on Phred scores. The Phred threshold can take different values, but Q30 (an expected one error per thousand base calls) is recommended for variant calling and DGE analysis, for example. It is also common to discard all reads shorter than a required minimum length, and I suppose this minimum length is 50 bp in your case. So, after this initial filtering, the processed dataset contains the quality-filtered data that you should use for downstream analysis.
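
To illustrate how trimming plus a minimum-length filter turns uniform 151 bp reads into a range of lengths, here is a minimal Python sketch. It is not the algorithm used by Trimmomatic or NGS QC Toolkit; the function names, the simple 3'-end trimming rule, the Q20 threshold and the toy reads are all assumptions made for the example. Only the 50 bp minimum length comes from the thread.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove bases from the 3' end while their Phred score is below min_q.

    Assumes Phred+33 encoded quality strings (hypothetical helper).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def filter_reads(reads, min_q=20, min_len=50):
    """Trim each (seq, qual) pair and keep reads of at least min_len bases."""
    kept = []
    for seq, qual in reads:
        t_seq, t_qual = trim_3prime(seq, qual, min_q)
        if len(t_seq) >= min_len:
            kept.append((t_seq, t_qual))
    return kept


# Three toy 151 bp reads: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
reads = [
    ("A" * 151, "I" * 151),             # good throughout: stays 151 bp
    ("C" * 151, "I" * 100 + "#" * 51),  # poor 3' tail: trimmed to 100 bp
    ("G" * 151, "I" * 30 + "#" * 121),  # mostly poor: 30 bp left, discarded
]
lengths = [len(seq) for seq, _ in filter_reads(reads)]
print(min(lengths), max(lengths))
```

Every read starts at 151 bp, but after trimming the surviving reads span a range, and nothing shorter than the minimum length remains.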
12-13-2015, 10:16 AM   #3
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

The raw data is always a single read length. The processing involves adapter and quality trimming, which is why there's then a range of lengths. You should use the processed results for most common downstream applications.
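
This is also why FastQC warns on the processed file: as far as I understand its documentation, the sequence length distribution module raises a warning whenever the reads are not all the same length and fails only if some reads have zero length. A rough Python sketch of that check follows; the function names and toy records are made up for illustration, and it assumes an unwrapped FASTQ with four lines per record.

```python
import io
from collections import Counter


def read_lengths(handle):
    """Count read lengths in a 4-line-per-record FASTQ stream."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


def length_status(counts):
    """Approximate the FastQC rule: fail on empty reads, warn on mixed lengths."""
    if 0 in counts:
        return "fail"
    return "warn" if len(counts) > 1 else "pass"


# Toy stand-ins for a raw file (uniform length) and a trimmed file.
raw = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nTTTTCCCC\n+\nIIIIIIII\n")
processed = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nTTTTC\n+\nIIIII\n")

print(length_status(read_lengths(raw)))
print(length_status(read_lengths(processed)))
```

So a warning in that module after trimming is expected and is not, by itself, a sign that something went wrong.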
12-13-2015, 10:22 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Quote:
Originally Posted by arthurmelo
Q30 (an expected one error per thousand base calls) is recommended for variant calling and DGE analysis, for example.
FYI, this is no longer recommended. For DGE, trimming above Q5 is rarely beneficial. For variant calling the callers themselves take the phred score into account, so there's never a need to trim so stringently.
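
For reference, a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q5 and Q30 are very different thresholds. A small Python sketch of the conversion (the function names are just illustrative):

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)


for q in (5, 20, 30):
    print(f"Q{q}: {phred_to_error_prob(q):.4f}")
# Q5: 0.3162
# Q20: 0.0100
# Q30: 0.0010
```

Trimming at Q5 therefore removes only bases that are wrong roughly one time in three, while trimming at Q30 also discards bases that are wrong only about one time in a thousand.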
All times are GMT -8.