SEQanswers

01-13-2016, 03:20 AM   #1
jordi
Member
Location: València, Spain
Join Date: Apr 2009
Posts: 48

Quality encoding PE reads

Hi!
I am checking the quality of a PE sequencing run, and what I found is a little tricky: within the pair, R1 seems to use Phred33 encoding whereas R2 seems to use Phred64. However, both reads appear to have been sequenced in the same run.
Any idea? Do you know whether this should be taken into account in further steps such as trimming?
Thanks a lot.

150807_SND405_A_L003_GZX-17_R1.fastq.gz
This file looks like Sanger/Illumina 1.8+ format.
@HISEQ:157:C6U61ANXX:3:1101:1680:2160 1:N:0:GTCCGCACTCTTTTCC

150807_SND405_A_L003_GZX-17_R2.fastq.gz
This file looks like Solexa/Illumina1.3+/Illumina1.5+ format.
@HISEQ:157:C6U61ANXX:3:1101:1680:2160 2:N:0:GTCCGCACTCTTTTCC
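The two header lines above follow the Illumina 1.8+ (CASAVA 1.8) naming scheme, so one quick sanity check that R1 and R2 really come from the same run is to compare every header field except the read number. A minimal sketch, assuming that layout with a single index field; the class and function names are hypothetical:

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    """Fields of an Illumina 1.8+ (CASAVA 1.8) FASTQ header, single index field assumed."""
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int      # 1 = R1, 2 = R2
    filtered: str  # "Y" if the read was filtered out, otherwise "N"
    control: int
    index: str


def parse_header(line: str) -> IlluminaHeader:
    """Split '@inst:run:flowcell:lane:tile:x:y read:filtered:control:index'."""
    left, right = line.strip().lstrip("@").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = left.split(":")
    read, filtered, control, index = right.split(":")
    return IlluminaHeader(instrument, int(run), flowcell, int(lane), int(tile),
                          int(x), int(y), int(read), filtered, int(control), index)


def are_mates(h1: IlluminaHeader, h2: IlluminaHeader) -> bool:
    """True if the headers differ only in the read number (one is 1, the other 2)."""
    same_cluster = h1._replace(read=0) == h2._replace(read=0)
    return same_cluster and {h1.read, h2.read} == {1, 2}


r1 = parse_header("@HISEQ:157:C6U61ANXX:3:1101:1680:2160 1:N:0:GTCCGCACTCTTTTCC")
r2 = parse_header("@HISEQ:157:C6U61ANXX:3:1101:1680:2160 2:N:0:GTCCGCACTCTTTTCC")
print(are_mates(r1, r2))  # True
```

Both records share instrument, run 157, flowcell C6U61ANXX, lane 3 and cluster coordinates, which fits the reads having been sequenced together.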
01-13-2016, 03:54 AM   #2
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,975

This looks like a recent run (based on the time stamp), so having two separate encodings for the two reads is highly unlikely (unless someone deliberately changed the encoding).
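On "deliberately changed the encoding": Phred+33 and Phred+64 store the same Phred score Q as the characters chr(Q + 33) and chr(Q + 64), so converting between them is a constant shift of 31 per character. A minimal sketch with a hypothetical helper name; Solexa-scaled scores from pre-1.3 pipelines use a different formula and are not handled:

```python
PHRED33_OFFSET = 33  # Sanger / Illumina 1.8+
PHRED64_OFFSET = 64  # Illumina 1.3+ / 1.5+


def phred64_to_phred33(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (a shift of 31 per character).

    Characters below '@' (Q < 0) raise ValueError; Solexa-scaled data is not
    converted correctly by this function and should not be passed to it.
    """
    out = []
    for ch in qual:
        q = ord(ch) - PHRED64_OFFSET  # decode the Phred score
        if q < 0:
            raise ValueError(f"{ch!r} is outside the Phred+64 range")
        out.append(chr(q + PHRED33_OFFSET))  # re-encode with the Sanger offset
    return "".join(out)


print(phred64_to_phred33("hhhhB"))  # IIII#
```

Here 'h' (Q40) becomes 'I' and 'B' (Q2) becomes '#'.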

You can easily test the Q-score encoding format by using BBMap like this:

Code:
$ testformat.sh in=seq.fq.gz
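For anyone curious what such a check looks at: a common heuristic compares the range of quality characters in a file against the ranges each encoding can produce. A minimal sketch under assumed ranges (Q0 to Q41 for Illumina 1.8+ Phred+33, Q0 to Q40 for Phred+64); it is not a reimplementation of testformat.sh, the function names are hypothetical, and it assumes unwrapped four-line FASTQ records:

```python
import gzip
from itertools import islice

# Assumed ASCII ranges for the two common variants:
#   Phred+33, Illumina 1.8+: Q0..Q41 -> '!'(33)..'J'(74)
#   Phred+64, Illumina 1.3+: Q0..Q40 -> '@'(64)..'h'(104)
RANGES = {"phred33": (33, 74), "phred64": (64, 104)}


def plausible_encodings(quals):
    """Return every encoding whose range contains all observed quality characters."""
    codes = [ord(c) for q in quals for c in q]
    lo, hi = min(codes), max(codes)
    return [name for name, (a, b) in RANGES.items() if a <= lo and hi <= b]


def quality_lines(path, n_reads=10000):
    """Yield quality strings (every 4th line) from the first n_reads of a FASTQ(.gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for i, line in enumerate(islice(fh, 4 * n_reads)):
            if i % 4 == 3:
                yield line.rstrip("\n")


print(plausible_encodings(["#III"]))  # ['phred33']
print(plausible_encodings(["hhhB"]))  # ['phred64']
print(plausible_encodings(["IIII"]))  # ['phred33', 'phred64'] (ambiguous)
```

On a real file this would be called as `plausible_encodings(quality_lines("seq.fq.gz"))`. When both encodings remain plausible, the data alone cannot settle the question.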
01-13-2016, 04:03 AM   #3
mastal
Senior Member
Location: uk
Join Date: Mar 2009
Posts: 667

Are all the reads in the R1 and R2 files the same length?

I think that if the reads have been trimmed to remove low-quality bases, the characters representing low quality values will sometimes be missing, and the file may then look like one with a different quality encoding.
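To make that concrete: in Phred+33, every score of Q31 or higher is written with a character at or above '@' (ASCII 64), which is where the Phred+64 alphabet starts. A detector that relies only on the absence of low characters can therefore be fooled once low-quality bases are trimmed away. A minimal sketch of such a naive rule (hypothetical function names; not a claim about how any particular tool decides):

```python
def phred33(scores):
    """Encode a list of Phred scores as a Phred+33 quality string."""
    return "".join(chr(q + 33) for q in scores)


def looks_like_phred64(quals):
    """Naive rule: no character below '@' (ASCII 64) => 'looks like' Phred+64."""
    return min(c for q in quals for c in q) >= "@"


raw = [phred33([2, 30, 35, 38, 40])]  # "#?DGI": contains '#' (Q2)
trimmed = [phred33([35, 38, 40])]     # "DGI": low-quality bases trimmed off

print(looks_like_phred64(raw))      # False
print(looks_like_phred64(trimmed))  # True, although the data is Phred+33
```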

Have you run the file through FastQC, and is it FastQC that has decided the R1 and R2 files have different quality encoding?
01-13-2016, 04:20 AM   #4
jordi
Member
Location: València, Spain
Join Date: Apr 2009
Posts: 48

Hi mastal and GenoMax,
Thank you for your answers. It seems to be a bug in the Perl script fastqFormatDetect.pl that I used to predict the encoding of my raw sequences, since FastQC and BBMap both agree on a Phred33 encoding (Illumina 1.8+). I got the script from https://github.com/mel-astar/mel-ngs...aster/scripts/

150807_SND405_A_L003_GZX-17_R1.fastq.gz
sanger fastq gz single-ended 125bp

150807_SND405_A_L003_GZX-17_R2.fastq.gz
sanger fastq gz single-ended 125bp


GenoMax, just out of curiosity: what do you mean by time stamp?

Sorry for any inconvenience and thank you very much.
Regards
01-13-2016, 04:33 AM   #5
HESmith
Senior Member
Location: Bethesda MD
Join Date: Oct 2009
Posts: 503

I believe that GenoMax means the date, which appears at the beginning of your file names in YYMMDD format.
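Reading that prefix programmatically takes only the standard library. A minimal sketch, assuming the file name starts with a YYMMDD field followed by an underscore, as in the names above (the helper name is hypothetical):

```python
from datetime import datetime


def run_date(filename):
    """Parse the leading YYMMDD field of a run-style file name into a date."""
    prefix = filename.split("_", 1)[0]
    # %y maps two-digit years 00-68 to 2000-2068 and 69-99 to 1969-1999
    return datetime.strptime(prefix, "%y%m%d").date()


print(run_date("150807_SND405_A_L003_GZX-17_R1.fastq.gz"))  # 2015-08-07
```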
01-13-2016, 04:38 AM   #6
jordi
Member
Location: València, Spain
Join Date: Apr 2009
Posts: 48

Fantastic!!
All times are GMT -8.