SEQanswers


jordi 01-13-2016 03:20 AM

Quality encoding PE reads
 
Hi!
I am checking the quality of a paired-end (PE) sequencing run. What I found is a bit puzzling: within the same pair of read files, R1 seems to use Phred33 encoding whereas R2 seems to use Phred64, even though both reads appear to have been sequenced in the same run.
Any ideas? Do you know whether this needs to be taken into account in later steps such as trimming?
Thanks a lot.

150807_SND405_A_L003_GZX-17_R1.fastq.gz
This file looks like Sanger/Illumina 1.8+ format.
@HISEQ:157:C6U61ANXX:3:1101:1680:2160 1:N:0:GTCCGCACTCTTTTCC

150807_SND405_A_L003_GZX-17_R2.fastq.gz
This file looks like Solexa/Illumina1.3+/Illumina1.5+ format.
@HISEQ:157:C6U61ANXX:3:1101:1680:2160 2:N:0:GTCCGCACTCTTTTCC
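
To illustrate why the offset would matter for later steps such as trimming, here is a minimal Python sketch that decodes the same quality string with both offsets. The function name `decode_quality` is hypothetical and the quality string is made up for the example; it is not taken from the files above.

```python
def decode_quality(qual, offset):
    """Convert a FASTQ quality string to a list of Phred scores.

    offset is 33 for Sanger/Illumina 1.8+ and 64 for Illumina 1.3+/1.5+.
    """
    return [ord(char) - offset for char in qual]


# Made-up quality string: the same characters give very different scores.
example = "IIII"
as_phred33 = decode_quality(example, 33)  # read as high-quality bases
as_phred64 = decode_quality(example, 64)  # read as low-quality bases
print(as_phred33, as_phred64)
```

A trimmer told to use the wrong offset would therefore apply its quality threshold to shifted scores, which is why it is worth settling the encoding first.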

GenoMax 01-13-2016 03:54 AM

This looks like a recent run (based on the time stamp) so having two separate encodings for the two reads is highly unlikely (unless someone deliberately changed the encoding).

You can easily test the Q-score encoding format using BBMap, like this:

Code:

$ testformat.sh in=seq.fq.gz

mastal 01-13-2016 04:03 AM

Are all the reads in the R1 and R2 files the same length?

I think that if the reads have been trimmed to remove low-quality bases, the symbols that represent low quality values will sometimes be missing, and the file may then look as if it uses a different quality encoding.

Have you run the files through FastQC, and was it FastQC that decided the R1 and R2 files have different quality encodings?
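
The point about missing low-value symbols can be sketched in Python. This is only a rough range-based heuristic, not how FastQC, BBMap, or fastqFormatDetect.pl actually decide; the function name `guess_phred_offset` and the cut-offs (characters below ASCII 59 only occur with Phred33, characters above ASCII 74 are assumed to occur only with Phred64) are assumptions made for the illustration.

```python
def guess_phred_offset(quality_lines):
    """Guess the quality encoding from the range of observed characters.

    Assumed cut-offs (illustrative only):
      - any character below ASCII 59 can only come from Phred33
      - any character above ASCII 74 is taken to mean Phred64
      - otherwise the observed range fits both encodings
    """
    codes = [ord(char) for line in quality_lines for char in line]
    if not codes:
        return "ambiguous"  # nothing to look at
    if min(codes) < 59:
        return "phred33"
    if max(codes) > 74:
        return "phred64"
    return "ambiguous"


# Made-up quality strings, not taken from the files in this thread.
print(guess_phred_offset(["#AAFFJJ"]))  # contains a low-value symbol
print(guess_phred_offset(["FFFJJJ"]))   # low-value symbols are absent
```

When only high-quality symbols remain, the range overlaps both encodings, so a detector that guesses anyway can get it wrong.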

jordi 01-13-2016 04:20 AM

Hi mastal and GenoMax,
Thank you for your answers. It seems to be a bug in the Perl script fastqFormatDetect.pl, which I used to predict the encoding of my raw sequences, since FastQC and BBMap agree on a Phred33 encoding (Illumina 1.8). I got the script from https://github.com/mel-astar/mel-ngs...aster/scripts/

150807_SND405_A_L003_GZX-17_R1.fastq.gz
sanger fastq gz single-ended 125bp

150807_SND405_A_L003_GZX-17_R2.fastq.gz
sanger fastq gz single-ended 125bp


GenoMax, just out of curiosity: what do you mean by time stamp?

Sorry for any inconvenience and thank you very much.
Regards

HESmith 01-13-2016 04:33 AM

I believe that GenoMax means the date, which appears at the beginning of your file names in YYMMDD format.
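
As a small sketch of reading that date in Python, assuming the part of the file name before the first underscore really is a YYMMDD date (the function name `run_date_from_filename` is hypothetical):

```python
from datetime import datetime


def run_date_from_filename(filename):
    """Parse the leading YYMMDD field of a file name into a date.

    Assumes the text before the first underscore is a YYMMDD date;
    raises ValueError if it is not.
    """
    prefix = filename.split("_", 1)[0]
    return datetime.strptime(prefix, "%y%m%d").date()


print(run_date_from_filename("150807_SND405_A_L003_GZX-17_R1.fastq.gz"))
```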

jordi 01-13-2016 04:38 AM

:D Fantastic!!


All times are GMT -8.
