  • Quality encoding PE reads

    Hi!
    I am just checking the quality of a PE sequencing run, and what I found is a little tricky: for the same PE reads, R1 seems to use Phred33 encoding whereas R2 seems to use Phred64. However, both reads appear to have been sequenced in the same run.
    Any ideas? Do you know whether this should be taken into account in later steps such as trimming?
    Thanks a lot.

    150807_SND405_A_L003_GZX-17_R1.fastq.gz
    This file looks like Sanger/Illumina 1.8+ format.
    @HISEQ:157:C6U61ANXX:3:1101:1680:2160 1:N:0:GTCCGCACTCTTTTCC

    150807_SND405_A_L003_GZX-17_R2.fastq.gz
    This file looks like Solexa/Illumina1.3+/Illumina1.5+ format.
    @HISEQ:157:C6U61ANXX:3:1101:1680:2160 2:N:0:GTCCGCACTCTTTTCC
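The two headers above are identical before the space (instrument, run number, flowcell, lane, tile and cluster coordinates) and differ only in the read number (1 vs 2), which is consistent with both files coming from the same run. A minimal Python sketch of that check, assuming the Casava 1.8+ header layout; `parse_header` and `are_mates` are hypothetical helper names:

```python
def parse_header(header: str) -> dict:
    """Split a Casava 1.8+ style header, e.g.
    '@HISEQ:157:C6U61ANXX:3:1101:1680:2160 1:N:0:GTCCGCACTCTTTTCC'."""
    name, comment = header.lstrip("@").split(" ", 1)
    # instrument:run:flowcell:lane:tile:x:y identifies the cluster
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    # read number : filtered flag : control bits : index sequence
    read, filtered, control, index = comment.split(":")
    return {
        "cluster": (instrument, run, flowcell, lane, tile, x, y),
        "read": int(read),
        "filtered": filtered == "Y",
        "control": int(control),
        "index": index,
    }


def are_mates(h1: str, h2: str) -> bool:
    """True if both headers describe read 1 and read 2 of the same cluster."""
    a, b = parse_header(h1), parse_header(h2)
    return a["cluster"] == b["cluster"] and {a["read"], b["read"]} == {1, 2}


r1 = "@HISEQ:157:C6U61ANXX:3:1101:1680:2160 1:N:0:GTCCGCACTCTTTTCC"
r2 = "@HISEQ:157:C6U61ANXX:3:1101:1680:2160 2:N:0:GTCCGCACTCTTTTCC"
print(are_mates(r1, r2))  # True
```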

  • #2
    This looks like a recent run (based on the time stamp), so having two separate encodings for the two reads is highly unlikely (unless someone deliberately changed the encoding).

    You can easily test the Q-score encoding with BBMap's testformat.sh, like this:

    Code:
    $ testformat.sh in=seq.fq.gz
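Encoding checks like this generally work from the range of quality characters seen in the file. A minimal Python sketch of that idea (not BBMap's actual logic), assuming Illumina-style ranges: Phred+33 (Illumina 1.8+) tops out at 'J' (ASCII 74), while Phred+64/Solexa never go below ';' (ASCII 59). `quality_lines` and `guess_offset` are hypothetical names, and the reader assumes unwrapped 4-line FASTQ records:

```python
import gzip
from typing import Iterable, Iterator


def quality_lines(path: str, max_reads: int = 10000) -> Iterator[str]:
    """Yield the quality line (4th line of each record) of up to max_reads reads."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i // 4 >= max_reads:
                break
            if i % 4 == 3:
                yield line.rstrip("\n")


def guess_offset(quals: Iterable[str]) -> str:
    """Guess the quality offset from the lowest and highest ASCII codes seen."""
    lo, hi = 255, 0
    for q in quals:
        codes = [ord(c) for c in q]
        if codes:
            lo = min(lo, min(codes))
            hi = max(hi, max(codes))
    if hi == 0:
        return "unknown"    # no quality data seen
    if lo < 59:
        return "phred33"    # chars below ';' cannot occur in +64 encodings
    if hi > 74:
        return "phred64"    # chars above 'J' are not expected in Illumina 1.8+ data
    return "ambiguous"      # every char lies in ';'..'J', valid under both offsets
```

Usage would be something like `guess_offset(quality_lines("150807_SND405_A_L003_GZX-17_R1.fastq.gz"))`. Note the "ambiguous" outcome: a heuristic that forces a decision there can misclassify a file.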



    • #3
      Are all the reads in the R1 and R2 files the same length?
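One quick way to answer this is to tabulate read lengths. A minimal Python sketch, assuming unwrapped 4-line FASTQ records; `length_distribution` and `fastq_lines` are hypothetical helpers:

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator


def length_distribution(lines: Iterable[str]) -> Counter:
    """Count read lengths from the lines of a 4-line-per-record FASTQ."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence line of each record
            counts[len(line.rstrip("\n"))] += 1
    return counts


def fastq_lines(path: str) -> Iterator[str]:
    """Stream lines from a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        yield from fh
```

For example, `length_distribution(fastq_lines("150807_SND405_A_L003_GZX-17_R2.fastq.gz")).most_common(5)`. A single length for every read suggests the file has not been length-trimmed.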

      I think that if the reads have been trimmed to remove low-quality bases, the symbols representing low quality values may be missing, so the file can look like one with a different quality encoding.
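To illustrate: once the low-quality symbols are gone, a quality string can decode to valid (non-negative) scores under both offsets, so nothing in it rules out either encoding. A minimal Python sketch with a made-up quality string; `decode` is a hypothetical helper:

```python
from typing import List


def decode(qual: str, offset: int) -> List[int]:
    """Convert a quality string to Phred scores using the given ASCII offset."""
    return [ord(c) - offset for c in qual]


# A read whose low-quality tail was trimmed keeps only high-quality symbols.
trimmed = "IIIJJJHH"
print(decode(trimmed, 33))  # [40, 40, 40, 41, 41, 41, 39, 39]
print(decode(trimmed, 64))  # [9, 9, 9, 10, 10, 10, 8, 8]
```

Both interpretations are plausible, which is how an encoding detector can be fooled by trimmed data.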

      Have you run the file through FastQC, and is it FastQC that has decided the R1 and R2 files have different quality encoding?



      • #4
        Hi mastal and GenoMax,
        Thank you for your answers. It seems to be a bug in the Perl script fastqFormatDetect.pl that I used to predict the encoding of my raw sequences, since FastQC and BBMap both report Phred33 encoding (Illumina 1.8+). I got the script from https://github.com/mel-astar/mel-ngs...aster/scripts/

        150807_SND405_A_L003_GZX-17_R1.fastq.gz
        sanger fastq gz single-ended 125bp

        150807_SND405_A_L003_GZX-17_R2.fastq.gz
        sanger fastq gz single-ended 125bp


        GenoMax, just out of curiosity: what do you mean by time stamp?

        Sorry for any inconvenience, and thank you very much.
        Regards



        • #5
          I believe that GenoMax means the date, which appears at the beginning of your file names in YYMMDD format.
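For example, the leading field 150807 reads as 7 August 2015. A minimal Python sketch, assuming the date is always the first underscore-separated field of the file name (a convention inferred from these two files only); `run_date` is a hypothetical helper:

```python
from datetime import datetime


def run_date(filename: str) -> datetime:
    """Parse the leading YYMMDD field of a file name like the ones above."""
    return datetime.strptime(filename.split("_", 1)[0], "%y%m%d")


print(run_date("150807_SND405_A_L003_GZX-17_R1.fastq.gz").date())  # 2015-08-07
```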



          • #6
            Fantastic!!
