
  • "#" in illumina reads fastq quality line

    Hi all,

    I am using prinseq to trim the low-quality tails of Illumina reads. From the manual, I found that -trim_qual_left and -trim_qual_right trim a sequence by quality score from the 5'- or 3'-end using a given threshold score. Here are the parameters I used:

    Code:
    #trim 
    -trim_qual_right 33
    -trim_qual_left 33
    I am not sure whether I chose the correct parameters, or whether 33 is too high.

    In addition, I used the two options below to trim poly-A/T tails:
    Code:
     
    -trim_tail_right 10
    -trim_tail_left 10
    While the program was running, I checked the filtered-out reads in the so-called "bad_reads" file. The quality lines of most of those reads contain a very long run of "#", and some consist entirely of "#", e.g.:

    Code:
    @HWI-ST538:217:C0NFWACXX:4:1101:19625:1943 1:Y:0:GAGTGG
    CNCGTCCCTTGATATGTTGTAATTCGTCTTTCATTTCCATTATGATGGCATCTGCAGCATCCTGCCAGAGACCTTTCAGATGAATATTTTCTTGCTGCAA
    +
    ####################################################################################################
    @HWI-ST538:217:C0NFWACXX:4:1101:20481:1941 1:Y:0:GAGTGG
    TNCATACTTTCGTTCCTTTCTCTTTATACGGATCGACTTCGTTCCAAGCTGTGGGAATCTTGACCGTGTTGTGCATCAGGGGTCATCTGCTTCGGTCATT
    +
    3#02===@8<@?@7:=@?)>:>><>>@?9???8?4((--<(97<;):)7>7>???9?>???>)<>=99=?##############################
    @HWI-ST538:217:C0NFWACXX:4:1101:20349:1946 1:Y:0:GAGTGG
    CNGCGCTGCTGCCAACTAGTAAAGGAAGTATTCATTAAAATGCAGGGAGACCGCAGGAATGGGGACATGTTCCCCTTTGGGGACCCTTTTGGCAGCTTCG
    +
    ;#0@-55=?<>>>??9??>.8=9>@>@<?>?=?>?>????>?>?<<=5=???<??<?9>?########################################
    I am not quite clear on the meaning of the symbols in the quality line. Does a run of "#" really mean these sequences are bad? In the "good reads" file, no read contains "#" in its quality line. I am afraid that I did something wrong and discarded reads that should have been kept.

    The encoding is Illumina 1.9 according to FastQC.

    Any advice is highly appreciated.

    -alice
    Last edited by doublealice; 06-09-2012, 03:18 PM.

  • #2
    Since your data are from Illumina 1.9, they use the Sanger FASTQ encoding, so '#' (ASCII 35) means PHRED quality 2 (35 - 33 = 2), which has a special meaning with Illumina as the "Read Segment Quality Control Indicator". Under the old Illumina FASTQ encoding, Q2 was 'B' (ASCII 66).

    See http://seqanswers.com/forums/showpos...91&postcount=3

    i.e. the run of PHRED 2 means the read failed in some specific way according to the Illumina software. Even without this knowledge, PHRED quality 2 is very bad and should be clipped/discarded.
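
    To make the encoding concrete, here is a minimal Python sketch of how a quality line decodes under the Sanger offset of 33, and how a trailing run of Q2 could be clipped. The function names and the short example read are made up for illustration; this is not how prinseq itself does its trimming.

```python
PHRED_OFFSET = 33  # Sanger / Illumina 1.8+ encoding; old Illumina used 64


def decode_quality(qual_line, offset=PHRED_OFFSET):
    """Convert a FASTQ quality string into a list of PHRED scores."""
    return [ord(ch) - offset for ch in qual_line]


def clip_q2_tail(seq, qual, offset=PHRED_OFFSET):
    """Drop the trailing run of Q2 bases ('#' when the offset is 33).

    Only the run at the 3' end is removed; an isolated Q2 inside the
    read is left alone.
    """
    q2_char = chr(offset + 2)
    keep = len(qual.rstrip(q2_char))
    return seq[:keep], qual[:keep]


if __name__ == "__main__":
    # Hypothetical 8-base read, not taken from the data above.
    seq, qual = "ACGTACGT", "II5I####"
    print(decode_quality(qual))
    print(clip_q2_tail(seq, qual))
```

    A read whose quality line is all '#' comes back empty from `clip_q2_tail`, which matches such reads ending up in the bad_reads file.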



    • #3
      maubp, thanks very much. This is very helpful.
