  • lovenlong
    Member
    • Jan 2013
    • 16

    SOS, Question about FastQC report!

    Hi all,
    I'm a newcomer to the NGS field, and I would be grateful if someone could take a quick look at the FastQC results for my NGS data (rice genome, 100 bp paired-end, ~30x coverage) generated on a HiSeq 2000.
    Most of the FastQC report looks normal, except for abnormal distributions in the per sequence GC content, per base GC content, sequence duplication levels, and Kmer content modules.
    Could this be due to PCR bias, or to contamination?
    If I do quality filtering with Fastx_toolkit, how should I set the parameters?

    Thanks a lot,
  • jgibbons1
    Senior Member
    • Oct 2009
    • 135

    #2
    For the Kmer and GC plots, things start to normalize around 10 bp, so you may want to trim the first 10 bp.
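To make the "normalizes around 10 bp" observation concrete, here is a minimal Python sketch that computes the GC fraction at each read position and then drops the first n bases of a read. The function names `gc_by_position` and `trim_5prime` are made up for illustration and are not part of FastQC or Fastx_toolkit; this only approximates what the per base GC plot summarises.

```python
def gc_by_position(seqs):
    """Return the GC fraction at each read position across all sequences."""
    gc, total = [], []
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i == len(gc):
                # First time we see this position: start new counters.
                gc.append(0)
                total.append(0)
            total[i] += 1
            if base in "GC":
                gc[i] += 1
    return [g / t for g, t in zip(gc, total)]


def trim_5prime(seq, qual, n=10):
    """Drop the first n bases and their matching quality characters."""
    return seq[n:], qual[n:]


if __name__ == "__main__":
    reads = ["GCAT", "GGAT", "ACA"]  # toy sequences, not real data
    print(gc_by_position(reads))
    print(trim_5prime("ACGTACGTACGTACG", "IIIIIIIIIIIIIII"))
```

If the first few positions deviate strongly from the rest of the list, that is the stretch you would consider trimming.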

    As a rule of thumb, I usually do the following QC and THEN run my data through FastQC:
    (1) Remove duplicate reads with Fastx_toolkit.
    (2) Remove low-quality bases from the read set. I usually use trim_galore with a Q30 filter (--quality 30) and retain reads with a minimum of 75% of the read length (so for your data, --length 75).
    (3) An output option of trim_galore is to pipe directly into FastQC, so I do this and see if my plots are still "off".
    (4) If they are, I trim the entire read set by X bp (depending on the plots).

    This should leave you with a good-quality data set regardless of your downstream analysis.
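The steps above can be sketched in plain Python to show the logic, without calling the actual tools. Everything here is an assumption for illustration: the function names are hypothetical, quality strings are assumed to be Phred+33 encoded, duplicates are detected by exact sequence match on single reads (pairs are not considered), and the 3' trimming simply cuts trailing bases below the threshold, which is not necessarily the algorithm trim_galore uses internally.

```python
PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def dedupe(records):
    """Yield only the first record seen for each exact sequence."""
    seen = set()
    for header, seq, qual in records:
        if seq not in seen:
            seen.add(seq)
            yield header, seq, qual


def trim_3prime(seq, qual, min_q=30):
    """Cut trailing bases whose quality score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_q:
        end -= 1
    return seq[:end], qual[:end]


def clean(records, min_q=30, min_len=75):
    """Deduplicate, quality-trim the 3' end, and drop reads that end up too short."""
    for header, seq, qual in dedupe(records):
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            yield header, seq, qual


if __name__ == "__main__":
    # Toy 100 bp records: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    records = [
        ("@a", "A" * 100, "I" * 100),
        ("@b", "A" * 100, "I" * 100),            # exact duplicate of @a
        ("@c", "C" * 100, "I" * 70 + "#" * 30),  # 70 bp after trimming
        ("@d", "G" * 100, "I" * 80 + "#" * 20),  # 80 bp after trimming
    ]
    for header, seq, _ in clean(records):
        print(header, len(seq))
```

For real paired-end data you would want the tools themselves, since they keep read pairs in sync; the sketch is only meant to show what each step does to a read.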


    • lovenlong
      Member
      • Jan 2013
      • 16

      #3
      Thanks for your suggestions.
      I've filtered and trimmed the reads as you suggested, and the data looks good now.
