  • Asaf
    Member
    • Jul 2014
    • 20

    poly-G in NextSeq

    Hi,
    I just received NextSeq paired-end results (45 bp first read and 40 bp second read) and noticed (using FastQC) that about 1-2% of the second reads are poly-G. I know that G has no "color", so it probably means that these spots were not detected in the paired run, but what is the cause of that? Is it common to get this number of failing paired reads? Has anyone run into this before?
    Thanks
    By the way, the first read also contains poly-G but for very few reads.
  • Risha
    Junior Member
    • Aug 2010
    • 4

    #2
    Hi Asaf

    I am also noticing this in our datasets. This is my first time analysing data from NextSeq, and FastQC says that in Read 2 there are overrepresented poly-G sequences.

    Did you figure out what was going on?


    • Asaf
      Member
      • Jul 2014
      • 20

      #3
      I emailed Illumina's representatives here in Israel but didn't get an answer. I think that the explanation I gave above is reasonable (maybe low efficiency of RT in the cluster?). With v.2 chemistry we had better results, but we only ran one sample so I can't tell for sure.
      What I do is remove reads that have more than 80% G's and/or use a DUST filter to remove low-complexity reads. Beware that besides pure poly-G reads you'll probably have poly-G reads with some other nucleotides randomly appearing in the sequence (which might even map to the genome); this is why I remove them before mapping.
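
The 80% G rule described above can be sketched in a few lines of Python. This is only an illustration, not the poster's actual script: the function names (`g_fraction`, `keep_pair`, `filter_pairs`) are hypothetical, reads are assumed to be plain sequence strings already parsed from the FASTQ files, and a pair is assumed to be dropped when either mate exceeds the threshold.

```python
def g_fraction(seq):
    """Return the fraction of bases in seq that are G (0.0 for an empty read)."""
    if not seq:
        return 0.0
    return seq.upper().count("G") / len(seq)


def keep_pair(read1, read2, max_g_fraction=0.8):
    """Keep a pair only if neither mate has more than max_g_fraction G's."""
    return (g_fraction(read1) <= max_g_fraction
            and g_fraction(read2) <= max_g_fraction)


def filter_pairs(pairs, max_g_fraction=0.8):
    """Yield the (read1, read2) sequence pairs that pass the poly-G check."""
    for read1, read2 in pairs:
        if keep_pair(read1, read2, max_g_fraction):
            yield read1, read2
```

Filtering on the pair rather than on each file separately keeps the two FASTQ files in sync, which most paired-end mappers require. A DUST-style low-complexity filter would be a separate step and is not shown here.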


      • chen@haplox.com
        Member
        • Aug 2015
        • 16

        #4
        Such a tool is available on GitHub

        There is a tool available on GitHub for removing polyA, polyT, polyC and polyG reads:

        OpenGene/AfterQC: Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data

        AFTER: Automatic Filtering, Trimming, and Error Removing for fastq data
        Currently it supports the Illumina 1.8 or newer format.
        AFTER can simply go through all the fastq files in a folder and output a good folder and a bad folder, which contain the good reads and bad reads of each fastq file.

        Besides removing polyX reads, it can also:
        Trim reads at the front and tail according to bad per-base sequence content
        Detect and eliminate bubble artifacts caused by fluid dynamics issues in the sequencer
        Filter low-quality reads
        Last edited by chen@haplox.com; 12-10-2015, 12:50 AM.
        OpenGene(Libraries and tools for NGS data analysis),AfterQC(Fastq Filtering and QC)
        FusionDirect.jl( Detect gene fusion), SeqMaker.jl(Next Generation Sequencing simulation)


        • chen@haplox.com
          Member
          • Aug 2015
          • 16

          #5
          Use AFTER to do filtering

          AFTER works well with NextSeq 500 data
          Last edited by chen@haplox.com; 08-05-2015, 12:17 AM. Reason: duplicate

          • Holinder
            Junior Member
            • Dec 2014
            • 6

            #6
            I have noticed the same thing with NextSeq data. Mostly poly-G, but some other homopolymers as well (even poly-N). I tried this tool After to remove these reads, but it doesn't seem to work. What other program can work with paired-end reads and remove poly-X reads?


            • chen@haplox.com
              Member
              • Aug 2015
              • 16

              #7
              Originally posted by Holinder
              I have noticed the same thing with NextSeq data. Mostly poly-G, but some other homopolymers as well (even poly-N). I tried this tool After to remove these reads, but it doesn't seem to work. What other program can work with paired-end reads and remove poly-X reads?
              What error did you get when using AFTER? Let me know and I will help you fix it.

              • Holinder
                Junior Member
                • Dec 2014
                • 6

                #8
                With default settings it marked almost all the reads as bad. And good reads had a minimum length of 24 bp, however the default should have been 35 bp.


                • chen@haplox.com
                  Member
                  • Aug 2015
                  • 16

                  #9
                  Originally posted by Holinder
                  With default settings it marked almost all the reads as bad. And good reads had a minimum length of 24 bp, however the default should have been 35 bp.
                  cd to the folder that contains your fastq files and try running:

                  Code:
                  python after.py -f0 -t0 -s24
                  -f0 means no trimming at the front
                  -t0 means no trimming at the tail
                  -s24 means the minimum read length is set to 24 bp

                  • chen@haplox.com
                    Member
                    • Aug 2015
                    • 16

                    #10
                    And because your reads are extremely short, you should also set the following parameters:

                    -p POLY_SIZE_LIMIT, --poly_size_limit=POLY_SIZE_LIMIT
                    if exists one polyX(polyG means GGGGGGGGG...), and its length is >= POLY_SIZE_LIMIT, then this read/pair is bad. Default is 40
                    -a ALLOW_MISMATCH_IN_POLY, --allow_mismatch_in_poly=ALLOW_MISMATCH_IN_POLY
                    the count of allowed mismatches when evaluating poly_X. Default 5 means disallow any mismatches

                    The following options may work:

                    python after.py -f0 -t0 -s24 -p15 -a2

                    That means any read containing a polyX stretch of at least 15 bp, with no more than 2 other bases inside the stretch, will be discarded.

                    e.g.
                    ******AAAAAAAAAATACAA****** will be treated as BAD
                    ******AAACAAAAAATACAA****** will be treated as GOOD
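
The rule described above (a stretch of at least 15 bp that is one base apart from at most 2 mismatches) can be illustrated with a small sliding-window check. This is a minimal sketch under stated assumptions, not AfterQC's actual implementation: the function name `has_poly_x` is hypothetical, its two parameters only mirror the meaning of `-p` and `-a` given in the post, and the real tool may place extra conditions on where the stretch starts and ends.

```python
def has_poly_x(seq, poly_size_limit=15, allow_mismatch=2):
    """Return True if any window of poly_size_limit bases is a single
    base (A, C, G or T) apart from at most allow_mismatch other bases."""
    seq = seq.upper()
    n = poly_size_limit
    if len(seq) < n:
        return False
    for base in "ACGT":
        # Number of positions in the first window that are not `base`.
        mismatches = sum(1 for c in seq[:n] if c != base)
        if mismatches <= allow_mismatch:
            return True
        # Slide the window one position at a time, updating the count.
        for i in range(n, len(seq)):
            mismatches += (seq[i] != base) - (seq[i - n] != base)
            if mismatches <= allow_mismatch:
                return True
    return False


# The two example stretches from the post, with arbitrary flanking bases
# standing in for the asterisks.
bad_read = "CGTCGT" + "AAAAAAAAAATACAA" + "GTCGTC"
good_read = "CGTCGT" + "AAACAAAAAATACAA" + "GTCGTC"
```

With these inputs the first read contains a 15 bp window with 13 A's and 2 other bases, so it is flagged, while the best window in the second read has 3 other bases and passes.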
                    Last edited by chen@haplox.com; 12-10-2015, 05:14 PM.