  • poly-G in NextSeq

    Hi,
    I just received NextSeq paired-end results (45 bp first read and 40 bp second read) and I noticed (using FastQC) that about 1-2% of the second reads are poly-G. I know that G has no "color", so it probably means that these spots were not detected during the second read, but what is the cause of that? Is it common to get this number of failing paired reads? Has anyone run into this before?
    Thanks
    By the way, the first read also contains poly-G but for very few reads.

  • #2
    Hi Asaf

    I am also noticing this in our datasets. This is my first time analysing data from a NextSeq, and FastQC says that Read 2 contains overrepresented poly-G sequences.

    Did you figure out what was going on?



    • #3
      I emailed Illumina's representatives here in Israel but didn't get an answer. I think that the explanation I gave above is reasonable (maybe low efficiency of RT in the cluster?). With v.2 chemistry we had better results but we only ran 1 sample so I can't tell for sure.
      What I do is remove reads that have more than 80% G's and/or use a DUST filter to remove low-complexity reads. Beware that besides pure poly-G reads you'll probably have poly-G reads with some other nucleotides randomly appearing in the sequence (which might even map to the genome); this is why I remove them before mapping.
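The 80% G rule described above could be sketched in Python roughly as follows. This is only an illustration, not the poster's actual script: the function names (`g_fraction`, `keep_pair`, `read_fastq`, `filter_pairs`) are hypothetical, the FASTQ files are assumed to be uncompressed with exactly four lines per record, and a pair is assumed to be dropped when either mate exceeds the threshold so that the two output files stay in sync.

```python
def g_fraction(seq):
    """Fraction of bases in seq that are G (0.0 for an empty read)."""
    seq = seq.upper()
    return seq.count("G") / len(seq) if seq else 0.0


def keep_pair(read1, read2, max_g_fraction=0.8):
    """Keep a pair only if neither mate has more than max_g_fraction G's."""
    return (g_fraction(read1) <= max_g_fraction
            and g_fraction(read2) <= max_g_fraction)


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:  # empty header line: end of file
            return
        yield tuple(record)


def filter_pairs(in1, in2, out1, out2, max_g_fraction=0.8):
    """Write the pairs that pass the filter; return (kept, dropped) counts."""
    kept = dropped = 0
    for rec1, rec2 in zip(read_fastq(in1), read_fastq(in2)):
        # rec[1] is the sequence line of each mate
        if keep_pair(rec1[1], rec2[1], max_g_fraction):
            out1.write("\n".join(rec1) + "\n")
            out2.write("\n".join(rec2) + "\n")
            kept += 1
        else:
            dropped += 1
    return kept, dropped
```

It would be called with four open file handles (two inputs, two outputs). Because this only looks at the overall G content, it does not replace a low-complexity (DUST) filter for reads dominated by other homopolymers.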



      • #4
        A tool for this is available on GitHub

        There is a tool available on GitHub for removing polyA, polyT, polyC and polyG reads:

        OpenGene/AfterQC: Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data

        It currently supports the Illumina 1.8 or newer format.
        AFTER can simply go through all the FASTQ files in a folder and then output a good folder and a bad folder, which contain the good reads and bad reads of each FASTQ file.

        Besides removing polyX, it can also:
        Trim reads at the front and tail according to bad per-base sequence content
        Detect and eliminate bubble artifacts caused by fluid dynamics issues in the sequencer
        Filter low-quality reads
        Last edited by [email protected]; 12-10-2015, 12:50 AM.
        OpenGene(Libraries and tools for NGS data analysis),AfterQC(Fastq Filtering and QC)
        FusionDirect.jl( Detect gene fusion), SeqMaker.jl(Next Generation Sequencing simulation)



        • #5
          Use AFTER to do filtering

          AFTER works well with NextSeq 500 data
          Last edited by [email protected]; 08-05-2015, 12:17 AM. Reason: duplicate


          • #6
            I have noticed the same thing with NextSeq data. Mostly poly-G, but some other homopolymers as well (even poly-N). I tried the AFTER tool to remove these reads, but it doesn't seem to work. What other programs can work with paired-end reads and remove poly-X reads?



            • #7
              Originally posted by Holinder
              I have noticed the same thing with NextSeq data. Mostly poly-G, but some other homopolymers as well (even poly-N). I tried the AFTER tool to remove these reads, but it doesn't seem to work. What other programs can work with paired-end reads and remove poly-X reads?
              What error did you get when using AFTER? Let me know and I will help you fix it.


              • #8
                With default settings it marked almost all the reads as bad. And good reads had a minimum length of 24 bp, however the default should have been 35 bp.



                • #9
                  Originally posted by Holinder
                  With default settings it marked almost all the reads as bad. And good reads had a minimum length of 24 bp, however the default should have been 35 bp.
                  cd to the folder that contains your FASTQ files and try running:

                  Code:
                  python after.py -f0 -t0 -s24
                  -f0 means no trimming in the front
                  -t0 means no trimming in the tail
                  -s24 means set the min read length to 24 bp


                  • #10
                    And because your reads are extremely short, you should set the following parameters:

                    -p POLY_SIZE_LIMIT, --poly_size_limit=POLY_SIZE_LIMIT
                    if exists one polyX(polyG means GGGGGGGGG...), and its length is >= POLY_SIZE_LIMIT, then this read/pair is bad. Default is 40
                    -a ALLOW_MISMATCH_IN_POLY, --allow_mismatch_in_poly=ALLOW_MISMATCH_IN_POLY
                    the count of allowed mismatches when evaluating poly_X. Default 5 means disallow any mismatches

                    The following options may work:

                    python after.py -f0 -t0 -s24 -p15 -a2

                    That means any read containing a polyX stretch of at least 15 bp, with no more than 2 other bases inside the stretch, will be discarded.

                    For example:
                    ******AAAAAAAAAATACAA****** will be treated as BAD
                    ******AAACAAAAAATACAA****** will be treated as GOOD
                    Last edited by [email protected]; 12-10-2015, 05:14 PM.
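To make the rule above concrete, here is a minimal Python sketch of one way to read it: a read is bad if any window of `poly_size_limit` bases consists of a single base with at most `allow_mismatch` exceptions. This is an interpretation of the description and the two examples in this post, not AFTER's actual implementation; the function name `has_poly_x` and its parameters are hypothetical.

```python
def has_poly_x(seq, poly_size_limit=15, allow_mismatch=2, bases="ACGT"):
    """Return True if seq contains a polyX stretch under the rule above.

    A window of poly_size_limit bases counts as polyX when all but at most
    allow_mismatch of its positions are the same base X (X taken from bases).
    """
    seq = seq.upper()
    if len(seq) < poly_size_limit:
        return False  # read is too short to contain a stretch of this size
    for start in range(len(seq) - poly_size_limit + 1):
        window = seq[start:start + poly_size_limit]
        for base in bases:
            mismatches = poly_size_limit - window.count(base)
            if mismatches <= allow_mismatch:
                return True
    return False


# The two 15-base stretches from the post, with the -p15 -a2 settings:
print(has_poly_x("AAAAAAAAAATACAA"))  # True: 2 non-A bases, read is BAD
print(has_poly_x("AAACAAAAAATACAA"))  # False: 3 non-A bases, read is GOOD
```

For paired-end data a pair would be discarded if either mate returns True. Passing `bases="ACGTN"` would also catch the poly-N reads mentioned earlier in the thread.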