SEQanswers

Old 07-20-2014, 10:28 AM   #1
Asaf
Member
 
Location: Jerusalem

Join Date: Jul 2014
Posts: 20
Default poly-G in NextSeq

Hi,
I just received NextSeq paired-end results (45 bp first read and 40 bp second read) and I noticed (using FastQC) that about 1-2% of the second reads are poly-G. I know that G has no "color", so it probably means that these spots were not detected in the paired run, but what is the cause of that? Is it common to get this number of failing paired reads? Has anyone run into this before?
Thanks
By the way, the first read also contains poly-G, but in very few reads.
Old 08-03-2015, 04:53 AM   #2
Risha
Junior Member
 
Location: London

Join Date: Aug 2010
Posts: 4
Default

Hi Asaf

I am also noticing this in our datasets. This is my first time analysing data from NextSeq, and FastQC says that in Read 2 there are overrepresented poly-G sequences.

Did you figure out what was going on?
Old 08-03-2015, 11:38 PM   #3
Asaf
Member
 
Location: Jerusalem

Join Date: Jul 2014
Posts: 20
Default

I emailed Illumina's representatives here in Israel but didn't get an answer. I think the explanation I gave above is reasonable (maybe low efficiency of RT in the cluster?). With v2 chemistry we had better results, but we only ran one sample so I can't tell for sure.
What I do is remove reads that have more than 80% G's and/or use a DUST filter to remove low-complexity reads. Beware that besides pure poly-G you'll probably have poly-G reads with some other nucleotides appearing at random positions (these might even map to the genome), which is why I remove them before mapping.
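
A minimal sketch of the 80% G filter described above, applied to read pairs so that both mates are dropped together. The function names and the choice to reject a pair when either mate fails are assumptions for illustration, not the exact script used in this thread; the DUST step is not shown.

```python
def g_fraction(seq):
    """Return the fraction of bases in seq that are G (0.0 for an empty read)."""
    if not seq:
        return 0.0
    return seq.upper().count("G") / len(seq)


def keep_pair(read1, read2, max_g_fraction=0.8):
    """Keep a pair only if neither mate has more than max_g_fraction G's.

    Assumption: a pair is discarded as soon as one mate looks like poly-G,
    so the two output files stay in sync for the mapper.
    """
    return (g_fraction(read1) <= max_g_fraction
            and g_fraction(read2) <= max_g_fraction)


def filter_pairs(pairs, max_g_fraction=0.8):
    """Yield only the (read1, read2) sequence pairs that pass the G filter."""
    for read1, read2 in pairs:
        if keep_pair(read1, read2, max_g_fraction):
            yield read1, read2
```

Because the test is on the G fraction rather than on an exact run of G's, it also catches the "poly-G with a few other nucleotides" reads mentioned above.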
Old 08-05-2015, 12:07 AM   #4
Member
 
Location: Shenzhen, China

Join Date: Aug 2015
Posts: 11
Default Such tool is available on github

There is a tool available on Github for removing PolyA, PolyT, PolyC, PolyG

https://github.com/OpenGene/after

Automatic Filtering, Trimming, and Error Removing for fastq data
Currently it supports the Illumina 1.8 or newer format.
AFTER can simply go through all fastq files in a folder and then output a good folder and a bad folder, which contain the good reads and bad reads of each fastq file.

Besides removing polyX, it can also:
Trim reads at the front and tail according to bad per-base sequence content
Detect and eliminate bubble artifacts caused by the sequencer due to fluid dynamics issues
Filter low-quality reads

Last edited by chen@haplox.com; 12-09-2015 at 11:50 PM.
Old 08-05-2015, 12:15 AM   #5
Member
 
Location: Shenzhen, China

Join Date: Aug 2015
Posts: 11
Default Use AFTER to do filtering

AFTER works well with NextSeq 500 data.

Last edited by chen@haplox.com; 08-05-2015 at 12:17 AM. Reason: duplicate
Old 10-29-2015, 12:21 AM   #6
Holinder
Junior Member
 
Location: Estonia

Join Date: Dec 2014
Posts: 6
Default

I have noticed the same thing with NextSeq data. Mostly poly-G, but some other homopolymers as well (even poly-N). I tried this tool After to remove these reads, but it doesn't seem to work. What other program can work with paired-end reads and remove poly-X reads?
Old 10-29-2015, 12:28 AM   #7
Member
 
Location: Shenzhen, China

Join Date: Aug 2015
Posts: 11
Default

Quote:
Originally Posted by Holinder View Post
I have noticed the same thing with NextSeq data. Mostly poly-G, but some other homopolymers as well (even poly-N). I tried this tool After to remove these reads, but it doesn't seem to work. What other program can work with paired-end reads and remove poly-X reads?
What error did you get when using AFTER? Let me know and I will help you fix it.
chen@haplox.com is offline   Reply With Quote
Old 10-29-2015, 12:41 AM   #8
Holinder
Junior Member
 
Location: Estonia

Join Date: Dec 2014
Posts: 6
Default

With default settings it marked almost all the reads as bad. And good reads had a minimum length of 24 bp, however the default should have been 35 bp.
Old 10-29-2015, 01:03 AM   #9
Member
 
Location: Shenzhen, China

Join Date: Aug 2015
Posts: 11
Default

Quote:
Originally Posted by Holinder View Post
With default settings it marked almost all the reads as bad. And good reads had a minimum length of 24 bp, however the default should have been 35 bp.
cd to the folder that contains your fastq files, and try running:

Code:
python after.py -f0 -t0 -s24
-f0 means no trimming in the front
-t0 means no trimming in the tail
-s24 means set the min read length to 24 bp
chen@haplox.com is offline   Reply With Quote
Old 10-29-2015, 01:08 AM   #10
Member
 
Location: Shenzhen, China

Join Date: Aug 2015
Posts: 11
Default

And because your read length is extremely short, you should set the following parameters:

-p POLY_SIZE_LIMIT, --poly_size_limit=POLY_SIZE_LIMIT
if exists one polyX(polyG means GGGGGGGGG...), and its length is >= POLY_SIZE_LIMIT, then this read/pair is bad. Default is 40
-a ALLOW_MISMATCH_IN_POLY, --allow_mismatch_in_poly=ALLOW_MISMATCH_IN_POLY
the count of allowed mismatches when evaluating poly_X. Default 5 means disallow any mismatches

The following options may work:

python after.py -f0 -t0 -s24 -p15 -a2

That means any read containing a 15 bp polyX stretch with no more than 2 other bases inside it will be discarded.

For example:
******AAAAAAAAAATACAA****** will be treated as BAD
******AAACAAAAAATACAA****** will be treated as GOOD
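
The rule illustrated by these two examples can be sketched as a sliding-window check. This is one reading of the description above (a window of `poly_size` bases in which at most `max_mismatch` bases differ from the repeated base), not AFTER's actual implementation; the function name and parameters are hypothetical and only mirror the `-p` and `-a` options.

```python
def has_poly_x(seq, poly_size=15, max_mismatch=2, bases="ACGT"):
    """Return True if seq contains a window of poly_size bases in which
    all but at most max_mismatch bases are the same nucleotide."""
    seq = seq.upper()
    if len(seq) < poly_size:
        return False
    for base in bases:
        # Number of occurrences of `base` in the first window.
        count = seq[:poly_size].count(base)
        if poly_size - count <= max_mismatch:
            return True
        # Slide the window one base at a time, updating the count.
        for i in range(poly_size, len(seq)):
            count += (seq[i] == base) - (seq[i - poly_size] == base)
            if poly_size - count <= max_mismatch:
                return True
    return False


# The two 15 bp stretches from the examples above:
print(has_poly_x("AAAAAAAAAATACAA"))  # 2 non-A bases in the window
print(has_poly_x("AAACAAAAAATACAA"))  # 3 non-A bases in the window
```

For paired-end data, a pair would be discarded if either mate returns True, matching the "this read/pair is bad" wording of the `-p` option.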

Last edited by chen@haplox.com; 12-10-2015 at 04:14 PM.

Tags
nextseq, poly-g
