#1 |
Member
Location: Jerusalem Join Date: Jul 2014
Posts: 20
Hi,
I just received NextSeq paired-end results (45 bp first read and 40 bp second read) and I noticed (using FastQC) that about 1-2% of the second reads are poly-G. I know that G has no "color", so it probably means that these spots are not detected in the paired run, but what is the cause of that? Is it common to get this many failing paired reads? Has anyone run into this before?
Thanks
By the way, the first read also contains poly-G, but only in very few reads.
#2 |
Junior Member
Location: London Join Date: Aug 2010
Posts: 4
Hi Asaf
I am also noticing this in our datasets. This is my first time analysing data from NextSeq, and FastQC says that in Read 2 there are overrepresented poly-G sequences. Did you figure out what was going on?
#3 |
Member
Location: Jerusalem Join Date: Jul 2014
Posts: 20
I emailed Illumina's representatives here in Israel but didn't get an answer. I think that the explanation I gave above is reasonable (maybe low efficiency of RT in the cluster?). With v.2 chemistry we had better results but we only ran 1 sample so I can't tell for sure.
What I do is remove reads that have more than 80% G's and/or use a DUST filter to remove low-complexity reads. Beware that besides pure poly-G you'll probably have poly-G reads with some other nucleotides randomly appearing in the sequence (which might even map to the genome); this is why I remove them before mapping.
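The 80% G filter described above can be sketched in a few lines of Python. This is only an illustration, not the poster's actual script: the function names (`read_fastq`, `g_fraction`, `filter_pairs`) are made up for the example, and dropping the whole pair when either mate exceeds the threshold is an assumption about how paired reads should be handled.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ handle.

    Assumes plain four-line records (no wrapped sequences).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def g_fraction(seq):
    """Fraction of bases in seq that are G (0.0 for an empty read)."""
    return seq.upper().count("G") / len(seq) if seq else 0.0


def filter_pairs(r1_handle, r2_handle, max_g=0.8):
    """Yield (record1, record2) pairs, skipping any pair in which
    either mate has MORE than max_g fraction of G bases."""
    for rec1, rec2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        if g_fraction(rec1[1]) > max_g or g_fraction(rec2[1]) > max_g:
            continue  # assumption: one poly-G mate discards the whole pair
        yield rec1, rec2


# Tiny in-memory demo; real data would come from open("R1.fastq") etc.
r1 = io.StringIO("@a/1\nACGTACGTAC\n+\nIIIIIIIIII\n@b/1\nACGTACGTAC\n+\nIIIIIIIIII\n")
r2 = io.StringIO("@a/2\nGGGGGGGGGA\n+\nIIIIIIIIII\n@b/2\nTTGACCATGA\n+\nIIIIIIIIII\n")
for rec1, rec2 in filter_pairs(r1, r2):
    print(rec1[0], rec2[0])
```

Filtering both files in lockstep keeps the two output files in sync, which most paired-end mappers require. The low-complexity (DUST) step mentioned in the post is not covered by this sketch.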
#4 |
Member
Location: Shenzhen, China Join Date: Aug 2015
Posts: 15
There is a tool available on GitHub for removing PolyA, PolyT, PolyC, PolyG:
https://github.com/OpenGene/after
Automatic Filtering, Trimming, and Error Removing for fastq data
Currently it supports Illumina 1.8 or newer format.
AFTER can simply go through all fastq files in a folder and then output a good folder and a bad folder, which contain the good reads and bad reads of each fastq file.
Besides removing PolyX, it can also:
- Trim reads at front and tail according to bad per base sequence content
- Detect and eliminate bubble artifact caused by sequencer due to fluid dynamics issue
- Filter low-quality reads
Last edited by chen@haplox.com; 12-10-2015 at 12:50 AM.
#5 |
Member
Location: Shenzhen, China Join Date: Aug 2015
Posts: 15
AFTER works well with NextSeq 500 data.
Last edited by chen@haplox.com; 08-05-2015 at 01:17 AM. Reason: duplicate
#6 |
Junior Member
Location: Estonia Join Date: Dec 2014
Posts: 6
I have noticed the same thing with NextSeq data. Mostly poly-G, but some other homopolymers as well (even poly-N). I tried the AFTER tool to remove these reads, but it doesn't seem to work. What other program can work with paired-end reads and remove poly-X reads?
#7 | |
Member
Location: Shenzhen, China Join Date: Aug 2015
Posts: 15
#8 |
Junior Member
Location: Estonia Join Date: Dec 2014
Posts: 6
With default settings it marked almost all the reads as bad. Also, the good reads had a minimum length of 24 bp, although the default should have been 35 bp.
#9 | |
Member
Location: Shenzhen, China Join Date: Aug 2015
Posts: 15
Code:
    python after.py -f0 -t0 -s24

-t0 means no trimming in the tail
-s24 means set the min read length to 24 bp
#10 |
Member
Location: Shenzhen, China Join Date: Aug 2015
Posts: 15
And because your reads are extremely short, you should set the following parameters:
-p POLY_SIZE_LIMIT, --poly_size_limit=POLY_SIZE_LIMIT
    if exists one polyX(polyG means GGGGGGGGG...), and its length is >= POLY_SIZE_LIMIT, then this read/pair is bad. Default is 40
-a ALLOW_MISMATCH_IN_POLY, --allow_mismatch_in_poly=ALLOW_MISMATCH_IN_POLY
    the count of allowed mismatches when evaluating poly_X. Default 5 means disallow any mismatches
The following options may work:
    python after.py -f0 -t0 -s24 -p15 -a2
That means any read that has a 15 bp polyX containing no more than 2 other bases will be discarded, i.e.
******AAAAAAAAAATACAA****** will be treated as BAD
******AAACAAAAAATACAA****** will be treated as GOOD
Last edited by chen@haplox.com; 12-10-2015 at 05:14 PM.
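To make the interaction between -p and -a concrete, here is one possible reading of the rule as a sliding-window check in Python. It is a sketch of the idea only and is not taken from AFTER's source: the function name `has_poly_x` and the window-based interpretation are assumptions, and the real tool may anchor or count mismatches differently.

```python
def has_poly_x(seq, poly_size_limit=15, allow_mismatch=2, bases="ACGT"):
    """Return True if some window of poly_size_limit bases consists of a
    single base X with at most allow_mismatch other bases mixed in.

    Hypothetical helper illustrating the -p / -a options; not AFTER's code.
    """
    seq = seq.upper()
    n = len(seq)
    if n < poly_size_limit:
        return False
    for base in bases:
        # Count non-X bases in the first window.
        mismatches = sum(1 for c in seq[:poly_size_limit] if c != base)
        if mismatches <= allow_mismatch:
            return True
        # Slide the window one base at a time, updating the count.
        for i in range(poly_size_limit, n):
            mismatches += (seq[i] != base) - (seq[i - poly_size_limit] != base)
            if mismatches <= allow_mismatch:
                return True
    return False


# The two stretches from the post, without the flanking bases:
print(has_poly_x("AAAAAAAAAATACAA"))  # 2 non-A bases in 15
print(has_poly_x("AAACAAAAAATACAA"))  # 3 non-A bases in 15
```

Under this reading the first stretch is flagged and the second is not, which matches the BAD/GOOD outcome described in the post. With flanking sequence present, a shifted window could give a different answer, so treat the sketch as a way to reason about the parameters rather than a prediction of the tool's output.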
Tags: nextseq, poly-g