**nucacidhunter** · 03-02-2018, 05:28 PM

Could you post the full sequences of a few reads (both mates if paired-end) that contain the G/A strings?

**GenoMax** · 03-03-2018, 08:47 AM

@ksw9: Take a look at this blog post.

**ksw9** · 03-07-2018, 11:39 AM

Thank you both for the fast replies!

@GenoMax, yes, this is consistent with my issue. My data have high-quality poly-G strings, but they are additionally preceded by poly-A strings. I see there's a cutadapt option to trim high-quality poly-G strings (--nextseq-trim), but this does not remove the poly-A strings. I plan to run an additional cutadapt pass to trim these sequences. Does that make sense? Is this implemented in trim_galore?
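A minimal sketch of the two-step idea above, assuming a simple sequence-only rule rather than cutadapt's own quality-aware algorithm: strip a trailing G run first, then a trailing A run. The function name and both length thresholds are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical thresholds, not taken from cutadapt or trim_galore.
MIN_G = 10  # minimum trailing G run treated as a no-signal artifact
MIN_A = 8   # minimum trailing A run treated as poly-A spacer


def trim_polyg_then_polya(seq, qual, min_g=MIN_G, min_a=MIN_A):
    """Remove a 3' poly-G run, then a 3' poly-A run, from one read.

    The quality string is cut at the same positions so that
    sequence and qualities stay the same length.
    """
    # Step 1: trailing poly-G (NextSeq calls "no signal" as G).
    m = re.search(r"G{%d,}$" % min_g, seq)
    if m:
        seq, qual = seq[:m.start()], qual[:m.start()]
    # Step 2: trailing poly-A exposed once the Gs are gone.
    m = re.search(r"A{%d,}$" % min_a, seq)
    if m:
        seq, qual = seq[:m.start()], qual[:m.start()]
    return seq, qual
```

The order matters in this sketch: the poly-A run only becomes the 3' end of the read after the poly-G run has been removed.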

@nucacidhunter, posting example paired-end reads below.

Code:

# R1
@NB501727:39:HWKGCAFXX:1:11101:4074:1056 1:N:0:TTGGTATG+NAGGACAC
GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCTTGGTATGATCTCGTATGCCGTCTACTGCTTGAAAAAAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
AAAAAEEEEEEEEEEEAEEEEEEEAE6/EAEEEE<EEEE//EEE//EE/EE/EA//E6EE/EEEE//////E/EEAEEEEEEE/AE/EEEA/E6AAE/A/66/////////////EAA/E//////AA/E//EAA<E/<EE/EEAEE///A
--
@NB501727:39:HWKGCAFXX:1:11101:21440:1247 1:N:0:TTGGTATG+AAGGACAC
GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCTTGGTATGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE6AEEAEEEEEEEEEEEEEAEEEEEEEEEEEE/E/EEEEEEEAEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEAEEEEEEEE<EEE<EA/EEAEEE<EEE
# R2
@NB501727:39:HWKGCAFXX:1:11101:4074:1056 2:N:0:TTGGTATG+NAGGACAC
GGGGGGGGGAAGGGGGGGGGAGGGGAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
AA//A6/A////AA//A//6////E///////E/6/E<EE6/A//E////E///E/A///</6/A/A//<E/E/EEAEEE/E/AAE/<//A//66//6/////<//AE//E//E//EA///A////EE/<</</EEE///AE/<<E////<
--
@NB501727:39:HWKGCAFXX:1:11101:21440:1247 2:N:0:TTGGTATG+AAGGACAC
GGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
AAA6A6EE6A6A/66AA66////////EA//EE/EEEAE/EAA//EEEE/E/E//A////A///A//AEEE/EEAEEEEEEEEE/////AAE/EEAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEAEE<A

**ksw9** · 03-07-2018, 12:07 PM

I've found that cutadapt works well in filtering poly-G strings. However, there are still some (~200) remaining reads with poly-G strings longer than the longest found in my reference genome (length = 10). Should I remove any reads with these added G strings before mapping, given this known issue with NextSeq data?
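One way to find those remaining reads is to measure the longest G homopolymer in each read and compare it with the longest run in the reference (10 here). This is only a sketch; the function names and the default threshold argument are hypothetical.

```python
from itertools import groupby


def longest_run(seq, base="G"):
    """Length of the longest uninterrupted run of `base` in `seq` (0 if absent)."""
    return max(
        (len(list(grp)) for b, grp in groupby(seq) if b == base),
        default=0,
    )


def exceeds_reference(seq, ref_max=10, base="G"):
    """True if the read carries a run of `base` longer than any in the reference."""
    return longest_run(seq, base) > ref_max


def split_reads(records, ref_max=10):
    """Partition (name, seq, qual) tuples into (kept, flagged) lists."""
    kept, flagged = [], []
    for rec in records:
        (flagged if exceeds_reference(rec[1], ref_max) else kept).append(rec)
    return kept, flagged
```

Flagging rather than deleting keeps the option of inspecting the ~200 reads before deciding whether to drop them.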

Thank you!

**GenoMax** · 03-07-2018, 02:02 PM

Most aligners should soft-clip data that does not match, so you could try doing the alignments to see what you get.
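To see what the aligner actually did, the soft-clipped lengths can be read from the CIGAR field of each SAM record. A small sketch, assuming standard SAM CIGAR strings; the function name is hypothetical.

```python
import re

# One CIGAR element: a length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped(cigar):
    """Return (left, right) soft-clipped base counts for one CIGAR string.

    Hard clips (H) are ignored, since they may sit outside a soft clip.
    An unavailable CIGAR ("*") yields (0, 0).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return 0, 0
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right
```

A read whose poly-G tail was not trimmed would be expected to show a large clip on one end if the aligner soft-clipped it, which is easy to tabulate across the BAM file.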

**ksw9** · 03-07-2018, 02:05 PM

Good point, I will use cutadapt and then map, thank you!

**nucacidhunter** · 03-07-2018, 11:24 PM

Both R1s are adapter sequences (I can see the index as well), suggesting that these reads come from adapter dimers. After reading through the adapter, the poly-A spacer was sequenced; beyond that there are no bases left, so the lack of signal was called as Gs.

I am not sure how to explain R2 sequences.
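The R1 layout described above (adapter read-through, then poly-A spacer, then no-signal Gs) can be checked per read with a pattern match. This is a sketch only: the function name and the minimum run lengths are hypothetical, and the index sequence is the one taken from the read header.

```python
import re

# Hypothetical minimum run lengths for the spacer and the no-signal tail.
LAYOUT = re.compile(r"^(?P<head>.*?)(?P<polya>A{8,})(?P<polyg>G{20,})$")


def split_dimer_read(seq, index):
    """Split a read into head / poly-A / poly-G if it fits the dimer layout.

    Returns None when the read does not end in poly-A followed by poly-G.
    `index` is the sample index from the FASTQ header (e.g. "TTGGTATG").
    """
    m = LAYOUT.match(seq)
    if m is None:
        return None
    head = m.group("head")
    return {
        "head_len": len(head),
        "index_in_head": index in head,  # index visible => adapter read-through
        "polya_len": len(m.group("polya")),
        "polyg_len": len(m.group("polyg")),
    }
```

Reads for which this returns a result with `index_in_head` set are candidates for adapter dimers; it does not explain the R2 pattern.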

**ksw9** · 03-08-2018, 12:10 PM

Thank you for checking. I've used trim_galore to remove adapter sequences and cutadapt to filter for poly-G strings. Even after these steps, poly-G strings remain in about 0.1% of my reads. I am planning to continue on to mapping, unless there is a better way to deal with this issue that you'd recommend?

Thanks!

Illumina NextSeq: Overrepresented sequences, strings of G's, A's
