
**nucacidhunter** · 03-02-2018, 05:28 PM

Could you post the full sequences of a few complete reads (both mates if paired-end) that contain the G/A strings?

**GenoMax** · 03-03-2018, 08:47 AM

@ksw9: Take a look at this blog post.

**ksw9** · 03-07-2018, 11:39 AM

Thank you both for fast replies!

@GenoMax, yes, this is consistent with my issue. My data have high-Q poly-G strings. However, they are additionally preceded by poly-A strings. I see there's an option to trim high-Q poly-G strings with cutadapt (--nextseq-trim), but this does not remove the poly-A strings. I plan to do an additional cutadapt pass to trim these sequences. Does that make sense? Is this implemented in trim_galore?

@nucacidhunter, posting example paired-end reads below.

Code:

# R1
@NB501727:39:HWKGCAFXX:1:11101:4074:1056 1:N:0:TTGGTATG+NAGGACAC
GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCTTGGTATGATCTCGTATGCCGTCTACTGCTTGAAAAAAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
AAAAAEEEEEEEEEEEAEEEEEEEAE6/EAEEEE<EEEE//EEE//EE/EE/EA//E6EE/EEEE//////E/EEAEEEEEEE/AE/EEEA/E6AAE/A/66/////////////EAA/E//////AA/E//EAA<E/<EE/EEAEE///A
--
@NB501727:39:HWKGCAFXX:1:11101:21440:1247 1:N:0:TTGGTATG+AAGGACAC
GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCTTGGTATGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE6AEEAEEEEEEEEEEEEEAEEEEEEEEEEEE/E/EEEEEEEAEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEAEEEEEEEE<EEE<EA/EEAEEE<EEE
# R2
@NB501727:39:HWKGCAFXX:1:11101:4074:1056 2:N:0:TTGGTATG+NAGGACAC
GGGGGGGGGAAGGGGGGGGGAGGGGAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
AA//A6/A////AA//A//6////E///////E/6/E<EE6/A//E////E///E/A///</6/A/A//<E/E/EEAEEE/E/AAE/<//A//66//6/////<//AE//E//E//EA///A////EE/<</</EEE///AE/<<E////<
--
@NB501727:39:HWKGCAFXX:1:11101:21440:1247 2:N:0:TTGGTATG+AAGGACAC
GGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
AAA6A6EE6A6A/66AA66////////EA//EE/EEEAE/EAA//EEEE/E/E//A////A///A//AEEE/EEAEEEEEEEEE/////AAE/EEAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEAEE<A
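
The two-pass idea described above (strip the trailing poly-G first, then the poly-A that precedes it) can be sketched in a few lines of Python. This is only an illustration of the logic, not how cutadapt or trim_galore implement it; the function names and the `min_len=10` threshold are assumptions chosen for the example.

```python
import re

def trim_3prime_homopolymer(seq, qual, base, min_len=10):
    """Remove a run of `base` (at least `min_len` long) from the 3' end.

    Quality values are ignored on purpose: NextSeq poly-G tails can carry
    high quality scores, so a quality-based trimmer may leave them in place.
    """
    match = re.search(f"{base}{{{min_len},}}$", seq)
    if match is None:
        return seq, qual
    return seq[:match.start()], qual[:match.start()]

def trim_polyg_then_polya(seq, qual, min_len=10):
    """Two passes: trailing poly-G first, then the poly-A left exposed."""
    seq, qual = trim_3prime_homopolymer(seq, qual, "G", min_len)
    seq, qual = trim_3prime_homopolymer(seq, qual, "A", min_len)
    return seq, qual

# Synthetic read (not one of the posted reads): insert + poly-A + poly-G
read = "ACGTTGCT" + "A" * 12 + "G" * 40
quals = "E" * len(read)
print(trim_polyg_then_polya(read, quals))
```

Note that the order matters: the poly-A only sits at the 3' end once the poly-G has been removed.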

**ksw9** · 03-07-2018, 12:07 PM

I've found that cutadapt works well for filtering poly-G strings. However, some reads (~200) remain with poly-G strings longer than the longest one found in my reference genome (length = 10). Should I remove any reads with these added G strings before mapping, given this known issue with NextSeq data?

Thank you!
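
One way to find the remaining reads described above is to flag any read whose longest G run exceeds the longest G homopolymer in the reference (10 in this case). A minimal sketch, with hypothetical function names and the threshold taken from the post:

```python
import re

def longest_run(seq, base):
    """Length of the longest uninterrupted run of `base` in `seq`."""
    runs = re.findall(f"{base}+", seq)
    return max((len(r) for r in runs), default=0)

def exceeds_reference_run(seq, base="G", ref_max=10):
    """True if the read has a run of `base` longer than any in the reference."""
    return longest_run(seq, base) > ref_max

# Synthetic examples (not taken from the posted data)
print(exceeds_reference_run("ACGT" + "G" * 11 + "TCA"))
print(exceeds_reference_run("ACGT" + "G" * 9 + "TCA"))
```

Whether such reads should be dropped or simply left for the aligner to handle is a judgment call; the sketch only identifies them.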

**GenoMax** · 03-07-2018, 02:02 PM

Most aligners should soft clip data that does not match so you could try doing the alignments to see what you get.
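
To check after mapping whether the aligner did soft clip the poly-G tails, the CIGAR string of each alignment can be inspected for `S` operations at either end. The sketch below parses a plain CIGAR string with a regular expression; the function name is hypothetical, and in practice a library such as pysam would normally be used to read the alignments.

```python
import re

# One CIGAR operation: a length followed by an operation code
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_bases(cigar):
    """Return (left, right) soft-clipped base counts for a CIGAR string."""
    # Hard clips (H) may sit outside soft clips, so drop them first
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return 0, 0  # e.g. "*" for an unmapped read
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right

print(soft_clipped_bases("100M40S"))
```

A read aligned on the forward strand whose 3' poly-G tail was clipped would show a large right-hand value; for reverse-strand alignments the clipped tail appears on the left.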

**ksw9** · 03-07-2018, 02:05 PM

Good point, I will use cutadapt and then map, thank you!

**nucacidhunter** · 03-07-2018, 11:24 PM

Both R1s are adapter sequences (I can see the index as well), suggesting that these reads come from adapter-dimers. After reading through the adapter, the poly-A spacer was sequenced; beyond that there is no template left, so the lack of signal has been called as Gs.

I am not sure how to explain R2 sequences.
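
The observation about the R1 reads can be turned into a rough screen: a read that begins with adapter sequence and also contains the sample index from its own header is a likely adapter-dimer. This is only a heuristic sketch. The 12-base prefix is copied from the start of the two posted R1 reads, and the function names are hypothetical.

```python
# First 12 bases shared by both posted R1 reads (assumed adapter start)
ADAPTER_PREFIX = "GATCGGAAGAGC"

def index_from_header(header):
    """Return the first index sequence from an Illumina FASTQ header line."""
    # The last colon-separated field holds the index, e.g. "TTGGTATG+NAGGACAC"
    return header.rsplit(":", 1)[-1].split("+")[0]

def looks_like_adapter_dimer(header, seq, adapter_prefix=ADAPTER_PREFIX):
    """True if the read starts with adapter and contains its own index."""
    return seq.startswith(adapter_prefix) and index_from_header(header) in seq

# Synthetic read built for illustration: adapter start, filler, index, tail
header = "@INSTRUMENT:1:FLOWCELL:1:1:1:1 1:N:0:TTGGTATG+AAGGACAC"
seq = ADAPTER_PREFIX + "ACGT" + "TTGGTATG" + "A" * 10 + "G" * 20
print(looks_like_adapter_dimer(header, seq))
```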

**ksw9** · 03-08-2018, 12:10 PM

Thank you for checking. I've used trim_galore to remove adapter sequences and cutadapt to filter for poly-G strings. Even after these steps, poly-G strings remain in about 0.1% of my reads. I am planning to continue on to mapping unless there is a better way to deal with this issue that you'd recommend?

Thanks!

Illumina NextSeq: Overrepresented sequences, strings of G's, A's
