SeqPrep missing merged reads between 40-50bp - any suggestions?

jessicaathomas

#1

04-07-2016, 04:36 AM

Hello, I was wondering if anyone could help me?

I've been trying to adapter-trim and merge my dataset using SeqPrep, but when I plot the read lengths after merging, most of the reads between 40 and 50 bp are missing. I can't work out why, or whether I'm doing something wrong!

So: read length plots resemble this:

[Figure: read-length distribution after merging]
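For anyone wanting to reproduce this kind of plot, here is a minimal Python sketch of how the merged read lengths could be tallied. It assumes a plain 4-line-per-record FASTQ; the function name `read_length_counts` and the demo records are illustrative and not taken from the dataset above.

```python
from collections import Counter
import io

def read_length_counts(handle):
    """Count sequence lengths in a 4-line-per-record FASTQ stream."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts

# Tiny in-memory demo (hypothetical records, not real data).
demo = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nACGTTTTT\n+\nIIIIIIII\n"
)
counts = read_length_counts(demo)
for length in sorted(counts):
    print(length, counts[length])
```

For real data, pass an open file handle for the merged output instead of `demo`; if the file is gzip-compressed, `gzip.open(path, "rt")` from the standard library can be used in place of `open`.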

I'm running SeqPrep as follows:

SeqPrep -f L120_1.qual.fastq -r L120_2_.qual.fastq -1 L120-R1.qual.unmerged.fastq -2 L120-R2.qual.unmerged.fastq -3 L120_NeutCap_2-R1.qual.discarded.fastq -4 L120_NeutCap_2-R2.qual.discarded.fastq -L 30 -q 15 -A AGATCGGAAGAGCACACGTC -B GGAAGAGCGTCGTGTAGGGA -s L120_NeutCap_2.qual.merged.fastq -E L120_NeutCap_2.qual.readable_alignment.txt -o 10

You'll notice that the first adapter is the standard Illumina one, while the second is a modified version missing the first 5 bp. Both adapters are present in the file if you grep for the sequences (indicated below in bold)…

Read 1, quality trimmed, for L120_2 above:

@HISEQ:268:C8TMGANXX:2:1101:1430:1965 1:N:0:NTCGTCGGNCGCAACG
CAGGCACTCCCTGGAAACTCTAAGGGGCAGTTCTACTCTAGATCGGAAGA
+
A@B0BGGGGGGGCFGGGGGGGGGGGEGGGGGGGGGGCGG@1E@FGD/CEF
@HISEQ:268:C8TMGANXX:2:1101:1457:1992 1:N:0:TTCGTCGGNCGCAACG
CTAGACCGCGAATACACACAAGATCGGAAGAGCACACGTCTGAACTCCAG
+
33<<BGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGBGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1684:1955 1:N:0:TTCGTCGGCCGCAACG
NTGATATGTCCGGAGTGCATCGTATGGCGCTTTCAATGAATTTGAGATCG
+
#3<<@EGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1619:1977 1:N:0:TTCGTCGGCCGCAACG
CGGTGCCATCGAGCCTGTTCTGTCTCATAGTGACCCTAGATCGGAAGAGC
+
33@>@GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1574:1983 1:N:0:TTCGTCGGCCGCAACG
CCATCCTAGTGGGGGGAAATAGATCGGAAGAGCACACGTCTGAACTCCAA
+
<330<E1EFFCGGGGGFGECDGEGGFGBDCDDGEGGGGCD0DDCDG=EBC

Read 2, quality trimmed, for L120_2 above:

@HISEQ:268:C8TMGANXX:2:1101:1430:1965 2:N:0:NTCGTCGGNCGCAACG
AGAGTAGAACTGCCCCNNNNAGTTTCCAGGGAGTGCCTGGGAAGAGCGTC
+
BB@BBGGDFGGGGGGG####==EFGDFFGGGGGGGGGGGGEGGGGGGGGF
@HISEQ:268:C8TMGANXX:2:1101:1457:1992 2:N:0:TTCGTCGGNCGCAACG
TGTGTGTATTCGCGGTCTATGGAAGAGCGTCGTGTAGGGAAAGAGTGTCG
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1684:1955 2:N:0:TTCGTCGGCCGCAACG
CAAATTCATTGAAAGNNNNNTACGATGCACTCCGGACATATCATGGAAGA
+
CCCCCGGGGGGGGGG#####@=EFGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1619:1977 2:N:0:TTCGTCGGCCGCAACG
AGGGTCACTATGAGACAGAACAGGCTCGATGGCACCTGGAAGAGCGTCGT
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1574:1983 2:N:0:TTCGTCGGCCGCAACG
ATTTCCCCCCACTAGGATGTGGAAGAGCGTCGTGTAGGGAAAGAGTGTCG
+
BCCCCGGGGGDGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGFG
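A plain grep for the full 20 bp adapter will miss reads where only the first few adapter bases fit before the read ends. The sketch below is one way to locate a full or truncated adapter by exact matching and to check that the two inserts agree. It is not how SeqPrep itself works (SeqPrep's internal alignment is not described in this thread), and the helper names `revcomp` and `adapter_start` and the `min_overlap` value are assumptions made for illustration. The reads are the first pair from the listings above.

```python
def revcomp(seq):
    """Reverse-complement a DNA string (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def adapter_start(read, adapter, min_overlap=5):
    """Return the leftmost index where the adapter, or a prefix of it cut
    off by the read end, begins. Exact matching only; -1 if not found."""
    for start in range(len(read) - min_overlap + 1):
        tail = read[start:start + len(adapter)]
        if adapter.startswith(tail):
            return start
    return -1

# Adapters as passed to -A and -B in the command above.
ADAPTER_A = "AGATCGGAAGAGCACACGTC"
ADAPTER_B = "GGAAGAGCGTCGTGTAGGGA"

# First read pair from the listings above (1101:1430:1965).
r1 = "CAGGCACTCCCTGGAAACTCTAAGGGGCAGTTCTACTCTAGATCGGAAGA"
r2 = "AGAGTAGAACTGCCCCNNNNAGTTTCCAGGGAGTGCCTGGGAAGAGCGTC"

s1 = adapter_start(r1, ADAPTER_A)
s2 = adapter_start(r2, ADAPTER_B)

# Compare the read 1 insert with the reverse complement of the read 2
# insert, skipping positions where either base is N.
mismatches = sum(
    a != b
    for a, b in zip(r1[:s1], revcomp(r2[:s2]))
    if "N" not in (a, b)
)
print(s1, s2, mismatches)
```

The returned index is also the implied insert length, and `len(read) - index` is the number of adapter bases visible in that read. For the 50 bp reads shown here, an insert of 40 to 50 bp would leave at most 10 adapter bases at the end of each read; whether that is related to the dip is not established in this thread.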

The only time I've seen such a dip is when I got the adapter sequences wrong in the SeqPrep command. When I corrected them it went away. But I think the adapter sequences are correct, so I can't explain why there's a dip in the read length frequency. Is this a quirk of SeqPrep? Can anyone offer any explanation?

I'd be very grateful for any help!
Many thanks.
jessicaathomas

#2

04-07-2016, 04:50 AM

I should also add that the depth of this dip differs between samples (i.e. some samples have barely any reads between 40 and 50 bp, whereas others have hardly any missing). The only thing that differs between samples is the 8 bp index, found within the adapter sequence. I'm not sure how SeqPrep removes the adapter sequence, but I don't think this should affect it? Again, any thoughts welcome.
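To put a number on how the dip differs between samples, one option is to compute the fraction of merged reads that fall in the 40 to 50 bp window for each sample. The sketch below assumes each sample's lengths have already been tallied into a length-to-count mapping; the function name `window_fraction`, the sample labels, and the counts are made-up placeholders, not results from this dataset.

```python
def window_fraction(length_counts, lo=40, hi=50):
    """Fraction of reads whose length lies in [lo, hi], inclusive."""
    total = sum(length_counts.values())
    if total == 0:
        return 0.0
    in_window = sum(n for length, n in length_counts.items() if lo <= length <= hi)
    return in_window / total

# Hypothetical length -> count tallies for two samples (toy numbers only).
samples = {
    "sample_A": {35: 100, 45: 5, 60: 95},
    "sample_B": {35: 100, 45: 90, 60: 110},
}

for name, tally in samples.items():
    print(name, round(window_fraction(tally), 3))
```

Comparing this fraction across samples alongside their index sequences would show whether the depth of the dip tracks the index.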