SEQanswers

09-09-2016, 04:24 AM #1
rEDI
Member
 
Location: United Kingdom

Join Date: Apr 2016
Posts: 14
Merging Illumina V4 paired-end reads

Hi

I am having difficulty understanding how my merged reads end up at a particular amplicon size.

Basically, I have 2 x 251 bp reads. Each 251 bp read includes the primer sequence, which, as far as I understand, is 20 bp.

When these 251bp reads are merged, they produce an amplicon size of 291bp.

Here is an example of a merged read with an amplicon length of 291 bp (there are two Ns in the sequence because I haven't run screen.seqs yet):

>M01822_319_000000000-AG4CF_1_1101_16954_1171
GTGCCAGCCGCCGCGGTAATACATAGGATGCAAGCGTTATCCGGATTTACTGGGCGTAAAGCGAGCGCAGGCGGATTTACAAGTCTGATGTTAAAGACAACTGCTTAACGGTTGTTTGCATTGGAAACTGTAAGTCTAGAGTATAGTAGAGAGTTTTGGAACTCCATGTGGAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGAGGCGAAGGCGAAAACTTAGGCTATAACTGACGCTTAGGCTCGAAAGTGTGGGNAGCAAATAGGATTAGATACCCCGGTAGTCN

I have looked at the make.contigs report file and, if I am understanding it correctly, it reports the following:

Length = 291 bp
Overlap length = 211 bp
Total primers = 40 bp

So is the read length 251 bp but the merged read length 291 bp, because the forward and reverse primers are included?

What I don't understand is that each primer is 20 bp long, so shouldn't the amplicon be 271 bp?
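The report values can be cross-checked with plain arithmetic. A minimal Python sketch, assuming both reads are used at their full 251 bases and that the two primers together account for 40 bases as stated above; `merge_lengths` is a hypothetical helper, not part of mothur:

```python
def merge_lengths(r1_len: int, r2_len: int, contig_len: int, primer_total: int) -> dict:
    """Derive overlap and primer-free amplicon length for a merged read pair."""
    # Every base in the contig is covered by R1, R2 or both; bases covered by
    # both are counted twice in r1_len + r2_len, so the excess is the overlap.
    overlap = r1_len + r2_len - contig_len
    # Removing both primers from the contig leaves the amplified insert.
    amplicon = contig_len - primer_total
    return {"overlap": overlap, "amplicon_without_primers": amplicon}


print(merge_lengths(251, 251, 291, 40))
# {'overlap': 211, 'amplicon_without_primers': 251}
```

With these inputs the overlap is 2 × 251 − 291 = 211 and the primer-free amplicon is 291 − 40 = 251 bp. A 271 bp figure would correspond to removing only one of the two primers.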

I know I have to remove the primers, but just trying to understand this.

Any help would be greatly appreciated.

Thank you
09-09-2016, 08:29 AM #2
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 198

What are your primer sequences? Are they at each end of the merged 291 bp contig? From what you're saying, it sounds to me like 291 - 20 - 20 = 251...

Relatedly, you really should only have 2x250, not 2x251. The last cycle is used for quality scoring of the previous cycle.
09-09-2016, 08:41 AM #3
rEDI
Member
 
Location: United Kingdom

Join Date: Apr 2016
Posts: 14

Hi Fanli

Thank you for your answer

The primer seqs are as follows:

forward
GTGCCAGCCGCCGCGGTAA

reverse
GGACTACACGGGTATCTAAT

They appear at each end of the merged contig, but each read is 251 bp including the primer (so ~231 bp excluding it).
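Trimming the primers from a merged contig amounts to checking for the forward primer at the 5' end and for the reverse complement of the reverse primer at the 3' end. Below is a hypothetical stand-alone Python sketch: `trim_primers`, the `max_diffs` tolerance and treating N as a wildcard are all assumptions, and in a mothur workflow its own primer-trimming options would normally be used instead. Note that, as pasted, the forward primer is 19 bases long and the reverse primer 20, so with these exact strings a 291 bp contig would trim to 291 − 19 − 20 = 252 bases rather than 251.

```python
# Primer sequences as posted above; everything else is a hypothetical sketch.
FORWARD = "GTGCCAGCCGCCGCGGTAA"
REVERSE = "GGACTACACGGGTATCTAAT"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N string."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Count differing positions; an N in either string counts as a match."""
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def trim_primers(contig: str, fwd: str, rev: str, max_diffs: int = 2):
    """Remove fwd from the 5' end and revcomp(rev) from the 3' end.

    Returns the primer-free insert, or None if either primer is not found
    within max_diffs mismatches (an assumed tolerance).
    """
    rev_rc = revcomp(rev)  # the reverse primer reads reverse-complemented in the contig
    if len(contig) < len(fwd) + len(rev_rc):
        return None
    head = contig[: len(fwd)]
    tail = contig[len(contig) - len(rev_rc):]
    if mismatches(head, fwd) > max_diffs or mismatches(tail, rev_rc) > max_diffs:
        return None
    return contig[len(fwd): len(contig) - len(rev_rc)]


# Usage on a synthetic contig (hypothetical insert sequence).
insert = "TACATAGGATGCAAGCGTTA"
contig = FORWARD + insert + revcomp(REVERSE)
print(trim_primers(contig, FORWARD, REVERSE))  # TACATAGGATGCAAGCGTTA
```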

According to the report file produced by merging, there are 211 bp of overlap.

So the only way I can make sense of this is that each read, forward and reverse, is 211 + 20 + 20 = 251 bp, and the two have assembled into a 291 bp contig.

i.e. 211 + 20 + 20 (F) + 20 + 20 (R) = 291 bp?
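The breakdown above can be laid out position by position. A minimal Python sketch, assuming (as in the post) 20 bp primers, R1 starting at the first base of the contig and R2 ending at the last; `read_layout` is a hypothetical helper, not a mothur function:

```python
def read_layout(contig_len: int, r1_len: int, r2_len: int,
                fwd_primer: int, rev_primer: int) -> list:
    """Split a merged contig (5'->3') into segments by which read covers them.

    Assumes R1 starts at the first contig base, R2 ends at the last one and
    the primers sit at the very ends of the contig.
    """
    r2_start = contig_len - r2_len   # first contig position reached by R2
    overlap = r1_len - r2_start      # positions covered by both reads
    return [
        ("forward primer, R1 only", fwd_primer),
        ("insert, R1 only", r2_start - fwd_primer),
        ("insert, R1 and R2", overlap),
        ("insert, R2 only", contig_len - r1_len - rev_primer),
        ("reverse primer, R2 only", rev_primer),
    ]


segments = read_layout(291, 251, 251, 20, 20)
for label, length in segments:
    print(f"{label:<24} {length:>3}")
print("total:", sum(length for _, length in segments))
```

This reproduces 20 + 20 + 211 + 20 + 20 = 291, and dropping the two primer segments leaves 20 + 211 + 20 = 251 bases of insert.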

What do you think?

Thank you again
09-09-2016, 08:54 AM #4
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 198

I think we're in agreement - does the attached diagram help?
[Attached image: diagram.jpg]
09-09-2016, 08:58 AM #5
rEDI
Member
 
Location: United Kingdom

Join Date: Apr 2016
Posts: 14

That is perfect, thank you so much for explaining, Fanli. That is most helpful.

One more question: are the sequences unique to R1 and R2 left out of the merge in this case, or are they merged regardless?

Thank you again
09-09-2016, 09:01 AM #6
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 198

They are merged as well - the full 291bp sequence from your original post is what you get. Another way to think about this is that you are sequencing a 251bp amplicon with 20 bases on each end unique to R1 or R2 and 231 bases in the middle covered by both.
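A toy illustration of why bases seen by only one read still end up in the contig: merging lines R1 up against the reverse complement of R2 and keeps everything either read covers. This Python sketch uses an exact-match overlap only and is a simplification (the names are hypothetical); real mergers such as make.contigs align the reads and use base quality scores to resolve disagreements in the overlap.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10):
    """Toy merger: longest exact overlap between R1's 3' end and the start
    of revcomp(R2). Returns (contig, overlap_length) or None."""
    r2_rc = revcomp(r2)  # put R2 on the same strand as R1
    # Try the longest possible overlap first, down to min_overlap.
    for k in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-k:] == r2_rc[:k]:
            # Keep all of R1, then append the part of R2 that R1 did not reach.
            return r1 + r2_rc[k:], k
    return None


amplicon = "GATTACAGGCATCGATCGTTAGC"   # hypothetical 23-base fragment
r1 = amplicon[:16]                     # forward read
r2 = revcomp(amplicon[7:])             # reverse read, as sequenced
print(merge_pair(r1, r2, min_overlap=5))
# ('GATTACAGGCATCGATCGTTAGC', 9)
```

Here the first 7 bases are seen only by R1 and the last 7 only by R2, yet the merged contig still spans all 23.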
09-09-2016, 09:05 AM #7
rEDI
Member
 
Location: United Kingdom

Join Date: Apr 2016
Posts: 14

Thank you, Fanli. In this case, though, aren't only 211 bases covered by both?
09-09-2016, 09:07 AM #8
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 198

Sorry, yes. 211 bases in the middle - math is hard :/
09-09-2016, 09:10 AM #9
rEDI
Member
 
Location: United Kingdom

Join Date: Apr 2016
Posts: 14

Thank you again for your very helpful answers

Tags
16s illumina analysis, bioinformatics, illumina, microbiome, trimming reads

All times are GMT -8.

