
**GenoMax** · 10-10-2016, 03:55 PM

If your inserts are so short that R1 and R2 overlap completely, then that is a waste of sequencing (unless you like to be overcautious).

Adapters may be cut automatically by MiSeq Reporter/BaseSpace if a setting is chosen at run time. Good libraries (long inserts) should not have adapter contamination, so it is not unusual to see clean reads.

**SDPA_Pet** · 10-10-2016, 04:17 PM

Originally posted by GenoMax:

If your inserts are so short that R1 and R2 overlap completely, then that is a waste of sequencing (unless you like to be overcautious).

Adapters may be cut automatically by MiSeq Reporter/BaseSpace if a setting is chosen at run time. Good libraries (long inserts) should not have adapter contamination, so it is not unusual to see clean reads.

1> So 2 x 150 bp WGS reads do not overlap completely? I thought this was short enough.

2> Mine is from a HiSeq, but it came through BaseSpace. I didn't find many adapters. Does this mean they were removed?

**GenoMax** · 10-10-2016, 05:05 PM

Originally posted by SDPA_Pet:

1> So 2 x 150 bp WGS reads do not overlap completely? I thought this was short enough.

If your inserts are longer (say 350 bp) then R1/R2 won't overlap. Use the BBMerge program from BBMap to quickly determine how many pairs overlap in the middle.

2> Mine is from a HiSeq, but it came through BaseSpace. I didn't find many adapters. Does this mean they were removed?

If not all R1/R2 reads are full length (length equal to the number of cycles), then it is possible that they were already trimmed.
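A minimal sketch of that full-length check, assuming an uncompressed FASTQ with four lines per record. The function names and the 10-cycle toy data are made up for illustration. It counts how many reads are as long as the number of cycles; if a noticeable fraction is shorter, the file was probably trimmed at some point.

```python
import io
from collections import Counter


def read_length_counts(fastq_handle):
    """Count read lengths in a FASTQ stream (4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # second line of each record is the sequence
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


def fraction_full_length(counts, cycles):
    """Fraction of reads whose length equals the number of cycles."""
    total = sum(counts.values())
    return counts.get(cycles, 0) / total if total else 0.0


# Toy data: pretend the run was 10 cycles; the second read is shorter.
toy_fastq = io.StringIO(
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nACGTAC\n+\nIIIIII\n"
)
length_counts = read_length_counts(toy_fastq)
print(fraction_full_length(length_counts, cycles=10))  # 0.5
```

For a real 2 x 150 bp run you would open the R1 or R2 file and pass `cycles=150`.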

**SDPA_Pet** · 10-10-2016, 05:16 PM

Originally posted by GenoMax:

If your inserts are longer (say 350 bp) then R1/R2 won't overlap. Use the BBMerge program from BBMap to quickly determine how many pairs overlap in the middle.

If not all R1/R2 reads are full length (length equal to the number of cycles), then it is possible that they were already trimmed.

"equal to the number of cycles": I am not sure what "cycles" means. Do you mean the theoretical length? I asked them to do 2 x 150 bp, so the full length will be 150 bp. Does this also mean 150 cycles?

**SDPA_Pet** · 10-10-2016, 05:26 PM

Originally posted by GenoMax:

If your inserts are longer (say 350 bp) then R1/R2 won't overlap. Use the BBMerge program from BBMap to quickly determine how many pairs overlap in the middle.

Which part of this report tells me how many overlap in the middle?

BBMerge version 36.38
Extend2 is defaulting to 50 because it was unset but rem mode is being used.
Executing assemble.Tadpole2 [in=ecct.ecco.clean.ELM010016AB_S1_L001_R_interleaved.fastq, branchlower=3, branchmult1=20.0, branchmult2=3.0, mincountseed=3, mincountextend=2, minprob=0.5, k=62, prealloc=false, prefilter=0, ecctail=false, eccpincer=false, eccreassemble=true]

Using 24 threads.
Executing ukmer.KmerTableSetU [in=ecct.ecco.clean.ELM010016AB_S1_L001_R_interleaved.fastq, branchlower=3, branchmult1=20.0, branchmult2=3.0, mincountseed=3, mincountextend=2, minprob=0.5, k=62, prealloc=false, prefilter=0, ecctail=false, eccpincer=false, eccreassemble=true]

Initial:
Ways=61, initialSize=128000, prefilter=f, prealloc=f
Memory: max=102900m, free=100216m, used=2684m

Estimated kmer capacity: 1922031677
After table allocation:
Memory: max=102900m, free=99142m, used=3758m

After loading:
Memory: max=102900m, free=59791m, used=43109m

Input: 8479764 reads 1249669006 bases.
Unique Kmers: 648242372
Load Time: 60.614 seconds.

Writing mergable reads merged.
Started output threads.
Total time: 84.110 seconds.

Pairs: 4239882
Joined: 2489893 58.726%
Ambiguous: 1749956 41.274%
No Solution: 33 0.001%
Too Short: 0 0.000%
Fully Extended: 34222 0.404%
Partly Extended: 88184 1.040%
Not Extended: 8357336 98.556%
Adapters Expected: 22 0.000%
Adapters Found: 0 0.000%

Avg Insert: 210.6
Standard Deviation: 49.4
Mode: 245

Insert range: 35 - 390
90th percentile: 274
75th percentile: 252
50th percentile: 216
25th percentile: 173
10th percentile: 138

**GenoMax** · 10-10-2016, 05:30 PM

Originally posted by SDPA_Pet:

Pairs: 4239882
Joined: 2489893 58.726%
Ambiguous: 1749956 41.274%
No Solution: 33 0.001%
Too Short: 0 0.000%
Fully Extended: 34222 0.404%
Partly Extended: 88184 1.040%
Not Extended: 8357336 98.556%
Adapters Expected: 22 0.000%
Adapters Found: 0 0.000%

Avg Insert: 210.6
Standard Deviation: 49.4
Mode: 245

Insert range: 35 - 390
90th percentile: 274
75th percentile: 252
50th percentile: 216
25th percentile: 173
10th percentile: 138

Right there in the log you posted. If you wrote the merged reads to a file they will be in there as well. Looks like your average insert size is about 210 bp, so with 2 x 150 bp reads R1/R2 will overlap in the middle.
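If you want to pull those numbers out of the log programmatically, here is a minimal sketch. It assumes the summary lines look like the ones posted above ("Name: value ..."); the parser and its names are made up for illustration and are not part of BBMerge.

```python
import re

# A few summary lines copied from the log posted above.
LOG = """\
Pairs: 4239882
Joined: 2489893 58.726%
Ambiguous: 1749956 41.274%
Avg Insert: 210.6
"""


def parse_summary(log_text):
    """Collect 'Name: number' lines into a dict of floats."""
    stats = {}
    for line in log_text.splitlines():
        match = re.match(r"^\s*([A-Za-z ]+):\s+([\d.]+)", line)
        if match:
            stats[match.group(1).strip()] = float(match.group(2))
    return stats


stats = parse_summary(LOG)
read_len = 150  # assumption: a 2 x 150 bp run

# Share of pairs that BBMerge could join, i.e. pairs that overlap.
percent_joined = 100 * stats["Joined"] / stats["Pairs"]
# R1 and R2 overlap on average when the insert is shorter than both reads combined.
reads_overlap = stats["Avg Insert"] < 2 * read_len

print(round(percent_joined, 3), reads_overlap)
```

Keep in mind that the average insert is computed only from the pairs that could be joined, so the "Ambiguous" pairs are not represented in it.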

**SDPA_Pet** · 10-10-2016, 05:44 PM

Hi GenoMax,

I am new to this field. Forgive me if I ask a naive question.

1> What is the insert? I thought the insert is the fragment that results from shearing the genomic DNA. In my case they did 2 x 150 bp, so the insert/fragment is about 150 bp. If my understanding is wrong, what is the insert? How can this software calculate it?

2> How can you work out that they overlap? 2 x 150 bp = 300 bp. Because 210.6 bp < 300 bp, they overlap?

Thanks

**SDPA_Pet** · 10-10-2016, 05:53 PM

Is 241 - 150 = 91 bp roughly the length of the overlapping region between R1 and R2?

**GenoMax** · 10-11-2016, 03:25 AM

Originally posted by SDPA_Pet:

Hi GenoMax,

I am new to this field. Forgive me if I ask a naive question.

1> What is the insert? I thought the insert is the fragment that results from shearing the genomic DNA. In my case they did 2 x 150 bp, so the insert/fragment is about 150 bp. If my understanding is wrong, what is the insert? How can this software calculate it?

Fragments are what you get when DNA is sheared. Even though preps aim for a certain size, not all fragments end up that size (there is generally a distribution), so individual fragments can be smaller (or much larger) than the mean size determined on a Bioanalyzer. During library prep you add adapters to the fragment, which makes the fragment the "insert". The adapters add another 120 bp or thereabouts, so the molecule that goes into the sequencer is actually longer than the insert.

2> How can you work out that they overlap? 2 x 150 bp = 300 bp. Because 210.6 bp < 300 bp, they overlap?

Thanks

That is correct.

Code:

--------------------------------------------------   250 bp insert (1 character = 5 bp)
----------------------------->                       150 bp R1
                    <-----------------------------   150 bp R2

--------------------==========--------------------   Merged read with 50 bp overlap (====) in the middle
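The arithmetic behind the picture can be sketched as below. This is an illustration only: the function name is made up, and it assumes both reads have the same length and that any adapter sequence has already been trimmed.

```python
def overlap_bp(insert_len, read_len=150):
    """Number of insert bases covered by both R1 and R2.

    - insert >= 2 * read_len: the reads never meet, overlap is 0
    - read_len <= insert < 2 * read_len: overlap is 2 * read_len - insert
    - insert < read_len: both reads cover the whole insert
    """
    return max(0, min(insert_len, 2 * read_len - insert_len))


for insert_len in (100, 250, 350):
    print(insert_len, overlap_bp(insert_len))
# 100 100
# 250 50
# 350 0
```

The 250 bp case is the one drawn above, and the 350 bp case is the "long insert" situation where R1 and R2 do not overlap at all.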

**SDPA_Pet** · 10-11-2016, 03:41 AM

Thanks GenoMax,

As you said, the insert size was determined with a Bioanalyzer during the sequencing work. My BBMerge results tell me the average insert is about 210.6. I just don't know how the software calculates this. I thought the person doing the sequencing would be the only one who knows the insert size, because they have the analyzer and did the lab work.

**GenoMax** · 10-11-2016, 04:09 AM

BBMerge looked at the end result of the merge process across all reads it could merge and then came up with the average insert size.
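To make the idea concrete, here is a deliberately simplified sketch. It is not BBMerge's actual algorithm (which tolerates mismatches and uses quality scores); it only looks for an exact overlap between the end of R1 and the start of the reverse complement of R2, and it ignores inserts shorter than a read. All names and sequences are made up for illustration.

```python
from statistics import mean


def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def insert_size_from_pair(r1, r2, min_overlap=5):
    """Infer the insert size of one pair, or None if no overlap is found."""
    r2_rc = revcomp(r2)  # bring R2 onto the same strand as R1
    # Try the longest possible overlap first: suffix of R1 == prefix of R2_rc.
    for ov in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-ov:] == r2_rc[:ov]:
            return len(r1) + len(r2_rc) - ov
    return None


# Toy 25 bp fragment sequenced as 2 x 15 bp: the reads share 5 bp.
fragment = "ACGTTGCAAGGCTTACCGATGTCAA"
r1 = fragment[:15]
r2 = revcomp(fragment[-15:])

pairs = [(r1, r2), ("AAAAAAAAAA", "CCCCCCCCCC")]  # second pair cannot be merged
sizes = [insert_size_from_pair(a, b) for a, b in pairs]
merged_sizes = [s for s in sizes if s is not None]

print(sizes)               # [25, None]
print(mean(merged_sizes))  # 25
```

The average is taken over merged pairs only, which is why the software can report an insert size without knowing anything about the lab work.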

**SDPA_Pet** · 10-11-2016, 04:14 AM

Thanks. Could you explain the relationship between cycles and length? You said "If not all R1/R2 reads are full length (length equal to the number of cycles), then it is possible that they were already trimmed."

I thought there was no relationship between cycles and length. If you run more cycles of sequencing, you get more reads, but not longer reads.

The read length (the theoretical length, e.g. 2 x 150 bp, 2 x 250 bp, 2 x 300 bp) is decided by the reagent kit and the sequencing platform used.

**GenoMax** · 10-11-2016, 04:46 AM

Number of sequencing cycles = length (in bases) of each untrimmed read.

If Illumina software is asked to do trimming during post-processing of data then you may end up with some reads that will not be full length (not equal to number of cycles of sequencing, since the adapter sequence will have been removed, making those reads short).

If you run more cycles of sequencing you will get LONGER reads.

The number of reads is equal to the number of clusters that successfully produce sequence on an Illumina flowcell. The quality of the library and the concentration loaded on the flowcell determine the number of clusters.

Not all clusters will pass quality filtering (e.g. some may produce mixed sequence if two clusters touch/mix and will be removed by Illumina software).

The number of reads you get in the final data file = number of clusters that passed Illumina QC filter.

The type of kit and the sequencer used determine how much sequence (and what read length) you can get. Not to complicate things, but it is possible to run asymmetric sequencing with kits (e.g. a 2 x 300 bp kit can be used to give 1 x 600 bp reads; not that you would want to in most cases, but the upper cap is ~600 cycles of sequencing from this kit).
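The cycle/length relationship described above can be sketched in a few lines. This is an illustration with made-up function names; it ignores index reads and assumes adapter trimming removes exactly the bases that run past the end of the insert.

```python
def read_length_after_trimming(insert_len, cycles):
    """Length of one read once adapter sequence has been trimmed.

    Untrimmed, every read is `cycles` bases long. If the insert is shorter
    than that, the read runs into the adapter and trimming leaves only the
    insert bases.
    """
    return min(insert_len, cycles)


def total_cycles(read1_cycles, read2_cycles=0):
    """Cycles spent on the reads themselves (index reads not counted)."""
    return read1_cycles + read2_cycles


print(read_length_after_trimming(210, cycles=150))  # 150: full-length read
print(read_length_after_trimming(100, cycles=150))  # 100: shortened by trimming
print(total_cycles(300, 300), total_cycles(600))    # 600 600: same cycle budget
```

More cycles change the first number (read length); more clusters passing filter change how many reads you get.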


The purpose of join/merge 2X150bp Illumina sequencing reads
