Seqanswers Leaderboard Ad

**gwilkie** · 03-19-2013, 12:34 PM

Hi Elfuser,

Could the different read lengths be caused by adapter clipping? The MiSeq has a small check box for this on the sample setup screen that is often on by default; it clips the adapter sequences off your reads. Any library fragment that is shorter than your read length will run into the adapter (a bit like sequencing into the vector in the old days!). It's a good idea to remove adapter sequences, but doing so does lead to variable read lengths. Instead of letting the MiSeq do this, I usually remove adapters and quality-trim my data with Trim Galore (trim_galore), which lets you set a minimum acceptable read length.
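To see why a short fragment produces a shorter read after clipping, here is a minimal Python sketch of 3' adapter clipping followed by a minimum-length filter. It is only an illustration and not how Trim Galore works internally. Trim Galore wraps Cutadapt, which also tolerates mismatches and trims low-quality bases. The adapter string is assumed to be the first 13 bp of the standard Illumina adapter. The names `clip_adapter`, `trim_reads` and `min_length` are hypothetical.

```python
# Sketch only: exact-match 3' adapter clipping plus a minimum-length filter.
# Real trimmers (e.g. Trim Galore, which runs Cutadapt) also allow mismatches
# and trim low-quality bases; neither is modelled here.

# First 13 bp of the standard Illumina adapter (assumed adapter for this example).
ADAPTER = "AGATCGGAAGAGC"


def clip_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Return the read with any adapter sequence clipped from its 3' end."""
    # Case 1: the whole adapter appears, so the insert was shorter than the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: only the start of the adapter fits before the read ends.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[: len(read) - overlap]
    # No adapter found: keep the full-length read.
    return read


def trim_reads(reads, min_length=20):
    """Clip adapters, then drop reads shorter than min_length."""
    clipped = (clip_adapter(read) for read in reads)
    return [read for read in clipped if len(read) >= min_length]
```

Consider a 40 bp insert read with 60 cycles. The read runs 13 bp into the adapter and 7 bp beyond it. `clip_adapter` cuts it back to the 40 bp insert, which is exactly the kind of variable-length read described above.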

bwa doesn't seem to mind datasets with reads of different lengths. As far as I understand (I'm not a bioinformatician!), bwa works by looking for a short exact match between the read and the reference sequence, called the seed (in BWA-MEM the minimum seed length is set with -k and defaults to 19 bp). When it finds a match, it then extends the alignment from there. So as long as your reads are longer than the seed length, it will work.
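The seed-and-extend idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in and not bwa's implementation. bwa uses an FM-index and gapped extension. This sketch uses a plain hash table of reference k-mers and ungapped mismatch counting. The function names and `max_mismatches` are hypothetical. A seed length of 19 mirrors BWA-MEM's default `-k`. The sketch shows why read length does not matter beyond the seed: any read at least `seed_len` long can be seeded, and a shorter one cannot.

```python
from collections import defaultdict


def build_seed_index(reference: str, seed_len: int) -> dict:
    """Map every seed_len-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - seed_len + 1):
        index[reference[i:i + seed_len]].append(i)
    return index


def seed_and_extend(read, reference, index, seed_len, max_mismatches=2):
    """Return the reference start of the best ungapped hit, or None."""
    if len(read) < seed_len:
        return None  # a read shorter than the seed cannot be seeded at all
    best = None  # (start, mismatches)
    # Take non-overlapping seeds along the read so one mismatch cannot hide every seed.
    for offset in range(0, len(read) - seed_len + 1, seed_len):
        for hit in index.get(read[offset:offset + seed_len], []):
            start = hit - offset  # where the read would begin on the reference
            if start < 0 or start + len(read) > len(reference):
                continue
            # "Extension": score the whole read against that reference window.
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return None if best is None else best[0]
```

Reads of 75 bp and 150 bp taken from the same reference are placed equally well. A 15 bp read returns `None` with `seed_len=19`.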

I don't know about bowtie2 I'm afraid.

Best wishes,
Gavin

**shaik sabiha** · 08-26-2015, 07:44 AM

Hi,

I am also facing a similar situation right now. I have MiSeq data with varying read lengths. Is it fine to go ahead with de novo assembly into contigs? Will reads of different lengths affect the assembly and the downstream analysis?

**gwilkie** · 08-26-2015, 08:19 AM

Yes, go ahead. Most de novo assemblers these days can handle variable read lengths.

Depending on the assembler you choose, you probably want to quality-trim your raw data and remove any very short reads before you start the assembly. Remember it's important to keep your forward and reverse read files in the same order (i.e. keep the mates of each pair in sync) if you have paired-end reads.
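The reason order matters is that assemblers match mates by position in the two files. If a length filter drops one mate but not the other, every later pair is mismatched. Paired-aware trimmers handle this for you (for example, Trim Galore with `--paired`). The Python sketch below only illustrates the idea. It assumes reads are `(id, sequence)` tuples whose IDs have already had any `/1` or `/2` suffix stripped. The name `filter_pairs` and the 20 bp cutoff are hypothetical.

```python
from typing import List, Tuple

Read = Tuple[str, str]  # (read ID without the /1 or /2 suffix, sequence)


def filter_pairs(forward: List[Read], reverse: List[Read], min_length: int = 20):
    """Keep a pair only if both mates pass min_length, preserving file order."""
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse files contain different numbers of reads")
    kept_fwd, kept_rev = [], []
    for (fwd_id, fwd_seq), (rev_id, rev_seq) in zip(forward, reverse):
        if fwd_id != rev_id:
            raise ValueError(f"mates out of sync: {fwd_id} vs {rev_id}")
        # Dropping only one mate would shift every later read out of step,
        # so the pair is kept or discarded as a unit.
        if len(fwd_seq) >= min_length and len(rev_seq) >= min_length:
            kept_fwd.append((fwd_id, fwd_seq))
            kept_rev.append((rev_id, rev_seq))
    return kept_fwd, kept_rev
```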

Any reads that are shorter than your hash (k-mer) length ought to be removed or ignored by the assembly software.
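A de Bruijn graph assembler breaks every read into overlapping k-mers. A read of length L gives L − k + 1 of them, so reads of mixed length are not a problem in themselves. A read shorter than k contributes nothing. The Python sketch below counts k-mers in that way. It ignores reverse complements for brevity; real assemblers usually count canonical k-mers. The name `count_kmers` is hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in a set of reads of any length."""
    counts = Counter()
    for read in reads:
        # A read of length L yields L - k + 1 k-mers, and none at all if L < k,
        # so reads shorter than k simply contribute nothing to the graph.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts
```

For example, a 150 bp read gives 120 k-mers at k = 31, while a 20 bp read gives none.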

**shaik sabiha** · 09-02-2015, 08:08 AM

Hi Gwilkie,
Thank you for your advice. I trimmed the reads and then assembled them using Velvet. The assembly looks good.

**shaik sabiha** · 09-04-2015, 03:35 AM

Hi,
Can you suggest which assembler should be preferred for longer reads, i.e. above 200 bp?

**gwilkie** · 09-04-2015, 05:42 AM

It depends on what genome you are trying to assemble - e.g. virus, bacteria, vertebrate. I suggest you start a new thread as this is a complex subject and there are now many different assemblers to choose from... each with their strengths and weaknesses.

ABySS is a good general-purpose assembler that is easy to use and does not require a huge amount of computer resources. However, de Bruijn graph assemblers have a maximum hash length (k-mer size), which in many tools is capped at around 128 or less (often as a compile-time limit) because of memory and implementation constraints.

Therefore, very long reads do not necessarily help the initial assembly, but they can be useful later for closing gaps, joining contigs or resolving repeats.
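One common reason for a k-mer cap is how k-mers are stored. This is an assumption about typical implementations, not a description of ABySS specifically. Each base takes 2 bits, and a k-mer is packed into a fixed number of machine words chosen when the program is built. For instance, a hypothetical build that reserves four 64-bit words per k-mer cannot go beyond k = 128. Reads longer than that only add more overlapping k-mers of the same size. The function names below are hypothetical.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer into an integer using 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | BASE_CODE[base]
    return value


def words_needed(k: int, word_bits: int = 64) -> int:
    """Machine words needed to hold one 2-bit-packed k-mer (ceiling division)."""
    return -(-2 * k // word_bits)
```

With 64-bit words, k = 31 fits in one word, while k = 127 needs four. Memory per k-mer therefore grows stepwise with k, and the graph holds many millions of k-mers.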

Hope that helps.


**shaik sabiha** · 09-08-2015, 10:34 PM

Hi Gwilkie,

I think it's a good idea to start a new thread. So far I have been using Velvet for 150 bp reads, and now I am experimenting with IDBA for the 250 bp ones. I will be checking out ABySS as well. Thank you.


Varying lengths in 2x150 Miseq sequencing data
