**Karenj** · 01-11-2013, 05:28 AM

Any comments will be very much appreciated!

**xied75** · 01-11-2013, 06:10 AM

When you say "without any success" what do you mean? How do you check?

**Karenj** · 01-11-2013, 07:10 AM

Sorry for not being clear - I mean that the percentage of aligned reads gets lower instead of higher...
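
One way to check the aligned percentage is to count records in the SAM output whose "unmapped" flag bit (0x4) is not set. The Python sketch below assumes plain-text SAM lines and skips secondary (0x100) and supplementary (0x800) records so that each read is counted once; the function name and the toy records are made up for illustration.

```python
def mapped_fraction(sam_lines):
    """Return (mapped, total, percent_mapped) over primary SAM records."""
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Skip secondary and supplementary records.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        # Bit 0x4 set means the read is unmapped.
        if not flag & 0x4:
            mapped += 1
    percent = 100.0 * mapped / total if total else 0.0
    return mapped, total, percent


# Toy records (hypothetical): two mapped reads and one unmapped read.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t10\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t50\t37\t4M\t*\t0\t0\tACGT\tIIII",
]
print(mapped_fraction(toy_sam))
```

Running the same count on the output of each parameter setting gives comparable percentages, as long as the input reads are identical between runs.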

**volks** · 01-13-2013, 03:47 AM

are you aligning against a transcript database? if not, you might consider using a splice-aware aligner like tophat or star:

http://tophat.cbcb.umd.edu/

http://code.google.com/p/rna-star/

**Karenj** · 01-13-2013, 03:56 AM

Hi volks, I have to use BWA since this aligner allows the two reads in a read pair to be on different chromosomes - my analysis depends on this. I am aligning against a custom-made reference genome.

**volks** · 01-13-2013, 04:08 AM

if you are certain that BWA is your only option ..
the parameters are pretty clear:

Options: -n NUM max #diff (int) or missing prob under 0.02 err rate (float) [0.04]
-o INT maximum number or fraction of gap opens [1]
-e INT maximum number of gap extensions, -1 for disabling long gaps [-1]
-i INT do not put an indel within INT bp towards the ends [5]
-d INT maximum occurrences for extending a long deletion [10]
-l INT seed length [32]
-k INT maximum differences in the seed [2]
-M INT mismatch penalty [3]
-O INT gap open penalty [11]
-E INT gap extension penalty [4]
-L log-scaled gap penalty for long deletions

as far as i understand, it is not possible to get fewer reads aligned when allowing for more mismatches (-n).

**Karenj** · 01-13-2013, 04:18 AM

Thanks volks. Yes, I am almost 100 percent sure that BWA is my only option. However, I am really a newbie to BWA, so I'm not sure that I understand your post. Most of the parameter settings that you list are defaults, right?

E.g. -n is 0.04 by default, and I thought that this was one of the parameters that I should change when allowing BWA to align with more mismatches? Sorry - but can you explain to me again which parameters are default and which parameters I should change?

**volks** · 01-13-2013, 04:23 AM

defaults are given in brackets [].
for starters i would disable gapped alignment (-o 0), keep the seed at its default length and two mismatches (-l 32, -k 2) and try several values for the overall number of mismatches (e.g. -n 3 to 6). higher -n should give you more aligned reads.
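
The sweep described here can be sketched as a small Python helper that only builds the `bwa aln` command lines, one per value of -n, without running them. The file names `ref.fa` and `reads.fq`, the output naming scheme and the function name are placeholders; `-f` is assumed to be available in your BWA version for writing the .sai output to a file instead of stdout.

```python
import shlex


def bwa_aln_commands(reference, reads, n_values=(3, 4, 5, 6)):
    """Build one 'bwa aln' argument list per value of -n."""
    commands = []
    for n in n_values:
        out_sai = f"aln_n{n}.sai"  # hypothetical output file name
        argv = [
            "bwa", "aln",
            "-o", "0",     # disable gapped alignment
            "-l", "32",    # default seed length
            "-k", "2",     # default max differences in the seed
            "-n", str(n),  # overall max differences (integer form)
            "-f", out_sai,
            reference, reads,
        ]
        commands.append(argv)
    return commands


# Print the commands so they can be inspected or pasted into a shell.
for argv in bwa_aln_commands("ref.fa", "reads.fq"):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Each argument list could also be passed to `subprocess.run` once the placeholder paths are replaced with real ones.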

**Karenj** · 01-13-2013, 04:40 AM

Ok, thanks. I will try to use the guidelines that you have given me.

So I should concentrate on changing -n (the one that is set to 0.04 as default)? I will try to set it between 3 and 6. How should this parameter be set if I want to allow e.g. twice as many mismatches per read compared to default?

I have read somewhere that it is a good idea to also disable seeding by setting -l (10000) when allowing more mismatches - but I don't know if I should do this?

**volks** · 01-13-2013, 04:52 AM

if you run it on default it will tell you what the number of mismatches is for various read lengths. just double that.

i don't see why you should turn off seeding, and i am not sure if setting -l 10000 would do that.

**xied75** · 01-14-2013, 10:05 AM

Disabling seeding will make the run slower. That may be fine if speed is not an issue here.

**Karenj** · 01-14-2013, 12:56 PM

Speed is not the biggest issue... Xied75, would you disable seeding if/when allowing for more mismatches? I'm running some tests changing the parameters that volks suggested, but I don't have any results yet.

**xied75** · 01-17-2013, 07:14 AM

Hi, Karenj,

I did some tests.

First, if you don't set any parameters, the default values for n are printed at the beginning of the output:

[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9

My data is 83bp, so n = 4 by default. If I run with n = 8 or n = 16, I can see more reads mapped.
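
If the startup lines are not visible (for example on a hosted server), the default can be estimated from the read length. The sketch below assumes BWA picks the smallest number of differences k for which the probability of seeing more than k errors falls below the -n threshold (0.04), under a Poisson model with the 0.02 per-base error rate mentioned in the help text. This is a reconstruction, not a copy of BWA's code, and the function name is made up; for the read lengths checked it agrees with the table above.

```python
import math


def bwa_max_diff(read_length, err=0.02, thres=0.04):
    """Estimate the default max_diff for a read length (assumed Poisson model)."""
    lam = read_length * err      # expected number of errors in the read
    term = math.exp(-lam)        # P(exactly 0 errors)
    cumulative = term            # P(at most 0 errors)
    for k in range(1, 1000):
        term *= lam / k          # P(exactly k errors)
        cumulative += term       # P(at most k errors)
        if 1.0 - cumulative < thres:
            return k
    return 2                     # fallback if no k satisfies the threshold


for length in (17, 38, 64, 83, 93, 124):
    default_n = bwa_max_diff(length)
    # Doubling the default gives an integer value to pass to -n.
    print(length, default_n, 2 * default_n)
```

For 83bp reads this gives 4, so "twice the default" would mean passing -n 8 as an integer.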

Now, -l changes the seed length. Changing it doesn't seem to help: it runs 100 times slower and maps fewer reads. -k changes the number of mismatches allowed within the seed; giving it a large number doesn't help either.

There are many more parameters you can change, e.g. -o, -e, -i, -d, -M, -O, -E, but the point is that you need to understand them.

But the point of BWA is to align low-error reads very fast. If you adjust any of those listed above, it might align some hard reads, but the run time is significantly LOOOOOOOONGER. You might be better off using BWA for a first round and another tool to align the unmapped reads (as many re-aligners do).
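
A minimal sketch of the hand-over step in that two-round approach: pull the unmapped reads (flag bit 0x4) out of the first-round SAM output and write them back as FASTQ for a second aligner. It assumes single-end reads, plain-text SAM and stored base qualities (not `*`); the function name and toy records are hypothetical.

```python
# Translation table for complementing bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_to_fastq(sam_lines):
    """Return FASTQ text for every unmapped record in the SAM lines."""
    records = []
    for line in sam_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if not flag & 0x4:       # keep only unmapped reads
            continue
        if flag & 0x10:          # stored reverse-complemented: restore original
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        records.append(f"@{name}\n{seq}\n+\n{qual}\n")
    return "".join(records)


# Toy records (hypothetical): one mapped read and one unmapped read.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t10\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
]
print(unmapped_to_fastq(toy_sam), end="")
```

For paired-end data the mate information (flag bits 0x40 and 0x80) would also have to be carried over, which this sketch does not do.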

**Karenj** · 01-17-2013, 07:24 AM

Hi xied75, thanks for your post. I'm a bit confused about where I see the default value for n. I use BWA on the Galaxy server; perhaps it works a bit differently there?

How to allow more mismatches in BWA?
