Seqanswers Leaderboard Ad

**swbarnes2** · 07-02-2012, 08:24 AM

Well, of course allowing more mismatches will have more reads aligning. But if only 50% of your reads aligned on default, that doesn't sound like good quality to me. That, or your reference genome is not a very good match to your sample. You might have better luck making a de novo assembly, and aligning to that.

**Richard Finney** · 07-02-2012, 09:14 AM

I think 50% is "okay".
I think the problem boils down the the Solid system making calls based on transition. Once there's a bad transition, the rest of the read is junk. With Illumina, a bad quality read doesn't ruin the rest of the read. So with solid, almost of the read needs to be good to align well. I am open to being schooled better on this.

"Forcing" more alignments by allowing more mismatches is going to mean more false positives. Requiring fewer mismatches will mean missing some real calls. It's a trade-off.

Alignment is not perfect. Genomic references are not perfect. SNP calling is not perfect. De-novo assembly is not perfect.

Someday we'll look back and laugh at the current state of affairs. Right now it's fustrating and focusing our cameras to get the true picture remains very tricky.

**PedroA** · 07-02-2012, 02:40 PM

@swbarnes2 the reference genome is finished and is from the same species, but making a de novo assembly at first with 50 bp SOLiD reads don't look much pretty for me...

Richard is right, 50% could not be so bad if the alignment is of good quality. From a review that I have read from Li and Homer it seems true that following a sequencial error in the SOLiD machine all the bases can be wrong if they are converted to base calls prior of mapping (but I think BWA maps in color code mode and this problem might not occur).

I'm now testing for only 7 and 5 mismatches in the whole read to make results comparable.

Although I'm still confused about seeding, to do or not to do the seeding? what's the best for accuracy? For what I understand in BWA manual the algorithm can look for mismatches in the seed and if this number is below the threshold it will place the read in that position. However, quality scores start to decrease around from the middle of the read (whatever technology they are). And if from this region there are more mismatches than the allowed by the -n setting, the read is rejected from the position where it was aligned by the seed? (sorry if I'm interpreting things wrongly)

"Alignment is not perfect. Genomic references are not perfect. SNP calling is not perfect. De-novo assembly is not perfect.
Someday we'll look back and laugh at the current state of affairs. Right now it's fustrating and focusing our cameras to get the true picture remains very tricky."

I completely agree with you, and hopefully that day will come, however, doesn't seem to be so soon as desirable...

**Chipper** · 07-03-2012, 04:09 AM

Originally posted by Richard Finney View Post

I think 50% is "okay".
I think the problem boils down the the Solid system making calls based on transition. Once there's a bad transition, the rest of the read is junk. With Illumina, a bad quality read doesn't ruin the rest of the read. So with solid, almost of the read needs to be good to align well. I am open to being schooled better on this.

As Pedro said bwa does the mapping in colorspace so this is not an issue. Ther are two issues with seed-based aligners for SOLiD though. Since raw error rates are much higher than for other competing instruments, it is not uncommon that a read has more than two mismatches in the seed. Also, since a SNP gives two cs differences the alternative allele gets underrepresented if the SNP is in the seed.

Bwa defaults are for Illumina and needs to be changed for SOLiD, I found something like -l 23 -n 7 to give good results (shortening the seed too much or allowing more mismatches in the seed makes it much slower). For SNP-calling you may want to trim the reads first though.

**PedroA** · 07-03-2012, 08:57 AM

Yes, that's what I also experienced. I had to trim both the 5' end (2 bases) and the 3' end (3 bases) because there were many reads with N bases in these positions.

The results that I've got for this data were the following (and hope that this could help others also):

defaults: 52% mapped reads
-n5 -l-1 -k3: 63.3% mapped reads
-n5 -l25 -k3: 64.2% mapped reads (no big difference for the previous)
-n7 -l-1 -k3: 68.8% mapped reads
-n7 -l25 -k3: 71.9% mapped reads
-n10 -l-1 -k3: 71.5% mapped reads
-n10 -l25 -k3: 77.8% mapped reads

But at a first sight it seems to me that the quality of the mapping decreases considerably for -n10, and so I will not base my further analysis on these.

**colindaven** · 07-04-2012, 11:13 PM

Thanks for a useful thread. To get back to your initial question on how the settings affect SNP calling, why not call SNPs on three or four of the "better quality" alignment settings, and compare the VCFs.

I think Heng Li stated once bwa will work ok with colour space data but more mismatches should be allowed. Iterative read trimming from the 3' end seems to be what allows Lifescope and NovoalignCS to align more reads than BWA using default settings.

Lastly : did you try SAET ? I am getting 1-5% more alignments for bacteria after processing the reads using SAET.

**carmeyeii** · 12-13-2012, 12:07 PM

Hi PedroA,

And everyone on this great thread!

What did you use to trim your SOLiD reads? Is there anything you recommend for assessing the quality of my reads, such as FastQC for Illumina? I am aware that I can use this application for the SOLiD reads, but I've read elsewhere that the chemistry of SOLiD gives for a very weird-looking per-base quality boxblot -- so what is the standard for a good quality run?

Thanks!!

Carmen

Topics	Statistics	Last Post
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, Today, 08:47 AM	0 responses 12 views 0 likes	Last Post by seqadmin Today, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 59 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 54 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM

Seqanswers Leaderboard Ad

Announcement

BWA mapping for SOLiD single-end reads

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News