SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
BWA: paired end reads, wrong orientation but listed as properly paired rdeborja Bioinformatics 3 06-11-2014 04:39 AM
Running out of disk space with BWA, wrong CMD line? sindrle RNA Sequencing 3 11-26-2013 08:14 AM
Aborted BWA run - what is wrong? yaximik Bioinformatics 1 06-10-2013 03:52 AM
BWA : XM tag is sometimes wrong Lcontami Bioinformatics 3 04-01-2013 12:04 PM
BWA concise format output -edit distance wrong biterbilen Bioinformatics 2 11-06-2009 03:55 PM

Reply
 
Thread Tools
Old 10-24-2019, 09:19 AM   #1
radood
Junior Member
 
Location: USA

Join Date: Oct 2019
Posts: 4
Question Wrong bwa alignment?

Hi,
Below is a supplementary alignment for one of my reads with a mapping quality score of 43.This should map to chr2: 29446797 in hg19

03012p9SX:030B 2064 1 29446797 43 36M241H -1 -1 36 CCCCACGTGAACGAGGGAGGGAGGGAGGGTTGGGTG array('B', [32, 37, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 41]) [('SA', 'chr2,42479618,+,238M39S,60,2;'), ('MD', '26A9'), ('RG', 'SAMPLE.sample23'), ('NM', 1), ('AS', 31), ('XS', 23), ('fm', '3_0')]

Since the read is on the negative strand, below is the forward sequence:
CACCCAACCCTCCCTCCCTCCCTCGTTCACGTGGGG

However this sequence does not map to chr2: 29446797. I tried blasting but I think it is too short so I aligned it to the genomic sequence as you can see here
https://ibb.co/R7VHrQH
This is not a good alignment at all. Is BWA wrong? Am I missing something?
Thank you for your input in advance!
radood is offline   Reply With Quote
Old 10-28-2019, 01:27 PM   #2
radood
Junior Member
 
Location: USA

Join Date: Oct 2019
Posts: 4
Default

Checking to see if folks have feedback on this. I read that "BWA actually follows the SAM spec and reports Phred scores as MAPQ values.* The calculation is based on the number of optimal (best) alignments found, as well as the number of sub-optimal alignments combined with the Phred scores of the bases which differ between the optimal and sub-optimal alignments."

Does that mean that the mapq score of a clipped read is a combination of the quality of the 'matched' alignment AND the quality of the alignment of the clipped bases? Perhaps this explains the strange finding I saw above.

If so, how does one extract the mapping quality of only the 'matched' portion of the read at this location (29446797)?
radood is offline   Reply With Quote
Old 10-29-2019, 10:06 AM   #3
r.rosati
Member
 
Location: Brazil

Join Date: Aug 2015
Posts: 91
Default

Hi! Yes you can BLAST it, but since it's a short sequence it's best if you use the `blastn` algorithm instead of the `megablast` one. You can also decrease the word size to 7 to increase sensitivity.

Here's your search, it'll be available until october 31:
https://blast.ncbi.nlm.nih.gov/Blast...ID=VGW43RPD014

According to the BLAST results, the sequence matches chr2:29446798-29446833 (hg19).

...PS anyways the CIGAR string says "36M" but the alignment to GRCh37.p13 shows one mismatch.

...PPS oh, I just noticed - I know where you got it wrong! Bit 16 in the SAM flag does not simply mean "the read is aligned to the reverse strand", it means "the read is represented as its reverse-complement because it aligned to the reverse strand". Sequences in SAM files are always represented on the forward strand. So if bit 16 is 1, as in this case, it means that "CCCC..." is ALREADY the reverse-complement of the actual read. So you don't have to reverse-complement it again. In fact, you can find your sequence in the same screenshot you posted.

Last edited by r.rosati; 10-29-2019 at 10:46 AM.
r.rosati is offline   Reply With Quote
Old 10-30-2019, 02:37 PM   #4
radood
Junior Member
 
Location: USA

Join Date: Oct 2019
Posts: 4
Default

Wow I do see it now!!! Thank you so very much Rosati. Yes this is super helpful and it makes a lot of sense

I didn't realize that reads in the SAM file always represent the forward strand. How about cases when I check read.get_forward_sequence in pysam, sometimes the output is the reverse complement of the sequence in the read itself, like this example:

Read:
0a023638:0B0B 83 0 9999 0 144M 0 10005 144 CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCC array('B', [12, 12, 41, 37, 32, 32, 32, 32, 41, 37, 37, 37, 22, 32, 27, 12, 37, 32, 27, 32, 27, 27, 41, 41, 27, 27, 32, 32, 32, 32, 32, 27, 37, 37, 41, 37, 37, 37, 41, 41, 41, 41, 41, 41, 37, 41, 41, 41, 41, 41, 41, 37, 41, 41, 37, 41, 41, 27, 37, 41, 41, 37, 41, 37, 41, 41, 41, 41, 37, 32, 41, 32, 41, 37, 32, 22, 41, 41, 41, 41, 41, 37, 41, 41, 41, 41, 37, 27, 41, 41, 27, 37, 37, 12, 41, 41, 41, 37, 41, 37, 41, 41, 41, 37, 37, 27, 41, 37, 22, 37, 37, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 37, 27]) [('MD', '0A143'), ('RG', 'HMMGHBBXX.lane0.2P_FMIEx_321'), ('NM', 1), ('AS', 143), ('XS', 137)]

But read.get_forward_sequence() is:
GGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAG

Thank you so much!
radood is offline   Reply With Quote
Reply

Tags
alignment, bwa, read

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 12:16 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO