**chadn737** · 02-15-2012, 08:28 AM

Did you align to the whole genome or transcripts?

**kgulukota** · 02-15-2012, 08:43 AM

This was to transcripts.

**chadn737** · 02-15-2012, 08:51 AM

I guess I'm not really that surprised then that you see this. Does the unmapped portion of the reads match the genome? If not, then you have some sort of contamination at the end of your reads. If it does, then that tells me that the reference transcript is wrong in where it makes the cutoff for the 3' UTR.

Do you have a reference genome? If so, why map to transcripts over the genome? I think you can miss a lot by mapping to transcripts because you are making the implicit assumption that all transcripts and isoforms are known.

**swbarnes2** · 02-15-2012, 09:23 AM

The other reason one can get both a mapping coordinate and the unmapped flag is that the SAM spec calls for an unmapped read to be given the mapping coordinates of its mapped mate. This is so the two reads will sort together. But maybe you only have single-end data, in which case you won't see this.

To distinguish these, maybe you could pad your transcripts with N's so that reads won't cross between adjacent reference transcripts, and therefore wouldn't have the unmapped flag set. You could also try other aligners, as they might not set the unmapped flag in those circumstances. I know that bwa will, but other aligners might behave differently.
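A minimal sketch of the padding idea in Python, assuming the transcripts are in a plain FASTA file read line by line. The function name `pad_fasta` and the pad length are made up for illustration; something at least as long as a read seems a reasonable choice, but that is an assumption, not something tested against bwa.

```python
def pad_fasta(lines, pad_len=100):
    """Yield FASTA lines with a run of pad_len N's appended to every record."""
    pad = "N" * pad_len
    in_record = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if in_record:
                yield pad  # close the previous record with padding
            in_record = True
            yield line
        elif line:
            yield line
    if in_record:
        yield pad  # pad the last record as well


# Hypothetical two-transcript input, padded with 5 N's each.
example = [">NM_A", "ACGT", "ACGT", ">NM_B", "TTGG"]
padded = list(pad_fasta(example, pad_len=5))
```

Because the N's are appended after each sequence, coordinates inside a transcript stay the same; only the reported reference length grows by `pad_len`.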

**kgulukota** · 02-15-2012, 10:06 AM

Chadn737 - For the pipeline I am building, mapping to transcripts and to the genome and then combining information from both is important. (I will post more details on it after the pipeline is in a decent shape). I am also not surprised at the mapping per se. Rather my surprise was that BWA would report it as unmapped and yet provide the reference ID.

swbarnes2 - Padding with N's could "fix" this. I'll try it. Other aligners: I have tried bowtie2 and have had some trouble. I am privately in touch with Ben Langmead in trying to resolve that issue. It is clear though that SAM format, a standard though it is, does have these odd behaviors based on what software created the file.

Thank you both for your responses. Very helpful in driving me to my final pipeline. I have come to the conclusion that I should go with the Ref Name rather than the SAM flag (though perhaps with some additional checking if the flag is 4).

Gulu

**maubp** · 02-15-2012, 04:20 PM

Originally posted by kgulukota:

It is clear though that SAM format, a standard though it is, does have these odd behaviors based on what software created the file.
...
Thank you both for your responses. Very helpful in driving me to my final pipeline. I have come to the conclusion that I should go with the Ref Name rather than the SAM flag (though perhaps with some additional checking if the flag is 4).

That would be wrong - the spec is very clear that the FLAG determines if a read is mapped or not, and if unmapped other fields are to be regarded as undefined (although there is usually an expected value).
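A minimal sketch of that rule in Python, assuming tab-separated SAM text records. The function name `classify` and the example records are made up for illustration. It decides on the FLAG alone and only uses the reference name to label the odd case discussed in this thread.

```python
UNMAPPED = 0x4  # FLAG bit meaning "segment unmapped"


def classify(sam_line):
    """Classify one SAM record using the FLAG field, not the reference name."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: FLAG
    rname = fields[2]       # column 3: RNAME
    if flag & UNMAPPED:
        # Still unmapped, even if RNAME/POS happen to be filled in.
        return "unmapped_with_coordinates" if rname != "*" else "unmapped"
    return "mapped"


# Hypothetical records for illustration only.
odd = "r1\t4\tNM_0012345\t2980\t0\t50M\t*\t0\t0\t" + "A" * 50 + "\t" + "I" * 50
good = "r2\t0\tNM_0012345\t100\t37\t50M\t*\t0\t0\t" + "A" * 50 + "\t" + "I" * 50
plain = "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII"
```

Here `classify(odd)` returns `"unmapped_with_coordinates"`, so such reads can be counted or inspected separately without treating them as mapped.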

**Simon Anders** · 02-15-2012, 11:43 PM

Have a look at the FAQ for bwa: http://bio-bwa.sourceforge.net/

"I see a read stands out the end of a chromosome and is flagged as unmapped (flag 0x4). What is happening here?

"Internally BWA concatenates all reference sequences into one long sequence. A read may be mapped to the junction of two adjacent reference sequences. In this case, BWA will flag the read as unmapped, but you will see position, CIGAR and all the tags. A better solution would be to choose an alternative position or trim the alignment out of the end, but this is quite complicated in programming and is not implemented at the moment."

If the read happens to align to both the end of one chromosome and the beginning of the next, this is sufficiently odd that one should simply consider it unmapped, so I agree here with maubp in #7.
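If you want to confirm that a flagged read is one of these junction cases, one option is to check whether the reported alignment runs past the end of its reference. A rough Python sketch, assuming you take the reference length from the `LN` value of the matching `@SQ` header line; the function names and the numbers below are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(cigar):
    """Number of reference bases covered by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def overhangs_reference(pos, cigar, ref_len):
    """True if a 1-based alignment starting at pos ends beyond ref_len."""
    if cigar == "*":
        return False  # no alignment reported, nothing to check
    return pos + reference_span(cigar) - 1 > ref_len


# Hypothetical 3000 bp transcript and 50 bp reads.
hangs_off = overhangs_reference(2980, "50M", 3000)   # ends at 3029
fits = overhangs_reference(2900, "50M", 3000)        # ends at 2949
```

This only reports the overhang; the read should still be treated as unmapped because the FLAG says so.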

**maubp** · 02-16-2012, 12:29 AM

Yes, that BWA "feature" is a classic example. It would be nice if they fixed it to clear the reference name, mapping position, etc. in this case, as well as setting the flag to say it was unmapped.

How do you deal with reads "unmapped to NM_0012345"?
