Seqanswers Leaderboard Ad

**efoss** · 09-05-2011, 04:39 PM

A friend of mine solved this one for me: The 167 flag in column 2 of the offending line tells me that the read is, in fact, not mapped. If I enter "167" here, I can get that information:

http://picard.sourceforge.net/explain-flags.html

Now why it counts as unmapped when it appears to be mapped to chr9_gl000201_random 36145 is still mysterious to me, but in any event, I should be able to fix this problem by deleting the offending line.

Eric

**swbarnes2** · 09-05-2011, 11:17 PM

Originally posted by efoss View Post

A friend of mine solved this one for me: The 167 flag in column 2 of the offending line tells me that the read is, in fact, not mapped. If I enter "167" here, I can get that information:

http://picard.sourceforge.net/explain-flags.html

Now why it counts as unmapped when it appears to be mapped to chr9_gl000201_random 36145 is still mysterious to me, but in any event, I should be able to fix this problem by deleting the offending line.

Eric

There are two classic reasons why unmapped reads can appear as if they are mapped.

The first is that sam file specs call for an unmapped mate of a mapped read to be given the position of the mapped read. This is so that the two reads will sort together, but that doesn't seem to be happening here.

The second happens if you align with bwa. bwa concatenates reference contigs together before aligning, so if a read hangs off the edge of one contig on to another, it'll have a mapping position, but since that's not quite right, it's also marked as unmapped. GATK is confused by this behavior, unless you tell it to ignore it.

Those binary flags are a little weird, as I'd think that the first should have a flag of 91, since its mate is unmapped.

But in general, believe the flags. If the flag has the 4 bit flagged, it doesn't matter what the rest of the entry says, don't count the read as mapped.

Rather than delete the line, you can set the stringency on GATK to be lenient, so it will skip that line.

**efoss** · 09-06-2011, 10:12 AM

Originally posted by swbarnes2 View Post

There are two classic reasons why unmapped reads can appear as if they are mapped.

The first is that sam file specs call for an unmapped mate of a mapped read to be given the position of the mapped read. This is so that the two reads will sort together, but that doesn't seem to be happening here.

The second happens if you align with bwa. bwa concatenates reference contigs together before aligning, so if a read hangs off the edge of one contig on to another, it'll have a mapping position, but since that's not quite right, it's also marked as unmapped. GATK is confused by this behavior, unless you tell it to ignore it.

Those binary flags are a little weird, as I'd think that the first should have a flag of 91, since its mate is unmapped.

But in general, believe the flags. If the flag has the 4 bit flagged, it doesn't matter what the rest of the entry says, don't count the read as mapped.

Rather than delete the line, you can set the stringency on GATK to be lenient, so it will skip that line.

Thanks very much. That makes sense - and this file was, in fact, made using bwa.

I searched around for a lenient option in GATK and didn't see one. Is the syntax like this?:

java -Xmx20G -jar /home/efoss/sequencing/GenomeAnalysisTK-1.1-31-gdc8398e/GenomeAnalysisTK.jar VALIDATION_STRINGENCY=LENIENT ...

Thanks.

Eric

**swbarnes2** · 09-06-2011, 10:37 AM

Originally posted by efoss View Post

Thanks very much. That makes sense - and this file was, in fact, made using bwa.

I searched around for a lenient option in GATK and didn't see one. Is the syntax like this?:

java -Xmx20G -jar /home/efoss/sequencing/GenomeAnalysisTK-1.1-31-gdc8398e/GenomeAnalysisTK.jar VALIDATION_STRINGENCY=LENIENT ...

Thanks.

Eric

Yes, that's the one. It's probably better to keep all the reads in your file, and have GATK ignore the problem ones, than to delete them out.

**efoss** · 09-06-2011, 11:01 AM

Originally posted by swbarnes2 View Post

Yes, that's the one. It's probably better to keep all the reads in your file, and have GATK ignore the problem ones, than to delete them out.

GATK doesn't seem to like my VALIDATION_STRINGENCY=LENIENT argument:

##### ERROR MESSAGE: Invalid argument value 'VALIDATION_STRINGENCY=LENIENT' at position 0.

I took this from a Picard Tools command I found with Google. My complete command (broken up into separate lines for readability) is as follows:

java -Xmx20G -jar /home/efoss/sequencing/GenomeAnalysisTK-1.1-31-gdc8398e/GenomeAnalysisTK.jar

VALIDATION_STRINGENCY=LENIENT

-R /mnt/fred/home/efoss/gene_databases/human_genome/wg.fa

-T UnifiedGenotyper

-I /mnt/fred/home/efoss/sequencing/MDS/patient1/recreate_patient1_sam_file_plus_header_090511_1.bam

-B:dbsnp,VCF /mnt/fred/home/efoss/gene_databases/GATK_resource_files/00-All.vcf -nt 6

-o /mnt/fred/home/efoss/sequencing/MDS/patient1/MDS_patient1_dcov500_UnifiedGenotyper_090611_1.raw.vcf

-dcov 500

Thanks.

Eric

**swbarnes2** · 09-06-2011, 12:31 PM

Maybe that command line is obsolete?

This is from the version 1.1.23 help entry:

-S,--validation_strictness <validation_strictness> How strict should we be with validation (STRICT|LENIENT|
SILENT)

**efoss** · 09-06-2011, 01:51 PM

Originally posted by swbarnes2 View Post

Maybe that command line is obsolete?

This is from the version 1.1.23 help entry:

Perfect - thanks so much for the help.

Eric

**raonyguimaraes** · 09-07-2011, 04:25 AM

The best tip i can give you is to use the data from Broad in your analysis.

ftp://ftp.broadinstitute.org/pub/seq...sembly19.fasta
And
ftp://ftp.ncbi.nlm.nih.gov/snp/organ.../00-All.vcf.gz

**cedance** · 04-26-2012, 11:03 PM

Originally posted by swbarnes2 View Post

The first is that sam file specs call for an unmapped mate of a mapped read to be given the position of the mapped read. This is so that the two reads will sort together, but that doesn't seem to be happening here.

swbarnes2, Thanks for explaining the reason of MAPQ error with regard to bwa. However, I don't follow this one. Could you please explain this part a bit more detailed? Thank you.

**swbarnes2** · 04-27-2012, 12:13 PM

Originally posted by cedance View Post

swbarnes2, Thanks for explaining the reason of MAPQ error with regard to bwa. However, I don't follow this one. Could you please explain this part a bit more detailed? Thank you.

When you map single reads, if a read doesn't map, its chromosome and position entries in the .bam file are left blank, or they have a star, or something like that. The 4 flag is also set.

When you have paired reads, if one end maps, and the other doesn't, the mapped end has the chromosome and position in its bam entry, just like you would think it should. But the unmapped read also has that same chromosome and position filled out in its entry. It's not left blank like you might think it should be. The reason for this is so that if you sort by chromosome and position, the unmapped read will sort right next to its mapped mate.

Only if both ends of a pair fail to map are they given empty chromosome and position values.

Just because an entry in the .bam file has a chromosome and position doesn't mean that it really mapped there. If it has a 4 in the flag, then you can't necesarily believe anything else in the .bam entry (except the read name, and sequence, and quality).

**cedance** · 04-27-2012, 02:38 PM

Originally posted by swbarnes2 View Post

When you map single reads, if a read doesn't map, its chromosome and position entries in the .bam file are left blank, or they have a star, or something like that. The 4 flag is also set.

When you have paired reads, if one end maps, and the other doesn't, the mapped end has the chromosome and position in its bam entry, just like you would think it should. But the unmapped read also has that same chromosome and position filled out in its entry. It's not left blank like you might think it should be. The reason for this is so that if you sort by chromosome and position, the unmapped read will sort right next to its mapped mate.

Only if both ends of a pair fail to map are they given empty chromosome and position values.

Just because an entry in the .bam file has a chromosome and position doesn't mean that it really mapped there. If it has a 4 in the flag, then you can't necesarily believe anything else in the .bam entry (except the read name, and sequence, and quality).

swbarnes2, great! thank you.
Just 1 more question. How about the NH:i:1 and XT:A:U optional flags in case of tophat and bwa? Of course its easy to just check for the unmapped read by doing a "bitwise and" in perl as and($flag, 4) == 4. However, I was just wondering because, in bwa, I checked for a recent library and found out that 5 reads with unmapped read flag set had the optional flag XT:A:U as well... Isn't this a bug then? It concludes all the same though that one can't rely on any other info/optional flag to decide if the read is unmapped or not..

**Sam64** · 10-18-2012, 01:19 AM

Originally posted by swbarnes2 View Post

Yes, that's the one. It's probably better to keep all the reads in your file, and have GATK ignore the problem ones, than to delete them out.

Are you sure when you say it's probably better to keep all the reads....
Would anybody have an opinion about the problem ? Is it necessary to eliminate the reads with errors (MAPQ should be 0 for unmapped read) ? or can you use the option "VALIDATION_STRINGENCY=LENIENT" to avoid the problem and continue the analysis without removing these reads ?

Thank you,

Sam64

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 39 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 41 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 35 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 55 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

GATK "MAPQ should be 0 for unmapped read" complaint for mapped read

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News