SEQanswers

Old 04-06-2011, 11:13 PM   #1
burt
Junior Member
 
Location: Singapore

Join Date: Jan 2011
Posts: 8
Error in bowtie's sam output?

Hi,

I'm not sure whether it's a bug in bowtie's SAM output, but all my mapping qualities take the value 255, while the bit flag takes one of the values 0, 4, or 16. I ran bowtie in colorspace.

Has anyone else encountered a similar issue?

Here's a sample of the SAM output with the quirky values.

1279_33_430_F3 4 * 0 0 * * 0 0 TGGTGACCTGGGCCCTGAGGANCGCGTTGATCTACCTCCGCTCAATTGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII XM:i:0
1279_33_409_F3 16 chr6 49186545 255 50M * 0 0 TATTCGTACTGAAAATCAAGATCAAGCGAGCTTTTGCCCTTCTGCTCCAC Iqqqqqqqqqqqqqqqqqqqqqqqqqq!Iq!!qqqqqqqqqqqqqqqqqI XA:i:2 MD:Z:50 NM:i:0 CM:i:2
1279_34_783_F3 0 chr16 11144009 255 50M * 0 0 TATGTGCTTGGCTGAGGAGCCAATGGGGCGAAGCTACCATCTGTGGGATT Iqqqqqqqqqqqqqqqqqqqq!Iqqqqqqqqqqqqqqqqqqqqqqq!!qI XA:i:2 MD:Z:50 NM:i:0 CM:i:2
1279_40_121_F3 0 chr17 39983712 255 50M * 0 0 ACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCCTGAGAAACGGCTACC Iqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq!!qqq!!qqqqI XA:i:2 MD:Z:50 NM:i:0 CM:i:2
1279_41_567_F3 16 chr6 49186546 255 50M * 0 0 ATTCGTACTGAAAATCAAGATCAAGCGAGCTTTTGCCCTTCTGCTCCACG IqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqI XA:i:0 MD:Z:50 NM:i:0 CM:i:0


-burt
Old 04-11-2011, 01:43 AM   #2
me_myself_andI
Member
 
Location: Singapore

Join Date: Nov 2010
Posts: 30

A related post, also unanswered so far: http://seqanswers.com/forums/showthread.php?t=10624
Old 04-11-2011, 03:43 AM   #3
Joker!sAce
Member
 
Location: Denmark

Join Date: Feb 2011
Posts: 21

The MAPQ field takes pairing into account if the read is paired. If that calculation is not feasible, 255 is written, indicating that the mapping quality is not available. I assume your query sequences came with quality scores (FASTQ files). Maybe your reads were not paired? Most likely this is not an error and you need not worry about it, but do confirm this.
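
As a rough illustration of the two columns under discussion, here is a minimal Python sketch that decodes the FLAG and MAPQ fields of one SAM record. The function name `describe_record` and the shortened example records are made up for illustration. The bit values 0x4 and 0x10 and the meaning of 255 are taken from the SAM specification.

```python
# Bit values from the SAM specification (only the two relevant to this thread).
FLAG_UNMAPPED = 0x4    # the read has no reported alignment
FLAG_REVERSE = 0x10    # the read aligned to the reverse strand
MAPQ_UNAVAILABLE = 255  # the spec's marker for "mapping quality not available"


def describe_record(sam_line):
    """Return (alignment status, MAPQ description) for one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: bitwise FLAG
    mapq = int(fields[4])  # column 5: MAPQ
    if flag & FLAG_UNMAPPED:
        status = "unmapped"
    elif flag & FLAG_REVERSE:
        status = "mapped, reverse strand"
    else:
        status = "mapped, forward strand"
    mapq_text = "not available" if mapq == MAPQ_UNAVAILABLE else str(mapq)
    return status, mapq_text


# Hypothetical, shortened records (real SAM lines are tab-separated).
examples = [
    "read1\t4\t*\t0\t0\t*\t*\t0\t0\tTGGT\tIIII",
    "read2\t16\tchr6\t49186545\t255\t4M\t*\t0\t0\tTATT\tIqqq",
    "read3\t0\tchr16\t11144009\t255\t4M\t*\t0\t0\tTATG\tIqqq",
]
for line in examples:
    print(line.split("\t")[0], describe_record(line))
```

Read this way, flags 0, 4, and 16 are the only values one would expect for single-end reads that are either unmapped or mapped to one of the two strands.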

Last edited by Joker!sAce; 04-11-2011 at 03:45 AM.
Old 04-11-2011, 05:43 PM   #4
burt
Junior Member
 
Location: Singapore

Join Date: Jan 2011
Posts: 8

Hi, thanks for the reply. My input data is a set of single-end reads. I'm pretty sure it's a problem with the SAM output in colorspace.

Just something to add to the problem description. If I look only at reads that are mappable, all of them are reported as 50M, i.e. perfect matches. This is very unusual, as it's highly unlikely for all reads to map perfectly to the reference genome, especially when we are investigating the transcriptome landscape of the cell.
Old 04-11-2011, 07:53 PM   #5
feixue1039
Member
 
Location: China

Join Date: Mar 2011
Posts: 18

Hi

I encountered the same issue as burt. My input data is single-end reads sequenced on the Illumina HiSeq 2000 platform. I wonder whether reads with a MAPQ value of 255 are filtered out before FPKM values are calculated in downstream analyses (for instance, cufflinks/cuffdiff)?

Any reply will be appreciated.

feixue1039
Old 04-11-2011, 10:17 PM   #6
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Quote:
Originally Posted by burt View Post
If I look only at reads that are mappable, all of them are reported as 50M, i.e. perfect matches.
That's not what the CIGAR string means. 50M only means that there are no indels and no hard or soft clipping at the ends. Other fields later in the SAM line indicate what discrepancies exist between the read and the reference.
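
To make the point about the later fields concrete, here is a small Python sketch that pulls the optional tags out of a SAM line. The helper name `parse_tags` is hypothetical, and the example record is modelled on the sample in the first post with the sequence and qualities replaced by `*`. In the SAM optional-field definitions, NM is the edit distance to the reference, MD describes mismatching positions, and CM is the edit distance between the color sequence and the color reference.

```python
def parse_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of a SAM line as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for item in fields[11:]:  # optional fields start at column 12
        name, type_code, value = item.split(":", 2)
        # Integer-typed tags ("i") are converted; everything else stays text.
        tags[name] = int(value) if type_code == "i" else value
    return tags


# Record modelled on the thread's sample output (SEQ and QUAL omitted as "*").
record = (
    "1279_33_409_F3\t16\tchr6\t49186545\t255\t50M\t*\t0\t0\t*\t*"
    "\tXA:i:2\tMD:Z:50\tNM:i:0\tCM:i:2"
)
tags = parse_tags(record)
print("NM =", tags["NM"], "MD =", tags["MD"], "CM =", tags.get("CM"))
```

For this record the decoded nucleotide alignment has no mismatches (NM is 0 and MD is a single run of 50), while CM reports two mismatches at the color level, so a 50M CIGAR alone does not tell the whole story.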
Old 03-20-2012, 07:04 AM   #7
loodramon
Junior Member
 
Location: Dublin Ireland

Join Date: Feb 2012
Posts: 2

Hi,

I am seeing the same thing with my single-end (non-paired) RNA-seq alignments.

Each read either has a mapping quality of 255 and a bit flag of 16, or is not mapped and has a bit flag of 4.

Is this common to all non-paired RNA-seq data? I'm new to RNA-seq.

Thanks in advance
Old 03-20-2012, 10:55 AM   #8
loodramon
Junior Member
 
Location: Dublin Ireland

Join Date: Feb 2012
Posts: 2

Hi again,

I should mention that I used the latest version of bowtie for this alignment.

I've been told by a colleague that: Bowtie doesn't calculate mapping quality values, so it prints 255 to the MAPQ field of the sam file if the read aligns or 0 otherwise.
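
One way to check this claim against your own data is to tally the (FLAG, MAPQ) combinations that actually occur. The sketch below is only an illustration: the function name `tally_flag_mapq` and the example lines are invented, and it assumes a plain-text, tab-separated SAM stream.

```python
from collections import Counter


def tally_flag_mapq(sam_lines):
    """Count how often each (FLAG, MAPQ) pair occurs in an iterable of SAM lines."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[(int(fields[1]), int(fields[4]))] += 1
    return counts


# Hypothetical, shortened input; a real run would iterate over an open file.
lines = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t4\t*\t0\t0\t*\t*\t0\t0\tTGGT\tIIII",
    "read2\t16\tchr6\t49186545\t255\t4M\t*\t0\t0\tTATT\tIqqq",
    "read3\t0\tchr16\t11144009\t255\t4M\t*\t0\t0\tTATG\tIqqq",
    "read4\t16\tchr6\t49186546\t255\t4M\t*\t0\t0\tATTC\tIqqq",
]
for (flag, mapq), n in sorted(tally_flag_mapq(lines).items()):
    print(f"FLAG={flag} MAPQ={mapq}: {n} read(s)")
```

If the colleague's description holds, every mapped read should show MAPQ 255 and every unmapped read (FLAG 4) should show MAPQ 0.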
Old 11-24-2012, 11:20 PM   #9
edge
Senior Member
 
Location: China

Join Date: Sep 2009
Posts: 199

Hi,

Do you know what 100M means in bowtie's SAM output?
I ran bowtie on single-end reads.
My input reads are 100 bases long.

Thanks.
Old 11-24-2012, 11:20 PM   #10
edge
Senior Member
 
Location: China

Join Date: Sep 2009
Posts: 199

Hi.

Thanks for the answer about the 255 in bowtie's SAM output.

Hi,

Do you know what 100M means in bowtie's SAM output?
I ran bowtie on single-end reads.
My input reads are 100 bases long.

Thanks.
Old 11-25-2012, 07:26 AM   #11
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Quote:
Originally Posted by edge View Post
Hi,

Do you know what 100M means in bowtie's SAM output?
I ran bowtie on single-end reads.
My input reads are 100 bases long.

Thanks.
It's generally better to start a new thread than to resurrect a really old one. Nevertheless, "100M" is the CIGAR string, which is defined in the SAM specification. It means 100 alignment matches, where each aligned base may be a match or a mismatch to the reference (practically, this just means that there were no indels). I recommend familiarising yourself with the SAM format if you're going to do much with sequencing data.
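
For anyone who wants to see how a CIGAR string breaks down, here is a minimal Python sketch. The names `parse_cigar` and `query_length` are invented for illustration; the operation letters (M, I, D, N, S, H, P, =, X) are those listed in the SAM specification, and "45M2I53M" is a made-up example.

```python
import re

# One CIGAR element is a length followed by an operation letter.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) pairs."""
    if cigar == "*":  # "*" means the CIGAR is unavailable (e.g. unmapped read)
        return []
    ops = [(int(length), op) for length, op in CIGAR_ELEMENT.findall(cigar)]
    # Reject strings containing anything the pattern did not account for.
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def query_length(ops):
    """Number of read bases covered: M, I, S, = and X consume the read."""
    return sum(length for length, op in ops if op in "MIS=X")


print(parse_cigar("100M"))
print(parse_cigar("45M2I53M"), query_length(parse_cigar("45M2I53M")))
```

A single `100M` element therefore says that all 100 read bases are aligned without insertions, deletions, or clipping. Whether those bases match the reference has to be read from tags such as NM and MD.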
Old 11-25-2012, 03:53 PM   #12
edge
Senior Member
 
Location: China

Join Date: Sep 2009
Posts: 199

Thanks, dpryan.

I will have a look at the document that you shared.
Really appreciate it.