SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
htseq-count for sam and gff3 sofia17 RNA Sequencing 45 11-04-2016 04:32 PM
HTSeq sort sam DZhang Bioinformatics 41 11-03-2014 12:36 AM
SAM (from bowtie2) to BAM problem using samtools amango Bioinformatics 10 04-15-2013 01:52 AM
Bowtie2 - segmentation fault due to --sam-RG memento Genomic Resequencing 3 02-02-2012 03:37 PM
Bowtie2 ( not running with option --sam-rg CHRYSES Bioinformatics 2 01-24-2012 03:28 AM

Reply
 
Thread Tools
Old 09-05-2012, 12:38 PM   #1
all_your_base
Member
 
Location: USA

Join Date: Mar 2012
Posts: 40
Default HTSeq not working with Bowtie2 .SAM

Hi all,

I am having a weird problem with my Bowtie2 .SAM output for use with HTseq to count reads that correspond to genes in a .gff file.

Usually, I can just feed my Bowtie1 .SAM into HTseq using the following command:

htseq-count -m union -s no -t gene -i ID -o myOutput.sam myInput.sam organism.gff


However, after switching to Bowtie2 and running the same command, I get gigabytes of this:


Warning: Read HWI-ST1234:350WK3ACXX:6:1101:1780:2126/1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
Warning: Read HWI-ST1234:350WK3ACXX:6:1101:1780:2126/2 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
Warning: Read HWI-ST1234:350WK3ACXX:6:1101:1671:2238/1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
Warning: Read HWI-ST1234:350WK3ACXX:6:1101:1671:2238/2 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
Warning: Read HWI-ST1234:350WK3ACXX:6:1101:2011:2134/1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
Warning: Read HWI-ST1234:350WK3ACXX:6:1101:2011:2134/2 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)


According to other forums, this usually happens when the SAM isn't sorted by read ID, so that htseq can't find the two halves of a paired-end read. However, I tried sorting my SAM in multiple ways, such as:

sort -k1 myfile.sam > myfile_sorted.sam


I still get the same error! Any help or suggestions are greatly appreciated
all_your_base is offline   Reply With Quote
Old 09-05-2012, 01:13 PM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

If any of those reads are multimapped, then using the command line sort command will not do what you want. Use samtools sort -n
dpryan is offline   Reply With Quote
Old 09-05-2012, 01:34 PM   #3
all_your_base
Member
 
Location: USA

Join Date: Mar 2012
Posts: 40
Default

@dpryan,

Thanks for the reply, but can you please explain your answer? How does the samtools sort command differ than unix sort?

Also, since Bowtie2 produces a SAM file by default, to use SAMtools sort, do I have to first convert to BAM, then sort, then convert back to SAM?

Thanks...
all_your_base is offline   Reply With Quote
Old 09-05-2012, 02:22 PM   #4
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

The problem is the /1 and /2 in your read names. The SAM specification indicates that the names of paired reads be identical. SAM identifies read 1 or read 2 by the FLAG bits. Remove the /1 & /2 from the names in your SAM files and repeat your analysis.
kmcarr is offline   Reply With Quote
Old 09-06-2012, 06:18 AM   #5
all_your_base
Member
 
Location: USA

Join Date: Mar 2012
Posts: 40
Default

@kmcarr

Wonderful, I trimmed the /1 and /2 off my reads and made sure the mates were next to each other after sorting, and HTSEQ runs fine without the previous error messages.

Quick question...
After processing a few thousands reads, HTSEQ reports the following error:

Warning: Malformed SAM line: MRNM != '*' although flag bit &0x0008 set
Warning: Malformed SAM line: RNAME != '*' although flag bit &0x0004 set

This is from raw Bowtie2 output; the only modifications were my /1 and /2 trimming and sorting.

Anyone have an idea where these errors are coming from??

Thanks!
all_your_base is offline   Reply With Quote
Old 09-06-2012, 06:57 AM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

RNAME and MRNM are the name of the chromosome (or scaffold or whatever) to which the current read and its mate (for MRNM) map. Since the flags indicate that the reads are unmapped, it's just complaining that there's stuff here instead of an *, meaning "Not available". I don't recall ever seeing that with bowtie2, only bwa. You can normally ignore such warnings.
dpryan is offline   Reply With Quote
Old 09-06-2012, 07:03 AM   #7
all_your_base
Member
 
Location: USA

Join Date: Mar 2012
Posts: 40
Default

@dpryan,

Thanks for all your help. My analysis is working well now
all_your_base is offline   Reply With Quote
Reply

Tags
bowtie2, counts, htseq

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 01:09 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO