**amitm** · 08-17-2014, 03:54 PM

hi TimK,
Most probably your SAM file is not name sorted. HTSeq requires the input SAM to be sorted by read name. Generally, if a SAM/BAM is sorted, it's coordinate-sorted.

Check the header of SAM to see what sorting is present.
To name sort, use samtools sort command and use the -n option instead of the -o option.

HTSeq expects the two reads of a pair to be on consecutive lines in the SAM file. For PE data, esp. in RNA-seq, coordinate sorting breaks this expectation.

Edit -
-o in samtools sort is for output. Sorry about the confusion. Coordinate sorting is the default. For name sorting use -n.
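
For anyone who wants to script the header check mentioned above, here is a minimal Python sketch. It only reads the text header of a SAM file (for a BAM, the output of `samtools view -H` could be fed in instead). The function name `sam_sort_order` and the example records are made up for illustration. The `SO:` field only records what the writing tool declared, so treat it as a hint and not as proof of the actual order.

```python
import io


def sam_sort_order(lines):
    """Return the SO: value from the @HD header line, or None if absent."""
    for line in lines:
        if not line.startswith("@"):
            break  # header is over; alignment records follow
        if line.startswith("@HD"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


# Invented example header plus one record, for illustration only.
example = io.StringIO(
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "read1\t99\tchr1\t10\t60\t4M\t=\t50\t44\tACGT\tIIII\n"
)
print(sam_sort_order(example))  # coordinate
```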

**TimK** · 08-17-2014, 04:20 PM

Hi amitm,

thanks for the suggestion, but the BAM files I used first are sorted (I tried both name and position).
I then used only SAM files (for human readability) and checked them manually. They seem to be sorted by name (at least the few % I checked were), so I am quite sure the sorting is not the problem.

**amitm** · 08-17-2014, 04:29 PM

hi TimK,
That missing mate warning is characteristic of HTSeq being supplied with a SAM that is not name sorted.
Name sorting is not the default, and it's different from the (coordinate) sorted BAM which (say) TopHat returns.
Manual checking is not reasonable as there are millions of reads. After using samtools sort with the -n parameter, do you still get the same error?

**TimK** · 08-17-2014, 07:10 PM

Just to exclude possible errors I re-did the following:
1. I sorted a *.bam using samtools sort -n
2. I sorted a *.sam using sort -s -k 1,1

The result for both is the same:
Warning: 18631016 reads with missing mate encountered.
29112998 SAM alignment pairs processed.

Additionally, regarding my own scripts:
1. >95% of the read names appear twice in the *.sam file
2. all reads have their mate in the next line

And I agree that it sounds very much like the SAM/BAM files are not correctly sorted, but I can't find the mistake. Is there a naming convention, or are there characters I use in the read names that shouldn't be used?
Also, I realized that there are several versions of the SAM format. I'm using samtools v0.1.19-44428cd. Could it be related to that?

TimK
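
The manual checks described above can be run over the whole file instead of a sample. A minimal Python sketch, assuming a plain-text SAM file in which both mates carry exactly the same read name; `adjacent_group_sizes` is a hypothetical helper, not part of HTSeq or samtools. It groups consecutive records by read name and tallies the group sizes, so a name-sorted paired-end file should consist mostly of groups of size 2.

```python
from collections import Counter
from itertools import groupby


def adjacent_group_sizes(lines):
    """Tally how many consecutive SAM records share the same read name."""
    names = (
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("@")  # skip blanks and header
    )
    return Counter(sum(1 for _ in group) for _, group in groupby(names))


# Invented records: r1 is properly adjacent, r2 is split by r3.
records = [
    "@HD\tVN:1.0\tSO:queryname\n",
    "r1\t99\tchr1\t10\n",
    "r1\t147\tchr1\t50\n",
    "r2\t99\tchr1\t20\n",
    "r3\t73\tchr1\t30\n",
    "r2\t147\tchr1\t60\n",
]
print(sorted(adjacent_group_sizes(records).items()))  # [(1, 3), (2, 1)]
```

On a real file the same function can be given an open file handle. A large count of size-1 groups would mean many records have no same-named neighbour, which is the situation the missing-mate warning points at.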

**amitm** · 08-18-2014, 12:18 AM

hi TimK,
The samtools version is fine and it shouldn't create such an error. Nonetheless, if you are still getting the error even after name sorting, then you should probably try these -
1) Use Picard to calculate alignment statistics for your data. Is the proportion of properly paired alignments good?

2) Try another count generation tool like featureCounts

This tool doesn't require name sorting. A typical coordinate-sorted BAM file can be taken directly as input. And it is faster too.

See if that helps.

**TimK** · 08-18-2014, 01:23 AM

Hi Amitm,

thanks a lot for the help. I used smalt for the mapping to the reference but will have a look at Picard and definitely at featureCounts.

Hope it will work with one of those.
Cheers,
Tim

**ronaldrcutler** · 06-23-2016, 01:08 PM

Follow-up

Hello, I ran into a similar problem using BAM files and then sorted using:

Code:

samtools sort -n -O BAM -o <output> <input>

While this indeed fixed the error message about 'missing mates' and this other warning:

Code:

Warning: Read ACB052:253:C76YKACXX:8:2311:13183:88600 claims to have an aligned mate which could not be found in an adjacent line.

I went ahead and compared the read counts of the sorted and unsorted files after running both through htseq-count, and found that the sorted file had significantly fewer counts, even though the __no_feature, __ambiguous, and __alignment_not_unique counters were also lower in the sorted file than in the unsorted one.

Any thoughts on this?
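
A comparison like the one described above can be made more concrete by totalling each count file. The following is a minimal Python sketch, assuming the usual two-column, tab-separated htseq-count output with the special counters prefixed by `__`; the helper names `read_counts` and `summarize` and all numbers are invented for illustration.

```python
def read_counts(lines):
    """Parse 'feature<TAB>count' lines into a dict."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        feature, value = line.split("\t")
        counts[feature] = int(value)
    return counts


def summarize(counts):
    """Return (counts assigned to features, special '__' counters, grand total)."""
    special = {k: v for k, v in counts.items() if k.startswith("__")}
    assigned = sum(v for k, v in counts.items() if not k.startswith("__"))
    return assigned, special, assigned + sum(special.values())


# Toy numbers, invented for illustration only (not from any real run).
unsorted_run = read_counts(["geneA\t100", "geneB\t50", "__no_feature\t30"])
sorted_run = read_counts(["geneA\t60", "geneB\t30", "__no_feature\t20"])

for label, run in (("unsorted", unsorted_run), ("sorted", sorted_run)):
    assigned, special, total = summarize(run)
    print(label, "assigned:", assigned, "total units:", total)
```

If the grand totals differ between the two runs, the runs did not process the same number of units. That would suggest (as an assumption, not something confirmed in this thread) that reads were counted individually in one run and as pairs in the other.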

HTSeq question: high number of missing mates/unmatched reads
