  • slowsmile
    Member
    • May 2011
    • 22

    Why does HTSeq warn about mates that could not be found?

    Dear all
    I am using the htseq-count tool to summarize gene counts from BAM files generated by tophat (v 2.03) based on bowtie2. I've used this pipeline (based on bowtie1) several times with human RNA-Seq data and it has given good results.

    In the most recent project, we are working with the E. coli K12 genome and 100 bp paired-end reads.

    I ran htseq-count on the accepted_hits.bam files generated by tophat, but it gave me nothing but warnings of the form "xxx claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)". I then sorted the BAM files with samtools before this step, but still had no luck: thousands of the same warnings came out and I got no reads in the output gene_counts.txt file.

    I checked the SAM file (first 10 lines, converted from the sorted BAM file) and they looked like this:

    HWI-ST984:1021021ACXX:2:1210:8261:88919 99 chr 1 255 4M14I82M = 57 156 AGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACC @@BFFFDFHHHHHJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJIJIIJJJJJJJHFFFBBCEEEEEEDDDDDDDDDDDDDDDDDDDDC AS:i:-57 XN:i:0 XM:i:2 XO:i:1 XG:i:14 NM:i:16 MD:Z:2C0T82 YT:Z:UU NH:i:1
    HWI-ST984:1021021ACXX:2:1308:13660:65155 99 chr 2 255 6M9I85M = 117 215 TATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGT CCCFFFFFHGHHHJJJJJJJJJJJJJJJJJJJJJJJJJIIIIJIJJHIGIFJJJIJGHIJHHHH?CEFEFFEECD>@BCDDDCDDDDDD@CDDDDDBDD9 AS:i:-42 XN:i:0 XM:i:2 XO:i:1 XG:i:9 NM:i:11 MD:Z:0G0C89 YT:Z:UU NH:i:1
    HWI-ST984:1021021ACXX:2:2108:14990:23666 99 chr 10 255 100M = 167 257 TTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTT CCCFFFFFHHHHHJJIIIIJJJJIIIJIJIJIIJHIJIJJJJIJJIJEHIJIJJJJJIHHHHHFFCDFFEEECEEDDDDDDDDBDDACCCDDDDDDCDDD AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:1
    HWI-ST984:1021021ACXX:2:1214:16246:55224 89 chr 10 255 100M * 0 0 TTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTT A@C>>CCC9?A<BC>::EECACC=>DDDDECCBB@@EFGGHFC===<FC?893F@B9B>EBDBDB9C9EFB3F?1JIEIGGIIGHEGHDHDFFFFFFCCC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:1
    HWI-ST984:1021021ACXX:2:1108:7813:47825 99 chr 22 255 100M = 113 191 CGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGT CCCFFFFFHHGHHJJJJHJHIJIJJJJJJJJJJJJHJIIIGIJJIJJJJJJJJJJJJJJJHIJJHHHHHFDDDCC>CCEEDDDDEDDDFDDDDDDDDDCC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:1
    HWI-ST984:1021021ACXX:2:1105:8881:46986 163 chr 23 255 100M = 137 214 GGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTC CCCFFFFFHHHHHJEIHGHGIGHGHIJIJJJIIIIIHIIJIIJIJJJJJJIIJHJJIJI@GIJJJIHHHBDFD>AEEEEDDDDEDDDEDDCCDDDDDDCD AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:1
    I then checked the alignment stats with samtools flagstat and found that 82.25% of reads are properly paired.

    So what is wrong with my BAM file? The majority of reads in it are definitely in proper pairs. Why can't they be sorted so that mates are placed on adjacent lines for htseq-count to read?
    I used the samtools sort command to do the sorting. Any better ideas?

    I'm pretty new to this field, so pardon me if similar questions have been asked before.
  • Simon Anders
    Senior Member
    • Feb 2010
    • 995

    #2
    Have you sorted by position or by name? You have to use "samtools sort -n" to sort by read name, so that the lines describing mates appear next to each other.
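
To illustrate the difference between the two sort orders, here is a minimal Python sketch, not part of htseq-count or samtools. It assumes tab-separated SAM lines with the read name in the first column; the helper names `read_names` and `is_grouped_by_name` and the toy records are made up for this example. It only checks whether all lines with the same read name sit in one contiguous run, which is what by-name sorting is meant to achieve.

```python
from itertools import groupby


def read_names(sam_lines):
    """Yield the read name (first column) of each alignment line, skipping '@' headers."""
    for line in sam_lines:
        if line.strip() and not line.startswith("@"):
            yield line.split("\t", 1)[0]


def is_grouped_by_name(sam_lines):
    """Return True if every read name occupies a single contiguous run of lines."""
    seen = set()
    for name, _ in groupby(read_names(sam_lines)):
        if name in seen:
            # The same name started a second run, so its lines are not adjacent.
            return False
        seen.add(name)
    return True


# Toy records (hypothetical), ordered by position: the mates of r1 and r2 interleave.
position_sorted = [
    "r1\t99\tchr\t1",
    "r2\t99\tchr\t2",
    "r1\t147\tchr\t57",
    "r2\t147\tchr\t117",
]

# Plain sorting on the name column stands in for by-name sorting here.
name_sorted = sorted(position_sorted, key=lambda line: line.split("\t", 1)[0])

print(is_grouped_by_name(position_sorted))
print(is_grouped_by_name(name_sorted))
```

Note that `sorted` on the name column is only a stand-in: the exact ordering that "samtools sort -n" produces may differ, but in both cases lines sharing a read name end up together.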

    • slowsmile
      Member
      • May 2011
      • 22

      #3
      Thanks Simon.
      I forgot to add -n in the samtools sorting step, so the mates were not next to each other in the SAM file.
      I re-ran the program today, and this time htseq-count works fine with by-name sorting.

      • xy6699
        Member
        • Oct 2011
        • 12

        #4
        Hi,

        I get the same warning: "Warning: Read xxxx claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)"

        My sam file looks like this:

        HWI-EAS261_0019_FC:1:1:1144:8868#0 99 11 128587218 255 76M = 128587361 219 GAAAAGCACACGCATGATGGTTTTGCTATCGTGTGACATTTATTTCATACTTGCTACCTGTAAGAAATTCCTTGAA IIIIIIHHIIIIEIIIIGIIIIIIIIIIIIIIIGIII<IIGIIIIHGFIIHIIIIIIIFIHIIIHH?IIGHIHIIG NM:i:0 NH:i:1
        HWI-EAS261_0019_FC:1:1:1144:8868#0 147 11 128587361 255 76M = 128587218 -219 AGGGATGCTGTTTCTAAGGCATGTAGGTGCTGAGGGTCTACCCCAAAGGGTAGTTTGGGACTGCAGGGCAGGCAGG DCIFIIIIIIIDBIBIIG@FHGIIIII@HIIIIIIHIIHGIIIIHIIIGIDIIIIHIIIIIIIIIIIHIIIIGIII NM:i:1 NH:i:1
        HWI-EAS261_0019_FC:1:1:1145:1981#0 99 22 21959147 255 76M = 21959234 163 GAGAAGTTCAGATGAGTTTGGCCAAGTTCCCTGGGTGGTGAGAGGCCTGGCCTGCCTCATGTAGTAACAGAACTGC HHHHHHHHHHHFGHHGGGGEGG<GGDEGGGDGGGGGGDDGGEGGGGDGGEDFBGGGGGGBGGGGA<BAEF@GFGEE NM:i:0 NH:i:1
        HWI-EAS261_0019_FC:1:1:1145:1981#0 147 22 21959234 255 76M = 21959147 -163 CCTTCCTCTTTTTGGAAGAAAAAAGAGGCAGGATCTCACTGTCTTGTCCAGGCTGGAAGGCAGTGGCGTGATCATG =F<AE@7IF@EHHGHGIIIIIGGG@G8GGD<GEDGGGBGE>IHGIGIEIIIIGGGIDIFIIGIIGIHHHIIIIIHI NM:i:0 NH:i:1
        HWI-EAS261_0019_FC:1:1:1145:8828#0 99 10 6054667 255 76M = 6054796 5361 TGCCACTGCCCCGTGTCCTGTGATGTGACTTCAGAGCTTCCAAAACGCAGGCAAGCACAACGGATGTCTCCTGGGC DFHHEHHHHHHHGHHHHHHBGEBB:GGGGGGDBGB4DGGGHHHHHHHHGHBHFBHG:G@42FF,,DBBDB+>DGGA NM:i:0 NH:i:1
        HWI-EAS261_0019_FC:1:1:1145:8828#0 147 10 6054796 255 64M5156N12M = 6054667 -5361 CCCTGCTTCTTACCAAGAAATTCTTGTTCTTTTGGTTTTCTAGATTGTTCTTCTACTCTTCCTCTGTCTCCGCTGC CBE3EGDDHBGI>DDIG@BIEBEBHHDD>EG@DDDAGGDFBIBDDBDDDED>DDDGAGGDGG@GGEDGDHIHIDII NM:i:1 XS:A:- NH:i:1

        I sorted the BAM file from tophat using: samtools sort -n
        and then converted the BAM to SAM using: samtools view .bam >.sam

        I can see in my SAM file that lines with the same name are next to each other. Why does htseq-count still give me this warning?

        Many thanks

        • slowsmile
          Member
          • May 2011
          • 22

          #5
          To: xy6699
          Your sam file looks properly sorted (at least from the section you posted here). The warnings may come from other unpaired reads. Did you check your alignment stats? What is the percentage of aligned reads that are properly paired?

          • Simon Anders
            Senior Member
            • Feb 2010
            • 995

            #6
            The warning is not about improperly paired mates but about missing mates. Take the read ID from one of the warnings, grep for it in the SAM file, and check whether it really appears an even number of times, in adjacent lines.
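
The check described above can also be done without grep. The following is a rough Python sketch under the same assumptions as a normal SAM file (tab-separated columns, read name first, header lines starting with "@"); the function names `occurrences` and `check_read` and the toy lines are invented for illustration.

```python
def occurrences(sam_lines, read_id):
    """Return 0-based indices (among alignment lines) where the read name equals read_id."""
    hits = []
    index = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        if line.split("\t", 1)[0] == read_id:
            hits.append(index)
        index += 1
    return hits


def check_read(sam_lines, read_id):
    """Report how often read_id appears, whether that count is even, and whether the lines are adjacent."""
    hits = occurrences(sam_lines, read_id)
    count = len(hits)
    return {
        "count": count,
        "even": count > 0 and count % 2 == 0,
        # Adjacent means the hits form one unbroken block of consecutive lines.
        "adjacent": count > 0 and hits[-1] - hits[0] == count - 1,
    }


# Toy input (hypothetical): "a" is fine, "b" is split up, "c" has lost its mate.
lines = [
    "@HD\tVN:1.0",
    "a\t99",
    "a\t147",
    "b\t163",
    "c\t99",
    "b\t83",
]

for read_id in ("a", "b", "c"):
    print(read_id, check_read(lines, read_id))
```

For a real file you would pass an open file handle instead of the toy list, and the read ID copied from one of the warnings.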

            • xy6699
              Member
              • Oct 2011
              • 12

              #7
              Hi,

              Thanks a lot for the reply.

              I looked carefully at the reads named in the warnings and found that they have very low mapping quality, and that the adjacent lines for each read actually have the same sequence, so they are not really "mate" pairs.

              Take one warning for example:

              Warning: Read HWI-EAS261_0019_FC:1:1:2912:15323#0 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)

              and check the read "HWI-EAS261_0019_FC:1:1:2912:15323#0" in my sam file:

              HWI-EAS261_0019_FC:1:1:2912:15323#0 163 12 57869932 3 18M197N58M = 57870226 505 CCGGCTACCCGCTGGTCCCCAGCCTGCGGAGGGCGCTGTCGGCGGTGGCTCTCGGTAGAACACCAGGCTGTTACCC IIIIIIIHIIIIIIIFHIIIIEGIG<GGGBHIIDEEIIDGADGD+)@C??AAA8ABBDBDEB@EEBC8>C<>@8@? NM:i:1 XS:A:- NH:i:2 CC:Z:= CP:i:57869932 HI:i:0
              HWI-EAS261_0019_FC:1:1:2912:15323#0 419 12 57869932 3 18M197N58M = 57870226 699 CCGGCTACCCGCTGGTCCCCAGCCTGCGGAGGGCGCTGTCGGCGGTGGCTCTCGGTAGAACACCAGGCTGTTACCC IIIIIIIHIIIIIIIFHIIIIEGIG<GGGBHIIDEEIIDGADGD+)@C??AAA8ABBDBDEB@EEBC8>C<>@8@? NM:i:1 XS:A:- NH:i:2 HI:i:1

              I think I can just discard these reads...

              Many thanks,
              Xin

              • Simon Anders
                Senior Member
                • Feb 2010
                • 995

                #8
                Originally posted by xy6699:
                ... and actually the adjacent mate reads have the same sequence, so they are not really "mate" pairs.
                Exactly. You may now wonder where in your pipeline the mates got lost (the other mate, with its own sequence, must be somewhere). Maybe you filtered them out in some previous step.
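
The FLAG column of the two lines posted in #7 (163 and 419) points the same way. Below is a small Python sketch that decodes a SAM flag into its bits; the bit values follow the SAM format specification, while the short labels and the function name `decode_flag` are just names chosen for this example.

```python
# Bit values as defined for the SAM FLAG field; the labels are informal.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [label for bit, label in FLAG_BITS.items() if flag & bit]


# The two flags from the lines quoted in post #7.
for flag in (163, 419):
    print(flag, decode_flag(flag))
```

Both flags have the "second_in_pair" bit set and neither has "first_in_pair", and 419 additionally carries the "secondary" bit. So the two lines appear to be two alignments of the same second mate (consistent with NH:i:2), with no line for the first mate among them.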

                • Madza Farias Virgens
                  Junior Member
                  • Oct 2016
                  • 2

                  #9
                  The program continues to run even after spitting out these warnings.
                  Does anyone know if it skips the troubled reads? Thanks.
