Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • HTseq count - all reads are "alignment_not_unique"

    I have been having problems using HTseq-count.
    I’ll go through my protocol
    1: Take tophat output (accepted_hits.bam), sort them by read and convert to bam file.

    samtools sort -n mst12_t14_2_accepted_hits.bam mst12_t14_2_accepted_hits_read_sorted

    samtools view mst12_t14_2_accepted_hits_read_sorted.bam > mst12_t14_2_accepted_hits_read_sorted.sam


    2: Use HTseq-count to count reads for each gene using GFF file used for tophat alignment

    htseq-count -m union -s no -t exon -i gene_id mst12_t14_2_accepted_hits_read_sorted.sam magnaporthe_oryzae_70-15_8_transcripts.gff > test.txt

    There were no error messages but when I looked at the output, the count for every gene is zero and all the reads are in “alignment_not_unique” category. Reads in this category supposedly have NH > 1 in SAM file, but the reads I can see all have NH=1.
    Below is a sample of the SAM file and the GFF.

    D3P26HQ1:1290TA0ACXX:2:1101:1108:73648 73 supercont8.4 4401713 50 35M * 0 0
    CCTTGATGTGGACGCGCCCNGCGACCTTGCTGGCG ?=@BDDDDCF>D>EGG6FF#1:?DHDB@DH9D>;F AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:19G15 YT:Z:UU NH:i:1
    D3P26HQ1:1290TA0ACXX:2:1101:1109:25458 89 supercont8.4 1740219 50 35M * 0 0
    AGCGAATGACGAGCGNCGCTATCATGATAACGTCA ?091D?)1:?<<333#@2EDE<DC<>C;?2=41?: AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:15A19 YT:Z:UU NH:i:1
    D3P26HQ1:1290TA0ACXX:2:1101:1112:73199 99 supercont8.1 3567120 50 36M = 3567285 210
    TGCACAGAGAGATGGTTGCNTATATATATGCAAAAA ?<;ADDDD+<C;DE@2A@@#33A1CBEEC999CDDD AS:i:-1 XN:i:0 XM:i:1 XO:i:0
    XG:i:0 NM:i:1 MD:Z:19A16 YT:Z:UU NH:i:1
    D3P26HQ1:1290TA0ACXX:2:1101:1112:73199 147 supercont8.1 3567285 50 45M = 3567120 -210 CACAGACAAAGCGGCTCGTCTTGGCGCGGGCACGCATTGGCTCGG @CBHE?C?EGA@A;-(.9DDED0BD@@;<)B2DA<3HD80+BB?@ AS:i:-5 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:15G6T22 YT:Z:UU NH:i:1
    D3P26HQ1:1290TA0ACXX:2:1101:1113:7390 89 supercont8.6 1611568 50 34M * 0 0 AGTACGTCTCGGCTNTCCAGAAGGCCAAGATCAC ?1CC?)1C:+33#3BC:EECA++A?BBB+??? AS:i:-3 XM:i:2 XO:i:0 XG:i:0 MD:Z:2G11T19 NM:i:2 NH:i:1

    supercont8.5 MG8_CALLGENES_FINAL_1 start_codon 4470284 4470286 . - 0 gene_id "MGG_00005"; transcript_id "MGG_00005T0";
    supercont8.5 MG8_CALLGENES_FINAL_1 stop_codon 4469688 4469690 . - 0 gene_id "MGG_00005"; transcript_id "MGG_00005T0";
    supercont8.5 MG8_CALLGENES_FINAL_1 exon 4470173 4470286 . - . gene_id "MGG_00005"; transcript_id "MGG_00005T0";
    supercont8.5 MG8_CALLGENES_FINAL_1 CDS 4470173 4470286 . - 0 gene_id "MGG_00005"; transcript_id "MGG_00005T0";
    supercont8.5 MG8_CALLGENES_FINAL_1 exon 4470034 4470120 . - . gene_id "MGG_00005"; transcript_id "MGG_00005T0";
    supercont8.5 MG8_CALLGENES_FINAL_1 CDS 4470034 4470120 . - 0 gene_id "MGG_00005"; transcript_id "MGG_00005T0";
    supercont8.5 MG8_CALLGENES_FINAL_1 exon 4469273 4469969 . - . gene_id "MGG_00005"; transcript_id "MGG_00005T0";
    supercont8.5 MG8_CALLGENES_FINAL_1 CDS 4469691 4469969 . - 0 gene_id "MGG_00005"; transcript_id "MGG_00005T0";


    Any ideas?

    Cheers, Darren

  • #2
    I'd guess you are using the version of two weeks ago, which contained a bug misreading the "NH" tag. Try the newest version.

    Comment


    • #3
      Thanks for your swift reponse Simon, I have installed the latest version (HTSeq-0.5.3p9) and everything works fine.

      Comment

      Latest Articles

      Collapse

      • seqadmin
        Strategies for Sequencing Challenging Samples
        by seqadmin


        Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
        03-22-2024, 06:39 AM
      • seqadmin
        Techniques and Challenges in Conservation Genomics
        by seqadmin



        The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

        Avian Conservation
        Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
        03-08-2024, 10:41 AM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by seqadmin, 03-27-2024, 06:37 PM
      0 responses
      12 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 03-27-2024, 06:07 PM
      0 responses
      11 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 03-22-2024, 10:03 AM
      0 responses
      53 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 03-21-2024, 07:32 AM
      0 responses
      69 views
      0 likes
      Last Post seqadmin  
      Working...
      X