Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • htseq-count only not-unique counts

    Hi all,

    When counting for stranded RNAseq reads on an annotated BAC-sequence, i get no counts for my features although the features show tons of reads aligned to it in IGV.
    I checked if those were all multimapped reads, but there are actually also a lot of uniq alignments (NH:i:1).
    Do you know how to solve this?


    Reads were aligned using tophat2 and counted with following options:

    Code:
    htseq-count -m union -s reverse -i ID -t gene RNAseq2013-1-htseq-sorted.sam ../../BAC-Annotation/pure-bac2.gff
    The output i get from htseq-count:

    748 GFF lines processed.
    246143 sam line pairs processed.
    augustus_masked-bac2_383_131865-processed-gene-0.19 0
    maker-bac2_383_131865-augustus-gene-0.20 0
    maker-bac2_383_131865-augustus-gene-0.21 0
    no_feature 124646
    ambiguous 0
    too_low_aQual 0
    not_aligned 0
    alignment_not_unique 121497

    one of the features:
    maker-bac2_383_131865-augustus-gene-0.20 has location at: bac2:383-131865 65171-70109

    one of the unique aligned reads which should produce a count:
    ----------------------
    Read name = OBIWAN:33276GACXX:8:1106:19961:7833
    Location = bac2:383-131865:65,654
    Alignment start = 65,610 (+)
    Cigar = 51M
    Mapped = yes
    Mapping quality = 50
    ----------------------
    Base = G
    Base phred quality = 37
    ----------------------
    Pair start = bac2:383-131865:65737 (-)
    Pair is mapped = yes
    Insert size = 178
    Pair orientation = F2R1
    ----------------------
    Second in pair
    -------------------
    MD = 51
    XG = 0
    NH = 1
    NM = 0
    XM = 0
    XN = 0
    XO = 0
    AS = 0
    YT = UU
    -------------------
    Alignment start position = bac2:383-131865:65610
    TGAGAACAAGATGGTGAAGCTCATAGCTGATGCTAGCAAAGAGTGGGGGAT

    Thank you, Michel

  • #2
    You might use the "-o" option to try to debug things.

    Comment


    • #3
      I'd be looking at the various -s options. Although you do seem to be aware that they exist and that your run is stranded, is reverse the correct orientation?

      Comment


      • #4
        Hi, thank you for the fast response. i already played with the -s option which doesnt change the output and i am pretty sure "reverse" is the correct setting for stranded truseq illumina rna-seq reads.

        When using the -o option, the sam file is only containing multimapped reads like:

        OBIWAN:33276GACXX:8:2311:4833:66136 161 bac2:383-131865 65536 3 51M = 65923 3264 AATGAGCAACCAGCAGCCACAACCCTGCATGGGGTAGTTCTTCAAGTGCCA @CCFFFFFHHHHHJJIJJJJIJJJJIIJGIJJJJ?FHGHIIIJJJJGIJIG AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:51 YT:Z:UU NH:i:2 CC:Z:= CP:i:65536 HI:i:0 XF:Z:alignment_not_unique


        BUT when i look at the orginial sam file to do the counting with, i find unique aligned reads on the same feature (from position 65171 to 70109 of bac2).
        Unfortunately they dont get reported.

        OBIWAN:33276GACXX:8:2111:9231:51465 163 bac2:383-131865 65541 50 51M = 65637 147 GCAACCAGCAGCCACAACCCTGCATGGGGTAGTTCTTCAAGTGCCAGTGAT CCCFFFFFHHHHHJJJJJJJJJJJJJJJJCGIHIJIJJJJJHIJJJJFHII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:51 YT:Z:UU NH:i:1

        By the way, do you know what the = sign in column 7 means? Should this be the strand information?

        best, Michel

        Comment


        • #5
          It seems very odd that the unique alignments aren't output to the file specified with -o. Maybe try making a miniature SAM file with just a couple uniquely aligned pairs that you know should be counted in transcripts and then rerun htseq-count (optionally with a miniature annotation file). BTW, what version of htseq-count and python are you using?

          Regarding the = sign, that just means that the mate is mapped to the same chromosome/contig. If the mates are mapped to different contigs, then that field will contain the name of the other contig.

          Comment


          • #6
            i am using htseq-count version 0.5.4p3 and Python 2.7.3

            Comment


            • #7
              Originally posted by moser View Post
              i am using htseq-count version 0.5.4p3 and Python 2.7.3
              I'm unable to reproduce this issue with the same versions of htseq-count and python. You might try making a SAM file with just a few reads and an annotation with just a single gene to which the reads should align and rerun htseq-count. If the reads are missing from the file specified by -o, then perhaps just post things here.

              Comment


              • #8
                Sorry, if this seems too simple to be an issue, but did you check that the location of your features match in both the sam and GTF files?

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Essential Discoveries and Tools in Epitranscriptomics
                  by seqadmin


                  The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist on Modified Bases...
                  Yesterday, 07:01 AM
                • seqadmin
                  Current Approaches to Protein Sequencing
                  by seqadmin


                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                  04-04-2024, 04:25 PM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, 04-11-2024, 12:08 PM
                0 responses
                39 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-10-2024, 10:19 PM
                0 responses
                41 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-10-2024, 09:21 AM
                0 responses
                35 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-04-2024, 09:00 AM
                0 responses
                55 views
                0 likes
                Last Post seqadmin  
                Working...
                X