  • No counts from HTSEq

    So I've just used the pre-built GRCh38 bowtie2 index to map paired-end reads using TopHat. Approximately 70% of the reads mapped (~11000000).
    Code:
    tophat -o pilot_S10.5 ../bowtieIndex/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bowtie_index pilot_S10_L005_R1_001.fastq pilot_S10_L005_R2_001.fastq
    These were then sorted by name (samtools sort -n). I then downloaded the Homo_sapiens.GRCh38.88.gtf file from Ensembl and ran the sorted reads through HTSeq:

    Code:
    htseq-count -f bam -t gene --stranded=yes accepted.sorted.bam /projects/comm00007/rnaSeqData/Homo_sapiens.GRCh38.88.gtf  > sample.count
    This has resulted in zero reads being counted. About a third of the reads register as no feature and the rest register as alignment not unique. I also get nothing if I ask to count exon rather than gene. When I load a sample of the reads into Seqmonk, though, I get numerous reads mapping to genes. Any thoughts as to why?
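For reference, htseq-count appends its special counters (names beginning with `__`, such as `__no_feature` and `__alignment_not_unique`) after the per-feature rows, so the split described above can be read straight from the count file. A minimal sketch, assuming a two-column, tab-separated count file on stdin; the function name `summarize_htseq` and the `assigned_to_features` label are made up for illustration:

```shell
# Hypothetical helper: summarise an htseq-count output file read from stdin.
# Rows whose first column starts with "__" are htseq-count's special counters
# and are passed through unchanged; all other rows are per-feature counts and
# are summed into a single "assigned_to_features" line printed at the end.
summarize_htseq() {
  awk -F'\t' '
    $1 ~ /^__/ { print; next }       # special counter: print as-is
    { assigned += $2 }               # per-feature row: add to running total
    END { printf "assigned_to_features\t%d\n", assigned + 0 }
  '
}

# Example use (file name taken from the command above):
#   summarize_htseq < sample.count
```

If `assigned_to_features` is 0 while `__no_feature` is large, the reads that were considered did not overlap any feature of the requested type on a sequence name that htseq-count could match.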

    The only thing that I can think of is that the bowtie index file isn't the same assembly as the gtf file. I'm in the process of downloading the corresponding fa file (Homo_sapiens.GRCh38.dna.toplevel.fa.gz, I think) from Ensembl and trying to align everything again. I'm not particularly sanguine, though, as I'm assuming the gtf that Seqmonk uses is coming from Ensembl as well, whilst my alignment is using the bowtie2 pre-built index.

    Cheers
    Ben.

  • #2
    If you are just starting out then do not use TopHat. It is no longer the state of the art for RNA-seq data analysis.

    You could use any other splice-aware aligner or, if you want to stay in the "family", HISAT2/StringTie is the currently recommended software from the same folks who developed TopHat.


    • #3
      If your reference, indexes and annotations do not match exactly (in terms of gene names) then you are not going to get the counting to work. For counting, also consider using featureCounts. It is much faster, can produce a count matrix from multiple BAM files, and can take unsorted BAMs.
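One mismatch that is quick to rule out is sequence naming: counting tools match the sequence names in the BAM header against column 1 of the GTF, and a `chr1` versus `1` style difference leaves nothing to count. The sketch below compares the two name lists. It assumes `samtools view -H` is used to dump the header, and the helper names (`sam_seq_names`, `gtf_seq_names`, `shared_seq_count`) are hypothetical, not part of any tool:

```shell
# Sequence names declared in a SAM header read from stdin
# (the SN: field of each @SQ line), sorted and de-duplicated.
sam_seq_names() {
  awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
  }' | sort -u
}

# Sequence names used in column 1 of a GTF file, skipping "#" comment lines.
gtf_seq_names() {
  awk -F'\t' '!/^#/ && NF > 1 { print $1 }' "$1" | sort -u
}

# Number of names present in both sorted lists; 0 means no sequence is shared.
shared_seq_count() {
  comm -12 "$1" "$2" | wc -l | tr -d ' '
}

# Example use (file names taken from the commands in this thread):
#   samtools view -H accepted.sorted.bam | sam_seq_names > bam.names
#   gtf_seq_names Homo_sapiens.GRCh38.88.gtf > gtf.names
#   shared_seq_count bam.names gtf.names
```

A result of 0, or a number much smaller than the number of chromosomes, points at a naming mismatch between the index and the annotation rather than at the reads themselves.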


      • #4
        I'm not just starting out. I used the exact same pipeline to process a time course about a year-ish ago, which is one of the reasons I was thinking that the gtf/genome files, as you say, might not be matching. In an ideal world, there'd be an associated gtf file alongside the pre-generated bowtie indexes.

        Thanks for pointing me towards HISAT2, will investigate/align. Though just because it's no longer cutting edge doesn't mean that TopHat is now useless. The alignment should, at least, be reasonable. Moving to HISAT2 will leave me with the same conundrum of not being sure that the pre-built indexes are the same as the gtf file I get from Ensembl. Unless I go and build one myself, that is.

        So now it just gets strange. I had found featureCounts and ran it yesterday. It's giving me a very small proportion of reads mapping to genes and a large proportion being multi-mapped. Which leaves me with two possibilities: either a) there's something unexpected going on in my data, or b) given that it's using the same gtf that htseq-count used to produce zero counts, there's something odd going on in the gtf/genome file combo. I'm guessing a), but am curious as to why htseq-count wasn't/isn't working.
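To see how much of the multi-mapping comes from the alignments themselves, one option is to tally the `NH` tag, which htseq-count uses to decide that an alignment is not unique (`NH` greater than 1). A minimal sketch, assuming SAM text on stdin (for example from `samtools view`) and that the aligner writes `NH:i:` tags; the function name `nh_histogram` is made up for illustration:

```shell
# Hypothetical helper: histogram of NH tag values over SAM records on stdin.
# Header lines (starting with "@") are skipped; records without an NH tag are
# reported under "missing". Output is "<NH value><TAB><record count>", sorted.
nh_histogram() {
  awk -F'\t' '
    /^@/ { next }
    {
      nh = "missing"
      # Optional tags start at column 12 of a SAM record.
      for (i = 12; i <= NF; i++) if ($i ~ /^NH:i:/) nh = substr($i, 6)
      count[nh]++
    }
    END { for (k in count) printf "%s\t%d\n", k, count[k] }
  ' | sort
}

# Example use on a subset of the alignments:
#   samtools view accepted.sorted.bam | head -n 100000 | nh_histogram
```

If most records carry `NH` values above 1, the multi-mapping is real at the alignment level and is worth investigating in the data (for example, a high share of repetitive or rRNA-derived reads) before blaming the annotation.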


        • #5
          Originally posted by tirohia
          I'm not just starting out. I used the exact same pipeline to process a time course about a year-ish ago, which is one of the reasons I was thinking that the gtf/genome files, as you say, might not be matching. In an ideal world, there'd be an associated gtf file alongside the pre-generated bowtie indexes.

          You can get those from the Illumina iGenomes site. The bundle contains matching sequence, annotations and indexes, the whole bit.

          Thanks for pointing me towards HISAT2, will investigate/align. Though just because it's no longer cutting edge doesn't mean that TopHat is now useless.
          Fair point. The authors of TopHat have this note on their site now:
          ---------------------------------------
          Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2 which provides the same core functionality (i.e. spliced alignment of RNA-Seq reads), in a more accurate and much more efficient way.
          ----------------------------------------
