  • No counts from HTSeq

    So I've just used the pre-built GRCh38 bowtie2 index to map paired-end reads with TopHat. Approximately 70% of the reads (~11,000,000) mapped.
    Code:
    tophat -o pilot_S10.5 ../bowtieIndex/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bowtie_index pilot_S10_L005_R1_001.fastq pilot_S10_L005_R2_001.fastq
    These were then sorted by name (samtools sort -n). I then downloaded the Homo_sapiens.GRCh38.88.gtf file from Ensembl and ran the sorted reads through HTSeq:

    Code:
    htseq-count -f bam -t gene --stranded=yes accepted.sorted.bam /projects/comm00007/rnaSeqData/Homo_sapiens.GRCh38.88.gtf  > sample.count
    This resulted in zero reads being counted against any gene. About a third of the reads are reported as no feature and the rest as alignment not unique. I also get nothing if I count exons rather than genes. When I load a sample of the reads into SeqMonk, though, I see numerous reads mapping to genes. Any thoughts as to why?
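To see at a glance how htseq-count distributed the reads, the output table can be split into reads actually assigned to features and the special counters htseq-count appends at the end (names starting with "__", such as __no_feature and __alignment_not_unique). This is a minimal sketch, assuming a single-sample, header-less two-column table like the sample.count file above; the function name summarise_htseq_counts is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def summarise_htseq_counts(lines: Iterable[str]) -> Tuple[int, Dict[str, int]]:
    """Split an htseq-count table into assigned reads and special counters.

    Assumes a single-sample table without a header: one "name<TAB>count"
    line per feature, followed by the "__"-prefixed counters that
    htseq-count appends (e.g. __no_feature, __alignment_not_unique).
    """
    assigned = 0
    special: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, count = fields[0], int(fields[1])
        if name.startswith("__"):
            special[name] = count  # htseq-count's own bookkeeping counters
        else:
            assigned += count  # reads credited to a gene (or exon) ID
    return assigned, special
```

Usage would be something like `with open("sample.count") as fh: assigned, special = summarise_htseq_counts(fh)`. An assigned total of zero next to large __no_feature and __alignment_not_unique values is exactly the pattern described in this post.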

    The only thing I can think of is that the bowtie index isn't built from the same assembly as the GTF file. I'm in the process of downloading the corresponding FASTA file from Ensembl (Homo_sapiens.GRCh38.dna.toplevel.fa.gz, I think) and aligning everything again. I'm not particularly sanguine, though, as I'm assuming the GTF that SeqMonk uses also comes from Ensembl, whilst my alignment uses the pre-built bowtie2 index.
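One quick way to test the mismatch hypothesis before realigning is to compare the reference sequence names in the BAM header (the SN: field of the @SQ lines, as printed by `samtools view -H accepted.sorted.bam`) with column 1 of the GTF. Counters match reads to features only when these names are spelled identically, so "chr1" and "1" never match. This is a sketch under that assumption; the function names are hypothetical, and the example names assume, as we believe is the case, that the NCBI analysis set uses UCSC-style names (chr1, chrM) while the Ensembl GTF uses 1 and MT.

```python
from typing import Iterable, Set, Tuple


def bam_header_names(header_lines: Iterable[str]) -> Set[str]:
    """Collect the SN: values from @SQ lines of a SAM/BAM header,
    e.g. the text printed by `samtools view -H accepted.sorted.bam`."""
    names: Set[str] = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])  # strip the "SN:" tag
    return names


def gtf_names(gtf_lines: Iterable[str]) -> Set[str]:
    """Collect column 1 (the sequence name) from non-comment GTF lines."""
    names: Set[str] = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # Ensembl GTFs start with "#!" header lines
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(bam: Set[str], gtf: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (shared, only in the BAM header, only in the GTF)."""
    return bam & gtf, bam - gtf, gtf - bam
```

If the shared set is empty, every read will land in no feature regardless of how good the alignment is.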

    Cheers
    Ben.

  • #2
    If you are just starting out, then do not use TopHat. It is no longer the state of the art for RNA-seq data analysis.

    You could use any other splice-aware aligner, or, if you want to stay in the "family", HISAT2/StringTie are the currently recommended tools from the same folks who developed TopHat.

    • #3
      If your reference, indexes and annotations do not match exactly (in terms of chromosome/sequence names), then you are not going to get the counting to work. For counting, also consider using featureCounts: it is much faster, can produce a count matrix from multiple BAM files in one run, and can take unsorted BAMs.
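If rebuilding the index is not an option, the other way to make names match is to rewrite the annotation. The sketch below renames Ensembl-style GTF sequence names ("1", "X", "MT") to UCSC style ("chr1", "chrX", "chrM"). It assumes the reference uses the chr prefix and handles only the primary chromosomes; unplaced or unlocalised scaffolds (e.g. KI270706.1) are left unchanged because their UCSC-style names cannot be derived by a simple prefix. The helpers to_ucsc_style and rename_gtf are hypothetical, not part of any tool mentioned here; using a matched sequence/annotation/index bundle avoids the problem altogether.

```python
from typing import Iterable, Iterator

# Assumption: only these primary-assembly names are renamed.
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y"}


def to_ucsc_style(name: str) -> str:
    """Map an Ensembl-style chromosome name to UCSC style, if known."""
    if name in PRIMARY:
        return "chr" + name
    if name == "MT":
        return "chrM"
    return name  # scaffolds and anything else are left as-is


def rename_gtf(lines: Iterable[str]) -> Iterator[str]:
    """Yield GTF lines with column 1 rewritten; comments pass through."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        seqname, sep, rest = line.partition("\t")
        yield to_ucsc_style(seqname) + sep + rest
```

For example, writing `rename_gtf(open("Homo_sapiens.GRCh38.88.gtf"))` out to a new file would give an annotation whose primary chromosomes line up with a chr-prefixed index.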

      • #4
        I'm not just starting out. I used the exact same pipeline to process a time course about a year ago, which is one of the reasons I was thinking that the GTF/genome files, as you say, might not be matching. In an ideal world, there'd be an associated GTF file alongside the pre-built bowtie indexes.

        Thanks for pointing me towards HISAT2; I will investigate and align with it. Though just because it's no longer cutting edge doesn't mean that TopHat is now useless. The alignment should at least be reasonable. Moving to HISAT2 will leave me with the same conundrum of not being sure that the pre-built indexes match the GTF file I get from Ensembl, unless I build one myself, that is.

        So now it just gets strange. I had found featureCounts and ran it yesterday. It's giving me a very small proportion of reads assigned to genes and a large proportion being multi-mapped. That leaves me with two possibilities: either a) there's something unexpected going on in my data, or b) given that it's using the same GTF with which htseq-count produced zero counts, there's something odd in the GTF/genome file combination. I'm guessing a), but am curious as to why htseq-count wasn't/isn't working.
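featureCounts writes a companion .summary file next to its count table (e.g. counts.txt.summary for `-o counts.txt`), with a "Status" header row followed by rows such as Assigned, Unassigned_MultiMapping and Unassigned_NoFeatures. Turning those rows into fractions makes it easy to put a number on "very small proportion" and "large proportion multi-mapped" and to compare runs. This is a minimal sketch, assuming only the first sample column matters; summary_fractions is a hypothetical name.

```python
from typing import Dict, Iterable


def summary_fractions(lines: Iterable[str]) -> Dict[str, float]:
    """Turn a featureCounts .summary table into per-status fractions.

    Assumes the first row is the "Status<TAB>file" header and only the
    first sample column is of interest.
    """
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    counts = {row[0]: int(row[1]) for row in rows[1:]}  # skip the header row
    total = sum(counts.values())
    # Guard against an all-zero summary to avoid dividing by zero.
    return {status: (n / total if total else 0.0) for status, n in counts.items()}
```

A high Unassigned_MultiMapping fraction alongside a low Assigned fraction is the situation described in this post.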

        • #5
          Originally posted by tirohia
          I'm not just starting out. I used the exact same pipeline to process a time course about a year ago, which is one of the reasons I was thinking that the GTF/genome files, as you say, might not be matching. In an ideal world, there'd be an associated GTF file alongside the pre-built bowtie indexes.

          You can get those from the Illumina iGenomes site. The bundle contains matching sequence, annotations, indexes, the whole bit.

          Thanks for pointing me towards HISAT2; I will investigate and align with it. Though just because it's no longer cutting edge doesn't mean that TopHat is now useless.
          Fair point. The authors of TopHat now have this note on their site:
          ---------------------------------------
          Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2 which provides the same core functionality (i.e. spliced alignment of RNA-Seq reads), in a more accurate and much more efficient way.
          ----------------------------------------
