  • hg19 genome and unassigned sequences

    Hi!

    I started working with RNA-seq data on an in-house Galaxy installation, but it did not have tophat installed, which I wanted to try. So I installed tophat2 on my own computer, together with an hg19 Bowtie2 index that I found on the Illumina FTP server (iGenomes).

    One significant difference that I noted was that the Illumina hg19 genome did not contain the chrUn or hap sequences, which were present in the Galaxy installation. However, the associated genes.gtf did contain annotation on these "chromosomes".
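    One way to confirm a mismatch like this is to compare the sequence names in the FASTA headers with the first column of the GTF. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the sequence name is the first whitespace-delimited token of each FASTA header.

    ```python
    from collections import Counter
    from typing import Dict, Iterable, Set


    def fasta_names(lines: Iterable[str]) -> Set[str]:
        """Collect sequence names (first token after '>') from FASTA header lines."""
        names = set()
        for line in lines:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    names.add(tokens[0])
        return names


    def gtf_seqname_counts(lines: Iterable[str]) -> Counter:
        """Count GTF records per sequence name (column 1), skipping comments and blanks."""
        counts = Counter()
        for line in lines:
            if not line.strip() or line.startswith("#"):
                continue
            counts[line.split("\t", 1)[0]] += 1
        return counts


    def missing_from_genome(gtf_lines: Iterable[str],
                            fasta_lines: Iterable[str]) -> Dict[str, int]:
        """Return GTF sequence names absent from the FASTA, with their record counts."""
        genome = fasta_names(fasta_lines)
        counts = gtf_seqname_counts(gtf_lines)
        return {name: n for name, n in counts.items() if name not in genome}
    ```

    Both functions accept any iterable of lines, so open file handles for the genome FASTA and genes.gtf can be passed in directly.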

    Here comes my question: does it matter to tophat2/bowtie2 whether I have the chrUn etc. sequences present or not?

    I would expect my fastq data to align less completely if parts of the genome are missing, and indeed ~75% of my reads align to the genome, while a related STAR run on the Galaxy-installed genome covers almost 90% of the reads. I could imagine that the difference is junk that I am not interested in anyway, but I am not sure.

    What I would like to do afterwards is an expression analysis with cufflinks/cuffdiff.

    Happy to hear your feedback.

    abisko00

  • #2
    Short update to this question:

    The Tophat2 manual requires that the GTF file use the same chromosome identifiers as the bowtie index:
    "So before using a known annotation file with this option please make sure that the 1st column in the annotation file uses the exact same chromosome/contig names (case sensitive) as shown by the bowtie-inspect command above."
    This is clearly not the case, since as I said, the hg19_genes.gtf supplied by iGenomes contains the chrUn and hap annotations and the hg19 fasta does not provide the associated sequences. I just started another tophat run with a "cleaned" hg19_genes.gtf. I'll let you know if it matters, but I nevertheless appreciate your input. I really don't want to repeat tophat on all my seq-data.
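    For anyone wanting to produce such a "cleaned" GTF, here is a rough sketch of one way to do it; it is not necessarily how the file above was made. It keeps only the records whose first column matches a name in the index. The helper names are hypothetical, and it assumes the index names have been dumped to a text file, one per line, for example with bowtie2-inspect --names.

    ```python
    from typing import Iterable, Iterator, Set


    def index_names(lines: Iterable[str]) -> Set[str]:
        """Read sequence names, one per line; keep only the first token of each line."""
        return {line.split()[0] for line in lines if line.strip()}


    def filter_gtf(gtf_lines: Iterable[str], allowed: Set[str]) -> Iterator[str]:
        """Yield comment/blank lines unchanged and records whose column 1 is in `allowed`.

        The comparison is exact and case sensitive, as the manual requires.
        """
        for line in gtf_lines:
            if line.startswith("#") or not line.strip():
                yield line
                continue
            if line.split("\t", 1)[0] in allowed:
                yield line
    ```

    In use, the index names would be read from the dumped text file and the filtered lines written to a new GTF; the original genes.gtf is left untouched.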
