  • where are my reads going: millions of fragments that uniquely aligned to the genes

    Hi experts,

    I am new to sequencing and the related use of bioinformatics so to skip the process of generating bam files I do the following:

    I first generate Tophat alignment in illumina basespace.
    The alignment and the summary of the alignment looks reasonable, for example for a rapid run of 90 samples for one of the sample I get the following data in basespace:

    Number of Reads: Read 1 = 4,709,154; Read 2 = 4,709,154
    Total Aligned Reads (% Reads): Read 1 = 81.68%; Read 2 = 85.20%
    Abundant Reads (% Reads): Read 1 = 18.52%; Read 2 = 19.53%
    Unaligned Reads (% Reads): Read 1 = 18.32%; Read 2 = 14.80%

    I then take the BAM files [*.alignment.bam] and use them to align to the human genes .gtf file using the Bioconductor RNA-seq workflow pipeline. I downloaded the human genes .gtf file from iGenomes.

    When I check the millions of fragments that uniquely aligned to the genes using the "round( colSums(assay(se)) / 1e6, 1 )" command in R, I get only 0.5 million reads aligned to genes.

    Given the TopHat data from BaseSpace (shown above), even if I take 81% of 4,709,154 and then subtract the percentages for unaligned reads and abundant reads, I should still get around 2 million reads for the sample.
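The estimate in the paragraph above can be written out explicitly. This is only a sketch of the post's own arithmetic in Python, using the Read 1 figures quoted from BaseSpace. The function and variable names are made up for illustration, and subtracting both the unaligned and the abundant percentages from the aligned percentage is the assumption stated in the post, not a documented BaseSpace formula.

```python
# Read 1 figures quoted in the post from the BaseSpace TopHat summary.
total_reads = 4_709_154
aligned_pct = 81.68
abundant_pct = 18.52
unaligned_pct = 18.32


def expected_reads(total, aligned, abundant, unaligned):
    """Follow the post's reasoning: start from the aligned percentage,
    then subtract the unaligned and abundant percentages."""
    usable_pct = aligned - unaligned - abundant
    return total * usable_pct / 100.0


estimate = expected_reads(total_reads, aligned_pct, abundant_pct, unaligned_pct)
observed = 0.5e6  # value reported by round(colSums(assay(se)) / 1e6, 1)

print(f"expected about {estimate / 1e6:.1f} million, observed {observed / 1e6:.1f} million")
print(f"shortfall factor about {estimate / observed:.1f}x")
```

Under these assumptions the estimate comes out at roughly 2.1 million against the 0.5 million observed, a gap of about fourfold.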

    Where are my reads going when I align them to the .gtf file?

    Thanks in advance for your valuable time.

    Ram

  • #2
    The first thing to check is always whether your FASTA and GTF use the same chromosome notation.
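One way to run that check, sketched in plain Python: collect the sequence names from the `@SQ` lines of the BAM header (the header text can be obtained with `samtools view -H`) and from the first column of the GTF, then see whether they overlap. The function names and the toy header and GTF lines below are invented for illustration; they are not taken from the poster's files.

```python
def bam_header_seqnames(header_text):
    """Collect sequence names from the SN: tags of @SQ header lines."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_seqnames(gtf_text):
    """Collect the first column (seqname) of every non-comment GTF line."""
    names = set()
    for line in gtf_text.splitlines():
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def compare_notation(bam_names, gtf_names):
    """Report which names are shared and which appear on one side only."""
    return {
        "shared": sorted(bam_names & gtf_names),
        "bam_only": sorted(bam_names - gtf_names),
        "gtf_only": sorted(gtf_names - bam_names),
    }


# Toy inputs (hypothetical): a "chr"-prefixed header and an unprefixed GTF.
toy_header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chrM\tLN:100\n"
)
toy_gtf = (
    '1\ttoy\texon\t10\t90\t.\t+\t.\tgene_id "GENE_A";\n'
    'MT\ttoy\texon\t5\t50\t.\t+\t.\tgene_id "GENE_B";\n'
)

result = compare_notation(bam_header_seqnames(toy_header), gtf_seqnames(toy_gtf))
print(result)
```

An empty "shared" list means no annotated feature can overlap any aligned read. A partial overlap is worth a look too, since only the shared chromosomes would contribute counts.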

    • #3
      Thanks,
      But I would guess that if the chromosome notations were different, the command would not run at all. If I am getting some results, shouldn't that mean the notations are the same?
      Besides, I am using the GTF file from Illumina iGenomes, and the BAM files were also generated by Illumina.

      In your answer, you mean the BAM file and the GTF file, right?

      Thanks

      • #4
        The chromosome names in the BAM file and the BAM file header will be the same as those in the reference FASTA file that the reads were aligned to.

        Have you checked if the basespace alignment and the gtf file use the same version of the human genome?
        Last edited by mastal; 04-04-2017, 06:34 AM.
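A rough way to check the genome version on the BAM side is to look at the chromosome 1 length recorded in the header's `@SQ` line, since the builds differ in length. The sketch below assumes the commonly cited chr1 lengths for GRCh37/hg19 and GRCh38/hg38 (worth verifying against the reference you actually use) and hypothetical function names. The GTF does not carry sequence lengths, so its build would still have to be read from the iGenomes download itself.

```python
# Commonly cited chromosome 1 lengths (assumption: verify against your reference).
KNOWN_CHR1_LENGTHS = {
    249_250_621: "GRCh37/hg19",
    248_956_422: "GRCh38/hg38",
}


def sq_lengths(header_text):
    """Map sequence name (SN) to length (LN) for each @SQ header line."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in tags and "LN" in tags:
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def guess_build(header_text):
    """Guess the human genome build from the chr1 length, else 'unknown'."""
    lengths = sq_lengths(header_text)
    for name in ("chr1", "1"):
        if name in lengths:
            return KNOWN_CHR1_LENGTHS.get(lengths[name], "unknown")
    return "unknown"


# Toy header text, as would be printed by `samtools view -H sample.bam`.
toy_header = "@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:249250621\n"
print(guess_build(toy_header))
```

If the build guessed from the BAM header differs from the build of the GTF, gene coordinates will be shifted relative to the alignments and many reads will fall outside the annotated exons.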
