  • where are my reads going: millions of fragments that uniquely aligned to the genes

    Hi experts,

    I am new to sequencing and the related bioinformatics, so to skip generating BAM files myself I do the following:

    I first generate a TopHat alignment in Illumina BaseSpace.
    The alignment and its summary look reasonable. For example, for one sample from a rapid run of 90 samples, BaseSpace reports:

    Number of Reads: Read1: 4,709,154; Read2: 4,709,154
    Total Aligned Reads (% Reads): Read1: 81.68%; Read2: 85.20%
    Abundant Reads (% Reads): Read1: 18.52%; Read2: 19.53%
    Unaligned Reads (% Reads): Read1: 18.32%; Read2: 14.80%

    I then take the BAM files [*.alignment.bam] and align them to the human genes .gtf file using the Bioconductor RNA-seq workflow pipeline. I downloaded the human genes .gtf file from iGenomes.

    When I check the millions of fragments that uniquely aligned to the genes, using the " round( colSums(assay(se)) / 1e6, 1 ) " command in R, I get only 0.5 million reads aligned to genes.

    Given the TopHat data from BaseSpace (shown above), even if I take 81% of 4,709,154 and then subtract the percentages for unaligned and abundant reads, I should still get around 2 million reads for the sample.

    Where are my reads going when I align them to the .gtf file?

    Thanks in advance for your valuable time.

    Ram
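
The estimate in the post above can be reproduced with a few lines of arithmetic. This is only a sketch of the poster's reasoning, using the Read1 percentages quoted in the post. The function name `expected_fragments` is made up for illustration. The alternative figure assumes that "abundant" reads are a subset of the aligned reads, which the thread does not confirm.

```python
def expected_fragments(total_pairs, aligned_pct, unaligned_pct, abundant_pct):
    """Poster's estimate: aligned % minus unaligned % minus abundant %."""
    usable_pct = aligned_pct - unaligned_pct - abundant_pct
    return total_pairs * usable_pct / 100.0


# Read1 figures quoted in the post.
total_pairs = 4_709_154
poster_estimate = expected_fragments(total_pairs, 81.68, 18.32, 18.52)

# Alternative reading (an assumption): abundant reads are part of the
# aligned reads, so only they are subtracted from the aligned fraction.
alternative_estimate = total_pairs * (81.68 - 18.52) / 100.0

# Same scaling as round(colSums(assay(se)) / 1e6, 1) in the post.
print(round(poster_estimate / 1e6, 1))       # 2.1
print(round(alternative_estimate / 1e6, 1))  # 3.0
```

Under either reading the expected count is several times higher than the 0.5 million observed, so the gap is not explained by the unaligned and abundant fractions alone.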

  • #2
    The first thing to check is always whether your FASTA and GTF use the same chromosome notation (for example, "chr1" versus "1").
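
One way to run this check without extra libraries is to compare the sequence names in the GTF with the @SQ lines of the BAM header (the header text can be dumped with `samtools view -H file.bam`). The sketch below is a minimal illustration: the function names and the toy inputs are hypothetical, not taken from the poster's files.

```python
def gtf_seqnames(gtf_text):
    """Collect the first column (sequence name) of every GTF record."""
    names = set()
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqnames(header_text):
    """Collect the SN: value of every @SQ line in a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def compare_notation(gtf_text, header_text):
    """Report which sequence names are shared and which appear on one side only."""
    gtf = gtf_seqnames(gtf_text)
    bam = sam_header_seqnames(header_text)
    return {
        "shared": sorted(gtf & bam),
        "gtf_only": sorted(gtf - bam),
        "bam_only": sorted(bam - gtf),
    }


# Toy inputs (made up): a UCSC-style GTF against an Ensembl-style header.
toy_gtf = 'chr1\tunknown\texon\t11874\t12227\t.\t+\t.\tgene_id "DDX11L1";\n'
toy_header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:1\tLN:249250621\n"

result = compare_notation(toy_gtf, toy_header)
print(result)
```

An empty "shared" list would mean no read can overlap any annotated gene. A partly shared list (for example, only the main chromosomes) would be consistent with a reduced but non-zero count.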



    • #3
      Thanks,
      But I would guess that if the chromosome notations were different, the command would not run at all. If I am getting some results, shouldn't that mean the notations are the same?
      Besides, I am using the GTF file from Illumina iGenomes, and the BAM files were also generated by Illumina.

      In your answer, you mean the BAM file and the GTF file, right?

      Thanks



      • #4
        The chromosome names in the BAM file and its header will be the same as those in the reference FASTA file that the reads were aligned to.

        Have you checked whether the BaseSpace alignment and the GTF file use the same version of the human genome?
        Last edited by mastal; 04-04-2017, 06:34 AM.
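
A rough way to see which genome build the alignment used is to read the chr1 length from the BAM header (again via `samtools view -H file.bam`) and compare it with the published chr1 lengths of the common human assemblies. The sketch below is a heuristic only: the function names are hypothetical, the lookup table covers just two builds, and the build of the GTF still has to be read from the iGenomes download itself.

```python
# chr1 lengths of two common human assemblies.
KNOWN_CHR1_LENGTHS = {
    249250621: "GRCh37/hg19",
    248956422: "GRCh38/hg38",
}


def chr1_length(header_text):
    """Return the LN: value of the chr1 (or 1) @SQ line, or None if absent."""
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        fields = dict(
            f.split(":", 1) for f in line.split("\t")[1:] if ":" in f
        )
        if fields.get("SN") in ("chr1", "1") and "LN" in fields:
            return int(fields["LN"])
    return None


def guess_build(header_text):
    """Map the chr1 length to a build name; 'unknown' if not in the table."""
    return KNOWN_CHR1_LENGTHS.get(chr1_length(header_text), "unknown")


# Toy header (made up for illustration).
toy_header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:249250621\n"
print(guess_build(toy_header))  # GRCh37/hg19
```

If the build guessed from the header differs from the build of the GTF, gene coordinates will be shifted relative to the alignments, and many reads will fall outside the annotated exons.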
