
  • Tophat can't find read-pairs??

    Hi, I have a question that may have a very quick answer. I am using paired-end (PE) Illumina reads for TopHat runs, and I always get output like this:


    bowtie.left_kept_reads.log
    52107199 reads; of these:
    52107199 (100.00%) were unpaired; of these:
    21494360 (41.25%) aligned 0 times
    27619125 (53.00%) aligned exactly 1 time
    2993714 (5.75%) aligned >1 times
    58.75% overall alignment rate

    bowtie.right_kept_reads.log
    52025239 reads; of these:
    52025239 (100.00%) were unpaired; of these:
    23520504 (45.21%) aligned 0 times
    25623432 (49.25%) aligned exactly 1 time
    2881303 (5.54%) aligned >1 times
    54.79% overall alignment rate



    ...and I am using Tophat with the following command:

    tophat --no-convert-bam -p 20 genome_ref sample1_1.fastq sample1_2.fastq


    My question: why does TopHat report that 100% of the left and right reads were unpaired? My two read files have exactly the same number of lines and were filtered before TopHat to keep only the matching PE reads. Here is an example of my reads:


    ==> sample1_1.fastq <==
    @HHHABC:23:CEX1CEE:4:1101:2733:2083/1 1:N:0:CGATGT
    GCACATCCAATAACAAATTGTCTTTTATAAATGGTTACTTATTTGAGCAGAATTGAGCAAGACAGCCATGCAAAGTGTTACGTTGAAATACTGTCAATATG
    +
    @@@DDDDFHGHGGGGIGIJGHIIJJJICICICHHHFHIIIGJJJIGEGGHBFFHHEHIJJJJJJIEIGHFHIHIIHHIEEHFHFEDEDACEAEEEDDCDCD
    @HHHABC:23:CEX1CEE:4:1101:2837:2057/1 1:N:0:CGATGT
    CTTGTTTAGTTCGGGACTCCGCGGCTCTGGAACGGAACTACAAGAGCCGCAGGTCCGGTTTGAAAAGCTGCAACAGCTGGGTTT
    +
    :BDFDDFBFFFAEGIADHHGG8?@FHGFDH<FBFHGAEEC?E>EAB7@>BB@B55:=B?BDDA:>>C8>@9>AA?AACDC0<??


    ==> sample1_2.fastq <==
    @HHHABC:23:CEX1CEE:4:1101:2733:2083/2 2:N:0:CGATGT
    CTTACATTTTCGCATGGCTTAATATATAGCAGGTATTAAACATTACCTATATTTATAAGTCCATCATTAATGAACAACATATGGGTTATTGTCATATTGAC
    +
    @CFFFFFHDFHHIJJJIJIIJIEIEEGHGGIGCFGHEHGHIGHHIJJGIIGGIIEIJDEIHIJIJJJJIJIIEIFGIDHHGAEFDFDBCCDEEEDEEECC
    @HHHABC:23:CEX1CEE:4:1101:2837:2057/2 2:N:0:CGATGT
    GAATGAATTAACCTCGAAATATGCTGCTGGATGCAAAGAAGTCGAATGTATTTATGATCTAGATTTATTATTGCGGTGAAGCAGCTGACATGTTTCTGTCC
    +
    @<<DDADAFAFHHIJJJJCB>FHGIIIFGJBGHAHGCGECG@FDEEHH*9BFHIGIHE=CFG><FHDGCEE>@AAEEHDCFBCEDCEACCCCDEEDDDDC>



    Has anyone else experienced the same problem? Thanks for the input.
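
The claim that both files contain only matching PE reads in the same order can be checked directly. The following is a minimal Python sketch, not part of TopHat; the function names `read_ids` and `first_mismatch` are made up for illustration, and it assumes plain four-line FASTQ records with `/1` and `/2` suffixes as in the example above. The sequences in the demo are shortened placeholders.

```python
import io
from itertools import zip_longest


def read_ids(handle):
    """Yield the read ID of each 4-line FASTQ record, without '@' and '/1' or '/2'."""
    for i, line in enumerate(handle):
        # Use the line position, not a leading '@', to find headers:
        # quality strings may also start with '@'.
        if i % 4 != 0:
            continue
        fields = line.split()
        if not fields:
            continue  # tolerate a trailing blank line
        name = fields[0]  # drops the "1:N:0:CGATGT" comment part
        if name.startswith("@"):
            name = name[1:]
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]
        yield name


def first_mismatch(left, right):
    """Return (record_index, left_id, right_id) for the first mismatch, or None."""
    pairs = zip_longest(read_ids(left), read_ids(right))
    for n, (a, b) in enumerate(pairs):
        if a != b:  # also triggers when one file ends early (ID is None)
            return n, a, b
    return None


# Demo with the first header of each file from the post (placeholder sequences).
left = io.StringIO("@HHHABC:23:CEX1CEE:4:1101:2733:2083/1 1:N:0:CGATGT\nACGT\n+\nIIII\n")
right = io.StringIO("@HHHABC:23:CEX1CEE:4:1101:2733:2083/2 2:N:0:CGATGT\nTGCA\n+\nIIII\n")
print(first_mismatch(left, right))  # None
```

For real data, the two `io.StringIO` objects would be replaced by `open("sample1_1.fastq")` and `open("sample1_2.fastq")`. A result of `None` means every record has the same ID at the same position in both files.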

  • #2
    I am guessing here, but I suspect that TopHat runs bowtie twice: once with only the left reads and once with only the right reads. Bowtie therefore reports every read as unpaired, because it never receives pairs to begin with. Once bowtie has finished with the two separate input files, TopHat picks and chooses reads from the left and right outputs to build a sensible alignment based on the known introns/exons.



    • #3
      I think the bit flag in the output BAM file will tell you whether a read is paired-end.
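
To make the bit-flag suggestion concrete: the FLAG field is the second column of each SAM/BAM alignment record, and its bits are defined by the SAM specification. The sketch below decodes the lower eight bits in Python; the `decode_flag` helper and the label strings are illustrative names, not part of any tool.

```python
# Lower eight FLAG bits as defined in the SAM specification.
# The label strings are informal names chosen for this sketch.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


print(decode_flag(99))   # ['paired', 'proper_pair', 'mate_reverse', 'first_in_pair']
print(decode_flag(147))  # ['paired', 'proper_pair', 'reverse', 'second_in_pair']
print(decode_flag(0))    # []
```

If the records in the final alignment file have the `paired` bit (0x1) set, the reads were treated as paired-end in the final output, regardless of what the intermediate bowtie logs say.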
