  • Must paired-end reads be in the same order in the two files for TopHat?

    Hello,

    Will TopHat work if R1.fastq and R2.fastq have their reads in a different order?

    What if R1.fastq has some reads whose R2 mate did not make it past QC, and vice versa (R2.fastq has some reads whose R1 mate did not make it past QC)?

    I could only find the information below, which describes how TopHat uses paired-end information after mapping each mate independently. I can't find anything on how it matches mates to each other across the two files.

    TopHat maps left and right reads separately using Bowtie, that is, it doesn't use Bowtie's pair searching like --fr, --rf, --ff. Using the mapped reads, TopHat finds pairs if the two reads of a pair are on different strand (it ignores if they are on the same strand) and the inner distance is within user specified range.
    From: http://seqanswers.com/forums/showpos...49&postcount=2

    Thanks for your input.
    Last edited by friducha; 02-12-2015, 05:22 PM.

  • #2
    Reads always need to be in the same order. It is best to use tools like BBDuk that keep pairs together for quality control; doing QC on the two files independently will only cause problems. I did write a tool for fixing that situation, though, repair.sh.
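
A quick way to see whether two files are still in sync is to walk them record by record and compare read names. The following is a minimal sketch, not how TopHat or repair.sh does it. It assumes well-formed 4-line FASTQ records with no blank lines, and mate names that differ only by a trailing `/1` or `/2` or by a comment after the first space. The function names `normalize_id`, `read_ids`, and `first_mismatch` are made up for this example.

```python
from itertools import zip_longest


def normalize_id(header):
    """Reduce a FASTQ header to a mate-independent read name.

    "@read1/1" and "@read1 2:N:0:ACGT" both become "read1".
    """
    name = header.strip().split()[0][1:]  # drop the comment and the leading "@"
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def read_ids(lines):
    """Yield the normalized read name of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.strip():  # every 4th line is a header
            yield normalize_id(line)


def first_mismatch(r1_lines, r2_lines):
    """Return (record_number, id1, id2) for the first out-of-sync record.

    Returns None if both inputs list the same reads in the same order.
    A missing mate at the end of the shorter input shows up as None.
    """
    pairs = zip_longest(read_ids(r1_lines), read_ids(r2_lines))
    for n, (id1, id2) in enumerate(pairs, start=1):
        if id1 != id2:
            return (n, id1, id2)
    return None
```

Open file handles can be passed directly, for example `first_mismatch(open("R1.fastq"), open("R2.fastq"))`. A result of `None` means the pairing looks intact under the assumptions above.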

    • #3
      Just for reference, there are quite a few tools that respect paired data, such as Prinseq and Trimmomatic. Often, custom pipelines don't take this into account. Also, Pairfq is a lighter approach to pairing reads, which makes it really easy to incorporate into a pipeline. In most cases, the following is all you need to install:

      Code:
      curl -L git.io/pairfq_lite > pairfq_lite
      chmod +x pairfq_lite
      ./pairfq_lite -h
      The last command just prints the usage, which is explained on the wiki and in the inline documentation available at the command line.
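
For anyone curious what "pairing reads" amounts to, here is a rough sketch of the idea in Python. It is not the Pairfq or repair.sh implementation, only an illustration. It assumes well-formed 4-line FASTQ records, mate names that differ only by a trailing `/1` or `/2` or by a comment after the first space, and files small enough to index R2 in memory. The names `normalize_id`, `parse_fastq`, and `repair_pairs` are made up for this example.

```python
def normalize_id(header):
    """Reduce a FASTQ header to a mate-independent read name."""
    name = header.strip().split()[0][1:]  # drop the comment and the leading "@"
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def parse_fastq(lines):
    """Yield (read_name, record) pairs, where record is a list of 4 lines."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield normalize_id(record[0]), record
            record = []


def repair_pairs(r1_lines, r2_lines):
    """Re-pair two FASTQ inputs that may be out of order or missing mates.

    Returns (paired_r1, paired_r2, singletons). The two paired lists are
    in the same order (the order of R1), so they can be written out as
    synchronized files. Singletons are reads whose mate is absent.
    """
    # Index every R2 record by read name (this holds all of R2 in memory).
    r2_by_id = {rid: rec for rid, rec in parse_fastq(r2_lines)}

    paired_r1, paired_r2, singletons = [], [], []
    for rid, rec in parse_fastq(r1_lines):
        mate = r2_by_id.pop(rid, None)
        if mate is None:
            singletons.append(rec)  # R1 read with no R2 mate
        else:
            paired_r1.append(rec)
            paired_r2.append(mate)

    # Whatever is left in the index is an R2 read with no R1 mate.
    singletons.extend(r2_by_id.values())
    return paired_r1, paired_r2, singletons
```

Writing `paired_r1` and `paired_r2` to two new files gives inputs with reads in the same order. For real data sets, a dedicated tool is the better choice, since this sketch keeps one whole file in memory.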
