  • Converting TopHat's BAM output back to separate paired-end read FASTQ files

    Hi all,

    I was wondering if anyone could offer some advice on using paired-end reads with TopHat, specifically with its output. I'm planning to use TopHat as part of a pipeline for processing my sequence data. The reads that map are going to be easy to deal with, but the unmapped.bam file is proving a bit problematic. I would like to convert that BAM file back into two FASTQ files containing the paired reads that didn't map to the reference genome (hg19 in this case). My plan was to convert it to SAM and then use Picard's SamToFastq tool, but that returns the following error:

    MAPQ must be zero if RNAME is not specified;

    I haven't been able to find anything about this error online. I'm also not sure how time-consuming this approach will be. At the moment I'm just experimenting with a random sample of my data to get everything working, but my actual data files are probably going to be 20 GB or more, at least in FASTQ format.
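    Taking the error message at face value, one thing that could be tried is forcing MAPQ to zero on every record that has no reference name before handing the SAM to Picard. The snippet below is only a minimal sketch of that idea. It assumes the offending records are the ones with `*` in the RNAME column (column 3) and a non-zero MAPQ (column 5), which has not been checked against the actual unmapped.bam. The function names are made up for illustration.

```python
def fix_unmapped_mapq(sam_line):
    """Set MAPQ (column 5) to 0 when RNAME (column 3) is '*'.

    Header lines (starting with '@') and mapped records are returned unchanged.
    """
    if sam_line.startswith("@"):
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    # A SAM alignment record has at least 11 mandatory tab-separated fields.
    if len(fields) >= 11 and fields[2] == "*":
        fields[4] = "0"
    return "\t".join(fields) + "\n"


def fix_sam_stream(in_handle, out_handle):
    """Copy SAM text from in_handle to out_handle, fixing records on the way."""
    for line in in_handle:
        out_handle.write(fix_unmapped_mapq(line))
```

    Because it works line by line, memory use stays flat regardless of file size; `fix_sam_stream` can be pointed at `sys.stdin` and `sys.stdout` to sit in a pipe between the BAM-to-SAM conversion and Picard.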

    I was also thinking of converting the accepted_hits.bam file to SAM and then writing a Unix script that would take the files that were input into TopHat and write any read that isn't present in accepted_hits into two new files.
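    A rough sketch of that second idea, written in Python rather than shell. It assumes standard 4-line FASTQ records, that the read names in the SAM file match the FASTQ names once any trailing /1 or /2 is stripped, and that the set of mapped names fits in memory, which may not hold for 20 GB inputs. All function names are hypothetical. Note that a pair is dropped from both output files if either mate appears in accepted_hits, which keeps the two files in sync.

```python
import io


def read_name(header_line):
    """Strip '@', any comment after whitespace, and a trailing /1 or /2."""
    name = header_line[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def mapped_names(sam_handle):
    """Collect the QNAME (first column) of every alignment record."""
    names = set()
    for line in sam_handle:
        if line.startswith("@"):  # skip SAM header lines
            continue
        names.add(line.split("\t", 1)[0])
    return names


def write_unmapped(fastq_in, fastq_out, mapped):
    """Copy records whose name is not in `mapped`; return how many were kept."""
    kept = 0
    while True:
        record = [fastq_in.readline() for _ in range(4)]
        if not record[0]:  # end of file
            break
        if read_name(record[0]) not in mapped:
            fastq_out.writelines(record)
            kept += 1
    return kept


# Tiny in-memory demonstration with made-up reads.
demo_sam = io.StringIO("r1\t99\tchr1\t1\t50\t4M\t=\t10\t13\tACGT\tIIII\n")
demo_fastq = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nTTTT\n+\nIIII\n")
demo_out = io.StringIO()
write_unmapped(demo_fastq, demo_out, mapped_names(demo_sam))
```

    Running `write_unmapped` once per input FASTQ file with the same `mapped` set would give the two output files.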

    What do you think?