  • Use SAM file to pull reads from FASTQ

    Hi Folks,

    I have a SAM file with unpaired reads (originally from a FASTQ) and I would like to use it to pull the read and its pair from the FASTQ file - does anyone know if there is a script out there to do this?

    I have used the Picard tools SamToFastq but to my knowledge there is not a script in Picard or SamTools to do exactly what I described here (or maybe there is and I just haven't found it!).

    Thank you!

  • #2
    If the reads are unpaired, how can you pull their mate?

    Just work out the read names you desire, and write a short script to fish those reads out.
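
    A minimal sketch of such a script, assuming plain-text SAM, uncompressed 4-line FASTQ records, and read names that may carry a trailing /1 or /2 in the FASTQ. The function names here are hypothetical, not part of Picard or samtools:

```python
def read_names_from_sam(sam_lines):
    """Collect the QNAME (first column) of every alignment line in a SAM file."""
    names = set()
    for line in sam_lines:
        # SAM header lines start with '@'; skip them and any blank lines.
        if line.startswith("@") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def strip_mate_suffix(name):
    """Drop a trailing /1 or /2 so FASTQ names match SAM names (assumed convention)."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2]
    return name


def filter_fastq(fastq_lines, wanted):
    """Yield (header, seq, plus, qual) for records whose name is in `wanted`.

    Assumes strict 4-line FASTQ records (no wrapped sequence lines).
    """
    it = iter(fastq_lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        name = strip_mate_suffix(header[1:].split()[0])
        if name in wanted:
            yield header, seq, plus, qual


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.0\n",
        "r1\t73\tchr1\t100\t60\t4M\t=\t100\t0\tACGT\tIIII\n",
    ]
    fastq = ["@r1/2 extra\n", "ACGT\n", "+\n", "IIII\n",
             "@r2/2\n", "TTTT\n", "+\n", "IIII\n"]
    wanted = read_names_from_sam(sam)
    for record in filter_fastq(fastq, wanted):
        print("".join(record), end="")
```

    Running filter_fastq over both the R1 and the R2 file with the same name set would give each wanted read together with its mate.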


    • #3
      ^ Good point... I have a feeling there is some miswording in the question.

      I'd try to truncate the file through some form of filtering (maybe samtools or bamtools) and then use one of the sam/bam to fastq conversion scripts.
      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
      Salk Institute for Biological Studies, La Jolla, CA, USA */
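
      For the conversion step, a rough sketch of turning one SAM alignment line back into a FASTQ record. It assumes the standard 11 mandatory SAM columns, restores the original orientation for reverse-strand alignments (flag 0x10), skips secondary and supplementary records, and appends /1 or /2 from the first/second-in-pair flags. The function name is hypothetical; an existing converter such as Picard SamToFastq is probably the safer choice in practice:

```python
# Translation table for complementing bases; anything else is left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Return a 4-line FASTQ record for one SAM alignment line, or None if skipped."""
    fields = line.rstrip("\n").split("\t")
    name, flag = fields[0], int(fields[1])
    seq, qual = fields[9], fields[10]

    # 0x100 = secondary, 0x800 = supplementary: these would duplicate the read.
    if flag & 0x900:
        return None

    # 0x10 = aligned to the reverse strand: SAM stores the reverse complement,
    # so undo that to recover the read as it appeared in the FASTQ.
    if flag & 0x10:
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]

    # 0x40 = first in pair, 0x80 = second in pair.
    if flag & 0x40:
        suffix = "/1"
    elif flag & 0x80:
        suffix = "/2"
    else:
        suffix = ""
    return f"@{name}{suffix}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    example = "r1\t89\tchr1\t100\t60\t4M\t=\t100\t0\tAACG\tABCD"
    print(sam_line_to_fastq(example), end="")
```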


      • #4
        Yes, sorry for the unclear wording. The SAM file is the result of mapping paired-end reads to a reference. I have a SAM file with mapped mated pairs, which I was able to convert to FASTQ, and that worked great. But I also have a SAM file with mapped unmated reads. I would like to use this file to pull the reads that mapped (but whose "mate" did not), together with their mates, from the original FASTQ files.

        Ideally the output would be these pairs in a FASTQ file.


        • #5
          You should be able to extract those alignments as long as the aligner you used set the flags right. The unmapped mates will have 0x4 set and the mapped mates should have 0x8 set. You might need to name sort the bam first but then you could pull out only those reads with this:
          Code:
          samtools view -f 0xC -b alignments.bam > singletons.bam
          Then convert that bam file into fastq.
          /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
          Salk Institute for Biological Studies, La Jolla, CA, USA */


          • #6
            Actually that will also pull out all unaligned reads in addition to your singleton alignments. So more filtering will be necessary. Pairs make this tricky because the SAM annotation of pairs is messy.
            /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
            Salk Institute for Biological Studies, La Jolla, CA, USA */
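
            To make the flag logic concrete, here is a small sketch that classifies a SAM FLAG value using the 0x4 (read unmapped) and 0x8 (mate unmapped) bits. As far as I understand the samtools documentation, -f keeps only records with all bits of the mask set, so a 0xC mask matches reads where both the read and its mate are unmapped, whereas a singleton pair has exactly one of the two bits set on each record. The function and category names are made up for illustration:

```python
PAIRED = 0x1          # template has multiple segments
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate of this read is unmapped


def classify(flag):
    """Classify a SAM FLAG by the mapping state of the read and its mate."""
    if not flag & PAIRED:
        return "unpaired"  # 0x8 carries no meaning without 0x1
    read_unmapped = bool(flag & UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)
    if read_unmapped and mate_unmapped:
        return "both_unmapped"          # what a 0xC mask with -f selects
    if read_unmapped:
        return "unmapped_mate"          # the half of a singleton pair that failed to map
    if mate_unmapped:
        return "mapped_singleton"       # the half that mapped
    return "both_mapped"


def singleton_pair_names(sam_lines):
    """Names of pairs in which exactly one mate mapped."""
    names = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.split("\t")
        if classify(int(fields[1])) in ("mapped_singleton", "unmapped_mate"):
            names.add(fields[0])
    return names


if __name__ == "__main__":
    for flag in (73, 133, 77, 99):
        print(flag, classify(flag))
```

            With samtools itself, the same split would presumably be `-f 8 -F 4` for the mapped singletons and `-f 4 -F 8` for their unmapped mates, combining the required (-f) and excluded (-F) masks.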
