  • Finding unmapped paired-end mates with samtools

    I believe the reference sequence is incorrect in my region of interest. I have access to alignment files for several whole genomes and I'd like to use Velvet to do a targeted de novo assembly of just this region. In order to take full advantage of the paired-end reads, I want to compile a file with all reads for which one of the paired-end mates is mapped to the region (I would like the mapped reads, plus any unmapped paired-end mates that don't align to the faulty reference plus any paired-end mates that align elsewhere in the genome). I can use samtools view to identify the mapped reads, but the mates have the same read name, and are only differentiated by 64 vs 132 in the flag code. What is the best way to get the mates that are unmapped or mapped elsewhere?
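For readers unsure what the flag values in the question mean, here is a minimal sketch that decodes a SAM flag into its component bits. The bit values come from the SAM format specification; the function name `decode_flag` and the label strings are made up for this illustration. It shows that 64 marks the first read of a pair, while 132 is 128 + 4, i.e. the second read of a pair that is itself unmapped.

```python
# SAM flag bits as defined in the SAM format specification.
# The label strings are illustrative names, not official ones.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def decode_flag(flag):
    """Return the labels of every bit that is set in a SAM flag value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


print(decode_flag(64))   # ['first_in_pair']
print(decode_flag(132))  # ['unmapped', 'second_in_pair']
```

So the two mates can be told apart by testing bit 0x40 versus bit 0x80, and an unmapped read by testing bit 0x4, rather than by comparing whole flag values.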

  • #2
    The SAM spec calls for the unmapped mate of a mapped read to be given the reference name and position of the mapped end.

    So if you think that the area from 10,000-12,000 is not right, get all the reads whose coordinates are 10,000-12,000, and that'll include all the reads that really do map there, as well as the unmapped mates of any of those reads.
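A rough sketch of that idea, assuming the alignments are available as SAM text lines (for example the output of `samtools view`). The function name `reads_in_region` and the example records are invented for illustration. It compares only the POS field against the window, so a mapped read that starts before the window but overlaps it would be missed; a real region query handles overlap properly.

```python
def reads_in_region(sam_lines, chrom, start, end):
    """Split the records placed in chrom:start-end into mapped and unmapped names.

    Simplification: only the 1-based POS field is compared with the window.
    """
    mapped, unmapped = [], []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if rname != chrom or not (start <= pos <= end):
            continue
        # Bit 0x4 means this read itself is unmapped; it still carries
        # the position of its mapped mate, so it shows up in the window.
        if flag & 0x4:
            unmapped.append(name)
        else:
            mapped.append(name)
    return mapped, unmapped


# Made-up records: r1 has one mapped end and one unmapped end, r2 lies elsewhere.
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t73\tchr1\t10500\t60\t4M\t=\t10500\t0\tACGT\tIIII",
    "r1\t133\tchr1\t10500\t0\t*\t=\t10500\t0\tACGT\tIIII",
    "r2\t99\tchr1\t50000\t60\t4M\t=\t50200\t204\tACGT\tIIII",
]
mapped, unmapped = reads_in_region(example, "chr1", 10000, 12000)
print(mapped, unmapped)
```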

    As for reads where one end maps there, and the other maps elsewhere, I don't think you can do that directly with samtools. But once you get a list of all the names of reads that map to your region, you can get all of the reads with that name out of your fastq, and that's really the set of reads you want anyway.
    Last edited by swbarnes2; 10-31-2011, 02:19 PM.
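The last step in the post above, pulling every read with a given name out of the FASTQ, could be sketched as follows. This is an illustration under assumptions, not a tested tool: the names `read_id` and `extract_fastq` are made up, records are assumed to be exactly four lines each (no wrapped sequences), and read names are assumed to match the alignment names once a trailing `/1` or `/2` and any text after the first space are dropped.

```python
from itertools import islice


def read_id(header):
    """Reduce a FASTQ header line to the bare read name used in the alignment."""
    name = header[1:].split()[0]       # drop '@' and anything after whitespace
    if name.endswith(("/1", "/2")):    # drop a mate suffix if present
        name = name[:-2]
    return name


def extract_fastq(fastq_lines, wanted):
    """Yield each 4-line FASTQ record whose read name is in the set `wanted`."""
    it = iter(fastq_lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:            # end of input (or a truncated record)
            break
        if read_id(record[0]) in wanted:
            yield record


# Made-up FASTQ content with two reads.
fastq = [
    "@r1/1", "ACGT", "+", "IIII",
    "@r2/1", "TTTT", "+", "IIII",
]
hits = list(extract_fastq(fastq, {"r1"}))
print(hits)
```

Running the same function over both mate files with the same set of names gives both ends of every pair, wherever the second end mapped.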



    • #3
      Hey newguy,

      I am having the same issue. For me, I am looking for insertions (specifically mobile element insertions) that have occurred in the re-sequenced genome with respect to the reference.

      I ran RetroSeq, and it gave me a bunch of points that it has called as points of insertion of mobile elements. What I want to do is extract all of the reads around that (potential) break-point and do a local assembly. I appreciate swbarnes2's input, but I, perhaps like newguy, was hoping that there was a tool already written for this purpose.

      swbarnes2: Thank you for your suggestion; you have guided my coding. (It probably can be done with a simple grep.)

      Cheers,

      Kevin
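A possible two-pass sketch of the extraction described in this post, again working on SAM text lines. The function names, the `flank` parameter and the example records are hypothetical. The first pass collects the names of reads placed within `flank` bases of a candidate breakpoint; the second pass keeps every record carrying one of those names, which also picks up mates aligned elsewhere in the genome. It plays the role of the grep mentioned above, but matches exactly on the name field.

```python
def names_near(sam_lines, chrom, breakpoint, flank):
    """Collect names of reads whose POS lies within `flank` bases of `breakpoint`."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.split("\t")
        if fields[2] == chrom and abs(int(fields[3]) - breakpoint) <= flank:
            names.add(fields[0])
    return names


def records_with_names(sam_lines, names):
    """Return every alignment record whose read name is in `names`."""
    return [
        line for line in sam_lines
        if not line.startswith("@") and line.split("\t", 1)[0] in names
    ]


# Made-up records: r1 has one end near chr1:10480 and its mate on chr5.
sam = [
    "r1\t65\tchr1\t10500\t60\t4M\tchr5\t900\t0\tACGT\tIIII",
    "r1\t129\tchr5\t900\t60\t4M\tchr1\t10500\t0\tACGT\tIIII",
    "r2\t99\tchr1\t50000\t60\t4M\t=\t50200\t204\tACGT\tIIII",
]
near = names_near(sam, "chr1", 10480, 100)
selected = records_with_names(sam, near)
print(len(selected))
```

The input has to be a list (or be re-read from disk) because it is traversed twice; for a whole-genome file the second pass is the expensive part.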



      • #4
        Mobile element insertion detection question

        Me too. If you have experience dealing with mobile element insertions, or know of scripts to detect them, please share; I am interested in this too. I have data from Complete Genomics, who generated MEI files for the sequencing data, but I don't know how to summarize the novel MEIs.

        thanks



        Originally posted by kjlee:
        Hey newguy,

        I am having the same issue. For me, I am looking for insertions (specifically mobile element insertions) that have occurred in the re-sequenced genome with respect to the reference.

        I ran RetroSeq, and it gave me a bunch of points that it has called as points of insertion of mobile elements. What I want to do is extract all of the reads around that (potential) break-point and do a local assembly. I appreciate swbarnes2's input, but I, perhaps like newguy, was hoping that there was a tool already written for this purpose.

        swbarnes2: Thank you for your suggestion; you have guided my coding. (It probably can be done with a simple grep.)

        Cheers,

        Kevin
