  • Help picking up an abandoned sequencing project

    Hello All,

    My lab has two runs of mouse genome sequence from two genomes. Both runs are paired-end, one with 36 bp reads and one with 100 bp reads. Unfortunately, the people at the sequencing core who actually ran the sequencing are unresponsive, and the people from my lab who coordinated with them are long gone.

    Now I am trying to pick up the sequencing project. So far there are SNPs and InDels from samtools, and SVs from GASV and BreakDancer. BreakDancer and samtools were run by others, so I am not sure what parameters were used. I have several issues that I am looking for help with:

    (1) I checked the deletions called by BreakDancer and GASV using the samtools sequence viewer, and found that reads often map within the called deletions, but with lower quality. Does anyone have any sanity-check suggestions for working with SVs from GASV and BreakDancer?

    (2) I suspect that some of the deletions are due to transposable element insertions in the reference, and vice versa. I would like to find the transposable element insertions, but don't know of any tools for doing this. Do you guys know of any? If not, does anyone have a suggestion for how to pull only the paired reads with one end mapped out of the BAM files?

    --> This last part is what I am struggling with, because I don't even know how the BAM files were made or what was included. I have also heard that unmapped reads are sometimes given the same coordinate as their mate. Would this cause problems in my situation, or is there a flag that I could use?


    These are just two of my most pressing troubles, but please let me know if you have any suggestions. Thank you in advance for any help.
    Last edited by giror; 06-17-2011, 08:44 AM.
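
On the flag question above: the SAM specification records mapping status in the FLAG field (bit 0x1 = read is paired, 0x4 = this read is unmapped, 0x8 = its mate is unmapped), and it recommends giving an unmapped read the reference name and position of its mapped mate. So the coordinate alone does not say whether a read mapped, but the flag does. With samtools, `samtools view -f 8 -F 4` should select mapped reads whose mate is unmapped, and `-f 4 -F 8` the unmapped reads whose mate is mapped. The same test can be sketched in plain Python over SAM text lines; the function names here are illustrative and not part of any library, and whether these particular BAM files set the flags correctly is an assumption to check.

```python
# SAM FLAG bits, as defined in the SAM specification.
PAIRED = 0x1          # read is part of a pair
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate of this read is unmapped


def one_end_mapped(flag: int) -> bool:
    """Return True if the record belongs to a pair with exactly one end mapped."""
    if not flag & PAIRED:
        return False
    read_unmapped = bool(flag & UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)
    return read_unmapped != mate_unmapped


def select_one_end_mapped(sam_lines):
    """Yield alignment lines (headers skipped) from pairs with one end mapped.

    Hypothetical helper: expects tab-separated SAM text, e.g. the output
    of `samtools view`, read one line at a time so memory use stays small.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if one_end_mapped(int(fields[1])):
            yield line
```

Because the lines are streamed, the input can be piped from `samtools view` without loading the whole file into memory.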

  • #2
    Transposon mapping with paired-end reads is straightforward.

    1) Create a reference file that contains the sequence of each transposon.
    2) Align read one and read two separately to the transposon reference.
    3) Align read one and read two separately to the genome reference, using repeat masking (so you won't align to transposons).
    4) Filter the read one genome alignments with the read two transposon alignments, using the unique read identifier.
    5) Repeat with read two genome and read one transposon alignments.

    There are more sophisticated strategies, but this works relatively well given adequate read depth.

    -Harold
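
Steps 4 and 5 above amount to a join on read identifiers. A minimal sketch, assuming each alignment was written as SAM text and that read one and read two of a pair share a name apart from an optional `/1` or `/2` suffix (an assumption about the read naming, not something stated in the thread); the function names are hypothetical:

```python
def mapped_read_ids(sam_lines, strip_suffix=True):
    """Collect the names of reads that aligned (FLAG bit 0x4 not set)."""
    ids = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # unmapped read
        # Assumption: mates are distinguished only by a trailing /1 or /2.
        if strip_suffix and name.endswith(("/1", "/2")):
            name = name[:-2]
        ids.add(name)
    return ids


def anchored_candidates(genome_sam_lines, transposon_ids, strip_suffix=True):
    """Yield (read_id, chrom, pos) for genome-aligned reads whose mate hit a transposon."""
    for line in genome_sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue
        if strip_suffix and name.endswith(("/1", "/2")):
            name = name[:-2]
        if name in transposon_ids:
            yield name, fields[2], int(fields[3])
```

Build the ID set from the read two transposon alignments and stream the read one genome alignments through `anchored_candidates`, then swap the roles for step 5. Only the set of transposon-hit read names is held in memory; the genome alignments are read one line at a time.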


    • #3
      Thanks, Harold.

      This is generally the strategy I imagined. Unfortunately, I am on a Mac with 8 GB of RAM and a 1 TB hard drive, and I am not sure I could efficiently read through the entire BAM files, which are 51 and 71 GB. The reads have already been mapped back to the genome, but I'm not sure what parameters were used. Do you know of a way I could get this information from the BAM?

      If not, could you recommend an alignment program given the hardware constraints that I am under?
      Last edited by giror; 06-17-2011, 10:36 AM.
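
On recovering parameters from the BAM: if the aligner recorded itself, the header contains `@PG` lines with tags such as `ID`, `PN` (program name), `VN` (version) and `CL` (command line). `samtools view -H file.bam` prints only the header, so it does not require reading through the alignments. Not every aligner or pipeline writes `@PG`, so the lines may be absent or incomplete. A minimal sketch for pulling those records out of header text; `program_records` is a hypothetical helper name:

```python
def program_records(header_lines):
    """Parse @PG header lines into a list of {tag: value} dictionaries.

    Expects SAM header text, e.g. the output of `samtools view -H`.
    Other header line types (@HD, @SQ, @RG, @CO) are ignored.
    """
    records = []
    for line in header_lines:
        if not line.startswith("@PG"):
            continue
        record = {}
        for field in line.rstrip("\n").split("\t")[1:]:
            # Split on the first colon only; CL values may contain colons.
            tag, sep, value = field.partition(":")
            if sep:
                record[tag] = value
        records.append(record)
    return records
```

An empty result means the header carries no program information, in which case the parameters cannot be recovered this way.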


      • #4
        The approach I suggested would almost certainly require repeating the alignments. I don't know which aligner was used to generate your existing dataset, but the repeats were either masked (yielding no matches) or not (yielding multiple matches). Most aligners report only unique matches, so either way the transposon reads would be missing.

        Our aligners run on a server cluster, so I can't offer any software recommendations for your system. A cloud solution might be your best option.
