  • Comparing multiple nuclear capture probe sets

    Hi,

    I'm interested in doing HybSeq and have designed a few sets of baits/probes for capture sequencing of hundreds of nuclear loci in ~120 species of a plant genus. All species are closely related and the genus is relatively young (~3 million years). I designed the probes with a new pipeline (Sondovac) that uses transcriptome and genome skim data (shotgun libraries), both of which are provided by the user. The program worked well with one transcriptome and one genome skim dataset (574 genes, >600 bp); however, I have four transcriptomes and 19 libraries that I would like to take advantage of.

    My approach was to combine the 19 libraries into a single file (they are all from closely related species), making a composite/chimeric low-coverage genome representing all species. I then ran the pipeline to compare these reads against each of the four transcriptomes. Overall I got really good results, recovering ~1400 to ~1700 genes (>600 bp) depending on the transcriptome I used.

    My question is the following: how can I compare these four sets of probes and pick the best one or, even better, combine them into one all-encompassing set? How can I compare the sets and find the shared probes, e.g. with BLAT or a similar program?

    I was thinking of two ways to combine them: 1) create a non-redundant set of probes using CD-HIT-EST at a low similarity threshold (80%), so that many of the initial probes are clustered together and only a few are kept, or 2) create a set of probes that are shared among the four initial sets (maybe find the shared ones with BLAT?) and then create a non-redundant set from those, i.e., take the shared probes and run them through CD-HIT-EST to get unique probes. I've attached a figure to explain my two ideas. The total number of probes from the four sets combined is ~15,000, and option 1 reduced this number to ~7,500 non-redundant probes.
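
Both options can be read off a single clustering of the pooled probes: every cluster is a candidate for option 1, and clusters containing at least one probe from each of the four sets approximate option 2. Note that this swaps the BLAT step of option 2 for cluster membership, so it is an approximation, not the same procedure. The sketch below assumes the usual CD-HIT `.clstr` layout (a `>Cluster N` header followed by member lines such as `0	612nt, >name... *`) and a hypothetical naming convention in which each probe ID was prefixed with its source set before pooling, e.g. `T1|probe_17`.

```python
import re
from collections import defaultdict

# Captures the sequence name between ">" and "..." on a member line.
_MEMBER = re.compile(r">(.+?)\.\.\.")


def parse_clstr(lines):
    """Return {cluster_id: [member names]} from .clstr text lines."""
    clusters = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            current = line.split()[1]
        elif line and current is not None:
            match = _MEMBER.search(line)
            if match:
                clusters[current].append(match.group(1))
    return dict(clusters)


def source_of(name, sep="|"):
    """Hypothetical convention: names look like '<set>|<probe id>'."""
    return name.split(sep, 1)[0]


def summarize(clusters, all_sets):
    """Count clusters under option 1 (all) and option 2 (shared by every set)."""
    all_sets = set(all_sets)
    shared = [
        cid for cid, members in clusters.items()
        if {source_of(m) for m in members} >= all_sets
    ]
    return {
        "option1_clusters": len(clusters),
        "option2_clusters": len(shared),
        "shared_ids": shared,
    }
```

Calling `summarize(parse_clstr(open("pooled.clstr")), ["T1", "T2", "T3", "T4"])` (file and set names are placeholders) gives the size of each candidate set from one CD-HIT-EST run. CD-HIT can truncate long names in the `.clstr` file (as far as I know this is controlled by its `-d` option), which is why the set prefix is placed at the start of the name.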

    Option 2 will have fewer probes, but these will be shared among sets, giving me high confidence that the genes exist in every sample. Option 1 will result in more probes, but some of those will only be found in some samples and not in others. Some capture failure is expected with either method.

    Any comments or recommendations (technical, conceptual, or programs to use) on any of this will be greatly appreciated.

    Cheers,
    Simon
    Last edited by saimonara; 05-13-2016, 12:43 PM.

  • #2
    Any ideas? Anyone?
