  • RSII mapper benchmarking

    Hi all,

    I have RNA sequencing data (D. melanogaster) from 3 libraries (1-2 kb, 2-3 kb, 3-7 kb). Which mapper would you suggest? I'm mostly interested in benchmarking isoform detection against Illumina HiSeq 2500 data. I'd like to benchmark the "established" tools (since I realized this knowledge is missing), but suggestions for newer ones are welcome too.

    Thanks in advance for suggestions.

  • #2
    Do you mean how to map PacBio transcriptome (Iso-Seq) reads back to the reference genome, or back to the reference transcripts?

    It also depends on what you have already done with the Iso-Seq data. If you have the results from running the classify + cluster pipeline of Iso-Seq, you will have what are called "high-quality, quiver-polished, full-length sequences". These are expected to be at least 99% accurate. You can align them to the genome using GMAP/STAR or to the reference transcripts using BLAST or BLASR.

    If you have reads_of_insert.fasta (CCS reads) or isoseq_flnc.fasta (CCS reads, but specifically the full-length ones), these are of variable quality, ranging from 85% to 99%+ accuracy. They can also be mapped with GMAP/STAR (but be careful with low-quality alignments, which may need filtering) and with BLAST/BLASR (again, be careful with quality).
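To illustrate what "filtering" low-quality alignments could look like after mapping, here is a minimal Python sketch that works on SAM text lines. It is not part of any Iso-Seq tool. It assumes each aligned record carries an NM:i: (edit distance) tag, and it uses one possible definition of the two metrics: coverage is aligned read bases divided by total read length (clipped bases included), and identity is 1 - NM / alignment columns, with introns (N) ignored. The function names and the 0.90 / 0.80 defaults are my own choices for illustration.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_stats(cigar, nm):
    """Return (coverage, identity) for a single alignment.

    coverage = aligned read bases / full read length (soft/hard clips included)
    identity = 1 - NM / alignment columns (match/mismatch + insertions + deletions)
    """
    counts = {}
    for length, op in CIGAR_RE.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)

    matched = counts.get("M", 0) + counts.get("=", 0) + counts.get("X", 0)
    inserted = counts.get("I", 0)
    deleted = counts.get("D", 0)
    clipped = counts.get("S", 0) + counts.get("H", 0)

    read_length = matched + inserted + clipped
    columns = matched + inserted + deleted
    if read_length == 0 or columns == 0:
        return 0.0, 0.0

    coverage = (matched + inserted) / read_length
    identity = 1.0 - nm / columns
    return coverage, identity


def filter_sam_lines(lines, min_coverage=0.90, min_identity=0.80):
    """Yield header lines plus the alignments that pass both thresholds."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep SAM header lines untouched
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 4 or cigar == "*":
            continue  # unmapped read
        nm = None
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                nm = int(tag[5:])
        if nm is None:
            continue  # assumption: records without NM cannot be scored, so drop them
        coverage, identity = alignment_stats(cigar, nm)
        if coverage >= min_coverage and identity >= min_identity:
            yield line
```

Something like `filter_sam_lines(open("aligned.sam"))` would then stream the surviving records (the file name is a placeholder). Other definitions of identity exist, so treat the numbers as comparable only within one definition.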


    Please refer to the tutorial on GitHub for details on how to use each aligner.


    Additionally, the wiki contains much useful information for analyzing Iso-Seq data downstream.



    Also look into Iso-Aux (on GitHub) for combining short-read and long-read data and doing comparisons.


    --Liz



    • #3
      In reply to Magdoll's post (#2):
      How can I filter out all GMAP alignments that have less than 90% of the read aligned or less than 80% identity, using "filterBAM"?



      • #4
        GMAP has two parameters for filtering by coverage and identity: --min-trimmed-coverage and --min-identity. Run gmap --help for more details.
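For the 90% coverage / 80% identity thresholds asked about above, a GMAP call could be assembled roughly as in the Python sketch below. The sketch only builds the argument list and does not run anything. The database directory, database name and FASTA file name are hypothetical placeholders, and I am assuming both thresholds are given as fractions between 0 and 1, so please confirm against gmap --help for your installed version.

```python
import shlex


def build_gmap_command(db_dir, db_name, reads_fasta,
                       min_trimmed_coverage=0.90, min_identity=0.80,
                       threads=4):
    """Build a gmap argument list that applies both filters at alignment time.

    Thresholds are assumed to be fractions in [0, 1]; anything else is rejected.
    """
    for name, value in (("min_trimmed_coverage", min_trimmed_coverage),
                        ("min_identity", min_identity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    return [
        "gmap",
        "-D", db_dir,        # directory holding the GMAP genome database
        "-d", db_name,       # name of the genome database
        "-f", "samse",       # SAM output for single-end reads
        "-t", str(threads),  # number of worker threads
        f"--min-trimmed-coverage={min_trimmed_coverage}",
        f"--min-identity={min_identity}",
        reads_fasta,
    ]


# Hypothetical paths and names, for illustration only.
command = build_gmap_command("/path/to/gmapdb", "dm6", "isoseq_flnc.fasta")
print(shlex.join(command))
```

The printed line can be pasted into a shell with the output redirected to a SAM file, or the list can be handed to subprocess.run. Filtering at alignment time this way avoids a separate post-processing pass over the output.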
