  • indels using single end short reads!

    We had a sample with known 4bp deletion, but no tool would help me detect that...

    any suggestions?

    SSAHA supposedly does gapped alignment, but it gave me some 'novel' 1 or 2 base indels... not the one we know
    --
    bioinfosm

  • #2
    Originally posted by bioinfosm:
    We had a sample with known 4bp deletion, but no tool would help me detect that...

    any suggestions?

    SSAHA supposedly does gapped alignment, but it gave me some 'novel' 1 or 2 base indels... not the one we know
    SOAP may do it. It seems that when you compile it, you set the largest gap you will be allowed to ask for on the command line.

    "3) Maximum gap size
    -DMAXGAP=3
    Maximum size of a gap allowed in a read, then "-g" option during running should not exceed this definition."

    On the home page, they show 3 as an example, but 4 might work. I don't know how much it will slow down SOAP to allow it to try large gaps.

    I know it finds plenty of 2 bp insertions when I use -g 2.
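To illustrate why the maximum gap size matters, here is a toy sketch in Python. It is not SOAP's actual algorithm, just a brute-force search under simple assumptions: the read must match the reference exactly once a single run of up to `max_gap` reference bases is skipped. The names `REF`, `READ`, and `find_deletions` are made up for this example, and the reference sequence is arbitrary.

```python
# Toy illustration only: brute-force search for a read that matches the
# reference exactly when one run of reference bases (a deletion in the
# sample) is skipped. This is NOT how SOAP is implemented.

# Arbitrary made-up reference sequence (63 bases).
REF = "ACGTTGCAAGGCTTACCGATAGGCATTCGAACTGGTCAATCGGATTACAGCCTGAAGTCCATG"

# Simulate a sample carrying a 4 bp deletion of REF[30:34],
# then take a 36-base read that spans the deletion.
SAMPLE = REF[:30] + REF[34:]
READ = SAMPLE[12:48]


def find_deletions(read, ref, max_gap):
    """Return (ref_start, split, gap) for every exact gapped placement.

    ref_start: 0-based position in ref where the read begins
    split:     number of read bases placed before the gap
    gap:       number of reference bases skipped (1..max_gap)
    """
    hits = []
    n = len(read)
    for start in range(len(ref) - n + 1):
        for gap in range(1, max_gap + 1):
            if start + n + gap > len(ref):
                break  # the gapped read would run off the end of ref
            for split in range(1, n):
                left = ref[start:start + split]
                right = ref[start + split + gap:start + n + gap]
                if read[:split] == left and read[split:] == right:
                    hits.append((start, split, gap))
    return hits


if __name__ == "__main__":
    print("max_gap=3:", find_deletions(READ, REF, 3))
    print("max_gap=4:", find_deletions(READ, REF, 4))
```

With `max_gap=3` the 4-base placement is never tried, which is the same reason a build limited to `-DMAXGAP=3` could not report a 4 bp deletion. A real aligner also tolerates mismatches and uses an index rather than scanning every position.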


    • #3
      Indels of 4 bases are on the border of what I would consider "sane" when aligning or assembling short sequences. For example, a 36mer aligned against the same sequence carrying a 4-base deletion gives a score ratio (= score/expected_score) barely above 70%.

      I normally allow only 1 or 2 errors in Solexa mapping assemblies, but I quickly hacked together a change that lets you find indels or base changes of up to 4 bases in a Solexa mapping assembly. Grab http://www.chevreux.org/tmp/mira_2.9...x86_64.tar.bz2
      and run the Solexa demo. Have a look at the results in gap4 and decide for yourself whether this fits your needs.

      Warning: Work in progress. Works for me, but not necessarily for you.

      Regards,
      B.


      • #4
        myrialign

        Maybe MyriAlign would be of use to you?


        • #5
          SOAP worked nicely on the data... Thanks to the person who shared his script to use soap results and generate indel calls

          I was able to see the 4bp known deletion in the sample

          Torst - are you the author of Myrialign? I will check it out as well
          --
          bioinfosm


          • #6
            Depending on your coverage, you can try assembling the reads and then simply BLASTing the contigs against the genome. I know of a few groups trying this, but I haven't heard of any success yet, so if you try it I'm curious how far you get.

            -mark


            • #7
              Aligning with Indels

              I've just finished a new aligner that handles indels of up to 7bp. I don't have a web site for downloading it yet, but if you'd like to try it, email novoalign @ gmail.com and I'll send you a copy. It's also at least as fast as the best of the other aligners.


              • #8
                Originally posted by bioinfosm:
                SOAP worked nicely on the data... Thanks to the person who shared his script to use soap results and generate indel calls

                I was able to see the 4bp known deletion in the sample
                Would said person be willing to share the scripts for using soap results? thanks in advance.


                • #9
                  Novoalign and novopaired do gapped alignments and are a fair bit faster than SOAP.
                  I've just released V1.03. This update improves quality scores for novopaired and also fixes an illegal instruction fault reported by one user.
                  You can download it at www.novocraft.com
                  I've also changed the license terms so it's free for any non-profit, even if you don't publish in open journals.
                  Colin


                  • #10
                    Originally posted by ECO:
                    Would said person be willing to share the scripts for using soap results? thanks in advance.
                    Sorry but I never noticed your message in the new posts!

                    Sure, I would be happy to share. I ran SOAP with gapped alignment enabled, and then used a Perl parsing script to pull the indel calls out of the results.

                    soap -a input -d reference -o prefix -s 10 -g 4

                    The parser is modified from Liu's script (BGI). You may PM me and I will mail it to you, but I would rather not post it here.

                    sm
                    --
                    bioinfosm
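For readers without the Perl parser, here is a rough Python sketch of the same idea: read SOAP output and tally reads that support each indel. This is not the BGI script. The column layout is an assumption based on the SOAP v1 output description (tab-separated; chromosome in column 8, 1-based location in column 9, hit type in column 10 where 100+n means an n bp insertion and 200+n an n bp deletion, followed by the offset of the gap in the read). Check it against the documentation for your SOAP version before relying on it. Strand handling is ignored here, and `tally_indels` and the column constants are names invented for this example.

```python
from collections import Counter

# ASSUMED 0-based column positions in SOAP v1 tab-separated output.
# Verify against your SOAP version's documentation.
CHROM_COL = 7      # reference sequence name
POS_COL = 8        # 1-based start of the alignment on the reference
HIT_TYPE_COL = 9   # 0 = exact, 1..100 = mismatches, 100+n = ins, 200+n = del
OFFSET_COL = 10    # offset of the gap within the read (assumed)


def tally_indels(lines, min_reads=2):
    """Count reads supporting each (chrom, position, kind, size) indel."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= OFFSET_COL:
            continue  # ungapped hits carry no offset field in this sketch
        try:
            pos = int(fields[POS_COL])
            hit_type = int(fields[HIT_TYPE_COL])
            offset = int(fields[OFFSET_COL])
        except ValueError:
            continue  # skip lines that do not follow the assumed layout
        if 100 < hit_type < 200:
            kind, size = "ins", hit_type - 100
        elif 200 < hit_type < 300:
            kind, size = "del", hit_type - 200
        else:
            continue
        counts[(fields[CHROM_COL], pos + offset, kind, size)] += 1
    # Keep only indels seen in at least min_reads reads.
    return {key: n for key, n in counts.items() if n >= min_reads}


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        for key, n in sorted(tally_indels(handle).items()):
            print(*key, n, sep="\t")
```

Requiring at least two supporting reads is an arbitrary choice for the sketch; a sensible threshold depends on coverage.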


                    • #11
                      Originally posted by bioinfosm:
                      We had a sample with known 4bp deletion, but no tool would help me detect that...

                      any suggestions?

                      SSAHA supposedly does gapped alignment, but it gave me some 'novel' 1 or 2 base indels... not the one we know
                      Hi!

                      Glad to read that you managed the task. Is it from a mammalian genome? If so, would you be willing to share your data set with us (an NDA can of course be arranged)?
                      We would love to test our mapper on that challenge.

                      Klaus
