  • finding polymorphisms by large sequence alignments

    Hello all

    I just registered on SEQanswers on a friend's recommendation. I have worked in molecular genomics since 1996, starting in livestock genomics in Barcelona, and I currently work on the genetics of human diseases. I am analysing three overlapping human BACs (150 to 180 kb each, together spanning a genomic region of about 340 kb) that were sequenced with 454 technology (done abroad). The files we have correspond to i) the reads, ii) the reads assembled into contigs, and iii) the contigs ordered into scaffolds (by paired-end tag sequencing). The file types are .fna (FASTA), .qual (Phred-equivalent quality scores), .tsv (tab-delimited file with position-by-position consensus base and flow signal info), .sff (the input file used in the assembly) and .ace (viewable in other viewer programs).
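As a first step, the consensus sequences and their per-base qualities can be loaded together, so that low-quality differences can be discarded later. The following is a minimal sketch that assumes the usual 454 convention: the .fna and .qual files share record IDs, and each .qual record body holds whitespace-separated integer Phred scores, one per base. The function names are hypothetical and not part of any 454 tool.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def _records(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (record_id, body_lines) for each '>' record in a FASTA-style file."""
    record_id: Optional[str] = None
    body: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, body
            # Use the first whitespace-separated token of the header as the ID.
            record_id, body = (line[1:].split() or [""])[0], []
        else:
            body.append(line)
    if record_id is not None:
        yield record_id, body


def read_fna_qual(fna: Iterable[str], qual: Iterable[str]) -> Dict[str, Tuple[str, List[int]]]:
    """Pair each .fna sequence with the Phred scores stored under the same ID in a .qual file."""
    seqs = {rid: "".join(body) for rid, body in _records(fna)}
    quals = {rid: [int(q) for row in body for q in row.split()] for rid, body in _records(qual)}
    paired = {}
    for rid, seq in seqs.items():
        scores = quals.get(rid)
        # Assumption: exactly one quality score per base.
        if scores is None or len(scores) != len(seq):
            raise ValueError(f"missing or length-mismatched quality scores for {rid!r}")
        paired[rid] = (seq, scores)
    return paired
```

With real files, `read_fna_qual(open("scaffolds.fna"), open("scaffolds.qual"))` works because an open file handle iterates line by line. The file names here are hypothetical.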

    I would now like to align these consensus sequences (all the scaffolds) to the corresponding region of the human reference genome, and to other sequences, with the aim of finding the existing polymorphisms. From the alignment, I will extract a variation table as output, containing for each polymorphism its position, the encompassing sequence, and the allele in each sequence.
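Once a pairwise alignment exists, building the variation table described above is mostly bookkeeping. This minimal sketch assumes the reference and one scaffold are already available as two gapped strings of equal length (for example, exported from an aligner in aligned-FASTA form). `Variant` and `variation_table` are hypothetical names. Each row gives the position in the ungapped reference, the flanking reference sequence, and the allele in each sequence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    ref_pos: int      # 1-based position in the ungapped reference
    ref_allele: str   # '-' means the query has an insertion here
    alt_allele: str   # '-' means the query has a deletion here
    context: str      # ungapped reference bases around the variant


def variation_table(ref_aln: str, qry_aln: str, flank: int = 5) -> List[Variant]:
    """List every differing column between two gapped, equal-length aligned sequences."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_aln, qry_aln = ref_aln.upper(), qry_aln.upper()
    ref_ungapped = ref_aln.replace("-", "")
    variants = []
    ref_pos = 0  # number of reference bases consumed so far
    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            ref_pos += 1
        if r == q:
            continue
        # For an insertion (r == '-') report the preceding reference base;
        # an insertion before the first base is reported at position 1.
        pos = max(ref_pos, 1)
        start = max(pos - 1 - flank, 0)
        context = ref_ungapped[start:pos + flank]
        variants.append(Variant(pos, r, q, context))
    return variants
```

Each differing alignment column becomes its own row, so a 3 bp indel gives three rows. Ambiguous bases such as N are reported like any other mismatch. Merging indel runs and filtering on the .qual scores would be natural next steps.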

    However, I haven't found any software that fits our requirements (exporting a list of polymorphisms between large compared sequences). I tried several alignment tools, such as Geneious, BioEdit and ClustalW but, as expected, neither the computer nor the alignment tools are powerful enough for this analysis.
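A rough estimate shows why general-purpose aligners struggle at this scale. Assume a textbook Needleman-Wunsch global alignment that stores the full dynamic-programming matrix; real tools may use linear-space or heuristic shortcuts. Comparing the ~340 kb region (length $n$) with a reference stretch of similar length ($m$) then needs on the order of:

```latex
n \times m \approx (3.4 \times 10^{5})^{2} = 1.156 \times 10^{11} \text{ cells}
\quad\Rightarrow\quad
1.156 \times 10^{11} \text{ bytes} \approx 116 \text{ GB at 1 byte per cell}
```

Anchor-based whole-genome aligners, such as the MUMmer package suggested later in this thread, avoid filling this matrix. They first locate exact matches between the sequences.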

    I see three different options to proceed:

    1) We can write our own program and run it on our computers. Is that the best / only way to proceed?

    2) We can download an existing program and run it on our computers. Can I freely download a program to do that on our servers?

    3) We can use online software, where we upload our sequences and run the analysis. Is there any specific website to do that?

    If we choose option 2) or 3), does anyone know of a program that can align such large sequences (pairwise and/or multiple alignment) and also output a list of the polymorphisms?

    Thanks in advance

    Kindest regards

    Alex Clop

    Molecular and Medical Genetics
    9th Floor Guy's Tower
    KCL
    St Thomas' St
    London SE1 9RT
    UK

    Tel.: +44(0)20 7188 9505
    Fax: +44(0)20 7188 8050
    e-mail: [email protected]

  • #2
    I think MAQ fits your list of requirements, though you may need to do some file conversion. I normally work with Illumina data rather than 454, so I've never tried it with the file types you've listed above.

    MAQ can be found at http://maq.sourceforge.net/
    Last edited by apfejes; 09-12-2008, 11:35 AM. Reason: typo
    The more you know, the more you know you don't know. —Aristotle



    • #3
      MAQ might not handle read lengths above 60bp, so tools from the MarthLab (http://bioinformatics.bc.edu/marthlab/Main_Page) will probably work best.

      Also, 454 has its own software, Newbler or something like that, which should help. It gives a list of SNP calls as well (HCDiffs.txt).
      --
      bioinfosm



      • #4
        Thanks for pointing that out. I had forgotten about the 60bp limit in MAQ.


        • #5
          re: finding polymorphisms by large sequence alignments

          Thanks for your replies.

          Yes, there is Newbler, which Roche supplies with the 454 machine. We do not have this facility here, so we don't have access to the software. We are thinking about developing the tool ourselves but, honestly, I am not a bioinformatician and do not know how easy it is.



          • #6
            Hi Alex

            Before you start developing, I would look at MUMmer. It seems to do at least part of what you need.



            I don't know if it will give you enough information about polymorphisms, but perhaps you can use it as a starting point.
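MUMmer copes with sequences of this size by first finding exact matches shared by both sequences (maximal unique matches, located with a suffix-tree-style index). It then only needs to align the short stretches between them. The sketch below is a simplified stand-in for that anchoring idea, not MUMmer's algorithm. It uses k-mers that occur exactly once in each sequence instead of MUMs, and `unique_kmer_anchors` and the default `k=20` are illustrative assumptions.

```python
from collections import Counter
from typing import List, Tuple


def unique_kmer_anchors(ref: str, qry: str, k: int = 20) -> List[Tuple[int, int]]:
    """Return 0-based (ref_pos, qry_pos) starts of k-mers occurring exactly once in each sequence."""
    ref, qry = ref.upper(), qry.upper()
    ref_counts = Counter(ref[i:i + k] for i in range(len(ref) - k + 1))
    qry_counts = Counter(qry[j:j + k] for j in range(len(qry) - k + 1))
    # Index only reference k-mers that are unique, so each anchor is unambiguous.
    ref_index = {
        ref[i:i + k]: i
        for i in range(len(ref) - k + 1)
        if ref_counts[ref[i:i + k]] == 1
    }
    anchors = []
    for j in range(len(qry) - k + 1):
        kmer = qry[j:j + k]
        if qry_counts[kmer] == 1 and kmer in ref_index:
            anchors.append((ref_index[kmer], j))
    return anchors  # ordered by query position
```

Consecutive anchors that are colinear (both coordinates increasing) delimit short windows that an ordinary dynamic-programming aligner can handle. Any differences inside those windows are the candidate polymorphisms.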

            Cheers
            Colin



            • #7
              Dear Colin

              Thanks very much for the info. I will have a look now.

              I am currently testing the CLC Genomics Workbench from CLC Bio. It seems quite good so far, but I am still testing it.

              Best regards

              Alex



              • #8
                If possible, ask your sequencing provider to run GS Reference Mapper on your data. If that is not possible, try nucmer + show-snps from the MUMmer package.
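For the nucmer + show-snps route, a small parser can turn the output into rows for the variation table Alex describes. This sketch assumes show-snps was run with tab-delimited output, for example `show-snps -Clr -T ref_qry.delta`; check the exact options against the MUMmer manual for your version. It relies only on the first four columns, [P1] [SUB] [SUB] [P2]: the reference position, reference base, query base and query position. `parse_show_snps` and `SnpRow` are hypothetical names.

```python
from typing import Iterable, List, NamedTuple


class SnpRow(NamedTuple):
    ref_pos: int   # [P1]
    ref_base: str  # first [SUB] column ('.' marks the gapped side of an indel)
    qry_base: str  # second [SUB] column
    qry_pos: int   # [P2]


def parse_show_snps(lines: Iterable[str]) -> List[SnpRow]:
    """Parse tab-delimited show-snps output, keeping only the first four columns."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip file-path, program-name and column-header lines:
        # data rows start with an integer reference position.
        if len(fields) < 4 or not fields[0].strip().isdigit():
            continue
        p1, ref_base, qry_base, p2 = fields[:4]
        rows.append(SnpRow(int(p1), ref_base, qry_base, int(p2)))
    return rows
```

Usage: `parse_show_snps(open("ref_qry.snps"))`, where the file name is hypothetical and holds the redirected show-snps output.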



                • #9
                  Alex,

                  Please use NextGene from SoftGenetics to detect SNPs and link them to the NCBI dbSNP database. I will supply a demo copy for you to use.
                  Josliu
