  • Align multiple sequences in tabular or fasta format

    Hi Folks,

    I have ~100,000 short sequences (~25 bp long) in FASTA format. They are the oligo probes used on the Affymetrix Mouse 430-2 chip. I want to align all the sequences against the mm9 genome to get either GFF or BED output. Can anyone suggest a good web-based or Windows-based tool for this purpose?

    Here is the first probe as an example. Thanks!

    >probe:Mouse430_2:1415670_at:269:753; Interrogation_Position=2436; Antisense;
    GGCTGATCACATCCAAAAAGTCATG

  • #2
    There are several short-read aligners for this purpose:

    - Bowtie
    - SOAP2
    - BWA
    - Novoalign
    - ...

    • #3
      For a web-based option, I have seen Galaxy, which I think would be a good choice since your dataset is small.

      • #4
        Thanks to NicoBxl and husamia.

        Still trying to understand how to install bowtie on Windows....

        I did try Galaxy with my FASTA files. It failed with the error "reads file does not look like a FASTQ file." Galaxy appears to require 2 more columns (strandedness and quality score) to run the alignment. However, it still does not work even after I added 2 dummy columns and changed the file type from FASTA to FASTQ.

        Does anybody know how to run the alignment on Galaxy without going through the FASTQ requirement? Thanks a million!

        • #5
          Write a simple Perl script to convert your FASTA file to FASTQ format.
          Then run bowtie to do the alignment.
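
A minimal sketch of the conversion suggested above, written in Python rather than Perl. FASTA carries no quality information, so every base gets the same placeholder quality character; the function name `fasta_to_fastq` and the default character `"I"` are assumptions chosen for illustration, not anything bowtie or Galaxy prescribes.

```python
from typing import Iterable, Iterator, List


def fasta_to_fastq(lines: Iterable[str], quality_char: str = "I") -> Iterator[str]:
    """Yield FASTQ lines for each FASTA record, using a constant dummy quality."""

    def emit(name: str, seq_parts: List[str]) -> List[str]:
        seq = "".join(seq_parts)
        # A FASTQ record is four lines: @name, sequence, '+', quality string.
        return ["@" + name, seq, "+", quality_char * len(seq)]

    name = None
    seq_parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield from emit(name, seq_parts)
            name = line[1:]  # drop the leading '>'
            seq_parts = []
        elif name is not None:
            seq_parts.append(line)  # sequences may span several lines
    if name is not None:
        yield from emit(name, seq_parts)
```

To convert a file, something like `out.write("\n".join(fasta_to_fastq(open("probes.fa"))) + "\n")` would do, where `probes.fa` is a hypothetical file name. The qualities are fake, so any aligner option that weighs mismatches by quality will not be meaningful for these reads.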

          • #6
            Galaxy should auto-detect your format, and it should be able to accept FASTA. If it is giving a FASTQ-related error, make sure you are uploading with the correct options.
            Otherwise, the headers in your FASTA file may be causing problems. If you aren't familiar with the command line, you might be able to use WordPad or some other Windows program to change the headers to something simpler, though I'm not sure.
            There are Windows large text file editor programs such as 'gVim', or google for one.
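
If editing by hand is awkward, the header simplification could be scripted. The sketch below assumes that "simpler" means keeping only the first whitespace-delimited token of each header and dropping a trailing semicolon; whether that is what Galaxy needs is a guess, and the function names are made up for this example.

```python
from typing import Iterable, Iterator


def simplify_header(line: str) -> str:
    """Reduce a FASTA header to '>' plus its first token, without a trailing ';'."""
    fields = line[1:].split()
    first = fields[0] if fields else ""
    return ">" + first.rstrip(";")


def simplify_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with simplified headers; sequence lines pass through."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        yield simplify_header(line) if line.startswith(">") else line
```

Applied to the probe header from the first post, this keeps `>probe:Mouse430_2:1415670_at:269:753` and drops the `Interrogation_Position` and `Antisense` fields, so save the original file if that information is needed later.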

            • #7
              Originally posted by Kennels
              There are Windows large text file editor programs such as 'gVim', or google for one.
              Does anybody have experience opening large text files such as FASTA in Windows? I use the search and replace function a lot. What are some good editors for large files (~12 GB)?
              I know this is a huge file, but I wonder if anybody knows of an editor that responsibly handles such files without hogging memory or crashing.
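
Not an editor recommendation, but for plain search and replace a small script can stream the file one line at a time, so memory use stays flat regardless of file size. This is a sketch under the assumption that the text to replace never spans a line break; `replace_lines` and `replace_in_file` are hypothetical names.

```python
from typing import Iterable, Iterator


def replace_lines(lines: Iterable[str], old: str, new: str) -> Iterator[str]:
    """Yield each line with every occurrence of `old` replaced by `new`."""
    for line in lines:
        yield line.replace(old, new)


def replace_in_file(src_path: str, dst_path: str, old: str, new: str) -> None:
    """Stream src_path to dst_path, replacing text line by line."""
    # newline="" keeps the original line endings (e.g. Windows CRLF) untouched.
    with open(src_path, "r", newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        dst.writelines(replace_lines(src, old, new))
```

The result goes to a second file instead of overwriting the input, so a 12 GB file needs roughly another 12 GB of free disk space. A file that has no line breaks at all would still be read in one piece.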

              • #8
                It turned out that aligning with bowtie worked! Thank you everyone for your suggestions.
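
For the BED output asked about in the first post, the alignments still need reformatting. The sketch below assumes bowtie 1's default tab-separated output (not SAM), where the first five columns are read name, strand, reference name, 0-based leftmost offset, and read sequence. The function name and the constant score of 0 are assumptions for illustration.

```python
from typing import Iterable, Iterator


def bowtie_line_to_bed(line: str) -> str:
    """Convert one default-format bowtie alignment line to a 6-column BED line."""
    fields = line.rstrip("\r\n").split("\t")
    name, strand, chrom, offset, seq = fields[:5]
    start = int(offset)      # bowtie offsets and BED starts are both 0-based
    end = start + len(seq)   # BED end coordinate is exclusive
    return "\t".join([chrom, str(start), str(end), name, "0", strand])


def bowtie_to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Yield BED lines for every non-empty alignment line."""
    for line in lines:
        if line.strip():
            yield bowtie_line_to_bed(line)
```

The end coordinate is computed from the read length, which holds for ungapped alignments like the ones bowtie 1 reports.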
