  • Fasta manipulation tools???

    Hello, everyone.
    I have peak data that I can view in the UCSC Genome Browser.
    I downloaded the sequences of all the peaks in FASTA format and tried to align them with ClustalX/W,
    but the alignment failed because the sequences have duplicated headers (sequence IDs).

    Can anyone tell me how to change or remove the headers for multiple alignment?
    Is there any software for that?

  • #2
    I don't know of any programs that solve that problem specifically, but you could always write a short script to append an arbitrary identifier to the end of each sequence header.
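
A minimal sketch of such a script in Python, assuming the duplicated part is the first word of each header (the sequence ID) and that a running counter is an acceptable identifier. The function name `make_headers_unique` and the example headers are made up for illustration:

```python
def make_headers_unique(lines):
    """Yield FASTA lines with a running counter appended to each sequence ID."""
    count = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            count += 1
            # Split the header into the ID (first word) and the optional description.
            seq_id, _, description = line[1:].partition(" ")
            new_header = f">{seq_id}_{count}"
            if description:
                new_header += f" {description}"
            yield new_header
        else:
            # Sequence lines pass through unchanged.
            yield line


if __name__ == "__main__":
    # Hypothetical input with two records sharing the same ID.
    example = [">peak range=chr1:1-10", "ACGT", ">peak range=chr2:5-20", "GGCC"]
    for out_line in make_headers_unique(example):
        print(out_line)
```

To run it on a real file, pass an open file handle instead of the example list and write the yielded lines to a new file.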


    • #3
      Originally posted by mbk0asis:
      Can anyone tell me how to change or remove the headers for multiple alignment?
      Is there any software for that?
      For FASTA manipulation, my first port of call is FASTX-Toolkit, and then EMBOSS. A quick glance through the FASTX-Toolkit reveals a program called fastx_renamer. Here's something that should change sequence headers to a simple counter:

      Code:
      fastx_renamer -n COUNT -i input.fasta -o output.fasta


      • #4
        Thank you very much!!!
        I will try it now.


        • #5
          Galaxy is a community-driven, web-based analysis platform for life science research.

          It makes FASTA manipulation very easy, and the video tutorials are also very good.

          It runs the programs mentioned above through a GUI.


          • #6
            Hi, all... again.
            I failed to install FastX-Toolkit on my computer (LinuxMint11 64bit - Ubuntu based).
            I am trying to find what went wrong.
            Meanwhile, what is the name of the EMBOSS tool that can manipulate FASTA files?


            • #7
              Originally posted by mbk0asis:
              I failed to install FastX-Toolkit on my computer (LinuxMint11 64bit - Ubuntu based).
              fastx-toolkit is in Debian, so it should also be in Ubuntu and Mint:

              Code:
              aptitude install fastx-toolkit


              • #8
                # sudo aptitude install fastx-toolkit

                No candidate version found for fastx-toolkit
                No candidate version found for fastx-toolkit
                No packages will be installed, upgraded, or removed.
                0 packages upgraded, 0 newly installed, 0 to remove and 18 not upgraded.
                Need to get 0 B of archives. After unpacking 0 B will be used.

                ???


                • #9
                  It's in the oneiric and precise universe repositories from Ubuntu. Add one of those to your package repositories list.


                  • #10
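                    To completely remove the headers just use awk. Note that this keeps every second line, so it only works if each sequence sits on a single line (two lines per record):
                    awk '0 == NR % 2' fasta.filename > sequences_only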
                    But the FASTX-toolkit is in general quite a useful utility to have, so perhaps have a look at the instructions here:
                    http://hannonlab.cshl.edu/fastx_tool...all_ubuntu.txt and here:
                    http://hannonlab.cshl.edu/fastx_toolkit/download.html

                    Hope this helps!
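
The awk command above keeps every second line, so it relies on each record being exactly one header line followed by one sequence line. If the FASTA file wraps sequences over several lines, something like the following Python sketch could be used instead. It is only an illustration: the function name `sequences_only` is made up here, and it assumes every record starts with a `>` header line.

```python
def sequences_only(lines):
    """Yield one sequence string per FASTA record, dropping the header lines."""
    chunks = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seen_header:
                yield "".join(chunks)
            chunks = []
            seen_header = True
        elif line:
            # Collect wrapped sequence lines; text before the first header is discarded.
            chunks.append(line)
    if seen_header:
        yield "".join(chunks)


if __name__ == "__main__":
    # Hypothetical input: the first record is wrapped over two lines.
    example = [">peak1", "ACGT", "TTAA", ">peak2", "GGCC"]
    for seq in sequences_only(example):
        print(seq)
```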


                    • #11
                      Originally posted by dvanic:
                      To completely remove the headers just use awk
                      Here's a perl one-liner which will replace fasta headers with the current input line number:

                      Code:
                      perl -pe 's/^>.*$/">$."/e' input.fasta > output.fasta
                      Same thing with awk:
                      Code:
                      awk '{if(/^>/){print ">"FNR} else{print $0}}' input.fasta > output.fasta
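
Both one-liners use the input line number, so the new IDs are unique but not consecutive, and the original names are lost. If that matters, a rough Python sketch along these lines numbers the records 1, 2, 3, ... and keeps a lookup table of the old headers. The function name `rename_with_counter` is hypothetical and the whole thing is just one way to do it:

```python
def rename_with_counter(lines):
    """Replace each FASTA header with a sequential number.

    Returns the renamed lines and a dict mapping each new ID to the old header.
    """
    renamed = []
    mapping = {}
    count = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            count += 1
            # Remember the original header text (without ">") under its new ID.
            mapping[str(count)] = line[1:]
            renamed.append(f">{count}")
        else:
            renamed.append(line)
    return renamed, mapping


if __name__ == "__main__":
    # Hypothetical input with duplicated headers.
    example = [">peak range=chr1:1-10", "ACGT", ">peak range=chr1:1-10", "GGCC"]
    new_lines, old_names = rename_with_counter(example)
    print("\n".join(new_lines))
    for new_id, old_header in old_names.items():
        print(new_id, old_header, sep="\t")
```

Saving the lookup table next to the renamed file makes it possible to map the alignment results back to the original peaks.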


                      • #12
                        Finally, it worked, GRINGER.

                        I really really appreciate your help.
