  • Are .fasta and .fa the same file format?

    I have already googled this, but I'm still not sure.
    BWA uses .fa for the reference, while GATK uses .fasta.
    I have already downloaded the hs37d5.fa file, so can I use it to generate the .dict and .fai files
    for use in GATK or not?

    I'm new to this field, so please help me, and sorry for my bad English.

    Thank you,
    Silenus

  • #2
    Files ending with ".fa" and ".fasta" (and others, I've also come across ".fas" or ".fna") are typically FASTA files. You should be able to use that file with both BWA and the GATK.
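For the .fai and .dict files asked about above, here is a minimal sketch of the file names that are conventionally expected next to the reference. It assumes an uncompressed FASTA and the usual naming convention: the index appends ".fai" to the full file name, and the sequence dictionary replaces the extension with ".dict". The helper name `companion_files` is made up for illustration. The files themselves are typically created with `samtools faidx hs37d5.fa` and, depending on the GATK version, `gatk CreateSequenceDictionary -R hs37d5.fa` or Picard's tool of the same name.

```python
from pathlib import Path


def companion_files(fasta_path):
    """Return the (.fai, .dict) paths conventionally expected next to a FASTA reference.

    Hypothetical helper for illustration; it only computes names and
    does not create or check any files.
    """
    fasta = Path(fasta_path)
    # The FASTA index keeps the original extension and appends ".fai".
    fai = fasta.with_name(fasta.name + ".fai")
    # The sequence dictionary replaces the FASTA extension with ".dict".
    seq_dict = fasta.with_suffix(".dict")
    return fai, seq_dict


fai, seq_dict = companion_files("hs37d5.fa")
print(fai)       # hs37d5.fa.fai
print(seq_dict)  # hs37d5.dict
```

The same naming works whether the reference ends in ".fa" or ".fasta", which is why the extension itself should not matter to either tool.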


    • #3
      If in any doubt, just open the file in any ASCII text editor and look at it. The FASTA file format is about as simple a text format as you can get, and easily checked:

      http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml

      Michael Black, Ph.D.
      ScitoVation LLC. RTP, N.C.


      • #4
        Originally posted by mbblack:
        If in any doubt, just open the file in any ASCII text editor and look at it. The FASTA file format is about as simple a text format as you can get, and easily checked:

        http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml

        Opening a 3+ GB text file (like the human genome FASTA file the original poster mentioned) in a standard text editor will most certainly use up all the RAM and typically render the computer unresponsive. Not a very good idea! If you are working in a command-line environment, use "less" or something similar to look at the file.
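If a pager like "less" is not at hand, the same idea can be sketched in Python: read only the first few lines and never load the rest of the file. The function name `head` and the throwaway demo file are assumptions made for this example.

```python
import os
import tempfile
from itertools import islice


def head(path, n=5):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        # A file object yields lines lazily, so only the first n lines
        # are read, however large the file is.
        return [line.rstrip("\n") for line in islice(handle, n)]


# Demo on a small throwaway file; for the real reference you would
# call head("hs37d5.fa") instead.
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
    tmp.write(">1 example header\nACGTACGT\nNNNNACGT\n")
first_lines = head(tmp.name, n=2)
os.remove(tmp.name)
print(first_lines)  # ['>1 example header', 'ACGTACGT']
```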


        • #5
          Originally posted by sarvidsson:
          Files ending with ".fa" and ".fasta" (and others, I've also come across ".fas" or ".fna") are typically FASTA files. You should be able to use that file with both BWA and the GATK.

          Oh, I got it. Thank you, sir @sarvidsson.


          • #6
            Originally posted by mbblack:
            If in any doubt, just open the file in any ASCII text editor and look at it. The FASTA file format is about as simple a text format as you can get, and easily checked:

            http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml

            Haha, by the way, thanks for your comment, @mbblack.


            • #7
              Originally posted by sarvidsson:
              Opening a 3+ GB text file (like the human genome FASTA file the original poster mentioned) in a standard text editor will most certainly use up all the RAM and typically render the computer unresponsive. Not a very good idea! If you are working in a command-line environment, use "less" or something similar to look at the file.

              Yes, I wasn't thinking of the size of the particular file in question. My point was that if you simply look at the first few lines of any text file, you will know whether it is in FASTA format or not. The extension is not a reliable indicator, as I have had people send me files named "foo.fa" that were not FASTA files at all.
              Michael Black, Ph.D.
              ScitoVation LLC. RTP, N.C.
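The "look at the first few lines" check can also be sketched as a rough heuristic, assuming the usual FASTA layout of a ">" header line followed by sequence lines. The function `looks_like_fasta` and its rules are an illustration only, not a full validator. It accepts letters plus "*" and "-" in sequence lines and inspects only the first `max_lines` lines.

```python
import io


def looks_like_fasta(handle, max_lines=50):
    """Heuristic: does the start of a text stream look like FASTA?"""
    seen_header = False
    seen_sequence = False
    for i, raw in enumerate(handle):
        if i >= max_lines:
            break  # only inspect the beginning of the stream
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            seen_header = True
        elif not seen_header:
            return False  # data before the first '>' header
        elif line.replace("*", "").replace("-", "").isalpha():
            seen_sequence = True
        else:
            return False  # digits, tabs or punctuation in a sequence line
    return seen_header and seen_sequence


fasta_text = ">chr1 example\nACGTNNACGT\nACGT\n"
fastq_text = "@read1\nACGT\n+\nIIII\n"
print(looks_like_fasta(io.StringIO(fasta_text)))  # True
print(looks_like_fasta(io.StringIO(fastq_text)))  # False
```

For a real file, pass an open file handle instead of `io.StringIO`. Because the loop stops after `max_lines`, this stays cheap even on a multi-gigabyte reference.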
