  • How to get hg19.fa?

    I am trying to download the hg19 reference genome from the UCSC site.
    I tried to convert hg19.2bit to hg19.fa with twoBitToFa from the UCSC tools.
    It said "cannot execute binary file".
    Then I tried
    "cat chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa chr6.fa chr7.fa chr8.fa chr9.fa chr10.fa chr11.fa chr12.fa chr13.fa chr14.fa chr15.fa chr16.fa chr17.fa chr18.fa chr19.fa chr20.fa chr21.fa chr22.fa chrX.fa chrY.fa >hg19.fa"
    But when I used this hg19.fa in bwa, it said
    "[bwa_index] fail to open file 'hg19.fa'. Abort!
    Aborted"

    I am still not able to get any reference sequence build.

    Note: I am using Linux and am a beginner.
    Last edited by ninad; 08-18-2011, 09:51 PM.

  • #2
    I actually did the exact same command for generating my hg19.fa file and it worked perfectly...
    Are you sure you put the right path of the file as argument for bwa?



    • #3
      Yes, in fact I am in the same directory. Here is the command,
      "bwa index -a bwtsw -p hg19_bwa hg19.fa "

      Otherwise, have you used the twoBitToFa of UCSC?
      Is there any other source for this file?



      • #4
        Just a recommendation, I would not use the "alternative name prefix" option -p . It helps to keep it simple. Also, in the following commands you will typically have to specify the base file (hg19.fa) which then points at all the other aligner-specific indices (in the same directory).

        So this should be sufficient:

        Code:
        bwa index -a bwtsw hg19.fa



        • #5
          Thanks sdvie for the suggestion.

          Someone please help ASAP. Can anyone upload a genome to RapidShare and share the link? I know it sounds ridiculous, but is there any other, more sophisticated way?

          Or please help with twoBitToFa usage.



          • #6
            You can get the utility program twoBitToFa from here:

            http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

            Once you have downloaded it, you must first change its permissions so that it can be executed as a program.

            Then you execute it from a terminal:

            without arguments to see the options:

            Code:
            $ /path/to/twoBitToFa
            
            twoBitToFa - Convert all or part of .2bit file to fasta
            usage:
               twoBitToFa input.2bit output.fa
            options:
               -seq=name - restrict this to just one sequence
               -start=X  - start at given position in sequence (zero-based)
               -end=X - end at given position in sequence (non-inclusive)
               -seqList=file - file containing list of the desired sequence names 
                                in the format seqSpec[:start-end], e.g. chr1 or chr1:0-189
                                where coordinates are half-open zero-based, i.e. [start,end)
               -noMask - convert sequence to all upper case
               -bpt=index.bpt - use bpt index instead of built in one
               -bed=input.bed - grab sequences specified by input.bed. Will exclude introns
            
            Sequence and range may also be specified as part of the input
            file name using the syntax:
                  /path/input.2bit:name
               or
                  /path/input.2bit:name
               or
                  /path/input.2bit:name:start-end
            You will only need to execute the simple command:

            Code:
            $ /path/to/twoBitToFa /path/to/hg19.2bit /path/to/hg19.fa
            Good luck.
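
The permission step described above can be sketched as follows. This is a minimal sketch, not part of the original post: the helper names `make_executable` and `running_on_x86_64` are hypothetical, and the path `./twoBitToFa` in the usage note is an assumption about where the download was saved. The architecture check is included because binaries from a `linux.x86_64` directory are built for 64-bit machines.

```shell
#!/bin/sh
# Sketch: make a downloaded tool executable and check the machine type.
# Helper names are hypothetical; they are not part of the UCSC tools.

# Add the execute bit to the given file and confirm that it took effect.
make_executable() {
    tool="$1"
    if [ ! -f "$tool" ]; then
        echo "not found: $tool" >&2
        return 1
    fi
    chmod +x "$tool" || return 1
    [ -x "$tool" ]
}

# Succeeds only if the kernel reports a 64-bit x86 machine.
running_on_x86_64() {
    [ "$(uname -m)" = "x86_64" ]
}
```

A possible use, assuming the binary sits in the current directory: `make_executable ./twoBitToFa && running_on_x86_64 || echo "binary may not run on $(uname -m)"`.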



            • #7
              Thanks sdvie. But I had already gone through these steps. Unfortunately, my Linux is i686 and not x86_64.
              That's why it could not execute the binary, I suppose.

              Now I only have the option of "cat chr*.fa", which did not work.
              I am still stuck trying to obtain the human reference genome hg19!



              • #8
                Just download it from here:
                hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
                then
                tar -zxvf chromFa.tar.gz
                and
                cat chr*.fa > hg19.fa
                Done!
                Last edited by raonyguimaraes; 08-19-2011, 01:58 AM.
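
The concatenation step above can be sketched with a sanity check added: the number of FASTA header lines (lines starting with `>`) in the combined file should equal the number in the inputs. The helper name `concat_fasta` is hypothetical. Note that a glob such as `chr*.fa` expands in lexical order and picks up every matching file in the directory, which may be more than the 24 chromosomes listed in the first post.

```shell
#!/bin/sh
# Sketch: concatenate FASTA files and verify the header count.
# Usage: concat_fasta OUTPUT INPUT...
concat_fasta() {
    out="$1"
    shift
    # Write all inputs, in the order given, into the output file.
    cat "$@" > "$out" || return 1
    # Count header lines in the inputs and in the result.
    in_headers=$(cat "$@" | grep -c '^>')
    out_headers=$(grep -c '^>' "$out")
    echo "inputs: $in_headers headers, output: $out_headers headers"
    [ "$in_headers" -eq "$out_headers" ]
}
```

For example, `concat_fasta hg19.fa chr*.fa` run inside the extracted directory; the output name must not itself match the input pattern.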



                • #9
                  Thanks raonyguimaraes,
                  but I already tried that and it is still giving the same problem.
                  ~/chromFa$ cat *.fa >hg19s.fa
                  command:
                  ~/chromFa$ bwa index -a bwtsw hg19s.fa
                  Then this was output:
                  [bwa_index] fail to open file 'hg19s.fa'. Abort!
                  Aborted
                  This is not working right....



                  • #10
                    Found a similar question http://seqanswers.com/forums/archive...hp/t-5236.html

                    I have no idea what's going on... Check the version of your bwa.



                    • #11
                      Unlikely, but: is enough memory available?



                      • #12
                        Could be http://seqanswers.com/forums/archive...p/t-10766.html

                        Try to use a small file, index only the chr1.fa



                        • #13
                          I tried to reproduce the error but it worked perfectly with new downloaded chr*.fa files...

                          Is it possible that you have no read permissions on the file?
                          My bwa version is 0.5.9-r16

                          Besides that I can't think of anything else...
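
The read-permission question above can be checked from the shell before running bwa. This is a minimal sketch: the helper name `check_input_file` is hypothetical, and it only tests that the file exists, is readable by the current user, and is not empty. It says nothing about whether bwa itself can open the file.

```shell
#!/bin/sh
# Sketch: basic checks on a FASTA file before handing it to an indexer.
# Prints one status line and returns non-zero on the first failed check.
check_input_file() {
    path="$1"
    if [ ! -e "$path" ]; then
        echo "missing: $path"
        return 1
    fi
    if [ ! -r "$path" ]; then
        echo "not readable: $path"
        return 1
    fi
    if [ ! -s "$path" ]; then
        echo "empty: $path"
        return 1
    fi
    echo "ok: $path"
}
```

For example, `check_input_file hg19.fa && ls -l hg19.fa` shows the owner and permission bits alongside the result.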



                          • #14
                            I went through this earlier, but I am not getting a segmentation fault. I also have the latest version of bwa, 0.5.7.

                            I am clueless too.



                            • #15
                              @peter
                              I will try the specifications you mentioned, Peter. My bwa is 0.5.7 (r1310).

                              @raonyguimaraes-
                              I have tried using chr1.fa file and it works perfectly fine.
                              My hg19.fa file is 3 GB big. So considering the bwa indexing limit of 4 GB, it should still work.

                              I have checked permissions for the file, they are perfectly fine.
                              I believe the problem is either in the concatenation or a size limit. I have used cat in both of the ways mentioned above and it's still not resolved.
                              I don't know what other problem there could be.

                              @sdvie - How can I check whether enough memory is available?
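
On the memory question, a minimal sketch of what could be inspected on a Linux machine follows. The helper names `report_system` and `file_exceeds_limit` are hypothetical. The default limit of 2147483647 bytes (just under 2 GiB) is an assumption: it tests one possible explanation, namely that a 32-bit program built without large-file support cannot open a file above that size, which would be consistent with chr1.fa working and the 3 GB file failing on an i686 system. Nothing in this thread confirms that this is the cause.

```shell
#!/bin/sh
# Sketch: report machine type, word size and free memory, and test
# whether a file is larger than a given byte limit.

report_system() {
    echo "machine: $(uname -m)"
    echo "word size: $(getconf LONG_BIT 2>/dev/null || echo unknown)"
    # free is a Linux tool; skip it quietly where it is not installed.
    if command -v free >/dev/null 2>&1; then
        free -m
    fi
}

# Usage: file_exceeds_limit FILE [LIMIT_BYTES]
# Returns 0 if FILE is larger than the limit, 1 if not, 2 if FILE is missing.
file_exceeds_limit() {
    path="$1"
    limit="${2:-2147483647}"   # assumed 2 GiB - 1 threshold
    [ -f "$path" ] || return 2
    size=$(wc -c < "$path" | tr -d ' ')
    [ "$size" -gt "$limit" ]
}
```

For example, `report_system; file_exceeds_limit hg19.fa && echo "hg19.fa is larger than 2 GiB"`.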
