  • Running 454 mapper by command line CLI

    Hello everybody,

    I am a total Linux novice trying to figure out the syntax for running the mapper from the CLI, and I don't know how to specify the annotation and target sequence files. I know how to start a mapping project and pull in reference and data files (i.e. runMapping referencepath [email protected]), but I don't know how to specify, or where to put, snp130.txt, geneRef.txt, or my targetsequences.giff files. I know this must be a pretty basic question for a bioinformatician. Many thanks

    Jeca

  • #2
    You need to set the GOLDENPATH environment variable to point to the directory containing those files. See the documentation. Play around a bit.


    • #3
      I have run into the same problem.

      I have put the genomic sequences under the "chromosomes" directory, and the annotation files, refGene.txt and snp131.txt, in the neighboring folder called "database". According to the documentation from Roche, the annotation files should be found this way, but I could not get it to work.

      Moreover, I have set the parent directory of the above directories, hg19, as the "GOLDENPATH":
      echo $GOLDENPATH
      /home/sulicon/data/hg19

      That didn't work either...

      I have also tried setting "GOLDENPATH" to "/home/sulicon/data" and using "hg19" as the name of the reference genome. Unfortunately, gsMapper wasn't able to recognize the folder structure.

      Any suggestions are appreciated.


      • #4
        The GOLDENPATH is pointing to "/home/sulicon/data" which contains a subfolder "hg19" which contains the subfolders "chromosomes" (single fa files) and "database" (containing snp131.txt, refLink.txt, refGene.txt and productName.txt), correct?

        And how did you start the mapper?

        Minimal example (assuming EST data as input):
        $ runMapping -cdna -gref hg19 READS.sff

        This should work ...

        Sven
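
Before starting the mapper, it may help to confirm that the layout described above is really in place. The following is only a sketch: `check_layout` is a hypothetical helper, and the file list is simply the one named in this post, not a list confirmed by Roche's documentation. Whether every one of these files is required is not established in this thread.

```shell
# Hypothetical helper: report which parts of the layout described in this
# post are missing below a genome build directory (e.g. $GOLDENPATH/hg19).
# Returns 0 if everything is present, 1 otherwise.
check_layout() {
    build_dir=$1
    missing=0
    # The two subfolders mentioned in the post.
    for sub in chromosomes database; do
        if [ ! -d "$build_dir/$sub" ]; then
            echo "missing directory: $build_dir/$sub"
            missing=1
        fi
    done
    # Annotation files as listed in the post (assumed, not verified).
    for f in snp131.txt refLink.txt refGene.txt productName.txt; do
        if [ ! -f "$build_dir/database/$f" ]; then
            echo "missing file: $build_dir/database/$f"
            missing=1
        fi
    done
    return $missing
}
```

It could be called as `check_layout "$GOLDENPATH/hg19"`; no output and a zero exit status mean that every listed item was found.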


        • #5
          Hi Sven,

          Thanks very much! I have tried what you said:

          $ echo $GOLDENPATH
          /home/shuli/data
          $ ls /home/sulicon/data/hg19
          chromosomes database
          $ runMapping -cdna -gref hg19 /path/to/reads/reads.sff
          Error: Reference file/directory does not exist: hg19

          I noticed you mentioned that "single fa files" should be put into the "chromosomes" folder, whereas I have put in FASTA files that each correspond to one chromosome. Maybe this is the problem? I will try it later...


          • #6
            1) This is what my (probably too fat) UCSC dir tree looks like,


            2) You set GOLDENPATH to /home/shuli/data but store the data under a different path, /home/sulicon/data. You are telling gsMapper to look in /home/shuli/data/hg19, which is probably not the correct dir.

            hth, Sven
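
The mismatch in point 2 can be caught with a quick test before running the mapper. This is a minimal sketch, assuming (as described in this thread) that the name passed to -gref is looked up as a subdirectory of GOLDENPATH. `check_gref` is a hypothetical helper name, not part of the Roche software.

```shell
# Hypothetical helper: show where a -gref name would have to live,
# assuming it is resolved as $GOLDENPATH/<name>.
check_gref() {
    ref=$1
    if [ -z "${GOLDENPATH:-}" ]; then
        echo "GOLDENPATH is not set"
        return 1
    fi
    if [ -d "$GOLDENPATH/$ref" ]; then
        echo "found: $GOLDENPATH/$ref"
    else
        echo "missing: $GOLDENPATH/$ref"
        return 1
    fi
}
```

With GOLDENPATH set to /home/shuli/data and the data stored under /home/sulicon/data, `check_gref hg19` would report the directory as missing, unless a /home/shuli/data/hg19 directory also happens to exist.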


            • #7
              Thanks again.
              The GOLDENPATH variable has been corrected, but the reference sequence still isn't recognized...

              The following is the structure of my hg19 directory. It looks similar to yours.
              Code:
              $ tree hg19
              hg19
              |-- chromosomes
              |   |-- chr1.fa
              |   |-- chr10.fa
              |   |-- chr11.fa
              |   |-- chr11_gl000202_random.fa
              |   |-- chr12.fa
              |   |-- chr13.fa
              |   |-- chr14.fa
              |   |-- chr15.fa
              |   |-- chr16.fa
              |   |-- chr17.fa
              |   |-- chr17_ctg5_hap1.fa
              |   |-- chr17_gl000203_random.fa
              |   |-- chr17_gl000204_random.fa
              |   |-- chr17_gl000205_random.fa
              |   |-- chr17_gl000206_random.fa
              |   |-- chr18.fa
              |   |-- chr18_gl000207_random.fa
              |   |-- chr19.fa
              |   |-- chr19_gl000208_random.fa
              |   |-- chr19_gl000209_random.fa
              |   |-- chr1_gl000191_random.fa
              |   |-- chr1_gl000192_random.fa
              |   |-- chr2.fa
              |   |-- chr20.fa
              |   |-- chr21.fa
              |   |-- chr21_gl000210_random.fa
              |   |-- chr22.fa
              |   |-- chr3.fa
              |   |-- chr4.fa
              |   |-- chr4_ctg9_hap1.fa
              |   |-- chr4_gl000193_random.fa
              |   |-- chr4_gl000194_random.fa
              |   |-- chr5.fa
              |   |-- chr6.fa
              |   |-- chr6_apd_hap1.fa
              |   |-- chr6_cox_hap2.fa
              |   |-- chr6_dbb_hap3.fa
              |   |-- chr6_mann_hap4.fa
              |   |-- chr6_mcf_hap5.fa
              |   |-- chr6_qbl_hap6.fa
              |   |-- chr6_ssto_hap7.fa
              |   |-- chr7.fa
              |   |-- chr7_gl000195_random.fa
              |   |-- chr8.fa
              |   |-- chr8_gl000196_random.fa
              |   |-- chr8_gl000197_random.fa
              |   |-- chr9.fa
              |   |-- chr9_gl000198_random.fa
              |   |-- chr9_gl000199_random.fa
              |   |-- chr9_gl000200_random.fa
              |   |-- chr9_gl000201_random.fa
              |   |-- chrM.fa
              |   |-- chrUn_gl000211.fa
              |   |-- chrUn_gl000212.fa
              |   |-- chrUn_gl000213.fa
              |   |-- chrUn_gl000214.fa
              |   |-- chrUn_gl000215.fa
              |   |-- chrUn_gl000216.fa
              |   |-- chrUn_gl000217.fa
              |   |-- chrUn_gl000218.fa
              |   |-- chrUn_gl000219.fa
              |   |-- chrUn_gl000220.fa
              |   |-- chrUn_gl000221.fa
              |   |-- chrUn_gl000222.fa
              |   |-- chrUn_gl000223.fa
              |   |-- chrUn_gl000224.fa
              |   |-- chrUn_gl000225.fa
              |   |-- chrUn_gl000226.fa
              |   |-- chrUn_gl000227.fa
              |   |-- chrUn_gl000228.fa
              |   |-- chrUn_gl000229.fa
              |   |-- chrUn_gl000230.fa
              |   |-- chrUn_gl000231.fa
              |   |-- chrUn_gl000232.fa
              |   |-- chrUn_gl000233.fa
              |   |-- chrUn_gl000234.fa
              |   |-- chrUn_gl000235.fa
              |   |-- chrUn_gl000236.fa
              |   |-- chrUn_gl000237.fa
              |   |-- chrUn_gl000238.fa
              |   |-- chrUn_gl000239.fa
              |   |-- chrUn_gl000240.fa
              |   |-- chrUn_gl000241.fa
              |   |-- chrUn_gl000242.fa
              |   |-- chrUn_gl000243.fa
              |   |-- chrUn_gl000244.fa
              |   |-- chrUn_gl000245.fa
              |   |-- chrUn_gl000246.fa
              |   |-- chrUn_gl000247.fa
              |   |-- chrUn_gl000248.fa
              |   |-- chrUn_gl000249.fa
              |   |-- chrX.fa
              |   |-- chrY.fa
              |   `-- chromFa.tar.gz
              `-- database
                  |-- refGene.txt
                  |-- refLink.txt
                  `-- snp131.txt


              • #8
                What about "productName.txt"?


                • #9
                  I don't have this file. Is it required? And the problem is that even the reference genome can't be recognized:
                  "Error: Reference file/directory does not exist: hg19"

                  Maybe I need the "bigZips" folder as you did?


                  • #10
                    As Roche states in their manual that the suite recognizes the UCSC directory structure, I went the lazy way and just used the whole tree. I have not really tested which files/folders can be omitted.

                    Still, it is very strange that you get an error message stating that the file was not found. It sounds as if there is still a "mismatch" between the GOLDENPATH path and the actual data location.

                    Maybe it is best to try the whole tree and (if you are patient) remove the parts not needed for your mapping (though it is probably not worth removing files).


                    • #11
                      It turns out the reason was that I forgot to 'export' the GOLDENPATH variable... Everything is OK now.
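
For anyone hitting the same thing: a shell variable that is set but not exported is visible to `echo` in the current shell, yet it is not passed to programs started from that shell. The sketch below shows the difference using a plain `sh -c` child process as a stand-in for the mapper. The path is a placeholder.

```shell
# Start from a clean state so an earlier export does not hide the effect.
unset GOLDENPATH

# Set, but NOT exported: child processes do not inherit it.
GOLDENPATH=/tmp/example/data
not_exported=$(sh -c 'echo "${GOLDENPATH:-unset}"')

# Exported: now part of the environment passed to child processes.
export GOLDENPATH
exported=$(sh -c 'echo "${GOLDENPATH:-unset}"')

echo "before export: $not_exported"
echo "after export:  $exported"
```

This explains why `echo $GOLDENPATH` printed the expected path while the mapper still could not find the reference.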


                      • #12
                        Hello everybody

                        I want to map 454 reads to a genome with a threshold of 5 or 10 reads for consensus formation. In the GUI version of GSMapper this setting is called "Minimum contig depth", but I cannot find it in the CLI version of GSMapper.

                        Is this parameter available in the CLI version?

                        Thank you for your help
