
  • Orthomcl running problem

    Hi there,
    I am trying to run OrthoMCL on my Linux workstation and have run into this problem:
    Code:
    [root@genomics bin]# ./orthomcl-pipeline -i /home/zillur/Desktop/zillur/phd/orthomcl -o /home/zillur/Desktop/zillur/phd/orthomcl/output -m /usr/local/Tools/orthomcl/orthomcl-pipeline-master/etc/orthomcl.config.example --nocompliant
    Warning: directory "/home/zillur/Desktop/zillur/phd/orthomcl/output" already exists, are you sure you want to store data here [Y]? y
    Starting OrthoMCL pipeline on: Mon Sep 26 20:11:08 2016
    Git commit: unknown
    
    =Stage 1: Validate Files =
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PbergheiANKA_AnnotatedProteins.fasta ... 5076 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_Pchabaudichabaudi_AnnotatedProteins.fasta ... 5217 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_Pfalciparum3D7_AnnotatedProteins.fasta ... 5542 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_Pgallinaceum8A_AnnotatedProteins.fasta ... 3 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PknowlesiH_AnnotatedProteins.fasta ... 5323 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PvivaxSal1_AnnotatedProteins.fasta ... 5586 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PyoeliiyoeliiYM_AnnotatedProteins.fasta ... 5709 sequences
    Validated 7 files
    Stage 1 took 0.02 minutes 
    
    =Stage 2: Validate Database=
    Stage 2 took 0.00 minutes 
    
    
    =Stage 3: Load OrthoMCL Database Schema=
    /usr/bin/orthomclInstallSchema "/usr/local/Tools/orthomcl/orthomcl-pipeline-master/etc/orthomcl.config.example" "/home/zillur/Desktop/zillur/phd/orthomcl/output/log/orthomclSchema.log" 1>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stdout.log 2>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stderr.log
    Error executing command: /usr/bin/orthomclInstallSchema "/usr/local/Tools/orthomcl/orthomcl-pipeline-master/etc/orthomcl.config.example" "/home/zillur/Desktop/zillur/phd/orthomcl/output/log/orthomclSchema.log" 1>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stdout.log 2>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stderr.log. See logs /home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stdout.log and /home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stderr.log
    The log is as follows:
    Code:
    [zillur@genomics log]$ more 3.loadschema.stderr.log 
    Can't locate OrthoMCLEngine/Main/Base.pm in @INC (@INC contains: /usr/bin/../lib/perl /root/perl5/lib/perl5/x86_64-linux-thread-multi /root/perl5/lib/perl5 /home/zillur/perl5/lib/perl5/x86_64-linux-thread-multi /home/zillur/perl5/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/bin/orthomclInstallSchema line 6.
    BEGIN failed--compilation aborted at /usr/bin/orthomclInstallSchema line 6.
    Any suggestions, please?

    Best Regards
    Zillur

  • #2
    Does anybody have any idea? I would appreciate your help.

    Best Regards
    Zillur



    • #3
      Hi

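      It looks like you need to set your PERL5LIB environment variable so that it points to the directory containing the OrthoMCL Perl modules (the one that holds OrthoMCLEngine/Main/Base.pm, which the error says Perl cannot find). Something like this:
      export PERL5LIB=/path/to/orthomcl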
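
To find the right value for PERL5LIB, one option is to search the OrthoMCL install tree for the module named in the error message. The following is only a minimal sketch, assuming the Perl modules sit somewhere under a single install root. The function name `find_perl5lib_dirs` is hypothetical and not part of OrthoMCL; the module path is taken from the error log above.

```python
import os


def find_perl5lib_dirs(root, module_relpath="OrthoMCLEngine/Main/Base.pm"):
    """Return directories under `root` that contain `module_relpath`.

    Each returned directory is a candidate value for PERL5LIB, because Perl
    looks up OrthoMCLEngine/Main/Base.pm relative to every entry in @INC.
    `root` is whatever install directory the caller wants to search
    (an assumption of this sketch, not a path known from the thread).
    """
    parts = module_relpath.split("/")
    matches = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        # A directory qualifies if the module file exists beneath it
        if os.path.isfile(os.path.join(dirpath, *parts)):
            matches.append(dirpath)
    return sorted(matches)
```

Calling `find_perl5lib_dirs("/path/to/orthomcl")` returns a (possibly empty) list; any directory it returns could then be used in the `export PERL5LIB=...` line.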

      One suggestion, though: have you tried OrthoFinder? It's far easier to run, since it requires just a single command. It's also a lot more accurate than OrthoMCL:
      Phylogenetic orthology inference for comparative genomics - davidemms/OrthoFinder


      David



      • #4
        Thank you very much for your suggestions. Yes, I have tried OrthoFinder and it gave me outputs. I wanted to run OrthoMCL to compare, but maybe that's not necessary now. Do you have any suggestions for how I can process the outputs to get a gene presence/absence matrix?

        Thank you again.

        Best Regards
        Zillur



        • #5
          The file Orthogroups.csv is effectively a presence/absence matrix: The rows are orthogroups and the columns are species so if there are any genes listed in the i,j-th cell then the ith orthogroup is present in the jth species.
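
As a rough illustration of reading the file this way, the sketch below maps each orthogroup to the species whose cell is non-empty. It assumes a tab-delimited file with a header row of species names; the function name `species_per_orthogroup` and the toy species and gene names are made up for the example.

```python
import csv
import io


def species_per_orthogroup(handle):
    """Map each orthogroup ID to the species that have at least one gene in it."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    species = header[1:]  # the first header cell belongs to the orthogroup column
    presence = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # A non-empty cell means the orthogroup is present in that species
        presence[row[0]] = [name for name, cell in zip(species, row[1:]) if cell.strip()]
    return presence


# Toy input standing in for a real Orthogroups.csv (hypothetical names)
example = io.StringIO(
    "\tspeciesA\tspeciesB\tspeciesC\n"
    "OG0000000\tgeneA1, geneA2\t\tgeneC1\n"
    "OG0000001\t\tgeneB1\t\n"
)
print(species_per_orthogroup(example))
# {'OG0000000': ['speciesA', 'speciesC'], 'OG0000001': ['speciesB']}
```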

          All the best
          David



          • #6
            Thank you very much for your comment. I want a matrix like:

            Code:
                   genome1  genome2  genome3
            gene1  1        0        0
            gene2  0        0        0
            gene3  1        1        1
            gene4  0        0        1
            How can I do this?

            Best Regards
            Zillur



            • #7
              You'd just need to replace empty cells with 0 and cells that contain text with 1.

              All the best
              David



              • #8
                Thank you very much for your reply.
                "You'd just need to replace empty cells with 0 and cells with text in with 1."
                That is exactly what I want to do. But how can I replace them?

                Thanks for your suggestions.
                Best Regards
                Zillur



                • #9
                  This is a python script that will do it for you:


                  Code:
                  import sys
                  import csv
                  
                  if len(sys.argv) != 2:
                      print("Usage: python presence_absence.py Orthogroups.csv")
                      sys.exit()
                  
                  inFN = sys.argv[1]
                  outFN = inFN + ".01_matrix.csv"
                  with open(inFN, 'rb') as infile, open(outFN, 'wb') as outfile:
                      reader = csv.reader(infile, delimiter="\t")
                      writer = csv.writer(outfile, delimiter="\t")
                      writer.writerow(reader.next())
                      for line in reader:
                          writer.writerow(line[:1] + [0 if "" == cell else 1 for cell in line[1:]])

                  All the best
                  David



                  • #10
                    Thank you very much for your script. I tried to run it, but got this error:
                    Code:
                    [zillur@genomics Results_Sep26]$ python matrix_convert_binary.py Orthogroups.csv
                    Traceback (most recent call last):
                      File "matrix_convert_binary.py", line 14, in <module>
                        writer.writerow(reader.next())
                    AttributeError: '_csv.reader' object has no attribute 'next'
                    My system is:
                    Code:
                    [zillur@genomics Results_Sep26]$ python
                    Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06) 
                    [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
                    Type "help", "copyright", "credits" or "license" for more information.
                    I am not sure what I need to modify. Any idea?
                    Thanks again.

                    Best Regards
                    Zillur



                    • #11
                      It was written for Python 2. Below is a version that works with both Python 2 and 3:

                      Code:
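                      import sys
                      import csv
                      
                      # Expect exactly one argument: the path to Orthogroups.csv
                      if len(sys.argv) != 2:
                          print("Usage: python presence_absence.py Orthogroups.csv")
                          sys.exit()
                      
                      inFN = sys.argv[1]
                      outFN = inFN + ".01_matrix.csv"  # output is written next to the input
                      with open(inFN, 'r') as infile, open(outFN, 'w') as outfile:
                          # The file is tab-delimited despite its .csv extension
                          reader = csv.reader(infile, delimiter="\t")
                          writer = csv.writer(outfile, delimiter="\t")
                          # Copy the header row unchanged; the next() builtin works in Python 2 and 3
                          writer.writerow(next(reader))
                          for line in reader:
                              # Keep the orthogroup ID, then write 0 for an empty cell and 1 otherwise
                              writer.writerow(line[:1] + [0 if "" == cell else 1 for cell in line[1:]])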



                      • #12
                        Thank you very much for your valuable suggestions. The code converted the matrix into a binary matrix perfectly. The problem is that I can't load the new CSV file into R correctly. The file itself looks like this:
                        Code:
                        [zillur@genomics Results_Sep26]$ head Orthogroups.csv.01_matrix.csv 
                        	PlasmoDB-28_PbergheiANKA_AnnotatedProteins.fasta	PlasmoDB-28_Pchabaudichabaudi_AnnotatedProteins.fasta	PlasmoDB-28_Pfalciparum3D7_AnnotatedProteins.fasta	PlasmoDB-28_Pgallinaceum8A_AnnotatedProteins.fasta	PlasmoDB-28_PknowlesiH_AnnotatedProteins.fasta	PlasmoDB-28_PvivaxSal1_AnnotatedProteins.fasta	PlasmoDB-28_PyoeliiyoeliiYM_AnnotatedProteins.fasta
                        OG0000000	1	1	0	0	0	0	1
                        OG0000001	1	1	1	0	1	1	1
                        OG0000002	0	0	0	0	0	1	0
                        OG0000003	0	0	0	0	1	1	0
                        OG0000004	1	1	0	0	0	0	1
                        OG0000005	0	0	0	0	1	0	0
                        OG0000006	1	1	0	0	0	0	1
                        OG0000007	1	1	1	0	1	1	1
                        OG0000008	0	0	1	0	0	0	0

                        But when I load the csv in R, it looks like:

                        Code:
                        > data = read.csv("Orthogroups.csv.01_matrix.csv", sep=",")
                        > head(data)
                          PlasmoDB.28_PbergheiANKA_AnnotatedProteins.fasta.PlasmoDB.28_Pchabaudichabaudi_AnnotatedProteins.fasta.PlasmoDB.28_Pfalciparum3D7_AnnotatedProteins.fasta.PlasmoDB.28_Pgallinaceum8A_AnnotatedProteins.fasta.PlasmoDB.28_PknowlesiH_AnnotatedProteins.fast ...
                        1                                                                                                                                                                                                                                 OG0000000\t1\t1\t0\t0\t0\t0\t1
                        2                                                                                                                                                                                                                                 OG0000001\t1\t1\t1\t0\t1\t1\t1
                        3                                                                                                                                                                                                                                 OG0000002\t0\t0\t0\t0\t0\t1\t0
                        4                                                                                                                                                                                                                                 OG0000003\t0\t0\t0\t0\t1\t1\t0
                        5                                                                                                                                                                                                                                 OG0000004\t1\t1\t0\t0\t0\t0\t1
                        6                                                                                                                                                                                                                                 OG0000005\t0\t0\t0\t0\t1\t0\t0
                        What should I do now?
                        Thanks again for your help and comment.

                        Best Regards
                        Zillur



                        • #13
                          It's a tab-delimited file; try this instead:
                          data = read.csv("Orthogroups.csv.01_matrix.csv", sep="\t")
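
The file extension does not determine the separator: the matrix file here ends in .csv but uses tabs. When in doubt, a quick check of the header line shows which separator is in use. The sketch below is a simple heuristic rather than a robust parser; the function name `guess_delimiter` is hypothetical, and the header uses shortened versions of the column names from this thread.

```python
def guess_delimiter(header_line, candidates=("\t", ",", ";")):
    """Return the candidate delimiter that occurs most often in the header line.

    Returns None when none of the candidates appears at all.
    """
    counts = {d: header_line.count(d) for d in candidates}
    best = max(candidates, key=lambda d: counts[d])
    return best if counts[best] > 0 else None


# Shortened, illustrative header line in the style of the matrix file
header = "\tPbergheiANKA.fasta\tPfalciparum3D7.fasta\tPvivaxSal1.fasta"
print(repr(guess_delimiter(header)))  # '\t'
```

Whatever this reports is the value to pass as `sep` when loading the file.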



                          • #14
                            Thank you very much. Got it.

                            Best Regards
                            Zillur
