
  • No "combined/" subdirectory after isoform-level clustering with pbtranscript-tofu

    "tofu_wrap.py" divides the input into size bins, runs clustering on each bin, and combines the results afterwards. The problem is that there is no 'combined/' subdirectory in the tofu_wrap output, so I can't get the final output files I want to work with.
    The command I used is below:
    Code:
    tofu_wrap.py --nfl_fa isoseq_nfl.fasta --ccs_fofn reads_of_insert.fofn --bas_fofn input.fofn -d clusterOut --quiver --gmap_db /zs32/data-analysis/liucy_group/llhuang/Reflib/gmapdb --gmap_name hg19 isoseq_flnc.fasta final.consensus.fa
    Because my server is a high-powered single-node computer, I could not install SGE successfully and therefore did not use the '--use_sge' parameter. I don't know whether this is the cause of the problem. By the way, I have already tried adding '--bin_manual' back to my command, following Bowhan's advice (thank you), but there is still no 'combined/' subdirectory in the output. The directory clusterOut contains:
    0to1kb/
    1to2kb/
    2to3kb/
    3to4kb/
    4to5kb/
    fasta_fofn_files/
    Moreover, what should I do next once "tofu_wrap.py" runs successfully? I want to find the differences in transcripts between human and mouse using third-generation sequencing data.
    Any advice will be appreciated. Thank you in advance!

  • #2
    Can you please paste the log, and the contents of the output directory (for example, the output of the `tree` command)?



    • #3
      Originally posted by bowhan View Post
      can you please paste the log? and the content of the output directory (the output of `tree` command for example).
      I have tried many times, and it always ends with the following error message: "Segmentation fault (core dumped)". There is no log file.
      BUG:
      [Attached image: bug.png]
      "clusterOut":
      [Attached image: Cluster out.png]
      All files:
      [Attached image: all files.png]



      • #4
        Originally posted by lingling huang View Post
        I have tried so many times, and it always presented the following error message:"Segmentation fault (core dumped)". There is no log file .
        BUG:
        [ATTACH]4392[/ATTACH]
        "clusterOut "
        [ATTACH]4390[/ATTACH]
        all files:
        [ATTACH]4391[/ATTACH]
        The job failed because the system call to the `blasr` command failed.
        It actually gave an error message complaining that "m151230.../29032/536_60_CCS" is not unique. This is odd if you did not modify the Iso-Seq run outputs yourself.

        Can you please check how many times this header appears in your input `isoseq_flnc.fasta` file? Perhaps with
        Code:
        grep '/29032/536_60_CCS' isoseq_flnc.fasta
        and see how many times it appears.

        That said, I am not sure whether this has anything to do with the segmentation fault, which is usually caused by a memory issue. But let's see whether fixing the duplicate FASTA entries makes your issue go away.



        • #5
          It appears twice.



          • #6
            Originally posted by lingling huang View Post
            two times it has appeared.
            Can you please check how many of the headers are duplicated? Perhaps with
            Code:
            awk '/^>/{++a[$1]} END{for (b in a) if (a[b] > 1) printf "%s\t%d\n", b, a[b]}' isoseq_flnc.fasta
            Thanks



            • #7
              Originally posted by bowhan View Post
              can you please check how many of them are duplicated? perhaps with
              Code:
              awk '/>/{++a[$1]}END{for(b in a) if(a[b]>1) printf "%s\t%d\n", b, a[b]}'  isoseq_flnc.fa
              Thanks
              I don't understand the results:
              [Attached image: result.png]



              • #8
                Originally posted by lingling huang View Post
                I don't understand the results
                [ATTACH]4394[/ATTACH]
                The awk command parses the input file line by line and counts how many times each header appears. At the end, it prints every header that appears more than once, together with its count.
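
For readers who find awk hard to follow, the same counting logic can be sketched in Python. This is only an illustration of what the one-liner does and is not part of tofu; the function name `duplicated_headers` is made up for this example. Like awk's `$1`, it keeps only the first whitespace-delimited token of each header line.

```python
from collections import Counter


def duplicated_headers(lines):
    """Return {header: count} for FASTA headers that occur more than once.

    `lines` is any iterable of text lines, for example an open FASTA file.
    Hypothetical helper mirroring the awk one-liner; not part of tofu.
    """
    counts = Counter(
        line.split()[0]            # first token of the header, like awk's $1
        for line in lines
        if line.startswith(">")    # only header lines are counted
    )
    return {header: n for header, n in counts.items() if n > 1}
```

To run it on a real file, open the FASTA file and pass the handle in, for example `duplicated_headers(open("isoseq_flnc.fasta"))`. An empty result means no header is repeated.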

                Looks like all of your sequences are duplicated.

                Can you please check your `input.fofn` file (the one you fed into `ConsensusTools.sh CircularConsensus`) to see whether each line (a path to a `bax.h5` file) is unique, or whether each file appears twice?
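
A uniqueness check on the FOFN can also be scripted instead of done by eye. The following is a minimal sketch, assuming the FOFN is a plain text file with one path per line; the function name `repeated_fofn_entries` is hypothetical. Paths are normalized before counting so that trivially different spellings of the same path are treated as one entry.

```python
import os
from collections import Counter


def repeated_fofn_entries(lines):
    """Return {path: count} for paths listed more than once in a FOFN.

    Blank lines are ignored. Paths are normalized first, so that
    "dir//movie.1.bax.h5" and "dir/movie.1.bax.h5" count as the same file.
    """
    paths = [os.path.normpath(line.strip()) for line in lines if line.strip()]
    return {path: n for path, n in Counter(paths).items() if n > 1}
```

Called as `repeated_fofn_entries(open("input.fofn"))`, it returns an empty dictionary when every line is unique. It does not follow symbolic links, so two different paths pointing at the same file would not be detected.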



                • #9
                  Originally posted by bowhan View Post
                  The awk command parses the input file line by line, counting how many times each header appears. At last, it prints out all the headers (with their times of appearances) if it appears more than once.

                  Looks like all of your sequences are duplicated.

                  Can you please check your `input.fofn` file (the one you fed into `ConsensusTools.sh CircularConsensus`) to see if each line (which is a path to a `bax.h5` file) is unique? Or you have each file appearing twice.
                  Each line in my input.fofn file seems to be unique.
                  [Attached image: input.png]



                  • #10
                    Originally posted by lingling huang View Post
                    each line in my input.fofn file seems to be unique.
                    [ATTACH]4396[/ATTACH]
                    You seem to have one `bax.h5` file missing and one `bas.h5` file listed instead. Can you please fix that and try again?
                    You should already see the duplication after the `CircularConsensus` run (no need to run pbtranscript and tofu).
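
A mix of `bas.h5` and `bax.h5` entries is easy to miss when every line is unique as a string. The sketch below groups FOFN entries by movie name and reports movies that are listed both ways or with fewer `bax.h5` parts than expected. It rests on two assumptions that are not stated in this thread: that files are named `<movie>.bas.h5` or `<movie>.<part>.bax.h5`, and that each movie has three `bax.h5` parts. The function name `fofn_problems` is hypothetical.

```python
import os
import re
from collections import defaultdict

# Assumed naming scheme: <movie>.bas.h5 or <movie>.<part>.bax.h5
H5_NAME = re.compile(r"^(?P<movie>.+?)\.(?:(?P<part>\d+)\.bax|bas)\.h5$")


def fofn_problems(lines, expected_parts=3):
    """Return human-readable warnings about a list of FOFN entries."""
    bas_count = defaultdict(int)    # movie -> number of bas.h5 entries
    bax_parts = defaultdict(list)   # movie -> bax.h5 part numbers seen
    problems = []

    for line in lines:
        path = line.strip()
        if not path:
            continue
        match = H5_NAME.match(os.path.basename(path))
        if match is None:
            problems.append(f"unrecognized entry: {path}")
        elif match.group("part") is None:
            bas_count[match.group("movie")] += 1
        else:
            bax_parts[match.group("movie")].append(int(match.group("part")))

    for movie in sorted(set(bas_count) | set(bax_parts)):
        parts = sorted(bax_parts.get(movie, []))
        if bas_count.get(movie, 0) and parts:
            # The same movie is referenced through both file types.
            problems.append(f"{movie}: listed as bas.h5 and as bax.h5 parts {parts}")
        elif parts and len(set(parts)) < expected_parts:
            problems.append(f"{movie}: only bax.h5 parts {parts} listed")
    return problems
```

An empty list means nothing suspicious was found under these assumptions. Adjust `expected_parts` if your data are laid out differently.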



                    • #11
                      Originally posted by bowhan View Post
                      You seems to have one `bax.h5` missing and one `bas.h5` there. Can you please fix that and try again?
                      You should already see the duplication after `CircularConsensus` run (no need to run pbtranscript and tofu)
                      I'll try it. Thank you so much. You are a great help to me!



                      • #12
                        Originally posted by bowhan View Post
                        You seems to have one `bax.h5` missing and one `bas.h5` there. Can you please fix that and try again?
                        You should already see the duplication after `CircularConsensus` run (no need to run pbtranscript and tofu)
                        Hi bowhan, sorry to bother you again. I did what you suggested: I fixed my input.fofn file, and each line is unique. However, when I run tofu_wrap.py, another error appears:
                        [Attached image: error.png]

                        Log files (c7to372.sh.elog and c7to372.sh.olog):
                        [Attached image: c7to372.sh.elog.png] [Attached image: c7to372.sh.olog.png]



                        • #13
                          Originally posted by lingling huang View Post
                          Hi,bowhan, Sorry for disturbing you again,I have tried what you told,I fixed my input.fofn file and each line is unique. However,when running tofu_wrap.py , it appears an error again:
                          [ATTACH]4398[/ATTACH]

                          log files:
                          c7to372.sh.elog and c7to372.sh.olog
                          [ATTACH]4399[/ATTACH] [ATTACH]4400[/ATTACH]
                          This is clearly a different error now: it says that pysam is not installed.
                          Did you follow the instructions here to install tofu? They ask you to install pysam with `pip install pysam`. If not, please follow them to install it. Make sure you do it inside smrtshell and the virtualenv.
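
Because a package installed outside the active environment is invisible to the interpreter that tofu runs under, it helps to check from within that same environment. The following is a minimal sketch, assuming a Python 3.6+ interpreter; the helper names `module_available` and `report` are made up for this example. On an older interpreter, running `python -c "import pysam"` in the same shell gives the same information.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


def report(name="pysam"):
    """Describe whether `name` is importable and which Python was checked."""
    status = "found" if module_available(name) else "NOT found"
    return f"{name}: {status} (interpreter: {sys.executable})"
```

Printing `report()` shows both the result and the interpreter path, which makes it obvious when the check ran outside the intended virtualenv.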



                          • #14
                            Originally posted by bowhan View Post
                            It clearly had a different error now, which said that you didn't have pysam.
                            Were you following the instruction here to install tofu? It asks you to install pysam with `pip install pysam`. If not, please follow the instructions to install it. Make sure you do it under smrtshell and virtualenv.
                            Hi bowhan, how is it going? I'm going out of my mind! I have run into a new problem, shown below.

                            [Attached image: read.png]



                            • #15
                              Not sure if this problem was solved, but I wonder whether some of the .bax.h5 files are actually missing (not just absent from input.fofn, but nonexistent on disk). This has occasionally happened during file transfers.

                              If you are still experiencing the issue, try checking that all the files listed in input.fofn exist.

                              You can do so by downloading this simple script, file_exists.py (hosted on GitHub), and then running
                              `python file_exists.py input.fofn`

                              If all the listed files exist, it will say the check passed. Otherwise, it will tell you which ones are missing. Remove the missing .bax.h5 files from input.fofn.
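
If the script cannot be downloaded, the same kind of check is short enough to write by hand. The sketch below is not the script referred to above; it is a stand-in that assumes the FOFN holds one path per line, and the name `missing_files` is hypothetical. The `exists` parameter defaults to `os.path.isfile` and is only there so the logic can be tried without touching the file system.

```python
import os


def missing_files(lines, exists=os.path.isfile):
    """Return the paths from a FOFN listing that do not exist.

    `lines` is any iterable of text lines (for example an open FOFN file).
    Blank lines are skipped; the order of the listing is preserved.
    """
    paths = [line.strip() for line in lines if line.strip()]
    return [path for path in paths if not exists(path)]
```

For example, `missing_files(open("input.fofn"))` returns an empty list when every listed file is present, and otherwise the list of paths to remove or re-transfer.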
