  • SplitSeek - de novo detection of splice junctions

    Hi,

    We have been working on a new method for de novo detection of splice junctions in RNA-seq data, called SplitSeek, which was published today in Genome Biology:



    The code can be downloaded from:



    Hope you like it!

    Adam Ameur

  • #2
    Hi Adam,

    After checking out the paper and the documentation, I assume that the two anchors in "split_read_mapper" should have the same length. Am I right? Does it make sense to use different anchor lengths for the left and right splits? (I am asking because by default "split_read_mapper" suggests 25 for the left split and 30 for the right split.)

    Thanks

    • #3
      Hi fennan,

      Yes, I think it makes sense to use the same length for the two anchors in this application.

      Adam

      • #4
        Hi Adam,

        I am trying to use SplitSeek for novel junction detection. After running split_read_mapper I get the following error:


        SEVERE: Schema file /WT/ab_wtp_v1.2.1/etc/schemas/schema_18_2_adj could not be found.
        java.lang.Exception: Schema file /WT/ab_wtp_v1.2.1/etc/schemas/schema_18_2_adj could not be found.
        at com.lifetechnologies.solid.wt.ReadMapper.startMapReadsJob(ReadMapper.java:129)
        at com.lifetechnologies.solid.wt.mapper.MappingTask.doTask(MappingTask.java:95)
        at com.lifetechnologies.solid.wt.mapper.Mapper.mapReads(Mapper.java:462)
        at com.lifetechnologies.solid.wt.cmd.MapperCmd.runCmd(MapperCmd.java:245)
        at com.lifetechnologies.solid.wt.cmd.AbstractCmd.run(AbstractCmd.java:221)
        at com.lifetechnologies.solid.wt.cmd.CmdRunner.main(CmdRunner.java:20)

        Any ideas? Thanks.

        • #5
          I ran into a similar problem when running split_read_mapper. Because of the "not-so-great" documentation, it took me a while to see where the problem was.

          The thing is that the basic installation of this software does not provide the whole set of schemas that might be needed at execution time. You are probably using a value of 18 for wt.mapper.split.left.length and/or wt.mapper.split.right.length. This, combined with the 2 mismatches that the program allows by default, makes the algorithm look for the schema "schema_18_2". I do have that file in the "etc/schemas/" folder, but for some reason I don't know, sometimes a "schema_*_*_adj" file is needed as well. Maybe someone here can shed light on this.

          Looking into the RELEASE_NOTES file:
          'Mapping tasks will fail if user specifies a mapping configuration for which there is no corresponding schema file in etc/schemas. Is this case, the user will receive an error like: "Schema file schema_##_# could not be found". Missing schemas may be available through support.'
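
A quick way to see which mapping configurations an installation can actually serve is to list the schema files it ships with. The sketch below is only an illustration: it assumes the file names follow the pattern seen in this thread (schema_<length>_<mismatches>, optionally ending in _adj), and the function names are made up for this example rather than part of the SOLiD software.

```python
import re
from pathlib import Path

# Assumed naming pattern, inferred from the file names quoted in this thread:
# schema_<length>_<mismatches>, optionally followed by _adj.
SCHEMA_NAME = re.compile(r"^schema_(\d+)_(\d+)(_adj)?$")


def available_schemas(schema_dir):
    """Return a sorted list of (length, mismatches, is_adj) tuples found in schema_dir."""
    found = []
    for path in Path(schema_dir).iterdir():
        match = SCHEMA_NAME.match(path.name)
        if match:
            length = int(match.group(1))
            mismatches = int(match.group(2))
            is_adj = match.group(3) is not None
            found.append((length, mismatches, is_adj))
    return sorted(found)


def has_schema(schema_dir, length, mismatches, adj=False):
    """True if a schema file for this combination exists in schema_dir."""
    return (length, mismatches, adj) in available_schemas(schema_dir)
```

Calling available_schemas() on the "etc/schemas" folder of the installation before starting a run shows which split lengths and mismatch counts are covered, so a missing combination can be spotted before the mapping job is submitted.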

          I contacted the official SOLiD support because I needed "schema_31_2", but after almost 2 weeks I have not received an answer... Let me know if you have better luck.

          I hope this helps!

          • #6
            Thanks Fennan. I am running with a split read length of 25 now. It is taking forever to run. The first time, it was still at the extension phase after 2 days. I stopped it and am running it again. Hopefully it will finish this time. Do you know how long it takes to run?
            I will be contacting ABi for the other schemas once I know I can run the program. I will let you know if something worthwhile happens.

            Thanks again.

            • #7
              Hi anjana,

              It seems you're going through the same problems I had two weeks ago! Check this thread, as it might help you: http://seqanswers.com/forums/showthread.php?t=4898

              It also happened to me that the mapping wouldn't finish. Again, the lack of documentation and support made it very hard to see where the problem was.

              The main problem is that the main program "does not always realize" that a job has failed, so it keeps waiting for it to finish. Checking the mapper.log file and the intermediate output files placed in the "output/tmp/" folder might help you. I had two main problems:

              1) RAM: if there are too many jobs on the same node, the node will collapse.
              2) Hard drive space: the intermediate files are VERY big, so make sure you have enough space for them.

              In addition, to reduce the execution time it is very important to keep the input files on the local hard drives of your nodes, so that the network file system is not constantly used.
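
Both checks can be scripted so that a stalled run is noticed early. This is a minimal sketch, not something shipped with the pipeline: it assumes failures show up in the log as lines containing "SEVERE" (as in the error quoted earlier in this thread), and the helper names are invented for this example.

```python
import shutil


def severe_lines(log_path):
    """Return the log lines that contain the marker 'SEVERE'."""
    with open(log_path, errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if "SEVERE" in line]


def free_gib(directory):
    """Free space, in GiB, on the file system that holds `directory`."""
    return shutil.disk_usage(directory).free / 2**30
```

For example, running severe_lines() on the mapper.log file and free_gib() on the "output/tmp/" folder every few minutes gives an early hint that a job has died or that the disk is filling up, instead of waiting days for a run that will never finish.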

              Good luck!

              • #8
                Hi anjana.vr,
                I got the same error that you described. Have you solved it by now, and if so, could you tell me how?
                Thank you for your help

                • #9
                  Hi whfwind,

                  The problem was that schema 18_2 was not present. I asked the ABI FAs for that schema, but then I realized that I only needed the 18_1 schema.

                  Check the schema directory and see whether you have the offending schema file. If you don't, you should probably ask your ABi FAS.

                  • #10
                    I just revised the parameters for the lengths of the first and second split parts of a read,
                    and got an error saying that schema_18_2_adj could not be found. I think I really need this file to run my program,
                    but I do not know how to generate it myself.
                    Thank you

                    • #11
                      Hi fennan,
                      I now need schema_18_2_adj, so I would like to know how you solved your problem. I hope you can help me.

                      Thank you !

                      • #12
                        Hi!

                        Sorry for the late reply. One thing that might work is to change the 'valid adjacent mismatches' parameter in the config file:

                        -----------------

                        ## whether or not to count valid adjacent mismatches as one mismatch, instead of two
                        VALID_ADJACENT_MISMATCHES_COUNT_AS_ONE_FOR_MAPPING_AND_EXTENSION

                        -----------------

                        I guess that 'schema_18_2' will then be used instead of 'schema_18_2_adj' and maybe that works better for you.
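
The guess above can be written down as a small rule for predicting which schema file a given configuration will ask for. This is only a sketch of that guess, not confirmed behaviour of the mapper: the function name is hypothetical, and the link between the adjacent-mismatch setting and the "_adj" suffix is an assumption based on the file names reported in this thread.

```python
def expected_schema_name(split_length, mismatches, adjacent_as_one):
    """Guess the schema file name for a configuration.

    Assumption (unverified): the '_adj' variant is requested only when
    valid adjacent mismatches are counted as one mismatch.
    """
    name = f"schema_{split_length}_{mismatches}"
    return name + "_adj" if adjacent_as_one else name
```

With a split length of 18 and 2 mismatches, this gives "schema_18_2_adj" when adjacent mismatches count as one and "schema_18_2" otherwise, which matches the two file names mentioned in the posts above.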
