  • #16
    Hi Valentina,

    I'm sorry - I didn't get your message, I would have replied sooner if I had known you'd asked a question!

    It sounds like you're trying to use the FindPeaks 3.2 instructions on FindPeaks 3.1. While there are a lot of similarities, FindPeaks 3.1 doesn't handle .map files or have the expandable parameters for -dist_type 1.

    To get FindPeaks 3.2, you might have to wait a day or two for a pre-compiled jar file (I plan to work on it tomorrow), or, if you're impatient, there are instructions in the wiki at http://vancouvershortr.wiki.sourceforge.net

    Anthony
    The more you know, the more you know you don't know. —Aristotle

    • #17
      Hi Kevin,

      FindPeaks 3.2 hasn't been officially released yet, but it's definitely functional, and has a lot of enhancements. Since it's being developed as GPL software on Sourceforge, you're more than welcome to download and play with it. It's very stable and is really only missing a few features before the official release.

      As I mentioned above, I will be providing pre-compiled jar files tomorrow.

      With respect to operating systems, I'm not aware of any reason that FindPeaks 3.2 wouldn't run under Windows, Mac or other systems. I've used FindPeaks on Red Hat and Ubuntu (and occasionally Solaris) over the past year, and haven't had any issues.

      I don't have access to a Windows computer, but if you'd like to tell me what errors you get on a Windows PC, I'd be happy to try to fix them.

      Anthony
      The more you know, the more you know you don't know. —Aristotle

      • #18
        I am in line for 3.2 as well
        --
        bioinfosm

        • #19
          Ok! I have a tagged, pre-compiled version up on the web. Feel free to give it a try!

          FindPeaks runs, and should do everything that it's expected to do. Obviously, I usually run from class files instead of the built jar files, so there may be issues that way, but they can be fixed VERY quickly if you let me know of any problems.

          The more you know, the more you know you don't know. —Aristotle

          • #20
            Hi Anthony!

            I use version 3.2 and it works fine! Thanks a lot!

            I don't know if it is still true for the latest version, I mean 3.2.2.3, but in 3.2.2(?) there was one thing missing:

            I wanted to use .map files from a maq alignment. I had two duplicate experiments, so I wanted to merge the data. But if I just merged the two .map files into one, FindPeaks seemed not to see the information from one of them.

            So I had to create files in Eland format, merge them, and then it was fine.

            Could you check this please? Or make it possible to pass to the program several files with aligned reads?

            Thanks in advance!

            Merry Christmas!

            Valentina

            • #21
              Hi Valentina,

              Sorry for the delay in the reply - I've been busy moving houses for the past week or so, and it's taking me a while to catch up on my correspondence.

              If you're using two .map files, you should be using the Maq mapmerge command, which combines the two (or more) files together into a single .map file. Map files are both a blessing and a curse because they're pre-sorted. If you're using a single .map file, FindPeaks will be very fast because no sorting is necessary - however, it also makes it very difficult to use multiple .map files, in case the chromosomes are in different orders, etc. It could be done, but since Maq already provides a utility for merging them, I've decided not to duplicate that functionality.
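For readers who script their pipelines, here is a minimal Python sketch of the mapmerge step mentioned above. It only builds the command line (output file first, then the input .map files) and prints it. The function name and the file names are hypothetical, and the sketch assumes the maq executable is on the PATH if you choose to run the command.

```python
import shlex


def mapmerge_command(output_map, input_maps, maq="maq"):
    """Build the argument list for `maq mapmerge <out.map> <in1.map> <in2.map> ...`."""
    if len(input_maps) < 2:
        raise ValueError("need at least two .map files to merge")
    return [maq, "mapmerge", output_map, *input_maps]


# Hypothetical file names for two replicate experiments.
cmd = mapmerge_command("merged.map", ["rep1.map", "rep2.map"])
print(shlex.join(cmd))

# To actually run the merge (requires maq to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

The merged file can then be given to FindPeaks as a single -input, in the same way as any other .map file.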

              At any rate, you shouldn't need to convert to eland format - that definitely is a step in the wrong direction.

              I'm currently in the process of deciding how to handle multiple .map files for multiple samples, so I'll take your suggestion into consideration while I work on that function.

              Anthony
              The more you know, the more you know you don't know. —Aristotle

              • #22
                Hey Anthony,

                Yes, I missed this Maq mapmerge command,
                thanks a lot for your help!

                Happy new year!
                Valentina

                • #23
                  Hi Anthony, I have a question about how to use the FDR functions in FindPeaks. Below is how I am using it without FDR, with just a minimum number of reads of 8:

                  Code:
                   java -jar ~/analysis-utils/fp3/FindPeaks.jar -name hES_H3K27me3.fastq -dist_type 1 200 -hist_size 50 -hist_precision 10 -eff_frac 0.8 -output fp3 -input hES_H3K27me3.fastq.map.maq -aligner maq -prepend chr -duplicatefilter -minimum 8
                  Using this command I detect ~25,000 peaks from 30M bowtie-aligned reads.

                  If I use -landerwaterman instead of -minimum, I obtain ~2M peaks and an FDR file. To filter out "false" peaks, do I then choose an FDR cutoff (e.g. 0.01), check the FDR file for the peak height associated with that FDR for each chromosome, and then remove peaks below this height? Or have I missed something obvious?

                  thanks

                  • #24
                    Hi frozenlyse,

                    The short answer is that no, you're not missing anything obvious. There are historical reasons for why the software is run that way, relating to the pipeline users at the GSC, but it's something I've been trying to replace for a while. (See below.)

                    I hope I've understood your question correctly - let me know if what I've written misses the point.

                    The -landerwaterman flag uses (obviously) a Lander-Waterman algorithm to identify the cutoff based upon the ratio of height 1 and height 2 peaks that are observed. (That code was provided by another researcher at the GSC, so I'm not intimately familiar with all its functions.) It should not, in and of itself, do any filtering at all. I would not expect it to change the number of peaks identified in the peaks file. (If that's your question, then I'm somewhat stumped, and I'd like to hear more about what's going on so I can look into it more closely.)

                    The -minimum flag is used to drop all peaks below a given height. I suspect that using the -minimum flag and the -landerwaterman flag at the same time should be contraindicated. (I believe FindPeaks tells you as much if you try to use them at the same time.) If you use -minimum, the Lander-Waterman calculation will not know how many peaks were found with height 1 and 2, and should fail to operate correctly.

                    There is also a flag labeled "-iterations", which provides a similar result but can be used for non-integer peak heights, based upon zero vs. non-zero regions. However, despite being functionally implemented, it hasn't had a lot of validation. (This is the FDR that I've been implementing, rather than the Lander-Waterman one. If anyone would like to try it out and help improve it, I'd love to hear from them.) Again, it should be used without the -minimum flag.

                    What is normally done is to run once with -landerwaterman (or -iterations) to find the peak height that corresponds to the FDR value you'd like, and then run a second time with -minimum to drop all peaks below that height. It seems redundant, but is actually very useful for pipeline work.
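The manual step between the two runs is looking up which height to pass to -minimum. A minimal Python sketch of that lookup follows. The thread does not describe the layout of the FDR file, so the sketch assumes the height/FDR pairs for one chromosome have already been parsed into a dictionary. The function name and the numbers are made up for illustration and are not FindPeaks output.

```python
def min_height_for_fdr(fdr_by_height, cutoff):
    """Return the lowest height such that it and every taller listed height
    have an estimated FDR at or below `cutoff`, or None if none qualifies."""
    best = None
    # Walk from the tallest peaks downward and stop at the first height
    # whose FDR exceeds the cutoff.
    for height in sorted(fdr_by_height, reverse=True):
        if fdr_by_height[height] > cutoff:
            break
        best = height
    return best


# Made-up (height -> estimated FDR) values for a single chromosome.
example = {1: 0.62, 2: 0.31, 4: 0.09, 6: 0.02, 8: 0.008, 10: 0.001}

height = min_height_for_fdr(example, cutoff=0.01)
if height is not None:
    # The value to use with -minimum in the second FindPeaks run.
    print(f"-minimum {height}")
```

Scanning from the tallest height downward avoids picking a low height that happens to pass the cutoff while a taller one does not, which could occur if the estimates are noisy.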

                    Going back to what I mentioned above, I'm about to start on a new round of FindPeaks development in which this should be cleaned up a bit more and will finally allow an FDR value to be passed at the command line. Hopefully that will simplify the process you've outlined above. The general problem was that all of the files were being written on the fly, so FindPeaks couldn't go back into the file and modify the written-out peaks after the FDR was calculated.

                    In the new development, FP will now cache the peaks and wig information, calculate the FDR, and then write out the files - so this should allow much more sophisticated processing - and get around the problem you've alluded to in your post.

                    I hope that helps.

                    Anthony
                    Last edited by apfejes; 01-07-2009, 10:41 PM. Reason: clarity and mistakes.
                    The more you know, the more you know you don't know. —Aristotle

                    • #25
                      Hi Anthony - that clears a lot of my confusion up

                      Just one more question: using the Lander-Waterman FDR (for now), the FDR is estimated for each height for each chromosome. Does it make sense to use a different height cutoff for each chromosome, or should I, say, take the median of all the heights closest to my desired cutoff (0.01 for now)?

                      Thanks again for findpeaks,
                      Aaron

                      • #26
                        Hi Aaron,

                        Ok, I'm glad that made sense - it was a much longer ramble than I would usually give on SeqAnswers. (-:

                        With respect to using chromosome-by-chromosome FDRs, both options have strengths and weaknesses. I've often noticed that one or two chromosomes have FDRs that are rather unusual, e.g. for many of the samples I've looked at, human chr 12 seems to have a higher threshold than the other chromosomes. I think this is very specific to the experiment being done, so I can't say what you should do.

                        If the standard deviation of the thresholds you get is very small, it's probably not a disaster to just apply one threshold to the whole genome. If you get widely different thresholds for each chr with the same FDR, you're clearly going to have to treat each chromosome separately. The point of calculating the FDR separately for each chr was simply that we noticed a global threshold sometimes isn't a great idea. At least this way you can evaluate whether a local or global threshold is more appropriate.
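As a rough illustration of that check, the Python sketch below compares the spread of per-chromosome height thresholds and falls back to a single median threshold when they agree closely. The `max_spread` limit of 1.0 is an arbitrary assumption (the post only says "very small"), the use of the median follows the suggestion in the question above, and the thresholds shown are made-up numbers.

```python
from statistics import median, pstdev


def choose_thresholds(per_chrom, max_spread=1.0):
    """Decide between one global threshold and per-chromosome thresholds.

    per_chrom maps chromosome name -> height threshold at the chosen FDR.
    max_spread is a hypothetical limit on the standard deviation.
    """
    heights = list(per_chrom.values())
    spread = pstdev(heights)
    if spread <= max_spread:
        # Thresholds agree closely: use their median genome-wide.
        return {"mode": "global", "spread": spread, "threshold": median(heights)}
    # Thresholds differ widely: keep each chromosome's own value.
    return {"mode": "per_chromosome", "spread": spread, "thresholds": dict(per_chrom)}


# Made-up thresholds at the same FDR cutoff.
similar = {"chr1": 8, "chr2": 8, "chr3": 9, "chr12": 8}
divergent = {"chr1": 6, "chr2": 7, "chr12": 15}

print(choose_thresholds(similar)["mode"])
print(choose_thresholds(divergent)["mode"])
```

Whatever limit is used, it is worth looking at the outlying chromosomes directly before deciding, since the right choice depends on the experiment.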

                        Cheers,

                        Anthony
                        The more you know, the more you know you don't know. —Aristotle
