**apfejes** · 11-13-2008, 12:22 PM

Hi Valentina,

I'm sorry - I didn't get your message, I would have replied sooner if I had known you'd asked a question!

It sounds like you're trying to use the FindPeaks 3.2 instructions with FindPeaks 3.1. While there are a lot of similarities, FindPeaks 3.1 doesn't handle .map files or have the expandable parameters for -dist_type 1.

To get FindPeaks 3.2, you might have to wait a day or two for a pre-compiled jar file (It's my plan to work on it tomorrow), or if you're impatient, there are instructions in the wiki at http://vancouvershortr.wiki.sourceforge.net

Anthony

**apfejes** · 11-13-2008, 12:27 PM

Hi Kevin,

FindPeaks 3.2 hasn't been officially released yet, but it's definitely functional, and has a lot of enhancements. Since it's being developed as GPL software on Sourceforge, you're more than welcome to download and play with it. It's very stable and is really only missing a few features before the official release.

As I mentioned above, I will be providing pre-compiled jar files tomorrow.

With respect to operating systems, I'm not aware of any reason that FindPeaks 3.2 wouldn't run under Windows, Mac or other systems. I've used FindPeaks on Red Hat and Ubuntu (and occasionally Solaris) over the past year, and haven't had any issues.

I don't have access to a Windows computer, but if you'd like to tell me what errors you get on a Windows PC, I'd be happy to try to fix them.

Anthony

**bioinfosm** · 11-13-2008, 01:59 PM

I am in line for 3.2 as well

**apfejes** · 11-15-2008, 01:27 PM

Ok! I have a tagged, pre-compiled version up on the web. Feel free to give it a try!

FindPeaks runs, and should do everything that it's expected to do. Obviously, I usually run from class files instead of the built jar files, so there may be issues that way, but they can be fixed VERY quickly if you let me know of any problems.

http://sourceforge.net/project/platformdownload.php?group_id=232586

**valeu** · 12-24-2008, 05:26 AM

Hi Anthony!

I use version 3.2 and it works fine! Thanks a lot!

I don't know if it is still true for the latest version (3.2.2.3), but in 3.2.2(?) there was one thing missing:

I wanted to use .map files from a Maq alignment. I had two duplicate experiments, so I wanted to merge the data. But if I just merge two .map files into one, FindPeaks doesn't seem to see the information from one of them.

So I had to create files in Eland format and merge them, and then it worked fine.

Could you check this please? Or make it possible to pass to the program several files with aligned reads?

Thanks in advance!

Merry Christmas!

Valentina

**apfejes** · 01-02-2009, 09:15 AM

Hi Valentina,

Sorry for the delay in the reply - I've been busy moving houses for the past week or so, and it's taking me a while to catch up on my correspondence.

If you're using two .map files, you should be using the Maq mapmerge command, which combines the two (or more) files together into a single .map file. Map files are both a blessing and a curse because they're pre-sorted. If you're using a single .map file, FindPeaks will be very fast because no sorting is necessary - however, it also makes it very difficult to use multiple .map files, in case the chromosomes are in different orders, etc. It could be done, but since Maq already provides a utility for merging them, I've decided not to duplicate that functionality.
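For anyone scripting the merge step, here is a minimal sketch in Python. It assumes the usual Maq argument order (output file first, then the input .map files); the helper name `mapmerge_command` and the file names are made up for illustration.

```python
import shlex

def mapmerge_command(merged_map, input_maps):
    """Build the argument list for `maq mapmerge <out.map> <in1.map> <in2.map> ...`."""
    if len(input_maps) < 2:
        raise ValueError("need at least two .map files to merge")
    return ["maq", "mapmerge", merged_map, *input_maps]

# Hypothetical file names for two duplicate experiments.
cmd = mapmerge_command("merged.map", ["rep1.map", "rep2.map"])
print(shlex.join(cmd))

# To actually run the merge (requires maq on the PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

The merged file can then be passed to FindPeaks through -input like any single .map file.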

At any rate, you shouldn't need to convert to eland format - that definitely is a step in the wrong direction.

I'm currently in the process of deciding how to handle multiple .map files for multiple samples, so I'll take your suggestion into consideration while I work on that function.

Anthony

**valeu** · 01-05-2009, 02:12 AM

Hey Anthony,

Yes, I missed this Maq mapmerge command,
thanks a lot for your help!

Happy new year!
Valentina

**frozenlyse** · 01-07-2009, 09:39 PM

Hi Anthony, I have a question about how to use the FDR functions in FindPeaks. Below is how I am using it without FDR, just with a minimum of 8 reads:

Code:

 java -jar ~/analysis-utils/fp3/FindPeaks.jar -name hES_H3K27me3.fastq -dist_type 1 200 -hist_size 50 -hist_precision 10 -eff_frac 0.8 -output fp3 -input hES_H3K27me3.fastq.map.maq -aligner maq -prepend chr -duplicatefilter -minimum 8

Using this command I detect ~25,000 peaks from 30M bowtie-aligned reads.

If I use -landerwaterman instead of -minimum, I obtain ~2M peaks and an FDR file. To filter out "false" peaks, do I then choose an FDR cutoff (e.g. 0.01), check the FDR file for the peak height associated with that FDR for each chromosome, and then remove peaks below this height? Or have I missed something obvious?

thanks

**apfejes** · 01-07-2009, 10:40 PM

Hi frozenlyse,

The short answer is that no, you're not missing anything obvious. There are historical reasons for why the software is run that way, relating to the pipeline users at the GSC, but it's something I've been trying to replace for a while. (See below.)

I hope I've understood your question correctly - let me know if what I've written misses the point.

The -landerwaterman flag uses (obviously) a Lander-Waterman algorithm to identify the cutoff based upon the ratio of height 1 to height 2 peaks that are observed. (That code was provided by another researcher at the GSC, so I'm not intimately familiar with all its functions.) It should not, in and of itself, do any filtering at all. I would not expect it to change the number of peaks identified in the peaks file. (If that's your question, then I'm somewhat stumped, and I'd like to hear more about what's going on so I can look into it more closely.)

The -minimum flag is used to drop all peaks below a given height. I suspect that the -minimum and -landerwaterman flags shouldn't be used at the same time. (I believe FindPeaks tells you as much if you try.) If you use -minimum, the Lander-Waterman code will not know how many peaks were found with heights 1 and 2, and should fail to operate correctly.

There is also a flag labeled "-iterations" which provides a similar result, but can be used for non-integer peak heights, based upon zero vs. non-zero regions. However, although the implementation is functional, it hasn't had a lot of validation. (This is the FDR that I've been implementing, rather than the landerwaterman. If anyone would like to try it out and help improve it, I'd love to hear from them.) Again, it should be used without the -minimum flag.

What is normally done is to run once with -landerwaterman (or -iterations) to find the FDR value that you'd like, and then run a second time with -minimum to drop all peaks below this. It seems redundant, but is actually very useful for pipeline work.
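As a rough illustration of the step between the two runs, the sketch below picks, for each chromosome, the smallest peak height whose FDR is at or below the chosen cutoff. The layout of the FDR file isn't described here, so the input is assumed to be already parsed into (chromosome, height, fdr) tuples; the function name and the numbers are invented for the example.

```python
def min_height_for_fdr(fdr_rows, cutoff):
    """Return, per chromosome, the smallest peak height whose FDR <= cutoff.

    fdr_rows: iterable of (chromosome, height, fdr) tuples (assumed layout).
    Chromosomes where no height reaches the cutoff are left out of the result.
    """
    best = {}
    for chrom, height, fdr in fdr_rows:
        if fdr <= cutoff and (chrom not in best or height < best[chrom]):
            best[chrom] = height
    return best

# Made-up numbers, for illustration only.
rows = [
    ("chr1", 5, 0.20), ("chr1", 6, 0.04), ("chr1", 7, 0.008), ("chr1", 8, 0.002),
    ("chr2", 5, 0.09), ("chr2", 6, 0.009), ("chr2", 7, 0.001),
]
thresholds = min_height_for_fdr(rows, cutoff=0.01)
print(thresholds)  # {'chr1': 7, 'chr2': 6}
```

The resulting height is what would be passed to -minimum on the second run. The sketch assumes the FDR falls as the height rises; if it doesn't, the smallest qualifying height may not be the one you want.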

Going back to what I mentioned above, I'm about to start on a new round of FindPeaks development in which this should be cleaned up a bit more and will finally allow an FDR value to be passed at the command line. Hopefully that will simplify the process you've outlined above. The general problem was that all of the files were being written on the fly, and FindPeaks then couldn't go back into the file and modify the written-out peaks after the FDR was calculated.

In the new development, FP will now cache the peaks and wig information, calculate the FDR, and then write out the files - so this should allow much more sophisticated processing - and get around the problem you've alluded to in your post.

I hope that helps.

Anthony

**frozenlyse** · 01-07-2009, 11:44 PM

Hi Anthony - that clears up a lot of my confusion.

Just one more question: using the Lander-Waterman FDR (for now), the FDR is estimated for each height for each chromosome. Does it make sense to use a different height cutoff for each chromosome, or should I, say, take the median of all the heights closest to my desired cutoff (0.01 for now)?

Thanks again for findpeaks,
Aaron

**apfejes** · 01-08-2009, 09:09 AM

Hi Aaron,

Ok, I'm glad that made sense - it was a much longer ramble than I would usually give on SeqAnswers. (-:

With respect to using chromosome-by-chromosome FDRs, both options have strengths and weaknesses. I've often noticed that one or two chromosomes have FDRs that are rather unusual, e.g. for many of the samples I've looked at, Human Chr 12 seems to have a higher threshold than the other chromosomes. I think this is very specific to the experiment being done, so I can't say what you should do.

If the Std Dev of the thresholds you get is very small, it's probably not a disaster to just apply one threshold to the whole genome. If you get widely different thresholds for each chr with the same FDR, you're clearly going to have to treat each chromosome separately. The point of calculating the FDR separately for each chr was simply that we noticed that a global threshold sometimes isn't a great idea. At least this way you can evaluate whether a local or global threshold is more appropriate.
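That check can be sketched in a few lines of Python. Treat it as an illustration only: the `max_spread` value of 1.0 is an arbitrary stand-in for "very small", the function name is invented, and the thresholds are made-up numbers.

```python
from statistics import median, pstdev

def choose_threshold(per_chrom, max_spread=1.0):
    """Suggest one global cutoff if per-chromosome thresholds agree closely.

    per_chrom: dict of chromosome -> minimum peak height at the chosen FDR.
    max_spread: largest standard deviation still treated as "very small";
                an arbitrary illustrative value, not a recommendation.
    Returns ("global", median_height) or ("per-chromosome", per_chrom copy).
    """
    heights = list(per_chrom.values())
    if pstdev(heights) <= max_spread:
        return "global", median(heights)
    return "per-chromosome", dict(per_chrom)

# Made-up thresholds: one tight set, one with an outlying chromosome.
tight = {"chr1": 7, "chr2": 6, "chr3": 7}
loose = {"chr1": 7, "chr2": 6, "chr12": 15}
print(choose_threshold(tight))
print(choose_threshold(loose))
```

With the tight set the median height is suggested for the whole genome; with the outlier present, the per-chromosome thresholds are returned unchanged.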

Cheers,

Anthony
