SEQanswers

Old 06-16-2010, 11:57 PM   #1
Adamo
Member
 
Location: Paris

Join Date: Jun 2010
Posts: 28
Default BLAT - uniquely mapped reads/multiple hits

Hi all,

I wanted to know if there was a simple way to filter the output.psl from BLAT to obtain a file containing only the uniquely mapped reads.
Concerning multiple hits, does BLAT sort them by probability or by any other criterion? I couldn't find this information in the documentation, and looking at the output file makes me think it doesn't.
But can I nevertheless tell the software to output only the alignments that are most likely to be true?

Thanks in advance.
Old 06-17-2010, 05:06 AM   #2
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

"Most likely to be true" is a nebulous standard. You can, however, filter a psl file to report only the best and nearly the best hits for a given query. The program pslReps, which should be distributed with BLAT, filters .psl files. There are a number of parameters to adjust the stringency of filtering. Here is a link to some tips given by Jim Kent (author of BLAT and pslReps) on the parameters they use at UCSC. Of course that was in the context of aligning ESTs or full length cDNAs. He makes the point in his response that it is not possible to force pslReps to only report a single alignment for a query (even when using the "-singleHit" option) if there are multiple hits with the same or nearly the same score.
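To make the tie problem described above concrete, here is a minimal Python sketch that groups PSL rows by query and keeps a read only when its top-scoring alignment is not tied with another one. This is not pslReps and does not reproduce its filtering; the function names `psl_score` and `unique_best_hits` are made up for illustration, and the score formula (matches + repMatches/2 - misMatches - qNumInsert - tNumInsert) is what the UCSC BLAT FAQ describes for nucleotide alignments as far as I recall, so treat it as an assumption.

```python
from collections import defaultdict


def psl_score(fields):
    """Approximate BLAT score for one PSL row (assumed formula, nucleotide case)."""
    matches, mismatches, rep_matches = int(fields[0]), int(fields[1]), int(fields[2])
    q_num_insert, t_num_insert = int(fields[4]), int(fields[6])
    return matches + rep_matches // 2 - mismatches - q_num_insert - t_num_insert


def unique_best_hits(psl_lines):
    """Return {qName: fields} for queries whose best score is not tied."""
    hits = defaultdict(list)
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the optional psLayout header and any row that is not an alignment.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        hits[fields[9]].append(fields)  # column 10 (index 9) is qName

    unique = {}
    for q_name, rows in hits.items():
        scores = sorted((psl_score(r) for r in rows), reverse=True)
        # Keep the read if it has a single hit or a strictly better top hit.
        if len(scores) == 1 or scores[0] > scores[1]:
            unique[q_name] = max(rows, key=psl_score)
    return unique
```

Reads with two equally good top hits are dropped entirely here, which is one possible reading of "uniquely mapped"; a stricter definition would also drop reads whose second-best hit is only slightly worse.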
Old 06-17-2010, 05:17 AM   #3
Adamo
Member
 
Location: Paris

Join Date: Jun 2010
Posts: 28
Default

Yes, "most likely to be true" is a very fuzzy notion here.
I hadn't seen that pslReps/pslSort... were distributed with BLAT; I'm still a newbie and I'm quite confused by all the different tools that have been developed...
This is a real mess for someone as new to this field as I am!

Thank you very much for your help, I'll try running pslReps and the other psl tools.
Old 07-02-2010, 03:58 PM   #4
lifeng.tian
Member
 
Location: Philadelphia

Join Date: Jul 2009
Posts: 16
Default

You may find the git repo helpful; here is the link:
http://genome.ucsc.edu/admin/git.html

I used BLAT recently in an RNA-seq splice junction detection project. Here are
some Perl scripts for running BLAT and parsing the psl results that might be of help to you:
http://github.com/lifengtian/SplicePL

I tried pslReps for exactly the same problem, but it was not designed for it.



Quote:
Originally Posted by Adamo View Post
Yes, "most likely to be true" is a very fuzzy notion here.
I hadn't seen that pslReps/pslSort... were distributed with BLAT; I'm still a newbie and I'm quite confused by all the different tools that have been developed...
This is a real mess for someone as new to this field as I am!

Thank you very much for your help, I'll try running pslReps and the other psl tools.

Last edited by lifeng.tian; 07-02-2010 at 04:01 PM.
Old 07-05-2010, 12:04 AM   #5
Adamo
Member
 
Location: Paris

Join Date: Jun 2010
Posts: 28
Default

Thank you, I think it can be very helpful!

However, I have some questions about how to use the scripts (I'm completely new to biology and bioinformatics...):

Why should I mask the genome? (Actually, I haven't understood this notion yet.) I'll be working on a bacterial genome; do I have to mask it too?

I only have single-end reads; is that OK? Will it work if I just use the "--forward=..." option?

As I understand it, I'll have my alignment stored in the "temp" directory after running Blat. Then, what is the command to filter the output.psl so that I obtain only uniquely mapped reads?

Sorry if some questions are a little bit naive...!

Last edited by Adamo; 07-05-2010 at 12:56 AM.
Old 07-05-2010, 10:54 AM   #6
lifeng.tian
Member
 
Location: Philadelphia

Join Date: Jul 2009
Posts: 16
Default

Please check out this perl script at
http://github.com/lifengtian/SpliceP...t_singleend.pl

It will run BLAT on N processes and generate temp/unique and temp/unique.psl
LMK if you have more questions at lifeng2@mail.med.upenn.edu

BTW, you don't need to mask the genome.

Last edited by lifeng.tian; 07-05-2010 at 03:39 PM.
Old 07-06-2010, 05:22 AM   #7
Adamo
Member
 
Location: Paris

Join Date: Jun 2010
Posts: 28
Default

Thank you again, I'm having a look at your script. It seems quite approachable, even for me!
I'll let you know if I need some more help.
Old 07-06-2010, 05:42 AM   #8
lifeng.tian
Member
 
Location: Philadelphia

Join Date: Jul 2009
Posts: 16
Default

Just a reminder: the minscore will determine the final number of unique reads. The default value of 30 is far too low for a bacterial genome and long reads. Assuming the read length is 200bp, a 90% match requires a minscore of 180.
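The arithmetic behind that cutoff can be sketched in a few lines of Python. The helper name `min_score` and the list of read lengths are only illustrative; the one figure taken from the post is 200bp at 90% giving 180.

```python
def min_score(read_length, percent_identity=90):
    """Smallest integer score that is at least percent_identity % of read_length."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (read_length * percent_identity + 99) // 100


# Illustrative read lengths; only the 200bp case comes from the post above.
for length in (100, 200, 300):
    print(length, min_score(length))
```

A single fixed cutoff fits only one read length, so a value chosen for 200bp reads is stricter than 90% for shorter reads and looser for longer ones.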

Last edited by lifeng.tian; 07-06-2010 at 05:50 AM.
Old 07-06-2010, 06:28 AM   #9
Adamo
Member
 
Location: Paris

Join Date: Jun 2010
Posts: 28
Default

The thing is, I have reads of different lengths, from 100bp to 300bp. Can't I specify a percentage instead of a precise score?

Last edited by Adamo; 07-06-2010 at 06:42 AM.
Old 07-06-2010, 06:36 AM   #10
Adamo
Member
 
Location: Paris

Join Date: Jun 2010
Posts: 28
Default

Oops, mistake.

Last edited by Adamo; 07-06-2010 at 06:40 AM.
Old 07-06-2010, 01:21 PM   #11
lifeng.tian
Member
 
Location: Philadelphia

Join Date: Jul 2009
Posts: 16
Default

I modified blat_singleend.pl.
Try running it with --minidentity=90
It will require the match score to be larger than individual_read_length * 0.9.
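For readers who want to apply the same idea to an existing .psl file, here is a rough Python sketch of a per-read threshold. It is not the code in blat_singleend.pl, whose internals are not shown in this thread. It assumes the "match score" is the PSL matches column (column 1) and the read length is qSize (column 11), and it accepts a hit at exactly 90%, which may differ from the script's "larger than" rule. The function names are hypothetical.

```python
def passes_min_identity(psl_line, min_identity=90):
    """True if matches is at least min_identity % of the read's own length."""
    fields = psl_line.rstrip("\n").split("\t")
    # Header lines and malformed rows never pass.
    if len(fields) < 21 or not fields[0].isdigit():
        return False
    matches, q_size = int(fields[0]), int(fields[10])
    # Integer comparison: matches / q_size >= min_identity / 100.
    return matches * 100 >= q_size * min_identity


def filter_psl(psl_lines, min_identity=90):
    """Keep only the alignment rows that meet the per-read threshold."""
    return [line for line in psl_lines if passes_min_identity(line, min_identity)]
```

Because the threshold scales with each read's qSize, a 100bp read needs 90 matching bases and a 300bp read needs 270, so reads of mixed lengths are treated consistently.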

Quote:
Originally Posted by Adamo View Post
The thing is, I have reads of different lengths, from 100bp to 300bp. Can't I specify a percentage instead of a precise score?

Tags
blat, unaligned reads
