  • BLAT - uniquely mapped reads/multiple hits

    Hi all,

    I wanted to know if there was a simple way to filter the output.psl from BLAT to obtain a file containing only the uniquely mapped reads.
    Concerning the multiple hits, does BLAT sort them in any order, by probability or otherwise? I couldn't find this information in the documentation, and looking at the output file makes me think it doesn't.
    Can I nevertheless tell the software to output only the alignments that are most likely to be true?

    Thanks in advance.
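A minimal Python sketch of the filtering asked about here. It assumes a standard BLAT .psl file (with or without the header) and takes "uniquely mapped" to mean that a query name (qName, column 10) appears on exactly one alignment line. The function name `unique_hits` is hypothetical. It does not depend on the order in which BLAT writes the hits. Because it counts every hit BLAT reported, a read with one strong hit plus one weak spurious hit is not treated as unique; raising BLAT's -minScore, or filtering by score first, changes that.

```python
from collections import defaultdict


def unique_hits(psl_lines):
    """Return the PSL rows (as lists of 21 fields) whose query aligns exactly once.

    Assumption: "uniquely mapped" means the qName appears on exactly one
    alignment line; no score threshold is applied here.
    """
    by_query = defaultdict(list)
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Header lines (psLayout banner, column titles, dashes) are skipped:
        # data rows have 21 columns and start with an integer match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        by_query[fields[9]].append(fields)  # column 10 (index 9) is qName
    return [hits[0] for hits in by_query.values() if len(hits) == 1]
```

To use it, open output.psl, pass the file handle to `unique_hits`, and write each returned row back out joined with tabs.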

  • #2
    "Most likely to be true" is a nebulous standard. You can, however, filter a psl file to report only the best and nearly the best hits for a given query. The program pslReps, which should be distributed with BLAT, filters .psl files. There are a number of parameters to adjust the stringency of filtering. Here is a link to some tips given by Jim Kent (author of BLAT and pslReps) on the parameters they use at UCSC. Of course that was in the context of aligning ESTs or full length cDNAs. He makes the point in his response that it is not possible to force pslReps to only report a single alignment for a query (even when using the "-singleHit" option) if there are multiple hits with the same or nearly the same score.
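pslReps is the tool to use here. The Python sketch below only illustrates the idea of keeping "the best and nearly the best" hits per query; it is not pslReps's algorithm. It scores each row with what I understand to be the web-BLAT score formula from the UCSC source for nucleotide alignments (matches + repMatches/2 - mismatches - query gap count - target gap count), then keeps every hit within a fraction `near_top` of that query's best score. The names `psl_score` and `near_best` and the 1% default are illustrative assumptions. It also shows the point made above: hits that tie for the best score all survive, so a single hit per query cannot be forced.

```python
from collections import defaultdict


def psl_score(f):
    """Score a nucleotide PSL row given as a list of 21 string fields.

    matches + repMatches/2 - mismatches - qNumInsert - tNumInsert
    """
    match, mismatch, rep_match = int(f[0]), int(f[1]), int(f[2])
    q_num_insert, t_num_insert = int(f[4]), int(f[6])
    return match + rep_match // 2 - mismatch - q_num_insert - t_num_insert


def near_best(rows, near_top=0.01):
    """Keep, for each query (qName, index 9), every hit scoring within
    near_top (a fraction) of that query's best score. Ties are all kept."""
    by_query = defaultdict(list)
    for f in rows:
        by_query[f[9]].append(f)
    kept = []
    for hits in by_query.values():
        scores = [psl_score(f) for f in hits]
        cutoff = max(scores) * (1 - near_top)
        kept.extend(f for f, s in zip(hits, scores) if s >= cutoff)
    return kept
```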



    • #3
      Yes, "most likely to be true" is a very fuzzy notion here.
      I hadn't seen that pslReps, pslSort, etc. were distributed with BLAT. I'm still a newbie and I'm quite confused by all the different tools that have been developed...
      This is a real mess for someone as new to this field as I am!

      Thank you very much for your help. I'll try running pslReps and the other psl tools.



      • #4
        You may find the git repo helpful. Here is the link:


        I recently used BLAT in an RNA-seq splice junction detection project. Here are
        some Perl scripts for running BLAT and parsing the psl results; they might be of help to you:
        Yet another bioinformatics tool to detect de novo splice junctions from paired-end RNA-seq reads (human genome only) - lifengtian/SplicePL


        I tried pslReps for exactly the same problem; it was not designed for it.



        Last edited by lifeng.tian; 07-02-2010, 04:01 PM.



        • #5
          Thank you, I think it can be very helpful!

          However, I have some questions about how to use the scripts (I'm completely new to biology and bioinformatics...):

          Why should I mask the genome? (Actually, I haven't understood this notion yet.) I'll be working on a bacterial genome; do I have to mask it too?

          I only have single-end reads; is that OK? Will it work if I just use the "--forward=..." option?

          As I understand it, my alignments will be stored in the "temp" directory after running BLAT. Then, what is the command to filter output.psl so that I obtain only uniquely mapped reads?

          Sorry if some of these questions are a little naive!
          Last edited by Adamo; 07-05-2010, 12:56 AM.



          • #6
            Please check out this perl script at
            Yet another bioinformatics tool to detect de novo splice junctions from paired-end RNA-seq reads (human genome only) - lifengtian/SplicePL


            It will run BLAT on N processes and generate temp/unique and temp/unique.psl
            LMK if you have more questions at [email protected]
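I haven't reproduced what the Perl script does internally. As a rough Python illustration of "run BLAT on N processes", one could split the reads into N FASTA chunks, start one blat process per chunk, and concatenate the per-chunk .psl files afterwards. The function names and the chunk file naming are hypothetical; the blat call follows the usual `blat database query output.psl` form, with -noHead so the outputs can be concatenated without repeated headers.

```python
import subprocess
from pathlib import Path


def split_fasta(text, n):
    """Distribute FASTA records round-robin into n chunks (returned as strings)."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line])
        elif records:  # sequence line belonging to the current record
            records[-1].append(line)
    chunks = [[] for _ in range(n)]
    for i, record in enumerate(records):
        chunks[i % n].extend(record)
    return ["\n".join(c) + "\n" if c else "" for c in chunks]


def write_chunks(chunks, prefix):
    """Write each chunk to <prefix>.<i>.fa and return the file paths."""
    paths = []
    for i, chunk in enumerate(chunks):
        path = Path(f"{prefix}.{i}.fa")
        path.write_text(chunk)
        paths.append(str(path))
    return paths


def blat_commands(genome, chunk_paths):
    """One blat command line per chunk: blat database query output.psl -noHead."""
    return [["blat", genome, p, p + ".psl", "-noHead"] for p in chunk_paths]


def run_parallel(commands):
    """Start all commands at once, wait for them, and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]
```

For example, `run_parallel(blat_commands("genome.fa", write_chunks(split_fasta(Path("reads.fa").read_text(), 4), "temp/reads")))` assumes a temp/ directory already exists; the resulting temp/reads.*.fa.psl files can then be concatenated into one .psl file.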

            BTW, you don't need to mask the genome.
            Last edited by lifeng.tian; 07-05-2010, 03:39 PM.



            • #7
              Thank you again. I'm having a look at your script; it seems quite approachable, even for me!
              I'll let you know if I need some more help.



              • #8
                Just a reminder: the minscore determines the final number of unique reads. The default value of 30 is way too low for a bacterial genome and long reads. Assuming a read length of 200 bp, a 90% match requires
                a minscore of 180.
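The arithmetic from the post as a tiny Python helper (the name `min_score_for` is hypothetical). Like the post, it treats the score as roughly the number of matching bases, and it uses integer ceiling division so no floating-point rounding creeps in.

```python
def min_score_for(read_length, percent_identity=90):
    """Smallest integer score that is at least percent_identity% of read_length.

    Assumption: score is approximated by the number of matching bases.
    Ceiling division in integers avoids floating-point rounding.
    """
    return (read_length * percent_identity + 99) // 100
```

`min_score_for(200)` gives 180, matching the example above; applied to reads from 100 to 300 bp, the same 90% rule gives thresholds from 90 to 270.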
                Last edited by lifeng.tian; 07-06-2010, 05:50 AM.



                • #9
                  The thing is, my reads have different lengths, from 100 bp to 300 bp. Can't I specify a percentage instead of a fixed score?
                  Last edited by Adamo; 07-06-2010, 06:42 AM.



                  • #10
                    Oops, mistake.
                    Last edited by Adamo; 07-06-2010, 06:40 AM.



                    • #11
                      I modified blat_singleend.pl.
                      Try running it with --minidentity=90.
                      It will require the match score to be larger than individual_read_length * 0.9.
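A minimal Python sketch of the rule described here, for filtering an existing .psl file. I don't know exactly which PSL columns blat_singleend.pl uses, so this assumes the "match score" is the PSL match count (column 1) and the read length is qSize (column 11). `passes_min_identity` and `filter_psl` are hypothetical names.

```python
def passes_min_identity(fields, min_identity=90):
    """True if matches > read_length * min_identity / 100 (strictly larger).

    Assumption: matches = PSL column 1, read length = qSize (column 11).
    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    matches = int(fields[0])
    read_length = int(fields[10])
    return matches * 100 > read_length * min_identity


def filter_psl(psl_lines, min_identity=90):
    """Yield the data lines of a PSL file that pass passes_min_identity."""
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header lines; data rows have 21 columns and a numeric first field.
        if len(fields) >= 21 and fields[0].isdigit() and passes_min_identity(fields, min_identity):
            yield line
```

Because the threshold is computed per read from its own qSize, 100 bp and 300 bp reads are held to the same 90% standard, which a single fixed minscore cannot do.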

