  • How to convert blastn output file to sam/bam

    I am trying to convert BLAST output into a .sam or .bam file using the blast2sam tool.

    The alignment of the reads was done with the command:

    blastn -query 130205_UNC11-SN627_0280_AC1NEKACXX_TTAGGC_L004_1.fasta -db blast_ref -word_size 15 -outfmt "6 qseqid sseqid pident nident length mismatch positive gapopen gaps ppos qframe sframe sstrand qcovs qstart qend qseq sstart send sseq evalue bitscore score" -out blast_tab

    This is the first line of the output blast_tab:

    UNC11-SN627:280:C1NEKACXX:4:1101:11031:1976 sequenzadifusione 93.62 44 3 44 0 0 93.62 1 1 plus 98 2 48 TGAACCCGGGAGGTGGAGGTTGCAGTGAGCCGAGATTGCGCCACTGC 24710 24756 TGAACCCGGGAGGTGGAGGCTGCAGTGAGCTGAGATAGCGCCACTGC 6e-16 71.3 38

    The conversion was then done with blast2sam (not blast2bam):

    blast2sam.pl blast_tab > blast.sam

    For the conversion we used the tabular BLAST output rather than the default format.

    The conversion reports no errors, but the output file blast.sam is empty.

    Where could the error be?
    Is there another tool for this conversion, or another alignment tool that can write .sam or .bam output directly?
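Because the tabular output above already carries the aligned query and subject strings (qseq, sseq) and the subject coordinates, SAM records can in principle be built from it directly. The sketch below is an illustration under assumptions, not what blast2sam.pl does: it assumes tab-separated lines with exactly the 23 columns from the -outfmt string above, uses 255 for the unknown mapping quality, and emits only the aligned part of each read, since the full read and its qualities are not in the table. The function names are made up for this example.

```python
# Column order copied from the -outfmt string in the blastn command above.
COLUMNS = (
    "qseqid sseqid pident nident length mismatch positive gapopen gaps ppos "
    "qframe sframe sstrand qcovs qstart qend qseq sstart send sseq "
    "evalue bitscore score"
).split()

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def build_cigar_ops(qseq, sseq):
    """Turn two gapped alignment strings into run-length [op, count] pairs."""
    ops = []
    for q, s in zip(qseq, sseq):
        if q == "-":
            op = "D"  # gap in the read: reference base with no read base
        elif s == "-":
            op = "I"  # gap in the reference: extra base in the read
        else:
            op = "M"  # aligned column (match or mismatch)
        if ops and ops[-1][0] == op:
            ops[-1][1] += 1
        else:
            ops.append([op, 1])
    return ops


def tab_line_to_sam(line):
    """Convert one tabular BLAST line (hypothetical helper) to a SAM record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    hit = dict(zip(COLUMNS, fields))

    ops = build_cigar_ops(hit["qseq"], hit["sseq"])
    seq = hit["qseq"].replace("-", "")
    flag = 0
    if hit["sstrand"] == "minus":
        # SAM stores reads in reference orientation: reverse-complement the
        # read and reverse the order of the CIGAR operations.
        flag = 16
        ops.reverse()
        seq = seq.translate(_COMPLEMENT)[::-1]

    pos = min(int(hit["sstart"]), int(hit["send"]))  # leftmost 1-based position
    cigar = "".join(f"{count}{op}" for op, count in ops)
    return "\t".join([
        hit["qseqid"], str(flag), hit["sseqid"], str(pos),
        "255",            # mapping quality not available
        cigar, "*", "0", "0",
        seq, "*",         # base qualities are not in the BLAST table
        f"AS:i:{hit['score']}",
    ])
```

Header lines (@SQ) are not produced here and would have to be added before converting to BAM.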

  • #2
    Try the new code mentioned in this thread at the end: https://www.biostars.org/p/53434/ You will need to have your blast results in XML format (based on the readme for the new code).

    • #3
      I downloaded the code, but I'm not able to create the ref.dict. How can I do it?
      Also, the "src" folder contains two source files (blastSam.c and blastSam.h). Which one should I use?
      Thanks

      • #4
        Originally posted by federica.r
        I downloaded the code, but I'm not able to create the ref.dict. How can I do it?
        Also, the "src" folder contains two source files (blastSam.c and blastSam.h). Which one should I use?
        Thanks
        That is source code. You will need to compile it into an executable using a compiler (gcc). What OS are you using?

        You may also have to re-run your blast to get output in XML format (unless there is a tab to XML converter available).

        • #5
          Originally posted by GenoMax
          That is source code. You will need to compile it into an executable using a compiler (gcc). What OS are you using?

          You may also have to re-run your blast to get output in XML format (unless there is a tab to XML converter available).
          I am using Linux.
          Where can I find the tab to XML converter?

          • #6
            Originally posted by federica.r
            I am using Linux.
            Where can I find the tab to XML converter?
            Were you able to compile the program?

            XML converter: https://www.biostars.org/p/7981/

            Ideally you should re-run the blast and save output as XML.
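Re-running blastn with XML output could be sketched as below. The query, database and word size are the ones from the first post, and `-outfmt 5` selects XML in BLAST+. Compressing with gzip on the fly is an addition made here, since the XML can get large, and the function names and output path are hypothetical.

```python
import gzip
import shutil
import subprocess


def blastn_xml_command(query, db, word_size=15):
    """Build the blastn argument list for XML output (-outfmt 5)."""
    return [
        "blastn",
        "-query", query,
        "-db", db,
        "-word_size", str(word_size),
        "-outfmt", "5",
    ]


def run_blastn_gzipped(query, db, out_path, word_size=15):
    """Run blastn and stream its XML output into a gzip file."""
    cmd = blastn_xml_command(query, db, word_size)
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc, \
            gzip.open(out_path, "wb") as out:
        # Copy in chunks so the XML never has to fit in memory.
        shutil.copyfileobj(proc.stdout, out)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


if __name__ == "__main__":
    run_blastn_gzipped(
        "130205_UNC11-SN627_0280_AC1NEKACXX_TTAGGC_L004_1.fasta",
        "blast_ref",
        "blast.xml.gz",  # hypothetical output name
    )
```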

            • #7
              * ref.dict is created with picard CreateSequenceDictionary http://broadinstitute.github.io/picard/
              * to compile the C program you need : GNU make and the GCC 'C compiler'
              * "Where can I find the tab to XML converter? " you can't : it's like creating a cow from a steak.
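For readers wondering what a ref.dict contains: it is essentially a SAM header with one @SQ line per reference sequence. The sketch below writes such lines from a FASTA file using only the Python standard library. It is a rough illustration, not a replacement for Picard CreateSequenceDictionary: Picard's output may carry additional fields that are omitted here, whether Blast2Bam accepts this reduced form is untested, and the function names are made up.

```python
import hashlib


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Sequence name is the header text up to the first whitespace.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def sequence_dictionary(lines):
    """Return SAM header lines (@HD plus one @SQ per sequence)."""
    header = ["@HD\tVN:1.6\tSO:unsorted"]
    for name, seq in read_fasta(lines):
        # M5 is the MD5 checksum of the upper-cased sequence.
        md5 = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
        header.append(f"@SQ\tSN:{name}\tLN:{len(seq)}\tM5:{md5}")
    return header


if __name__ == "__main__":
    # "ref.fasta" is a placeholder for the reference used to build blast_ref.
    with open("ref.fasta") as fasta, open("ref.dict", "w") as out:
        out.write("\n".join(sequence_dictionary(fasta)) + "\n")
```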

              • #8
                Originally posted by GenoMax
                Were you able to compile the program?

                XML converter: https://www.biostars.org/p/7981/

                Ideally you should re-run the blast and save output as XML.
                We tried to compile the program, but got the following error:

                xsltproc --output parseXML.c --stringparam fileType c schema2c.xsl schema.xml
                make: xsltproc: Command not found
                make: *** [parseXML.c] Error 127

                What does it mean?

                We are also trying to run blast to obtain the XML output, but it is taking a really long time (almost 24 hours, probably because the output is very big).

                Thank you

                • #9
                  As specified in the Requirements section of https://github.com/guyduche/Blast2Bam, xsltproc is required. Most Linux distributions have a command to install it quickly, something like 'sudo apt-get install xsltproc'.

                  "probably because the output is very big": yes, because the sequences are fetched and added to the output. You can pipe the output into gzip to reduce the size of the XML, or pipe the output of blastn directly into blast2sam as shown in https://github.com/guyduche/Blast2Ba...ster/README.md

                  • #10
                    You may need to install libxslt (and perhaps libxml2 as well). Installation instructions vary depending on which Linux distribution you are using.
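Before running make, it can help to check which build tools are actually on the PATH; make's "Error 127" is the shell's exit status for a command that was not found. The tool list below is assembled from this thread (xsltproc, make, gcc) and may not be complete for Blast2Bam, and the function name is hypothetical. Note that this only detects executables, not libraries such as libxslt or libxml2.

```python
import shutil

# Tools named in this thread; an assumption, not an official requirements list.
REQUIRED_TOOLS = ("xsltproc", "make", "gcc")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed build tools were found.")
```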

                    • #11
                      It isn't ready yet, but the NCBI seem to be working on adding SAM output to BLAST+ itself:
                      Recently NCBI BLAST+ 2.2.31  was released, and it contains an undocumented "Easter Egg" - this is still very rough around the edges but they...
