  • federica.r
    Junior Member
    • Jul 2015
    • 4

    How to convert blastn output file to sam/bam

    I am trying to convert BLAST output into a .sam or .bam file using the blast2sam tool.

    The reads were aligned with the command

    blastn -query 130205_UNC11-SN627_0280_AC1NEKACXX_TTAGGC_L004_1.fasta -db blast_ref -word_size 15 -outfmt "6 qseqid sseqid pident nident length mismatch positive gapopen gaps ppos qframe sframe sstrand qcovs qstart qend qseq sstart send sseq evalue bitscore score" -out blast_tab

    This is the first line of the output blast_tab:

    UNC11-SN627:280:C1NEKACXX:4:1101:11031:1976 sequenzadifusione 93.62 44 3 44 0 0 93.62 1 1 plus 98 2 48 TGAACCCGGGAGGTGGAGGTTGCAGTGAGCCGAGATTGCGCCACTGC 24710 24756 TGAACCCGGGAGGTGGAGGCTGCAGTGAGCTGAGATAGCGCCACTGC 6e-16 71.3 38

    The conversion was then done with blast2sam (not blast2bam):

    blast2sam.pl blast_tab > blast.sam

    For the conversion we used the tabular BLAST output rather than the default output format.

    The conversion reports no errors, but the output file blast.sam is empty.

    Where could the error be?
    Is there another tool for the conversion, or another alignment tool that can write its output directly as .sam or .bam?
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Try the new code mentioned at the end of this thread (https://www.biostars.org/p/53434/). According to the readme for the new code, your BLAST results need to be in XML format.


    • federica.r
      Junior Member
      • Jul 2015
      • 4

      #3
      I downloaded the code, but I'm not able to create the ref.dict. How can I do it?
      Then in the folder "src" there are two codes (blastSam.c and blastSam.h), so which one should I use?
      Thanks


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Originally posted by federica.r:
        > I downloaded the code, but I'm not able to create the ref.dict. How can I do it?
        > Then in the folder "src" there are two codes (blastSam.c and blastSam.h), so which one should I use?
        > Thanks

        That is source code. You will need to compile it into an executable using a compiler (gcc). What OS are you using?

        You may also have to re-run your blast to get output in XML format (unless there is a tab to XML converter available).


        • federica.r
          Junior Member
          • Jul 2015
          • 4

          #5
          Originally posted by GenoMax:
          > That is source code. You will need to compile it into an executable using a compiler (gcc). What OS are you using?
          >
          > You may also have to re-run your blast to get output in XML format (unless there is a tab to XML converter available).

          I am using Linux.
          Where can I find the tab to XML converter?


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Originally posted by federica.r:
            > I am using Linux.
            > Where can I find the tab to XML converter?

            Were you able to compile the program?

            XML converter: https://www.biostars.org/p/7981/

            Ideally you should re-run the blast and save output as XML.
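A minimal sketch of re-running the search with XML output, reusing the options from the first post. `-outfmt 5` is the BLAST+ code for XML; the function name `run_blast_xml` and the default output name `blast.xml` are placeholders chosen for illustration, not part of any tool.

```shell
# Sketch only: same search as in the first post, but written as BLAST XML.
# run_blast_xml and the default output name are hypothetical placeholders.
run_blast_xml() {
    # $1 = FASTA file of reads, $2 = BLAST database name, $3 = XML output path (optional)
    blastn -query "$1" -db "$2" -word_size 15 -outfmt 5 -out "${3:-blast.xml}"
}

# Example call, left commented out so the snippet can be sourced safely:
# run_blast_xml 130205_UNC11-SN627_0280_AC1NEKACXX_TTAGGC_L004_1.fasta blast_ref blast.xml
```

The custom column list from the tabular run is dropped here because it only applies to the tabular formats.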


            • lindenb
              Senior Member
              • Apr 2010
              • 143

              #7
              * ref.dict is created with Picard CreateSequenceDictionary: http://broadinstitute.github.io/picard/
              * To compile the C program you need GNU make and the GCC C compiler.
              * "Where can I find the tab to XML converter?" You can't: it's like creating a cow from a steak.
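The first point above could look roughly like this. It is a sketch that assumes a Picard jar at the path in `PICARD_JAR` (default `picard.jar`) and uses the legacy `R=`/`O=` argument style; newer Picard releases also accept `-R`/`-O`. The function name `make_ref_dict` and the file names are hypothetical.

```shell
# Sketch only: build a sequence dictionary for a reference FASTA with Picard.
# make_ref_dict, PICARD_JAR and the file names are assumptions for illustration.
make_ref_dict() {
    ref="$1"                    # reference FASTA, e.g. ref.fa
    dict="${2:-${ref%.*}.dict}" # default: same base name with a .dict extension
    java -jar "${PICARD_JAR:-picard.jar}" CreateSequenceDictionary R="$ref" O="$dict"
}

# Example call, left commented out so the snippet can be sourced safely:
# make_ref_dict ref.fa ref.dict
```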


              • federica.r
                Junior Member
                • Jul 2015
                • 4

                #8
                Originally posted by GenoMax:
                > Were you able to compile the program?
                >
                > XML converter: https://www.biostars.org/p/7981/
                >
                > Ideally you should re-run the blast and save output as XML.

                We tried to compile the program, but got the following error:

                xsltproc --output parseXML.c --stringparam fileType c schema2c.xsl schema.xml
                make: xsltproc: Command not found
                make: *** [parseXML.c] Error 127

                What does it mean?

                We are also re-running blast to obtain the XML output, but it's taking a really long time (almost 24 hours so far, probably because the output is very big).

                Thank you


                • lindenb
                  Senior Member
                  • Apr 2010
                  • 143

                  #9
                  As specified in the Requirements section of https://github.com/guyduche/Blast2Bam, xsltproc is required. Most Linux distributions have a command to install it quickly, something like 'sudo apt-get install xsltproc'.

                  "probably because the output is very big": yes, because the sequences are fetched and added. You can pipe the output into gzip to reduce the size of the XML, or pipe the output of blastn directly into blast2sam as shown in https://github.com/guyduche/Blast2Ba...ster/README.md


                  • GenoMax
                    Senior Member
                    • Feb 2008
                    • 7142

                    #10
                    You may need to install libxslt (and perhaps libxml2 as well). Install instructions would vary depending on what kind of linux distro you are using.
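Before running make again, it may help to check which of the build tools named in this thread are actually on the PATH. The "Error 127" in the make output is the shell's "command not found" status. The helper below is a small sketch; `missing_tools` is a made-up name, and the tool list in the example call is taken from the posts above.

```shell
# Sketch only: print the name of every requested tool that is not on the PATH.
# missing_tools is a hypothetical helper, not part of Blast2Bam.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example call, left commented out so the snippet can be sourced safely:
# missing_tools make gcc xsltproc
```

An empty result means every listed tool was found; any name printed still has to be installed with the distribution's package manager.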


                    • maubp
                      Peter (Biopython etc)
                      • Jul 2009
                      • 1544

                      #11
                      It isn't ready yet, but the NCBI seem to be working on adding SAM output to BLAST+ itself:
                      Recently NCBI BLAST+ 2.2.31  was released, and it contains an undocumented "Easter Egg" - this is still very rough around the edges but they...

