Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BAM -> SAM conversion error

    Hi,

    I set up a pipeline to process sequence data from FASTQ.gz -> sorted, non-redundant, and indexted BAM files. After the pipeline finishes, I delete the SAM files to conserve space.

    In order to add @RG tags to some BAM files, I need to convert the BAM back to SAM and run a script to add the tags. I use samtools at the linux command line; I do not use picard--nor to I want to.

    The conversion was performed as follows:

    samtools view -h BGL-160-014_TGACCA_R2.fastq.gz.nodup.srt.bam > BGL-160-014_TGACCA_R2.fastq.gz.nodup.srt.sam

    The following error occured when after the conversion:

    samtools view BGL-160-014_TGACCA_R2.fastq.gz.nodup.srt.sam
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [main_samview] fail to read the header from "BGL-160-014_TGACCA_R2.fastq.gz.nodup.srt.sam".

    The same error occurs if I use -H to view the header.

    Is there something I'm missing here?

    Cheers,
    Joe White

  • #2
    then your bam is not correct. Try use hexdump to see it.

    Comment


    • #3
      Originally posted by xied75 View Post
      then your bam is not correct. Try use hexdump to see it.
      As far as I can tell this is NOT the case. After the sam file is initially created, it is converted to BAM with samtools:

      samtools import ref.fai in.SAM out.BAM

      The file is readable by samtools view -H ...

      Some of the BAM files that are created during sorting and duplicate removal are not readable with samtools view, and generate errors like:

      samtools view BGL-112-003_ATCACG_R2.fastq.gz.bam
      [bam_header_read] bgzf_check_EOF: Invalid argument
      [bam_header_read] invalid BAM binary header (this is not a BAM file).
      [main_samview] fail to read the header from "BGL-112-003_ATCACG_R2.fastq.gz.bam".

      I don't understand why the samtools commands are not able to read their own output.

      Joe

      Comment


      • #4
        Need to look at your command line then.

        Comment


        • #5
          samtools import to make a .bam from a .sam? I think that's deprecated.

          Use samtools view instead.

          Comment


          • #6
            Originally posted by swbarnes2 View Post
            samtools import to make a .bam from a .sam? I think that's deprecated.

            Use samtools view instead.
            What options would you use for the samtools view command?
            I see -b -u -S and -T in the help screen that might be relevant. Is -T necessary?

            What does -u do? Its for uncompressed BAM output. (An uncompressed binary format? No comprendo. )

            jw
            Last edited by jwhite; 02-06-2013, 10:08 AM.

            Comment


            • #7
              Originally posted by jwhite View Post
              What options would you use for the samtools view command?
              I see -b -u -S and -T in the help screen that might be relevant. Is -T necessary?

              What does -u do? Its for uncompressed BAM output. (An uncompressed binary format? No comprendo. )

              jw
              -T is not necessary. -u is only useful if you are going to pipe the output straight into some other command line.

              samtools view -bSh will do what you want.

              Comment


              • #8
                Hi Joe,

                If your file is missing the @SQ header lines, you will need your reference file, *.fa, in order to convert to BAM. The command will look like:
                samtools faidx your_ref.fa
                samtools view -bt your_ref.fa.fai your_sam.sam > your_bam.bam
                Let me know if that works.

                Cheers,

                Andrew

                Comment


                • #9
                  Originally posted by xied75 View Post
                  Need to look at your command line then.
                  # align with bwa
                  /opt/bwa/bin/bwa aln -t 12 $bwaidx $fq1 > $fq1.bwa
                  /opt/bwa/bin/bwa aln -t 12 $bwaidx $fq2 > $fq2.bwa
                  # Converting BWA output to SAM format ...
                  /opt/bwa/bin/bwa sampe $bwaidx $fq1.bwa $fq2.bwa $fq1 $fq2 > $fq2.sam

                  # Using index convert the SAM file to BAM file
                  /opt/samtools-0.1.18/samtools import $reffq.fai $fq2.sam $fq2.bam

                  Notes:
                  $bwaidx is the base name of the bwa index files for the genome fasta
                  $fq1 and $fq2 are forward and reverse reads files in fastq.gz form
                  $reffa.fai is the fasta index file for the genome fasta file

                  Joe

                  Comment


                  • #10
                    Hi, Joe,

                    Those commands look alright. When you said
                    Some of the BAM files that are created during sorting and duplicate removal are not readable with samtools view, and generate errors like:
                    You mean when you tried to do sorting and dup remove on your newly generated bam you get those error msg or you mean after you did sorting and dup remove some of the bam is no longer viewable? If the latter, we need to see the commands how you did sorting and dup remove.

                    Comment


                    • #11
                      Originally posted by xied75 View Post
                      Hi, Joe,

                      Those commands look alright. When you said

                      You mean when you tried to do sorting and dup remove on your newly generated bam you get those error msg or you mean after you did sorting and dup remove some of the bam is no longer viewable? If the latter, we need to see the commands how you did sorting and dup remove.
                      These commands--and the previously sent--are all included in a shell script that I launch on our cluster's batch queue. The pipeline process runs without fail. The files are readable with vi and less, but samtools view fails to read some of the BAM files created during processing.

                      # Sort the BAM file
                      /opt/samtools-0.1.18/samtools sort $fq2.bam $fq2.srt

                      # Remove duplicates from the BAM file
                      /opt/samtools-0.1.18/samtools rmdup $fq2.srt.bam $fq2.nodup.bam

                      # Sort the BAM file, again!
                      /opt/samtools-0.1.18/samtools sort $fq2.nodup.bam $fq2.nodup.srt

                      # Index the BAM file
                      /opt/samtools-0.1.18/samtools index $fq2.nodup.srt.bam

                      After this step, pileup, depth and VCF file generation all use the indexed BAM file.

                      The reason I have to go back to SAM files is to add @RG tags to the BAM files in order to use the GATK toolkit that our PI wants to use.

                      Joe

                      Comment


                      • #12
                        Originally posted by jwhite View Post
                        # align with bwa
                        /opt/bwa/bin/bwa aln -t 12 $bwaidx $fq1 > $fq1.bwa
                        /opt/bwa/bin/bwa aln -t 12 $bwaidx $fq2 > $fq2.bwa
                        # Converting BWA output to SAM format ...
                        /opt/bwa/bin/bwa sampe $bwaidx $fq1.bwa $fq2.bwa $fq1 $fq2 > $fq2.sam

                        # Using index convert the SAM file to BAM file
                        /opt/samtools-0.1.18/samtools import $reffq.fai $fq2.sam $fq2.bam

                        Notes:
                        $bwaidx is the base name of the bwa index files for the genome fasta
                        $fq1 and $fq2 are forward and reverse reads files in fastq.gz form
                        $reffa.fai is the fasta index file for the genome fasta file

                        Joe
                        Again, I'm pretty sure import is deprecated. So even though it's been working for you, well, it's not working for you now. It's hard to get help when you use deprecated commands.

                        If all you want to do is add readgroup data, Picard can do that to .bams. That's a lot easier than redoing all the alignments

                        And as a general rule, making a .sam file of your whole project is a waste of space.

                        You should pipe the .sam file output to samtools to convert it to .bam on the fly:

                        Code:
                        bwa sampe $ref $out1.sai $out2.sai $fq1.fq.gz $fq2.fq.gz | samtools view -bShr ‘@RG\tID:foo\tSM:bar’ - > out.bam

                        Comment


                        • #13
                          Originally posted by swbarnes2 View Post
                          Again, I'm pretty sure import is deprecated. So even though it's been working for you, well, it's not working for you now. It's hard to get help when you use deprecated commands.

                          If all you want to do is add readgroup data, Picard can do that to .bams. That's a lot easier than redoing all the alignments

                          And as a general rule, making a .sam file of your whole project is a waste of space.

                          You should pipe the .sam file output to samtools to convert it to .bam on the fly:

                          Code:
                          bwa sampe $ref $out1.sai $out2.sai $fq1.fq.gz $fq2.fq.gz | samtools view -bShr ‘@RG\tID:foo\tSM:bar’ - > out.bam
                          Hi,

                          Thanks for your comments. I think you may have killed 3 birds with one stone here.
                          This could prove really helpful. I'll let you know.

                          Joe

                          Comment


                          • #14
                            Originally posted by jwhite View Post
                            Hi,

                            Thanks for your comments. I think you may have killed 3 birds with one stone here.
                            This could prove really helpful. I'll let you know.

                            Joe
                            Hi swbarnes2,

                            The command you listed worked well for generating samtools readable BAM files with headers. The part with @RG didn't produce @RG tags; since this was scripted, I have to check the code.

                            The command includes:

                            ... | /<path>/samtools view -bShr "@RG\tID:$rg\tSM:$rg" - > $fq2.bam

                            where $rg = "BGL-160-014"

                            Please let me know if you see anything overtly wrong with the above.

                            Thanks again; this is very helpful.

                            Joe

                            Comment


                            • #15
                              Hi,

                              I am purplexed by the behavior of the samtools suite. When I create valid BAM files and read them with samtools view, and then output the results to a sam file, I expect that samtools will be able to read the sam file. Here are the commands used:

                              samtools view -h BGL-160-014_TGACCA_R2.fastq.gz.nodup.srt.bam > test.sam

                              addReadGroup.pl -i test.sam -o test.sam.out -r "RG4" -s "BGL-160-014" -l "BGL-160-014" -p "Hieq 2000"

                              samtools view -h test.sam.out |more
                              [bam_header_read] EOF marker is absent. The input is probably truncated.
                              [bam_header_read] invalid BAM binary header (this is not a BAM file).
                              [main_samview] fail to read the header from "test.sam.out".

                              I can view the the output file; it has the necessary @RG tags in the header and on each line.

                              Anyone have any idea why this is happening?

                              Cheers,

                              Joe

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Current Approaches to Protein Sequencing
                                by seqadmin


                                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                04-04-2024, 04:25 PM
                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 04-11-2024, 12:08 PM
                              0 responses
                              17 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 10:19 PM
                              0 responses
                              22 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 09:21 AM
                              0 responses
                              16 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-04-2024, 09:00 AM
                              0 responses
                              46 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X