SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
sam to bam conversion error, no @SQ lines in the header, missing header? efoss Bioinformatics 17 12-03-2015 04:28 AM
bowtie2 output, sam to bam conversion error. a_mt Bioinformatics 3 03-29-2015 01:23 AM
samtools: parse error in SAM to BAM conversion chrisW Bioinformatics 12 01-14-2013 06:16 PM
sam 2 bam conversion error DavyK Bioinformatics 2 01-14-2013 06:13 PM
Tophat v1.1.4 potential error with sam to bam conversion? jb2 Bioinformatics 6 11-17-2011 12:52 AM

Reply
 
Thread Tools
Old 02-06-2013, 06:05 AM   #1
jwhite
Member
 
Location: Boston

Join Date: Jun 2012
Posts: 33
Default BAM -> SAM conversion error

Hi,

I set up a pipeline to process sequence data from FASTQ.gz -> sorted, non-redundant, and indexted BAM files. After the pipeline finishes, I delete the SAM files to conserve space.

In order to add @RG tags to some BAM files, I need to convert the BAM back to SAM and run a script to add the tags. I use samtools at the linux command line; I do not use picard--nor to I want to.

The conversion was performed as follows:

samtools view -h BGL-160-014_TGACCA_R2.fastq.gz.nodup.srt.bam > BGL-160-014_TGACCA_R2.fastq.gz.nodup.srt.sam

The following error occured when after the conversion:

samtools view BGL-160-014_TGACCA_R2.fastq.gz.nodup.srt.sam
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "BGL-160-014_TGACCA_R2.fastq.gz.nodup.srt.sam".

The same error occurs if I use -H to view the header.

Is there something I'm missing here?

Cheers,
Joe White
jwhite is offline   Reply With Quote
Old 02-06-2013, 07:09 AM   #2
xied75
Senior Member
 
Location: Oxford

Join Date: Feb 2012
Posts: 129
Default

then your bam is not correct. Try use hexdump to see it.
xied75 is offline   Reply With Quote
Old 02-06-2013, 07:29 AM   #3
jwhite
Member
 
Location: Boston

Join Date: Jun 2012
Posts: 33
Default

Quote:
Originally Posted by xied75 View Post
then your bam is not correct. Try use hexdump to see it.
As far as I can tell this is NOT the case. After the sam file is initially created, it is converted to BAM with samtools:

samtools import ref.fai in.SAM out.BAM

The file is readable by samtools view -H ...

Some of the BAM files that are created during sorting and duplicate removal are not readable with samtools view, and generate errors like:

samtools view BGL-112-003_ATCACG_R2.fastq.gz.bam
[bam_header_read] bgzf_check_EOF: Invalid argument
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "BGL-112-003_ATCACG_R2.fastq.gz.bam".

I don't understand why the samtools commands are not able to read their own output.

Joe
jwhite is offline   Reply With Quote
Old 02-06-2013, 07:37 AM   #4
xied75
Senior Member
 
Location: Oxford

Join Date: Feb 2012
Posts: 129
Default

Need to look at your command line then.
xied75 is offline   Reply With Quote
Old 02-06-2013, 08:45 AM   #5
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

samtools import to make a .bam from a .sam? I think that's deprecated.

Use samtools view instead.
swbarnes2 is offline   Reply With Quote
Old 02-06-2013, 08:55 AM   #6
jwhite
Member
 
Location: Boston

Join Date: Jun 2012
Posts: 33
Default

Quote:
Originally Posted by swbarnes2 View Post
samtools import to make a .bam from a .sam? I think that's deprecated.

Use samtools view instead.
What options would you use for the samtools view command?
I see -b -u -S and -T in the help screen that might be relevant. Is -T necessary?

What does -u do? Its for uncompressed BAM output. (An uncompressed binary format? No comprendo. )

jw

Last edited by jwhite; 02-06-2013 at 09:08 AM.
jwhite is offline   Reply With Quote
Old 02-06-2013, 10:49 AM   #7
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

Quote:
Originally Posted by jwhite View Post
What options would you use for the samtools view command?
I see -b -u -S and -T in the help screen that might be relevant. Is -T necessary?

What does -u do? Its for uncompressed BAM output. (An uncompressed binary format? No comprendo. )

jw
-T is not necessary. -u is only useful if you are going to pipe the output straight into some other command line.

samtools view -bSh will do what you want.
swbarnes2 is offline   Reply With Quote
Old 02-06-2013, 11:22 AM   #8
arm55
Junior Member
 
Location: Seattle, WA

Join Date: Feb 2013
Posts: 3
Default

Hi Joe,

If your file is missing the @SQ header lines, you will need your reference file, *.fa, in order to convert to BAM. The command will look like:
Quote:
samtools faidx your_ref.fa
samtools view -bt your_ref.fa.fai your_sam.sam > your_bam.bam
Let me know if that works.

Cheers,

Andrew
arm55 is offline   Reply With Quote
Old 02-06-2013, 06:34 PM   #9
jwhite
Member
 
Location: Boston

Join Date: Jun 2012
Posts: 33
Default

Quote:
Originally Posted by xied75 View Post
Need to look at your command line then.
# align with bwa
/opt/bwa/bin/bwa aln -t 12 $bwaidx $fq1 > $fq1.bwa
/opt/bwa/bin/bwa aln -t 12 $bwaidx $fq2 > $fq2.bwa
# Converting BWA output to SAM format ...
/opt/bwa/bin/bwa sampe $bwaidx $fq1.bwa $fq2.bwa $fq1 $fq2 > $fq2.sam

# Using index convert the SAM file to BAM file
/opt/samtools-0.1.18/samtools import $reffq.fai $fq2.sam $fq2.bam

Notes:
$bwaidx is the base name of the bwa index files for the genome fasta
$fq1 and $fq2 are forward and reverse reads files in fastq.gz form
$reffa.fai is the fasta index file for the genome fasta file

Joe
jwhite is offline   Reply With Quote
Old 02-07-2013, 02:11 AM   #10
xied75
Senior Member
 
Location: Oxford

Join Date: Feb 2012
Posts: 129
Default

Hi, Joe,

Those commands look alright. When you said
Quote:
Some of the BAM files that are created during sorting and duplicate removal are not readable with samtools view, and generate errors like:
You mean when you tried to do sorting and dup remove on your newly generated bam you get those error msg or you mean after you did sorting and dup remove some of the bam is no longer viewable? If the latter, we need to see the commands how you did sorting and dup remove.
xied75 is offline   Reply With Quote
Old 02-07-2013, 04:40 AM   #11
jwhite
Member
 
Location: Boston

Join Date: Jun 2012
Posts: 33
Default

Quote:
Originally Posted by xied75 View Post
Hi, Joe,

Those commands look alright. When you said

You mean when you tried to do sorting and dup remove on your newly generated bam you get those error msg or you mean after you did sorting and dup remove some of the bam is no longer viewable? If the latter, we need to see the commands how you did sorting and dup remove.
These commands--and the previously sent--are all included in a shell script that I launch on our cluster's batch queue. The pipeline process runs without fail. The files are readable with vi and less, but samtools view fails to read some of the BAM files created during processing.

# Sort the BAM file
/opt/samtools-0.1.18/samtools sort $fq2.bam $fq2.srt

# Remove duplicates from the BAM file
/opt/samtools-0.1.18/samtools rmdup $fq2.srt.bam $fq2.nodup.bam

# Sort the BAM file, again!
/opt/samtools-0.1.18/samtools sort $fq2.nodup.bam $fq2.nodup.srt

# Index the BAM file
/opt/samtools-0.1.18/samtools index $fq2.nodup.srt.bam

After this step, pileup, depth and VCF file generation all use the indexed BAM file.

The reason I have to go back to SAM files is to add @RG tags to the BAM files in order to use the GATK toolkit that our PI wants to use.

Joe
jwhite is offline   Reply With Quote
Old 02-07-2013, 11:54 AM   #12
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

Quote:
Originally Posted by jwhite View Post
# align with bwa
/opt/bwa/bin/bwa aln -t 12 $bwaidx $fq1 > $fq1.bwa
/opt/bwa/bin/bwa aln -t 12 $bwaidx $fq2 > $fq2.bwa
# Converting BWA output to SAM format ...
/opt/bwa/bin/bwa sampe $bwaidx $fq1.bwa $fq2.bwa $fq1 $fq2 > $fq2.sam

# Using index convert the SAM file to BAM file
/opt/samtools-0.1.18/samtools import $reffq.fai $fq2.sam $fq2.bam

Notes:
$bwaidx is the base name of the bwa index files for the genome fasta
$fq1 and $fq2 are forward and reverse reads files in fastq.gz form
$reffa.fai is the fasta index file for the genome fasta file

Joe
Again, I'm pretty sure import is deprecated. So even though it's been working for you, well, it's not working for you now. It's hard to get help when you use deprecated commands.

If all you want to do is add readgroup data, Picard can do that to .bams. That's a lot easier than redoing all the alignments

And as a general rule, making a .sam file of your whole project is a waste of space.

You should pipe the .sam file output to samtools to convert it to .bam on the fly:

Code:
bwa sampe $ref $out1.sai $out2.sai $fq1.fq.gz $fq2.fq.gz | samtools view -bShr ‘@RG\tID:foo\tSM:bar’ - > out.bam
swbarnes2 is offline   Reply With Quote
Old 02-07-2013, 12:29 PM   #13
jwhite
Member
 
Location: Boston

Join Date: Jun 2012
Posts: 33
Default

Quote:
Originally Posted by swbarnes2 View Post
Again, I'm pretty sure import is deprecated. So even though it's been working for you, well, it's not working for you now. It's hard to get help when you use deprecated commands.

If all you want to do is add readgroup data, Picard can do that to .bams. That's a lot easier than redoing all the alignments

And as a general rule, making a .sam file of your whole project is a waste of space.

You should pipe the .sam file output to samtools to convert it to .bam on the fly:

Code:
bwa sampe $ref $out1.sai $out2.sai $fq1.fq.gz $fq2.fq.gz | samtools view -bShr ‘@RG\tID:foo\tSM:bar’ - > out.bam
Hi,

Thanks for your comments. I think you may have killed 3 birds with one stone here.
This could prove really helpful. I'll let you know.

Joe
jwhite is offline   Reply With Quote
Old 02-11-2013, 05:36 AM   #14
jwhite
Member
 
Location: Boston

Join Date: Jun 2012
Posts: 33
Default

Quote:
Originally Posted by jwhite View Post
Hi,

Thanks for your comments. I think you may have killed 3 birds with one stone here.
This could prove really helpful. I'll let you know.

Joe
Hi swbarnes2,

The command you listed worked well for generating samtools readable BAM files with headers. The part with @RG didn't produce @RG tags; since this was scripted, I have to check the code.

The command includes:

... | /<path>/samtools view -bShr "@RG\tID:$rg\tSM:$rg" - > $fq2.bam

where $rg = "BGL-160-014"

Please let me know if you see anything overtly wrong with the above.

Thanks again; this is very helpful.

Joe
jwhite is offline   Reply With Quote
Old 02-12-2013, 11:20 AM   #15
jwhite
Member
 
Location: Boston

Join Date: Jun 2012
Posts: 33
Default

Hi,

I am purplexed by the behavior of the samtools suite. When I create valid BAM files and read them with samtools view, and then output the results to a sam file, I expect that samtools will be able to read the sam file. Here are the commands used:

samtools view -h BGL-160-014_TGACCA_R2.fastq.gz.nodup.srt.bam > test.sam

addReadGroup.pl -i test.sam -o test.sam.out -r "RG4" -s "BGL-160-014" -l "BGL-160-014" -p "Hieq 2000"

samtools view -h test.sam.out |more
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "test.sam.out".

I can view the the output file; it has the necessary @RG tags in the header and on each line.

Anyone have any idea why this is happening?

Cheers,

Joe
jwhite is offline   Reply With Quote
Old 02-12-2013, 01:03 PM   #16
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

The samtools view expects a .bam file. test.sam.out is a sam file. You can just look at it with head or whatever to see if the read groups took.

Also, if that perl script isn't working, just try something else. Picard does so many useful things, you might as well get it if you don't have it already.

And converting the .bam to a .sam is a little silly. Can't you pipe the .bam through samtools view into the perl script?

Last edited by swbarnes2; 02-12-2013 at 01:07 PM.
swbarnes2 is offline   Reply With Quote
Old 02-13-2013, 05:32 AM   #17
jwhite
Member
 
Location: Boston

Join Date: Jun 2012
Posts: 33
Default

Quote:
Originally Posted by swbarnes2 View Post
The samtools view expects a .bam file. test.sam.out is a sam file. You can just look at it with head or whatever to see if the read groups took.

Also, if that perl script isn't working, just try something else. Picard does so many useful things, you might as well get it if you don't have it already.

And converting the .bam to a .sam is a little silly. Can't you pipe the .bam through samtools view into the perl script?
Thanks for your help. I thought samtools view also viewed sam files--even though 'less' works too.

Yes, I can get Picard. I've avoided it because I did not want to deal with java for this pipeline.

Yes, I can pipe the view output into the perl script. Hadn't though about that.

But my main issue has been that samtools view has been throwing up these errors when I supply BAM files as imput. These BAM files were generated by samtools; that's what I don't get. Something has been altered in the BAM files such that they are no longer readable by samtools.

Joe

Last edited by jwhite; 02-13-2013 at 05:37 AM.
jwhite is offline   Reply With Quote
Reply

Tags
bam coversion

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 12:20 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO