SEQanswers

Old 05-02-2011, 10:07 AM   #1
emilyjia2000
Member
 
Location: usa

Join Date: May 2011
Posts: 59
.SAM to .BAM with SAM file header @PG

Hi,
I used export2sam.pl to convert export.txt to .sam, and I checked that the newly generated SAM file has an @PG header. When I tried to generate a BAM file with a command line like
" samtools view -b in.sam -o out.bam "
I got these errors:

[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "in.sam".

Does anybody know what's wrong? What command line should I use to convert SAM to BAM?

Thanks
Old 05-02-2011, 10:27 AM   #2
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700
Default

Use the -S parameter:

Usage: samtools view [options] <in.bam>|<in.sam> [region1 [...]]

Options: -b       output BAM
         -h       print header for the SAM output
         -H       print header only (no alignments)
         -S       input is SAM
         -u       uncompressed BAM output (force -b)
         -1       fast compression (force -b)
         -x       output FLAG in HEX (samtools-C specific)
         -X       output FLAG in string (samtools-C specific)
         -c       print only the count of matching records
         -L FILE  output alignments overlapping the input BED FILE [null]
         -t FILE  list of reference names and lengths (force -S) [null]
         -T FILE  reference sequence file (force -S) [null]
         -o FILE  output file name [stdout]
         -R FILE  list of read groups to be outputted [null]
         -f INT   required flag, 0 for unset [0]
         -F INT   filtering flag, 0 for unset [0]
         -q INT   minimum mapping quality [0]
         -l STR   only output reads in library STR [null]
         -r STR   only output reads in read group STR [null]
         -?       longer help

Old 05-02-2011, 11:08 AM   #3
emilyjia2000
Member
 
Location: usa

Join Date: May 2011
Posts: 59
Default

Hi Richard,

I do want to convert SAM to BAM, but I got an error when I used "samtools view -b in.sam -o out.bam". I checked the header of the SAM file, and it contains @PG. I don't know how to deal with it.

Thanks
Old 05-03-2011, 05:24 AM   #4
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177
Default

Quote:
Originally Posted by emilyjia2000 View Post
Hi Richard,

I do want to convert SAM to BAM, it output error when I used "samtools view -b in.sam -o out.bam". I checked the header of SAM file, it comes with @PG. I don't know how to deal with it?

Thanks

Emily,

As Richard said, you need to use the -S option (in addition to your other options) to tell samtools view that the input is in SAM format. By default samtools view expects a BAM file as input, but you are giving it a SAM file; that is what causes the error.
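
For reference, here is a minimal sketch of the corrected call, assuming the 0.1.x-era samtools whose usage text is quoted above. The function name sam_to_bam is made up for illustration, and in.sam / out.bam are the placeholder names from the first post. Newer samtools releases (1.x) detect SAM input automatically and accept -S only for compatibility, so there the flag should be harmless.

```shell
#!/bin/sh
# Sketch only: convert SAM to BAM with an old (0.1.x) samtools.
# sam_to_bam is a hypothetical helper name, not part of samtools.
sam_to_bam() {
    in_sam=$1
    out_bam=$2
    # -b: write BAM, -S: input is SAM, -o: output file name
    samtools view -bS -o "$out_bam" "$in_sam"
}

# Run only when samtools is installed and the placeholder input exists.
if command -v samtools >/dev/null 2>&1 && [ -f in.sam ]; then
    sam_to_bam in.sam out.bam
fi
```

If the SAM header has no @SQ lines, the usage text above lists -t FILE for supplying reference names and lengths; whether that is needed depends on the file.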
Old 06-14-2011, 08:00 AM   #5
SDBP
Member
 
Location: USA

Join Date: Jan 2011
Posts: 12
Default

I am dealing with the same kind of SAM files (header with @PG).
I tried the -S option, but it didn't work.
First I saw a segmentation fault. When I fixed that and ran

samtools view -bt my.fa.fai my.sam > my.bam

it showed the following:

[sam_read1] reference 'chr3.fa' is recognized as '*'.
[sam_read1] reference 'chr1.fa' is recognized as '*'.
[sam_read1] reference 'chr19.fa' is recognized as '*'.
[sam_read1] reference 'chr3.fa' is recognized as '*'.

Then I ran sed s/.fa// on the input file before running export2sam.pl, and export2sam.pl now throws the following error:

ERROR: Unexpected number of fields in export record on line 285 of read1 export file. Found 21 fields but expected 22.
...erroneous export record:
ABC-GA2 1 4 1 3 1347 0 1 TTTCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB QC

Any insight will be helpful.
Are there any other SAM-to-BAM tools known to work for SAM files with @PG?
Old 06-14-2011, 08:25 AM   #6
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700
Default

What is the full command you are using for export2sam.pl?
Beware that the input is supposed to be a "GERALD" type of file (also known as an "Illumina export file").
Old 06-14-2011, 09:11 AM   #7
SDBP
Member
 
Location: USA

Join Date: Jan 2011
Posts: 12
Default

perl export2sam.pl --read1=my_export.txt > my_export.sam
Old 06-14-2011, 09:13 AM   #8
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700
Default

What version of samtools?
Old 06-14-2011, 11:31 AM   #9
SDBP
Member
 
Location: USA

Join Date: Jan 2011
Posts: 12
Default

samtools-0.1.16
Old 06-14-2011, 11:38 AM   #10
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700
Default

The perl code is ...

if(scalar(@t) < EXPORT_SIZE) {
    my $msg="\nERROR: Unexpected number of fields in export record on line $line_no of read$read_no export file. Found " . scalar(@t) . " fields but expected " . EXPORT_SIZE . ".\n";
    $msg.="\t...erroneous export record:\n" . $line . "\n\n";
    die($msg);
}

EXPORT_SIZE is 22 ( EXPORT_SIZE => 22 )

It's complaining that line 285 has only 21 fields.

What is on lines 284 and 285?
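
To find every short record in one pass instead of one error at a time, something like the following might help. It is a rough sketch that assumes the export file is tab-delimited and uses 22 (the EXPORT_SIZE quoted above) as the default; short_records is a made-up helper name and my_export.txt is the file name used earlier in the thread. awk counts trailing empty fields, which Perl's split may drop, so treat the counts as a guide.

```shell
#!/bin/sh
# Sketch only: list records with fewer tab-separated fields than expected.
# short_records is a hypothetical helper; 22 is the EXPORT_SIZE quoted above.
short_records() {
    file=$1
    expected=${2:-22}
    awk -F '\t' -v n="$expected" \
        'NF < n { printf "line %d: %d fields\n", NR, NF }' "$file"
}

# Example call on the file name used earlier in the thread.
if [ -f my_export.txt ]; then
    short_records my_export.txt
fi
```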
Old 06-14-2011, 11:58 AM   #11
SDBP
Member
 
Location: USA

Join Date: Jan 2011
Posts: 12
Default

Line 284:

ABC-DE2 1 4 1 3 119 0 1 GAGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB QC N


Line 285:
ABC-DE2 1 4 1 3 1347 0 1 TTTCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN fa_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB QC N


I see that there is an extra 'fa' in line 285.
Can I try deleting it?
Will deleting it work?
Old 06-14-2011, 12:10 PM   #12
SDBP
Member
 
Location: USA

Join Date: Jan 2011
Posts: 12
Default

Sorry, the above was from the file where I did not remove the .fa

Below is from the file which I am working on:

ABC-DE2 1 4 1 3 1347 0 1 TTTCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB QC
Old 06-14-2011, 12:19 PM   #13
SDBP
Member
 
Location: USA

Join Date: Jan 2011
Posts: 12
Default

On another line I see :

ABC-DE2 1 4 1 3 1978 0 1 CAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN _]_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB QC

Does this mean I will have to go through the whole file?
Old 06-14-2011, 12:21 PM   #14
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700
Default

Hmmmm.

You might be messing things up with the sed command:

sed s/.fa//

That says: change "any character + f + a" to nothing.

"f" and "a" appear to be legitimate GERALD (or whatever, "export") quality values, so they will unintentionally be deleted too, along with the intended strings such as "chr1.fa" --> "chr1". In your line 285 the match was the separator plus the "fa" at the start of the quality string, which merges the sequence and quality fields and would explain the 21-field count.

Glance at the input file for legitimate quality values (the field after the sequence field).

In sed, putting a backslash before the dot (i.e. \. ) means a literal period, as opposed to a bare dot (i.e. . ), which means "any character".
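
A small sketch of the difference, using made-up three-field sample records rather than real export lines (strip_any and strip_literal are hypothetical helper names). Note that the sed expression is quoted: without quotes the shell removes the backslash before sed ever sees it. Also, without a trailing g flag, sed only replaces the first match on each line.

```shell
#!/bin/sh
# Sketch only: compare the unescaped and escaped patterns on made-up records.
strip_any()     { sed 's/.fa//'; }    # pattern from the thread: any char + "fa"
strip_literal() { sed 's/\.fa//'; }   # escaped dot: literal ".fa" only

# Made-up samples: one with a quality string starting with "fa",
# one with a reference name ending in ".fa".
qc_line=$(printf 'TTTCNNNN\tfa_BBBB\tQC')
ref_line=$(printf 'GATTACA\tBBBBBBB\tchr1.fa')

# The unescaped pattern eats the tab plus "fa", merging two fields.
printf '%s\n' "$qc_line" | strip_any

# The escaped pattern leaves the quality string alone...
printf '%s\n' "$qc_line" | strip_literal

# ...and still trims the reference name to "chr1".
printf '%s\n' "$ref_line" | strip_literal
```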

Last edited by Richard Finney; 06-14-2011 at 12:24 PM.