SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
sam to bam conversion error, no @SQ lines in the header, missing header? efoss Bioinformatics 17 12-03-2015 04:28 AM
Can GSNAP generate SAM/BAM files? efoss Bioinformatics 4 10-16-2011 08:11 PM
.SAM to .BAM with SAM file header @PG emilyjia2000 Bioinformatics 13 06-14-2011 12:21 PM
problem with gsnap sam output lindseyjane Bioinformatics 2 11-04-2010 04:27 AM
sam/bam header lines keebs42 Bioinformatics 1 08-21-2009 11:25 AM

Reply
 
Thread Tools
Old 10-17-2014, 01:43 AM   #1
JadeB
Junior Member
 
Location: Toulouse, France

Join Date: Oct 2014
Posts: 5
Default GSNAP and SAM header

Dear all,

I'm currently trying to align reads on a reference sequence with gsnap with this code :

Quote:
gsnap --gunzip -D ./ref/cristata -d cristata ./data/cristata/Pig-GC01_AGTTCC_L005_R1.fastq.gz ./data/cristata/Pig-GC01_AGTTCC_L005_R2.fastq.gz -t 8 -B 4 -A sam --nofails --clip-overlap --split-output GC01
I'm working with very short reads, it's why i'm using the clip-overlap command.

Work is finishing without error message, but my sam header is not complete, so I can't use my results files.

For example :

Quote:
head ./GC01.concordant_uniq

@SQ SN:GC01-(1934) LN:17090
HWI-ST314:2581WBYACXX:5:1101:3036:2447 99 GC01-(1934) 11568 40 58M42H = 11626 116 GACCCTTCCTACCCACATCACGTCCATCCAAAATTCAACCACACGAGAGCACCTCCTT CCCFFFFFHHHHHJJJJJJIJJHIIIGJJJJJIJJJJJJJIJJJGJJJJJJJJJJJIJ MD:Z:58 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
[...]
I only have the @SQ line, and no @HD line in the header.

Do you see something I missed ?

Thanks a lot in advance for your help !!

Jade
JadeB is offline   Reply With Quote
Old 10-17-2014, 02:00 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

You can just add it when converting to BAM. Just make a file with the @HD line and then
Code:
cat HD.txt file.sam | samtools view -Sbo file.bam -
dpryan is offline   Reply With Quote
Old 10-17-2014, 02:02 AM   #3
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

The @HD line is just the header and not mandatory for any downstream applications (as far as I know). BWA is also not creating it. If you need it, just add it like dpryan suggested, if not, just leave it.
WhatsOEver is offline   Reply With Quote
Old 10-17-2014, 02:09 AM   #4
JadeB
Junior Member
 
Location: Toulouse, France

Join Date: Oct 2014
Posts: 5
Default

Ok. Thanks a lot for your answers.

If I don't need it, it's ok. But I thought my next error message was due to this lack.

In fact, when I do this :

Quote:
for file in ./data/cristata/GC01.*.sam
do
samtools view -Sb $file > $file.bam
done
or this :

Quote:
for file in ./data/cristata/GC01.*.sam
do
samtools view -Sb -T ./ref/cristata/cristata.fasta $file > $file.bam
done
My job fails, with such a message :

Quote:
[samopen] SAM header is present: 1 sequences.
[sam_read1] reference 'SN:GC01-(1934) LN:17090
' is recognized as '*'.
[main_samview] truncated file.
[...]
But it doesn't seem to be truncated :

Quote:
tail -n 3 GC01.concordant_uniq

HWI-ST314:2581WBYACXX:5:2316:14327:100908 163 GC01-(1934) 3724 40 52M48H = 3776 105 ACCAACTTATACACCTTCTCTGAAAGAGCTTCTTGCCACTAACACTGGCACT @C@FFFFFHGHHHJJGIIIIJJGHIJJJIJJJIJGIJIFHIICGIIIII)?D MD:Z:49C2 NH:i:1 HI:i:1 NM:i:1 SM:i:40 XQ:i:40 X2:i:0
HWI-ST314:2581WBYACXX:5:2316:17260:100910 83 GC01-(1934) 2574 40 44H56M = 2518 -112 GTTCGTTTGTTCAACGATTAATAGTCCTACGTGATCTGAGTTCAGACCGGAGTAAT IJJGJJJJJJJJJIHGDFDIIIJJJJJJIIHHHIGJIJIJIGIFHHGHFDFFFCCC MD:Z:56 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
HWI-ST314:2581WBYACXX:5:2316:17260:100910 163 GC01-(1934) 2518 40 56M44H = 2574 112 GGTTTACGACCTCGATGTTGGATCAGGACATCCTAATGGTGCAGCAGCTGTTAAGG @CBFFFFFHHHHHJGGIHJJJEHHIIJGIIIJJJIGJJIDGGHIGGHII3=EFGHI MD:Z:49A6 NH:i:1 HI:i:1 NM:i:1 SM:i:40 XQ:i:40 X2:i:0
Any ideas ?
JadeB is offline   Reply With Quote
Old 10-17-2014, 02:15 AM   #5
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

What samtools version are you using?
Was there something like "missing EOF marker" in the "error message"?

This is a quite common bug in version 0.1.19 (I think it has been resolved in the newer ones?!). It's actually just a warning and means nothing if you know that your input file was not truncated
WhatsOEver is offline   Reply With Quote
Old 10-17-2014, 02:23 AM   #6
JadeB
Junior Member
 
Location: Toulouse, France

Join Date: Oct 2014
Posts: 5
Default

I'm actually using the 0.1.19-44428cd one.

But, I don't have something like "missing EOF marker", only lines as above (for files which are not empty), and "Your job has been killed."

I have bam files finally, but I'm not sure I can use them...
JadeB is offline   Reply With Quote
Old 10-17-2014, 03:02 AM   #7
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

Ok, maybe the following helps...

1) Is there a particular reason for using the "-T" parameter. I never use it and I actually don't what it is useful for. Although it don't think it will change anything, you could try the conversion without that.

2) Your "GC01.concordant_uniq" file is a sam file created from one of the bam files using samtools view -h file.sam > GC01.concordant_uniq, isn't it? A more accurate comparison would be to use "diff file1 file2". If the output is empty, your files are equal.

3) If (1) doesn't help and (2) doesn't give you enough confidence, upgrade samtools to version 1.1 and check if the warning remains.
WhatsOEver is offline   Reply With Quote
Old 10-17-2014, 03:12 AM   #8
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

@WhatsOEver: -T is useful when a file is completely missing a header. Otherwise, it doesn't do much.

@JadeB: The "[sam_read1] reference 'SN:GC01-(1934) LN:17090" means that samtools is seeing part of your header as a read. This is pretty unusual, but you're correct that this will only happen if it's not parsing the header correctly (internally, sam_hdr_read() moves the file pointer to just past the header). We're up to samtools 1.1 now, so go ahead and upgrade in any case, though I suspect there's something wrong with the header (maybe spaces rather than tabs) that will also prevent the newer versions from working. If 1.1 also doesn't work, post the first 100 or so lines (just "head -n 100 file.sam > file.txt") or so as a text file and I can have a look.
dpryan is offline   Reply With Quote
Old 10-17-2014, 03:20 AM   #9
JadeB
Junior Member
 
Location: Toulouse, France

Join Date: Oct 2014
Posts: 5
Default

Quote:
Originally Posted by WhatsOEver View Post
2) Your "GC01.concordant_uniq" file is a sam file created from one of the bam files using samtools view -h file.sam > GC01.concordant_uniq, isn't it? A more accurate comparison would be to use "diff file1 file2". If the output is empty, your files are equal.
No, my sam files are gsnap output, not made with samtools view, then I'm trying to have bam files...

I'm currently trying an other thing : error could maybe be linked with different names between my fasta file and inside my fasta file, in the header. I homogenize everything, and if it doesn't change anything, I'll ask for an update of samtools version to the bioinformatics server I use...

Thanks again !
JadeB is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 11:03 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO