SEQanswers

03-11-2014, 06:33 AM   #1
awayihaha
Junior Member
 
Location: oxford

Join Date: Mar 2014
Posts: 7
sam files convert to bam files error

Hi all,

When I use samtools to convert a SAM file to a BAM file, I get the following errors:
samtools view -h -F 4 -q 1 -bS C.filsa.sam >C.filsa.bam
[samopen] SAM header is present: 7 sequences.
[sam_read1] reference 'SR' is recognized as '*'.
[main_samview] truncated file.

I also get "missing colon in auxiliary data" and "CIGAR and sequence length are inconsistent" errors on individual lines. My SAM files are output from gsnap, and I am not sure whether these problems are caused by gsnap or by samtools. How can I deal with them?

Any suggestions and answers are appreciated. Thank you.
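For reference, "CIGAR and sequence length are inconsistent" means the number of read bases implied by the CIGAR column does not equal the length of the SEQ column. Per the SAM spec, the operations M, I, S, = and X consume read bases, while D, N, H and P do not. A minimal Python sketch of that check (the function names are hypothetical, this is not samtools' own code):

```python
import re

# CIGAR operations that consume read (query) bases, per the SAM spec.
# D, N, H and P do not consume read bases.
QUERY_CONSUMING = set("MIS=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar):
    """Number of read bases implied by a CIGAR string such as '5S30M1I'."""
    ops = CIGAR_OP.findall(cigar)
    # Reject empty or partially garbled strings (e.g. '36' or '36M4').
    if not ops or "".join(n + op for n, op in ops) != cigar:
        raise ValueError("malformed CIGAR: %r" % cigar)
    return sum(int(n) for n, op in ops if op in QUERY_CONSUMING)


def cigar_matches_seq(cigar, seq):
    """True if CIGAR and SEQ agree in length; '*' means 'not stored'."""
    if cigar == "*" or seq == "*":
        return True
    return cigar_query_length(cigar) == len(seq)
```

For example, 36M needs a 36-base SEQ, 5S30M2I needs 37 bases, and 10M100N20M needs only 30, because the 100-base N (skipped region) is on the reference, not the read.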
03-11-2014, 06:41 AM   #2
awayihaha
Junior Member
 
Location: oxford

Join Date: Mar 2014
Posts: 7

The following is a sample from my SAM file. I don't understand where the reference 'SR' comes from.
SRR019035.130 16 Chr5 9804788 40 36M * 0 0 CAGCCTCAAACGGCGCCGTCTTATACGGTGAGTTAC IIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.131 16 Chr1 753661 40 30M * 0 0 TGAAGATATTGAACCTCTCCGTTAGGGAAC IIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:30 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.132 16 Chr3 7844307 40 36M * 0 0 ATGCTGGTAATTCACGAGCTTGATGAAACATTTCAC I3IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.133 0 Chr1 28835502 40 36M * 0 0 GTTTTAGTTTCGTCTGCAACTGAGTCATCACCTACT IIIIIIIIIIIIIIIIIIIIIIDIIIIIIDIII-II MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.134 0 Chr1 28836313 40 36M * 0 0 GAAAATTTCAGGTCTGGTTCAGAATTGGTTCCGAAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII7II MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.135 0 Chr5 22542176 40 25M * 0 0 CGTGGTTCTAGGACATCATCTGATA IIIIIIIIIIIIIIIIIIIIIIIII MD:Z:25 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.136 0 ChrC 100327 3 36M * 0 0 GAATAAAGGATTAATCCGTATCATCTTGACTTGGTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:2 HI:i:1 NM:i:0 SM:i:3 XQ:i:40 X2:i:40 XO:Z:UM PG:Z:A
SRR019035.136 272 ChrC 138287 3 36M * 0 0 AACCAAGTCAAGATGATACGGATTAATCCTTTATTC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:2 HI:i:2 NM:i:0 SM:i:3 XQ:i:40 X2:i:40 XO:Z:UM PG:Z:A
SRR019035.137 16 Chr1 28835623 40 36M * 0 0 TATTTTCGTCGTCTCTAGAGTTTGAAGCATCAGTCC IIBI61IIIIIHIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.138 16 Chr5 19304066 40 36M * 0 0 ATCAATGATATGTTTAAGCAAGACGACTCTTTCAGC IIIII?IIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.139 0 Chr4 162871 40 26M * 0 0 TGATTTCGTTGTGCTATGTAAACTTT IIIIIIIIIIIIIIIIIIII1IIIII MD:Z:26 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
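In a SAM record the read name (QNAME, e.g. SRR019035.130) is column 1, while the reference name that samtools looks up in the @SQ header is column 3 (RNAME: Chr1, Chr3, Chr4, Chr5 or ChrC in every record above). So the 'SR' in the error was read from column 3 of some record, which presumably points to a damaged line whose columns are shifted or cut off, not to well-formed records like these. A minimal sketch that splits one record into its named columns, assuming tab-separated input (the forum display above shows spaces instead of tabs); the function name is hypothetical:

```python
# The 11 mandatory SAM columns, in order (SAM format specification).
SAM_COLUMNS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")


def parse_sam_record(line):
    """Split one tab-separated alignment line into a dict of the 11
    mandatory columns plus a list of the raw optional TAG:TYPE:VALUE fields."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) < len(SAM_COLUMNS):
        # A cut-off line, e.g. 'SRR019035.9<TAB>0<TAB>SR', ends up here.
        raise ValueError("expected at least 11 columns, got %d: %r"
                         % (len(cols), line))
    return dict(zip(SAM_COLUMNS, cols)), cols[len(SAM_COLUMNS):]
```

Applied to the first sample record, QNAME is SRR019035.130, RNAME is Chr5, and the nine optional fields start with MD:Z:36.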
03-11-2014, 07:17 AM   #3
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

The SR... stuff is just the name of the read, which I see you downloaded from SRA (or ENA). Out of curiosity, what happens if you just run:

Code:
samtools view -F 0x4 -q 1 -Sbo C.filsa.bam C.filsa.sam
I wonder if giving the -h option is just screwing things up (it shouldn't do anything when you write a BAM file).
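In that command, -F 0x4 drops records whose FLAG has bit 0x4 (segment unmapped) set, -q 1 drops records with MAPQ below 1, -S says the input is SAM, -b asks for BAM output and -o names the output file (samtools 1.x detects the input format itself and ignores -S). The per-record test can be mirrored in Python, which is handy for checking which of the sample records should survive: flags 0 and 16 pass, and so does 272 (0x100 secondary alignment + 0x10 reverse strand), because only bit 0x4 is excluded. This is a sketch of the filter logic under those assumptions, not samtools' code:

```python
FLAG_UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def passes_filter(flag, mapq, exclude_bits=FLAG_UNMAPPED, min_mapq=1):
    """Mimic the record filter of 'samtools view -F 0x4 -q 1':
    reject records with any of exclude_bits set or with MAPQ < min_mapq."""
    return (flag & exclude_bits) == 0 and mapq >= min_mapq
```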
03-11-2014, 07:51 AM   #4
awayihaha
Junior Member
 
Location: oxford

Join Date: Mar 2014
Posts: 7

Thanks dpryan.
I tried your command, but "reference 'SR' is recognized as '*'" still occurred. My SRA data was downloaded from http://www.ncbi.nlm.nih.gov/sra/?term=SRR019035.
03-11-2014, 07:55 AM   #5
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

If the first 1000 lines or so are sufficient to reproduce this, could you attach that (you have to edit in "advanced" mode and click on the paperclip)? That'd provide a reproducible example. To get the first 1000 (or whatever) lines, just:

Code:
head -n 1000 file.sam > excerpt.txt
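head works for the start of the file because the header sits at the top. To cut a slice from the middle or the end instead (for example to narrow down where the bad record is), the excerpt still needs the @ header lines, since the @SQ lines are what samtools checks reference names against. A minimal sketch, with a hypothetical helper name, that keeps the header and copies a window of records while reading the input lazily:

```python
def sam_excerpt(lines, start, count):
    """Return the header plus `count` records starting at 0-based record
    index `start` (header lines are not counted). `lines` may be a list
    or an open file, which is read lazily."""
    header, window = [], []
    rec_idx = 0
    for line in lines:
        if line.startswith("@"):
            header.append(line)  # keep every header line (@HD, @SQ, @PG, ...)
            continue
        if rec_idx >= start + count:
            break                # past the window: stop reading early
        if rec_idx >= start:
            window.append(line)
        rec_idx += 1
    return header + window
```

For example, dst.writelines(sam_excerpt(src, 500000, 1000)) with src an open SAM file and dst an output file opened for writing; the numbers are arbitrary.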
03-11-2014, 08:46 AM   #6
awayihaha
Junior Member
 
Location: oxford

Join Date: Mar 2014
Posts: 7

I tried the first 1000 rows and there was no problem, so I have attached the first 500 rows and the last 500 rows for you, but I am not sure the problem will appear in them.

Whenever I process large SAM files, only a very few lines have problems such as 'missing colon in auxiliary data' or 'CIGAR and sequence length are inconsistent'. For these two errors the specific lines are always reported, so I could find the problems. Only with "reference *** is recognized as '*'" could I not find which lines have problems.

Because my SAM files come from gsnap alignments, I am confused about whether the problems are caused by gsnap or by samtools. If they are caused by gsnap, 99% of the data is still OK, so how can I avoid these problems and filter out these bad records in advance?
Attached Files
File Type: zip excerpt.zip (14.7 KB, 2 views)
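One way to locate the offending lines, and to filter them out before samtools sees them, is to scan the SAM file yourself and report line numbers. The sketch below is an assumption-laden approximation of a few of samtools' checks, not its actual validation code: it flags lines with fewer than 11 columns (cut-off records), an RNAME that is neither '*' nor declared in an @SQ line (the "recognized as '*'" case), and optional fields not shaped like TAG:TYPE:VALUE (roughly the "missing colon in auxiliary data" case). It does not check CIGAR against SEQ. The function names are hypothetical:

```python
import re

# Optional field must look like TAG:TYPE:VALUE (2-char tag, 1-char type).
TAG_FIELD = re.compile(r"[A-Za-z][A-Za-z0-9]:[AifZHB]:")


def problems(cols, ref_names):
    """Return a list of reasons why one split alignment line looks malformed."""
    if len(cols) < 11:
        return ["only %d columns (need 11)" % len(cols)]
    reasons = []
    if cols[2] != "*" and cols[2] not in ref_names:
        reasons.append("RNAME %r is not declared in an @SQ line" % cols[2])
    reasons += ["malformed optional field %r" % f
                for f in cols[11:] if not TAG_FIELD.match(f)]
    return reasons


def clean_sam(lines, bad):
    """Yield header lines and well-formed records from `lines`; append
    (line_number, reason) to the list `bad` for every dropped record."""
    ref_names = set()
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if text.startswith("@"):
            if text.startswith("@SQ"):
                # Collect reference names from SN: fields of @SQ lines.
                ref_names.update(f[3:] for f in text.split("\t")
                                 if f.startswith("SN:"))
            yield text + "\n"
            continue
        reasons = problems(text.split("\t"), ref_names)
        if reasons:
            bad.append((lineno, "; ".join(reasons)))
        else:
            yield text + "\n"
```

For example, open C.filsa.sam as src and a new file as dst, call dst.writelines(clean_sam(src, bad)) with bad an empty list, then print the collected (line number, reason) pairs and run samtools view on the new file. Dropping records only hides the symptom, so the reported lines are worth inspecting to see what gsnap actually wrote.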
03-11-2014, 08:58 AM   #7
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

That doesn't seem to reproduce the problem either. It's very likely that the problem is with gsnap, which apparently produces corrupt output on occasion. You might consider upgrading if that's an option, or reporting the issue to the developer.
03-11-2014, 09:03 AM   #8
awayihaha
Junior Member
 
Location: oxford

Join Date: Mar 2014
Posts: 7

Thank you for your good advice; it really helped me.
All times are GMT -8.

