**awayihaha** · 03-11-2014, 06:41 AM

The following is a sample of my SAM file. I don't understand where the reference 'SR' comes from.
SRR019035.130 16 Chr5 9804788 40 36M * 0 0 CAGCCTCAAACGGCGCCGTCTTATACGGTGAGTTAC IIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.131 16 Chr1 753661 40 30M * 0 0 TGAAGATATTGAACCTCTCCGTTAGGGAAC IIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:30 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.132 16 Chr3 7844307 40 36M * 0 0 ATGCTGGTAATTCACGAGCTTGATGAAACATTTCAC I3IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.133 0 Chr1 28835502 40 36M * 0 0 GTTTTAGTTTCGTCTGCAACTGAGTCATCACCTACT IIIIIIIIIIIIIIIIIIIIIIDIIIIIIDIII-II MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.134 0 Chr1 28836313 40 36M * 0 0 GAAAATTTCAGGTCTGGTTCAGAATTGGTTCCGAAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII7II MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.135 0 Chr5 22542176 40 25M * 0 0 CGTGGTTCTAGGACATCATCTGATA IIIIIIIIIIIIIIIIIIIIIIIII MD:Z:25 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.136 0 ChrC 100327 3 36M * 0 0 GAATAAAGGATTAATCCGTATCATCTTGACTTGGTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:2 HI:i:1 NM:i:0 SM:i:3 XQ:i:40 X2:i:40 XO:Z:UM PG:Z:A
SRR019035.136 272 ChrC 138287 3 36M * 0 0 AACCAAGTCAAGATGATACGGATTAATCCTTTATTC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:2 HI:i:2 NM:i:0 SM:i:3 XQ:i:40 X2:i:40 XO:Z:UM PG:Z:A
SRR019035.137 16 Chr1 28835623 40 36M * 0 0 TATTTTCGTCGTCTCTAGAGTTTGAAGCATCAGTCC IIBI61IIIIIHIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.138 16 Chr5 19304066 40 36M * 0 0 ATCAATGATATGTTTAAGCAAGACGACTCTTTCAGC IIIII?IIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
SRR019035.139 0 Chr4 162871 40 26M * 0 0 TGATTTCGTTGTGCTATGTAAACTTT IIIIIIIIIIIIIIIIIIII1IIIII MD:Z:26 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A

**dpryan** · 03-11-2014, 07:17 AM

The SR... stuff is just the name of the read, which I see you downloaded from SRA (or ENA). Out of curiosity, what happens if you just:

Code:

samtools view -F 0x4 -q 1 -Sbo C.filsa.bam C.filsa.sam

I wonder if giving the -h option is just screwing things up (it shouldn't do anything when you write a BAM file).

**awayihaha** · 03-11-2014, 07:51 AM

Thanks dpryan.
I tried your code, but "reference 'SR' is recognized as '*'" still occurred. I downloaded my SRA data from http://www.ncbi.nlm.nih.gov/sra/?term=SRR019035.

**dpryan** · 03-11-2014, 07:55 AM

If the first 1000 lines or so are sufficient to reproduce this, could you attach that (you have to edit in "advanced" mode and click on the paperclip)? That'd provide a reproducible example. To get the first 1000 (or whatever) lines, just:

Code:

head -n 1000 file.sam > excerpt.txt

**awayihaha** · 03-11-2014, 08:46 AM

I tried the first 1000 lines, and there was no problem. So I have attached the first 500 lines and the last 500 lines for you, but I am not sure the problem will appear.

Whenever I process large SAM files, only a very few lines have problems such as 'missing colon in auxiliary data' or 'CIGAR and sequence length are inconsistent'. Those two errors always point to the specific lines, so I can find the problem. Only with "reference *** is recognized as '*'" am I unable to find which lines are at fault.

My SAM files come from a gsnap alignment, so I am unsure whether the problem is caused by gsnap or by samtools. If it is caused by gsnap, 99% of the data is fine. How can I avoid this problem and filter out the low-quality records in advance?

Attached Files

excerpt.zip (14.7 KB, 27 views)

**dpryan** · 03-11-2014, 08:58 AM

That doesn't seem to reproduce the problem either. It's very likely that the problem is with gsnap, which apparently is producing corrupt output on occasion. You might consider upgrading if that's an option or report the issue to the developer.
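
A minimal sketch of one way to locate such records, assuming the SAM file is tab-delimited and carries its own `@SQ` header lines. The function name `find_suspect_sam_lines` is hypothetical and is not part of samtools or gsnap. It reports the line number of any alignment record that has fewer than the 11 mandatory SAM fields, or whose reference name (column 3) is neither `*` nor declared in the header, which is the kind of record that could plausibly trigger the "reference ... is recognized as '*'" message.

```shell
# Hypothetical helper: list suspect alignment records in a SAM file.
# Assumes tab-delimited input with @SQ header lines present in the file.
find_suspect_sam_lines() {
    awk -F '\t' '
        /^@/ {
            # Remember every reference name declared in an @SQ header line.
            if ($1 == "@SQ") {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^SN:/) { known[substr($i, 4)] = 1 }
                }
            }
            next
        }
        NF < 11 {
            # A valid alignment record has at least 11 mandatory fields.
            print NR ": fewer than 11 fields"
            next
        }
        $3 != "*" && !($3 in known) {
            # Column 3 (RNAME) names a reference the header never declared.
            print NR ": unknown reference " $3
        }
    ' "$1"
}
```

It could be run as `find_suspect_sam_lines C.filsa.sam`; each reported line can then be inspected with, for example, `sed -n '12345p' C.filsa.sam` (substituting the reported number) before deciding whether to drop it.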

**awayihaha** · 03-11-2014, 09:03 AM

Thank you for your good advice. It really helped me.

sam files convert to bam files error
