![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
How to identify incorrectly fused gene models | Hobbe | Bioinformatics | 1 | 08-01-2013 11:36 AM |
samtools picard SamFormatConverter | Bio.X2Y | Bioinformatics | 3 | 07-08-2013 08:46 AM |
How to get a numeric vector of POS? | yuchioj | Bioinformatics | 0 | 10-10-2012 02:33 PM |
PubMed: rNA: a fast and accurate short reads numerical aligner. | Newsbot! | Literature Watch | 0 | 08-28-2012 03:00 AM |
cufflinks probem: start pos of exons | cbil | Bioinformatics | 2 | 03-25-2011 02:20 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: Seattle Join Date: Jul 2011
Posts: 98
|
![]()
In trying to convert a sam file to a bam file using Picard's SamFormatConverter, I get the following error:
Code:
[Tue Dec 18 12:49:04 PST 2012] net.sf.picard.sam.SamFormatConverter INPUT=AML_PAERAH_73407_RNA.sam OUTPUT=AML_PAERAH_73407_RNA.bam VALIDATION_STRINGENCY=LENIENT VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false [Tue Dec 18 12:49:04 PST 2012] Executing as efoss@gizmod60 on Linux 3.2.0-24-generic amd64; OpenJDK 64-Bit Server VM 1.6.0_24-b24 Ignoring SAM validation error: ERROR: Read name HWI-ST156_294:2:46:13125:2318:0, CIGAR M operator maps off end of reference Ignoring SAM validation error: ERROR: Read name HWI-ST156_294:2:5:11648:3249:0, CIGAR M operator maps off end of reference [Tue Dec 18 12:49:13 PST 2012] net.sf.picard.sam.SamFormatConverter done. Elapsed time: 0.14 minutes. Runtime.totalMemory()=696385536 Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. Non-numeric value in POS column; Line 539389 Line: HWI-ST156_294:2:21:6547:4182:0 99 MT 4294967264 39 51M = 93 -176 CCACACGTTCCCCTTAAATAAGACATCACGATGGATCACAGGTCTATCACC HHHHHHHHHHHHHHHHHFHHHHEHHGHHGHGEEFGHHDHHBGDFGDFGFFH MD:Z:51 NH:i:1 HI:i:1 NM:i:0 SM:i:39 XQ:i:40 X2:i:0 Thanks. Eric |
![]() |
![]() |
![]() |
#2 | |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
|
![]()
Yeah, it reads 4294967264. That's 4,294,967,264, or 4 billion.
Did you really submit a chromosome that large? I think your real error is farther up the output: Quote:
|
|
![]() |
![]() |
![]() |
#3 |
Member
Location: Seattle Join Date: Jul 2011
Posts: 98
|
![]()
Hi swbarnes2,
Thanks very much for the suggestion. No, I didn't submit a chromosome that long. I aligned with gsnap, and the contents of the .chromosome file created by gsnap are pasted below. Do you have any thoughts on how this can happen? Thanks again. Eric Code:
1 1..249250621 249250621 2 249250622..492449994 243199373 3 492449995..690472424 198022430 4 690472425..881626700 191154276 5 881626701..1062541960 180915260 6 1062541961..1233657027 171115067 7 1233657028..1392795690 159138663 8 1392795691..1539159712 146364022 9 1539159713..1680373143 141213431 10 1680373144..1815907890 135534747 11 1815907891..1950914406 135006516 12 1950914407..2084766301 133851895 13 2084766302..2199936179 115169878 14 2199936180..2307285719 107349540 15 2307285720..2409817111 102531392 16 2409817112..2500171864 90354753 17 2500171865..2581367074 81195210 18 2581367075..2659444322 78077248 19 2659444323..2718573305 59128983 20 2718573306..2781598825 63025520 21 2781598826..2829728720 48129895 22 2829728721..2881033286 51304566 X 2881033287..3036303846 155270560 Y 3036303847..3095677412 59373566 MT 3095677413..3095693981 16569 GL000191.1 3095693982..3095800414 106433 GL000192.1 3095800415..3096347910 547496 GL000193.1 3096347911..3096537699 189789 GL000194.1 3096537700..3096729168 191469 GL000195.1 3096729169..3096912064 182896 GL000196.1 3096912065..3096950978 38914 GL000197.1 3096950979..3096988153 37175 GL000198.1 3096988154..3097078238 90085 GL000199.1 3097078239..3097248112 169874 GL000200.1 3097248113..3097435147 187035 GL000201.1 3097435148..3097471295 36148 GL000202.1 3097471296..3097511398 40103 GL000203.1 3097511399..3097548896 37498 GL000204.1 3097548897..3097630206 81310 GL000205.1 3097630207..3097804794 174588 GL000206.1 3097804795..3097845795 41001 GL000207.1 3097845796..3097850057 4262 GL000208.1 3097850058..3097942746 92689 GL000209.1 3097942747..3098101915 159169 GL000210.1 3098101916..3098129597 27682 GL000211.1 3098129598..3098296163 166566 GL000212.1 3098296164..3098483021 186858 GL000213.1 3098483022..3098647260 164239 GL000214.1 3098647261..3098784978 137718 GL000215.1 3098784979..3098957523 172545 GL000216.1 3098957524..3099129817 172294 GL000217.1 3099129818..3099301966 172149 GL000218.1 3099301967..3099463113 161147 GL000219.1 3099463114..3099642311 179198 GL000220.1 3099642312..3099804113 161802 GL000221.1 3099804114..3099959510 155397 GL000222.1 3099959511..3100146371 186861 GL000223.1 3100146372..3100326826 180455 GL000224.1 3100326827..3100506519 179693 GL000225.1 3100506520..3100717692 211173 GL000226.1 3100717693..3100732700 15008 GL000227.1 3100732701..3100861074 128374 GL000228.1 3100861075..3100990194 129120 GL000229.1 3100990195..3101010107 19913 GL000230.1 3101010108..3101053798 43691 GL000231.1 3101053799..3101081184 27386 GL000232.1 3101081185..3101121836 40652 GL000233.1 3101121837..3101167777 45941 GL000234.1 3101167778..3101208308 40531 GL000235.1 3101208309..3101242782 34474 GL000236.1 3101242783..3101284716 41934 GL000237.1 3101284717..3101330583 45867 GL000238.1 3101330584..3101370522 39939 GL000239.1 3101370523..3101404346 33824 GL000240.1 3101404347..3101446279 41933 GL000241.1 3101446280..3101488431 42152 GL000242.1 3101488432..3101531954 43523 GL000243.1 3101531955..3101575295 43341 GL000244.1 3101575296..3101615224 39929 GL000245.1 3101615225..3101651875 36651 GL000246.1 3101651876..3101690029 38154 GL000247.1 3101690030..3101726451 36422 GL000248.1 3101726452..3101766237 39786 GL000249.1 3101766238..3101804739 38502 |
![]() |
![]() |
![]() |
#4 |
Member
Location: Seattle Join Date: Jul 2011
Posts: 98
|
![]()
For what it's worth, here are the two lines that match that read name:
Code:
HWI-ST156_294:2:46:13125:2318:0 99 MT 16336 39 51M = 16539 254 GCACATTACAGTCAAATCCCTTCTCGTCCCCATGGATGACCCCCCTCAGAT HHHHHGHHHHHHHHHHHH.HGGGGGHHHHHHHHHGHFHHFHHDHHDEHDFE MD:Z:51 NH:i:1 HI:i:1 NM:i:0 SM:i:39 XQ:i:40 X2:i:0 HWI-ST156_294:2:46:13125:2318:0 147 MT 16539 39 35M16S = 16336 -254 ACACGTTCCCCTTAAATAAGACATCACGATGGATCACAGGTCTATCACCCT CHE8EEFFEGGEDHHHFGGGBHHHEFHHHHHFHHGHHHFHGHHHHEEGGHG MD:Z:35 NH:i:1 HI:i:1 NM:i:0 SM:i:39 XQ:i:40 X2:i:0 |
![]() |
![]() |
![]() |
#5 |
Member
Location: Seattle Join Date: Jul 2011
Posts: 98
|
![]()
To follow up on this: The 35M16S CIGAR tells me that the read is indeed hanging off the end. (My understanding of hard clipping versus soft clipping is that in both instances, some bases don't match the reference genome, but in hard clipping you remove those bases from the read whereas with soft clipping, you note that the bases don't match but leave them in the read.) So I can write a script to remove these offending lines, but that seems like kind of a clumsy approach.
Eric |
![]() |
![]() |
![]() |
#6 |
Member
Location: California Join Date: Sep 2008
Posts: 45
|
![]()
That number is also the max unsigned integer - 1 so this smells like a Picard bug.
|
![]() |
![]() |
![]() |
#7 |
Member
Location: Seattle Join Date: Jul 2011
Posts: 98
|
![]() |
![]() |
![]() |
![]() |
#8 |
Member
Location: Seattle Join Date: Jul 2011
Posts: 98
|
![]()
I found out what the problem is with the 4.3 billion POS column: There is a bug in the version of gsnap that I was using that can lead to this type of thing with circular chromosomes. (The chromosome in question was the mitochondrial chromosome.) This bug has been fixed in the current version of gsnap, which was released just a few days ago.
Eric |
![]() |
![]() |
![]() |
Thread Tools | |
|
|