Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Picard's SamFormatConverter incorrectly complains about a non-numerical pos column

    In trying to convert a sam file to a bam file using Picard's SamFormatConverter, I get the following error:

    Code:
    [Tue Dec 18 12:49:04 PST 2012] net.sf.picard.sam.SamFormatConverter INPUT=AML_PAERAH_73407_RNA.sam OUTPUT=AML_PAERAH_73407_RNA.bam VALIDATION_STRINGENCY=LENIENT    VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
    [Tue Dec 18 12:49:04 PST 2012] Executing as efoss@gizmod60 on Linux 3.2.0-24-generic amd64; OpenJDK 64-Bit Server VM 1.6.0_24-b24
    Ignoring SAM validation error: ERROR: Read name HWI-ST156_294:2:46:13125:2318:0, CIGAR M operator maps off end of reference
    Ignoring SAM validation error: ERROR: Read name HWI-ST156_294:2:5:11648:3249:0, CIGAR M operator maps off end of reference
    [Tue Dec 18 12:49:13 PST 2012] net.sf.picard.sam.SamFormatConverter done. Elapsed time: 0.14 minutes.
    Runtime.totalMemory()=696385536
    Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. Non-numeric value in POS column; Line 539389
    Line: HWI-ST156_294:2:21:6547:4182:0    99      MT      4294967264      39      51M     =       93      -176    CCACACGTTCCCCTTAAATAAGACATCACGATGGATCACAGGTCTATCACC      HHHHHHHHHHHHHHHHHFHHHHEHHGHHGHGEEFGHHDHHBGDFGDFGFFH     MD:Z:51 NH:i:1  HI:i:1  NM:i:0  SM:i:39 XQ:i:40 X2:i:0
    The pos column is the fourth column, and you can see that it contains 4294967264. So why is this counted as not numeric?

    Thanks.

    Eric

  • #2
    Yeah, it reads 4294967264. That's 4,294,967,264, or 4 billion.

    Did you really submit a chromosome that large?

    I think your real error is farther up the output:

    Ignoring SAM validation error: ERROR: Read name HWI-ST156_294:2:46:13125:2318:0, CIGAR M operator maps off end of reference

    Comment


    • #3
      Hi swbarnes2,

      Thanks very much for the suggestion. No, I didn't submit a chromosome that long. I aligned with gsnap, and the contents of the .chromosome file created by gsnap are pasted below. Do you have any thoughts on how this can happen?

      Thanks again.

      Eric

      Code:
      1       1..249250621    249250621
      2       249250622..492449994    243199373
      3       492449995..690472424    198022430
      4       690472425..881626700    191154276
      5       881626701..1062541960   180915260
      6       1062541961..1233657027  171115067
      7       1233657028..1392795690  159138663
      8       1392795691..1539159712  146364022
      9       1539159713..1680373143  141213431
      10      1680373144..1815907890  135534747
      11      1815907891..1950914406  135006516
      12      1950914407..2084766301  133851895
      13      2084766302..2199936179  115169878
      14      2199936180..2307285719  107349540
      15      2307285720..2409817111  102531392
      16      2409817112..2500171864  90354753
      17      2500171865..2581367074  81195210
      18      2581367075..2659444322  78077248
      19      2659444323..2718573305  59128983
      20      2718573306..2781598825  63025520
      21      2781598826..2829728720  48129895
      22      2829728721..2881033286  51304566
      X       2881033287..3036303846  155270560
      Y       3036303847..3095677412  59373566
      MT      3095677413..3095693981  16569
      GL000191.1      3095693982..3095800414  106433
      GL000192.1      3095800415..3096347910  547496
      GL000193.1      3096347911..3096537699  189789
      GL000194.1      3096537700..3096729168  191469
      GL000195.1      3096729169..3096912064  182896
      GL000196.1      3096912065..3096950978  38914
      GL000197.1      3096950979..3096988153  37175
      GL000198.1      3096988154..3097078238  90085
      GL000199.1      3097078239..3097248112  169874
      GL000200.1      3097248113..3097435147  187035
      GL000201.1      3097435148..3097471295  36148
      GL000202.1      3097471296..3097511398  40103
      GL000203.1      3097511399..3097548896  37498
      GL000204.1      3097548897..3097630206  81310
      GL000205.1      3097630207..3097804794  174588
      GL000206.1      3097804795..3097845795  41001
      GL000207.1      3097845796..3097850057  4262
      GL000208.1      3097850058..3097942746  92689
      GL000209.1      3097942747..3098101915  159169
      GL000210.1      3098101916..3098129597  27682
      GL000211.1      3098129598..3098296163  166566
      GL000212.1      3098296164..3098483021  186858
      GL000213.1      3098483022..3098647260  164239
      GL000214.1      3098647261..3098784978  137718
      GL000215.1      3098784979..3098957523  172545
      GL000216.1      3098957524..3099129817  172294
      GL000217.1      3099129818..3099301966  172149
      GL000218.1      3099301967..3099463113  161147
      GL000219.1      3099463114..3099642311  179198
      GL000220.1      3099642312..3099804113  161802
      GL000221.1      3099804114..3099959510  155397
      GL000222.1      3099959511..3100146371  186861
      GL000223.1      3100146372..3100326826  180455
      GL000224.1      3100326827..3100506519  179693
      GL000225.1      3100506520..3100717692  211173
      GL000226.1      3100717693..3100732700  15008
      GL000227.1      3100732701..3100861074  128374
      GL000228.1      3100861075..3100990194  129120
      GL000229.1      3100990195..3101010107  19913
      GL000230.1      3101010108..3101053798  43691
      GL000231.1      3101053799..3101081184  27386
      GL000232.1      3101081185..3101121836  40652
      GL000233.1      3101121837..3101167777  45941
      GL000234.1      3101167778..3101208308  40531
      GL000235.1      3101208309..3101242782  34474
      GL000236.1      3101242783..3101284716  41934
      GL000237.1      3101284717..3101330583  45867
      GL000238.1      3101330584..3101370522  39939
      GL000239.1      3101370523..3101404346  33824
      GL000240.1      3101404347..3101446279  41933
      GL000241.1      3101446280..3101488431  42152
      GL000242.1      3101488432..3101531954  43523
      GL000243.1      3101531955..3101575295  43341
      GL000244.1      3101575296..3101615224  39929
      GL000245.1      3101615225..3101651875  36651
      GL000246.1      3101651876..3101690029  38154
      GL000247.1      3101690030..3101726451  36422
      GL000248.1      3101726452..3101766237  39786
      GL000249.1      3101766238..3101804739  38502

      Comment


      • #4
        For what it's worth, here are the two lines that match that read name:

        Code:
        HWI-ST156_294:2:46:13125:2318:0 99      MT      16336   39      51M     =       16539   254     GCACATTACAGTCAAATCCCTTCTCGTCCCCATGGATGACCCCCCTCAGAT      HHHHHGHHHHHHHHHHHH.HGGGGGHHHHHHHHHGHFHHFHHDHHDEHDFE     MD:Z:51 NH:i:1  HI:i:1  NM:i:0  SM:i:39 XQ:i:40 X2:i:0
        HWI-ST156_294:2:46:13125:2318:0 147     MT      16539   39      35M16S  =       16336   -254    ACACGTTCCCCTTAAATAAGACATCACGATGGATCACAGGTCTATCACCCT      CHE8EEFFEGGEDHHHFGGGBHHHEFHHHHHFHHGHHHFHGHHHHEEGGHG     MD:Z:35 NH:i:1  HI:i:1  NM:i:0  SM:i:39 XQ:i:40 X2:i:0

        Comment


        • #5
          To follow up on this: The 35M16S CIGAR tells me that the read is indeed hanging off the end. (My understanding of hard clipping versus soft clipping is that in both instances, some bases don't match the reference genome, but in hard clipping you remove those bases from the read whereas with soft clipping, you note that the bases don't match but leave them in the read.) So I can write a script to remove these offending lines, but that seems like kind of a clumsy approach.

          Eric

          Comment


          • #6
            That number is also the max unsigned integer - 1 so this smells like a Picard bug.

            Comment


            • #7
              Originally posted by aaronh View Post
              That number is also the max unsigned integer - 1 so this smells like a Picard bug.
              Hi aaronh,

              Yes, I think it is a Picard bug as well. I can create bam files from these sam files using samtools' view function just fine.

              Eric

              Comment


              • #8
                I found out what the problem is with the 4.3 billion POS column: There is a bug in the version of gsnap that I was using that can lead to this type of thing with circular chromosomes. (The chromosome in question was the mitochondrial chromosome.) This bug has been fixed in the current version of gsnap, which was released just a few days ago.

                Eric

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Current Approaches to Protein Sequencing
                  by seqadmin


                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                  04-04-2024, 04:25 PM
                • seqadmin
                  Strategies for Sequencing Challenging Samples
                  by seqadmin


                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                  03-22-2024, 06:39 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, 04-11-2024, 12:08 PM
                0 responses
                25 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-10-2024, 10:19 PM
                0 responses
                29 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-10-2024, 09:21 AM
                0 responses
                25 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-04-2024, 09:00 AM
                0 responses
                52 views
                0 likes
                Last Post seqadmin  
                Working...
                X