  • SRMA Error

    Hello everyone, I'm attempting to use the SRMA tool and am running into a problem. I've followed the instructions in the user guide but can't seem to get past this error, "SAMRecord contig does not match the current reference sequence contig". Here's the command and full error message. Thanks for the help.

    -David Murdock

    java -Xmx2g -jar /users/bainbrid/projects/NimblegenCapturePipeline/projects/SRMA/srma-0.1.7/srma-0.1.7.jar I=NS.1.dupesmarked.bam O=NS.1.realign.bam R=/users/bainbrid/projects/NimblegenCapturePipeline/bucket/human.build36.fa


    [Thu Sep 02 21:14:18 CDT 2010] srma.SRMA REFERENCE=/users/bainbrid/projects/NimblegenCapturePipeline/bucket/human.build36.fa OFFSET=20 MIN_MAPQ=0 MINIMUM_ALLELE_PROBABILITY=0.1 MINIMUM_ALLELE_COVERAGE=3 MAXIMUM_TOTAL_COVERAGE=100 CORRECT_BASES=false USE_SEQUENCE_QUALITIES=true QUIET_STDERR=false MAX_HEAP_SIZE=8192 MAX_QUEUE_SIZE=65536 NUM_THREADS=1 TMP_DIR=/tmp/dm147882 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000
    Allele coverage cutoffs:
    coverage: 1 minimum allele coverage: 0
    coverage: 2 minimum allele coverage: 0
    coverage: 3 minimum allele coverage: 0
    coverage: 4 minimum allele coverage: 1
    coverage: 5 minimum allele coverage: 1
    coverage: 6 minimum allele coverage: 1
    coverage: 7 minimum allele coverage: 2
    coverage: 8 minimum allele coverage: 2
    coverage: 9 minimum allele coverage: 3
    coverage: >9 minimum allele coverage: 3
    java.lang.Exception: SAMRecord contig does not match the current reference sequence contig
    at srma.Graph.addSAMRecord(Graph.java:49)
    at srma.SRMA$GraphThread.run(SRMA.java:596)
    Please report bugs to [email protected]

  • #2
    Could you post the reference sequence contig names, as well as the SAM header? There may be a mismatch of names in your input files. Once I rule that out, I can start debugging SRMA. Thank you for your patience.



    • #3
      Nils, here's the sam header:

      @HD VN:1.0 GO:none SO:coordinate
      @SQ SN:chr1 LN:247249719
      @SQ SN:chr2 LN:242951149
      @SQ SN:chr3 LN:199501827
      @SQ SN:chr4 LN:191273063
      @SQ SN:chr5 LN:180857866
      @SQ SN:chr6 LN:170899992
      @SQ SN:chr7 LN:158821424
      @SQ SN:chr8 LN:146274826
      @SQ SN:chr9 LN:140273252
      @SQ SN:chr10 LN:135374737
      @SQ SN:chr11 LN:134452384
      @SQ SN:chr12 LN:132349534
      @SQ SN:chr13 LN:114142980
      @SQ SN:chr14 LN:106368585
      @SQ SN:chr15 LN:100338915
      @SQ SN:chr16 LN:88827254
      @SQ SN:chr17 LN:78774742
      @SQ SN:chr18 LN:76117153
      @SQ SN:chr19 LN:63811651
      @SQ SN:chr20 LN:62435964
      @SQ SN:chr21 LN:46944323
      @SQ SN:chr22 LN:49691432
      @SQ SN:chrX LN:154913754
      @SQ SN:chrY LN:57772954
      @SQ SN:chrM LN:16571
      @PG ID:bfast VN:0.6.4d

      And here's the reference contig names:
      >chr10
      >chr11
      >chr12
      >chr13
      >chr14
      >chr15
      >chr16
      >chr17
      >chr18
      >chr19
      >chr1
      >chr20
      >chr21
      >chr22
      >chr2
      >chr3
      >chr4
      >chr5
      >chr6
      >chr7
      >chr8
      >chr9
      >chrM
      >chrX
      >chrY

      The only thing I see is that the ref's contigs aren't sorted. Could this be the problem? Thanks.
      -David



      • #4
        Originally posted by dmurdock
        The only thing I see is that the ref's contigs aren't sorted. Could this be the problem? Thanks.
        -David
        That's it. The reference should be in the same order as the SAM header (not sure why it isn't?).
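A minimal sketch of the fix described above, assuming the SAM header and the FASTA are available as text: it reads the contig order from the `@SQ` lines and rewrites the FASTA records in that order. The function names are hypothetical and are not part of SRMA, Picard, or samtools. It holds all sequence lines in memory, so for a full human reference a streaming or index-based approach would likely be preferable.

```python
import re


def sam_header_contigs(header_text):
    """Return contig names in the order of the @SQ lines of a SAM header."""
    names = []
    for line in header_text.splitlines():
        line = line.strip()
        if line.startswith("@SQ"):
            match = re.search(r"\bSN:(\S+)", line)
            if match:
                names.append(match.group(1))
    return names


def read_fasta(fasta_text):
    """Map each contig name to its lines (header line first, then sequence)."""
    records = {}
    current = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = [line]
        elif current is not None and line:
            records[current].append(line)
    return records


def reorder_fasta(fasta_text, contig_order):
    """Return FASTA text with records in contig_order.

    Contigs present in the FASTA but absent from contig_order are dropped.
    """
    records = read_fasta(fasta_text)
    missing = [name for name in contig_order if name not in records]
    if missing:
        raise ValueError("contigs missing from reference: " + ", ".join(missing))
    out = []
    for name in contig_order:
        out.extend(records[name])
    return "\n".join(out) + "\n"
```

Comparing `sam_header_contigs(header)` with `list(read_fasta(fasta))` is also a quick way to confirm whether the two orders differ before running SRMA.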



        • #5
          Thanks Nils. I'll make the change to the ref and let you know how it goes.
          David



          • #6
            Thanks for the suggestion, it worked after sorting the reference! I'm now having an issue when using the RANGES option with a file containing different regions to realign. I have generated such a file, but srma seems to exit when the chromosome changes. It runs without generating an error, but it doesn't go beyond the first chr listed. If I make a separate file for each chr, they run fine; the problem only appears when they're together. Here's the region file; it stops after the last chr1 region.

            chr1 1889866 1890066
            chr1 12561395 12561595
            chr1 34999494 34999694
            chr1 43681831 43682031
            chr1 74810345 74810545
            chr1 74810352 74810552
            chr1 89245929 89246129
            chr1 143585219 143585419
            chr1 144037256 144037456
            chr1 150462252 150462452
            chr1 156418029 156418229
            chr1 169823406 169823606
            chr1 227839825 227840025
            chr1 232667941 232668141
            chr2 15481809 15482009
            chr2 24240582 24240782
            chr2 26330529 26330729
            chr2 38054620 38054820
            chr2 73528635 73528835
            chr2 95210667 95210867

            And here's the output:
            Allele coverage cutoffs:
            coverage: 1 minimum allele coverage: 0
            coverage: 2 minimum allele coverage: 0
            coverage: 3 minimum allele coverage: 0
            coverage: 4 minimum allele coverage: 1
            coverage: 5 minimum allele coverage: 1
            coverage: 6 minimum allele coverage: 1
            coverage: 7 minimum allele coverage: 2
            coverage: 8 minimum allele coverage: 2
            coverage: 9 minimum allele coverage: 3
            coverage: >9 minimum allele coverage: 3
            Records processsed: 265 (last chr1:1890066-1890115)
            Records processsed: 328 (last chr1:12561570-12561619)
            Records processsed: 426 (last chr1:34999694-34999743)
            Records processsed: 506 (last chr1:43682031-43682080)
            Records processsed: 644 (last chr1:74810552-74810601)
            Records processsed: 888 (last chr1:89246127-89246176)
            Records processsed: 915 (last chr1:143585419-143585468)
            Records processsed: 923 (last chr1:144037447-144037496)
            Records processsed: 1022 (last chr1:150462408-150462457)
            Records processsed: 1055 (last chr1:156418200-156418249)
            Records processsed: 1109 (last chr1:169823567-169823616)
            Records processsed: 1210 (last chr1:227840018-227840067)
            Records processsed: 1257 (last chr1:232668141-232668190)
            SRMA complete
            Total memory usage: 249MB
            Total execution time: 0h : 1m : 7s

            [Tue Sep 07 11:42:05 CDT 2010] srma.SRMA REFERENCE=/users/bainbrid/projects/NimblegenCapturePipeline/projects/mendelianDisease/ref/hsap_36.1_hg18.fa RANGES=realign.coords OFFSET=20 MIN_MAPQ=0 MINIMUM_ALLELE_PROBABILITY=0.1 MINIMUM_ALLELE_COVERAGE=3 MAXIMUM_TOTAL_COVERAGE=100 CORRECT_BASES=false USE_SEQUENCE_QUALITIES=true QUIET_STDERR=false MAX_HEAP_SIZE=8192 MAX_QUEUE_SIZE=65536 NUM_THREADS=1 TMP_DIR=/tmp/dm147882 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000

            I'm wondering if it's some sort of sorting issue but I can't seem to figure it out. Thanks!
            -David



            • #7
              I think you have found a bug in Picard (I just sent an email to their developers mailing list). Picard seems to want the index of an "aln.bam" file to be named "aln.bai", whereas samtools produces it with the name "aln.bam.bai". Like I said, I have initiated a discussion with the Picard developers. A quick hack would be to create a symbolic link ("ln -s aln.bam.bai aln.bai"). Let me know if that doesn't work for you.



              • #8
                Originally posted by nilshomer
                I think you have found a bug in Picard (I just sent an email to their developers mailing list). Picard seems to want the index of an "aln.bam" file to be named "aln.bai", whereas samtools produces it with the name "aln.bam.bai". Like I said, I have initiated a discussion with the Picard developers. A quick hack would be to create a symbolic link ("ln -s aln.bam.bai aln.bai"). Let me know if that doesn't work for you.
                My criticism of Picard was unfounded: it works with the latest GIT/SVN repositories. Can you try the latest SRMA GIT version? Also, could you check that there are reads mapped to chromosome 2, and maybe try a "RANGES" file with two ranges on either chromosome?



                • #9
                  I installed srma-0.1.8.jar and unfortunately I'm still having the same problem.
                  I've found that if the ranges file is sorted by chr and coordinate then srma will run on multiple chromosomes. However if a region's coordinates are not after the previous region's (regardless of which chr it came from) it will not realign it. Thus the following will work completely:

                  chr1 1889766 1890166
                  chr1 12561295 12561695
                  chr2 15481709 15482109
                  chr2 24240482 24240882
                  chr3 49338000 49338400
                  chr3 49544055 49544455

                  But the following will stop at the last chr 2 region:

                  chr1 1889766 1890166
                  chr1 12561295 12561695
                  chr2 15481709 15482109
                  chr2 24240482 24240882
                  chr2 95210567 95210967
                  chr3 49338000 49338400
                  chr3 49544055 49544455

                  It won't throw an error but just doesn't include the latter regions in the output file. Any thoughts?
                  -David
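A rough sketch of a pre-flight check for the behaviour described above, under the assumption that srma-0.1.8 stops at the first region whose start coordinate is lower than the previous region's start, without looking at the chromosome name. The exact comparison SRMA performs is not shown in this thread, so the rule and the function names here are hypothetical.

```python
def parse_ranges(text):
    """Parse whitespace-separated 'contig start end' lines into tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ranges.append((fields[0], int(fields[1]), int(fields[2])))
    return ranges


def first_unsafe_index(ranges):
    """Return the 0-based index of the first region whose start is lower than
    the previous region's start (contig ignored), or None if there is none.

    Assumption: this mirrors the observed behaviour, not SRMA's actual code.
    """
    for index in range(1, len(ranges)):
        if ranges[index][1] < ranges[index - 1][1]:
            return index
    return None
```

On the two lists above, this check returns `None` for the first and points at `chr3 49338000 49338400` for the second, which matches where the output was cut off.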



                  • #10
                    The solution for now is to run one "RANGE" command per region.
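One way to script this workaround, sketched under some assumptions: each region is written to its own single-line ranges file, and one SRMA command line is built per file using only the options that appear earlier in this thread (`I=`, `O=`, `R=`, `RANGES=`). The helper names, the `.ranges` and `.realign.bam` file names, and the single-space separator are hypothetical choices, not SRMA conventions.

```python
import os


def write_single_region_files(ranges_text, out_dir):
    """Write one ranges file per region and return the file paths in order."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for line in ranges_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, start, end = fields[0], fields[1], fields[2]
        path = os.path.join(out_dir, f"{contig}_{start}_{end}.ranges")
        with open(path, "w") as handle:
            handle.write(f"{contig} {start} {end}\n")
        paths.append(path)
    return paths


def build_commands(jar, bam_in, reference, range_files):
    """Build one argument list per ranges file; nothing is executed here."""
    commands = []
    for path in range_files:
        # Hypothetical output name derived from the ranges file name.
        bam_out = os.path.splitext(path)[0] + ".realign.bam"
        commands.append([
            "java", "-Xmx2g", "-jar", jar,
            f"I={bam_in}", f"O={bam_out}", f"R={reference}", f"RANGES={path}",
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(command, check=True)`; the per-region outputs would then still need to be combined in a separate step.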

                    Originally posted by dmurdock
                    I've found that if the ranges file is sorted by chr and coordinate then srma will run on multiple chromosomes. However if a region's coordinates are not after the previous region's (regardless of which chr it came from) it will not realign it. Thus the following will work completely:

                    chr1 1889766 1890166
                    chr1 12561295 12561695
                    chr2 15481709 15482109
                    chr2 24240482 24240882
                    chr3 49338000 49338400
                    chr3 49544055 49544455

                    But the following will stop at the last chr 2 region:

                    chr1 1889766 1890166
                    chr1 12561295 12561695
                    chr2 15481709 15482109
                    chr2 24240482 24240882
                    chr2 95210567 95210967
                    chr3 49338000 49338400
                    chr3 49544055 49544455

                    It won't throw an error but just doesn't include the latter regions in the output file. Any thoughts?
                    -David
                    I see it now, I will see what I can do.
                    Last edited by nilshomer; 09-08-2010, 12:47 PM. Reason: Enlightenment



                    • #11
                      Thanks, I'll do that!
                      -David



                      • #12
                        I was able to reproduce the bug, so it should be fixed now. Once you confirm, I will package up a new release. Thank you for your patience!



                        • #13
                          It works great! I was able to realign ~ 150 small regions across the whole genome. Thanks for your help.
                          -David



                          • #14
                            No problem. Bugs/features wouldn't be fixed/added without users like you. Having it open source makes it easier to fix and release.



                            • #15
                              Originally posted by dmurdock
                              chr1 1889866 1890066
                              chr1 12561395 12561595
                              chr1 34999494 34999694
                              chr1 43681831 43682031
                              chr1 74810345 74810545
                              chr1 74810352 74810552
                              chr1 89245929 89246129
                              chr1 143585219 143585419
                              chr1 144037256 144037456
                              chr1 150462252 150462452
                              chr1 156418029 156418229
                              chr1 169823406 169823606
                              chr1 227839825 227840025
                              chr1 232667941 232668141
                              chr2 15481809 15482009
                              chr2 24240582 24240782
                              chr2 26330529 26330729
                              chr2 38054620 38054820
                              chr2 73528635 73528835
                              chr2 95210667 95210867
                              I am new to this field and just want to ask how this file is generated. Which tool should I use?
                              I would really appreciate any advice.
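The thread does not say how the regions file was produced. As a hedged sketch: the quoted regions are each 200 bp wide, which is consistent with padding a list of positions of interest by 100 bp on either side, but that is only an assumption. The `make_ranges` helper and its `flank` parameter are hypothetical; the output is sorted by the SAM header's contig order and then by position.

```python
def make_ranges(sites, contig_order, flank=100):
    """Build 'contig start end' lines from (contig, position) pairs.

    Each site is padded by `flank` bases on both sides; starts are clamped to 1.
    Ends are not clamped to the contig length in this sketch.
    """
    rank = {name: index for index, name in enumerate(contig_order)}
    ordered = sorted(sites, key=lambda site: (rank[site[0]], site[1]))
    lines = []
    for contig, position in ordered:
        start = max(1, position - flank)
        end = position + flank
        lines.append(f"{contig} {start} {end}")
    return "\n".join(lines) + "\n"
```

For example, `make_ranges([("chr1", 1889966)], ["chr1", "chr2"])` yields the line `chr1 1889866 1890066`, the same shape as the first entry quoted above.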
