  • Converter for ABI mapping file to SAM / BAM

    Hi,
    I am wondering if anyone knows of a converter from ABI's mapping (.ma) file to SAM/BAM format?

    It looks like this:

    $ tail Sample1_subset_F3.csfasta.ma
    >2356_2045_1618_F3,2_9780368.0:(45.2.0):q61,13_-46421072.1:(25.1.0):q0
    T3003303120330031323000033003.0321110222300.2310.32
    >2356_2045_1635_F3
    T3003321031213222211030331223.1132002223023.2310.21
    >2356_2045_1742_F3
    T3332030300202331130102230323.2030332101022.2012.00
    >2356_2046_1295_F3,18_-23217631.2:(41.5.0):q36
    T1113023310100030330330220332.023.332231100.2330.33
    >2356_2046_1911_F3
    T3313301232013003321312230013.111.303323133.2303.23
    Last edited by KevinLam; 03-21-2010, 08:16 PM.
    http://kevin-gattaca.blogspot.com/
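For anyone who wants to script around these files, here is a minimal Python sketch that splits a header line like the ones above into the read name and its hit tokens. It assumes, without confirmation from this thread, that hits are comma-separated entries starting with `<contig>_<position>` and that a negative position marks a reverse-strand hit. Everything after the first '.' is kept as an uninterpreted string. The names `Hit` and `parse_ma_header` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hit:
    contig: int
    position: int  # absolute value of the signed position
    strand: str    # '+' or '-', ASSUMED from the sign of the position
    rest: str      # remaining annotation, left uninterpreted


def parse_ma_header(line: str) -> Tuple[str, List[Hit]]:
    """Split '>read,contig_pos.rest,...' into the read name and its hits."""
    if not line.startswith(">"):
        raise ValueError("not a header line: " + line)
    read_name, *hit_tokens = line[1:].strip().split(",")
    hits = []
    for token in hit_tokens:
        location, _, rest = token.partition(".")
        contig, _, signed_pos = location.partition("_")
        pos = int(signed_pos)
        hits.append(Hit(int(contig), abs(pos), "-" if pos < 0 else "+", rest))
    return read_name, hits


if __name__ == "__main__":
    name, hits = parse_ma_header(
        ">2356_2045_1618_F3,2_9780368.0:(45.2.0):q61,13_-46421072.1:(25.1.0):q0")
    print(name)
    for hit in hits:
        print(hit)
```

Reads with no hits (such as `>2356_2045_1635_F3`) simply come back with an empty hit list.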

  • #2
    You are going to have to convert first to GFF and then to SAM.
    You can find both tools on: http://solidsoftwaretools.com.

    Look for matogff and gfftosam.

    Let me know if you can't find them.
    -drd
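To make the intermediate GFF step less opaque, here is a rough Python sketch of one GFF3-style record (nine tab-separated columns, 1-based inclusive coordinates) built from a single hit. It is not matogff's actual output: the source and type values (`solid_sketch`, `read`), the `Name=` attribute, the helper name `hit_to_gff`, and the reading of a negative position as a reverse-strand hit whose absolute value is the leftmost coordinate are all assumptions for illustration.

```python
def hit_to_gff(seqid: str, signed_pos: int, read_len: int, read_name: str) -> str:
    """Format one hit as a 9-column GFF3-style line (hypothetical field choices)."""
    # ASSUMPTION: a negative position in the .ma file marks a reverse-strand hit.
    strand = "-" if signed_pos < 0 else "+"
    start = abs(signed_pos)
    end = start + read_len - 1  # GFF end coordinate is inclusive
    columns = [seqid, "solid_sketch", "read", str(start), str(end),
               ".", strand, ".", "Name=" + read_name]
    return "\t".join(columns)


if __name__ == "__main__":
    print(hit_to_gff("2", 9780368, 50, "2356_2045_1618_F3"))
```

Under these assumptions, a 50-call read placed at 9780368 on contig 2 would span 9780368 to 9780417 on the + strand.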



    • #3
      Thanks, drio!
      Found them:

      matogff

      gfftosam


      with a patch here: Conversion of SOLiD3 gff to SAM/BAM for IGV browser
      http://kevin-gattaca.blogspot.com/



      • #4
        Ahh, I didn't know about that patch. It will be useful if I ever use GFF.
        -drd



        • #5
          Hi,

          I am also trying to convert my mapping .ma file (same format as presented in the first post) to GFF format. I've downloaded the matogff converter from the posted link, installed it, and tried running it. It seems to work for the first entry in my .ma file, but then it fails and spits out this error message:

          ---------------------------------------------------------
          Error in AnalysisModuleUtils.run:
          java.lang.NumberFormatException: For input string: "329"
          at java.lang.NumberFormatException.forInputString(Unknown Source)
          at java.lang.Integer.parseInt(Unknown Source)
          at java.lang.Integer.valueOf(Unknown Source)
          at com.apldbio.aga.analysis.secondary.modules.FastaRecord.isMappedUniquely(FastaRecord.java:263)
          at com.apldbio.aga.analysis.secondary.modules.MatchingStatsCalculator.tabulateData(MatchingStatsCalculator.java:74)
          at com.apldbio.aga.analysis.secondary.modules.MaToGff.filter(MaToGff.java:749)
          at com.apldbio.aga.analysis.secondary.modules.MaToGff.run(MaToGff.java:648)
          at com.apldbio.aga.analysis.util.AnalysisModuleUtils.run(AnalysisModuleUtils.java:143)
          at com.apldbio.aga.analysis.secondary.modules.MaToGff.main(MaToGff.java:1071)
          ---------------------------------------------------------

          I don't really know Java, so I can't tell whether this is an easily fixed error or whether matogff is having trouble reading the .ma file.

          I've gotten matogff to run successfully on some other .ma files I have that use a different format -
          >1613_183_167_F3,5_3438040.2,5_3436441.2,5_3434596.2
          T02202103031121301002310313331203112
          - but the data that I'm interested in now has the format of the original post.

          Any insight would be greatly appreciated...
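A small Python sketch of one plausible way this kind of exception arises. It assumes (this is not taken from the MaToGff source) that the parser splits each hit on '.' and converts the field after the position to an integer mismatch count. That works for the older `5_3438040.2` style, but in the Bioscope-style hits from the first post the same field contains extra text such as `0:(45`. A guess, hinted at by westerman's edit note: the forum may have turned a ':(' in the original error text into a smiley, which would explain why the reported string looks like a plain number here. `naive_mismatch_count` is a hypothetical name.

```python
def naive_mismatch_count(hit: str) -> int:
    """ASSUMED parsing rule: the field after the first '.' is an integer."""
    fields = hit.split(".")
    # int() raises ValueError on non-integer text, much like Java's Integer.parseInt
    # throws NumberFormatException.
    return int(fields[1])


if __name__ == "__main__":
    for hit in ["5_3438040.2", "2_9780368.0:(45.2.0):q61"]:
        try:
            print(hit, "->", naive_mismatch_count(hit))
        except ValueError as err:
            print(hit, "-> fails:", err)
```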



          • #6
            Well, if it is not obvious, Java does not like the value contained in "329" since it is not a number. However, I cannot find "329" in your initial example. I suspect a bad data file of some sort. Could you look through your file for "329" and paste the couple of lines before and after where "329" occurs?
            Last edited by westerman; 04-12-2010, 08:45 AM. Reason: Or maybe it is my browser showing the smileys.
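Searching the file with context, as asked above, is what `grep -n -C 2 329 yourfile.ma` does. Here is the same idea as a small Python sketch in case grep is not handy. It loads the whole file into memory, so it is meant for subsets rather than a full run; the script name, arguments, and `context_windows` helper are placeholders.

```python
import sys
from typing import Iterator, List, Sequence, Tuple


def context_windows(lines: Sequence[str], needle: str,
                    context: int = 2) -> Iterator[List[Tuple[int, str]]]:
    """Yield (1-based line number, text) windows around each line containing needle."""
    for i, line in enumerate(lines):
        if needle in line:
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            yield [(n + 1, lines[n]) for n in range(lo, hi)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python context.py <file.ma> <search string>
    with open(sys.argv[1]) as handle:
        text = handle.read().splitlines()
    for window in context_windows(text, sys.argv[2]):
        for number, line in window:
            print(f"{number}: {line}")
        print("--")
```

Overlapping matches produce overlapping windows; unlike grep, this sketch does not merge them.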



            • #7
              Here are the last two lines of the header of the file (lines starting with #) and the first five reads with annotated mapping sites (lines starting with >). MaToGff is failing at the very first hit annotation, "329", for the second read.

              --------------------------------------------------------------------------
              ...
              #/usr/local/bioscope/corona-1.0.1r0-4/bin/map /home/mafree/Input/temp.01/temp.01/ST316_E2_F3.csfasta /tmp/matchingnoAAfI/.tmpfile.reference1268692730A6jDES T=30 L=29 C=1 E=/tmp/matchingnoAAfI/.TmpfilE1268692730pxGTn6 F=0 D=1 np=1 V=15.000000 u=1 r=0 n=1 Z=100 P="11001010100101011000011110010" M=0 U=0.000000 H=0 B=1 m=-20 s=0 > /tmp/matchingnoAAfI/.TmpfilE1268692730pxGTn6.out.13
              #/usr/local/bioscope/corona-1.0.1r0-4/bin/map /home/mafree/Input/temp.01/temp.01/ST316_E2_F3.csfasta /tmp/matchingnoAAfI/.tmpfile.reference1268692730A6jDES T=30 L=29 C=1 E=/tmp/matchingnoAAfI/.TmpfilE1268692730pxGTn6 F=0 D=1 np=1 V=15.000000 u=1 r=0 n=1 Z=100 P="11101111111000001001110000000" M=0 U=0.000000 H=0 B=1 m=-20 s=0 > /tmp/matchingnoAAfI/.TmpfilE1268692730pxGTn6.out.14
              >1378_6_1041_F3
              T22222222202220222120202002202022202
              >1378_7_565_F3,4_8062027.329.3.0):q2:q2:q3,2_4130435.329.3.0):q2:q2:q3
              T11111111111111111111210112101031131
              >1378_8_591_F3,6_15275410.232.2.0):q1:q1:q1,6_15653332.329.3.0):q0:q0:q0,6_15653330.331.3.0):q0:q0:q0,6_15653328.332.3.0):q0:q0:q0,6_15653326.332.3.0):q0:q0:q0,6_15653324.332.3.0):q0:q0:q0,6_15653322.332.3.0):q0:q0:q0,6_15653320.332.3.0):q0:q0:q0,6_15653318.332.3.0):q0:q0:q0,6_15653316.332.3.0):q0:q0:q0,6_15653314.332.3.0):q0:q0:q0,6_15653312.332.3.0):q0:q0:q0
              T11111112211111111111111113111111131
              >1378_8_879_F3
              T32021222222222002302010320222333030
              >1378_8_1343_F3
              T30030030000001301013013310033333333
              ...
              --------------------------------------------------------------------------


              Here is my Java version info:
              java version "1.6.0_05"
              Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
              Java HotSpot(TM) Server VM (build 10.0-b19, mixed mode)

              Maybe the Java version I'm using is the issue?



              • #8
                It looks like you are using Bioscope? Your output isn't formatted like classic mapreads format, so they must have changed it in the update. You'll probably need to either find an updated matogff, or find an option in the Bioscope mapping program that makes the output backwards-compatible. Or, if you can find a spec for the new output format, writing a little script to convert to the older format would work also.
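Along the lines of the "little script" suggested above, here is a minimal Python sketch that rewrites Bioscope-style headers into the older corona-lite style shown in post #5. It assumes each hit looks like `<contig>_<position>.<mismatches>:(...)...` and keeps only the part before ':(', giving `<contig>_<position>.<mismatches>`. Both the field meaning and whether matogff accepts the result should be checked against the format notes for your Bioscope version; `to_corona_header` is a hypothetical name.

```python
import sys


def to_corona_header(line: str) -> str:
    """Drop everything from ':(' onward in each hit of a '>' header line."""
    if not line.startswith(">"):
        return line  # sequence lines and '#' comment lines pass through unchanged
    read_name, *hits = line.rstrip("\n").split(",")
    # ASSUMPTION: the text before ':(' is '<contig>_<position>.<mismatches>'.
    simplified = [hit.split(":(", 1)[0] for hit in hits]
    return ",".join([read_name] + simplified)


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python bioscope_to_corona.py input.ma > converted.ma
    with open(sys.argv[1]) as handle:
        for raw in handle:
            print(to_corona_header(raw.rstrip("\n")))
```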



                • #9
                  What 'ondovb' says. You have Bioscope output, which is not in the same format as Corona Lite and is thus not compatible with matogff. Bioscope will generate SAM output for you, so you should have the GFF and SAM files, or at least be able to generate them. One of my last runs has the files:

                  Solid0036_20090401_1.sortedMates.gff3

                  and

                  Solid0036_20090401_1.sortedMates.gff3.sam



                  • #10
                    Sorry for not mentioning, but yes, I used bioscope to map my SOLiD reads; however, I cannot get the pairing or matogff parts of the pipeline to work (and thus I can't run diBayes which is what I really want to do). When I run bioscope for pairing or converting, it executes and prints "Finished successfully" to the screen, but no output files are produced.

                    So then I proceeded to download MaToGff separately, wrote a script to convert my .ma files to the Corona Lite format, and was able to get .gff files. I downloaded diBayes separately and tried to call SNPs from my .gff files, but I get a ton of errors and warnings such as:

                    -------------------------------------------------------------------------------------
                    mutation:ERROR -- maqtotext.cpp: Base Position (5910923) is LESS THAN previous base position (6810170) in line number 2 of input file. Sort GFF file by genome position and re-run.
                    -------------------------------------------------------------------------------------
                    At this point, I'm getting frustrated...

                    I am a guest user on someone else's bioscope cluster, so I don't have access to any documentation. I really just want to be able to go from .csfasta files to calling SNPs without having to coordinate all of these steps in between - isn't that what bioscope was developed for? haha

                    I think I just don't know how to use bioscope correctly. I created separate .ini files for each step (i.e. mapping, pairing, matogff converting, etc.) and the mapping one works, but the others do not. Is there some secret trick to getting bioscope to work? Or do I just need a little more patience...
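The diBayes message above asks for the GFF file to be sorted by genome position. A minimal Python sketch of that step: '#' header lines stay on top and records are ordered by sequence name (column 1), then start coordinate (column 4). Assumptions: numeric contig names are sorted numerically and other names lexically; diBayes may instead expect the contig order of the reference, which this sketch cannot know. The file names and function names are placeholders, and the whole file is read into memory.

```python
import sys
from typing import List, Tuple


def _seqid_key(seqid: str) -> Tuple[int, int, str]:
    # Numeric contig ids ("1", "2", "10") sort numerically; other names lexically.
    return (0, int(seqid), "") if seqid.isdigit() else (1, 0, seqid)


def sort_gff(lines: List[str]) -> List[str]:
    """Return '#' lines first, then data records ordered by (seqid, start)."""
    headers = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if line.strip() and not line.startswith("#")]

    def key(line: str) -> Tuple[Tuple[int, int, str], int]:
        columns = line.split("\t")
        return _seqid_key(columns[0]), int(columns[3])  # column 4 is the start

    return headers + sorted(records, key=key)


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python sort_gff.py input.gff > sorted.gff
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(sort_gff(handle.readlines()))
```

Python's sort is stable, so records with the same contig and start keep their original relative order.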



                    • #11
                      A couple of observations.

                      1) We need to get you running Bioscope all of the way from mapping to diBayes. This partial Bioscope then handmade conversion to Corona format will just drive you crazy.

                      2) Bioscope can be frustrating to set up. Been there. Done that.

                      3) Since you are a guest on another person's cluster then perhaps they can help you?

                      4) You sound like someone who just wants to get his results and not fiddle with the pipeline. Don't blame you at all. It is very frustrating to have the programs get in the way.

                      5) If you want then send me your ini files via private email (westerman at purdue dot edu) and I will take a look at them (Tuesday) and try to give help.
