  • 454 MIDs

    Hi everyone. I've searched everywhere but haven't quite found the solution to my problem, so bear with me if I'm asking a simple question.

    I just got back 454 data that we generated using the Fluidigm Access Array, and I am having difficulty parsing out the MIDs. I've converted the .sff files to FASTA files and tried parsing out the MIDs with the FASTX-Toolkit, but for some reason the barcode splitter is not working. I'm sure my syntax is correct, so I think something is wrong with my FASTA file. I know the key tags and adapters are still on my sequences. Is this the problem? I'm not sure what I should do. We are not big fans of Roche's software, especially AVA, so we are trying to find a different solution. Also, has anyone been successful using Novoalign for mapping 454 data?


    Any information is greatly appreciated!!!!! Thank you!

    Ali
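A minimal sketch of why a start-of-read barcode match can fail on files like these: if each read still begins with the 4-base 454 key (assumed here to be TCAG) and only then the 10-base MID, a splitter that anchors barcodes at the beginning of the sequence (as fastx_barcode_splitter.pl does with --bol) never finds the MID at position 0. The Python below strips the assumed key first and assigns a read only when exactly one MID matches within a mismatch limit. The key, the read layout, and the function names are illustrative assumptions, not part of any tool mentioned in this thread.

```python
from collections import defaultdict

KEY = "TCAG"  # assumption: 454 key sits at the 5' end of every read, before the MID


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_mid(read, mids, max_mm=1):
    """Return (mid_name, insert) or (None, read) if no unique MID matches."""
    if not read.startswith(KEY):
        return None, read
    body = read[len(KEY):]
    hits = [name for name, mid in mids.items()
            if len(body) >= len(mid) and hamming(body[:len(mid)], mid) <= max_mm]
    if len(hits) != 1:  # no match, or ambiguous between two MIDs
        return None, read
    name = hits[0]
    return name, body[len(mids[name]):]  # key and MID removed


def demultiplex(reads, mids, max_mm=1):
    """Group (read_id, sequence) pairs into per-MID bins; misses go to 'unassigned'."""
    bins = defaultdict(list)
    for rid, seq in reads:
        name, insert = assign_mid(seq, mids, max_mm)
        bins[name or "unassigned"].append((rid, insert))
    return bins


if __name__ == "__main__":
    mids = {"MID1": "ACGAGTGCGT", "MID2": "ACGCTCGACA"}  # from the barcode list later in the thread
    reads = [("r1", "TCAGACGAGTGCGTGGATTACA"),
             ("r2", "TCAGACGCTCGACATTTT"),
             ("r3", "ACGAGTGCGTGGATTACA")]  # key missing, so left unassigned
    for name, items in sorted(demultiplex(reads, mids).items()):
        print(name, items)
```

Requiring a unique hit trades a few lost reads for never putting a read into the wrong bin.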

  • #2
    Hey,

    Yep, unless you have code to do this, it isn't easy (especially since you may have a bunch of sorting parameters you're interested in).

    Geneious offers a free trial, is easy to use, and is super cheap for students. You can do it there: http://www.geneious.com/


    I used jMHC to parse mine, as I found its parsing criteria particularly stringent (no Ns in primers or sequence, and a 1 bp difference = a new allele): http://code.google.com/p/jmhc/

    I found that with jMHC, parsing could take hours despite running on a VERY powerful desktop PC. When I ran it on my Mac laptop, it whizzed through in minutes!

    I attempted to get SESAME up and running, but it is a fiddly process and I got tired of troubleshooting... http://bioinformatics.oxfordjournals...2/277.abstract

    Good luck!

    J
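The two jMHC criteria quoted in the post above can be mimicked roughly as follows. This is not jMHC's code: it is a hypothetical Python sketch assuming reads are already trimmed to the amplicon (so primers are included in the check), that any N disqualifies a read, and that every distinct sequence, even one differing by 1 bp, counts as its own variant.

```python
from collections import Counter


def stringent_variants(reads):
    """Drop reads containing N, then count identical sequences.

    Returns (variants, n_discarded), where variants is a list of
    (sequence, count) sorted by descending count. Sequences are compared
    exactly, so a single-base difference yields a separate variant.
    """
    upper = [r.upper() for r in reads]
    kept = [r for r in upper if "N" not in r]  # assumed rule: no Ns anywhere
    counts = Counter(kept)
    variants = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return variants, len(upper) - len(kept)


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGNACGT"]
    variants, dropped = stringent_variants(reads)
    print(variants, dropped)
```

Exact matching is what makes this "stringent": sequencing errors also become new variants, so low-count variants usually need further scrutiny.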

    • #3
      Usually, when you get back data from a multiplexed run, the MIDs have already been removed from the SFF file (more precisely: the clip offsets are shifted by the length of the MIDs used).
      So if you extract your reads with e.g. 'sffinfo', you get the "clipped" sequence (unless you use the flag '-n'), and your FASTA files no longer contain the MIDs.
      Keep this in mind when using other tools for extraction, too.
      What tool have you been using for SFF->FASTA extraction?

      cheers,
      Sven
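To illustrate the point about offsets: an SFF record keeps all of its bases and stores clip positions, and a trimming extractor reports only the bases between them. The sketch below assumes the usual SFF convention (1-based inclusive clip positions, 0 meaning "not set"); the `untrimmed` switch only mimics the idea behind sffinfo's '-n' flag and is not a reimplementation of sffinfo.

```python
def clipped_range(n_bases, clip_qual_left, clip_qual_right,
                  clip_adapter_left=0, clip_adapter_right=0):
    """Return 0-based [start, end) bounds of the clipped region.

    Assumes SFF-style clip values: 1-based, inclusive, 0 = not set.
    """
    left = max(1, clip_qual_left, clip_adapter_left)
    right = n_bases
    if clip_qual_right:
        right = min(right, clip_qual_right)
    if clip_adapter_right:
        right = min(right, clip_adapter_right)
    start = left - 1
    return start, max(start, right)  # never a negative-length slice


def reported_sequence(bases, clips, untrimmed=False):
    """Bases an extractor would print: clipped by default, all if untrimmed."""
    if untrimmed:
        return bases
    start, end = clipped_range(len(bases), *clips)
    return bases[start:end]


if __name__ == "__main__":
    read = "TCAG" + "ACGAGTGCGT" + "GGATTACA"  # assumed layout: key + MID1 + insert
    print(reported_sequence(read, (5, 22)))    # left clip just past the key
    print(reported_sequence(read, (15, 22)))   # left clip shifted past the MID
    print(reported_sequence(read, (15, 22), untrimmed=True))
```

Shifting the left clip from 5 to 15 is all it takes for the MID to vanish from trimmed output while the bases stay in the file.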

      • #4
        If a sequencing company removed the MIDs which I had attached to ID individuals, I wouldn't pay them. The whole point of MIDs is so they can be used to sort sequences.
        I have never had MIDs removed from my data, only 454 adapter sequences.

        You may as well parse the data using your FASTA and QUAL files. These will have the MIDs. If your sequences do not contain MIDs, you either didn't ligate them properly or the sequencing company shouldn't be paid. I highly doubt they would remove MIDs.
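If the MIDs are still present in the FASTA and QUAL files, any trimming has to be applied to both in lockstep so each base keeps its quality score. A minimal Python sketch, assuming multi-record FASTA and whitespace-separated QUAL files whose read IDs match; `pair_fasta_qual` and `trim_prefix` are hypothetical helper names.

```python
def read_records(text):
    """Parse FASTA-style text into {read_id: [data lines]}."""
    records, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""  # ID = first word of the header
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return records


def pair_fasta_qual(fasta_text, qual_text):
    """Return {read_id: (sequence, [quality scores])}, checking lengths."""
    seqs = {k: "".join(v) for k, v in read_records(fasta_text).items()}
    quals = {k: [int(q) for q in " ".join(v).split()]
             for k, v in read_records(qual_text).items()}
    paired = {}
    for rid, seq in seqs.items():
        qual = quals.get(rid)
        if qual is None or len(qual) != len(seq):
            raise ValueError(f"FASTA/QUAL mismatch for read {rid}")
        paired[rid] = (seq, qual)
    return paired


def trim_prefix(seq, qual, n):
    """Remove the first n bases (e.g. key + MID) from sequence and qualities."""
    return seq[n:], qual[n:]


if __name__ == "__main__":
    fasta = ">r1 length=8\nTCAGAC\nGT\n"
    qual = ">r1 length=8\n30 31 32 33\n34 35 36 37\n"
    seq, q = pair_fasta_qual(fasta, qual)["r1"]
    print(trim_prefix(seq, q, 4))
```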

        • #5
          Well, when I "remove MIDs", I usually do this by splitting an SFF file from either region of the 454 run into individual SFF files (according to their MID); in this step, the MID is removed (== the offset is shifted in the SFF file). That's a normal process when working with multiplexed data.

          No need to refuse paying ;-)

          cheers,
          Sven
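The "split and shift" step described in the post above can be sketched on in-memory records: the MID is looked up right after the current left clip, and instead of cutting bases out, the left clip is moved past the MID. The record layout (dicts with `name`, `bases`, and a 1-based `clip_left`) is a hypothetical stand-in for SFF records; this is not how sfffile works internally, only an illustration of why the MID disappears from later extractions.

```python
from collections import defaultdict


def mismatches(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def split_and_shift(records, mids, max_mm=0):
    """Bin records by the MID found at the left clip and shift the clip.

    records: dicts with 'name', 'bases' and a 1-based 'clip_left'
    (hypothetical layout). The first MID within max_mm mismatches wins.
    Bases are never cut; only clip_left moves past the MID.
    """
    bins = defaultdict(list)
    for rec in records:
        start = rec["clip_left"] - 1  # 0-based index of first reported base
        matched = None
        for name, mid in mids.items():
            window = rec["bases"][start:start + len(mid)]
            if len(window) == len(mid) and mismatches(window, mid) <= max_mm:
                matched = name
                break
        if matched is None:
            bins["unassigned"].append(rec)
        else:
            # Copy the record so the input is left untouched.
            shifted = dict(rec, clip_left=rec["clip_left"] + len(mids[matched]))
            bins[matched].append(shifted)
    return bins


if __name__ == "__main__":
    recs = [{"name": "r1", "bases": "TCAGACGAGTGCGTGGATTACA", "clip_left": 5},
            {"name": "r2", "bases": "TCAGTTTTTTTTTTGG", "clip_left": 5}]
    for mid, items in split_and_shift(recs, {"MID1": "ACGAGTGCGT"}).items():
        for r in items:
            print(mid, r["name"], r["bases"][r["clip_left"] - 1:])
```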

          • #6
            Right, but you remove the MIDs once you have sorted by them. I think the original question was "How can I sort by barcodes?"

            • #7
              Yes, that's what you usually do: split a run SFF file into individual SFF files according to their barcodes/MIDs. If you use the Roche tools for this task, the offsets get shifted. We usually send these SFF files to our customers: MIDs removed, files sorted.
              The OP didn't mention what kind of SFF files he received: individual ones, or whole-region SFFs?

              cheers,
              Sven

              • #8
                Ahhh, so you preprocess the MIDs for the customer?
                How nice of you! Haha, I'm sure most places I know would charge $ for this.

                Anyway, the programs I mentioned are a great way for a non-code based approach.

                Cheers

                J

                • #9
                  Originally posted by sklages
                  Yes, that's what you usually do: split a run SFF file into individual SFF files according to their barcodes/MIDs. If you use the Roche tools for this task, then the offsets are getting shifted. We usually send these SFF files to our customers, MID removed, files sorted.
                  The OP didn't mention what kind of SFF he received .. individual ones? Whole region SFFs?

                  cheers,
                  Sven
                  I wish our core did this!! LOL, they definitely don't. I just received several .sff files, but they are not parsed by MID or by anything else, for that matter. I guess I will look into the Roche tools for separating by MID and trimming. I think sfffile can do something like this, although I find its syntax very confusing. Is this what you use?

                  • #10
                    jMHC and Geneious (links above) are super easy to use, with graphical interfaces.
                    You designate your primer, adapter length, and Bob's your uncle!

                    • #11
                      Originally posted by aligenie
                      I wish our core did this!! LOL they definitely don't. I just received several .sff files but they are not parsed by MID or for anything else for that matter. I guess I will look into Roche tools for separating by MID and trimming. Sfffile can do something like this I think although I find the syntax very confusing. Is this what you use?
                      Yes, sfffile is very fast and reliable. I don't know the other tools mentioned, but the advantage of sfffile is that it works on the SFF file and generates new SFF files. Once you are done with MID clipping, you extract the FASTA sequences from the newly created SFF files (without the MIDs, unless you use '-n' with sffinfo).

                      cheers,
                      Sven

                      • #12
                        Hi Sven, thanks for your help. Unfortunately, I still cannot get sfffile to parse by MID. I used sfffile -mcf barcode.txt -s read.sff and I get errors. My barcode file looks like this:
                        barcode
                        {
                        mid = "MID1", "ACGAGTGCGT", 2;
                        mid = "MID2", "ACGCTCGACA", 2;
                        mid = "MID3", "AGACGCACTC", 2;
                        mid = "MID5", "ATCAGACACG", 2;
                        mid = "MID6", "ATATCGCGAG", 2;
                        mid = "MID7", "CGTGTCTCTA", 2;
                        mid = "MID8", "CTCGCGTGTC", 2;
                        mid = "MID10", "TCTCTATGCG", 2;
                        mid = "MID11", "TGATACGTCT", 2;
                        mid = "MID13", "CATAGTAGTG", 2;
                        mid = "MID14", "CGAGAGATAC", 2;
                        mid = "MID15", "ATACGACGTA", 2;
                        mid = "MID16", "TCACGTACTA", 2;
                        mid = "MID17", "CGTCTAGTAC", 2;
                        mid = "MID18", "TCTACGTAGC", 2;
                        mid = "MID19", "TGTACTACTC", 2;
                        mid = "MID20", "ACGACTACAG", 2;
                        mid = "MID21", "CGTAGACTAG", 2;
                        mid = "MID22", "TACGAGTATG", 2;
                        mid = "MID23", "TACTCTCGTG", 2;
                        mid = "MID24", "TAGAGACGAG", 2;
                        mid = "MID25", "TCGTCGCTCG", 2;
                        mid = "MID26", "ACATACGCGT", 2;
                        mid = "MID27", "ACGCGAGTAT", 2;
                        mid = "MID28", "ACTACTATGT", 2;
                        mid = "MID68", "TCGCTGCGTA", 2;
                        mid = "MID30", "AGACTATACT", 2;
                        mid = "MID31", "AGCGTCGTCT", 2;
                        mid = "MID32", "AGTACGCTAT", 2;
                        mid = "MID33", "ATAGAGTACT", 2;
                        mid = "MID34", "CACGCTACGT", 2;
                        mid = "MID35", "CAGTAGACGT", 2;
                        mid = "MID36", "CGACGTGACT", 2;
                        mid = "MID37", "TACACACACT", 2;
                        mid = "MID38", "TACACGTGAT", 2;
                        mid = "MID39", "TACAGATCGT", 2;
                        mid = "MID40", "TACGCTGTCT", 2;
                        mid = "MID69", "TCTGACGTCA", 2;
                        mid = "MID42", "TCGATCACGT", 2;
                        mid = "MID43", "TCGCACTAGT", 2;
                        mid = "MID44", "TCTAGCGACT", 2;
                        mid = "MID45", "TCTATACTAT", 2;
                        mid = "MID46", "TGACGTATGT", 2;
                        mid = "MID47", "TGTGAGTAGT", 2;
                        mid = "MID48", "ACAGTATATA", 2;
                        mid = "MID49", "ACGCGATCGA", 2;
                        mid = "MID50", "ACTAGCAGTA", 2;
                        mid = "MID67", "TCGATAGTGA", 2;
                        }

                        Any idea why the -mcf option isn't working? Is my syntax wrong? Sorry for all the questions, but this is frustrating!!

                        I find Geneious to be really slow...

                        Cheers
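One thing worth checking in a MID file like the one above, independent of the error itself, is whether the allowed error count is safe for the MID set. Assuming the third field is the number of mismatches tolerated when matching that MID, two MIDs at Hamming distance d can both match the same read whenever d <= e1 + e2 under a substitution-only model (454 indels are ignored here). A hedged Python sketch with hypothetical function names:

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatched positions; MIDs must have equal length."""
    if len(a) != len(b):
        raise ValueError(f"MIDs differ in length: {a} vs {b}")
    return sum(x != y for x, y in zip(a, b))


def check_mid_set(mids):
    """Check a list of (name, sequence, allowed_errors) MID entries.

    Returns (min_distance, risky_pairs); a pair is risky when its Hamming
    distance is <= the sum of both error allowances, i.e. some read could
    match both MIDs within tolerance (substitutions only).
    """
    names = [m[0] for m in mids]
    if len(names) != len(set(names)):
        raise ValueError("duplicate MID names")
    for name, seq, _ in mids:
        if set(seq) - set("ACGT"):
            raise ValueError(f"{name}: non-ACGT characters in {seq!r}")
    min_d, risky = None, []
    for (n1, s1, e1), (n2, s2, e2) in combinations(mids, 2):
        d = hamming(s1, s2)
        min_d = d if min_d is None else min(min_d, d)
        if d <= e1 + e2:
            risky.append((n1, n2, d))
    return min_d, risky


if __name__ == "__main__":
    # First entries copied from the barcode file above; extend as needed.
    subset = [("MID1", "ACGAGTGCGT", 2), ("MID2", "ACGCTCGACA", 2),
              ("MID3", "AGACGCACTC", 2)]
    print(check_mid_set(subset))
```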

                        • #13
                          What error do you get? The syntax looks ok.
                          How did you call 'sfffile' (command line)?

                          And Geneious, that's my impression too, very nice looking but too slow for NGS.

                          cheers,
                          Sven

                          • #14
                            Hi aligenie,

                            Since you named your set of barcodes 'barcode' (the line above the opening {), would the following command work?

                            sfffile -mcf barcode.txt -s barcode read.sff


                            In my case, for my own customized adapters, I use a barcode file with an example in the comment lines (replace the x's with your barcodes):

                            /* User-defined MID sets for the 8 Y adapters...

                            An example:

                            sfffile -s Y -mcf Yscheme.txt -o region1 NameOfYourSFFfile1.sff > MIDyieldR1.txt
                            sfffile -s Y -mcf Yscheme.txt -o region2 NameOfYourSFFfile2.sff > MIDyieldR2.txt

                            */
                            Y
                            {
                            mid = "Y3", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            mid = "Y5", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            mid = "Y8", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            mid = "Y9", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            mid = "Y10", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            mid = "Y11", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            mid = "Ya1", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            mid = "Ya2", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            }
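Following the usage in the example above (sfffile -s Y -mcf Yscheme.txt ...), where '-s' is given the set name written before the {, the sketch below lists the named sets a MID configuration file defines, so a '-s' argument such as 'barcode' or 'Y' can be checked before running sfffile. The grammar is inferred only from the two examples in this thread, and the optional fourth field is assumed here to be a 3' MID sequence; it is a hypothetical parser, not Roche's.

```python
import re

# Hypothetical grammar inferred from this thread's examples only:
#   SETNAME { mid = "NAME", "SEQ", ERRORS [, "SEQ3"]; ... }
# with optional /* ... */ comments. Not Roche's actual parser.
COMMENT = re.compile(r"/\*.*?\*/", re.S)
SET = re.compile(r"(\w+)\s*\{(.*?)\}", re.S)
MID = re.compile(
    r'mid\s*=\s*"([^"]+)"\s*,\s*"([^"]*)"\s*,\s*(\d+)\s*(?:,\s*"([^"]*)"\s*)?;')


def parse_mid_config(text):
    """Return {set_name: [entry dicts]} for every named MID set in text."""
    text = COMMENT.sub("", text)  # drop comments first (they may contain braces)
    sets = {}
    for set_name, body in SET.findall(text):
        sets[set_name] = [
            {"name": name, "seq5": seq5, "errors": int(errors),
             "seq3": seq3 or None}  # assumed: 4th field = 3' sequence
            for name, seq5, errors, seq3 in MID.findall(body)
        ]
    return sets


def select_set(sets, requested):
    """Look up the set a '-s' style argument would name; fail loudly if absent."""
    if requested not in sets:
        raise KeyError(f"no MID set named {requested!r}; available: {sorted(sets)}")
    return sets[requested]


if __name__ == "__main__":
    example = '''
    barcode
    {
    mid = "MID1", "ACGAGTGCGT", 2;
    mid = "MID2", "ACGCTCGACA", 2;
    }
    '''
    sets = parse_mid_config(example)
    print(sorted(sets))                       # set names defined in the file
    print(len(select_set(sets, "barcode")))   # number of entries in that set
```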

                            • #15
                              Originally posted by sklages
                              What error do you get? The syntax looks ok.
                              How did you call 'sfffile' (command line)?

                              And Geneious, that's my impression too, very nice looking but too slow for NGS.

                              cheers,
                              Sven
                              Geneious's latest release, 5.4.1, is supposedly designed for NGS, but yes, I agree that loading and moving files around is SLOW and can cause the program to hang!
                              jMHC works much better for sorting by barcodes.

                              Cheers
                              j
