
  • Splitting by MIDs, with MIDs on both sides

    Hi.
    I need help splitting my reads by sample. Each sample was tagged with a MID at the 5' end and another at the 3' end. For example, sample 1 is tagged with MID 1 at the beginning and MID 2 at the end, sample 2 with MID 1 at the beginning and MID 3 at the end, and so on. So I need the mid1-read-mid2 reads separated from the mid1-read-mid3 reads, etc.
    I saw that you can split reads using sfffile, but I couldn't find a way to handle MIDs at both ends. Thank you!

  • #2
    The 454 SFF tools will split data sets by MIDs and remove the MIDs as well. If you have a data set with MIDs at the 5' end, then it only removes them from the 5' end. If you have a data set with MIDs at both ends, then it will remove both MIDs and also split accordingly. If not, contact 454 support.

    Comment


    • #3
      Splitting by MIDs on both sides

      Hi Robs,
      Thank you, but I saw that if I use, for example, mid2@sffFile, then all the reads with MID 2 at the beginning are selected. How do I tell sfffile to consider every combination of MIDs at both ends and name the output accordingly? For example, it should produce files such as one for mid2-mid5 (all reads with MID 2 at the 5' end and MID 5 at the 3' end), etc.

      Comment


      • #4
        Sequence quality gradually decreases toward the end of a read.

        What is the length distribution of your library molecules? If they are shorter than 400 bp, it should be OK.

        Titanium libraries are typically 600-900 bp long, but you only get the first ~400 bp in your reads, meaning that if you put a barcode at the end, it would either not be sequenced or not survive the quality filter.

        Anyway, you can try sorting your reads in two steps. First, do the usual forward sort. Second, reverse your reads and sort again. In the second step you will probably lose many reads.
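
A minimal sketch of that two-step idea in Python, not a tested pipeline. It assumes the reads are available as plain sequences with the MIDs still attached, that the 3' MID shows up in the read as its reverse complement (so "reversing" here means reverse-complementing), and that the read ends exactly at the 3' MID. The MID sequences and the function names are placeholders made up for illustration.

```python
# Placeholder MID sequences; substitute the tags from your own design.
MIDS = {"mid1": "ACGAGTGCGT", "mid2": "ACGCTCGACA", "mid3": "AGACGCACTC"}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def leading_mid(seq, mids):
    """Return the name of the MID that seq starts with, or None."""
    for name, tag in mids.items():
        if seq.startswith(tag):
            return name
    return None


def two_pass_sort(reads, mids):
    """Bin read ids by (5' MID, 3' MID); a missing MID is recorded as None."""
    bins = {}
    for read_id, seq in reads.items():
        five = leading_mid(seq, mids)                       # step 1: forward sort
        three = leading_mid(reverse_complement(seq), mids)  # step 2: reversed read
        bins.setdefault((five, three), []).append(read_id)
    return bins
```

Reads that never reached the 3' MID end up in bins such as `("mid1", None)`, which is where the expected loss in the second step becomes visible.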

        Comment


        • #5
          Originally posted by robs View Post
          The 454 SFF tools will split data sets by MIDs and remove the MIDs as well. If you have a data set with MIDs at the 5' end, then it only removes them from the 5' end. If you have a data set with MIDs at both ends, then it will remove both MIDs and also split accordingly. If not, contact 454 support.
          No, unfortunately. As far as I know, the sff tools can only split on and remove the 5' MID...

          Hey, this makes up my 100th post ;-)
          Last edited by flxlex; 07-16-2010, 02:58 AM. Reason: Found out it is my post # 100...

          Comment


          • #6
            As I remember, it does. You might want to upgrade to the latest version, which allows detection at both ends. You can check if your version scans for both ends by just looking at the MID definitions of your current version.

            Maybe 454 support can comment on this, since I don't have access to this kind of data at the moment.

            Comment


            • #7
              Originally posted by robs View Post
              You can check if your version scans for both ends by just looking at the MID definitions of your current version.
              No, sfffile will not sort reads based on two MIDs. In the original library preparation method the MID tags are only placed at the 5' end of the read. The new rapid library preparation method will result in the same MID tag being added to both ends of the fragment. In theory the fragment should be longer than the read generated by the instrument, so you would not run into the MID tag at the 3' end. However, some of your inserts will be short and the read will run all the way through the inserted fragment. The software has always recognized this possibility and would trim off any B adapter sequence it found at the 3' end of a read. Now the MID sorting software also has to deal with this. The second sequence in the MID configuration file for Rapid Library MIDs is only there to tell the software to trim off this sequence if it encounters it at the 3' end. It only sorts the reads based on the MID sequence at the 5' end.
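
To make the behaviour described above concrete, here is a rough Python sketch of "sort on the 5' MID only, trim the second sequence if it appears at the 3' end". It is an illustration of the logic, not a reimplementation of sfffile. The table layout, the function name `sort_and_trim` and the sequences are assumptions, and the sketch only checks for the trim sequence at the very end of the read.

```python
# Hypothetical MID table: name -> (5' sorting sequence, 3' trim sequence).
# Both sequences are placeholders for illustration.
RL_MIDS = {
    "RL1": ("ACACGACGACT", "AGTCGTGGTGT"),
}


def sort_and_trim(seq, mids):
    """Assign a read by its 5' MID only; trim the 3' sequence if it is present.

    Returns (mid_name, insert). The 3' sequence never influences which
    bin the read goes to, it is only removed when the read ends with it.
    """
    for name, (tag5, trim3) in mids.items():
        if seq.startswith(tag5):
            insert = seq[len(tag5):]
            if trim3 and insert.endswith(trim3):
                insert = insert[:-len(trim3)]
            return name, insert
    return None, seq
```

A short insert that was read through and a long insert that never reached the 3' end both land in the same bin, which is why this scheme cannot separate samples that differ only in their 3' tag.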

              Comment


              • #8
                mids

                But if I use the 454 GUI software, it is possible to build the multiplexer so that it distinguishes between two samples that have the same MID at the 5' end and different MIDs at the 3' end, isn't it? For example:
                samp1: mid1-sequence-mid2
                samp2: mid1-sequence-mid7

                Comment


                • #9
                  Originally posted by litali View Post
                  But if I use the 454 GUI software, it is possible to build the multiplexer so that it distinguishes between two samples that have the same MID at the 5' end and different MIDs at the 3' end, isn't it? For example:
                  samp1: mid1-sequence-mid2
                  samp2: mid1-sequence-mid7
                  No, that can't be done with the GUI or on the command line.

                  When you add read files to an assembly or mapping project with the GUI and select multiplexing, you select a MID scheme (e.g. GSMIDS, RLMIDS) from your MID config file. The GUI then presents you with a list of MIDs in that scheme and you select which MIDs to include in the filtering. The software only looks at the 5' end of the read for the MID and will only identify one MID per read.

                  If you wanted to use mixed, dual-end MIDs with 454 you would need to write your own script to sort the reads. You would also have to make sure that your sequence reads will reach all the way to the 3' MID. This means you could really only reliably use this method for amplicon sequencing where you know exactly the size of the product and that size is reachable within a 454 run.
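
As a starting point for such a script, here is a hedged Python sketch. It assumes the reads have been exported to FASTA with both MIDs still attached, that the 3' MID appears in the read as its reverse complement, and that matching is exact (no sequencing errors tolerated). The sample sheet, the sequences and all function names are hypothetical.

```python
# Hypothetical sample sheet: (5' MID, 3' MID) -> sample name.
# The sequences are placeholders; use the tags from your own design.
SAMPLES = {
    ("ACGAGTGCGT", "ACGCTCGACA"): "samp1",
    ("ACGAGTGCGT", "AGACGCACTC"): "samp2",
}

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMP)[::-1]


def read_fasta(handle):
    """Yield (read_id, sequence) pairs from an open FASTA file."""
    read_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(chunks).upper()
            header = line[1:].split()
            read_id, chunks = (header[0] if header else ""), []
        elif line:
            chunks.append(line)
    if read_id is not None:
        yield read_id, "".join(chunks).upper()


def assign(seq, samples):
    """Return (sample, insert); reads lacking either MID are 'unassigned'."""
    for (tag5, tag3), sample in samples.items():
        tail = revcomp(tag3)
        long_enough = len(seq) >= len(tag5) + len(tail)
        if long_enough and seq.startswith(tag5) and seq.endswith(tail):
            return sample, seq[len(tag5):len(seq) - len(tail)]
    return "unassigned", seq


def demultiplex(handle, samples):
    """Group (read_id, insert) pairs by sample name."""
    bins = {}
    for read_id, seq in read_fasta(handle):
        sample, insert = assign(seq, samples)
        bins.setdefault(sample, []).append((read_id, insert))
    return bins
```

It could be called as `demultiplex(open("reads.fna"), SAMPLES)`, where `reads.fna` is a made-up file name. Reads that stop before the 3' MID go to the `unassigned` bin, which is the practical reason this only works well for amplicons of known, reachable length.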

                  Comment


                  • #10
                    Can someone explain to a 454-newbie what GSMIDs and RLMIDs are? I've got a dataset using RLMIDs and I'm just trying to learn a bit more about what this means. Anywhere useful y'all could point me?

                    Thanks!!

                    Comment


                    • #11
                      GSMIDs are the MIDs for the 'standard' shotgun library protocol, RLMIDs are for the Rapid Library protocol (available from Oct 2009). Nothing fancy to it...

                      Comment


                      • #12
                        Hi there!!
                        I am a newbie in 454 sequencing data analysis.
                        I have learnt that the sfffile program can trim off the MID tags while still retaining the .sff file format.
                        I am using the sfffile command
                        sfffile -o roche454_trimmed.sff -s -nmft mid.fasta roche454.sff
                        but I am getting a pile of errors.
                        Can anyone please suggest the correct syntax for using this program?
                        Mostly, I am keen to know the file format of the MID file.

                        Comment


                        • #13
                          Originally posted by dina View Post
                          Hi.
                          I need help splitting my reads by sample. Each sample was tagged with a MID at the 5' end and another at the 3' end. For example, sample 1 is tagged with MID 1 at the beginning and MID 2 at the end, sample 2 with MID 1 at the beginning and MID 3 at the end, and so on. So I need the mid1-read-mid2 reads separated from the mid1-read-mid3 reads, etc.
                          I saw that you can split reads using sfffile, but I couldn't find a way to handle MIDs at both ends. Thank you!
                          Did you work out a solution for this problem? If so, could you post it?

                          Comment


                          • #14
                            Originally posted by kmcarr View Post
                            No, sfffile will not sort reads based on two MIDs. In the original library preparation method the MID tags are only placed at the 5' end of the read. The new rapid library preparation method will result in the same MID tag being added to both ends of the fragment.
                            True, this is inherent to the Y-shaped nature of the adapters (so, unavoidable). But do you realize that because the MID on the right side is not found/trimmed by Roche tools, it may well remain in the final, "high-qual" sequence and hamper your assembly? That is the wrong experimental design if you want to do shotgun sequencing. Stick to the General Library protocol if you want to use MIDs for sample barcoding and do shotgun sequencing followed by de novo assembly, and for the same reason use just a GSMID on the left end only.

                            Originally posted by kmcarr View Post
                            In theory the fragment should be longer than the read generated by the instrument, so you would not run into the MID tag at the 3' end. However, some of your inserts will be short and the read will run all the way through the inserted fragment. The software has always recognized this possibility and would trim off any B adapter sequence it found at the 3' end of a read. Now the MID sorting software also has to deal with this. The second sequence in the MID configuration file for Rapid Library MIDs is only there to tell the software to trim off this sequence if it encounters it at the 3' end. It only sorts the reads based on the MID sequence at the 5' end.
                            I think the confusion is that there are different Roche tools doing pieces of the whole task. The processing pipeline finds only B-side adapters. They are, if everything goes right, put into the low-qual sequence. But, the immediately preceding rcRLMID or rcGSMID on the right end of a read is left in. Likewise, on the left end of a read, the MID is left in "high-qual" sequence.

                            After you manually re-process the .sff file with "sfffile -s" you get an .sff file for each left MID. Each such file has the left MID in the "low-qual" region (because it now lies ahead of the left quality trim point). But sfffile does not bother with any MIDs on the right side.
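
A toy Python model of what "ahead of the left trim point" means, for readers who have not looked inside an .sff file. It assumes 1-based, inclusive clip positions, which is how I understand the SFF convention, and it does not read or write real .sff files. The function names and the example read are made up.

```python
def high_quality(seq, clip_left, clip_right):
    """Return the bases inside the clip points (1-based, inclusive)."""
    return seq[clip_left - 1:clip_right]


def clip_left_mid(seq, clip_left, clip_right, tag):
    """Move the left clip point past `tag` if the kept region starts with it.

    The bases themselves are not deleted; the MID just ends up outside
    the region that downstream tools treat as high quality.
    """
    if high_quality(seq, clip_left, clip_right).startswith(tag):
        return clip_left + len(tag)
    return clip_left


# Made-up read: 4-base key, left MID, insert, reverse complement of a right MID.
read = "TCAG" + "ACGAGTGCGT" + "TTGGCCAATT" + "TGTCGAGCGT"
new_left = clip_left_mid(read, 5, len(read), "ACGAGTGCGT")
kept = high_quality(read, new_left, len(read))
```

In this model `kept` still ends with the right-hand tag, because only the left clip point was moved, which mirrors the leftover right-side MID described above.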

                            I have a tool that can do this, and much more. Currently, I offer only a data processing service. Do you want to place your order? ;-)

                            Comment


                            • #15
                              The new version of sfffile (2.9) has the option -both to remove MIDs at both ends (although only if they are the same).

                              Comment
