**robs** · 07-11-2010, 09:48 PM

The 454 SFF tools will split data sets by MIDs and remove the MIDs as well. If you have a data set with MIDs at the 5' end, then it only removes them from the 5' end. If you have a data set with MIDs at both ends, then it will remove both MIDs and also split accordingly. If it does not, contact 454 support.

**dina** · 07-11-2010, 10:26 PM

splitting to mids on both sides

Hi Robs,
Thank you, but I saw that if I use, for example, mid2@sffFile, then all the reads with mid2 at the beginning are selected. How do I tell sfffile to consider every combination of MIDs at both ends and to name the output accordingly? For example, to produce files like this: a file for mid2-mid5 (= all the reads with mid2 at the 5' end and mid5 at the 3' end), etc.

**0Gen** · 07-12-2010, 02:50 AM

Sequence quality is gradually reduced toward the end of a read.

What is the length distribution of your library molecules? If < 400 bp, it would be OK.

The Titanium libraries are typically 600 - 900 bp long, but you only get the first ~400 bp in your reads, meaning that if you put a barcode at the end, it would either not be sequenced or not survive the quality filter.

Anyway, you can try to sort your reads in two steps. First, do the forward sorting in the usual way. Second, reverse your reads and sort again. (In the second step you will probably miss many reads.)

**flxlex** · 07-16-2010, 02:57 AM

Originally posted by robs

The 454 SFF tools will split data sets by MIDs and remove the MIDs as well. If you have a data set with MIDs at the 5' end, then it only removes them from the 5' end. If you have a data set with MIDs at both ends, then it will remove both MIDs and also split accordingly. If it does not, contact 454 support.

No, unfortunately not. As far as I know, the sff tools can only split on and remove the 5' MID...

Hey, this makes up my 100th post ;-)

**robs** · 07-16-2010, 09:24 AM

I remember that it does. You might want to upgrade to the latest version, which allows detection at both ends. You can check whether your version scans both ends by looking at the MID definitions that ship with it.

Maybe 454 support can comment on this, since I don't have access to this kind of data at the moment.

**kmcarr** · 07-16-2010, 10:12 AM

Originally posted by robs

You can check if your version scans for both ends by just looking at the MID definitions of your current version.

No, sfffile will not sort reads based on two MIDs. In the original library preparation method the MID tags are only placed at the 5' end of the read. The new rapid library preparation method results in the same MID tag being added to both ends of the fragment. In theory the fragment should be longer than the read generated by the instrument, so you would not run into the MID tag at the 3' end. However, some of your inserts will be short, and those reads will run all the way through the inserted fragment. The software has always recognized this possibility and would trim off any B adapter sequence it found at the 3' end of a read. Now the MID sorting software also has to deal with this. The second sequence in the MID configuration file for Rapid Library MIDs is only there to tell the software to trim off this sequence if it encounters it at the 3' end. It only sorts the reads based on the MID sequence at the 5' end.
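
The behaviour described above can be sketched in a few lines of Python. This is only an illustration of the logic, not how sfffile is implemented: the scheme dictionary, the tag names, and the tag sequences are all made up for the example, and it assumes the B adapter has already been removed so that a 3' tag, if present, sits at the very end of the read.

```python
# Hypothetical MID scheme (names and sequences are placeholders):
# name -> (5' sequence used for sorting, 3' sequence that is only trimmed)
SCHEME = {
    "tag1": ("ACACGACGACT", "AGTCGTGGTGT"),
    "tag2": ("ACACGTAGTAT", "ATACTAGGTGT"),
}


def sort_and_trim(read, scheme):
    """Assign a read to a tag by its 5' end only, then trim the 3' tag if seen."""
    for name, (five, three) in scheme.items():
        if read.startswith(five):
            body = read[len(five):]
            # The 3' sequence never influences the assignment; it is only
            # removed when the read ran all the way through a short insert.
            if three and body.endswith(three):
                body = body[:-len(three)]
            return name, body
    return None, read


short_insert = "ACACGACGACT" + "GGGTTTAAACCC" + "AGTCGTGGTGT"
long_insert = "ACACGACGACT" + "GGGTTTAAACCC"
mixed_tags = "ACACGACGACT" + "GGG" + "ATACTAGGTGT"
```

Note that `mixed_tags` is still assigned to `tag1`, and the foreign 3' tag stays in the read, which is the reason mixed dual-end MIDs cannot be separated this way.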

**litali** · 07-18-2010, 02:17 AM

mids

But if I use the 454 GUI software, it is possible to build the multiplexer so that it distinguishes between 2 samples which have the same MID at the 5' end and different MIDs at the 3' end, isn't it? For example:
samp1: mid1-sequence-mid2
samp2: mid1-sequence-mid7

**kmcarr** · 07-18-2010, 04:38 AM

Originally posted by litali

But if I use the 454 GUI software, it is possible to build the multiplexer so that it distinguishes between 2 samples which have the same MID at the 5' end and different MIDs at the 3' end, isn't it? For example:
samp1: mid1-sequence-mid2
samp2: mid1-sequence-mid7

No, that can't be done with the GUI or on the command line.

When you add read files to an assembly or mapping project with the GUI and select multiplexing, you choose a MID scheme (e.g. GSMIDs, RLMIDs) from your MID config file. The GUI then presents you with a list of MIDs in that scheme and you select which MIDs to include in the filtering. The software only looks at the 5' end of the read for the MID and will only identify one MID per read.

If you wanted to use mixed, dual-end MIDs with 454 you would need to write your own script to sort the reads. You would also have to make sure that your sequence reads reach all the way to the 3' MID. This means you could really only use this method reliably for amplicon sequencing, where you know the exact size of the product and that size is reachable within a 454 run.
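
As a rough idea of what such a script could look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the reads are taken to be already exported to FASTA, the MID names and sequences are placeholders, matching is exact (no mismatches tolerated), and the 3' MID is assumed to appear in the read as the reverse complement of the tag, directly at the end of the read.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def bin_reads(records, mids, samples):
    """Group read names by sample, keyed on the (5' MID, 3' MID) pair."""
    bins = defaultdict(list)
    for name, seq in records:
        five = next((m for m, tag in mids.items() if seq.startswith(tag)), None)
        three = next((m for m, tag in mids.items()
                      if seq.endswith(reverse_complement(tag))), None)
        # Reads that do not reach the 3' MID end up in "unassigned".
        bins[samples.get((five, three), "unassigned")].append(name)
    return dict(bins)


# Placeholder tags and a sample sheet mirroring the example in this thread.
MIDS = {"mid1": "ACGAGTGCGT", "mid2": "ACGCTCGACA", "mid7": "CGTGTCTCTA"}
SAMPLES = {("mid1", "mid2"): "samp1", ("mid1", "mid7"): "samp2"}

fasta = [
    ">r1", MIDS["mid1"] + "TTGGAACC" + reverse_complement(MIDS["mid2"]),
    ">r2", MIDS["mid1"] + "TTGGAACC" + reverse_complement(MIDS["mid7"]),
    ">r3", MIDS["mid1"] + "TTGGAACC",  # read stopped before the 3' MID
]
result = bin_reads(parse_fasta(fasta), MIDS, SAMPLES)
```

Read `r3` shows the limitation mentioned above: a read that stops before the 3' MID cannot be assigned to either sample.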

**ewilbanks** · 12-30-2010, 05:57 PM

Can someone explain to a 454-newbie what GSMIDs and RLMIDs are? I've got a dataset using RLMIDs and I'm just trying to learn a bit more about what this means. Anywhere useful y'all could point me?

Thanks!!

**flxlex** · 01-04-2011, 07:46 AM

GSMIDs are the MIDs for the 'standard' shotgun library protocol, RLMIDs are for the Rapid Library protocol (available from Oct 2009). Nothing fancy to it...

**prisnirath** · 05-26-2011, 02:10 AM

Hi there!!
I am a newbie in 454 sequencing data analysis.
I have learnt that the sfffile program can trim off the MID tags while still retaining the .sff file format.
I am using the sfffile command
sfffile -o roche454_trimmed.sff -s -nmft mid.fasta roche454.sff
but I am getting a pile of errors.
Can anyone please suggest the correct syntax for using this program?
Mostly, I am keen to know the file format of the MID file.

**454newbie** · 09-12-2011, 08:47 AM

Originally posted by dina

Hi.
I need help with splitting my reads into the samples they belong to, but each sample was tagged with a MID at the 5' end and at the 3' end. For example: sample 1 is tagged with MID 1 at the beginning and MID 2 at the end, sample 2 is tagged with MID 1 at the beginning and MID 3 at the end, etc. So I need to have the mid1-read-mid2 reads separated from the mid1-read-mid3 reads, etc.
I saw you can split reads using sfffile, but I couldn't find a way there to handle MIDs at both ends. Thank you!!!!

Did you work out a solution for this problem? If so, could you post it?

**martin2** · 01-07-2013, 08:54 AM

Originally posted by kmcarr

No, sfffile will not sort reads based on two MIDs. In the original library preparation method the MID tags are only placed at the 5' end of the read. The new rapid library preparation method results in the same MID tag being added to both ends of the fragment.

True, that is inherent to the Y-shaped nature of the Rapid Library adapters (so, unavoidable). But do you realize that because the MID on the right side is not found or trimmed by the Roche tools, it may well remain in the final, "high-qual" sequence and hurt your assembly? That is the wrong experimental design if you want to do shotgun sequencing. Stick to the General Library protocol if you want to use MIDs for sample barcoding and do shotgun sequencing followed by de novo assembly, and for the same reason use just a GSMID on the left end only with the General Library protocol.

Originally posted by kmcarr

In theory the fragment should be longer than the read generated by the instrument so you would not run into the MID tag at the 3' end. However some of your inserts will be short and read all the way through the inserted fragment. The software has always recognized this possibility and would trim off any B adapter sequence it found at the 3' end of a read. Now the MID sorting software also has to deal with this. The second sequence in the MID configuration file for Rapid Library MIDs is only there to tell the software to trim off this sequence if it encounters it at the 3' end. It only sorts the reads based on the MID sequence at the 5' end.

I think the confusion is that there are different Roche tools doing pieces of the whole task. The processing pipeline finds only B-side adapters. These are, if everything goes right, put into the low-qual sequence. But the immediately preceding rcRLMID or rcGSMID on the right end of a read is left in. Likewise, on the left end of a read, the MID is left in the "high-qual" sequence.

After you manually re-process the .sff file with "sfffile -s" you get one .sff file for each left MID. In each such file the left MID is in the "low-qual" region (because it lies ahead of the left quality trim point). But sfffile does not bother with any MIDs on the right side.
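
To make the trim-point idea concrete: an SFF read header stores 1-based clip positions (`clip_qual_left`, `clip_qual_right`, `clip_adapter_left`, `clip_adapter_right`, where 0 means "not set"), and the "high-qual" sequence is whatever lies between them. The Python sketch below only mimics that arithmetic on a plain string; the sequences are placeholders, the 4-base key at the start of a real read is ignored, and the clip values before and after "sfffile -s" are assumed for illustration.

```python
def trimmed_region(bases, clip_qual_left, clip_qual_right,
                   clip_adapter_left=0, clip_adapter_right=0):
    """Return the bases between the clip points (1-based, 0 = not set)."""
    n = len(bases)
    first = max(1, clip_qual_left, clip_adapter_left)
    last = min(clip_qual_right or n, clip_adapter_right or n)
    return bases[first - 1:last]


# Placeholder read layout: left MID, insert, rc of right MID, B adapter.
left_mid = "ACGAGTGCGT"
insert = "TTGGCCAATTGG"
right_rc_mid = "TGTCGAGCGT"
b_adapter = "CTGAGACT"
bases = left_mid + insert + right_rc_mid + b_adapter

# Pipeline output: only the B adapter is outside the high-qual region.
before = trimmed_region(bases, 1, len(bases) - len(b_adapter))
# After splitting on the left MID: the left trim point moves past it,
# but the right trim point is unchanged, so the right MID stays in.
after = trimmed_region(bases, len(left_mid) + 1, len(bases) - len(b_adapter))
```

In this toy layout `after` still ends with the reverse-complemented right MID, which is exactly the leftover described above.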

I have a tool that can do this, and much more. Currently, I offer only a data processing service. Do you want to place your order? ;-)

**Roy** · 07-05-2013, 08:20 AM

The new version of sfffile (2.9) has the option -both to remove MIDs at both ends (although only if they are the same).


splitting to mids - with mids on both sides
