SEQanswers

Old 07-11-2010, 12:33 PM   #1
dina
Member
 
Location: israel

Join Date: Sep 2009
Posts: 34
Default splitting to mids - with mids on both sides

Hi.
I need help splitting my reads into the samples they belong to. Each sample was tagged with a MID at the 5' end and another at the 3' end. For example, sample 1 is tagged with MID 1 at the beginning and MID 2 at the end, while sample 2 is tagged with MID 1 at the beginning and MID 3 at the end, and so on. So I need the mid1-read-mid2 reads separated from the mid1-read-mid3 reads, etc.
I saw that you can split reads using sfffile, but I couldn't find a way there to handle MIDs at both ends. Thank you!
dina is offline   Reply With Quote
Old 07-11-2010, 10:48 PM   #2
robs
Senior Member
 
Location: San Diego, CA

Join Date: May 2010
Posts: 116
Default

The 454 SFF tools will split data sets by MIDs and remove the MIDs as well. If you have a data set with MIDs at the 5'-end, then it only removes them from the 5'-end. If you have a data set with MIDs at both ends, then it will remove both MIDs and also split accordingly. If not, contact 454 support.
robs is offline   Reply With Quote
Old 07-11-2010, 11:26 PM   #3
dina
Member
 
Location: israel

Join Date: Sep 2009
Posts: 34
Default splitting to mids on both sides

Hi Robs,
Thank you, but I saw that if I use, for example, mid2@sffFile, then all the reads with mid2 at the beginning are selected. How do I tell sfffile to report every combination of MIDs at both ends and to name the output accordingly? For example, I would like files such as mid2-mid5 (= all the reads with MID 2 at the 5' end and MID 5 at the 3' end), etc.
dina is offline   Reply With Quote
Old 07-12-2010, 03:50 AM   #4
0Gen
Junior Member
 
Location: China

Join Date: Mar 2010
Posts: 8
Default

Sequence quality is gradually reduced toward the end of a read.

What is the length distribution of your library molecules? If < 400 bp, it would be OK.

The Titanium libraries are typically 600-900 bp long, but you only get the first ~400 bp in your reads, meaning that if you put a barcode at the end, it would not be sequenced or would not survive the quality filter.

Anyway, you can try to sort your reads in two steps. First, do the usual forward sort. Second, reverse your reads and sort again. (In the second step you will probably lose many reads.)
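
The two-step sort described above could be sketched roughly like this in Python. This is only an illustration under several assumptions, not something the 454 tools do: the function names are made up, reads are plain strings already extracted from the .sff file, matching is exact, "reverse" is taken to mean reverse-complement, and the 3' MIDs are written the way they would appear at the start of the flipped read.

```python
def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def sort_by_prefix(reads, mids):
    """Group reads by the MID they start with (exact match).

    Reads that start with none of the MIDs are grouped under None.
    """
    bins = {}
    for read in reads:
        name = next((n for n, m in mids.items() if read.startswith(m)), None)
        bins.setdefault(name, []).append(read)
    return bins


def two_step_sort(reads, five_prime_mids, three_prime_mids):
    """Step 1: sort on the 5' MID. Step 2: flip each bin and sort again.

    Returns {(5' MID name, 3' MID name or None): [reads in original orientation]}.
    Reads without a recognisable 5' MID are dropped.
    """
    result = {}
    for fwd_name, fwd_reads in sort_by_prefix(reads, five_prime_mids).items():
        if fwd_name is None:
            continue
        flipped = [reverse_complement(r) for r in fwd_reads]
        for rev_name, rev_reads in sort_by_prefix(flipped, three_prime_mids).items():
            # Flip the reads back so the output matches the input orientation.
            result[(fwd_name, rev_name)] = [reverse_complement(r) for r in rev_reads]
    return result
```

Reads that end up under (some 5' MID, None) are the ones where the 3' MID was never reached or did not match, which is where most of the losses in the second step would show up.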
0Gen is offline   Reply With Quote
Old 07-16-2010, 03:57 AM   #5
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415
Default

Quote:
Originally Posted by robs View Post
The 454 SFF tools will split data sets by MIDs and remove the MIDs as well. If you have a data set with MIDs at the 5'-end, then it only removes them from the 5'-end. If you have a data set with MIDs at both ends, then it will remove both MIDs and also split accordingly. If not, contact 454 support.
No, unfortunately. As far as I know, the sff tools can only split on and remove the 5' MID...

Hey, this makes up my 100th post ;-)

Last edited by flxlex; 07-16-2010 at 03:58 AM. Reason: Found out it is my post # 100...
flxlex is offline   Reply With Quote
Old 07-16-2010, 10:24 AM   #6
robs
Senior Member
 
Location: San Diego, CA

Join Date: May 2010
Posts: 116
Default

I remember that it does. You might want to upgrade to the latest version, which allows detection of MIDs at both ends. You can check if your version scans for both ends by just looking at the MID definitions of your current version.

Maybe 454 support can comment on this, since I don't have access to this kind of data at the moment.
robs is offline   Reply With Quote
Old 07-16-2010, 11:12 AM   #7
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

Quote:
Originally Posted by robs View Post
You can check if your version scans for both ends by just looking at the MID definitions of your current version.
No, sfffile will not sort reads based on two MIDs. In the original library preparation method the MID tags are only placed at the 5' end of the read. The new rapid library preparation method will result in the same MID tag being added to both ends of the fragment. In theory the fragment should be longer than the read generated by the instrument so you would not run into the MID tag at the 3' end. However some of your inserts will be short and read all the way through the inserted fragment. The software has always recognized this possibility and would trim off any B adapter sequence it found at the 3' end of a read. Now the MID sorting software also has to deal with this. The second sequence in the MID configuration file for Rapid Library MIDs is only there to tell the software to trim off this sequence if it encounters it at the 3' end. It only sorts the reads based on the MID sequence at the 5' end.
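
To make the behaviour described above concrete, here is a rough Python sketch of "sort on the 5' MID only, and trim the second sequence if it shows up at the 3' end". It is not the Roche implementation. The function name and the MID sequences are hypothetical placeholders, matching is exact, and using the last occurrence of the 3' sequence as the cut point is a choice made for this sketch.

```python
def sort_and_trim(read, mids):
    """Assign a read to a MID by its 5' end only, then trim the 3' sequence.

    mids maps a MID name to (five_prime_seq, three_prime_trim_seq).
    Returns (mid_name, trimmed_read); mid_name is None if no 5' MID matches.
    """
    for name, (five, three) in mids.items():
        if read.startswith(five):
            insert = read[len(five):]
            # The 3' sequence is only present when the insert was short
            # enough to be read through; it plays no part in the sorting.
            cut = insert.rfind(three)
            if cut != -1:
                insert = insert[:cut]
            return name, insert
    return None, read


# Hypothetical placeholder sequences, not real RLMID definitions.
example_mids = {"RL1": ("ACACG", "CGTGT")}
short_insert = "ACACG" + "TTTTGGGG" + "CGTGT" + "AAAA"
long_insert = "ACACG" + "TTTTGGGGTTTT"
print(sort_and_trim(short_insert, example_mids))
print(sort_and_trim(long_insert, example_mids))
```

Both example reads land in the same bin, because only the 5' sequence decides where a read goes.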
kmcarr is offline   Reply With Quote
Old 07-18-2010, 03:17 AM   #8
litali
Member
 
Location: us

Join Date: Jul 2010
Posts: 78
Default mids

But if I use the 454 GUI software, isn't it possible to set up the multiplexer so that it distinguishes between two samples that have the same MID at the 5' end and different MIDs at the 3' end? For example:
samp1: mid1-sequence-mid2
samp2: mid1-sequence-mid7
litali is offline   Reply With Quote
Old 07-18-2010, 05:38 AM   #9
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

Quote:
Originally Posted by litali View Post
But if I use the 454 GUI software, isn't it possible to set up the multiplexer so that it distinguishes between two samples that have the same MID at the 5' end and different MIDs at the 3' end? For example:
samp1: mid1-sequence-mid2
samp2: mid1-sequence-mid7
No, that can't be done with the GUI or on the command line.

When you are adding read files to an assembly or mapping project with the GUI and you select multiplexing you select a MID scheme (e.g. GSMIDS, RLMIDS) from your MID config file. The GUI then presents you with a list of MIDs in that scheme and you select which MIDs to include in the filtering. The software only looks at the 5' end of the read for the MID and will only identify one MID per read.

If you wanted to use mixed, dual-end MIDs with 454 you would need to write your own script to sort the reads. You would also have to make sure that your sequence reads will reach all the way to the 3' MID. This means you could really only reliably use this method for amplicon sequencing where you know exactly the size of the product and that size is reachable within a 454 run.
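
As a starting point for such a script, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function names, the sample table and the MID sequences are hypothetical, reads are plain strings that reach all the way to the 3' MID, the 3' MIDs are written as they appear in the read, and a MID is accepted only if it is the unique best match within a small number of mismatches.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def closest_mid(segment, mids, max_mismatches):
    """Name of the unique best-matching MID at the start of segment, or None."""
    hits = sorted(
        (hamming(segment[:len(seq)], seq), name)
        for name, seq in mids.items()
        if len(segment) >= len(seq)
    )
    hits = [h for h in hits if h[0] <= max_mismatches]
    if not hits:
        return None
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # two MIDs fit equally well: treat as ambiguous
    return hits[0][1]


def assign_sample(read, mids5, mids3, samples, max_mismatches=1):
    """Return (sample_name, insert) for a read, or (None, read) if unassigned.

    samples maps (5' MID name, 3' MID name) to a sample name.
    """
    left = closest_mid(read, mids5, max_mismatches)
    # Match the 3' MID against the read's tail by reversing both strings.
    reversed_mids3 = {name: seq[::-1] for name, seq in mids3.items()}
    right = closest_mid(read[::-1], reversed_mids3, max_mismatches)
    sample = samples.get((left, right))
    if sample is None:
        return None, read
    insert = read[len(mids5[left]):len(read) - len(mids3[right])]
    return sample, insert


# Hypothetical MIDs and sample table mirroring the samp1/samp2 example.
mids5 = {"mid1": "ACGAG"}
mids3 = {"mid2": "TCTCA", "mid7": "GGACT"}
samples = {("mid1", "mid2"): "samp1", ("mid1", "mid7"): "samp2"}
print(assign_sample("ACGAG" + "TTTTTTTT" + "TCTCA", mids5, mids3, samples))
```

Reads that stop before the 3' MID come back unassigned, which is why this only works reliably for amplicons of known, reachable length.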
kmcarr is offline   Reply With Quote
Old 12-30-2010, 05:57 PM   #10
ewilbanks
Member
 
Location: Davis, CA

Join Date: Mar 2009
Posts: 82
Default

Can someone explain to a 454-newbie what GSMIDs and RLMIDs are? I've got a dataset using RLMIDs and I'm just trying to learn a bit more about what this means. Anywhere useful y'all could point me?

Thanks!!
ewilbanks is offline   Reply With Quote
Old 01-04-2011, 07:46 AM   #11
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415
Default

GSMIDS are the MIDs for the 'standard' shotgun library protocol, RLMIDs are for the Rapid Library protocol (available from Oct 2009). Nothing fancy to it...
flxlex is offline   Reply With Quote
Old 05-26-2011, 03:10 AM   #12
prisnirath
Member
 
Location: Newcastle upon Tyne

Join Date: May 2011
Posts: 19
Default

Hi there!
I am a newbie in 454 sequencing data analysis.
I have learnt that the sfffile program can trim off the MID tags while still retaining the .sff file format.
I am using the sfffile command
sfffile -o roche454_trimmed.sff -s -nmft mid.fasta roche454.sff
but I am getting a bunch of errors.
Can anyone please suggest the correct syntax for using this program?
Mostly, I am keen to know the file format of the MID file.
prisnirath is offline   Reply With Quote
Old 09-12-2011, 09:47 AM   #13
454newbie
Member
 
Location: California

Join Date: Jun 2009
Posts: 17
Default

Quote:
Originally Posted by dina View Post
Hi.
I need help splitting my reads into the samples they belong to. Each sample was tagged with a MID at the 5' end and another at the 3' end. For example, sample 1 is tagged with MID 1 at the beginning and MID 2 at the end, while sample 2 is tagged with MID 1 at the beginning and MID 3 at the end, and so on. So I need the mid1-read-mid2 reads separated from the mid1-read-mid3 reads, etc.
I saw that you can split reads using sfffile, but I couldn't find a way there to handle MIDs at both ends. Thank you!
Did you work out a solution for this problem? If so, could you post it?
454newbie is offline   Reply With Quote
Old 01-07-2013, 08:54 AM   #14
martin2
Member
 
Location: Prague, Czech Republic

Join Date: Nov 2010
Posts: 40
Default

Quote:
Originally Posted by kmcarr View Post
No, sfffile will not sort reads based on two MIDs. In the original library preparation method the MID tags are only placed at the 5' end of the read. The new rapid library preparation method will result in the same MID tag being added to both ends of the fragment.
True, that is inherent to the Y-shaped nature of the adapters (so, unavoidable). But do you realize that because the MID on the right side is not found/trimmed by the Roche tools, it may well remain in the final, "high-qual" sequence and hurt your assembly? That is the wrong experimental design if you want to do shotgun sequencing. Stick to the General Library protocol if you want to use MIDs for sample barcoding and do shotgun sequencing followed by de novo assembly, and for the same reason use just a GSMID on the left end with the General Library protocol.

Quote:
Originally Posted by kmcarr View Post
In theory the fragment should be longer than the read generated by the instrument so you would not run into the MID tag at the 3' end. However some of your inserts will be short and read all the way through the inserted fragment. The software has always recognized this possibility and would trim off any B adapter sequence it found at the 3' end of a read. Now the MID sorting software also has to deal with this. The second sequence in the MID configuration file for Rapid Library MIDs is only there to tell the software to trim off this sequence if it encounters it at the 3' end. It only sorts the reads based on the MID sequence at the 5' end.
I think the confusion is that there are different Roche tools doing pieces of the whole task. The processing pipeline finds only B-side adapters. They are, if everything goes right, put into the low-qual sequence. But, the immediately preceding rcRLMID or rcGSMID on the right end of a read is left in. Likewise, on the left end of a read, the MID is left in "high-qual" sequence.

After you manually re-process the .sff file with "sfffile -s" you get one .sff file for each left MID. In each such file the left MID is in the "low-qual" region (because it lies ahead of the left quality trim point). But sfffile does not bother with any MIDs on the right side.
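
To illustrate what "putting the MID in the low-qual region" means, here is a small Python sketch of soft-clipping: the bases stay in the record and only the trim points move. It is a simplified model, not the SFF format or sfffile itself. The class and function names are hypothetical, a single pair of 1-based inclusive trim points stands in for the clip values stored with each read, matching is exact, and the right-hand MID is given as it appears in the read.

```python
from dataclasses import dataclass


@dataclass
class ClippedRead:
    bases: str       # full base calls, never modified
    clip_left: int   # 1-based position of the first kept base
    clip_right: int  # 1-based position of the last kept base

    def kept(self):
        """The part of the read inside the trim points."""
        return self.bases[self.clip_left - 1:self.clip_right]


def soft_clip_mids(read, left_mid, right_mid):
    """Move the trim points inward past a MID found at either end.

    The left MID is handled first; the right MID is then looked for at the
    end of whatever remains, so the two can never overlap.
    """
    left, right = read.clip_left, read.clip_right
    if read.kept().startswith(left_mid):
        left += len(left_mid)
    if read.bases[left - 1:right].endswith(right_mid):
        right -= len(right_mid)
    return ClippedRead(read.bases, left, right)


# Hypothetical read: MID + insert + right-hand MID + adapter already clipped.
raw = ClippedRead("ACGAG" + "TTTTGGGG" + "TCTCA" + "NNNN", 1, 18)
clipped = soft_clip_mids(raw, "ACGAG", "TCTCA")
print(clipped.clip_left, clipped.clip_right, clipped.kept())
```

Clipping only the left MID, as described above for "sfffile -s", would correspond to calling the function with an empty right-hand MID.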

I have a tool that can do this, and much more. Currently, I offer only a data processing service. Do you want to place your order? ;-)
martin2 is offline   Reply With Quote
Old 07-05-2013, 09:20 AM   #15
Roy
Member
 
Location: Sheffield, UK

Join Date: Oct 2009
Posts: 17
Default

The new version of sfffile (2.9) has the option -both to remove MIDs at both ends (although only if they are the same).
Roy is offline   Reply With Quote
Old 07-05-2013, 09:33 AM   #16
Roy
Member
 
Location: Sheffield, UK

Join Date: Oct 2009
Posts: 17
Default

Quote:
Originally Posted by Roy View Post
The new version of sfffile (2.9) has the option -both to remove MIDs at both ends (although only if they are the same).
Actually, after playing around a bit more, it looks like you can specify different MIDs at either end by using a custom MID configuration file.
Roy is offline   Reply With Quote
Old 07-05-2013, 09:37 AM   #17
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 380
Default

Use jMHC. It also does all of the sequence quantification as part of the process.
JackieBadger is offline   Reply With Quote
All times are GMT -8.