**JackieBadger** · 04-18-2011, 11:45 AM

Originally posted by aligenie

Hi everyone. I've searched everywhere but haven't quite found the solution to my problem, so bear with me if I'm asking a simple question.

I just got back 454 data that we generated using the Fluidigm Access Array. I am having difficulty parsing out the MIDs. I've converted the .sff files to FASTA files and tried parsing out the MIDs with the FASTX tools. For some reason, the barcode splitter is not working. I'm sure my syntax is correct, so I think something is wrong with my FASTA file. I know the key tags and adapters are still on my sequences. Is this the problem? I'm not sure what I should do. We are not big fans of Roche's software, especially AVA, so we are trying to find a different solution. Also, have people been successful using novoalign for mapping 454 data?

Any information is greatly appreciated!!!!! Thank you!

Ali

Hey,

Yep, unless you have code to do this it isn't easy (especially as you may have a bunch of sorting parameters you are interested in).

Geneious offers a free trial, is easy to use, and is super cheap for students. You can do it there. http://www.geneious.com/

I used jMHC to parse mine, as I found their parsing criteria particularly stringent (no Ns in primers or sequence, and 1 bp = new allele). http://code.google.com/p/jmhc/

I found that with jMHC, parsing could take hours despite running on a VERY powerful desktop PC. Ran it on my Mac laptop and it whizzed through in minutes!

I attempted to get SESAME up and running, but it is a fiddly process and I got tired of troubleshooting... http://bioinformatics.oxfordjournals...2/277.abstract

Good luck!

J

**sklages** · 04-19-2011, 06:31 AM

Usually, when you get back data from a multiplexed run, the MIDs are already removed from the SFF file (better: offsets are shifted by the length of the MIDs used).
So if you extract your reads with e.g. 'sffinfo', you get the "clipped" sequence (unless you are using the flag '-n'). Your fasta files do not contain the MIDs anymore.
You should keep this in mind, also when using other tools for extraction.
What tool have you been using for sff->fasta extraction?

cheers,
Sven
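To make the "offsets are shifted" point concrete, here is a minimal Python sketch of how a clip point can hide the MID without deleting any bases. It assumes 1-based, inclusive clip positions with 0 meaning "not set", and a TCAG key tag. The insert sequence and the function name `clipped` are made up for illustration; this is not how sffinfo itself is implemented.

```python
def clipped(bases: str, clip_left: int, clip_right: int) -> str:
    """Return the part of a read between two clip points.

    Assumption: clip points are 1-based and inclusive, and 0 means "not set".
    """
    left = max(clip_left, 1)
    right = clip_right if clip_right > 0 else len(bases)
    return bases[left - 1:right]


KEY = "TCAG"             # key tag (assumed)
MID = "ACGAGTGCGT"       # MID1 as listed later in this thread
INSERT = "GGATTACCAGT"   # made-up insert sequence
raw = KEY + MID + INSERT

# Left clip set just past the key: the MID is still visible.
before_split = clipped(raw, len(KEY) + 1, 0)

# Left clip shifted by the MID length: the MID is hidden, not deleted.
after_split = clipped(raw, len(KEY) + len(MID) + 1, 0)

print(before_split)  # ACGAGTGCGTGGATTACCAGT
print(after_split)   # GGATTACCAGT
```

The full read is unchanged in both cases, which is why an unclipped extraction (the '-n' case mentioned above) would still show the MID.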

**JackieBadger** · 04-19-2011, 07:10 AM

Originally posted by sklages

Usually, when you get back data from a multiplexed run, the MIDs are already removed from the SFF file (better: offsets are shifted by the length of the MIDs used).
So if you extract your reads with e.g. 'sffinfo', you get the "clipped" sequence (unless you are using the flag '-n'). Your fasta files do not contain the MIDs anymore.
You should keep this in mind, also when using other tools for extraction.
What tool have you been using for sff->fasta extraction?

cheers,
Sven

If a sequencing company removed the MIDs which I had attached to ID individuals, I wouldn't pay them. The whole point of MIDs is so they can be used to sort sequences.
I have never had MIDs removed from my data, only 454 adapter sequences.

You may as well parse the data using your FASTA and QUAL files. These will have the MIDs. If your sequences do not contain MIDs, you either didn't ligate them properly or the sequencing company shouldn't be paid. I highly doubt they would remove MIDs.
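If the FASTA reads do still carry the MIDs, sorting by barcode can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the MID sits at the 5' end of the read, optionally preceded by a TCAG key tag, and a read is assigned only when exactly one MID matches within `max_mismatches`. All function names and the three example reads are hypothetical; the two MID sequences are taken from the barcode file posted later in this thread.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def mismatches(a, b):
    """Count differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_mid(seq, mids, key="TCAG", max_mismatches=0):
    """Return (mid_name, insert), or (None, seq) if no single MID matches."""
    seq = seq.upper()
    # Assumption: skip the key tag if the read still starts with it.
    start = len(key) if seq.startswith(key) else 0
    hits = []
    for name, mid in mids.items():
        tag = seq[start:start + len(mid)]
        if len(tag) == len(mid) and mismatches(tag, mid) <= max_mismatches:
            hits.append((name, seq[start + len(mid):]))
    # Ambiguous or missing matches are left unassigned.
    return hits[0] if len(hits) == 1 else (None, seq)


def demultiplex(lines, mids, **kwargs):
    """Group reads by MID name; unmatched reads go to 'unassigned'."""
    bins = defaultdict(list)
    for name, seq in read_fasta(lines):
        mid_name, insert = assign_mid(seq, mids, **kwargs)
        bins[mid_name or "unassigned"].append((name, insert))
    return dict(bins)


MIDS = {"MID1": "ACGAGTGCGT", "MID2": "ACGCTCGACA"}
reads = """>r1
TCAGACGAGTGCGTGGATTACA
>r2
ACGCTCGACATTTT
>r3
TCAGNNNNNNNNNNAAAA
""".splitlines()

bins = demultiplex(reads, MIDS)
for mid_name, members in sorted(bins.items()):
    print(mid_name, members)
```

A real run would also need to carry the QUAL values along and decide what to do with reads shorter than the barcode; the sketch simply leaves those unassigned.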

**sklages** · 04-19-2011, 07:56 AM

Well, when I "remove MIDs", I do this usually by splitting a SFF file from either region of the 454 into individual SFF files (according to their MID); in this step, the MID is removed (== offset shifted in SFF file). That's a normal process when working with multiplexed data.

No need to refuse paying ;-)

cheers,
Sven

**JackieBadger** · 04-19-2011, 09:12 AM

Originally posted by sklages

Well, when I "remove MIDs", I do this usually by splitting a SFF file from either region of the 454 into individual SFF files (according to their MID); in this step, the MID is removed (== offset shifted in SFF file). That's a normal process when working with multiplexed data.

No need to refuse paying ;-)

cheers,
Sven

Right, but you remove the MIDs once you have sorted by them. I think the original question was "how can I sort by barcodes?"

**sklages** · 04-19-2011, 09:29 AM

Originally posted by JackieBadger

Right, but you remove the MIDs once you have sorted by them. I think the original question was "how can I sort by barcodes?"

Yes, that's what you usually do: split a run SFF file into individual SFF files according to their barcodes/MIDs. If you use the Roche tools for this task, then the offsets get shifted. We usually send these SFF files to our customers, MID removed, files sorted.
The OP didn't mention what kind of SFF he received .. individual ones? Whole region SFFs?

cheers,
Sven

**JackieBadger** · 04-19-2011, 10:04 AM

Ahhh so you preprocess the MIDs for the customer?
How nice of you! haha I'm sure most I know would charge $ for this.

Anyway, the programs I mentioned are a great option if you want a non-code-based approach.

Cheers

J

**aligenie** · 04-19-2011, 03:17 PM

Originally posted by sklages

Yes, that's what you usually do: split a run SFF file into individual SFF files according to their barcodes/MIDs. If you use the Roche tools for this task, then the offsets get shifted. We usually send these SFF files to our customers, MID removed, files sorted.
The OP didn't mention what kind of SFF he received .. individual ones? Whole region SFFs?

cheers,
Sven

I wish our core did this!! LOL they definitely don't. I just received several .sff files but they are not parsed by MID or for anything else for that matter. I guess I will look into Roche tools for separating by MID and trimming. Sfffile can do something like this I think although I find the syntax very confusing. Is this what you use?

**JackieBadger** · 04-19-2011, 04:51 PM

Originally posted by aligenie

I wish our core did this!! LOL they definitely don't. I just received several .sff files but they are not parsed by MID or for anything else for that matter. I guess I will look into Roche tools for separating by MID and trimming. Sfffile can do something like this I think although I find the syntax very confusing. Is this what you use?

jMHC and Geneious (links above) are super easy to use, with graphical interfaces.
You designate your primer, adapter length, and Bob's your uncle!

**sklages** · 04-19-2011, 08:20 PM

Originally posted by aligenie

I wish our core did this!! LOL they definitely don't. I just received several .sff files but they are not parsed by MID or for anything else for that matter. I guess I will look into Roche tools for separating by MID and trimming. Sfffile can do something like this I think although I find the syntax very confusing. Is this what you use?

Yes, sfffile is very fast and reliable. I don't know the other tools mentioned, but the advantage of sfffile is that it works on the SFF file and generates new SFF files. Once you are done with MID clipping, you extract the fasta sequences from the newly created SFF files (without MID, except when you use '-n' with sffinfo).

cheers,
Sven

**aligenie** · 04-20-2011, 06:42 PM

Originally posted by sklages

Yes, sfffile is very fast and reliable. I don't know the other tools mentioned, but the advantage of sfffile is that it works on the SFF file and generates new SFF files. Once you are done with MID clipping, you extract the fasta sequences from the newly created SFF files (without MID, except when you use '-n' with sffinfo).

cheers,
Sven

Hi Sven, thanks for your help. Unfortunately I still cannot get sfffile to parse by MID. I used sfffile -mcf barcode.txt -s read.sff and I get errors. My barcode file looks like this:
barcode
{
mid = "MID1", "ACGAGTGCGT", 2;
mid = "MID2", "ACGCTCGACA", 2;
mid = "MID3", "AGACGCACTC", 2;
mid = "MID5", "ATCAGACACG", 2;
mid = "MID6", "ATATCGCGAG", 2;
mid = "MID7", "CGTGTCTCTA", 2;
mid = "MID8", "CTCGCGTGTC", 2;
mid = "MID10", "TCTCTATGCG", 2;
mid = "MID11", "TGATACGTCT", 2;
mid = "MID13", "CATAGTAGTG", 2;
mid = "MID14", "CGAGAGATAC", 2;
mid = "MID15", "ATACGACGTA", 2;
mid = "MID16", "TCACGTACTA", 2;
mid = "MID17", "CGTCTAGTAC", 2;
mid = "MID18", "TCTACGTAGC", 2;
mid = "MID19", "TGTACTACTC", 2;
mid = "MID20", "ACGACTACAG", 2;
mid = "MID21", "CGTAGACTAG", 2;
mid = "MID22", "TACGAGTATG", 2;
mid = "MID23", "TACTCTCGTG", 2;
mid = "MID24", "TAGAGACGAG", 2;
mid = "MID25", "TCGTCGCTCG", 2;
mid = "MID26", "ACATACGCGT", 2;
mid = "MID27", "ACGCGAGTAT", 2;
mid = "MID28", "ACTACTATGT", 2;
mid = "MID68", "TCGCTGCGTA", 2;
mid = "MID30", "AGACTATACT", 2;
mid = "MID31", "AGCGTCGTCT", 2;
mid = "MID32", "AGTACGCTAT", 2;
mid = "MID33", "ATAGAGTACT", 2;
mid = "MID34", "CACGCTACGT", 2;
mid = "MID35", "CAGTAGACGT", 2;
mid = "MID36", "CGACGTGACT", 2;
mid = "MID37", "TACACACACT", 2;
mid = "MID38", "TACACGTGAT", 2;
mid = "MID39", "TACAGATCGT", 2;
mid = "MID40", "TACGCTGTCT", 2;
mid = "MID69", "TCTGACGTCA", 2;
mid = "MID42", "TCGATCACGT", 2;
mid = "MID43", "TCGCACTAGT", 2;
mid = "MID44", "TCTAGCGACT", 2;
mid = "MID45", "TCTATACTAT", 2;
mid = "MID46", "TGACGTATGT", 2;
mid = "MID47", "TGTGAGTAGT", 2;
mid = "MID48", "ACAGTATATA", 2;
mid = "MID49", "ACGCGATCGA", 2;
mid = "MID50", "ACTAGCAGTA", 2;
mid = "MID67", "TCGATAGTGA", 2;
}

Any idea why the -mcf option isn't working? Is my syntax wrong? Sorry for all the questions, but this is frustrating!!

I find Geneious to be really slow....

Cheers
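A barcode file like the one above can be sanity-checked before handing it to sfffile. The Python sketch below is a hypothetical helper, not part of any Roche tool: it assumes the layout seen in this thread (a set name, then `mid = "name", "sequence", number;` statements inside braces, with an optional fourth quoted field and `/* ... */` comments). The thread does not say what the number means; the code calls it `errors` on the assumption that it is a mismatch tolerance.

```python
import re
from typing import NamedTuple, Optional


class Mid(NamedTuple):
    name: str
    seq: str
    errors: int            # assumed meaning: allowed mismatches
    extra: Optional[str]   # optional fourth field, meaning not stated here


SET_RE = re.compile(r"(\w+)\s*\{(.*?)\}", re.S)
MID_RE = re.compile(
    r'mid\s*=\s*"([^"]+)"\s*,\s*"([^"]+)"\s*,\s*(\d+)(?:\s*,\s*"([^"]*)")?'
)


def parse_mid_sets(text):
    """Return ({set_name: [Mid, ...]}, [malformed statements])."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)  # drop comments
    sets, malformed = {}, []
    for set_name, body in SET_RE.findall(text):
        mids = []
        for stmt in body.split(";"):
            stmt = stmt.strip()
            if not stmt:
                continue
            m = MID_RE.fullmatch(stmt)
            if m is None:
                malformed.append(stmt)
                continue
            name, seq, errors, extra = m.groups()
            mids.append(Mid(name, seq.upper(), int(errors), extra))
        sets[set_name] = mids
    return sets, malformed


def check(mids):
    """Report non-ACGT sequences and duplicated names or sequences."""
    issues, names, seqs = [], set(), set()
    for m in mids:
        if not re.fullmatch(r"[ACGT]+", m.seq):
            issues.append(f"{m.name}: non-ACGT characters in {m.seq!r}")
        if m.name in names:
            issues.append(f"{m.name}: duplicate name")
        if m.seq in seqs:
            issues.append(f"{m.name}: duplicate sequence {m.seq}")
        names.add(m.name)
        seqs.add(m.seq)
    return issues


example = '''
barcode
{
mid = "MID1", "ACGAGTGCGT", 2;
mid = "MID2", "ACGCTCGACA", 2;
mid = "MID3", "AGACGCACTC", 2;
}
'''
sets, malformed = parse_mid_sets(example)
for set_name, mids in sets.items():
    print(set_name, len(mids), "MIDs;", check(mids) or "no issues found")
print("malformed statements:", malformed)
```

Passing this check only says the file is internally consistent; it says nothing about whether the command line that uses it is correct.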

**sklages** · 04-20-2011, 10:15 PM

Originally posted by aligenie

Hi Sven, thanks for your help. Unfortunately I still cannot get sfffile to parse by MID. I used sfffile -mcf barcode.txt -s read.sff and I get errors. My barcode file looks like this:
barcode
{
mid = "MID1", "ACGAGTGCGT", 2;
[...] mid = "MID67", "TCGATAGTGA", 2;
}

Any idea why the -mcf option isn't working? Is my syntax wrong? Sorry for all the questions, but this is frustrating!!

I find Geneious to be really slow....

Cheers

What error do you get? The syntax looks ok.
How did you call 'sfffile' (command line)?

And Geneious, that's my impression too, very nice looking but too slow for NGS.

cheers,
Sven

**zhengz** · 04-21-2011, 12:23 AM

Hi aligenie,

Since you define the name for the set of barcodes as 'barcode', which is the line above {, would the following command work?

sfffile -mcf barcode.txt -s barcode read.sff

In my case, for my own customized adapters, I use a barcode file with example commands in the comment lines (replace the x's with your own barcodes):

/* User-defined MID sets for the 8 Y adapters...

An example:

sfffile -s Y -mcf Yscheme.txt -o region1 NameOfYourSFFfile1.sff > MIDyieldR1.txt
sfffile -s Y -mcf Yscheme.txt -o region2 NameOfYourSFFfile2.sff > MIDyieldR2.txt

*/
Y
{
mid = "Y3", "xxxxxxxxxx", 1, "xxxxxxxxxx";
mid = "Y5", "xxxxxxxxxx", 1, "xxxxxxxxxx";
mid = "Y8", "xxxxxxxxxx", 1, "xxxxxxxxxx";
mid = "Y9", "xxxxxxxxxx", 1, "xxxxxxxxxx";
mid = "Y10", "xxxxxxxxxx", 1, "xxxxxxxxxx";
mid = "Y11", "xxxxxxxxxx", 1, "xxxxxxxxxx";
mid = "Ya1", "xxxxxxxxxx", 1, "xxxxxxxxxx";
mid = "Ya2", "xxxxxxxxxx", 1, "xxxxxxxxxx";
}
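The suggestion above boils down to passing the set name from the barcode file as the argument of -s. As a rough Python sketch, the set name can be read from the file and the command assembled in the same shape as the examples in the comment block. The helper names are hypothetical, only the flags that appear in this thread (-s, -mcf, -o) are used, and nothing here actually runs sfffile.

```python
import re
import shlex


def first_set_name(config_text):
    """Return the identifier in front of the first '{', ignoring comments."""
    without_comments = re.sub(r"/\*.*?\*/", "", config_text, flags=re.S)
    match = re.search(r"(\w+)\s*\{", without_comments)
    return match.group(1) if match else None


def split_command(config_path, config_text, sff_path, out_prefix=None):
    """Build (but do not run) an sfffile command line as a string."""
    set_name = first_set_name(config_text)
    if set_name is None:
        raise ValueError("no MID set name found in the barcode file")
    args = ["sfffile", "-s", set_name, "-mcf", config_path]
    if out_prefix:
        args += ["-o", out_prefix]
    args.append(sff_path)
    return shlex.join(args)


config = '''
/* User-defined MID sets ... */
Y
{
mid = "Y3", "xxxxxxxxxx", 1, "xxxxxxxxxx";
}
'''
command = split_command("Yscheme.txt", config, "NameOfYourSFFfile1.sff", "region1")
print(command)  # sfffile -s Y -mcf Yscheme.txt -o region1 NameOfYourSFFfile1.sff
```

For the file posted earlier in the thread, the same helper would pick up `barcode` as the set name.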

**JackieBadger** · 04-21-2011, 02:20 AM

Originally posted by sklages

What error do you get? The syntax looks ok.
How did you call 'sfffile' (command line)?

And Geneious, that's my impression too, very nice looking but too slow for NGS.

cheers,
Sven

Their latest release 5.4.1 is supposed to be designed for NGS, but yes I agree that loading and moving files around is SLOW, and can cause the program to hang!
jMHC operates on a much better level for barcodes.

Cheers
j

454 MIDs