SEQanswers

06-28-2015, 08:46 PM   #1
vinnie_assemble
Junior Member
 
Location: USA

Join Date: Jun 2015
Posts: 2
Help in sff file processing in amplicon analysis via mothur

Hi,

I am using my 454 data for OTU analysis in mothur, and I am confused after converting my sff file to a fasta file. Sequencing information: platform 454 FLX, flow pattern TACG, barcode (AAAAAAAC) removed by the sequencing center, primer GAGTTTGATCNTGGCTCAG.

However, I have trouble understanding the section of the sequence from the 5th base to the 12th base. The primer starts at base 13. I have attached the fasta output from different toolkits.

sff_extract (from seq_crumbs toolkit) with clipping:

GAGTTTGATCCTGGCTCAGATTGAACGCTGG....

sff_extract (from seq_crumbs toolkit) without clipping:

tcagagagcgaaGAGTTTGATCCTGGCTCAGATTGAACGCTGG...

mothur output after denoise:

AGAGCGAAGAGTTTGATCCTGGCTCAGATTGAACGCTGG...

Can anyone help me understand the agagcgaa part of the sequence? Based on the sequencing center's information, it does not belong to the barcode. How should I deal with it? For example, is there a way to remove this region in mothur? Thank you!
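
To make the layout of the unclipped read concrete, here is a minimal Python sketch that splits it into the key, the stretch in front of the primer, and the rest. It is an illustration only: the function names are made up for this example, the read is just the visible part of the sequence posted above (the original is truncated with "..."), and the 4-base key `TCAG` is taken from the start of that read.

```python
import re

# Only the IUPAC codes needed for the primer in this thread (N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def primer_regex(primer):
    """Turn a primer containing IUPAC codes into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def dissect_read(read, primer, key="TCAG"):
    """Split a raw read into key, the bases before the primer, and the insert."""
    seq = read.upper()
    if not seq.startswith(key):
        raise ValueError("read does not start with the expected key")
    match = primer_regex(primer).search(seq, len(key))
    if match is None:
        raise ValueError("primer not found in read")
    return {
        "key": seq[:len(key)],
        "between": seq[len(key):match.start()],  # bases between key and primer
        "primer_start": match.start() + 1,        # 1-based position
        "insert": seq[match.end():],
    }


# Visible part of the unclipped sff_extract output from this post.
raw = "tcagagagcgaaGAGTTTGATCCTGGCTCAGATTGAACGCTGG"
parts = dissect_read(raw, "GAGTTTGATCNTGGCTCAG")
print(parts)
```

The `between` field holds the eight bases in question, and `primer_start` reports where the primer begins using 1-based counting, which is how the positions are described in the post.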
06-29-2015, 02:08 PM   #2
martin2
Member
 
Location: Prague, Czech Republic

Join Date: Nov 2010
Posts: 40

The 'agagcgaa' sequence is probably a custom MID (barcode). They (somebody at the sequencing centre) properly used the left trimpoint to delineate it. sff_extract does the right job as far as I can see.

I do not believe that anybody used 'AAAAAAAC' as a barcode, as you say. That would be a good joke: designing a homopolymer into a barcode for this platform (and also IonTorrent). Either way, the barcode would have to be visible in the sequence you showed, but it is not. Nobody has written a tool to edit SFF files and slice them (only tools that 'mask' the existing sequence exist), so that is another reason why I do not believe somebody gave you SFF files with the barcodes physically removed. Also, your sequence starts with the sequencing key 'tcag', which is another reason to believe this is just the original, raw read sequence.
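
The trimpoint argument can be checked with a few lines of Python. This is only a sketch under two assumptions: that lowercase letters mark the bases clipped at the left trimpoint (as in the unclipped sff_extract output posted above), and that the sequencing key is the first 4 bases. The function name is hypothetical.

```python
def split_at_left_trimpoint(read):
    """Return (clipped_prefix, kept_sequence), using letter case as the marker.

    Assumption: lowercase bases at the start of the read are the ones the
    left trimpoint clips off; the first uppercase base starts the kept part.
    """
    i = 0
    while i < len(read) and read[i].islower():
        i += 1
    return read[:i], read[i:]


# Visible part of the unclipped read from the first post.
raw = "tcagagagcgaaGAGTTTGATCCTGGCTCAGATTGAACGCTGG"
prefix, kept = split_at_left_trimpoint(raw)

key = prefix[:4]            # sequencing key (assumed to be 4 bases)
candidate_mid = prefix[4:]  # whatever lies between the key and the trimpoint

print(key, candidate_mid, kept)
```

Whatever sits between the key and the left trimpoint is the candidate MID; here that is the same eight bases that sff_extract clips.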

Last edited by martin2; 06-29-2015 at 02:12 PM.
07-01-2015, 08:36 AM   #3
vinnie_assemble
Junior Member
 
Location: USA

Join Date: Jun 2015
Posts: 2

Hello martin2,

Thank you for your help. I tried process_sff.py in QIIME, and it trims the AGAGCGAA automatically. I also contacted the sequencing center, and they said AAAAAAAC is an artificial barcode that was added after sequencing.
07-07-2015, 12:56 PM   #4
martin2
Member
 
Location: Prague, Czech Republic

Join Date: Nov 2010
Posts: 40

Quote:
Originally Posted by vinnie_assemble
Thank you for your help. I tried process_sff.py in QIIME, and it trims the AGAGCGAA automatically.

That is expected; as I said, that is very likely the real sample barcode. Provided it is clearly present in your reads, I do not see any reason why you should believe there is yet another sample barcode elsewhere.


Quote:
Originally Posted by vinnie_assemble
And I contacted the sequencing center; they said AAAAAAAC is an artificial barcode that was added after sequencing.

Do you see the AAAAAAAC sequence anywhere? I don't, at least not in the reads you posted.

In case you do not agree with the 'agagcgaa' idea, go ahead and complain to your sequencing centre and tell them they gave you files with somebody else's data. ;-) Either way, if somebody picked AAAAAAAC as a sample barcode for 454 or IonTorrent technology, then do not use their services anymore. It would often be destroyed by sequencing errors.
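
To illustrate the homopolymer point, here is a small Python sketch of an idealised, noise-free flowgram for the TACG flow pattern mentioned in the first post. It is a toy model, not what any real base caller does: actual flow signals are analog and noisy, and the function name is made up for this example.

```python
from itertools import cycle


def ideal_flowgram(seq, flow_order="TACG"):
    """Return a list of (nucleotide, run_length) pairs, one per flow.

    Each flow offers one nucleotide; the run length is how many identical
    bases would be incorporated in that flow in a noise-free model.
    """
    seq = seq.upper()
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases outside the flow order")
    flows = []
    pos = 0
    for nuc in cycle(flow_order):
        if pos >= len(seq):
            break
        run = 0
        while pos < len(seq) and seq[pos] == nuc:
            run += 1
            pos += 1
        flows.append((nuc, run))
    return flows


for barcode in ("AAAAAAAC", "AGAGCGAA"):
    flows = ideal_flowgram(barcode)
    longest = max(run for _, run in flows)
    print(barcode, "flows:", len(flows), "longest run:", longest)
```

In this model AAAAAAAC puts seven bases into a single A flow, so the barcode depends on telling a run of 7 from a run of 6 or 8 in one signal, whereas AGAGCGAA never needs more than 2 bases in one flow.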

Tags
16s 454 analysis
