SEQanswers

03-16-2011, 07:07 AM   #1
Chuckytah
Member
Location: Barcelos, Braga, Portugal
Join Date: Mar 2011
Posts: 65
[Help] - 454 data analyser

I'm currently in the 2nd semester of the 1st year of my master's degree, and we now have to do a project. I chose to analyse DNA sequencing data from Quercus suber.

I came to this forum to learn from you all and to get some advice, such as which tools/software are best for analysing the 454 sequencing data that was given to me. I don't know much about it yet; this will be my first contact with it.

PS: sorry for repeating myself here and in my introductory post.

03-16-2011, 07:48 AM   #2
shandley
Member
Location: Saint Louis, MO
Join Date: Sep 2010
Posts: 58

Hi Chuckytah,

That depends on what you are trying to do. What is the state of the data? Just reads? Or are they assembled? If you need to assemble them, then there are several options. If you already have them assembled and want to do more detailed analysis, then there are several other options.

Typically there is no single "best" answer. There are usually several tools, but it is difficult to make a recommendation without knowing something about your goals.

SAH

03-16-2011, 08:14 AM   #3
Chuckytah

Hello,

First of all, thanks for taking the time to answer me.
I think I need to assemble them first, because I have a lot of small sub-sequences. I don't know if I'm making myself clear.

I know there is a "best" answer for my needs, but I wanted to work with very intuitive software; I don't have much time to learn complex tools.

03-16-2011, 09:24 AM   #4
shandley

If you have access to the Roche software (Newbler) that would be your easiest and best place to begin for 454 assembly. It does a great job and you should be able to contact whoever did the sequencing to run a basic assembly.

You can request a copy of Newbler from Roche. I don't believe it is available for download, but there should be instructions on their website on how to obtain the software. If that doesn't work you may want to consider the following open-source options.

1) FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/). Sequence data is never of equal quality for all reads. You will want to trim/filter some of your reads to enrich for high-quality data. FastQC is a multi-platform application that will help you visualize the quality of your data.

2) Galaxy and the FASTX-Toolkit (http://main.g2.bx.psu.edu/ and http://hannonlab.cshl.edu/fastx_toolkit/). FastQC should tell you things like how good the sequence quality is at the 3' end of your reads. Frequently it will be low there, and you may want to exclude that sequence from subsequent analysis. Using the tools available in Galaxy and the FASTX-Toolkit, you should be able to filter and trim your data to your heart's content. Both packages are well documented. Galaxy has more functionality than a Swiss-army-knife-wielding ninja, and I recommend you take a look at the entire package as well as some of the web tutorials. It offers an ideal platform for an entry-level bioinformatician looking to do some work in genomics. A short tutorial on how to do QC on sequence data can be found here: http://www.molecularevolution.org/re..._data_activity along with a variety of other tools/tutorials that might be of interest to you.

3) Bowtie/Velvet/Newbler/ABySS/MIRA: These programs should help you to assemble your filtered/trimmed data into something a bit more reasonable to handle. There are a large number of assemblers and mappers that can help you with this task. Assembling next-gen data is an underappreciated aspect of genomics that many biologists find challenging. Each data set has its own unique qualities, so a cookie-cutter approach rarely works. As I recommended above, I would use Newbler if you have access to it. If not, any of the ones listed here should get you started. Bowtie (a read mapper rather than an assembler) is easy to use and very fast. If you have a high-quality reference genome, this may be the way to go. Velvet takes a lot of memory, but may be an option if you do not have a good reference genome. ABySS and MIRA can work whether or not you have a reference genome. Bastien Chevreux has done an excellent job of writing and documenting MIRA. It is worth going through some of his exercises just for the learning experience alone.
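
To make steps 1 and 2 a bit more concrete, here is a minimal sketch in plain Python of the kind of check a QC tool reports and the kind of trimming a filtering tool can do. It is not how FastQC, Galaxy or the FASTX-Toolkit are implemented. It assumes four-line FASTQ records with Phred+33 quality encoding; the function names, the two example reads and the threshold of 20 are made up for illustration.

```python
import io
from collections import defaultdict


def mean_quality_by_position(fastq_handle, offset=33):
    """Return the mean Phred score at each read position (0-based)."""
    totals = defaultdict(int)  # position -> sum of Phred scores
    counts = defaultdict(int)  # position -> number of reads covering it
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 != 3:  # only every fourth line is a quality string
            continue
        for position, char in enumerate(line.rstrip("\r\n")):
            totals[position] += ord(char) - offset
            counts[position] += 1
    return [totals[p] / counts[p] for p in sorted(counts)]


def trim_low_quality_ends(sequence, quality, min_phred=20, offset=33):
    """Remove bases scoring below min_phred from both ends of one read."""
    scores = [ord(char) - offset for char in quality]
    start, end = 0, len(scores)
    while start < end and scores[start] < min_phred:
        start += 1
    while end > start and scores[end - 1] < min_phred:
        end -= 1
    return sequence[start:end], quality[start:end]


# Two made-up reads (assumption: Phred+33), only to show the calls.
reads = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACG\n+\nI5+\n")
print(mean_quality_by_position(reads))
print(trim_low_quality_ends("ACGTACGT", "#IIII5+#"))
```

Reads that end up very short after trimming are usually dropped before assembly; what counts as too short depends on your data.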

Again, each assembly has its own nuances, so you may end up using multiple packages/techniques to get the job done. But I think any of these programs should get you started.

Depending on what you are able to assemble, you will then need to decide what interests you about the data. Comparison with other species? Polymorphism analysis? Do you need to gather more data? Gene finding? Synteny? Getting the data to a reasonable quality, assembling it and taking a look should help you to answer these questions.

SAH

03-17-2011, 10:15 AM   #5
Chuckytah

Thanks a lot for the info. I have tried FastQC and MIRA, but without success. I forgot to mention that I use Windows and that the files I was given are in .xls, .txt and .BlastClust formats :S

I'm a little bit lost, lol. I'm now starting to read some articles about the theory behind the 454 method, but I still need to find a program to process my data. :S

03-18-2011, 07:36 AM   #6
shandley

Another package you may want to consider in order to QC, filter and trim your data is PRINSEQ: http://edwards.sdsu.edu/prinseq_beta/. I recommended it in another thread to someone else new to sequence analysis and it seemed to be a hit.

SAH

03-19-2011, 09:18 AM   #7
Chuckytah

My Excel file has these columns: Contig name; Number EST; BlastClust 85%; Blast Info. See the image here: http://img848.imageshack.us/f/semttulogq.png/

03-19-2011, 07:04 PM   #8
robs
Senior Member
Location: San Diego, CA
Join Date: May 2010
Posts: 116

I would suggest that you try to get the raw data either as an SFF file or as a FASTQ file. From the SFF file, you can extract the sequence and quality data and convert them into FASTQ format using e.g. PRINSEQ (or upload the FASTA and QUAL files directly to its web interface).
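
For anyone curious what combining FASTA and QUAL files into FASTQ amounts to, here is a rough Python sketch. It is only an illustration, not what PRINSEQ does internally. It assumes both files list the same records in the same order, that the QUAL file holds space-separated Phred scores, and that Phred+33 (Sanger-style) FASTQ is wanted; the function names and the example record are invented.

```python
import io


def read_records(handle):
    """Yield (name, list of body lines) for each '>' record in a file."""
    name, body = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, body
            name, body = line[1:].split()[0], []
        else:
            body.append(line)
    if name is not None:
        yield name, body


def fasta_qual_to_fastq(fasta_handle, qual_handle, offset=33):
    """Yield FASTQ records built from paired FASTA and QUAL records."""
    pairs = zip(read_records(fasta_handle), read_records(qual_handle))
    for (seq_name, seq_lines), (qual_name, qual_lines) in pairs:
        if seq_name != qual_name:
            raise ValueError(f"record order differs: {seq_name} vs {qual_name}")
        sequence = "".join(seq_lines)
        scores = [int(value) for value in " ".join(qual_lines).split()]
        if len(scores) != len(sequence):
            raise ValueError(f"length mismatch in {seq_name}")
        # Cap at 93 so every score maps to a printable ASCII character.
        quality = "".join(chr(min(score, 93) + offset) for score in scores)
        yield f"@{seq_name}\n{sequence}\n+\n{quality}\n"


# Invented one-record example, only to show the call.
fasta = io.StringIO(">contig1\nACGT\n")
qual = io.StringIO(">contig1\n40 40 20 10\n")
for record in fasta_qual_to_fastq(fasta, qual):
    print(record, end="")
```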

I am not aware of a program that will process your data in an Excel spreadsheet. If you can't get the raw data, try to convert your spreadsheet into a FASTA file.
Looking at your screenshot, it looks like someone has already run BLAST on the data. It also looks like the sequences are contigs (header in the first column), which suggests that they are already assembled.
If you want to redo the analysis, start with the raw data and process it with PRINSEQ or an alternative. If you are not sure what parameters to use for the processing, take a look at the PRINSEQ manual page (http://prinseq.sourceforge.net/manual.html).

03-20-2011, 04:39 AM   #9
Chuckytah

Thank you so much for the help. These files came from the company that did the pyrosequencing... and they only sent Excel, BlastClust and txt files... :S

I also think they may have already done some assembly.


03-25-2011, 10:18 AM   #10
Chuckytah

Yes, my sequences are contigs and they are already assembled... What I need to do now is separate the sequences into different taxonomic groups... I don't know if I'm making myself clear.

03-29-2011, 12:13 PM   #11
Chuckytah

I have ESTs (from Quercus suber roots) and my main task is to separate them by category, for example plants, fungi, etc. Do I need to run BLAST queries?
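
As a rough illustration of that kind of separation: if the Blast Info column already holds the description of the best BLAST hit, and if that description names the organism in square brackets (an assumption; this needs checking against the real file), the contigs could be grouped roughly like this in Python. The column names come from the spreadsheet described earlier in the thread; the comma-separated export, the example rows and the genus-to-category table are all invented. A real analysis would rely on the NCBI taxonomy rather than a hand-written list.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical lookup (genus -> broad category), far from complete.
GENUS_TO_CATEGORY = {
    "Quercus": "plant",
    "Arabidopsis": "plant",
    "Vitis": "plant",
    "Laccaria": "fungus",
    "Aspergillus": "fungus",
}


def categorize(blast_description):
    """Return a broad category for one BLAST hit description."""
    # Assumes the organism appears as "[Genus species ...]" in the text.
    match = re.search(r"\[([A-Z][a-z]+) [^\]]*\]", blast_description)
    if match is None:
        return "unclassified"
    return GENUS_TO_CATEGORY.get(match.group(1), "unclassified")


def group_contigs(rows):
    """Group (contig name, BLAST description) pairs by category."""
    groups = defaultdict(list)
    for contig_name, blast_description in rows:
        groups[categorize(blast_description)].append(contig_name)
    return dict(groups)


# Invented CSV export of the spreadsheet, only to show the calls.
export = io.StringIO(
    "Contig name,Number EST,BlastClust 85%,Blast Info\n"
    "Contig1,12,1,hypothetical protein [Quercus suber]\n"
    "Contig2,3,2,chitin synthase [Laccaria bicolor]\n"
    "Contig3,5,3,no hit\n"
)
rows = [(row["Contig name"], row["Blast Info"]) for row in csv.DictReader(export)]
print(group_contigs(rows))
```

Contigs that land in "unclassified" would still need a fresh BLAST search or a manual look.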

Tags
454, roche 454, software, tools and techniques
