**shandley** · 03-16-2011, 07:48 AM

Hi Chuckytah,

That depends on what you are trying to do. What is the state of the data? Just reads, or are they already assembled? If you need to assemble them, there are several options. If you already have them assembled and want to do more detailed analysis, there are several other options.

Typically there is no single "best" answer. There are usually several tools, but it is difficult to make a recommendation without knowing something about your goals.

SAH

**Chuckytah** · 03-16-2011, 08:14 AM

Hello,

First of all, thanks for taking the time to answer me.

I think I need to assemble them first, because I have a lot of small sub-sequences. I don't know if I'm making myself clear.

I know there is a "best" answer for my needs, but I would like to work with very intuitive software. I don't have much time to learn complex tools.

**shandley** · 03-16-2011, 09:24 AM

If you have access to the Roche software (Newbler) that would be your easiest and best place to begin for 454 assembly. It does a great job and you should be able to contact whoever did the sequencing to run a basic assembly.

You can request a copy of Newbler from Roche. I don't believe it is available for download, but there should be instructions on their website on how to obtain the software. If that doesn't work you may want to consider the following open-source options.

1) FASTQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/). Sequence data is never of equal quality for all reads. You will want to trim/filter some of your reads to enrich for high-quality data. FASTQC is a multi-platform application that will help you visualize the quality of your data.

2) GALAXY and the FASTX Toolkit (http://main.g2.bx.psu.edu/ and http://hannonlab.cshl.edu/fastx_toolkit/). FASTQC should tell you things such as the sequence quality at the 5' end of your reads. Frequently it will be low and you may want to exclude this sequence from subsequent analysis. Using the tools available in GALAXY and the FASTX Toolkit you should be able to filter and trim your data to your heart's content. Both packages are well documented. GALAXY has more functionality than a Swiss-army-knife-wielding ninja, and I recommend you take a look at the entire package as well as some of the web tutorials. It offers an ideal platform for an entry-level bioinformatician looking to do some work in genomics. A short tutorial on how to do QC on sequence data can be found here: http://www.molecularevolution.org/re..._data_activity along with a variety of other tools/tutorials that might be of interest to you.

3) Bowtie/VELVET/NEWBLER/ABySS/MIRA: These programs should help you assemble your filtered/trimmed data into something a bit more reasonable to handle. There are a large number of assemblers and mappers that can help you with this task. Assembling next-gen data is an underappreciated and, for many biologists, challenging aspect of genomics. Each data set has its own unique qualities, so there is no cookie-cutter approach. As I recommended above, I would use NEWBLER if you have access to it. If not, any of the ones listed here should get you started. Bowtie is easy to use and very fast. If you have a high-quality reference genome, this may be the way to go. VELVET takes a lot of memory, but may be an option if you do not have a good reference genome. ABySS and MIRA can work whether or not you have a reference genome. Bastien Chevreux has done an excellent job of writing and documenting MIRA. It is worth going through some of his exercises just for the learning experience alone.

Again, each assembly has its own nuances, so you may end up using multiple packages/techniques to get the job done. But I think any of these programs should get you started.

Depending on what you are able to assemble, you will then need to decide what interests you about the data. Comparison with other species? Polymorphism analysis? Do you need to gather more data? Gene finding? Synteny? Getting the data to a reasonable quality, assembling it and taking a look should help you answer these questions.
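To make the trim/filter idea from points 1 and 2 a little more concrete, here is a minimal Python sketch. It is not how FASTQC, GALAXY or the FASTX Toolkit work internally; it only assumes that quality strings use the common Phred+33 encoding, and the function names and the thresholds (20 and 50) are made up for illustration.

```python
def phred33_scores(quality_line):
    """Decode a Phred+33 quality string into integer scores (assumed encoding)."""
    return [ord(ch) - 33 for ch in quality_line]


def trim_low_quality_ends(sequence, quality_line, min_quality=20):
    """Remove bases below min_quality from both ends of a read.

    min_quality=20 is an illustrative threshold, not a recommendation.
    """
    scores = phred33_scores(quality_line)
    start, end = 0, len(scores)
    # Walk in from the 5' end while quality is too low.
    while start < end and scores[start] < min_quality:
        start += 1
    # Walk in from the 3' end while quality is too low.
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], quality_line[start:end]


def filter_and_trim(records, min_quality=20, min_length=50):
    """Trim each (name, sequence, quality) record and drop reads that end up too short."""
    for name, sequence, quality_line in records:
        trimmed_seq, trimmed_qual = trim_low_quality_ends(
            sequence, quality_line, min_quality
        )
        if len(trimmed_seq) >= min_length:
            yield name, trimmed_seq, trimmed_qual
```

For real data the existing tools are the better choice; the sketch is only meant to show what "trimming" and "filtering" do to a read.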

SAH

**Chuckytah** · 03-17-2011, 10:15 AM

Thanks a lot for the info. I have tried FastQC and MIRA, but without success. I forgot to mention that I use Windows, and that the files I was given are in .xlx, .txt and .BlastClust formats :S

I'm a little bit lost, lol. I'm now starting to read some articles about the 454 method and the theory behind it, but I still need to find a program to process my data. :S

**shandley** · 03-18-2011, 07:36 AM

Another package you may want to consider in order to QC, filter and trim your data is PRINSEQ: http://edwards.sdsu.edu/prinseq_beta/. I recommended it in another thread to someone else new to sequence analysis and it seemed to be a hit.

SAH

**Chuckytah** · 03-19-2011, 09:18 AM

My Excel files are laid out like this: Contig name; Number EST; BlastClust 85%; Blast Info. See the image here: http://img848.imageshack.us/f/semttulogq.png/

**robs** · 03-19-2011, 07:04 PM

I would suggest that you try to get the raw data either as SFF file or as FASTQ file. From the SFF file, you can extract the sequence and quality data and convert it into FASTQ format using e.g. PRINSEQ (or upload the FASTA and QUAL files directly to its web interface).
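For the FASTA plus QUAL case, the merge into FASTQ can be sketched in a few lines of Python. This is not PRINSEQ's code; it assumes the usual layout where the QUAL file repeats the FASTA headers and lists space-separated integer scores, and it writes Phred+33 quality characters. The function names are invented for the example.

```python
def read_fasta_like(lines):
    """Yield (name, body_lines) for FASTA-style input; works for .fasta and .qual alike."""
    name, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, body
            header = line[1:].split()
            # Use only the first word of the header as the read name.
            name, body = (header[0] if header else ""), []
        else:
            body.append(line)
    if name is not None:
        yield name, body


def fasta_qual_to_fastq(fasta_lines, qual_lines):
    """Combine sequences and integer quality scores into FASTQ text (Phred+33 assumed)."""
    quals = {
        name: [int(score) for score in " ".join(body).split()]
        for name, body in read_fasta_like(qual_lines)
    }
    out = []
    for name, body in read_fasta_like(fasta_lines):
        sequence = "".join(body)
        scores = quals.get(name)
        if scores is None or len(scores) != len(sequence):
            raise ValueError(f"missing or mismatched quality scores for {name!r}")
        # Cap at 93 so the encoded character stays in printable ASCII.
        quality = "".join(chr(min(score, 93) + 33) for score in scores)
        out.append(f"@{name}\n{sequence}\n+\n{quality}\n")
    return "".join(out)
```

Both arguments can be open file handles or plain lists of lines.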

I am not aware of a program that will process your data in an Excel spreadsheet. If you can't get the raw data, try to convert your spreadsheet into a FASTA file.
Looking at your screenshot, it looks like someone has already run BLAST on the data. It also looks like the sequences are contigs (header in the first column), which suggests that they are already assembled.
If you want to redo the analysis, start with the raw data and process it with PRINSEQ or an alternative. If you are not sure what parameters to use for the processing, take a look at the PRINSEQ manual (http://prinseq.sourceforge.net/manual.html).
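If the spreadsheet route is the only one available, the conversion to FASTA could look roughly like the Python sketch below. It assumes the sheet has been saved as CSV from Excel and that it contains a column holding the actual sequences. The column name "Sequence" is purely hypothetical (the columns mentioned in this thread do not include one), and "Contig name" is taken from the layout described above.

```python
import csv
import io
import textwrap


def spreadsheet_to_fasta(csv_text, name_column="Contig name",
                         sequence_column="Sequence", delimiter=",",
                         line_width=60):
    """Turn CSV rows into FASTA text.

    name_column and sequence_column are assumptions about the sheet layout;
    adjust them to whatever the exported file actually contains.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    records = []
    for row in reader:
        name = (row.get(name_column) or "").strip()
        # Remove any whitespace inside the sequence cell and normalise case.
        sequence = "".join((row.get(sequence_column) or "").split()).upper()
        if not name or not sequence:
            continue  # skip rows without a usable name or sequence
        wrapped = "\n".join(textwrap.wrap(sequence, line_width))
        records.append(f">{name}\n{wrapped}")
    return "\n".join(records) + ("\n" if records else "")
```

Excel on some systems exports with a semicolon separator, in which case `delimiter=";"` would be needed.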

**Chuckytah** · 03-20-2011, 04:39 AM

Thank you so much for the help. These files came from the company that did the pyrosequencing... and they only sent Excel, BlastClust and txt files... :S

I also think they may have already done some assembly.

**Chuckytah** · 03-25-2011, 10:18 AM

Yes, my sequences are contigs and they are already assembled... What I need to do now is separate the sequences into different taxonomic groups... I don't know if I'm making myself clear.

**Chuckytah** · 03-29-2011, 12:13 PM

I have ESTs (from Quercus suber roots) and my main task is to separate them by category, for example plants, fungi, etc. Do I need to run BLAST queries?

[Help] - 454 data analyser
