
  • [Help] - 454 data analyser

    I'm currently in the 2nd semester of the 1st year of my master's degree, and we now have to do a project. I chose to analyse DNA sequencing data from Quercus suber.

    I came to this forum to learn from you all and to get some advice, such as which tools/software are best for analysing the 454 sequencing data I was given. I don't know much about it; this will be my first contact with this kind of data.


    PS: sorry for repeating myself here and in my introductory post.

  • #2
    Hi Chuckytah,

    That depends on what you are trying to do. What is the state of the data? Just reads? Or are they assembled? If you need to assemble them, then there are several options. If you already have them assembled and want to do more detailed analysis, then there are several other options.

    Typically there is no single "best" answer. There are usually several tools, but it is difficult to make a recommendation without knowing something about your goals.

    SAH



    • #3
      Hello,

      First of all, thanks for taking the time to answer me.
      I think I need to assemble them first, because I have a lot of small sub-sequences. I don't know if I'm making myself clear.

      I know there is a "best" answer for my needs, but I would like to work with very intuitive software; I don't have much time to learn complex tools.



      • #4
        If you have access to the Roche software (Newbler) that would be your easiest and best place to begin for 454 assembly. It does a great job and you should be able to contact whoever did the sequencing to run a basic assembly.

        You can request a copy of Newbler from Roche. I don't believe it is available for download, but there should be instructions on their website on how to obtain the software. If that doesn't work you may want to consider the following open-source options.

        1) FASTQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/). Sequence data is never of equal quality for all reads. You will want to trim/filter some of your reads to enrich for high-quality data. FASTQC is a multi-platform application which will aid in your visualization of the quality of your data.

        2) GALAXY and the FASTX Toolkit (http://main.g2.bx.psu.edu/ and http://hannonlab.cshl.edu/fastx_toolkit/). FASTQC should tell you things like how good the sequence quality is at the ends of your reads. Frequently it will be low there, typically toward the 3' end, and you may want to exclude this sequence from subsequent analysis. Using tools available in GALAXY and the FASTX Toolkit you should be able to filter and trim your data to your heart's content. Both packages are well documented. GALAXY has more functionality than a Swiss Army knife-wielding ninja and I recommend you take a look at the entire package as well as some of the web tutorials. It offers an ideal platform for an entry-level bioinformatician looking to do some work in genomics. A short tutorial on how to do QC on sequence data can be found here: http://www.molecularevolution.org/re..._data_activity along with a variety of other tools/tutorials that might be of interest to you.

        3) Bowtie/VELVET/NEWBLER/ABySS/MIRA: These programs should help you to assemble or map your filtered/trimmed data into something a bit more reasonable to handle. There are a large number of assemblers and mappers that can help you do this task. Assembling next-gen data is a challenging aspect of genomics that many biologists underappreciate. Each data set has its unique qualities, making it not so easy to cookie-cut. As I recommended above, I would use NEWBLER if you have access to it. If not, any of the ones listed here should get you started. Bowtie is a read mapper rather than an assembler; it is easy to use and very fast, so if you have a high-quality reference genome this may be the way to go. VELVET takes a lot of memory, but may be an option if you do not have a good reference genome. ABySS and MIRA can work with or without a reference genome. Bastien Chevreux has done an excellent job at writing and documenting MIRA. It is worth going through some of his exercises just for the learning experience alone.

        Again, each assembly has its own nuances, so you may end up using multiple packages/techniques to get the job done. But I think any of these programs should get you started.

        Depending on what you are able to assemble, you will then need to decide what interests you about the data. Comparison with other species? Polymorphism analysis? Do you need to gather more data? Gene finding? Synteny? Getting the data to a reasonable quality, assembling it and taking a look should help you to answer these questions.
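To make the trimming step in 2) a little more concrete, here is a minimal Python sketch of what "trimming low-quality ends" means. It is only an illustration, not how the FASTX Toolkit or GALAXY implement it. It assumes Phred+33 quality strings; the function names and the threshold of 20 are made up for this example.

```python
def phred33_scores(quality_string):
    """Convert a FASTQ quality string (assumed Phred+33) to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_low_quality_ends(sequence, quality_string, min_quality=20):
    """Trim bases from both ends until a base with score >= min_quality is met.

    Returns the trimmed (sequence, quality_string) pair. Both are empty
    strings if no base reaches the threshold.
    """
    scores = phred33_scores(quality_string)
    start = 0
    end = len(scores)
    # Walk in from the 5' end past low-quality bases.
    while start < end and scores[start] < min_quality:
        start += 1
    # Walk in from the 3' end past low-quality bases.
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], quality_string[start:end]
```

Note that a low-quality base in the middle of a read is kept; only the ends are clipped. Real tools usually offer more options (sliding windows, minimum length after trimming, and so on).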

        SAH



        • #5
          Thanks a lot for the info. I have tried FastQC and MIRA, but without success. I forgot to mention that I use Windows and that the files I was given are in .xlx, .txt and .BlastClust formats :S

          I'm a little bit lost, lol. I'm now starting to read some articles about the 454 method and the theory behind it, but I need to find a program to process my data. :S



          • #6
            Another package you may want to consider in order to QC, filter and trim your data is PRINSEQ: http://edwards.sdsu.edu/prinseq_beta/. I recommended it in another thread to someone else new to sequence analysis and it seemed to be a hit.

            SAH



            • #7

              My Excel files have these columns: Contig name; Number EST; BlastClust 85%; Blast Info. See the image here: http://img848.imageshack.us/f/semttulogq.png/



              • #8
                I would suggest that you try to get the raw data either as an SFF file or as a FASTQ file. From the SFF file, you can extract the sequence and quality data and convert it into FASTQ format using e.g. PRINSEQ (or upload the FASTA and QUAL files directly to its web interface).
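For anyone curious what the FASTA + QUAL to FASTQ conversion involves, here is a rough Python sketch. PRINSEQ does this for you, so this is only meant to show the idea. It assumes the two files list the same records in the same order, that QUAL scores are whitespace-separated integers, and that Phred+33 output is wanted; the function names are invented for this example.

```python
def read_records(lines):
    """Yield (identifier, body_lines) from FASTA-style text (also works for QUAL)."""
    header, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, body
            # Keep only the identifier (text before the first whitespace).
            header, body = (line[1:].split() or [""])[0], []
        else:
            body.append(line)
    if header is not None:
        yield header, body


def fasta_qual_to_fastq(fasta_lines, qual_lines):
    """Pair FASTA and QUAL records by position and return FASTQ text (Phred+33).

    zip() stops at the shorter input, so unequal record counts are not detected.
    """
    out = []
    for (name, seq_parts), (qual_name, qual_parts) in zip(
        read_records(fasta_lines), read_records(qual_lines)
    ):
        if name != qual_name:
            raise ValueError(f"record mismatch: {name} vs {qual_name}")
        sequence = "".join(seq_parts)
        scores = [int(s) for s in " ".join(qual_parts).split()]
        if len(scores) != len(sequence):
            raise ValueError(f"length mismatch for {name}")
        # Cap at 93 so every score maps to a printable ASCII character.
        quality = "".join(chr(min(s, 93) + 33) for s in scores)
        out.append(f"@{name}\n{sequence}\n+\n{quality}\n")
    return "".join(out)
```

With real files you would pass open file handles, e.g. `fasta_qual_to_fastq(open("reads.fna"), open("reads.qual"))`, where both file names are placeholders.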

                I am not aware of a program that will process your data in an Excel spreadsheet. If you can't get the raw data, try to convert your spreadsheet into a FASTA file.
                Looking at your screenshot, it looks like someone has already run BLAST on the data. It also looks like the sequences are contigs (header in first column), which suggests that they are already assembled.
                If you want to redo the analysis, start with the raw data and process it with PRINSEQ or an alternative. If you are not sure what parameters to use for the processing, take a look at the PRINSEQ manual page (http://prinseq.sourceforge.net/manual.html).
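If you do end up converting the spreadsheet to FASTA, something along these lines might work after saving it from Excel as tab-delimited text. This is a sketch under assumptions: it expects a header row with a "Contig name" column (as in the screenshot) and a column holding the sequence itself, here called "Sequence", which is a hypothetical name since the screenshot does not show one.

```python
import csv
import io


def table_to_fasta(handle, name_column="Contig name", sequence_column="Sequence",
                   delimiter="\t", width=60):
    """Convert delimited text with a header row into FASTA text."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    records = []
    for row in reader:
        name = (row.get(name_column) or "").strip()
        sequence = (row.get(sequence_column) or "").strip().replace(" ", "")
        if not name or not sequence:
            continue  # skip rows that lack a name or a sequence
        # Replace whitespace in the name so the FASTA identifier is one token.
        identifier = "_".join(name.split())
        # Wrap the sequence into fixed-width lines.
        chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        records.append(f">{identifier}\n" + "\n".join(chunks) + "\n")
    return "".join(records)


# Small in-memory example; with a real file, pass open("export.txt", newline="").
example = io.StringIO("Contig name\tSequence\ncontig 1\tACGTACGT\n")
print(table_to_fasta(example, width=4), end="")
```

Rows with an empty sequence cell are silently skipped, which may or may not be what you want; "export.txt" is just a placeholder file name.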



                • #9
                  Thank you so much for the help. These files came from the company that did the pyrosequencing... and they only sent Excel, BlastClust and txt files... :S

                  I also think they may have already done some assembly.




                  • #10
                    Yes, my sequences are contigs and they are already assembled... What I need to do now is to separate the sequences into different taxonomic groups... I don't know if I'm making myself clear.



                    • #11
                      I have ESTs (from Quercus suber roots) and my main task is to separate them by category, for example plants, fungi, etc. Do I need to run BLAST queries?
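If BLAST results are already available (the spreadsheet described earlier in the thread has a "Blast Info" column), one possible starting point is to bin contigs by the organism named in the best-hit description. The Python sketch below is only a rough illustration of that idea: the genus-to-category table is a small hypothetical lookup that would need extending, and the hit descriptions in the usage note are invented. A more robust approach would use taxonomy identifiers rather than matching names in free text.

```python
from collections import defaultdict

# Hypothetical genus-to-category lookup; extend it for your own data.
GENUS_TO_CATEGORY = {
    "Quercus": "plant",
    "Arabidopsis": "plant",
    "Populus": "plant",
    "Vitis": "plant",
    "Laccaria": "fungus",
    "Aspergillus": "fungus",
    "Phytophthora": "oomycete",
    "Escherichia": "bacterium",
}


def categorize_hit(description, lookup=GENUS_TO_CATEGORY):
    """Return the category of the first known genus found in a hit description."""
    for word in description.replace("[", " ").replace("]", " ").split():
        category = lookup.get(word.strip(".,;()"))
        if category:
            return category
    return "unassigned"


def bin_contigs(hits, lookup=GENUS_TO_CATEGORY):
    """Group contig names by category.

    `hits` is an iterable of (contig_name, best_hit_description) pairs.
    """
    bins = defaultdict(list)
    for contig, description in hits:
        bins[categorize_hit(description, lookup)].append(contig)
    return dict(bins)
```

For example, `bin_contigs([("c1", "hypothetical protein [Quercus suber]"), ("c2", "chitin synthase [Laccaria bicolor]")])` would place c1 under "plant" and c2 under "fungus". Contigs with no recognisable genus end up in "unassigned", and those are the ones that would need fresh BLAST queries or a manual look.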
