  • hello and BLAST batch question

    Hi everybody,

    I am new to CLC Genomics and 454 data. I am working on a non-model species (a limpet), so I don't have a reference genome. I did a 454 run on a cDNA library (transcriptome) and successfully trimmed and aligned the sequences. Now I would like to BLAST the contigs against all organisms in NCBI, using blastx or blastn, to find out which genes these contigs correspond to.

    I would like to know whether I can do that directly with the NCBI BLAST available in CLC Genomics, or whether I have to download RefSeq from NCBI and run a local BLAST. I have around 30,000 contigs to BLAST. I know that if you submit too many sequences as a batch to NCBI through a piece of software, you can sometimes be "blacklisted" and blocked from using NCBI (it happened to a researcher from my previous lab who didn't know that beforehand), and I don't want this to happen. I guess it may depend on the software you use (perhaps different programs submit the batch in different ways; I am not a bioinformatician).

    Could you please tell me whether I might run into this problem if I BLAST directly against NCBI using the CLC Genomics "NCBI BLAST link"? If I need to use a local BLAST, can you help me find my way to download the nucleotide database for all organisms (RefSeq)? Is it possible to do that on a laptop, or do I need a server?

    Thank you in advance for your help!
    All the best

    Sophie

  • #2
    Get yourself Blast2GO. It is very easy to use (no programming required) and does exactly what you want.


    However, the contigs output file has previous contigs from the isotig to which it belongs appended to it, so you need to do some data manipulation (for contigs with status=isotig, compare the length stated in the header with the actual length of the contig sequence and you will see what I mean). Below is a good example: the actual sequence is 521 bp long, but the contig is listed as 125 bp. The previous contig is 396 bp long and has been added to the start of this contig due to a programming error (Roche are aware of it).

    e.g.
    Code:
    >contig17281  length=125  numreads=55  gene=isogroup00117  status=isotig
    TCCTTCCATgTTGTTTACATGGGGATAAAACCGCCTTGTTTTTtCTAAAGAGGGATGAAa
    CCTATgCTCCCTAAAGCgtATGAATCcTGGgcGaCCAAAgTCCAATCcAcAtGGTACAAC
    TTTGaCATCTCTTTTTCTgAGTgCATAGTCTATAATaGCTTCATTCTCCGGAAtCATCAC
    aGAACAagTTGAGTAgACTACAAaTCCTCCTGATTTGGAaTTAgcGTCcACTAAATCAAT
    TGCTgCTAAAATCaGTtGCTTTTgAAgAAAAgCaCAATTTcGTACATCTTCAATGGACTT
    GGATGTTTTAATAGATTGTTGATCTGGGCATATAGTCCCACTGCCGGTGCAGGGAGCATC
    CAATAATACTCTATCAACAGAATTTAATCCAAGGATcttcggtagctccttcccATcata
    gTTCttCAGGTTGTTCTCCCTTGCTCTtGCAATACTGTGCTCCTCcAACCTTTCTcTtCT
    TCAAGGCTTTCCTTTTCTCCTCTCTGGCTTGTGAAATTTCC
    You would be far better off using the isotigs.fna file, as these sequences are supposed to represent actual mRNAs (although in many instances you will see a single-base difference between two isotigs from an isogroup because of a 454 sequencing error) and thus have a better chance of matching a protein with blastx.
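
For anyone who wants to find the affected records, here is a minimal Python sketch. It assumes the header layout shown in the example above (a `length=` field) and simply reports records whose stated length differs from the actual sequence length. The function names are made up for illustration; this is not part of any Roche software.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_mismatches(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (record id, stated length, actual length) where the two differ.

    Assumes the header carries a 'length=<n>' field, as in the example above.
    Records without that field are skipped.
    """
    mismatches = []
    for header, seq in read_fasta(lines):
        match = re.search(r"length=(\d+)", header)
        if match is None:
            continue
        stated = int(match.group(1))
        if stated != len(seq):
            mismatches.append((header.split()[0], stated, len(seq)))
    return mismatches


# Small in-memory demo; with a real file, pass the open file handle instead.
demo = [
    ">c1  length=4  status=isotig", "ACGT",
    ">c2  length=3  status=isotig", "ACGTAC", "GT",
]
for record_id, stated, actual in length_mismatches(demo):
    print(record_id, "stated:", stated, "actual:", actual)
```

This only flags the records; how to trim them depends on whether the extra bases are always at the start, which should be checked against your own file.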


    • #3
      thanks

      Thank you, Jeremy, for your reply; that was very helpful.
      Blast2GO is now running and doing exactly what I want!

      Sophie


      • #4
        Hi, I would also like to run blastx for my non-model plant species. I assembled some 72,000 reads into 29059 unigenes. I would like to know whether Blast2GO can keep running while the system is hibernating or asleep. I would also like to know how to obtain the exact gene function for these unigenes: I tried the first 10 unigenes as a sample, and I could annotate and obtain pathway information for only 2 of them. Kindly help me with this.
        -Swetha


        • #5
          Originally posted by swe5191
          Hi, I would also like to run blastx for my non-model plant species. I assembled some 72,000 reads into 29059 unigenes. I would like to know whether Blast2GO can keep running while the system is hibernating or asleep. I would also like to know how to obtain the exact gene function for these unigenes: I tried the first 10 unigenes as a sample, and I could annotate and obtain pathway information for only 2 of them. Kindly help me with this.
          -Swetha
          A sleeping or hibernating computer is not going to do any computation. You may have to look at multiple sources to annotate new sequences (and even then you may not be able to assign a function to every unigene). You don't say what you are going to search against with blastx, but with a query of this size you would be better off running it on a proper server or cluster.

          BTW: what kind of sequencing is this? (72,000 reads is pretty small for NGS, but a good size for Sanger.) 29,000 unigenes from 72,000 reads does not look very promising.


          • #6
            I would like to run blastx against the nr database. I am currently working on non-model plant transcriptome reads obtained by 454 sequencing. The datasets were downloaded from a database, so I don't know much about the sequencing details. After preprocessing and QC, 72018 of the 81146 reads remained. I performed a de novo assembly, which produced 29051 unigenes. In the paper I referred to, the authors obtained about 20,000 unique sequences (around 12k singletons and 8k contigs) from the same number of raw reads, so I thought my assembly was not that bad. What tools are used to check the quality of an assembly? How do I validate my assembly statistics?


            • #7
              Swetha: ~30K sequences is going to be a big blastx job to run against the nr db. You will need to use some sort of a cluster, if you have any hope of finishing in a reasonable period of time.
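
One common way to spread a large query over cluster nodes is to split the FASTA file into chunks and search each chunk as a separate job. The Python sketch below shows only the splitting step; the function name and chunk size are illustrative assumptions, and how the jobs are submitted depends entirely on the cluster.

```python
from typing import Iterable, Iterator, List


def chunk_fasta(lines: Iterable[str], records_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTA lines, each holding at most records_per_chunk records."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    chunk: List[str] = []
    n_records = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith(">"):
            # Start a new chunk once the current one is full.
            if n_records == records_per_chunk:
                yield chunk
                chunk, n_records = [], 0
            n_records += 1
        chunk.append(line)
    if chunk:
        yield chunk


# In-memory demo with two records per chunk. With a real file you would
# iterate over the open file handle and write each chunk to its own file.
demo = [">a", "AC", ">b", "GT", "TT", ">c", "A"]
for i, chunk in enumerate(chunk_fasta(demo, 2), start=1):
    print("chunk", i, "holds", sum(l.startswith(">") for l in chunk), "records")
```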

              I am not sure what you are trying to do (are you just trying to recreate the analysis reported in the paper?). If you are only interested in the contigs and want to do something else with that data, consider contacting the authors of the paper to see if they can share the contig file.

              There are threads on this forum with tools for checking assembly quality (search for them).
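
As a starting point before reaching for a dedicated tool, basic length statistics can be computed directly from the unigene lengths. The Python sketch below is an illustration only (the function name and returned keys are assumptions). Note that N50 describes contiguity, not correctness, so it says little on its own about the quality of a transcriptome assembly.

```python
from typing import Dict, List


def assembly_stats(lengths: List[int]) -> Dict[str, float]:
    """Summarize sequence lengths: count, total, mean, longest, and N50.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "count": len(ordered),
        "total_bp": total,
        "mean_bp": total / len(ordered),
        "longest_bp": ordered[0],
        "n50_bp": n50,
    }


# Toy lengths for illustration; real input would be the unigene lengths.
print(assembly_stats([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```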


              • #8
                Yes, I would like to perform the analysis. I'm new to NGS data analysis, so I want to learn all the basic steps, from QC to assembly. I also asked the author for the supplementary files, but I didn't get any reply. My other question is: can we run blastx using cloud computing services and import the results into Blast2GO for the rest of the annotation process? The blastx search I ran for 29501 sequences gave me output in txt format. How do I import it into Blast2GO for the further annotation steps? I know that B2G can run blastx on its own, but the cloud services are much faster at producing the blastx results. So please suggest a cloud pipeline for annotating de novo assembled unigenes.


                • #9
                  What we do is run both BLAST and Blast2GO locally, but that requires access to a cluster.
                  I'm not familiar with cloud computing, but I think you can translate the BLAST results from txt into XML and then feed them to the B2G pipeline.


                  • #10
                    Thank you so much for your reply. Are there any online converters that convert a txt file to XML format? When I searched, I only came across XML-to-txt converters. Also, what does getting access to a cluster mean?


                    • #11
                      @Swetha: Since your original blastx search finished quickly, perhaps you can go back and re-run it, this time saving the output as XML (-outfmt 5).
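
If the search is run with the command-line BLAST+ tools, the re-run could look roughly like the Python sketch below, which just assembles the blastx command. The file names, database name, e-value cutoff and thread count are placeholders chosen for illustration, not values from this thread; a cloud service may expose the output format differently.

```python
import shlex
import subprocess
from typing import List


def blastx_xml_command(query: str, db: str, out_xml: str,
                       evalue: str = "1e-5", threads: int = 4) -> List[str]:
    """Build a BLAST+ blastx command that writes XML output (-outfmt 5)."""
    return [
        "blastx",
        "-query", query,
        "-db", db,
        "-outfmt", "5",          # 5 = BLAST XML
        "-out", out_xml,
        "-evalue", evalue,       # placeholder cutoff, adjust as needed
        "-num_threads", str(threads),
    ]


def run_blastx(cmd: List[str]) -> None:
    """Run the command; requires BLAST+ and a formatted local database."""
    subprocess.run(cmd, check=True)


# Placeholder file and database names.
cmd = blastx_xml_command("unigenes.fasta", "nr", "unigenes_vs_nr.xml")
print(shlex.join(cmd))
# run_blastx(cmd)  # uncomment once BLAST+ and the database are available
```

The resulting XML file is the format post #9 suggests feeding into the B2G pipeline.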
