SEQanswers




Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
BLAST+ creating custom blast database and using blast+ filtering features | deniz | Bioinformatics | 3 | 07-07-2019 09:04 AM
BLAST database error - when changing to new BLAST+ local program | biobio | Bioinformatics | 4 | 06-15-2011 06:20 AM
batch job manager? | Richard Finney | Bioinformatics | 2 | 04-26-2011 02:10 PM
Blast+ question | andreitudor | Bioinformatics | 0 | 03-28-2011 09:14 AM
question on making BLAST db | rdu | Bioinformatics | 4 | 01-13-2011 12:45 AM

01-18-2011, 01:10 PM   #1
SophieP
Junior Member
 
Location: USA

Join Date: Jan 2011
Posts: 2
Hello and BLAST batch question

Hi everybody,

I am new to CLC Genomics and 454 data. I am working on a non-model species (a limpet), so I don't have a reference genome. I did a 454 run on a cDNA library (transcriptome) and successfully trimmed and aligned the sequences. Now I would like to BLAST the contigs against all organisms in NCBI, using blastx or blastn, to find out which genes these contigs correspond to.

I have around 30,000 contigs to BLAST. I would like to know whether I can do that directly with the NCBI BLAST available in CLC Genomics, or whether I have to download RefSeq from NCBI and run a local BLAST. I know that if you submit too many sequences to NCBI as a batch through a software package, you can be "blacklisted" and blocked from using NCBI (it happened to a researcher in my previous lab who didn't know about this). I don't want this to happen. I guess it may depend on the software you use (maybe different programs submit batches in different ways; I am not a bioinformatician). Can you tell me whether this is a problem I may run into if I BLAST directly against NCBI using the "NCBI BLAST link" in CLC Genomics?

If I need to use a local BLAST, can you help me find my way to downloading the nucleotide database for all organisms (RefSeq)? Is it possible to do that on a laptop, or do I need a server?

Thank you in advance for your help!
All the best

Sophie
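For the local-BLAST route asked about above, here is a minimal Python sketch that only assembles a BLAST+ command line (nothing runs unless you call `run_blast`). It assumes BLAST+ is installed and on your PATH, and that a preformatted NCBI database has already been downloaded into the working directory, for example with `update_blastdb.pl --decompress refseq_rna` (`update_blastdb.pl --showall` lists what is available). The helper names and the file names `contigs.fasta` and `contigs_vs_refseq_rna.xml` are hypothetical placeholders.

```python
import shlex
import subprocess
from typing import List

# Hypothetical helpers: build (and optionally run) a local BLAST+ search.
# Assumes BLAST+ is on PATH and the database (e.g. refseq_rna) was downloaded
# beforehand with update_blastdb.pl.


def build_blast_command(program: str, query: str, db: str, out: str,
                        evalue: float = 1e-5, threads: int = 4) -> List[str]:
    """Return an argv list for a local blastn/blastx search with XML output."""
    if program not in ("blastn", "blastx"):
        raise ValueError("program must be 'blastn' or 'blastx'")
    return [
        program,
        "-query", query,        # FASTA file with the contigs
        "-db", db,              # name of the local BLAST database
        "-out", out,
        "-outfmt", "5",         # 5 = XML
        "-evalue", str(evalue),
        "-num_threads", str(threads),
    ]


def run_blast(cmd: List[str]) -> None:
    """Run the search; raises CalledProcessError if BLAST+ exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_blast_command("blastn", "contigs.fasta", "refseq_rna",
                              "contigs_vs_refseq_rna.xml")
    # Print the command so it can be checked before calling run_blast(cmd).
    print(shlex.join(cmd))
```

For blastx the database has to be a protein database (for example a local copy of nr or refseq_protein), because blastx translates the nucleotide query and searches protein sequences.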
01-18-2011, 07:33 PM   #2
Jeremy
Senior Member
 
Location: Pathum Thani, Thailand

Join Date: Nov 2009
Posts: 190

Get yourself Blast2GO. It is very easy to use (no programming required) and does exactly what you want.
http://www.blast2go.org/start_blast2go

However, in the contigs output file, contigs with status=isotig have the preceding contig(s) from the isotig they belong to appended to them, so you need to do some data manipulation. Compare the contig length given in the header with the actual length of the sequence for these contigs and you will see what I mean. Below is a good example: the actual sequence is 521 bp, but the contig is listed as 125 bp, because the previous contig (396 bp) has been appended to the start of this contig due to a programming error (Roche are aware of it).

e.g.
Code:
>contig17281  length=125  numreads=55  gene=isogroup00117  status=isotig
TCCTTCCATgTTGTTTACATGGGGATAAAACCGCCTTGTTTTTtCTAAAGAGGGATGAAa
CCTATgCTCCCTAAAGCgtATGAATCcTGGgcGaCCAAAgTCCAATCcAcAtGGTACAAC
TTTGaCATCTCTTTTTCTgAGTgCATAGTCTATAATaGCTTCATTCTCCGGAAtCATCAC
aGAACAagTTGAGTAgACTACAAaTCCTCCTGATTTGGAaTTAgcGTCcACTAAATCAAT
TGCTgCTAAAATCaGTtGCTTTTgAAgAAAAgCaCAATTTcGTACATCTTCAATGGACTT
GGATGTTTTAATAGATTGTTGATCTGGGCATATAGTCCCACTGCCGGTGCAGGGAGCATC
CAATAATACTCTATCAACAGAATTTAATCCAAGGATcttcggtagctccttcccATcata
gTTCttCAGGTTGTTCTCCCTTGCTCTtGCAATACTGTGCTCCTCcAACCTTTCTcTtCT
TCAAGGCTTTCCTTTTCTCCTCTCTGGCTTGTGAAATTTCC
You would be far better off using the isotigs.fna file, because these sequences are supposed to represent actual mRNAs and therefore have a better chance of matching a protein with blastx. (In many instances, though, you will see a single-base difference between two isotigs from the same isogroup, caused by a 454 sequencing error.)
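If you do want to work with the contigs file, the check described above (header `length=` versus actual sequence length for `status=isotig` contigs) can be automated. This is a minimal Python sketch, not a Roche tool. It assumes headers formatted like the example above and, as in that example, that the extra sequence was prepended, so the contig's own bases are the last `length` bases; confirm this on your own data before trusting the trimmed output. The script name `fix_contigs.py` is hypothetical.

```python
import re
import sys
from typing import Iterable, Iterator, Tuple

LENGTH_RE = re.compile(r"\blength=(\d+)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence) pairs from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fix_record(header: str, seq: str) -> Tuple[str, bool]:
    """Return (sequence, was_trimmed) for one record.

    Only status=isotig records whose sequence is longer than the header's
    length= value are trimmed, keeping the last `length` bases (assumption:
    the previous contig was prepended, as in the example above).
    """
    match = LENGTH_RE.search(header)
    if match is None or "status=isotig" not in header:
        return seq, False
    stated = int(match.group(1))
    if 0 < stated < len(seq):
        return seq[-stated:], True
    return seq, False


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical): python fix_contigs.py contigs.fna > fixed.fna
    with open(sys.argv[1]) as handle:
        for header, seq in read_fasta(handle):
            seq, trimmed = fix_record(header, seq)
            if trimmed:
                print("trimmed", header.split()[0], file=sys.stderr)
            print(">" + header)
            for start in range(0, len(seq), 60):
                print(seq[start:start + 60])
```

Applied to the example above, this would keep the final 125 of the 521 bases. Lowercase bases are passed through unchanged.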
01-20-2011, 10:49 AM   #3
SophieP
Junior Member
 
Location: USA

Join Date: Jan 2011
Posts: 2
Thanks

Thank you, Jeremy, for your reply; that was very helpful.
Blast2GO is now running and doing exactly what I want!

Sophie
02-13-2014, 03:34 AM   #4
swe5191
Junior Member
 
Location: Chennai

Join Date: Dec 2013
Posts: 6

Hi, I would also like to run blastx for my non-model plant species. I assembled about 72,000 reads into 29,059 unigenes. I would like to know whether Blast2GO keeps running while the computer is hibernating or asleep. I would also like to know how to obtain the exact gene function for these unigenes: as a test I tried the first 10 unigenes, and I could annotate and obtain pathway information for only 2 of them. Kindly help me with this.
-Swetha
02-13-2014, 03:45 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,091

A sleeping or hibernating computer is not going to do any computation. You may have to consult multiple sources to annotate new sequences, and even then you may not be able to assign a function to every unigene. You don't say which database you are going to search with blastx, but with a query set of this size you would be better off running it on a proper server or cluster.

BTW: What kind of sequencing is this? 72,000 reads is pretty small for NGS, though a good size for Sanger. 29,000 unigenes from 72,000 reads (fewer than 2.5 reads per unigene on average) does not look very promising.
02-14-2014, 03:09 AM   #6
swe5191
Junior Member
 
Location: Chennai

Join Date: Dec 2013
Posts: 6

I would like to run blastx against the nr database. I'm working on non-model plant transcriptome reads obtained by 454 sequencing. The datasets were downloaded from a database, so I personally don't know much about the sequencing details. After preprocessing and QC, I kept 72,018 of 81,146 reads. I performed a de novo assembly, which produced 29,051 unigenes. In the paper I referred to, the authors obtained 20,000 unique sequences (around 12,000 singletons and 8,000 contigs) from the same number of raw reads, so I thought my assembly was not that bad. Which tools can I use to check the quality of the assembly, and how do I validate my assembly statistics?
02-14-2014, 05:09 AM   #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,091

Swetha: ~30K sequences is going to be a big blastx job to run against the nr database. You will need some sort of cluster if you hope to finish in a reasonable amount of time.
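One common way to use a cluster for a job like this is to split the query file into chunks and run one blastx job per chunk. A minimal Python sketch of the splitting step follows; the chunk count, output directory and `query_001.fasta`-style names are hypothetical, and submitting the jobs depends on your cluster's scheduler, which is not shown. Sequences are assigned longest first to whichever chunk currently holds the fewest residues, a simple greedy heuristic so that chunks receive similar amounts of sequence (a design choice, not a requirement).

```python
import heapq
from pathlib import Path
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA lines into a list of (header, sequence) pairs."""
    records: List[Record] = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_records(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Greedily balance total sequence length across n_chunks chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    heap = [(0, i) for i in range(n_chunks)]  # (residues so far, chunk index)
    heapq.heapify(heap)
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, i = heapq.heappop(heap)  # chunk with the fewest residues
        chunks[i].append((header, seq))
        heapq.heappush(heap, (total + len(seq), i))
    return chunks


def write_chunks(chunks: List[List[Record]], outdir: str,
                 prefix: str = "query") -> List[Path]:
    """Write each chunk to outdir/prefix_NNN.fasta and return the paths."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, chunk in enumerate(chunks, start=1):
        path = out / f"{prefix}_{number:03d}.fasta"
        with path.open("w") as handle:
            for header, seq in chunk:
                handle.write(f">{header}\n")
                for start in range(0, len(seq), 60):
                    handle.write(seq[start:start + 60] + "\n")
        paths.append(path)
    return paths
```

Each resulting file can then be searched by its own blastx job; keep the per-chunk outputs as separate files rather than concatenating XML output by hand.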

I am not sure what you are trying to do: are you just trying to recreate the analysis reported in the paper? If you are only interested in the contigs and want to do something else with that data, consider contacting the authors of the paper to see whether they can share the contig file.

There are threads on this forum with tools for checking assembly quality (search for them).
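As a first sanity check before turning to dedicated assessment tools, you can compute basic length statistics yourself. The Python sketch below (the script and input file names are hypothetical) reports the number of sequences, total, minimum, maximum and mean length, and N50, defined here as the length L such that sequences of length L or longer contain at least half of all assembled bases. These numbers describe contiguity only; they say nothing about whether the sequences are correctly assembled.

```python
import sys
from typing import Dict, Iterable, List


def sequence_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in a FASTA file (given as lines)."""
    lengths: List[int] = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths: List[int]) -> Dict[str, float]:
    """Count, total, min, max, mean and N50 of a list of sequence lengths."""
    if not lengths:
        raise ValueError("no sequences")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, ordered[-1]
    for length in ordered:
        running += length
        if 2 * running >= total:  # reached half of all assembled bases
            n50 = length
            break
    return {
        "count": len(ordered),
        "total": total,
        "min": ordered[-1],
        "max": ordered[0],
        "mean": total / len(ordered),
        "n50": n50,
    }


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical file name): python assembly_stats.py unigenes.fasta
    with open(sys.argv[1]) as handle:
        for key, value in assembly_stats(sequence_lengths(handle)).items():
            print(f"{key}\t{value}")
```

For lengths 2, 3, 4, ..., 10 (total 54 bases), the 10, 9 and 8 bp sequences together reach 27 bases, half of the total, so N50 is 8.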
02-17-2014, 09:59 AM   #8
swe5191
Junior Member
 
Location: Chennai

Join Date: Dec 2013
Posts: 6

Yes, I would like to perform the analysis myself. I'm new to NGS data analysis, so I want to learn all the basic steps, from QC to assembly. I also asked the authors for the supplementary files, but I didn't get any reply. My other question: can we run BLASTX using cloud computing services and import the results into Blast2GO for the further annotation steps? The BLASTX run I performed for 29,501 sequences gave me output in TXT format. How do I import it into Blast2GO for further annotation? I know that B2G can run BLASTX on its own, but the cloud services are much faster at producing BLASTX results, so please suggest a cloud pipeline for annotating de novo assembled unigenes.
02-18-2014, 02:05 AM   #9
yueluo
Member
 
Location: Guangzhou China

Join Date: Aug 2013
Posts: 82

We run both BLAST and Blast2GO locally, but that requires access to a cluster.
I'm not familiar with cloud computing, but I think you can convert the BLAST results from TXT into XML and then feed them to the B2G pipeline.
02-18-2014, 02:32 AM   #10
swe5191
Junior Member
 
Location: Chennai

Join Date: Dec 2013
Posts: 6

Thank you so much for your reply. Are there any online converters from TXT to XML format? When I searched, I only came across XML-to-TXT converters. Also, what do you mean by getting access to a cluster?
02-18-2014, 04:20 AM   #11
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,091

@Swetha: Since your original blastx search finished quickly, perhaps you can re-run it and this time save the output as XML (-outfmt 5).
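Before importing the re-run output into Blast2GO, it can help to confirm that the XML parses and to see how many queries got hits. A minimal Python sketch using only the standard library follows; it assumes the usual BLAST+ XML element names (`Iteration`, `Iteration_query-def`, `Hit`, `Hit_def`, `Hsp`, `Hsp_evalue`), and the script and file names are hypothetical. It streams the file with `iterparse` and clears each parsed `Iteration` to keep memory use modest on large outputs.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

BestHit = Optional[Tuple[str, float]]  # (hit description, E-value) or None


def best_hits(source) -> Dict[str, BestHit]:
    """Map each query in a BLAST XML (-outfmt 5) file to its lowest-E-value hit."""
    results: Dict[str, BestHit] = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "Iteration":
            continue
        query = elem.findtext("Iteration_query-def", default="").strip()
        best: BestHit = None
        for hit in elem.iter("Hit"):
            description = hit.findtext("Hit_def", default="")
            for hsp in hit.iter("Hsp"):
                evalue = float(hsp.findtext("Hsp_evalue", default="inf"))
                if best is None or evalue < best[1]:
                    best = (description, evalue)
        results[query] = best
        elem.clear()  # drop the parsed Iteration's children to save memory
    return results


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical file name): python best_hits.py blastx_nr.xml
    hits = best_hits(sys.argv[1])
    with_hits = sum(1 for value in hits.values() if value is not None)
    print(f"{with_hits} of {len(hits)} queries have at least one hit")
    for query, value in hits.items():
        if value is not None:
            print(f"{query}\t{value[1]:.2e}\t{value[0]}")
```

This only inspects the file; the annotation itself is still done by loading the XML into Blast2GO.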

Tags
454, blast, ncbi





All times are GMT -8.

