SEQanswers

12-17-2014, 08:15 AM   #1
marlaux
Junior Member
 
Location: Brazil

Join Date: Oct 2014
Posts: 3
Help with custom database of a genus and count of hits (NGS, 454)

Hello biologists and bioinformaticians around the world!

I'm having serious trouble accomplishing a task.

I need to run a megablast search with metagenomic reads (454) in order to determine the number of hits and extract the reads that matched the reference sequences of a specific genus.

After that, I need to compare the results with Bowtie2.

The first goal is to choose between megablast and Bowtie2 based on the number of hits.

The second goal is to remove the sequences that belong to this genus from the original set of reads, in order to analyze the community with and without it.
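The two goals above can be sketched in Python. This is only a minimal illustration, assuming the search results are available as tab-separated output with the read ID in the first column (as in BLAST+ `-outfmt 6`) and that the reads are in FASTA format. The function names `hit_read_ids` and `split_reads` are hypothetical and not part of any BLAST or Bowtie2 tool.

```python
from collections import Counter


def hit_read_ids(blast_tab_lines):
    """Count hits per read from tabular BLAST output.

    Assumption: tab-separated lines whose first column is the query (read) ID.
    """
    counts = Counter()
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        counts[line.split("\t")[0]] += 1
    return counts


def split_reads(fasta_lines, matched_ids):
    """Split FASTA lines into (matched, unmatched) lists by read ID.

    The read ID is taken as the first whitespace-delimited token after '>'.
    """
    matched, unmatched = [], []
    target = None
    for raw in fasta_lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            read_id = fields[0] if fields else ""
            target = matched if read_id in matched_ids else unmatched
        if target is not None and line:  # drop blank lines
            target.append(line)
    return matched, unmatched


# Toy data, invented purely for illustration.
blast = ["read1\tgi|1|\t99.0", "read1\tgi|2|\t98.0", "read3\tgi|1|\t97.5"]
fasta = [">read1 desc", "ACGT", ">read2", "GGCC", ">read3", "TTAA"]

hits = hit_read_ids(blast)
matched, unmatched = split_reads(fasta, set(hits))
print(len(hits), "reads with at least one hit")
```

The `unmatched` list is the read set without the genus, and `len(hits)` is a number that could be compared against the count of reads aligned by Bowtie2.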

First of all, I tried to build my custom database for megablast.

I downloaded from NCBI the FASTA sequences and the list of GIs for my genus.

The next step was to try running formatdb for custom databases. My commands were:

1. formatdb -i nt -o T -p F ---> format the original database with parsing active --> it's OK

2. formatdb -F microcystis_gis.txt -B mic_gis.gi --> format my GI list to binary format --> it's OK

3. formatdb -i nt -p F -L mic_gi -F mic_gis.gi -t my_mic_db --> format database generating an alias for my GI list

Here I had no success; the error was: [formatdb] FATAL ERROR: Unable to find mic_gis.gi

4. formatdb -i microcystis_fasta.fasta -p F -n mic_db --> no success again

Then I tried fastacmd:

5. fastacmd -d nt -p F -i microcystis_gis.txt -D 1 -o microcystis.fasta ---> after a long processing time, the result was a file with all the sequences; the GI filtering was not applied.
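As an alternative to relying on the tool for the filtering, the GI selection could be done directly on a FASTA file. The sketch below is an assumption-laden illustration, not a replacement for fastacmd: it assumes headers of the form `>gi|NUMBER|...` and a GI list with one number per line, and the function name `filter_by_gi` is hypothetical.

```python
import re

# Matches NCBI-style headers that start with ">gi|<digits>|".
GI_PATTERN = re.compile(r"^>gi\|(\d+)\|")


def filter_by_gi(fasta_lines, wanted_gis):
    """Keep only the records whose header GI is in wanted_gis (a set of strings)."""
    keep = False
    kept = []
    for raw in fasta_lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            match = GI_PATTERN.match(line)
            keep = bool(match) and match.group(1) in wanted_gis
        if keep and line:  # blank lines are dropped
            kept.append(line)
    return kept


# Toy records, invented for illustration.
records = [
    ">gi|111|gb|AAA.1| first record",
    "ACGTACGT",
    "",
    ">gi|222|gb|BBB.1| second record",
    "GGCCGGCC",
]
print(filter_by_gi(records, {"222"}))
```

With real files, `fasta_lines` could be an open file handle and `wanted_gis` a set built from the stripped lines of the GI list.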

Then I tried makeblastdb:

6. makeblastdb -in microcystis_fasta.fasta -dbtype 'nucl' –parse_seqids -title microcystis_db

and the error was: Error: Too many positional arguments (1), the offending value: –parse_seqids
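One thing worth checking in the command as it is shown here: the character in front of parse_seqids looks like an en dash (U+2013) rather than an ASCII hyphen, which would be consistent with the option being reported as a positional argument. It may also be an artifact of pasting the command into the forum. A small Python check for such characters, with a hypothetical helper name `find_unicode_dashes`, could look like this:

```python
# Dash-like characters that are easy to confuse with the ASCII hyphen "-".
UNICODE_DASHES = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2212"}


def find_unicode_dashes(command):
    """Return (index, character) pairs for every non-ASCII dash in the string."""
    return [(i, ch) for i, ch in enumerate(command) if ch in UNICODE_DASHES]


# The command from step 6, with the dash written as an explicit escape.
cmd = ("makeblastdb -in microcystis_fasta.fasta -dbtype 'nucl' "
       "\u2013parse_seqids -title microcystis_db")

for index, ch in find_unicode_dashes(cmd):
    print(f"non-ASCII dash U+{ord(ch):04X} at position {index}")
```

If the check reports a position, retyping that dash by hand in the terminal is the simplest fix to try.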

I think my problem may be the format of my input files. Maybe the header in my microcystis_fasta.fasta isn't correct; here it is:

>gi|469475955|gb|KC166868.1| Shewanella putrefaciens strain DCH-5 16S ribosomal RNA gene, partial sequence
GTTACCTACAGAAGAAGGACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTCCGAGCGTTA
ATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTTGTTAAGCGAGATGTGAAAGCCCTGGGCTC......

and this file has an empty line between consecutive sequences.

My GI list and my FASTA file are the ones downloaded from NCBI, so I don't understand how they can be in the wrong format. Maybe it is necessary to change these files in some way? I can do this with a Perl script, but I was wondering if you could suggest something more effective first.

Building my index for Bowtie2 worked, but with this warning:

Warning: Encountered empty reference sequence

But, as usual, Bowtie2 ran fine.

I think this warning is an indication that something is wrong with the format of my reference FASTA file.
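To test that suspicion, the reference file could be checked for blank lines and for headers that have no sequence under them. The following Python sketch does both; the function name `clean_fasta` is hypothetical, and whether blank lines are really what triggers the Bowtie2 warning is an assumption that this thread does not confirm.

```python
def clean_fasta(lines):
    """Drop blank lines and report headers that are followed by no sequence.

    Returns (cleaned_lines, empty_headers). Empty records are only reported;
    their headers are still present in cleaned_lines.
    """
    cleaned = []
    empty_headers = []
    header = None
    seq_len = 0
    for raw in lines:
        line = raw.strip()
        if not line:  # skip blank lines between records
            continue
        if line.startswith(">"):
            if header is not None and seq_len == 0:
                empty_headers.append(header)
            header, seq_len = line, 0
        else:
            seq_len += len(line)
        cleaned.append(line)
    # The last record needs the same check once the input ends.
    if header is not None and seq_len == 0:
        empty_headers.append(header)
    return cleaned, empty_headers


# Toy input, invented for illustration.
example = [">a", "ACGT", "", ">b", "", ">c", "GG"]
cleaned, empty = clean_fasta(example)
print(cleaned)
print(empty)
```

Rebuilding the index from the cleaned file and seeing whether the warning disappears would show whether the blank lines or genuinely empty records are responsible.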

Thank you everyone, Brazilian greetings!!
07-29-2015, 08:19 PM   #2
Anna2015
Junior Member
 
Location: CO

Join Date: Jul 2015
Posts: 1

Hi Marlaux,

I am having a similar issue. I am trying to align synthetic sequences to a library of sequences. I made a FASTA file, and just like you I have a blank line between sequences. When I index, I get the message 'Warning: encountered empty reference sequence'. But Bowtie still works when I use this index to map to my reference.

I was wondering if you figured out what this error message means, and what the solution to this is. Any input from you is appreciated.

Thanks in advance!
07-30-2015, 01:48 PM   #3
marlaux
Junior Member
 
Location: Brazil

Join Date: Oct 2014
Posts: 3

Hello Anna, take a look here:

https://www.biostars.org/p/124159/#124276

Good luck!