SEQanswers

Old 06-18-2012, 07:29 AM   #1
Arupsss
Member
 
Location: Trento, Italy

Join Date: May 2011
Posts: 44
Full Genomic Database and corresponding chromosomal databases

I am running some experiments with Bowtie and Q-Pick. However, Bowtie works with the full human genomic database, while Q-Pick works with the corresponding per-chromosome databases (chromosome 1, 2, 3, ... 23). From here, I found the full human genome database for hg19, which contains 23 chromosome files, one per chromosome (the chromFa.tar.gz archive). What I can't work out is this: if I concatenate all 23 files into a single file (say, using the cat command) and give that to Bowtie as input, is that acceptable? In other words, are the concatenated chromosome files equal to the full genomic database? More specifically, each chromosome starts with chr(chromosome number)>. Should I keep those tags while concatenating, or remove them?

Last edited by Arupsss; 06-18-2012 at 07:31 AM.
Old 06-18-2012, 07:40 AM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

I am not sure I understand your question. Bowtie can work with an entire genome, with individual chromosomes, or with parts of chromosomes. So there is no need to have one large file, and there are plenty of reasons not to (e.g., ease of manipulation, ease of visualization, etc.). However, if you do wish to concatenate all of the chromosomes into one large genome file, then leave the '>' part in place. Good luck with your analysis.
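
As a rough illustration of the "no need for one large file" point: bowtie-build takes its reference input as a comma-separated list of FASTA files, so the per-chromosome files can be passed as they are. The sketch below only assembles and prints the command; it does not run Bowtie. The file names and the index name `my_index` are made up for the example, and it assumes bowtie-build is on your PATH and that no file name contains a comma.

```python
def bowtie_build_command(fasta_paths, index_base):
    """Return the argument list for building one Bowtie index from several FASTA files."""
    if not fasta_paths:
        raise ValueError("need at least one FASTA file")
    # bowtie-build expects the reference files as ONE comma-separated argument,
    # followed by the base name for the index files it writes.
    return ["bowtie-build", ",".join(fasta_paths), index_base]


# Hypothetical file names and index name, for illustration only.
cmd = bowtie_build_command(["chr1.fa", "chr2.fa"], "my_index")
print(" ".join(cmd))  # bowtie-build chr1.fa,chr2.fa my_index
```

To actually build the index, the same list could be handed to `subprocess.run(cmd, check=True)`.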
Old 06-18-2012, 08:01 AM   #3
Arupsss
Member
 
Location: Trento, Italy

Join Date: May 2011
Posts: 44
Thanks a lot. So, while concatenating, suppose I have chr1>NN..AG..NN and chr2>NN..GC...NN. Should I remove the '>', so that the output is chr1NN..AG..NNchr2NN..GC...NN, and then give the concatenated file to Bowtie as input? Am I correct?
Old 06-18-2012, 08:17 AM   #4
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Your files should look something like:

>chr1
NN..AG..NN

And the next file should look like:

>chr2
NN..GC..NN

When you cat these files together leave in the '>' part to get a large file that looks like:


>chr1
NN..AG..NN
>chr2
NN..GC..NN

Unless I am misunderstanding your question, this is simple FastA format manipulation.
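
For anyone who prefers a script to cat, here is a minimal sketch of the same concatenation in Python. It copies every line unchanged, so each '>chrN' header line survives, and it adds a newline if a file lacks a final one so that the next header cannot be glued onto the previous sequence. The function name `concat_fasta` and the tiny demo files are invented for the example.

```python
import os
import tempfile


def concat_fasta(paths, out_path):
    """Concatenate FASTA files into one multi-FASTA file, keeping every '>' header."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as src:
                first = src.readline()
                if not first.startswith(">"):
                    raise ValueError(f"{path} does not start with a '>' header line")
                out.write(first)
                last = first
                # Stream line by line; chromosome files can be hundreds of MB.
                for line in src:
                    out.write(line)
                    last = line
            # A file without a trailing newline would otherwise merge its last
            # sequence line with the next file's '>' header.
            if not last.endswith("\n"):
                out.write("\n")


# Demo with two tiny stand-in files (real chromosome files are far larger).
with tempfile.TemporaryDirectory() as tmp:
    chr1 = os.path.join(tmp, "chr1.fa")
    chr2 = os.path.join(tmp, "chr2.fa")
    with open(chr1, "w") as f:
        f.write(">chr1\nNNAGNN\n")
    with open(chr2, "w") as f:
        f.write(">chr2\nNNGCNN")  # deliberately no trailing newline
    genome = os.path.join(tmp, "genome.fa")
    concat_fasta([chr1, chr2], genome)
    with open(genome) as f:
        result = f.read()

print(result)
```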
Old 06-18-2012, 08:26 AM   #5
Arupsss
Member
 
Location: Trento, Italy

Join Date: May 2011
Posts: 44
Yes, I am trying to do that simple FastA format manipulation so that I can give Bowtie a single input file. However, does "'>' part" mean only ">" or ">chr2>"? In the large-file example above you simply cat the files; no part is dropped.
Old 06-19-2012, 08:12 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,783

Save yourself a significant amount of effort and just download the pre-built bowtie indexes for hg19 from here: ftp://ftp.cbcb.umd.edu/pub/data/bowt.../hg19.ebwt.zip
Old 06-20-2012, 10:24 AM   #7
Arupsss
Member
 
Location: Trento, Italy

Join Date: May 2011
Posts: 44
Thanks a lot. However, I have many chromosomal sequences (not only for human, or hg19/hg18), and I have to do this for all of them. I don't think I can get prebuilt indexes for all of them. Another point is that in some cases I have to include or exclude the sex chromosome sequences.
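
One way to handle the include/exclude requirement is to choose the chromosome names first and only then concatenate or index the matching files. The sketch below is just that selection step, under the assumption that sequences are named in the UCSC style (chr1, chr2, ..., chrX, chrY); the helper names are made up. It also sorts the names so that repeated runs give the same order.

```python
def chrom_sort_key(name):
    """Order numbered chromosomes numerically, then named ones (chrX, chrY, ...) alphabetically."""
    label = name[3:] if name.startswith("chr") else name
    if label.isdigit():
        return (0, int(label), "")
    return (1, 0, label)


def select_chromosomes(names, exclude=()):
    """Return the chromosome names to use, minus any listed in `exclude`."""
    skipped = set(exclude)
    return sorted((n for n in names if n not in skipped), key=chrom_sort_key)


# Hypothetical list of available sequences.
available = ["chr10", "chrX", "chr2", "chrY", "chr1"]

with_sex = select_chromosomes(available)
autosomes_only = select_chromosomes(available, exclude=("chrX", "chrY"))

print(with_sex)        # ['chr1', 'chr2', 'chr10', 'chrX', 'chrY']
print(autosomes_only)  # ['chr1', 'chr2', 'chr10']
```

Other genomes may use different naming schemes, so the exclude list would need to be adjusted per organism.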
Old 06-20-2012, 11:01 AM   #8
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,783

I guess you are trying to do much of this on Windows. It may be time to put some effort into using a Unix distro. There are several Unix distributions that you can try. You may want to experiment with "Bio-Linux", which has a lot of pre-built bioinformatics apps (http://nebc.nerc.ac.uk/tools/bio-linux/bio-linux-6.0).

You are bound to run into some issue (sooner rather than later) when trying to do this type of analysis on Windows (editing/handling large files is one thing that comes to mind).

A simple Unix command like "cat file1 file2 file3 > final.fa" would achieve what you were asking about in the original question.
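
After concatenating, a quick sanity check is to list the sequence names in the combined file and confirm there is one per chromosome. A minimal sketch, assuming the name is the first whitespace-delimited token after '>' on each header line; `fasta_headers` is a made-up helper and the demo file stands in for a real final.fa.

```python
import os
import tempfile


def fasta_headers(path):
    """Return the sequence names found on the '>' header lines of a FASTA file."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                # An empty header line yields an empty name rather than an error.
                names.append(fields[0] if fields else "")
    return names


# Demo on a tiny stand-in for the concatenated file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "final.fa")
    with open(path, "w") as f:
        f.write(">chr1\nNNAGNN\n>chr2\nNNGCNN\n")
    headers = fasta_headers(path)

print(headers)  # ['chr1', 'chr2']
```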


Tags
bowtie, bowtie alignment illumina, short read alignment

All times are GMT -8.

