SEQanswers

Old 04-28-2015, 10:12 AM   #1
ashkot
Member
 
Location: Cupertino, CA

Join Date: Nov 2011
Posts: 59
Reference FASTA file for sequencing use

Hi all,
I downloaded FASTA files from NCBI and tried to use them in my sequencing pipeline. The problem is that the sequences in the FASTA file are named by their GenBank accessions, so the GenBank accessions also appeared in the VCF file, which is undesirable.

Is there a way to prepare a FASTA file so that the chromosome number is used instead? Or, more generally, is there a way to "prepare" a FASTA file for sequencing use?

Thanks in advance.
Old 04-28-2015, 10:29 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080

What "genome" is this?

You can get pre-formatted sequence, annotation and index files for a number of common organisms at the iGenomes site: http://support.illumina.com/sequenci...e/igenome.html
Old 04-28-2015, 10:44 AM   #3
ashkot
Member
 
Location: Cupertino, CA

Join Date: Nov 2011
Posts: 59

That site only goes back as far as hg18 for human. What if I want to use much older genome assemblies? My work is human genome analysis.
Old 04-28-2015, 11:05 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080

I would suggest that you get older data from UCSC: http://hgdownload.soe.ucsc.edu/downloads.html#human (look for the "chromFa.zip" files in the "Full Dataset" links).
Old 04-28-2015, 12:39 PM   #5
ashkot
Member
 
Location: Cupertino, CA

Join Date: Nov 2011
Posts: 59

We did take those files, and there is no issue with them. The issue is with some other old genomes that we also require.

Is there some code that can standardize sequence names across FASTA files?
Old 04-28-2015, 06:17 PM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080

You could manually change the FASTA headers to suit your purposes and remake the indexes.
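
For more than a handful of sequences, the header change can be scripted rather than done by hand. Below is a minimal sketch in Python, not a definitive tool. It assumes the sequence ID is the first whitespace-delimited token on each header line, and that you supply your own accession-to-chromosome table. The function name `rename_fasta_headers` and the accessions in `name_map` are hypothetical placeholders for illustration, not real GenBank accessions.

```python
import io


def rename_fasta_headers(lines, name_map):
    """Yield FASTA lines with header IDs replaced according to name_map."""
    for line in lines:
        if line.startswith(">"):
            # The sequence ID is taken to be the first whitespace-delimited
            # token after '>' (an assumption about the header layout).
            fields = line[1:].split(None, 1)
            seq_id = fields[0] if fields else ""
            # IDs missing from name_map are kept unchanged; the rest of the
            # header (the free-text description) is dropped.
            yield ">" + name_map.get(seq_id, seq_id) + "\n"
        else:
            # Sequence lines pass through untouched.
            yield line


# Hypothetical mapping: replace these placeholders with the accessions
# actually present in your FASTA file.
name_map = {"ACC000001.1": "chr1", "ACC000002.1": "chr2"}

# Small in-memory example standing in for a real FASTA file.
example = io.StringIO(
    ">ACC000001.1 Homo sapiens chromosome 1\n"
    "ACGT\n"
    "NNNN\n"
    ">ACC000002.1 Homo sapiens chromosome 2\n"
    "GGCC\n"
)
renamed = "".join(rename_fasta_headers(example, name_map))
print(renamed)
```

For a real file, the same generator can be fed an open input handle and written out with `writelines` on an output handle. After renaming, every index built from the old FASTA (for example the `.fai` file from `samtools faidx`, and any aligner index) has to be rebuilt, and the reads realigned, before the new names show up in the VCF.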