#1 |
Junior Member
Location: Israel Join Date: Nov 2011
Posts: 8
I am trying to call variants on some BAM files I received, but GATK requires the exact reference database used for the alignment.
Apparently it is GRCh37. Any idea how I can download it? I have downloaded a file called homo-cre-GRCh37.zip containing a bunch of homo-##.#.ebwt files; does this help me in any way? UnifiedGenotyper needs a FASTA, afaik.
Thanks,
Moty
#2 |
Senior Member
Location: Phoenix, AZ Join Date: Mar 2010
Posts: 279
The GATK FTP site has all of the files you will need; just look in the bundle folders.
#3 |
Member
Location: kolkata Join Date: Oct 2011
Posts: 32
You can download the GRCh37 files from the UCSC browser or the Broad Institute FTP sites.
#4 | |
Junior Member
Location: Israel Join Date: Nov 2011
Posts: 8
Thank you very much for your help. I did manage to get that reference, but apparently that wasn't enough.
I know this question has been asked before, but I never found a solution that helped me. I am getting this error:
Quote:
#5 |
Member
Location: Maryland Join Date: Apr 2010
Posts: 31
#6 | |
Junior Member
Location: Israel Join Date: Nov 2011
Posts: 8
I've tried that one, this time I get:
Quote:
#7 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
Yes, manually changing "1" --> "chr1" and so on will solve your problem. Writing a script is an even better way. You might even strip the GL0* etc. entries and keep only chr1-22,X,Y,M to keep things simple (since your reads were only aligned to chr1-22,X,Y,M).
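A minimal sketch of such a renaming script, in Python. It assumes the downloaded FASTA uses Ensembl-style names (1-22, X, Y, MT) and that the BAM uses chr1-22, chrX, chrY, chrM; the function names and file names are made up for illustration. Anything that does not map to one of those names (the GL0* entries, for example) is dropped, and header descriptions are discarded.

```python
# Names to keep after renaming (assumption: the BAM was aligned to exactly these).
KEEP = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}


def ucsc_name(name):
    """Map an Ensembl-style sequence name (1, X, MT) to a chr-prefixed one."""
    if name.startswith("chr"):
        return name
    if name == "MT":
        return "chrM"
    return "chr" + name


def rename_fasta(lines, keep=KEEP):
    """Yield FASTA lines with renamed headers, skipping records not in `keep`."""
    writing = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = ucsc_name(fields[0]) if fields else ""
            writing = name in keep
            if writing:
                # Only the name is kept; any description after it is dropped.
                yield ">" + name + "\n"
        elif writing:
            yield line


# Hypothetical usage (file names are placeholders):
# with open("in.fa") as src, open("out.fa", "w") as dst:
#     dst.writelines(rename_fasta(src))
```

After rewriting the FASTA, its index and sequence dictionary would need to be rebuilt as well, since both record the sequence names.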
#8 | |
Junior Member
Location: Israel Join Date: Nov 2011
Posts: 8
Well, I've done that but, sadly, now it says:
Quote:
Has anyone ever had this?
#9 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
This is the "new chrM plus" GRCh37 problem. I'm sure others have more Germanic, shorter-syllable descriptions for this. Many hours have been spent dealing with this important "forking" of the data.
Basically, there are two chrMs in common usage for hg19/GRCh37 analysis. You can delete chrM from your analysis or get the right version for your data. I did a bl2seq on the two chrMs and there wasn't much difference: one had 3 inserts and the other 1, for a difference of 2 (which you see in the file size difference). See ftp://ftp.sanger.ac.uk/pub/1000genom...ference/README and note the comments on NC_012920. I hope when GRCh38/hg20 comes out everybody just sticks with the snapshot.
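One way to see which chrM your data was aligned to is to compare the sequence lengths recorded in the BAM header with those in the FASTA index. The sketch below is only an illustration: the function names are invented, it works on text you have already dumped (the `@SQ` header lines and the contents of a `.fai` file), and the lengths in the example are illustrative values that differ by 2.

```python
def sq_lengths(header_text):
    """Return {name: length} from the @SQ lines of a SAM/BAM header dump."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in tags and "LN" in tags:
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def fai_lengths(fai_text):
    """Return {name: length} from a .fai index (name<TAB>length<TAB>...)."""
    lengths = {}
    for line in fai_text.splitlines():
        cols = line.split("\t")
        if len(cols) >= 2:
            lengths[cols[0]] = int(cols[1])
    return lengths


def length_mismatches(bam, ref):
    """List (name, bam_length, ref_length) for shared names whose lengths differ."""
    return [(n, bam[n], ref[n]) for n in bam if n in ref and bam[n] != ref[n]]


# Illustrative input: a header and an index that disagree only on chrM.
header = "@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chrM\tLN:16571\n"
index = "chr1\t249250621\t6\t50\t51\nchrM\t16569\t100\t50\t51\n"
print(length_mismatches(sq_lengths(header), fai_lengths(index)))
```

If chrM is the only name reported, you are most likely looking at the two-chrM situation described above rather than a completely different assembly.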
#10 |
Senior Member
Location: Boston Join Date: Feb 2008
Posts: 693
For human alignment, the 1000g phase 1 reference is the most widely used, by nearly all the human projects involving Sanger, Broad and UMich. It is available from the 1000g website, the GATK bundle and the Sanger FTP that others have pointed out. If possible, try to use that. It is not so trivial to build the right reference genome, though for most this has little practical effect.
#11 | |
Junior Member
Location: Israel Join Date: Nov 2011
Posts: 8
I have no need for the chrM reads at all, so removing them might be a good option.
I tried using
Quote:
#12 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
Edit the chrM entry out of the BAM file header.
Dump the header using "samtools view -H", use a text editor to delete the chrM line, then use "samtools reheader".
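If you would rather script the text-editor step, something like the following could do it. This is a sketch, not a tested pipeline: `drop_sq` is a made-up helper, the file names in the comments are placeholders, and it only touches the header text.

```python
def drop_sq(header_text, name="chrM"):
    """Return the SAM header text without the @SQ line whose SN tag is `name`."""
    kept = []
    for line in header_text.splitlines(keepends=True):
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "@SQ" and ("SN:" + name) in fields[1:]:
            continue  # skip the entry for the unwanted sequence
        kept.append(line)
    return "".join(kept)


# Surrounding steps, with placeholder file names:
#   samtools view -H in.bam > header.sam
#   (apply drop_sq to the contents of header.sam and save as new_header.sam)
#   samtools reheader new_header.sam in.bam > out.bam
```

Keep in mind that reheader replaces only the header and leaves the alignment records as they are, so it is worth checking the output carefully, especially if chrM is not the last entry in the header.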
#13 |
Senior Member
Location: Phoenix, AZ Join Date: Mar 2010
Posts: 279
If you remove the mitochondrial sequence from the FASTA file, you run the risk that mitochondrial reads align erroneously to the nuclear genome. As Ih3 said, most people use the 1000 genomes version, including the supercontigs, for the same reason you don't want to delete the mitochondrial genome. Look at the 1000 genomes decoy documentation for a full list of reasons you want the most comprehensive FASTA file possible.
#14 |
Senior Member
Location: Boston Join Date: Feb 2008
Posts: 693
Just to clarify: the decoy reference (I made it) is used by phase 2. The phase 1 reference genome is the most widely used at present.
Tags |
database, download, gatk, grch37, hg19 |