
**colindaven** · 10-26-2011, 04:19 AM

Hi,

For md5, google "md5 sum".
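For anyone who would rather check a download from a script than from the command line, here is a minimal sketch using Python's standard hashlib module. The function name md5sum and the chunk size are my own choices for illustration, not something from the download site.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Reading in chunks keeps memory use flat even for multi-gigabyte
    genome files. The 1 MiB chunk size is an arbitrary choice.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the returned string with the checksum published next to the file; if they differ, the download is incomplete or corrupted.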

The human genome should be around 3 to 3.2 Gb, depending, as you say, on whether you include the extra contigs.

You're partially right; human_g1k_v37.fasta.gz from this source seems correct to me.

A .fai file is a FASTA index, which can be generated with samtools (samtools faidx).
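To show what such an index holds, here is a rough Python sketch that computes the five columns of a .fai record (name, length, byte offset of the first base, bases per line, bytes per line) for an uncompressed FASTA file. It is only an illustration of the idea, not a replacement for samtools: the function name build_fai is hypothetical, and unlike samtools it does not check that every line of a sequence has the same length.

```python
def build_fai(fasta_path):
    """Return .fai-style rows: (name, length, offset, linebases, linewidth).

    Assumes an uncompressed FASTA file in which every header has a name.
    """
    rows = []
    current = None  # [name, length, offset, linebases, linewidth]
    position = 0    # byte offset of the line currently being read
    with open(fasta_path, "rb") as handle:
        for raw in handle:
            if raw.startswith(b">"):
                if current is not None:
                    rows.append(tuple(current))
                # The name is the first whitespace-delimited token.
                name = raw[1:].split()[0].decode()
                # Sequence data starts right after the header line.
                current = [name, 0, position + len(raw), 0, 0]
            elif current is not None:
                bases = len(raw.rstrip(b"\r\n"))
                if current[3] == 0 and bases > 0:
                    current[3] = bases      # bases on the first line
                    current[4] = len(raw)   # same line including newline
                current[1] += bases
            position += len(raw)
    if current is not None:
        rows.append(tuple(current))
    return rows
```

With these numbers a tool can seek straight to any base of any sequence without reading the whole file, which is why the index only works on the exact file it was built from.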

Most people seem to build a complete genome from the individual contigs.

See the first post in

Exome sequencing analysis manual - SEQanswers

http://seqanswers.com/forums/showthread.php?t=14038&highlight=ulz_peter

for a nice manual on how to build your own human genome with "cat".

**rskr** · 10-26-2011, 08:25 AM

Not a trivial question. It depends on what you want to do with it. Many people simply can't deal with highly variable regions such as the HLA region on chromosome 6 or the VDJ regions, so they choose to ignore them. That is a bit sad, because most people working with the human genome are in medicine and should be very interested in HLA, as it is crucial to how the immune system functions.

**vcguy** · 10-28-2011, 05:39 AM

The reference genomes for human, mouse and zebrafish are improved, maintained and released by the Genome Reference Consortium (GRC).

Human Genome Overview - Genome Reference Consortium

http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/index.shtml

The last major release was GRCh37, which is what you see in most of the browsers. Since that release, however, there have been regional fixes in the form of "patches", so the latest assembly is GRCh37.p5. You can download the latest data from the above website. Other information, including problematic regions and fixes, is also displayed on the website.

Hope that helps.

**rosa_dentellare** · 01-09-2013, 05:33 PM

Hi,

Need help from the sequencing community.

I've downloaded all of the GRCh37 assembled reference from ftp://ftp.ncbi.nlm.nih.gov/genbank/g...mosomes/FASTA/.

But what I got was 48 files covering the individual chromosomes. I was thinking of merging all the files together, but there are two types of file for each chromosome:
1) chr*.fa.gz
2) chr*.rm.out.gz

Would it be OK to merge them together with the RepeatMasker output (.rm.out.gz) files to build my reference?

Also, does anyone know how to mask out the PAR from the reference?

**dpryan** · 01-10-2013, 01:38 AM

I expect merging the regular fasta files with the repeat masked files is not what you want to do, at least if you plan to use the resulting file for mapping or anything else that's standard. Just concatenate the various chr*.fa.gz files together.
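If you would rather do the concatenation from Python than with cat, something along these lines should work. It is a sketch under a few assumptions of mine: the files are named like chr1.fa.gz, chrX.fa.gz and so on, each one ends with a newline, and you want one uncompressed FASTA as output. The function names are made up for the example. The glob pattern only matches files ending in .fa.gz, so the .rm.out.gz files are left out.

```python
import gzip
import re
import shutil
from pathlib import Path


def chromosome_key(path):
    """Sort numbered chromosomes numerically, then other names alphabetically."""
    label = re.sub(r"^chr", "", Path(path).name.split(".")[0])
    return (0, int(label), "") if label.isdigit() else (1, 0, label)


def concatenate_fasta(directory, output_path, pattern="chr*.fa.gz"):
    """Decompress every matching file into one plain FASTA file.

    Returns the input file names in the order they were written.
    """
    inputs = sorted(Path(directory).glob(pattern), key=chromosome_key)
    with open(output_path, "wb") as merged:
        for path in inputs:
            with gzip.open(path, "rb") as source:
                # Stream the decompressed bytes; nothing is held in memory.
                shutil.copyfileobj(source, merged)
    return [p.name for p in inputs]
```

The explicit sort matters because a plain alphabetical listing would put chr10 before chr2, and many downstream tools expect the same chromosome order in every file they are given.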

**rosa_dentellare** · 01-10-2013, 01:52 AM

Thanks for the input, dpryan. Appreciate it.

I'm a bit confused. What are the *.rm.out.gz files for, if I may ask?

**dpryan** · 01-10-2013, 06:31 AM

They're the output from RepeatMasker, saying which regions are repeats and of what type (LINEs, SINEs, LTRs, etc.). They aren't FASTA files.
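If you want to look inside one, here is a small Python sketch that reads such a file and counts repeats per class. It assumes the usual whitespace-separated RepeatMasker .out layout (score, three percentage columns, query sequence, begin, end, left, strand, repeat name, class/family, ...); that layout is my assumption about these particular files, so check a few lines by eye first. The function names are invented for the example.

```python
import gzip
from collections import Counter


def read_repeats(path):
    """Yield (sequence, start, end, repeat_name, repeat_class) per annotation."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            # Header and blank lines do not start with an integer score.
            if len(fields) < 11 or not fields[0].isdigit():
                continue
            yield fields[4], int(fields[5]), int(fields[6]), fields[9], fields[10]


def count_by_class(path):
    """Count annotated repeats per top-level class (e.g. LINE, SINE, LTR)."""
    return Counter(rec[4].split("/")[0] for rec in read_repeats(path))
```

So the file is a table of coordinates and labels that refers to the sequence in the matching .fa.gz file; it contains no sequence of its own.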

**rosa_dentellare** · 01-10-2013, 12:05 PM

OK, got it now. Thank you dpryan =) You've been a help.

**rosa_dentellare** · 01-10-2013, 12:08 PM

Oh, another question came to mind. How do I remove the PAR from the reference? Or has it already been removed from the .fa files?
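For what it is worth, "removing" the PAR usually means hard-masking it, that is, overwriting the bases with N so that reads cannot map there. Here is a minimal Python sketch of that step. The function name is hypothetical, and the coordinates in the usage comment are the commonly quoted GRCh37 chrY PAR intervals as I remember them, so treat them as an assumption and verify them against the GRC website before relying on them.

```python
def mask_regions(sequence, regions, mask_char="N"):
    """Return sequence with each 1-based inclusive (start, end) region masked.

    Regions that run past the end of the sequence are clipped.
    """
    masked = bytearray(sequence, "ascii")
    fill = mask_char.encode("ascii")
    for start, end in regions:
        if start < 1 or end < start:
            raise ValueError(f"bad region: {start}-{end}")
        lo = start - 1                 # convert to 0-based slice start
        hi = min(end, len(masked))     # clip to the sequence length
        if lo < hi:
            masked[lo:hi] = fill * (hi - lo)
    return masked.decode("ascii")


# Usage sketch (coordinates are an assumption, to be verified):
# chr_y_masked = mask_regions(chr_y, [(10001, 2649520), (59034050, 59373566)])
```

To check whether a given chrY file is already masked, look at the bases in those intervals: if they are all N, the masking has been done for you.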


Human Reference Genome