SEQanswers

Old 04-01-2015, 10:09 AM   #1
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367
Alignment on hg19 or hg38 for exome-seq data

Hi,

I've just received the sequencing results from 6 exome-seq experiments.
The samples come from human patients.

Would you recommend aligning the samples on hg19 or hg38?
Have you all switched to hg38, or do you still use hg19?
Is hg19 still better annotated, and does it still have more related datasets facilitating downstream analyses?

I will launch the alignment based on your recommendations.

Sorry if the answer is obvious.
I haven't done any exome-seq analyses since the release of hg38.

Thanks
Old 04-01-2015, 11:33 AM   #2
vivek_
Bioinformatician
 
Location: Denmark

Join Date: Jul 2012
Posts: 158

I still stick with hg19, mainly because I annotate variants with population allele frequencies, and the 1000 Genomes/ExAC datasets still used hg19 last I checked. However, once UCSC releases the annotation, you can safely start using the new genome build.
Old 04-02-2015, 02:17 AM   #3
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367

That's a good point.

The 1000 Genomes Project actually has a post on their Twitter feed stating that they are recruiting a developer to move the data to GRCh38.
https://twitter.com/1000genomes/stat...16223279149056

On the other hand, the updates in GRCh38 are based partly on the 1000 Genomes Project, so it's ironic not to move on because the 1000 Genomes Project hasn't moved on yet.
"Sequence updates - Several erroneous bases and misassembled regions in GRCh37 have been corrected in the GRCh38 assembly, and more than 100 gaps have been filled or reduced. Much of the data used to improve the reference sequence was obtained from other genome sequencing and analysis projects, such as the 1000 Genomes Project."
https://genome.ucsc.edu/cgi-bin/hgGateway?db=hg38

I understand from my discussions with other analysts that hg19 is still used by the overwhelming majority of bioinformaticians, even though the official announcements about hg38 tout all its improvements.

If anyone is using hg38 for exome-seq data, I'd be interested to hear about your experience.
For example, is it simple to use liftOver to compare against earlier datasets aligned to hg19, e.g. the 1000 Genomes Project?
Old 04-07-2015, 02:14 PM   #4
id0
Senior Member
 
Location: USA

Join Date: Sep 2012
Posts: 130

Most of the improvements are in the non-coding regions, so if you are doing exomes, you won't notice a difference.

My concern would be future-proofing. If these are your first exomes, you may as well use hg38. If you need to use some hg19-based datasets, you can always lift them over to hg38. The problem is that if you choose hg19 now and continue with it, you'll have to make the transition a couple of years down the road. It'll be a lot harder once you have a bunch of accumulated results.

On the other hand, I still see a lot of people using mm9, and mm10 has been out since 2011.
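
For anyone unsure what lifting coordinates over involves, here is a minimal Python sketch. It assumes a toy list of aligned blocks on a single chromosome and strand, not a real UCSC chain file; the block coordinates and the `lift_position` helper are hypothetical and only illustrate the idea. A position inside an aligned block is shifted by that block's offset, and a position that falls in a gap between blocks cannot be mapped.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Each block is (old_start, old_end, new_start): 0-based, half-open, same strand.
# These values are made up for illustration; real mappings come from a chain file.
Block = Tuple[int, int, int]

TOY_BLOCKS: List[Block] = [
    (100, 200, 150),  # old 100..199 maps to new 150..249
    (250, 300, 260),  # old 250..299 maps to new 260..309
]


def lift_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based position from the old build to the new one.

    `blocks` must be sorted by old_start and must not overlap.
    Returns None when the position is not covered by any aligned block.
    """
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None  # position lies before the first block
    old_start, old_end, new_start = blocks[i]
    if pos >= old_end:
        return None  # position lies in a gap after this block
    return new_start + (pos - old_start)


if __name__ == "__main__":
    for p in (99, 100, 199, 225, 250):
        print(p, "->", lift_position(p, TOY_BLOCKS))
```

Real liftover tools also handle strand flips, multiple chromosomes, and intervals that span several blocks, none of which this sketch attempts.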
Old 04-21-2015, 07:39 AM   #5
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367

Well, I ended up trying to run the analyses concurrently on hg19 and hg38.
I quickly gave up on the hg38 analysis.
As vivek_ warned me, the fact that the 1000 Genomes Project data is aligned to hg19 is a problem.
I was unable to follow the GATK DNASeq Best Practices with hg38, given that there are no VCF files of known indels and SNPs available for hg38.

I think the switch to hg38 will be effortless once the 1000 Genomes Project has transitioned to hg38. I'll stick to hg19 until that transition has been completed.

I'll have to tell the researcher I gave up on doing the analysis with hg38. Hopefully id0 is right, and it will have no impact on the results.
Old 04-21-2015, 08:22 AM   #6
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 688

How reliable would it be to lift over the 1000 Genomes SNPs to GRCh38/hg38?
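
One way to approach this question empirically is to lift the sites over and count what happens to them. The Python sketch below assumes you already have pairs of (original site, lifted site or None) from whatever liftover tool you use; `summarize_liftover` and the example coordinates are hypothetical and are not real 1000 Genomes data.

```python
from typing import Dict, Iterable, Optional, Tuple

Site = Tuple[str, int]  # (chromosome, position)


def summarize_liftover(
    pairs: Iterable[Tuple[Site, Optional[Site]]]
) -> Dict[str, int]:
    """Count mapped, unmapped, and chromosome-changing sites after a liftover."""
    counts = {"total": 0, "mapped": 0, "unmapped": 0, "chrom_changed": 0}
    for original, lifted in pairs:
        counts["total"] += 1
        if lifted is None:
            counts["unmapped"] += 1  # site fell in a region with no alignment
            continue
        counts["mapped"] += 1
        if lifted[0] != original[0]:
            counts["chrom_changed"] += 1  # site landed on a different chromosome
    return counts


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    toy_pairs = [
        (("chr1", 1000), ("chr1", 1040)),
        (("chr1", 2000), None),
        (("chr2", 500), ("chr2", 500)),
        (("chr3", 750), ("chr7", 9100)),
    ]
    print(summarize_liftover(toy_pairs))
```

A site can also map cleanly while the reference allele at the new position differs between builds. Checking that requires the reference sequences and is not covered by this sketch.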