
**HESmith** · 11-04-2015, 07:18 AM

You can create a reference by de novo assembly from your 600-sample data set, then align each sample to it to identify sample-specific variants.

**GenoMax** · 11-04-2015, 08:17 AM

Depending on how much data there is, 600 samples may be too many to assemble at one time. Perhaps try a sampling approach: assemble several random subsets and compare those assemblies to estimate the differences?

**WhatsOEver** · 11-04-2015, 08:25 AM

I'm not sure I see a problem here. Suppose you compare your samples against the reference and find, for example, that sample1 has an A(ref)->T(s1) mutation at position 10 while sample2 has an A(ref)->C(s2) mutation at position 10. The variation between samples (here: C vs T) is easily extractable: you just use your reference as a backbone for the comparison.
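The backbone idea can be sketched in a few lines of Python. This is only an illustration with made-up data: the `REF` and `CALLS` dictionaries and the function name `between_sample_variants` are hypothetical, and real calls would come from a VCF rather than hand-written dictionaries.

```python
# Hypothetical reference bases and per-sample base calls, keyed by position.
# None marks a sample with no call at that position.
REF = {10: "A", 25: "G", 40: "C"}
CALLS = {
    "sample1": {10: "T", 25: "C", 40: "C"},
    "sample2": {10: "C", 25: "C", 40: None},
}


def between_sample_variants(ref, calls):
    """Return positions where the called samples differ from each other.

    The reference only supplies the coordinate system; a position where
    every sample carries the same non-reference base is not reported.
    """
    variable = {}
    for pos in sorted(ref):
        alleles = {sample: bases.get(pos) for sample, bases in calls.items()}
        observed = {a for a in alleles.values() if a is not None}
        if len(observed) > 1:
            variable[pos] = alleles
    return variable


result = between_sample_variants(REF, CALLS)
print(result)
```

Here position 10 is kept (T vs C), while position 25 is dropped even though both samples differ from the reference, because they agree with each other.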

**GenoMax** · 11-04-2015, 08:36 AM

If @Guillefriis does what you are proposing, then where should the cut-off be set to say that a particular difference is due to divergence (present in > X% of samples) and so is not interesting?
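To make the cut-off question concrete, here is a minimal sketch of how such a threshold could be applied once a value for X has been chosen. The function names and the default `cutoff=0.95` are assumptions for illustration only; the thread does not settle on a value.

```python
def nonref_fraction(alleles, ref_allele):
    """Fraction of called samples whose allele differs from the reference."""
    called = [a for a in alleles if a is not None]
    if not called:
        return None
    return sum(a != ref_allele for a in called) / len(called)


def label_site(alleles, ref_allele, cutoff=0.95):
    """Label a site using a hypothetical cut-off on the non-reference fraction."""
    frac = nonref_fraction(alleles, ref_allele)
    if frac is None:
        return "no_call"
    if frac == 0:
        return "matches_reference"
    # Above the cut-off the difference is treated as divergence from the
    # reference species rather than variation within the sample set.
    return "divergence" if frac > cutoff else "within_set"


print(label_site(["T"] * 20, "A"))
print(label_site(["T"] * 5 + ["A"] * 15, "A"))
```

Note that a fraction-based rule like this would also label a site as divergence when the samples carry two different non-reference alleles (the T vs C case above), so it would need to be combined with a check on how many distinct alleles are present.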

**Guillefriis** · 11-05-2015, 02:35 AM

@HESmith I wouldn't like to use a de novo assembly since I need the genomic positions of the variants provided by the zebra finch genome (planning to do a genome scan).

@WhatsOEver (I don't know if it's good practice to answer two posts in one; please let me know if forum users prefer them separately.) I see your point, and actually I thought it could work as you say. It just seems like a waste of computational time to go through all the differences with respect to the reference (there are going to be a lot of them) and extract the between-sample variants afterwards. It looks like the GATK SelectVariants tool can do this, but I'm not sure exactly how. Has anybody used it? Also, I'm not sure how the callers behave at heterozygous positions: could a variant site that is heterozygous in my samples be filtered out because both samples carry an allele matching the reference?
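On the heterozygous case, one way to reason about it is to count allele copies across the sample set instead of comparing each sample to the reference. The sketch below assumes VCF-style `GT` strings (`0/1`, `1|1`, `./.`); the function names are hypothetical, and this does not describe how SelectVariants or any particular caller behaves.

```python
def allele_counts(genotypes):
    """Count allele copies across VCF-style GT strings such as '0/1' or '1|1'."""
    counts = {}
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":  # skip missing alleles
                counts[allele] = counts.get(allele, 0) + 1
    return counts


def polymorphic_within_set(genotypes):
    """True when more than one allele is observed among the samples."""
    return len(allele_counts(genotypes)) > 1


print(polymorphic_within_set(["0/1", "0/1"]))  # both samples heterozygous
print(polymorphic_within_set(["1/1", "1/1"]))  # fixed difference from the reference
```

Under this criterion a site where the samples are heterozygous is kept, because both alleles are present in the set, whether or not one of them matches the reference. Only sites where every called allele copy is identical are dropped.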

@GenoMax I'm not sure I understood you. I'm not interested in reference-relative variants because my study is focused on phylogenomic relationships within an emberizid genus, while my reference, the zebra finch, is only used for the mapping and downstream analyses.

Thank you all, guys.

**GenoMax** · 11-05-2015, 05:00 PM

> Originally posted by Guillefriis:
>
> @GenoMax I'm not sure I understood you. I'm not interested in reference-relative variants because my study is focused on phylogenomic relationships within an emberizid genus, while my reference, the zebra finch, is only used for the mapping and downstream analyses.

You may not be interested in them but that is how you are going to pick them, right? Have you done a test to see what this result looks like? I am not an evolutionary biologist by a long shot so I don't know how ~20M year difference has affected the overall genome organization (# of chromosomes, sizes etc).

With 600 samples you likely have enough data to try some assemblies with a random sampling of reads. That may prove to be a better reference.

It is late and my mind is wandering ...
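The random sampling of reads mentioned above could look roughly like the following sketch, which uses reservoir sampling so the whole file never has to be held in memory. It assumes plain four-line FASTQ records; `read_fastq`, `reservoir_sample`, and the in-memory demo data are hypothetical and stand in for real input files.

```python
import io
import random


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from 4-line FASTQ records."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:
            return
        yield tuple(record)


def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly at random from an iterable."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = rec
    return sample


# Demo on 100 synthetic reads held in memory instead of a real file.
demo = io.StringIO("".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(100)))
subset = reservoir_sample(read_fastq(demo), k=10, seed=42)
print(len(subset))  # prints 10
```

For paired-end data the two mates would have to be sampled together, which this sketch does not handle.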

**nucacidhunter** · 11-05-2015, 09:44 PM

I wonder if you have considered pyRAD:
http://dereneaton.com/software/pyrad/

**Guillefriis** · 11-06-2015, 08:00 AM

> Originally posted by GenoMax:
>
> You may not be interested in them but that is how you are going to pick them, right? Have you done a test to see what this result looks like? I am not an evolutionary biologist by a long shot so I don't know how ~20M year difference has affected the overall genome organization (# of chromosomes, sizes etc).
>
> With 600 samples you likely have enough data to try some assemblies with a random sampling of reads. That may prove to be a better reference.
>
> It is late and my mind is wandering ...

I think I'll try. I'll lose the genomic positions of the variants, but I may end up with a larger number of them, which is better in phylogenetic terms. I have never done an assembly, though!

**Guillefriis** · 11-06-2015, 08:46 AM

> Originally posted by nucacidhunter:
>
> I wonder if you have considered pyRAD:
> http://dereneaton.com/software/pyrad/

You know, @nucacidhunter, I had a look and it seems pretty interesting. I think I'll take the intersection of the SNPs called using both GATK and pyRAD. Thanks, man.

Genotype calling within your sample set instead relative to reference genome