Hi,
This is what I'm trying to do. I have a recently released genome, for now only available as two large contigs. The sequences in question are from an insect.
The sequencing reads are also available, and they come in two
types: some were made from a single individual, and others
from pooled individuals, which are the ones I'm
interested in.
The gene I'm looking for is essentially unknown. We have no idea what it
looks like, but we do know that it has one feature: a high number of
different alleles. So the idea is to align the reads from the pooled
individuals to the "reference genome" (the two contigs) and then call SNPs. The hope is then to identify potential
gene-containing regions that show high levels of SNPs...
Does this make sense? I would also like to know the most efficient way to do this so that SNPs are identified only in exons or ORFs, or after eliminating repetitive, low-complexity regions.
Any input is welcome.
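
To make the last step concrete, here is a rough Python sketch of the window scan I have in mind. It assumes the SNP calls have already been reduced to a list of 1-based positions on one contig, and that the repetitive or low-complexity regions are available as 1-based inclusive intervals. The function names, window size, and all the numbers are made up for illustration; nothing here comes from the real data.

```python
from bisect import bisect_left, bisect_right

def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def drop_masked(snp_positions, masked):
    """Return the sorted SNP positions that fall outside every masked interval."""
    merged = merge_intervals(masked)
    starts = [s for s, _ in merged]
    kept = []
    for pos in sorted(snp_positions):
        i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos <= merged[i][1]:
            continue  # position lies inside a masked interval
        kept.append(pos)
    return kept

def snp_density(snp_positions, contig_length, window=1000, step=500):
    """Count SNPs in sliding windows; returns (start, end, count) tuples."""
    positions = sorted(snp_positions)
    windows = []
    start = 1
    while start <= contig_length:
        end = min(start + window - 1, contig_length)
        count = bisect_right(positions, end) - bisect_left(positions, start)
        windows.append((start, end, count))
        if end == contig_length:
            break
        start += step
    return windows

# Hypothetical toy data: seven SNPs on a 3000 bp contig, one masked repeat.
snps = [120, 130, 135, 140, 900, 2500, 2510]
repeats = [(2400, 2600)]

kept = drop_masked(snps, repeats)
windows = snp_density(kept, contig_length=3000)
densest = max(windows, key=lambda w: w[2])
print(densest)
```

Note that this only counts SNPs per window. It does not correct for how many bases of each window were masked, or for read depth, both of which would probably matter with pooled reads.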