SEQanswers

12-21-2015, 11:58 PM   #1
ErikFas
Member
 
Location: Sweden

Join Date: Jun 2014
Posts: 86
Finding the genomic location of an insert

Is there some way to use RNA-seq and/or whole genome sequencing data (I have both for the relevant samples) to find the genomic location of an insert with an unknown location? The insert itself is of known sequence, and aligns correctly to a reference containing only itself + some minor control sequences.

I was told that one approach is to align my data to a reference containing only the insert sequences, but to treat my paired-end data as two single-end sets, i.e. align the "..._1" files and the "..._2" files separately. I should then collect the names of all reads that align and use them to subset the other original FASTQ file, so that I get their mates (i.e. subset "..._2" by the reads that aligned in "..._1"), and align those mates to the normal reference genome, again as single-end. Hopefully, the mates would then pile up in one region, which would give me the location of my insert (after which I could design PCR primers and validate the result).

I have done this with my WGS data, but the reads map more or less randomly across all chromosomes... I suspect I might be subsetting the read names incorrectly, mostly because I'm not sure exactly how reads are named and how to find the pairs properly. At the moment, this is what I'm doing:

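Code:
(... alignment with BWA)

# Collect the unique names (SAM column 1) of the "_1" reads that mapped to the insert;
# -F 4 drops any unmapped records still present in the BAM.
samtools view -F 4 mapped.sorted.rmdup.input_1.bam | \
	gawk '{print $1}' | \
	sort | \
	uniq > unique.txt

# Pull the mates of those reads out of the "_2" FASTQ file.
fastqutils filter -whitelist unique.txt input_2.fastq > 1-to-2.fastq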

Am I doing something wrong with the analysis, or is the idea itself flawed? I am being fairly stringent in the first alignment step, using the -B 40 -O 60 -E 10 options with BWA, so that hopefully only near-exact matches align (I have also tried without this stringency, with more or less the same results).
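
On the read-naming question: depending on how the FASTQ files were produced, a header line may carry a trailing "/1" or "/2", or a comment after a space (e.g. "1:N:0:..."), while the names reported in the alignment usually have these stripped. The sketch below is one possible way to do the subsetting with plain awk, normalising names on both sides before comparing them. The function name subset_mates is hypothetical, and it assumes strictly four-line FASTQ records; it is not a replacement for checking what your own headers actually look like.

```shell
# Hypothetical helper: print the FASTQ records in $2 whose read name
# (after normalisation) appears in the name list $1.
subset_mates() {
    awk '
        # Strip a leading "@", anything after the first whitespace,
        # and a trailing /1 or /2, so both mates share one name.
        function norm(s) {
            sub(/^@/, "", s)
            sub(/[ \t].*$/, "", s)
            sub(/\/[12]$/, "", s)
            return s
        }
        # First file: names of the reads that aligned to the insert.
        NR == FNR { keep[norm($0)] = 1; next }
        # Second file: FASTQ, assumed to be exactly four lines per record.
        FNR % 4 == 1 { wanted = (norm($0) in keep) }
        wanted
    ' "$1" "$2"
}
```

With the file names from the post, this would be called as: subset_mates unique.txt input_2.fastq > 1-to-2.fastq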

Does anybody have any idea what I'm doing wrong, what's wrong with the idea, or have any other idea on how to find an unknown insert?
12-22-2015, 01:10 AM   #2
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 414

This is quite difficult in general and leads to false positive hits in my experience.

It's difficult to estimate how many false positives to expect without knowing the read length and the genome size / repetitiveness.

Maybe you've tried this already, but doing a couple of de novo assemblies and looking for the genomic regions flanking your insert (if present) would probably be more helpful. If these flanks are mappable and unique in the genome, that is good evidence.
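
Whichever route is taken, one rough way to tell a real integration site from scattered false positives is to keep only confidently placed reads and count them per genomic window; a genuine site would be expected to show up as one window with far more reads than the rest. The following is only a sketch: the function name count_windows, the window size, and the MAPQ cutoff are all assumptions to tune, and it reads plain SAM text on stdin from single-end alignments.

```shell
# Hypothetical helper: read SAM records on stdin and count confidently
# mapped reads per fixed-size window.
# $1 = window size in bp, $2 = minimum mapping quality (MAPQ).
count_windows() {
    awk -v win="$1" -v minq="$2" '
        BEGIN { FS = "\t"; OFS = "\t" }
        /^@/ { next }                      # skip SAM header lines
        $3 == "*" || $5 < minq { next }    # skip unmapped or low-MAPQ records
        {
            # 1-based start coordinate of the window containing POS (column 4)
            start = int(($4 - 1) / win) * win + 1
            n[$3 OFS start]++
        }
        # Output columns: chromosome, window start, read count
        END { for (k in n) print k, n[k] }
    ' | sort -t "$(printf '\t')" -k3,3nr   # busiest windows first
}
```

For example (the BAM file name here is hypothetical): samtools view mates_vs_genome.bam | count_windows 1000 20 | head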
12-22-2015, 03:09 AM   #3
ErikFas
Member
 
Location: Sweden

Join Date: Jun 2014
Posts: 86

Ah, interesting... I have never done a de novo assembly before, at either the genomic or the transcriptome level. I assume you're advising that I do it at the genomic level? Could you point me towards some tool(s) that I could use for this?
12-27-2015, 11:27 PM   #4
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 414

For RNA-seq, a good de novo tool is Trinity. For genomic assemblies, perhaps ABySS, Minia or SOAPdenovo might suit your needs. If you have no experience, perhaps you can find these on a Galaxy instance somewhere, maybe at iPlant. I think Sweden also has a very good infrastructure setup that you could get time on (I forget what it's called).
All times are GMT -8.

