SEQanswers

Old 06-16-2011, 06:04 PM   #1
giror
Junior Member
 
Location: St. Louis

Join Date: Jun 2011
Posts: 6
Help picking up an abandoned sequencing project

Hello All,

My lab has two runs of mouse genome sequence data from two genomes. Both runs are paired-end, one with 36 bp reads and one with 100 bp reads. Unfortunately, the people at the sequencing core who actually ran the sequencing are unresponsive, and the people from my lab who coordinated with them are long gone.

Now I am trying to pick up the sequencing project. So far there are SNPs and indels from samtools, and SVs from GASV and BreakDancer. BreakDancer and samtools were run by others, so I am not sure what parameters were used. I have several issues that I am looking for help with:

(1) I checked the deletions called by BreakDancer and GASV using the samtools alignment viewer (tview), and found that reads often map within the called deletions, but with lower quality. Does anyone have any sanity check suggestions for working with SVs from GASV and BreakDancer?

(2) I suspect that some of the deletions are due to transposable element insertions in the reference and vice versa. I would like to find the transposable element insertions, but don't know of any tools out there for doing this. Do you guys know of any? If not, does anyone have a suggestion for how to pull only the paired reads with one end mapped out of the BAM files?

This last part is what I am struggling with, because I don't even know how the BAM files were made or what was included. Also, I have heard that unmapped reads sometimes get the same coordinate as their mate. Would this hurt my situation, or is there a flag that I could use?
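
On the flag question: the SAM format records mapping status in the FLAG field, where bit 0x4 means the read itself is unmapped and bit 0x8 means its mate is unmapped. An unmapped read that was placed at its mate's coordinate still has 0x4 set, so filtering on flags rather than coordinates should sidestep the problem, assuming the aligner set the flags as the specification describes. A minimal Python sketch of the bit test follows; the function name `one_end_mapped` is made up for illustration.

```python
# SAM FLAG bits, as defined in the SAM specification.
PAIRED = 0x1          # read is part of a pair
READ_UNMAPPED = 0x4   # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate of this read is unmapped


def one_end_mapped(flag):
    """Classify a SAM FLAG value for a pair with exactly one end mapped.

    Returns "mapped_end" if this read is mapped and its mate is not,
    "unmapped_end" if this read is unmapped and its mate is mapped,
    and None for everything else (unpaired, both mapped, both unmapped).
    """
    if not flag & PAIRED:
        return None
    read_unmapped = bool(flag & READ_UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)
    if read_unmapped == mate_unmapped:
        return None
    return "unmapped_end" if read_unmapped else "mapped_end"


# A few example FLAG values (hypothetical reads, not taken from the thread).
for flag in (73, 133, 99, 77):
    print(flag, one_end_mapped(flag))
```

With samtools itself, the equivalent filters would be `samtools view -f 8 -F 4 in.bam` for mapped reads whose mate is unmapped, and `samtools view -f 4 -F 8 in.bam` for the unmapped mates (`in.bam` is a placeholder file name).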


These are just two of my most pressing troubles, but please let me know if you have any suggestions. Thank you in advance for any help.

Last edited by giror; 06-17-2011 at 09:44 AM.
Old 06-17-2011, 11:25 AM   #2
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509

Transposon mapping with paired-end reads is straightforward.

1) Create a reference file that contains the sequence of each transposon.
2) Align read one and read two separately to the transposon reference.
3) Align read one and read two separately to the genome reference, using repeat masking (so you won't align to transposons).
4) Filter the read one genome alignments with the read two transposon alignments, using the unique read identifier.
5) Repeat with read two genome and read one transposon alignments.

There are more sophisticated strategies, but this works relatively well given adequate read depth.
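
Steps 4 and 5 above come down to matching read names between two sets of alignments. A rough Python sketch, assuming each alignment has already been reduced to a (read name, target, position) tuple and that both mates share one name with any /1 or /2 suffix stripped; the function name and the toy data are invented for illustration and do not come from any particular tool.

```python
def anchored_insertions(genome_hits, transposon_hits):
    """Pair genome alignments of one mate with transposon alignments of the other.

    genome_hits: iterable of (read_name, chrom, pos) for one mate vs. the genome
    transposon_hits: iterable of (read_name, transposon_id, pos) for the other mate
    Returns a list of (read_name, chrom, pos, transposon_id).
    """
    # Index transposon alignments by read name. If a read name occurs more
    # than once, the last transposon seen wins (a simplification).
    te_by_name = {name: te_id for name, te_id, _pos in transposon_hits}
    matches = []
    for name, chrom, pos in genome_hits:
        if name in te_by_name:
            matches.append((name, chrom, pos, te_by_name[name]))
    return matches


# Toy data, made up for illustration.
read1_genome = [("r1", "chr1", 1000), ("r2", "chr2", 500), ("r3", "chr1", 1040)]
read2_transposon = [("r1", "TE_A", 12), ("r3", "TE_A", 30), ("r9", "TE_B", 5)]

# Step 4: read one placed on the genome, read two placed in a transposon.
for hit in anchored_insertions(read1_genome, read2_transposon):
    print(hit)
```

Step 5 would be the same call with the read two genome hits and the read one transposon hits.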

-Harold
Old 06-17-2011, 11:34 AM   #3
giror
Junior Member
 
Location: St. Louis

Join Date: Jun 2011
Posts: 6
thanks Harold

This is generally the strategy I imagined. Unfortunately, I am on a Mac with 8 GB of RAM and a 1 TB hard drive, and I am not sure I could efficiently read through the entire BAM files, which are 51 and 71 GB. The reads have already been mapped back to the genome, but I'm not sure what parameters were used. Do you know of a way I could get this information from the BAM?
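
On recovering the parameters: many aligners record themselves in `@PG` lines of the BAM header, sometimes with the full command line in the `CL` tag, and the header can be printed with `samtools view -H` without reading through the alignments. Whether these particular files carry that information depends on how they were produced. A small Python sketch that parses such header text; the sample header below is invented for illustration.

```python
def parse_pg_lines(header_text):
    """Return one dict of TAG -> value for each @PG line in SAM header text."""
    records = []
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        fields = line.split("\t")[1:]
        # Each field looks like "ID:bwa". Split on the first colon only,
        # because a command line stored in CL may contain colons itself.
        records.append(dict(f.split(":", 1) for f in fields if ":" in f))
    return records


# Invented example header; a real one would come from `samtools view -H`.
sample_header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@PG\tID:bwa\tPN:bwa\tVN:0.5.9\tCL:bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq\n"
)

for pg in parse_pg_lines(sample_header):
    print(pg.get("PN"), pg.get("VN"), pg.get("CL"))
```

If no `@PG` line is present, the header will not reveal the aligner or its settings.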

If not, could you recommend an alignment program given the hardware constraints that I am under?

Last edited by giror; 06-17-2011 at 11:36 AM.
Old 06-17-2011, 11:55 AM   #4
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509

The approach I suggested would almost certainly require repeating the alignments. I don't know which aligner was used to generate your existing dataset, but the repeats were either masked (yielding no matches) or not (yielding multiple matches). Most aligners report only unique matches, so either way the transposon reads would be missing.

Our aligners run on a server cluster, so I can't offer any software recommendations for your system. A cloud solution might be your best option.

Tags
bed, samtools, structure variation, transposable element

All times are GMT -8.