SEQanswers

Old 12-18-2015, 04:26 PM   #1
thickrick99
Member
 
Location: Washington

Join Date: Jul 2014
Posts: 21
Is SAMtools the right software for this project?

Hi Everyone,

Here is a brief summary of my project: I want to find out how a mutation in an mRNA sequence affects the amino acid sequence of the protein. I have whole exome sequencing data as a .sam file, and I am interested in extracting the flanking sequence X nucleotides upstream and X nucleotides downstream of a specific mutation site. From there, I want to determine the amino acid sequence of that flanking sequence, but it has to be in the same reading frame as the original sequence.

Here are a few questions that I had in terms of using SAMtools and accomplishing these tasks:

1) I assume I need to find the consensus sequence for the reads in my whole exome sequencing data. How would I do this with SAMtools? I found the mpileup command, but what would the reference FASTA file be in my case? Is finding the consensus even needed?

2) My main issue is going from the .sam file reads to being able to pinpoint the location of interest and get the flanking sequence. What do I need to do to process the .sam exome sequencing file to be able to determine the flanking sequence?

3) Once I find the flanking sequence, how do I determine the amino acid sequence and adjust it to make sure it is in frame?

4) How do I account for the multiple transcripts that may exist for a particular gene because of alternative splicing?

Sorry for all the questions; it is my first time working in this area. I appreciate any help! Thanks in advance!
Old 12-28-2015, 01:03 AM   #2
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415

There are tools for this: snpEff and Annovar are popular.

snpEff is quite simple too. Your SNP will need to be in an _annotated_ mRNA sequence, of course.

Input is VCF format.
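
For orientation, here is a minimal Python sketch of what one VCF data line carries. It relies only on the fixed leading columns of the VCF format (CHROM, POS, ID, REF, ALT, with POS being 1-based); the helper name `parse_vcf_line` and the example record are hypothetical and not taken from any real dataset or tool.

```python
# Hypothetical helper: split one tab-separated VCF data line into its
# fixed leading columns (CHROM, POS, ID, REF, ALT). POS is 1-based.
def parse_vcf_line(line):
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt = fields[:5]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": var_id,
        "ref": ref,
        "alts": alt.split(","),  # ALT may list several alleles
    }

# Made-up example record, for illustration only.
example = "chr1\t12345\t.\tA\tT\t50\tPASS\t."
record = parse_vcf_line(example)
```

Header lines starting with `#` are not handled here and would need to be skipped before calling the helper.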
Old 12-28-2015, 12:18 PM   #3
thickrick99
Member
 
Location: Washington

Join Date: Jul 2014
Posts: 21

Thanks for the response! Both of these tools seem helpful; however, do they also output the sequence of the flanking region or the altered amino acid sequence? I quickly looked through snpEff and Annovar, and it seems the tools only tell you what the impact of a SNP is or what the amino acid change is. I'm interested not only in determining what the amino acid change is, but also in using the mutated amino acid sequence after the SNP for further analysis.

Do you know if these tools/other tools are able to accomplish this?
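
To make the goal concrete, here is a rough Python sketch of the step described above: apply a SNP to a coding sequence, cut out a flanking window snapped to codon boundaries, and translate it in frame. It assumes you already have the CDS of one chosen transcript (coding-strand orientation, starting at the first base of the start codon) and the 0-based offset of the SNP within that CDS. Mapping a genomic position to a CDS offset, and repeating this per transcript, is not shown. All function names and the toy sequence are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

def apply_snp(cds, offset, ref, alt):
    """Return the CDS with the base at 0-based `offset` changed from ref to alt."""
    if cds[offset] != ref:
        raise ValueError("reference base does not match the CDS at this offset")
    return cds[:offset] + alt + cds[offset + 1:]

def in_frame_window(cds, offset, flank):
    """Cut `flank` bases on each side of `offset`, widened to whole codons."""
    start = max(0, offset - flank)
    start -= start % 3                       # snap down to a codon start
    end = min(len(cds), offset + flank + 1)
    end += (-end) % 3                        # snap up to a codon end
    end = min(end, len(cds) - len(cds) % 3)  # never run past the last full codon
    return cds[start:end]

def translate(seq):
    """Translate a sequence whose first base is codon position 1."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

# Toy CDS (made up): ATG GCC GAA TTT TGA -> M A E F *
cds = "ATGGCCGAATTTTGA"
mutant = apply_snp(cds, 7, "A", "T")         # GAA -> GTA, i.e. E -> V
ref_peptide = translate(in_frame_window(cds, 7, 2))
alt_peptide = translate(in_frame_window(mutant, 7, 2))
```

Because the window is snapped to codon boundaries of the full CDS, the reference and mutant peptides stay in the original reading frame and can be compared position by position. Insertions and deletions, which can shift the frame, are outside the scope of this sketch.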

Tags
protein sequences, rna seq, sam file, samtools

All times are GMT -8.