Seqanswers Leaderboard Ad

**dpryan** · 09-03-2014, 07:16 AM

Do you just want to visually look, or do you want to extract all reads overlapping a gene or do you just want to call variants from your RNAseq data?

The solutions, in order, would be IGV, samtools view (use a BED file file the exon coordinates), and GATK or one of the other common variant callers.

**Richard Finney** · 09-03-2014, 07:58 AM

promo alert ...

This thing is pretty good and fast for viewing bam files ...
https://github.com/NCIP/alview Native GUI for view BAM files.

Web Demo : https://cgwb.nci.nih.gov/cgi-bin/alview

**dpryan** · 09-03-2014, 08:02 AM

@Richard: Looks cool. Can you look at multiple BAM files at once?

**Richard Finney** · 09-03-2014, 08:04 AM

Coming soon !

The command line generate beaucoup images and place in slide show and just flip through real quick is the current solution; but I'm getting enough requests that I'll put that in. It's designed to accommodate multiple images; I just didn't test it so I disabled it for now.

But I'll fix that in next few days.

stay tuned.

**ErikFas** · 09-03-2014, 10:35 PM

Originally posted by dpryan View Post

Do you just want to visually look, or do you want to extract all reads overlapping a gene or do you just want to call variants from your RNAseq data?

The solutions, in order, would be IGV, samtools view (use a BED file file the exon coordinates), and GATK or one of the other common variant callers.

Still new to bioinformatics, so I'm not sure exactly which of those applies... I'm in essence looking for a way to check if there are mutations in a single gene I'm interested in in my different samples. This was just an idea I had for further analysis of my samples, i.e. divide them into different mutations in this gene, and do DE-analysis on them as per the resulting grouping.

**dpryan** · 09-03-2014, 11:28 PM

The first or third solution. Visually looking at things is probably the simplest if you just want a quick answer and you only have one or two shortish genes of interest. Variant calling (i.e., finding mutations) itself would be more appropriate for the whole genome (you'll likely get better results that way rather than just calling variants in a couple genes).

**ErikFas** · 09-04-2014, 03:38 AM

Ah, thanks!

I've done tried to learned the IGV program, and I have successfully gotten to the point where I can see the reference (hg19, from their server) and my own reads (from my bam-files + indexed bam files, .bai) on top of them. Now I'm having some trouble actually checking the sequence, in that they seem to be... wrong.

I'm interested in the known cancer somatic mutation in KRAS: http://cancer.sanger.ac.uk/cosmic/mu...verview?id=532. As far as COSMIC tells me, it's supposed to be situated at chr12:25398276-25398286, and it's supposed to be a Glycine. Looking at the reference, it says that the amino acid is a Glycine, but looking at the actual codon it reads as GCC. And, unless I'm super stupid or have totally misread my codon chart, is supposed to be a Alanine. Or is the "Sequence" row in the reference the consensus sequence for my actual sample? It does say "reference sequence" when I hover over it...

Attached Files

Screen Shot 2014-09-04 at 13.31.40.png (36.1 KB, 7 views)

**dpryan** · 09-04-2014, 03:43 AM

Don't forget to consider the strand

**ErikFas** · 09-04-2014, 05:03 AM

Oh, yeah... I've worked too much with proteins

Just to be clear, the sequence GCC is reverse complementary to the original sequence, i.e. GGC, which is Glycine? What is the reasoning for the software to correctly write out Glycine for the amino acid, but GCC for the sequence? I'm assuming there's a good reason for it?

**dpryan** · 09-04-2014, 06:20 AM

Sequence is always shown 5'->3' and typically only for the + strand (unless you're looking at a transcript sequence by itself, in which case it'll always be the strand you're interested in). So it's showing the + strand sequence and knows from the annotation file that there's a protein coding gene on the opposite strand and only reverses that for you (otherwise, it'd have to swap which way it orients things as you scroll...and that'd get confusing).

**ErikFas** · 09-04-2014, 11:21 PM

This has been a long overdue read-up on strands, apparently. So, if I understand you and my old coursebook correctly...

DNA template strand (coding, -) is transcribed into mRNA sense strand (+)
mRNA (+) is reverse transcribed into cDNA (both +/-), which is then sequenced
IGV is showing the cDNA sense (+) strand as the sequence (GCC), which is non-coding, but showing the complementary (-) strand as the amino acid (Glycine=GGC), which is coding

... or? Sorry for being so slow on the uptake, I really do appreciate the help!

**dpryan** · 09-04-2014, 11:35 PM

Originally posted by ErikFas View Post

This has been a long overdue read-up on strands, apparently. So, if I understand you and my old coursebook correctly...

DNA template strand (coding, -) is transcribed into mRNA sense strand (+)
mRNA (+) is reverse transcribed into cDNA (both +/-), which is then sequenced
IGV is showing the cDNA sense (+) strand as the sequence (GCC), which is non-coding, but showing the complementary (-) strand as the amino acid (Glycine=GGC), which is coding

... or? Sorry for being so slow on the uptake, I really do appreciate the help!

When a DNA sequence is stored, we only write down the + strand, so that's what'll normally be shown...regardless of the strand you're actually interested in. If a gene is on the - strand and a program knows that and is showing you the resulting protein sequence, it'll be smart enough to reverse complement things (yes, this is due to the sense strand of the mRNA matching the - strand of the DNA...again, because the gene is on the - strand) before translating into amino acids.

In short, DNA sequence = + strand, protein sequence = whatever is appropriate given the annotation. So yeah, you might have to count from the right and reverse complement things if, for example, you want to look for the 3rd base in codon.

**ErikFas** · 09-05-2014, 12:19 AM

Alright, thanks a lot! That does clear that up. However...

I just checked another mutation I'm interested in, the PIK3CA (H1047R) mutation (http://cancer.sanger.ac.uk/cosmic/sa...iew?id=1998442). Without changing anything in the view, I go over to that location, and look at the sequence and the mutation, which is supposed to be a H to an R. The sequence reads CAT, which IS actually an Histidine, without having to do any reading from the right and complementary! The same goes for the mutation, CGT, into an Arginine, also without any non-default reading.

Why is it that some of the sequence seems to need a non-standard reading of the sequence to be right, while others do not? Again, the only thing I changed was the actual zoom in IGV (i.e. the chromosal coordinates), nothing else.

**dpryan** · 09-05-2014, 12:22 AM

PIK3CA is on the plus strand. You'd have to do things differently for genes on the - strand (like KRAS).

Topics	Statistics	Last Post
Expanding the Horizons of Cellular Research with the Single Cell Atlas by seqadmin Started by seqadmin, Today, 11:49 AM	0 responses 10 views 0 likes	Last Post by seqadmin Today, 11:49 AM
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, Yesterday, 08:47 AM	0 responses 16 views 0 likes	Last Post by seqadmin Yesterday, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 61 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM

Seqanswers Leaderboard Ad

Announcement

Getting sequence for a gene from a BAM file

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News