**GenoMax** · 12-10-2015, 01:26 PM

That is strange. Is the identity to the protein/gene you are expecting? What kind of an experiment has the sequence come from (DNA/RNA seq)?

**PatrikD** · 12-10-2015, 03:13 PM

Are you sure you're looking at the correct reading frame for the protein sequence? If you start reading codons one base over, you'll just get random amino acids with a bunch of stop codons.

I would guess you either have a homopolymer sequencing error (reading one base too few or too many in a series of identical bases), a mis-called start of the ORF, or a pseudogene with an indel.
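To make the homopolymer case concrete, here is a minimal Python sketch (standard library only, standard genetic code assumed). It takes the in-frame part of the enterovirus fragment quoted later in this thread and simulates one extra A being called in the AAA run. The names `translate` and `with_extra_a` are illustrative only and do not come from any tool mentioned here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate complete codons from the first base; a trailing partial codon is dropped."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# In-frame part of the fragment gsgs posts later in the thread (bases 3 to 44).
fragment = "TACATGAAACCCAAACATGTTAGAGTGTGGTGCCCACGAGCC"

# Simulated homopolymer error: one extra A inserted into the AAA run
# that occupies 0-based positions 6-8.
with_extra_a = fragment[:6] + "A" + fragment[6:]

print(translate(fragment))      # YMKPKHVRVWCPRA
print(translate(with_extra_a))  # YMKTQTC*SVVPTS
```

A single extra base leaves the first three residues intact, changes everything after them, and introduces a stop codon eight residues in.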

**gsgs** · 12-10-2015, 09:56 PM

Can you post or upload an excerpt of the sequence?

Even if you deliberately "construct" a sequence with low protein similarity,
5% nonmatching nucleotides can give at most 15%
nonmatching amino acids, since each substitution changes at most one codon.
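The factor of three can be checked with a small counting sketch. It assumes substitutions only (no indels) and two sequences already aligned in frame. The 60-nt sequence and the helper names `count_mismatches` and `substitute` are hypothetical, made up for the illustration.

```python
def count_mismatches(ref, alt):
    """Return (nucleotide mismatches, codon mismatches) for two in-frame sequences."""
    if len(ref) != len(alt) or len(ref) % 3:
        raise ValueError("sequences must have equal length, a multiple of 3")
    nt = sum(a != b for a, b in zip(ref, alt))
    codons = sum(ref[i:i + 3] != alt[i:i + 3] for i in range(0, len(ref), 3))
    return nt, codons


# Map every base to a different base so a substitution always creates a mismatch.
SWAP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitute(seq, positions):
    """Return seq with the base at each 0-based position replaced by a different base."""
    chars = list(seq)
    for p in positions:
        chars[p] = SWAP[chars[p]]
    return "".join(chars)


ref = "ACG" * 20                        # hypothetical 60-nt, 20-codon sequence
spread = substitute(ref, [0, 3, 6])     # 3 substitutions in 3 different codons
clustered = substitute(ref, [0, 1, 2])  # 3 substitutions in a single codon

for name, alt in (("spread", spread), ("clustered", clustered)):
    nt, codons = count_mismatches(ref, alt)
    print(f"{name}: {nt}/{len(ref)} nucleotides, {codons}/{len(ref) // 3} codons differ")
```

With 3 of 60 nucleotides changed (5%), the worst case is 3 of 20 codons changed (15%), and clustering the changes lowers that to 1 of 20. Changed codons are themselves only an upper bound on changed amino acids, because some substitutions are synonymous.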

**vpi** · 12-10-2015, 11:34 PM

Thank you all for the replies!

It's a DNA sequence obtained from MiSeq reads and assembled with the IDBA assembler. For information, the DNA sequence was translated using ExPASy, and I took the sequence from 5'3' Frame 1.

I tried removing the '*' characters and running blastp; I then get a few hits with very low identity to a bacterial genome, whereas the nucleotide sequence shows high identity to a virus. Can I use such a sequence, with the stop codons manually removed, for homology modeling?

**gsgs** · 12-10-2015, 11:37 PM

Can you also post the nucleotide excerpt?

**gsgs** · 12-11-2015, 12:34 AM

I took the opportunity to update my virus database from genbank...
I'm not sure yet what virus it could be;
probably none of the big 3 (HIV, Influenza, Hepatitis).

Maybe the amino-acid sequence of that virus is not available?

here are possible other reading frames:
[I use "}" for stop codons]

Code:

>test1
LVSKSVGSDCCYLLHCHSVMSKVN}CVQVLYSPRGSWAPHSNMFGFHVNADFCIYCDRIMLINYADSNG-
>test1 revert
QR}}IRWPSTSN}HRPYPRIWARPKPTWLCVVNI}KL}CPAGPHDWSG}TS}CYRSTCYHRLQAFGFTT-
>test1 revertf
NHCCPHS}}A}SGHNKCKNPRLHETQTC}SVVPTSP}GCKVLEHIS}LCS}RCGNGASSNNQNLRIWTP-
>test1 revert revertf
GCEPECLESVVAGRTVTL}RLP}PVVWTCRALKLLDVYNTEPRGFGTCPYSGVRAVSIRRRRPAYSLPL-
>test1
WCPNP}VLIVATCSIATAS}AKSTDVFKYFTALGARGHHTLTCLGFM}TRIFAFIVTGSCSLTMRTAMV
>test1 revert
NGSEYAGRLRRIDTARTPEYGHVPNPRGSVL}TSKSFSALQVHTTGQGRRHSVTVLPATTDSKHSGSQ-
>test1 revertf
TIAVRIVNEHDPVTINAKIRVYMKPKHVRVWCPRAPRAVKYLNTSVDFAHDAVAMEQVATIRTYGFGHQ
>test1 revert revertf
VVNPNAWSRW}QVER}HYDVYPDQSCGPAGH}SF}MFTTQSHVGLGRAHIRGYGRCQFDVDGQRIHYR-
>test1
GVQIRRF}LLLLAPLPQRHEQSQLMCSSTLQP}GLVGTTL}HVWVSCKRGFLHLL}PDHAH}LCGQQW-
>test1 revert
TVVNTLAVYVELTPPVPPNMGTSQTHVALCCKHLKALVPCRSTRLVRVDVIVLPFYLLPPTPSIRVHN-
>test1 revertf
PLLSA}LMSMIRSQ}MQKSAFT}NPNMLECGAHEPLGL}ST}THQLTLLMTLWQWSK}QQSEPTDLDT-
>test1 revert revertf
L}TRMLGVGGSR}NGNTMTSTLTSRVDLQGTKAFRCLQHRATWVWDVPIFGGTGGVNST}TASVFTTV-

**GenoMax** · 12-11-2015, 12:40 AM

@vpi: Have you blasted some of the original reads to make sure the data is from the right organism? This would not be the first time there is a mix-up of some kind (either in your hands or at the sequence provider).

The possibility also exists that IDBA has produced a bad assembly.

**gsgs** · 12-11-2015, 12:51 AM

TTTACATGAAACCCAAACATGTTAGAGTGTGGTGCCCACGAGCCCC
,-3.nuc, JX514942, Human enterovirus C116

>test1 revertf
TTTACATGAAACCCAAACATGTTAGAGTGTGGTGCCCACGAGCCCC

>test1 revertf
YMKPKHVRVWCPRA-

Inverse reading frame, starting at the 3rd nucleotide (TAC...).
This partial amino-acid sequence is also given in the GenBank record:

Human enterovirus C116 polyprotein gene, complete cds - Nucleotide - NCBI

http://www.ncbi.nlm.nih.gov/nuccore/JX514942

**vpi** · 12-11-2015, 01:06 AM

Thank you all for the suggestions!

**kmcarr** · 12-11-2015, 05:47 AM

Originally posted by vpi:

Thank you all for the replies!

It's a DNA sequence obtained from MiSeq reads and assembled with the IDBA assembler. For information, the DNA sequence was translated using ExPASy, and I took the sequence from 5'3' Frame 1.

I tried removing the '*' characters and running blastp; I then get a few hits with very low identity to a bacterial genome, whereas the nucleotide sequence shows high identity to a virus. Can I use such a sequence, with the stop codons manually removed, for homology modeling?

Why are you only translating in one frame? An assembled contig from raw reads is not expected to be in frame or even in the proper direction. You need to examine all 6 possible reading frames (3 forward & 3 reverse) for the correct coding sequence. Also, if there are any misassemblies that result in "indels" in your contig, these would cause the proper reading frame to shift in the middle of your coding sequence.
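A six-frame scan is short enough to sketch in plain Python. The version below assumes the standard genetic code and uppercase A/C/G/T input only (an ambiguous base such as N would raise a `KeyError`). It is run on the 46-nt excerpt gsgs posted above. The labels `+1` to `+3` and `-1` to `-3` are my own; they presumably correspond to ExPASy's 5'3' Frames 1-3 and 3'5' Frames 1-3 for the same input.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the first base; a trailing partial codon is dropped."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frames(dna):
    """Return translations of the 3 forward and 3 reverse-complement frames."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


excerpt = "TTTACATGAAACCCAAACATGTTAGAGTGTGGTGCCCACGAGCCCC"
frames = six_frames(excerpt)
for name, protein in frames.items():
    note = "" if "*" in protein else "  <- no stop codon"
    print(f"{name}: {protein}{note}")
```

On this excerpt, frame `+3` gives `YMKPKHVRVWCPRA`, the peptide matched to the enterovirus polyprotein above. Frame `-3` is also free of stop codons over these 46 bases, so a short excerpt alone cannot settle the frame; the full contig, or a comparison against the GenBank protein, is needed for that.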

**vpi** · 12-14-2015, 01:42 AM

Thanks a lot @kmcarr... your suggestion was of great help. I looked at the other frames and found that 3'5' Frame 1 had a perfect sequence with no stop codons in between.

Working with a sequence with high similarity at nucleotide level but none at protein
