  • Working with a sequence with high similarity at nucleotide level but none at protein

    Hi,

    I have a sequence generated by an assembler that shows 95% identity to a nucleotide sequence when I BLAST or blastx it, but there is no similarity at the protein level (blastp). Also, the translated sequence contains multiple asterisks (*), which mean stop codons. If one is at the end, it marks the end of the sequence, but what does it mean when there are many of them?

    I would like to build a homology model of the sequence. Any suggestions on how to proceed?

    Any help is very much appreciated.

    Thank you

  • #2
    That is strange. Is the identity to the protein/gene you are expecting? What kind of experiment did the sequence come from (DNA-seq or RNA-seq)?


    • #3
      Are you sure you're looking at the correct reading frame for the protein sequence? If you start reading codons one base over, you'll just get random amino acids with a bunch of stop codons.

      I would guess you either have a homopolymer sequencing error (reading one base too few or too many in a series of identical bases), a mis-called start of the ORF, or a pseudogene with an indel.
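
To make the frame-shift point concrete, here is a minimal Python sketch. The `translate` helper and the test sequence are illustrative assumptions (the sequence is a short in-frame stretch of the excerpt posted later in this thread, not the original poster's contig). It shows how losing a single base from a run of identical bases changes every amino acid downstream of the error.

```python
from itertools import product

# Standard genetic code, built in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate from the first base; a trailing partial codon is ignored."""
    dna = dna.upper()
    # Unknown codons (e.g. containing N) are reported as 'X'.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


# Illustrative in-frame sequence (assumption: treated here as error-free).
intact = "ATGAAACCCAAACATGTTAGAGTGTGGTGCCCACGAGCC"
# Simulate a homopolymer error: one A of the second "AAA" run is missing.
shifted = intact[:9] + intact[10:]

print(translate(intact))   # MKPKHVRVWCPRA
print(translate(shifted))  # MKPNMLECGAHE
```

The first three residues agree and everything after the deletion differs. On a longer sequence the shifted frame would typically also run into stop codons.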


      • #4
        Can you post or upload an excerpt of the sequence?

        Even if you deliberately "construct" a sequence with low protein similarity,
        5% nonmatching nucleotides can give at most about 15% nonmatching amino acids,
        because each substitution changes at most one codon.
        Last edited by gsgs; 12-10-2015, 09:59 PM.
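
The 15% figure follows from simple counting, sketched below in Python. The function names and the two short sequences are made up for illustration, and the bound assumes an in-frame comparison with substitutions only (no insertions or deletions).

```python
def max_codon_mismatch(nt_mismatch: float) -> float:
    """Upper bound on the fraction of codons hit by substitutions.

    A sequence of N nucleotides has N/3 codons. With p*N substituted
    nucleotides, at most p*N codons are affected (one per substitution),
    so the affected fraction is at most p*N / (N/3) = 3*p.
    """
    return min(1.0, 3.0 * nt_mismatch)


def codon_mismatch(a: str, b: str) -> float:
    """Observed fraction of differing codons in two equal-length, in-frame sequences."""
    starts = range(0, len(a) - 2, 3)
    differing = sum(a[i:i + 3] != b[i:i + 3] for i in starts)
    return differing / len(starts)


print(round(max_codon_mismatch(0.05), 2))  # 0.15

# Hypothetical 21-nt example: one substitution (CAT -> CTT) in 7 codons.
ref = "ATGAAACCCAAACATGTTAGA"
alt = "ATGAAACCCAAACTTGTTAGA"
print(codon_mismatch(ref, alt))  # 1 of 7 codons differs
```

Because some codon changes are synonymous, the fraction of changed amino acids can only be lower than the codon bound. A protein identity far below 85% therefore points to a wrong reading frame or an indel rather than to point differences.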


        • #5
          Thank you all for the replies!

          It's a DNA sequence obtained from a MiSeq run and assembled with the IDBA assembler. For information, the DNA sequence was translated using ExPASy, and I took the sequence from 5'3' Frame 1.

          I tried removing the '*' characters and running blastp; I then get a few hits with very low identity to a bacterial genome, whereas the nucleotide sequence shows high identity to a virus. Can I use such a sequence, with the stop codons manually removed, for homology modeling?
          Last edited by vpi; 12-11-2015, 01:01 AM.


          • #6
            Can you also post the nucleotide excerpt?


            • #7
              I took the opportunity to update my virus database from GenBank...
              I'm not sure yet what virus it could be,
              probably none of the big three (HIV, influenza, hepatitis).

              Maybe the amino-acid sequence of that virus is not available?

              Here are other possible reading frames:
              [I use "}" for stop codons]

              Code:
              >test1
              LVSKSVGSDCCYLLHCHSVMSKVN}CVQVLYSPRGSWAPHSNMFGFHVNADFCIYCDRIMLINYADSNG-
              >test1 revert
              QR}}IRWPSTSN}HRPYPRIWARPKPTWLCVVNI}KL}CPAGPHDWSG}TS}CYRSTCYHRLQAFGFTT-
              >test1 revertf
              NHCCPHS}}A}SGHNKCKNPRLHETQTC}SVVPTSP}GCKVLEHIS}LCS}RCGNGASSNNQNLRIWTP-
              >test1 revert revertf
              GCEPECLESVVAGRTVTL}RLP}PVVWTCRALKLLDVYNTEPRGFGTCPYSGVRAVSIRRRRPAYSLPL-
              >test1
              WCPNP}VLIVATCSIATAS}AKSTDVFKYFTALGARGHHTLTCLGFM}TRIFAFIVTGSCSLTMRTAMV
              >test1 revert
              NGSEYAGRLRRIDTARTPEYGHVPNPRGSVL}TSKSFSALQVHTTGQGRRHSVTVLPATTDSKHSGSQ-
              >test1 revertf
              TIAVRIVNEHDPVTINAKIRVYMKPKHVRVWCPRAPRAVKYLNTSVDFAHDAVAMEQVATIRTYGFGHQ
              >test1 revert revertf
              VVNPNAWSRW}QVER}HYDVYPDQSCGPAGH}SF}MFTTQSHVGLGRAHIRGYGRCQFDVDGQRIHYR-
              >test1
              GVQIRRF}LLLLAPLPQRHEQSQLMCSSTLQP}GLVGTTL}HVWVSCKRGFLHLL}PDHAH}LCGQQW-
              >test1 revert
              TVVNTLAVYVELTPPVPPNMGTSQTHVALCCKHLKALVPCRSTRLVRVDVIVLPFYLLPPTPSIRVHN-
              >test1 revertf
              PLLSA}LMSMIRSQ}MQKSAFT}NPNMLECGAHEPLGL}ST}THQLTLLMTLWQWSK}QQSEPTDLDT-
              >test1 revert revertf
              L}TRMLGVGGSR}NGNTMTSTLTSRVDLQGTKAFRCLQHRATWVWDVPIFGGTGGVNST}TASVFTTV-
              Last edited by gsgs; 12-11-2015, 12:37 AM.


              • #8
                @vpi: Have you blasted some of the original reads to make sure the data is from the right organism? This would not be the first time there has been a mix-up of some kind (either in your hands or at the sequence provider).

                The possibility also exists that IDBA has produced a bad assembly.


                • #9
                  TTTACATGAAACCCAAACATGTTAGAGTGTGGTGCCCACGAGCCCC
                  ,-3.nuc, JX514942, Human enterovirus C116


                  >test1 revertf
                  TTTACATGAAACCCAAACATGTTAGAGTGTGGTGCCCACGAGCCCC

                  >test1 revertf
                  YMKPKHVRVWCPRA-

                  Inverse reading frame, starting at the 3rd nucleotide (TAC...).
                  This partial amino-acid sequence is also given in the GenBank record.



                  Last edited by gsgs; 12-11-2015, 01:09 AM.
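
The translation above can be reproduced with a few lines of Python. This is a minimal sketch: `translate` is a hypothetical helper written for this example using the standard genetic code, and only the three frames of the excerpt as posted are shown.

```python
from itertools import product

# Standard genetic code, built in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str, offset: int = 0) -> str:
    """Translate starting `offset` bases in; a trailing partial codon is ignored."""
    dna = dna.upper()[offset:]
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


# Nucleotide excerpt exactly as posted above.
excerpt = "TTTACATGAAACCCAAACATGTTAGAGTGTGGTGCCCACGAGCCCC"

for offset in range(3):
    print(offset + 1, translate(excerpt, offset))
# 1 FT*NPNMLECGAHEP
# 2 LHETQTC*SVVPTSP
# 3 YMKPKHVRVWCPRA
```

Only the frame starting at the third nucleotide is free of stop codons, and it gives the peptide quoted above.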


                  • #10
                    Thank you all for the suggestions!


                    • #11
                      Originally posted by vpi
                      Thank you all for the replies!

                      It's a DNA sequence obtained from a MiSeq run and assembled with the IDBA assembler. For information, the DNA sequence was translated using ExPASy, and I took the sequence from 5'3' Frame 1.

                      I tried removing the '*' characters and running blastp; I then get a few hits with very low identity to a bacterial genome, whereas the nucleotide sequence shows high identity to a virus. Can I use such a sequence, with the stop codons manually removed, for homology modeling?

                      Why are you only translating in one frame? A contig assembled from raw reads is not expected to be in frame, or even in the proper orientation. You need to examine all 6 possible reading frames (3 forward and 3 reverse) for the correct coding sequence. Also, if there are any misassemblies that introduce indels into your contig, the proper reading frame will shift in the middle of your coding sequence.
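
A six-frame check of this kind can be sketched in plain Python as follows. All function names are hypothetical, the standard genetic code is assumed, and the test contig is an illustrative sequence deliberately placed on the reverse strand rather than real assembler output.

```python
from itertools import product

# Standard genetic code, built in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate from the first base; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict:
    """Translations keyed '+1'..'+3' (forward strand) and '-1'..'-3' (reverse strand)."""
    frames = {}
    for sign, strand in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


def open_frames(dna: str) -> list:
    """Names of frames whose translation contains no stop codon at all."""
    return [name for name, protein in six_frames(dna).items() if "*" not in protein]


# Illustrative contig: a coding stretch that ended up on the reverse strand.
contig = reverse_complement("ATGAAACCCAAACATGTTAGAGTGTGGTGCCCACGAGCC")

for name, protein in six_frames(contig).items():
    print(name, protein)
print("frames without stops:", open_frames(contig))
```

For this contig the coding frame is `-1`, which a forward-only translation would never show. The "no stop at all" filter is deliberately crude: a real coding sequence may end in a stop codon, and a contig with an indel may need two frames joined. Note also that blastx already translates a nucleotide query in all six frames before searching a protein database.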


                      • #12
                        Thanks a lot @kmcarr, your suggestion was a great help. I looked at the other frames and found that 3'5' Frame 1 gave a clean sequence with no internal stop codons.
