  • Should I trust in the Isotigs assembled by newbler in our experiment?

    Hi all,

    I perform de novo assemble of 454 reads by using gs assembler (newbler, version 2.3). This program could give output information about alternative splicing isoforms -- there can be more than one transcripts (isotigs) assembled for each gene (isogroup). I have got a summary listed below:

    Code:
    NumIsotigsInIsogroup    NumIsogroups
        1                      35
        2                      12
        3                      11
        5                       1
        6                       1
        7                       1
       12                       1
    That would be fine if they really were splice variants. However, we put only one clone from each gene into each deepwell for sequencing, so ideally there should be only one isotig per isogroup. I had a look at the report file (454IsotigsLayout.txt). It contains entries like this:
    Code:
    ...
    >isogroup00004  numIsotigs=6  numContigs=6
      Length :  4      1329   700    1074   16     196    (bp)
      Contig :  00016  00110  00133  00017  00134  00156  Total:
    isotig00022 >>>>>                >>>>>         <<<<<  1274
    isotig00023                      >>>>>         <<<<<  1270
    isotig00024        <<<<<                       <<<<<  1525
    isotig00025               >>>>>         >>>>>         716
    isotig00026 >>>>>                >>>>>                1078
    isotig00027        >>>>>                              1329
    ...
    The only difference between isotig00022 and isotig00023 is that the former has an additional contig, which is only 4 nt in length! I don't think there is an exon of only 4 nucleotides...
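
The lengths in the layout can be cross-checked with a little arithmetic. The sketch below assumes that each isotig total is simply the sum of its contigs' lengths (an assumption, though it holds for every row here) and searches for the contig combination that matches each reported total. The helper name `contig_sets_matching` is made up for this illustration.

```python
from itertools import combinations

# Contig lengths (bp) copied from the isogroup00004 layout.
contig_len = {"00016": 4, "00110": 1329, "00133": 700,
              "00017": 1074, "00134": 16, "00156": 196}

# Reported isotig totals (bp) from the same layout.
isotig_total = {"isotig00022": 1274, "isotig00023": 1270,
                "isotig00024": 1525, "isotig00025": 716,
                "isotig00026": 1078, "isotig00027": 1329}


def contig_sets_matching(total, lengths):
    """Return every combination of contigs whose lengths sum to `total`."""
    names = sorted(lengths)
    hits = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            if sum(lengths[c] for c in combo) == total:
                hits.append(combo)
    return hits


# Assumption: an isotig's total is the plain sum of its contig lengths.
paths = {iso: contig_sets_matching(total, contig_len)
         for iso, total in isotig_total.items()}

for iso, matches in paths.items():
    print(iso, matches)

# Contigs present in isotig00022 but absent from isotig00023.
extra = set(paths["isotig00022"][0]) - set(paths["isotig00023"][0])
print("extra contig(s):", extra, [contig_len[c] for c in extra])
```

Each total has exactly one matching combination for this isogroup, and the only contig that separates isotig00022 from isotig00023 is the 4 bp contig00016.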

    My question is: should I trust the isoforms generated by Newbler? If yes, we must have picked more than one clone by mistake for the genes with multiple assembled transcripts. If no, what can I do to "optimize" the results? Can I pairwise align all the isotigs in one isogroup and collapse similar isotigs (like isotig00022 and isotig00023) into one?
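
One crude way to collapse near-identical isotigs, short of a real pairwise alignment, is to compare their contig paths while ignoring very short contigs. The sketch below is only an illustration of that idea, not a Newbler feature. The contig membership of each isotig is inferred from the reported totals, the `MIN_CONTIG_LEN` cutoff of 10 bp is an arbitrary assumption, and contig orientation is ignored.

```python
from collections import defaultdict

# Hypothetical cutoff: contigs shorter than this are treated as noise.
MIN_CONTIG_LEN = 10

# Contig lengths (bp) from the isogroup00004 layout.
contig_len = {"00016": 4, "00110": 1329, "00133": 700,
              "00017": 1074, "00134": 16, "00156": 196}

# Contig membership per isotig, inferred from the reported totals
# (assumption: each total is the sum of the contig lengths).
isotig_path = {
    "isotig00022": ["00016", "00017", "00156"],
    "isotig00023": ["00017", "00156"],
    "isotig00024": ["00110", "00156"],
    "isotig00025": ["00133", "00134"],
    "isotig00026": ["00016", "00017"],
    "isotig00027": ["00110"],
}


def collapse(paths, lengths, min_len):
    """Group isotigs whose paths are identical once short contigs are dropped."""
    groups = defaultdict(list)
    for iso in sorted(paths):
        key = tuple(c for c in paths[iso] if lengths[c] >= min_len)
        groups[key].append(iso)
    return dict(groups)


groups = collapse(isotig_path, contig_len, MIN_CONTIG_LEN)
for key, members in groups.items():
    print(key, members)
```

With this cutoff, isotig00022 and isotig00023 fall into the same group, while the other four isotigs stay separate. Whether a given short contig is a sequencing artifact or real sequence still has to be checked against the reads.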

    This is the first time I have worked with next-generation sequencing data... Any suggestions are appreciated.

    Thanks in advance!
    Last edited by sulicon; 09-23-2010, 08:06 AM.

  • #2
    Unfortunately, the 454IsotigsLayout.txt excerpt you copied doesn't align. Could you paste it with 'code' tags around it?

    Other than that, it does sound strange that you get more than one isotig for each transcript. You really handpicked clones and sequenced them pooled? Could there be paralogs (multiple members of the same gene family)?

    You could try increasing the alignment stringency by going higher for -mi and -ml, this might get paralogs split into separate isogroups...



    • #3
      The only difference between isotig00022 and isotig00023 is that the former has an additional contig, which is only 4 nt in length! I don't think there is an exon of only 4 nucleotides...
      This can happen sometimes as a result of sequencing errors, especially in homopolymer regions - a few reads will have a few bogus nucs and newbler decides it's a mini-exon.



      • #4
        Thanks for your suggestion. I have edited the alignment.

        Our collaborators performed the experiments. I was told that clones from paralogs were put into different "deepwells". However, some isotigs do come from genes that share some similarity with our target genes at both ends, so they were probably amplified by non-specific primer-target interaction... And in some cases, more than one clone was picked into one "deepwell"; we concluded this from Sanger sequencing some of the clones.

        They suggested that we use the "-genomic" (default) option rather than "-cdna" for this analysis, so that Newbler could make "isotigs of an isogroup into one contig if possible". They also suggested higher -mi (94%) and -ml (50) parameters for the assembly. This way I no longer get multiple isotigs from a single gene, but I have to deal with "segment contigs": I think that where multiple isoforms really were sequenced for one gene, Newbler has to split them into segments when the "-genomic" option is used.

        Maybe I should just discard these isotigs (when the -cdna option is used) or segments (when the -genomic option is used). It would be fine if Newbler could assemble all the isoforms correctly (using -cdna), but I'm afraid there would be many artificial ones, and some unrelated isotigs (from different genes) have been grouped together because some vector sequence remained (I don't know why these vector sequences failed to be trimmed...)



        • #5
          Originally posted by cram
          This can happen sometimes as a result of sequencing errors, especially in homopolymer regions - a few reads will have a few bogus nucs and newbler decides it's a mini-exon.
          Thanks for your explanation
