  • Finding SMALL peptides in de novo assembly

    Hi,

    I am trying to identify small peptides of the male accessory glands in a de novo transcriptome assembly of the medfly.
    I have 100 bp paired-end Illumina data from male and female sex organs. The idea is to combine the data for the assembly, map the reads back, and look for transcripts expressed specifically in males. With this subset I could try to BLAST against known sex peptides of other species, although they are not well conserved.

    The problem is the size of these peptides. Most of them are only 30-40 amino acids long. Since a 100 bp read pair can already span about 200 bp, any read pair that is not integrated into a longer transcript may show up as a contig of roughly this length. The peptide transcripts are therefore likely to be buried in the noise of the assembly.

    I wonder if anyone had to deal with a similar problem in the past and could give me a few hints on how to do such an analysis with very small transcripts.

    I don't want this first post to become too long to read. However, if it is of interest, I can give details on how the assembly and mapping were done in a follow-up.

    Thank you!

  • #2
    Just to clarify, you mean small transcripts rather than small peptides and sequencing nucleotides rather than amino acids, yes? You switch between protein and DNA/RNA nomenclature throughout, so it's difficult to be absolutely certain whether you're doing RNAseq or some sort of protein sequencing.

    • #3
      Yes, I meant small transcripts. In the end I want to translate the RNAseq assembly to find the peptide sequences. So the transcripts of interest will be 90 to 120 bp long (maybe a bit longer with UTR).

      Sorry for the confusion.

      • #4
        Since no one else has given any input, I'll give a little advice.

        Firstly, keep in mind that you can only look for a fraction of the possible peptides. Peptides can be divided into (in this case) 3 groups: small proteins arising from their own genes (e.g. insulin), cleaved proteins arising from larger genes (e.g. amyloid beta from APP), and synthesized small peptides (e.g. a lot of the neurotransmitters). You will only have a chance of finding the first group with RNAseq (unless you can predict cleavage sites, in which case you could maybe predict the second group as well).

        What I would do is to first do the de novo assembly, and then exclude any contigs >300bp (since you indicated that those are unlikely candidates). I would also filter contigs that are too small (otherwise, you'll get a bunch of small ncRNAs). I would then further filter things by minimum open reading frame length. Obviously, if a contig has an ORF of 30bp, then it's unlikely to meet your criterion. You might be able to further rank things by taking codon usage or similar characteristics into account (obviously you would want to see if these characteristics are predictive in other species). In short, see what properties would distinguish these sorts of peptides in other species where the peptides are known.
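
The filtering steps above could be sketched roughly as follows. This is only an illustration: the function names, the 100 bp lower bound, and the 30-codon minimum ORF are assumptions chosen to match the 30-40 amino acid peptides and the 300 bp cutoff mentioned in this thread, and the ORF search requires both an ATG and an in-frame stop codon, which would miss ORFs truncated at a contig end.

```python
# Sketch only: thresholds and names are illustrative assumptions, not a tested pipeline.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_codons(seq):
    """Longest ATG-to-stop ORF over all six frames, in codons (stop excluded)."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if codon == "ATG" and start is None:
                    start = i
                elif codon in STOPS:
                    if start is not None:
                        best = max(best, (i - start) // 3)
                    start = None
    return best


def candidate_contigs(contigs, min_len=100, max_len=300, min_orf_codons=30):
    """Keep contigs within the length window whose longest ORF is long enough.

    contigs: dict mapping contig name to sequence.
    Returns a dict mapping kept contig names to their longest ORF (in codons).
    """
    kept = {}
    for name, seq in contigs.items():
        if not (min_len <= len(seq) <= max_len):
            continue  # too short (likely small ncRNA or noise) or too long
        orf = longest_orf_codons(seq)
        if orf >= min_orf_codons:
            kept[name] = orf
    return kept
```

In practice the contigs would be read from the assembler's FASTA output, and the thresholds should be checked against species where the peptides are already known.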

        That's probably the best you can do without having any predicted conserved motifs. Presumably the real transcript would be present in a lot of copies, so you would want to filter by read coverage as well. If these peptides tend to have a similar structure, you might try to do a structure prediction on the ORFs and filter accordingly. Of course, this assumes that the actual peptide isn't just cleaved from something else...
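
A minimal sketch of the abundance filter, combined with the male-specific expression idea from the first post. The function name, the input layout, and both thresholds are assumptions for illustration, and the counts are assumed to be already normalized for library size.

```python
# Sketch only: the thresholds are arbitrary placeholders and need tuning on real data.
def rank_male_specific(counts, min_male_reads=50, min_male_fraction=0.9):
    """Rank contigs that are well covered and mostly supported by male reads.

    counts: dict mapping contig name to (male_reads, female_reads),
            assumed to be normalized for sequencing depth.
    Returns contig names sorted by male read count, highest first.
    """
    kept = []
    for name, (male, female) in counts.items():
        total = male + female
        if total == 0 or male < min_male_reads:
            continue  # too little support to distinguish from assembly noise
        if male / total >= min_male_fraction:
            kept.append((male, name))
    # Sort by descending male count, breaking ties by name for stable output.
    return [name for male, name in sorted(kept, key=lambda t: (-t[0], t[1]))]
```

The per-contig counts would come from mapping the male and female libraries back to the assembly separately.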

        Personally, I would look into screening proteins at the same time, since that'll probably be more informative (heck, a couple of 2D gels could probably narrow things down both in size and other characteristics).
