  • RNA de novo assembly - blasts - KEGG - GO

    Hello,

    I am a PhD candidate in bioinformatics with (almost) no guidance, so I am seeking help here. I was asked to do a de novo transcriptome assembly from total RNA sequencing data. After running FastQC I trimmed my original FASTQ files and then ran Trinity, which gave me trinity_trimmed.fasta. Some of the things I was asked to do are:

    1) Fill out a table like this one:

    |         | Total number | Total length (nt) | Mean length (nt) | N50 | Total consensus sequences | Distinct clusters | Distinct singletons |
    | Contig  |              |                   |                  |     |                           |                   |                     |
    | Unigene |              |                   |                  |     |                           |                   |                     |

    I ran TrinityStats.pl on trinity_trimmed.fasta and got this:

    ## Counts of transcripts, etc.
    ################################
    Total trinity 'genes': 87177
    Total trinity transcripts: 169974
    Percent GC: 40.18

    ########################################
    Stats based on ALL transcript contigs:
    ########################################

    Contig N10: 3290
    Contig N20: 2503
    Contig N30: 2049
    Contig N40: 1713
    Contig N50: 1413

    Median contig length: 529
    Average contig: 869.67
    Total assembled bases: 147821426

    #####################################################
    ## Stats based on ONLY LONGEST ISOFORM per 'GENE':
    #####################################################

    Contig N10: 3087
    Contig N20: 2301
    Contig N30: 1816
    Contig N40: 1414
    Contig N50: 1029

    Median contig length: 348
    Average contig: 632.11
    Total assembled bases: 55105774

    My question has two parts: a) Can I fill out this table with this information? b) Some people use the CAP3 assembly tool, and I have already run it in case I need it. Is that the way to go? Do I need to check the quality of trinity_trimmed.fasta?

    For the CAP3 output I also ran TrinityStats.pl and got this:

    For contigs:

    Total trinity 'genes': 23017
    Total trinity transcripts: 23017
    Percent GC: 40.42

    ########################################
    Stats based on ALL transcript contigs:
    ########################################

    Contig N10: 3885
    Contig N20: 3082
    Contig N30: 2598
    Contig N40: 2254
    Contig N50: 1971

    Median contig length: 1318
    Average contig: 1522.23
    Total assembled bases: 35037102

    - note: not reporting gene-based longest isoform info since couldn't parse Trinity accession info.

    For singletons:

    ## Counts of transcripts, etc.
    ################################
    Total trinity 'genes': 67695
    Total trinity transcripts: 81478
    Percent GC: 38.77

    ########################################
    Stats based on ALL transcript contigs:
    ########################################

    Contig N10: 1906
    Contig N20: 1347
    Contig N30: 1007
    Contig N40: 751
    Contig N50: 572

    Median contig length: 333
    Average contig: 490.70
    Total assembled bases: 39981353

    #####################################################
    ## Stats based on ONLY LONGEST ISOFORM per 'GENE':
    #####################################################

    Contig N10: 1853
    Contig N20: 1284
    Contig N30: 917
    Contig N40: 671
    Contig N50: 508

    Median contig length: 317
    Average contig: 461.01
    Total assembled bases: 31207973
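
For reference, the first four columns of the table in item 1 (total number, total length, mean length, N50) can be recomputed directly from any of the FASTA files above. The following is only a minimal sketch, assuming N50 means the contig length at which the cumulative length of contigs, sorted longest first, reaches half of the total assembled bases. The function names (`read_fasta_lengths`, `nx`, `summarize`) are hypothetical helpers, not part of Trinity or CAP3, and the sketch does not cover the "consensus sequences", "clusters", or "singletons" columns.

```python
def read_fasta_lengths(lines):
    """Return the sequence length of every record in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths, fraction=0.5):
    """Length L such that contigs of length >= L hold at least `fraction` of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total * fraction:
            return length
    return 0  # empty input


def summarize(lengths):
    """Collect the length-based table columns for one FASTA file."""
    total = sum(lengths)
    return {
        "total_number": len(lengths),
        "total_length_nt": total,
        "mean_length_nt": total / len(lengths) if lengths else 0.0,
        "n50": nx(lengths, 0.5),
    }


# Example usage (file name taken from the post):
# with open("trinity_trimmed.fasta") as handle:
#     print(summarize(read_fasta_lengths(handle)))
```

Running it on the same file that was given to TrinityStats.pl is a quick way to check that the two agree before copying numbers into the table.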


    2) Provide the blastp/blastx results as Excel files.

    Should I use -outfmt 16?

    (Is hmmscan/Pfam also needed for KEGG/GO terms?)
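
On the Excel part of item 2: in BLAST+, `-outfmt 16` is single-file XML2, which Excel does not open as a table, whereas `-outfmt 6` (tab-separated) and `-outfmt 10` (comma-separated) are tabular. The sketch below assumes the search was run with `-outfmt 6` and its default twelve columns, and adds a header row so the file opens cleanly in a spreadsheet. The function name `blast6_to_csv` is hypothetical; if a custom column list was passed to `-outfmt 6`, the `columns` argument needs to be changed to match.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blast6_to_csv(tab_lines, out_handle, columns=BLAST6_COLUMNS):
    """Write tab-separated BLAST hits to out_handle as CSV with a header row.

    Returns the number of hit rows written. Blank lines and '#' comment
    lines (as produced by -outfmt 7) are skipped.
    """
    writer = csv.writer(out_handle)
    writer.writerow(columns)
    written = 0
    for line in tab_lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(columns):
            raise ValueError(
                f"expected {len(columns)} columns, got {len(fields)}: {line!r}"
            )
        writer.writerow(fields)
        written += 1
    return written


# Example usage (file names are placeholders):
# with open("blastx.outfmt6") as src, open("blastx.csv", "w", newline="") as dst:
#     blast6_to_csv(src, dst)
```

Values are written as text, so e-values such as `1e-100` are passed through unchanged. Whether tabular output is also sufficient for the downstream annotation tools mentioned in item 3 is a separate question that this sketch does not address.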

    3) Do a KEGG and GO analysis. Should I annotate the assembly (but which one, trinity_trimmed.fasta or the CAP3 one?) using Trinotate and then use GOseq for GO? Or could I use Blast2GO (7-day trial version) with the blastx/blastp files in -outfmt 16? For KEGG, should I also use Blast2GO, or could I use something like this: https://www.kegg.jp/blastkoala/ ?

    I know this was long, sorry about that.
