SEQanswers

03-18-2020, 02:58 AM   #1
polaxgr
Junior Member
 
Location: Athens

Join Date: Mar 2018
Posts: 6
RNA de novo assembly - blasts - KEGG - GO

Hello,

I am a PhD candidate in bioinformatics with (almost) zero guidance, so I am seeking help here. I was asked to do a de novo transcriptome assembly from total RNA sequencing data. After FastQC I trimmed my original FASTQ files and then ran Trinity, which gave me trinity_trimmed.fasta. Some of the things I was asked to do are:

1) Fill out a table like this one:

        | total number | total length (nt) | mean length (nt) | N50 | total consensus sequences | distinct clusters | distinct singletons
Contig  |
Unigene |

I ran TrinityStats.pl on trinity_trimmed.fasta and got this:

## Counts of transcripts, etc.
################################
Total trinity 'genes': 87177
Total trinity transcripts: 169974
Percent GC: 40.18

########################################
Stats based on ALL transcript contigs:
########################################

Contig N10: 3290
Contig N20: 2503
Contig N30: 2049
Contig N40: 1713
Contig N50: 1413

Median contig length: 529
Average contig: 869.67
Total assembled bases: 147821426

#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################

Contig N10: 3087
Contig N20: 2301
Contig N30: 1816
Contig N40: 1414
Contig N50: 1029

Median contig length: 348
Average contig: 632.11
Total assembled bases: 55105774
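For anyone who wants to recompute the length columns of the table straight from a FASTA file, here is a minimal Python sketch. It assumes the common N50 definition (the contig length at which the longest contigs together hold at least half of all assembled bases); tie-breaking details may differ from what TrinityStats.pl does. The function names and the toy sequences are made up for illustration.

```python
from io import StringIO


def read_fasta_lengths(handle):
    """Return the length of every sequence in a FASTA file-like object."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def summarize(lengths):
    """Collect the length-based columns of the table."""
    total = sum(lengths)
    return {
        "total_number": len(lengths),
        "total_length_nt": total,
        "mean_length_nt": round(total / len(lengths), 2) if lengths else 0.0,
        "N50": n50(lengths),
    }


# Toy input (invented sequences of length 10, 6 and 4), not real data.
toy_fasta = StringIO(">a\nACGTACGTAC\n>b\nACGTAC\n>c\nACGT\n")
stats = summarize(read_fasta_lengths(toy_fasta))
print(stats)
```

On a real assembly, the same functions could be called with `open("trinity_trimmed.fasta")` in place of the `StringIO` object.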

My question has two parts: a) Can I fill out this table with this information? b) Some people use the CAP3 assembly tool; I have already run it too in case I need it. Is that the way to go? Do I need to check the quality of trinity_trimmed.fasta?

For the CAP3 output I also used TrinityStats.pl and got this:

For contigs:

Total trinity 'genes': 23017
Total trinity transcripts: 23017
Percent GC: 40.42

########################################
Stats based on ALL transcript contigs:
########################################

Contig N10: 3885
Contig N20: 3082
Contig N30: 2598
Contig N40: 2254
Contig N50: 1971

Median contig length: 1318
Average contig: 1522.23
Total assembled bases: 35037102

- note: not reporting gene-based longest isoform info since couldn't parse Trinity accession info.

For singletons:

## Counts of transcripts, etc.
################################
Total trinity 'genes': 67695
Total trinity transcripts: 81478
Percent GC: 38.77

########################################
Stats based on ALL transcript contigs:
########################################

Contig N10: 1906
Contig N20: 1347
Contig N30: 1007
Contig N40: 751
Contig N50: 572

Median contig length: 333
Average contig: 490.70
Total assembled bases: 39981353

#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################

Contig N10: 1853
Contig N20: 1284
Contig N30: 917
Contig N40: 671
Contig N50: 508

Median contig length: 317
Average contig: 461.01
Total assembled bases: 31207973
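To compare these reports side by side, the numbers can be pulled out of the TrinityStats.pl text with a small parser. The sketch below is only an illustration: the function name and section keys are made up, the sample text is abridged from the Trinity report above, and mapping the "ALL transcript contigs" section to the "Contig" row of the table is an assumption, not something the table's author has confirmed.

```python
import re


def parse_trinity_stats(text):
    """Parse a TrinityStats.pl-style report into {section: {label: number}}."""
    sections = {"counts": {}}
    current = "counts"
    for raw in text.splitlines():
        line = raw.strip()
        if "ONLY LONGEST ISOFORM" in line:
            current = "longest_isoform"
        elif "ALL transcript contigs" in line:
            current = "all_transcripts"
        else:
            # Keep only "Label: number" lines; banners and notes are skipped.
            match = re.match(r"^([A-Za-z][^:]*):\s*([0-9.]+)$", line)
            if match:
                label, value = match.groups()
                number = float(value) if "." in value else int(value)
                sections.setdefault(current, {})[label] = number
    return sections


# Abridged copy of the Trinity report from this post.
report = """\
Total trinity 'genes': 87177
Total trinity transcripts: 169974
Percent GC: 40.18
Stats based on ALL transcript contigs:
Contig N50: 1413
Median contig length: 529
Average contig: 869.67
Total assembled bases: 147821426
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
Contig N50: 1029
Median contig length: 348
Average contig: 632.11
Total assembled bases: 55105774
"""

parsed = parse_trinity_stats(report)
all_tx = parsed["all_transcripts"]

# Assumed mapping of report fields onto the table columns.
contig_row = {
    "total number": parsed["counts"]["Total trinity transcripts"],
    "total length (nt)": all_tx["Total assembled bases"],
    "mean length (nt)": all_tx["Average contig"],
    "N50": all_tx["Contig N50"],
}
print(contig_row)
```

As a sanity check, total assembled bases divided by the transcript count (147821426 / 169974) agrees with the reported average contig length of 869.67.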


2) Deliver the blastp/blastx results as Excel files.

Should I use -outfmt 16?

(Is hmmscan/Pfam also needed for KEGG/GO terms?)
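If the only goal is a file that opens in Excel, one option that avoids parsing the XML written by -outfmt 16 is to run BLAST with tabular output (-outfmt 6) and add a header row. The sketch below swaps in that alternative and assumes the default 12-column -outfmt 6 layout; the function name and the single hit line are invented for illustration.

```python
import csv
from io import StringIO

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blast6_to_csv(tab_handle, csv_handle):
    """Copy tab-separated BLAST hits to CSV with a header row; return the hit count."""
    writer = csv.writer(csv_handle)
    writer.writerow(BLAST6_COLUMNS)
    hits = 0
    for line in tab_handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        fields = line.split("\t")
        if len(fields) != len(BLAST6_COLUMNS):
            raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
        writer.writerow(fields)
        hits += 1
    return hits


# Invented example hit, not a real BLAST result.
example_hit = "\t".join([
    "query_1", "subject_1", "98.50", "200", "3", "0",
    "1", "600", "10", "209", "1e-100", "370",
])
source = StringIO(example_hit + "\n")
target = StringIO()
n_hits = blast6_to_csv(source, target)
print(target.getvalue())
```

With real files, `source` would be the opened BLAST output and `target` a file opened with `newline=""`, as the `csv` module documentation recommends; Excel can then open the resulting .csv directly.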

3) Do a KEGG and GO analysis. Should I annotate the assembly (but which one, trinity_trimmed.fasta or the CAP3 one?) using Trinotate and then use GOseq for GO? Or could I use Blast2GO (7-day trial version) with the blastx/blastp files in -outfmt 16? For KEGG, should I also use Blast2GO, or could I use something like https://www.kegg.jp/blastkoala/ ?

I know this was long, sorry about that.
All times are GMT -8.