SEQanswers

Old 07-04-2016, 07:12 AM   #1
DrYak
Member
 
Location: South Africa

Join Date: Sep 2013
Posts: 12
GC content of assembled transcripts

Hi,

I have a transcriptome assembly that is composed of a metazoan and a protist (intracellular symbiont) component. I know from the assemblies of similar organisms that the GC percentages of the two organisms are very different and can be used to separate contigs.

Is there a simple tool that I can use to (1) determine the GC content (%) of each contig in the final assembly (FASTA), (2) plot the histogram (or provide the bins for plotting), and (3) split the FASTA file based on GC content (or provide a list of contigs with associated GC % that can be sorted and split)?

I suppose 2 and 3 are a bit redundant because I can plot the histogram from the list of contigs and GC %.

Thanks,
Dave
Old 07-04-2016, 12:37 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Hi Dave,

The BBMap package has a tool called Stats (stats.sh), which will write the GC content of each scaffold to a file like this:

Code:
stats.sh in=assembly.fa gc=gc.txt
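
For readers who want to see what a per-contig GC listing involves, here is a minimal Python sketch. It is not BBMap's implementation and does not reproduce the format of gc.txt; the function names `read_fasta` and `gc_fraction` are made up for illustration, and the choice to ignore N and other ambiguous bases in the denominator is an assumption.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the contig name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq: str) -> float:
    """GC fraction over A, C, G and T only (assumption: N etc. are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


if __name__ == "__main__":
    # Tiny in-memory example; for a real file use open("assembly.fa").
    example = [">contig1 len=8", "ATGCGCGC", ">contig2", "ATATATAT", "ATGC"]
    for name, seq in read_fasta(example):
        print(f"{name}\t{len(seq)}\t{gc_fraction(seq):.4f}")
```

The tab-separated output (name, length, GC fraction) can be sorted or loaded into a spreadsheet to pick a cutoff.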
You can generate a GC histogram and then split the contigs into bins by GC content like this:

Code:
reformat.sh in=assembly.fa gchist=gchist.txt
reformat.sh in=assembly.fa out=low.fa maxgc=0.45
reformat.sh in=assembly.fa out=high.fa mingc=0.450001
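
The same two steps (counting contigs per GC bin, then splitting at a cutoff) can be sketched in plain Python. This is only an illustration under stated assumptions: `gc_histogram` and `split_by_gc` are hypothetical helpers, the contigs are held in an in-memory dict, and the 0.45 cutoff simply mirrors the example value used in the commands above. In practice the cutoff would be read off the histogram.

```python
from collections import Counter
from typing import Dict, List, Tuple


def gc_fraction(seq: str) -> float:
    """GC fraction over A, C, G and T only (assumption: N etc. are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def gc_histogram(contigs: Dict[str, str], bins: int = 20) -> List[Tuple[float, int]]:
    """Return (bin_start, contig_count) for equal-width bins over [0, 1]."""
    counts: Counter = Counter()
    for seq in contigs.values():
        # min() puts a GC fraction of exactly 1.0 into the last bin.
        counts[min(int(gc_fraction(seq) * bins), bins - 1)] += 1
    return [(i / bins, counts[i]) for i in range(bins)]


def split_by_gc(
    contigs: Dict[str, str], cutoff: float
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split contigs into (GC <= cutoff, GC > cutoff)."""
    low = {n: s for n, s in contigs.items() if gc_fraction(s) <= cutoff}
    high = {n: s for n, s in contigs.items() if gc_fraction(s) > cutoff}
    return low, high


if __name__ == "__main__":
    contigs = {"a": "ATAT", "b": "ATGC", "c": "GCGC", "d": "GGGA"}
    for start, count in gc_histogram(contigs, bins=4):
        print(f"{start:.2f}\t{count}")
    low, high = split_by_gc(contigs, cutoff=0.45)
    print(sorted(low), sorted(high))
```

Using `<=` and `>` on a single cutoff means every contig lands in exactly one of the two sets, which avoids the small gap between 0.45 and 0.450001.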
Old 07-04-2016, 10:54 PM   #3
DrYak
Member
 
Location: South Africa

Join Date: Sep 2013
Posts: 12

Quote:
Originally Posted by Brian Bushnell View Post
Hi Dave,

The BBMap package has a tool called Stats (stats.sh) which will list the gc content per scaffold to a file like this:
I must admit I was hoping, and quite frankly expecting, that BBMap would be capable of this. Once again, thank you for your excellent package...

Yours,
Dave
Tags
fasta, gc content, histogram, transcript annotation
