SEQanswers

Old 10-10-2013, 02:08 PM   #1
ripeapple
Junior Member
 
Location: virginia

Join Date: Oct 2013
Posts: 3
Default Trinity contig filtering

Hello, all,

I have just finished my Trinity assembly, and the N50 looks good. However, I have 100 contigs whose lengths range from 4,000 bp to 11,700 bp.
We don't expect contigs longer than 4,000 bp, so could these be caused by genomic contamination?
Can anyone suggest a program or script that can filter out the bad contigs? Thanks a lot,
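
For anyone who wants to check the same two numbers on their own assembly, here is a minimal Python sketch. It assumes you already have the contig lengths as a list of integers; the function names `n50` and `count_longer_than` are made up for illustration and do not come from Trinity.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def count_longer_than(lengths, cutoff):
    """Count contigs strictly longer than cutoff."""
    return sum(1 for length in lengths if length > cutoff)
```

Something like `count_longer_than(lengths, 4000)` gives the number of contigs above the expected size, which is worth looking at next to the N50 before deciding on any filtering.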
Old 10-10-2013, 10:22 PM   #2
atcghelix
Member
 
Location: CA

Join Date: Jul 2013
Posts: 74
Default

I'm not sure what those contigs would be. Maybe chimeras? Have you tried BLASTing them?

If you're just looking to get rid of all sequences of length 'x' or longer, you can use something like the script below (it needs BioPerl installed). Usage: perl script.pl --in startingfile.fas --cutoff 4000 --out prunedfile.fas


Code:
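#!/usr/bin/perl
# Keep only the FASTA records shorter than --cutoff bases (requires BioPerl).

use strict;
use warnings;
use Getopt::Long;
use Bio::SeqIO;

my $inFile;
my $cutoff;
my $outFile;

GetOptions  ("in=s"      => \$inFile,
             "cutoff=i"  => \$cutoff,
             "out=s"     => \$outFile) || die "Couldn't get parameters with Getopt::Long.\n";

# All three options are required; stop early if any of them is missing.
die "Usage: perl script.pl --in startingfile.fas --cutoff 4000 --out prunedfile.fas\n"
    unless defined $inFile && defined $cutoff && defined $outFile;

my $seqIn = Bio::SeqIO->new(-file   => $inFile,
                            -format => 'fasta');
my $seqOut = Bio::SeqIO->new(-file   => ">$outFile",
                             -format => 'fasta');

# Write out every sequence strictly shorter than the cutoff; drop the rest.
while (my $seq = $seqIn->next_seq()) {
    if ($seq->length() < $cutoff) {
        $seqOut->write_seq($seq);
    }
}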
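
If BioPerl isn't available, the same length filter can be sketched in plain Python with no third-party modules. This is only a rough equivalent, not a drop-in replacement: the function names (`read_fasta`, `split_by_length`, `write_fasta`, `main`) are made up for illustration, and unlike the Perl script it also writes the long contigs to a second file so they can be inspected (for example with BLAST) instead of being thrown away.

```python
import sys


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records, cutoff):
    """Return (kept, flagged): kept records are shorter than cutoff,
    flagged records are at or above it."""
    kept, flagged = [], []
    for header, seq in records:
        (kept if len(seq) < cutoff else flagged).append((header, seq))
    return kept, flagged


def write_fasta(records, handle, width=60):
    """Write records as FASTA, wrapping sequence lines at `width` bases."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


def main(argv):
    """Hypothetical CLI: main(["in.fas", "4000", "kept.fas", "flagged.fas"])."""
    in_path, cutoff, kept_path, flagged_path = argv
    with open(in_path) as fh:
        kept, flagged = split_by_length(read_fasta(fh), int(cutoff))
    with open(kept_path, "w") as out:
        write_fasta(kept, out)
    with open(flagged_path, "w") as out:
        write_fasta(flagged, out)
```

To run it as a script, call `main(sys.argv[1:])` at the bottom of the file. The cutoff comparison matches the Perl version: a contig of exactly the cutoff length is not kept.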

Last edited by atcghelix; 10-10-2013 at 10:30 PM. Reason: Mention that you need bioPerl for this to run.
Old 10-11-2013, 07:02 AM   #3
ripeapple
Junior Member
 
Location: virginia

Join Date: Oct 2013
Posts: 3
Default

Thanks, that helps.
Old 10-12-2013, 10:33 PM   #4
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286
Default

Quote:
Originally Posted by ripeapple View Post
Hello, all,

I have just finished my Trinity assembly, and the N50 looks good. However, I have 100 contigs whose lengths range from 4,000 bp to 11,700 bp.
We don't expect contigs longer than 4,000 bp, so could these be caused by genomic contamination?
Can anyone suggest a program or script that can filter out the bad contigs? Thanks a lot,
It's possible they're pre-spliced or incompletely spliced RNAs. However, why do you assume you shouldn't have contigs >4,000 bp? Ttn is >100,000 bp, so there are certainly genes this big.

Also, have you tried Trinity's downstream analysis modules? They are very good at picking out protein-coding ORFs and BLASTing your dataset to identify orthologs.
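
To give a rough idea of what "picking out protein-coding ORFs" means, here is a naive Python sketch that scans all six reading frames of a contig for the longest ATG-to-stop stretch. It is not what Trinity's downstream modules actually do (they score coding potential rather than just measuring length), and the names `reverse_complement` and `longest_orf` are made up for illustration. A long contig whose longest ORF covers most of its length is more likely to be a real transcript than genomic contamination.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N is left as N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf(seq):
    """Return the longest ATG-to-stop ORF (stop codon included) found in
    any of the six reading frames, or "" if there is none."""
    best = ""
    forward = seq.upper()
    for strand in (forward, reverse_complement(forward)):
        for frame in range(3):
            start = None  # index of the first ATG of the ORF currently open
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None  # close this ORF and look for the next ATG
    return best
```

Comparing `len(longest_orf(contig))` with `len(contig)` for each of the long contigs is a quick first check before running a proper ORF predictor or BLAST.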
Old 10-13-2013, 10:36 AM   #5
ripeapple
Junior Member
 
Location: virginia

Join Date: Oct 2013
Posts: 3
Default

Yes, you're right.
At first I was thinking of setting a contig size cutoff to remove the possible genomic contamination.
Now I guess I need to map the reads back to the assembly and see how it goes.
Thanks,
Tags
contig quality, contig size, quality filtering, trinity assembly