Thread | Thread Starter | Forum | Replies | Last Post |
Contig vs contig or map against contig lib? | JackieBadger | Bioinformatics | 1 | 05-30-2016 06:34 AM |
Trinity paralog filtering | atcghelix | Bioinformatics | 4 | 07-16-2013 11:13 PM |
Trinity contig filtering | jkerry | RNA Sequencing | 0 | 02-16-2013 10:28 AM |
Trinity Help | dmacmillan | De novo discovery | 5 | 05-11-2012 12:20 PM |
SRMA Problem SAMRecord contig does not match the current reference sequence contig | gavin.oliver | Bioinformatics | 5 | 07-05-2011 06:28 AM |
#1 |
Junior Member
Location: virginia Join Date: Oct 2013
Posts: 3
Hello, all,
I just got my Trinity assembly and the N50 looks good. However, I have 100 contigs whose lengths range from 4000 bp to 11700 bp. We don't expect contigs longer than 4000 bp, so could this be due to genomic contamination? Can anyone suggest a program or script to filter out the bad contigs? Thanks a lot,
#2 |
Member
Location: CA Join Date: Jul 2013
Posts: 74
I'm not sure what those contigs would be. Maybe chimeras? Have you tried blasting them?
If you're just looking to get rid of all sequences of length 'x' or more, you can do something like the script below (if you have BioPerl installed).

Usage: perl script.pl --in startingfile.fas --cutoff 4000 --out prunedfile.fas

Code:
```perl
#!/usr/bin/perl
# Keep only sequences shorter than --cutoff (requires BioPerl).
use strict;
use warnings;
use Getopt::Long;
use Bio::SeqIO;

my $inFile;
my $cutoff;
my $outFile;

GetOptions ("in=s"     => \$inFile,
            "cutoff=i" => \$cutoff,
            "out=s"    => \$outFile)
    || die "Couldn't get parameters with Getopt::Long.\n";

# Stop early if any of the three options is missing.
die "Usage: perl script.pl --in <fasta> --cutoff <length> --out <fasta>\n"
    unless defined $inFile && defined $cutoff && defined $outFile;

# Open the input FASTA for reading and the output FASTA for writing.
my $seqIn  = Bio::SeqIO->new(-file => $inFile,     -format => 'fasta');
my $seqOut = Bio::SeqIO->new(-file => ">$outFile", -format => 'fasta');

# Write out every sequence strictly shorter than the cutoff.
while (my $seq = $seqIn->next_seq()) {
    if ($seq->length() < $cutoff) {
        $seqOut->write_seq($seq);
    }
}
```

Last edited by atcghelix; 10-10-2013 at 10:30 PM. Reason: Mention that you need bioPerl for this to run.
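If BioPerl isn't available, the same length filter can be sketched in plain Python. This is only an illustration, not a replacement for the script above: the function names `read_fasta` and `split_by_length` are made up for this example, and instead of discarding the long contigs it writes them to a second handle so they can still be BLASTed afterwards.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(in_handle, short_handle, long_handle, cutoff):
    """Write contigs shorter than cutoff to short_handle, all others to long_handle.

    Returns (number of short contigs, number of long contigs).
    """
    n_short = n_long = 0
    for header, seq in read_fasta(in_handle):
        if len(seq) < cutoff:
            out = short_handle
            n_short += 1
        else:
            out = long_handle
            n_long += 1
        out.write(f">{header}\n{seq}\n")
    return n_short, n_long


# Tiny in-memory demo; with real files, pass handles from open(...) instead.
demo = io.StringIO(">c1\nACGT\n>c2\nACGTACGTAC\n")
kept, flagged = io.StringIO(), io.StringIO()
counts = split_by_length(demo, kept, flagged, cutoff=8)
```

With real data the three handles would come from something like `open("startingfile.fas")`, `open("prunedfile.fas", "w")` and a third output file for the long contigs.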
#3 |
Junior Member
Location: virginia Join Date: Oct 2013
Posts: 3
Thanks, it helps
#4 |
Senior Member
Location: San Francisco, CA Join Date: Feb 2011
Posts: 286
Also, have you tried Trinity's downstream analysis modules? They are very good at picking out protein-coding ORFs and at BLASTing your dataset to identify orthologs.
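To give a rough idea of what "picking out ORFs" means, here is a toy sketch that finds the longest ATG-to-stop open reading frame on either strand of a contig. It is not how Trinity's downstream tools work internally; the function `longest_orf` is hypothetical, it only uses the standard stop codons, and it assumes unambiguous A/C/G/T bases.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Return the longest ATG...stop ORF (stop codon included) on either strand.

    Returns an empty string if no complete ORF is found.
    """
    best = ""
    forward = seq.upper()
    for strand in (forward, reverse_complement(forward)):
        for frame in range(3):
            start = None
            # Walk the strand codon by codon in this reading frame.
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None
    return best
```

A long contig with no sizeable ORF on either strand would be one to look at more closely, though that alone doesn't prove it is contamination.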
#5 |
Junior Member
Location: virginia Join Date: Oct 2013
Posts: 3
Yes, you're right,
My first thought was to remove the possible genomic contamination by setting a cutoff on contig size. Now I guess I need to map the sequences back to the assembly and see how it goes. Thanks,
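One way to look at the result of mapping back, assuming that means aligning the reads to the contigs and running `samtools idxstats` on the sorted, indexed BAM, is to compute mapped reads per kb for the long contigs. This is only a sketch: the function name `reads_per_kb` and the 4000 bp default are assumptions for this example, and it relies on the idxstats output being tab-separated as contig name, contig length, mapped reads, unmapped reads.

```python
def reads_per_kb(idxstats_lines, min_length=4000):
    """Return (name, length, mapped reads per kb) for contigs >= min_length.

    idxstats_lines: iterable of lines from `samtools idxstats` output.
    Results are sorted from the least to the most covered contig.
    """
    rows = []
    for line in idxstats_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, length, mapped, _unmapped = line.split("\t")
        if name == "*":  # summary line for reads without a reference
            continue
        length, mapped = int(length), int(mapped)
        if length >= min_length:
            rows.append((name, length, mapped * 1000 / length))
    return sorted(rows, key=lambda row: row[2])
```

Usage would be something like `reads_per_kb(open("idxstats.txt"))`, where `idxstats.txt` is a hypothetical file holding the saved idxstats output. Long contigs with very few supporting reads per kb are the ones worth inspecting first.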
Tags |
contig quality, contig size, quality filtering, trinity assembly |