SEQanswers





05-19-2014, 06:02 AM   #21
nareshvasani
Member

Location: NC

Join Date: Apr 2013
Posts: 57

Hi Sabbir,

How did you confirm that the assemblies from MIRA4, Trinity and Velvet were not good enough?
I had a very big input file (FASTQ) that none of the assemblers handled well. So, to generate an input file that Trinity could handle well, I performed several steps on each FASTQ file, such as trimming and duplicate removal, and finally combined all the resulting FASTA files into one FASTA file.
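Combining the per-sample FASTA files can be as simple as concatenating them, but fastx_collapser names every sequence `>rank-count` (e.g. `>1-5000`), so the same IDs recur in every sample's file. A minimal Python sketch, assuming you want unique IDs in the merged file, that prefixes each header with a sample label; the function names and the prefix scheme are hypothetical and not necessarily what was done here:

```python
from pathlib import Path
from typing import Dict, List


def merge_fasta_texts(named_texts: Dict[str, str]) -> str:
    """Merge FASTA strings keyed by sample name; each header gets a 'sample_' prefix."""
    out_lines: List[str] = []
    for sample, text in named_texts.items():
        for line in text.splitlines():
            if not line.strip():
                continue  # skip blank lines
            if line.startswith(">"):
                # e.g. ">1-5000" from sample "A" becomes ">A_1-5000"
                out_lines.append(f">{sample}_{line[1:]}")
            else:
                out_lines.append(line)
    return "\n".join(out_lines) + "\n"


def merge_fasta_files(paths: List[str], out_path: str) -> None:
    """File-based wrapper (hypothetical): the sample label is each file's stem."""
    named = {Path(p).stem: Path(p).read_text() for p in paths}
    Path(out_path).write_text(merge_fasta_texts(named))
```

If your headers are already unique across samples, a plain concatenation gives the same sequences; the prefix only keeps IDs distinct and traceable to their sample.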


Below are the commands I used for Trinity [the same command as suggested on the Trinity website]:

1] Trinity.pl --seqType fa --min_contig_length 200 --JM 40G --CPU 4 --single inputfilename.fasta --output trinity_output

#### Took 2 days to run to completion ####

2] /bin/util/TrinityStats.pl Trinity.fasta   # prints basic statistics for the assembly file

Total trinity transcripts: 128578
Total trinity components: 43455
Contig N50: 862
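For reference, the Contig N50 above is the length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of that definition (not TrinityStats.pl's own code, whose exact conventions may differ slightly); the function names are hypothetical:

```python
from typing import List


def read_fasta_lengths(text: str) -> List[int]:
    """Return the sequence length of every record in a FASTA string."""
    lengths: List[int] = []
    current = None
    for line in text.splitlines():
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line.strip())  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs >= L hold at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly
```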


Hope this helps.
Best of luck,
Naresh


Quote:
Originally Posted by sabbir_barj
Hi nareshvasani,

I am curious about your assembly results with Trinity. I have Ion Proton data and I am trying to assemble the transcriptome, but I did not get good results with the MIRA4, Trinity and Velvet assemblers. Can you tell me the command line you used for Trinity and give some general information about your raw reads and assembly file? It would help me a lot. Thanks


sabbir
05-19-2014, 07:52 PM   #22
sabbir_barj
Junior Member

Location: Bangladesh

Join Date: Feb 2014
Posts: 4

Hi nareshvasani,

Thanks for your reply. All three assemblers gave me far more contigs than I expected. Based on the genome sequence of my species, there should be 30,000-40,000 contigs, but I got more than 200,000. The transcripts were also longer than I expected. I used the following assembly pipeline:
1. Remove adapters with cutadapt
2. Remove duplicates
3. Assemble with Trinity
I also used the Trinity quality options.

I also have a big input file (FASTQ). According to your suggestions, I need to run several tools to improve the input file. Can you please describe in detail the steps you mentioned, such as trimming and duplicate removal on each FASTQ file, and finally combining all the FASTA files into one FASTA file for assembly (the names of the tools and how to combine the FASTA files)?

Regards
Sabbir
05-20-2014, 12:46 PM   #23
nareshvasani
Member

Location: NC

Join Date: Apr 2013
Posts: 57

Hi Sabbir,

Below are the steps I used for trimming:

The following command keeps only reads in which at least 50% of the bases have a quality score of 20 or higher.
> fastq_quality_filter -Q33 -q 20 -p 50 -i <SAMPLE_NAME>.fastq -o <SAMPLE_NAME>.quality_filter.fastq
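To make the -q/-p rule concrete, here is a minimal Python sketch of the filter as described above (not the FASTX-Toolkit source): a read is kept when at least `min_percent`% of its bases have Phred quality >= `min_q`, with qualities decoded at offset 33 as -Q33 requests. The function names are hypothetical, and the parser assumes plain 4-line FASTQ records with no wrapped sequences.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, '+' line, quality string)


def parse_fastq(text: str) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes no line-wrapped sequences)."""
    lines = text.splitlines()
    for i in range(0, len(lines) - 3, 4):
        yield lines[i], lines[i + 1], lines[i + 2], lines[i + 3]


def passes_quality_filter(qual: str, min_q: int = 20, min_percent: int = 50,
                          offset: int = 33) -> bool:
    """True if at least min_percent % of bases have Phred quality >= min_q."""
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_q)
    return 100 * good >= min_percent * len(qual)


def filter_fastq(text: str, min_q: int = 20, min_percent: int = 50) -> List[Record]:
    """Keep only the records whose quality string passes the filter."""
    return [rec for rec in parse_fastq(text)
            if passes_quality_filter(rec[3], min_q, min_percent)]
```

In Phred+33, 'I' encodes quality 40 and '!' encodes 0, so a read with quality string `IIII!!!!` has exactly 50% good bases and is kept, while `III!!!!!` is dropped.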

The following command removes nucleotides with quality scores lower than 20 from the end of each read. Any read shorter than 50 nucleotides after trimming is discarded altogether:
> fastq_quality_trimmer -Q33 -t 20 -l 50 -i <SAMPLE_NAME>.quality_filter.fastq -o <SAMPLE_NAME>.clean.fastq
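A minimal Python sketch of the -t/-l behaviour described above, assuming trimming works only from the 3' end (as the FASTX-Toolkit usage text describes) and Phred+33 qualities; the function name is hypothetical:

```python
from typing import Optional, Tuple


def quality_trim(seq: str, qual: str, threshold: int = 20, min_len: int = 50,
                 offset: int = 33) -> Optional[Tuple[str, str]]:
    """Trim trailing bases with Phred quality < threshold; drop the read if too short."""
    end = len(qual)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    if end < min_len:
        return None  # read discarded entirely, as with the -l option
    return seq[:end], qual[:end]
```

A 60-nt read whose last 5 bases have quality 2 ('#') comes back as 55 nt; if 15 such bases were trimmed, the remaining 45 nt would fall below `min_len` and the read would be dropped.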

To remove the per-base sequence content and GC content bias at the ends of reads, the following command was used. It keeps bases 1 to 335 of each read, which removes the last 15 nucleotides from the longest reads:
> fastx_trimmer -Q33 -f 1 -l 335 -i <SAMPLE_NAME>.clean.fastq -o <SAMPLE_NAME>.fastx_trimmer.fastq
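The -f/-l options select a fixed window of positions (1-based, inclusive), so reads shorter than the -l value pass through unchanged. A minimal Python sketch of that behaviour; the function name is hypothetical:

```python
from typing import Tuple


def fixed_trim(seq: str, qual: str, first: int = 1, last: int = 335) -> Tuple[str, str]:
    """Keep bases first..last (1-based, inclusive), mirroring fastx_trimmer -f/-l."""
    start = first - 1  # convert the 1-based start to a 0-based slice index
    return seq[start:last], qual[start:last]
```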

After these steps, the read length distribution changed only minimally, with most reads retaining their full length. In addition, around 25% of the reads were discarded completely.

To remove identical sequences, the fastx_collapser tool was used:
> fastx_collapser -v -i <SAMPLE_NAME>.fastx_trimmer.fastq -o <SAMPLE_NAME>_collapsed.fasta
This step removed a few million reads from each file while preserving the read counts (each collapsed sequence's header records how many reads it represents), and it writes its output in FASTA format.
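A minimal Python sketch of what collapsing means here: identical sequences become one FASTA record whose header is `>rank-count`, most abundant first, so the original read total can still be recovered from the headers. The tie-breaking order and the function names are assumptions; the real tool may order equal counts differently.

```python
from collections import Counter
from typing import Iterable


def collapse(sequences: Iterable[str]) -> str:
    """Collapse identical sequences into FASTA with '>rank-count' headers."""
    counts = Counter(sequences)
    # Most abundant first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    lines = []
    for rank, (seq, count) in enumerate(ranked, start=1):
        lines.append(f">{rank}-{count}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


def total_reads(collapsed_fasta: str) -> int:
    """Sum the counts stored in '>rank-count' headers (the original read number)."""
    return sum(int(line.split("-", 1)[1])
               for line in collapsed_fasta.splitlines() if line.startswith(">"))
```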


## All of the above steps will help you reduce the size of the input file and improve the quality of each FASTQ file.



Best,
NareshVasani
05-20-2014, 03:52 PM   #24
GenoMax
Senior Member

Location: East Coast USA

Join Date: Feb 2008
Posts: 7,057

@sabbir: You are probably doing this already, but use the example command lines/settings supplied by Naresh as a guideline. You will have to experiment with your own data.

What is your expected genome size? Do you have an idea of the approximate fold coverage you have? If there is a reference genome available then alignment may be a better option to try than assembly.
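Approximate fold coverage is total sequenced bases divided by the size of the target. A minimal sketch; the numbers in the usage below (20 million reads of 150 nt against a 50 Mb target) are made up for illustration, not taken from this thread. For RNA-seq, coverage varies widely between transcripts, so a single figure is only a rough guide.

```python
def fold_coverage(num_reads: int, mean_read_length: float, target_size_bp: float) -> float:
    """Approximate fold coverage = total sequenced bases / target size in bp."""
    if target_size_bp <= 0:
        raise ValueError("target size must be positive")
    return num_reads * mean_read_length / target_size_bp


# Hypothetical example: 20 million reads x 150 nt over a 50 Mb target -> 60x
print(fold_coverage(20_000_000, 150, 50_000_000))
```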
05-21-2014, 05:18 AM   #25
sabbir_barj
Junior Member

Location: Bangladesh

Join Date: Feb 2014
Posts: 4

Hi NareshVasani,

Thanks for your kind reply. Did you run the different trimming tools on the same file and then merge the output files of each tool, or did you run them one after another?
For example, did you do this:
sample file > fastq_quality_filter > output file 1
sample file > fastq_quality_trimmer > output file 2
sample file > fastx_trimmer > output file 3
sample file > fastx_collapser > output file 4

and then merge all four files? Or did you run the tools one after another, e.g. first fastq_quality_filter, then fastq_quality_trimmer on the output of fastq_quality_filter, and so on through the remaining tools?

If you followed the first approach, I would like to know how you merged the files and why you did not use the FASTQ file.

Regards
Sabbir
05-21-2014, 06:08 AM   #26
nareshvasani
Member

Location: NC

Join Date: Apr 2013
Posts: 57

Hi Sabbir,


These are the commands I used for my data. You do not have to follow the same steps; use these commands as a reference and adapt them to your data [as GenoMax said].

I think you have not understood how the FASTX-Toolkit works. You need to read the toolkit's manual carefully.

The first method you described doesn't make any sense.
I used the second method.
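To illustrate the second (sequential) method, here is a Python sketch that builds the four commands for one sample so that each step reads the previous step's output. The flag values are the ones given in post #23; the intermediate file names and the helper names `build_commands`/`run_sample` are hypothetical, and actually running it requires the FASTX-Toolkit on your PATH. Each sample ends up as one collapsed FASTA file, and those per-sample FASTA files are what get combined into a single assembly input.

```python
import subprocess
from typing import List


def build_commands(sample: str) -> List[List[str]]:
    """Return the four fastx commands for one sample, each reading the previous output."""
    raw = f"{sample}.fastq"
    filtered = f"{sample}.quality_filter.fastq"
    clean = f"{sample}.clean.fastq"
    trimmed = f"{sample}.fastx_trimmer.fastq"
    collapsed = f"{sample}_collapsed.fasta"
    return [
        ["fastq_quality_filter", "-Q33", "-q", "20", "-p", "50", "-i", raw, "-o", filtered],
        ["fastq_quality_trimmer", "-Q33", "-t", "20", "-l", "50", "-i", filtered, "-o", clean],
        ["fastx_trimmer", "-Q33", "-f", "1", "-l", "335", "-i", clean, "-o", trimmed],
        ["fastx_collapser", "-v", "-i", trimmed, "-o", collapsed],
    ]


def run_sample(sample: str) -> str:
    """Run the steps in order, stopping on the first failure; return the final FASTA path."""
    commands = build_commands(sample)
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if a tool fails
    return commands[-1][-1]
```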

Best,
Nareshvasani


05-21-2014, 06:23 AM   #27
sabbir_barj
Junior Member

Location: Bangladesh

Join Date: Feb 2014
Posts: 4

Hi Nareshvasani,
Thanks a lot. You said before that you performed several steps, such as trimming and duplicate removal, on each FASTQ file and finally combined all the FASTA files into one FASTA file. If you followed the second method, how did you end up with several files, and how did you merge them? After all, you ran the tools one after another, with each tool working on the output file of the previous one.

Regards
Sabbir
Tags
bioinformatic analysis, ion torrent, rnaseq
