Dear Bioinformatics community,
I was recently put on a project to identify plant microRNAs in animal tissues. Overall, I would like to sift through mouse and human small RNA deep-sequencing raw data and see whether any plant (dietary) microRNAs are present in animal tissues (for the scientific rationale, see: Cell Research (2012) 22:107–126. doi:10.1038/cr.2011.158; published online 20 September 2011).
I am a plant biology postdoc and have limited scripting skills. But I took a bioinformatics class recently and am pretty familiar with the Linux command line. I also learned some C programming a long time ago, so I can probably use publicly available Python or Perl scripts on Linux.
But I need some help with the overall workflow design and the choice of tools for each step of processing and analyzing the small RNA data.
Our mouse and human sRNA deep-sequencing data was generated on an Illumina HiSeq 2000, using the TruSeq sRNA library kit (50 cycles, single-end sequencing). Our current data is in FASTQ format. I don't think any QC has been done on it.
What I would like to achieve is as follows:
1) I want to run QC and filter out low-quality sequence tags.
2) With the high-quality tags, I want to:
A) Trim 3' adapters.
B) Remove really short tags (e.g. less than 8 bp).
C) Remove tags that resulted from adapter dimers.
3) With the small RNA tags from step 2), I want to:
A) Cluster the identical tags and count them.
B) Cluster homologous tags? (Not sure; should I do this?)
C) Do a length distribution analysis.
4) With the unique, non-redundant small RNA tags, I would like to map them to:
A) Known animal tRNA, rRNA, snRNA and snoRNA databases.
B) Known animal microRNA databases.
5) With the unique tags left unmapped after step 4), I would like to map them to a plant microRNA database to see whether any of them are plant miRNAs.
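To make steps 1) to 3) concrete, here is a minimal Python sketch of the logic I have in mind. It is only an illustration, not a replacement for a proper tool. The adapter sequence is what I believe the TruSeq small RNA 3' adapter to be and should be checked against the kit documentation. The Phred+33 quality encoding, the mean-quality cutoff of 20 and all function names are assumptions made for the example.

```python
from collections import Counter

# Assumed 3' adapter for the TruSeq small RNA kit; verify against the kit documentation.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"
MIN_LEN = 8        # shortest tag to keep (step 2B)
MIN_MEAN_Q = 20    # hypothetical mean Phred cutoff (step 1)


def read_fastq(lines):
    """Yield (sequence, quality) pairs from FASTQ lines, 4 lines per record."""
    it = iter(lines)
    for _header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield seq, qual


def mean_quality(qual, offset=33):
    """Mean Phred score of a read, assuming Phred+33 encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def trim_adapter(seq, adapter=ADAPTER, min_overlap=6):
    """Return the insert before the 3' adapter, or None if no adapter is found."""
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Fall back to a partial adapter sitting at the very end of the read.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return None


def collapse_tags(lines):
    """Steps 1 to 3A: quality filter, adapter trim, length filter, count identical tags."""
    counts = Counter()
    for seq, qual in read_fastq(lines):
        if mean_quality(qual) < MIN_MEAN_Q:
            continue  # step 1: low-quality read
        insert = trim_adapter(seq)
        # Adapter dimers trim down to an empty (or tiny) insert, so the
        # length filter removes them together with the very short tags.
        if insert is None or len(insert) < MIN_LEN:
            continue
        counts[insert] += 1
    return counts


def length_distribution(counts):
    """Step 3C: number of reads per tag length, weighted by tag count."""
    dist = Counter()
    for tag, n in counts.items():
        dist[len(tag)] += n
    return dist
```

With a hypothetical file `reads.fastq`, this would be used as `counts = collapse_tags(open("reads.fastq"))` followed by `length_distribution(counts)`. Reads with no detectable adapter are discarded here, which is a design choice for 50-cycle reads of short inserts and may not suit every library.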
Can you suggest tools for each step? They need to be publicly available; we are kind of poor and don't have access to commercial tools.
I have read through many threads in this forum and have a general idea of some tools, such as miRDeep2 or miRkey for mapping. But I would like to get a better opinion or some guidance from you experienced folks.
I have a basic laptop with linux 10.0.4 installed. My impression is that the type of analysis I want to do does not need a server. Is that true?
Thanks a lot!
And sorry for the long post.
Jian