  • Help needed for smallRNA deepseq data processing & analysis

    Dear Bioinformatics community,

    I have recently been put on a project to identify plant microRNAs in animal tissues. In short, I would like to sift through raw mouse and human small RNA deep-sequencing data to see whether any plant (dietary) microRNAs are present in animal tissues (for the scientific rationale, see Cell Research (2012) 22:107–126, doi:10.1038/cr.2011.158, published online 20 September 2011).

    I am a plant biology postdoc with limited scripting skills. However, I recently took a bioinformatics class and am fairly familiar with the Linux command line. I also learned some C programming a long time ago, so I can probably use publicly available Python or Perl scripts on Linux.

    But I need some help with the overall workflow design and with choosing tools for each step of processing and analyzing the small RNA data.
    Our mouse and human sRNA deep-sequencing data were generated on an Illumina HiSeq 2000 using the TruSeq sRNA library kit (50 cycles, single-end sequencing). The data are currently in FASTQ format, and as far as I know no QC has been done on them.

    What I would like to achieve is as follows:
    1) I want to run QC and filter out low-quality sequence tags.

    2) With the high quality tags, I want to:
    A) trim 3' adaptors.
    B) clean up very short tags (e.g. shorter than 8 bp).
    C) remove tags resulting from adaptor dimers.

    3) With the small RNA tags from step 2), I want to:
    A) cluster identical tags and count them.
    B) cluster homologous tags? (I'm not sure whether I should do this.)
    C) do a length distribution analysis.

    4) I would like to map the unique, non-redundant small RNA tags to:
    A) databases of known animal tRNAs, rRNAs, snRNAs and snoRNAs.
    B) a database of known animal microRNAs.

    5) I would like to map the unique tags that remain unmapped to a plant microRNA database to see whether any of them are plant miRNAs.
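As a rough illustration of what steps 1 and 2 involve, here is a minimal Python sketch; it is not the algorithm of any particular tool. It assumes Phred+33 quality encoding, a mean-quality cutoff of 20, a minimum insert length of 18 nt, and the 3' adapter sequence in `ADAPTER`, which should be checked against the TruSeq kit documentation. All of these values are illustrative assumptions, not settings from this thread. Unlike dedicated trimmers such as cutadapt, it only finds exact adapter matches, and it discards reads in which no adapter is found, on the assumption that their inserts are too long to be small RNAs.

```python
"""Sketch of steps 1-2: quality filter, 3' adapter trim, dimer and length filter.

All thresholds and the adapter sequence are illustrative assumptions.
"""
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Assumed TruSeq small RNA 3' adapter; verify against your kit documentation.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a read, assuming Phred+33 encoding by default."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def trim_adapter(seq: str, adapter: str = ADAPTER, min_overlap: int = 3) -> Optional[str]:
    """Cut the 3' adapter using exact matching only (no mismatches allowed).

    First looks for the whole adapter anywhere in the read, then for an adapter
    prefix of at least min_overlap bases running off the 3' end of the read.
    Returns None if no adapter is found.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return None


def process(lines: Iterable[str], min_q: float = 20.0,
            min_len: int = 18) -> Tuple[List[str], Dict[str, int]]:
    """Filter and trim reads; return kept inserts and a tally of each read's fate."""
    kept: List[str] = []
    stats = {"low_quality": 0, "no_adapter": 0, "adapter_dimer": 0,
             "too_short": 0, "kept": 0}
    for _, seq, qual in read_fastq(lines):
        if mean_quality(qual) < min_q:
            stats["low_quality"] += 1
            continue
        insert = trim_adapter(seq)
        if insert is None:
            stats["no_adapter"] += 1  # no adapter: insert assumed too long for a small RNA
        elif insert == "":
            stats["adapter_dimer"] += 1  # adapter starts at the first base
        elif len(insert) < min_len:
            stats["too_short"] += 1
        else:
            stats["kept"] += 1
            kept.append(insert)
    return kept, stats


if __name__ == "__main__":
    # Toy reads: a 22-nt insert, an adapter dimer, a low-quality read, a 5-nt insert.
    mir = "TAGCTTATCAGACTGATGTTGA"
    toy = [(mir + ADAPTER, "I"), (ADAPTER + "ACGT", "I"),
           (mir + ADAPTER, "#"), ("ACGTA" + ADAPTER, "I")]
    fastq: List[str] = []
    for i, (seq, q) in enumerate(toy):
        fastq += [f"@read{i}", seq, "+", q * len(seq)]
    kept, stats = process(fastq)
    print(kept)   # ['TAGCTTATCAGACTGATGTTGA']
    print(stats)  # {'low_quality': 1, 'no_adapter': 0, 'adapter_dimer': 1, 'too_short': 1, 'kept': 1}
```

Tallying each read's fate (low quality, adapter dimer, too short) makes it easy to see how much of a library is lost at each filtering step.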

    Can you suggest tools for each step? They need to be publicly available, since we are on a tight budget and don't have access to commercial tools.
    I have read through many threads in this forum and have a general idea of tools such as miRDeep2 or miRkey for mapping, but I would like advice and guidance from more experienced people here.
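Steps 4 and 5 boil down to giving each unique tag the first label it matches, in a fixed order. The naive Python sketch below shows only that ordering logic, using exact matching: a tag is labeled ncRNA if it is an exact substring of a tRNA/rRNA/snRNA/snoRNA sequence, and a known animal or plant miRNA only if it equals a mature sequence over its full length. The FASTA inputs (for example, miRBase-style mature sequences written with U) and all toy names in the example are assumptions for illustration. Real pipelines usually rely on an aligner such as Bowtie that tolerates mismatches, so treat this as a way to reason about the workflow, not a replacement for those tools.

```python
"""Naive sketch of steps 4-5: label each unique tag by exact matching.

Assumptions: references are FASTA (miRBase-style mature sequences use U, so U
is converted to T); a tag is ncRNA if it is an exact substring of a tRNA, rRNA,
snRNA or snoRNA sequence, and a known miRNA only if it equals a mature sequence
over its full length. No mismatches are allowed, and the ncRNA scan is a slow
brute-force loop.
"""
from typing import Dict, Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {first word of header: DNA sequence}."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper().replace("U", "T")
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper().replace("U", "T")
    return records


def annotate(tags: Iterable[str], ncrna: Dict[str, str],
             animal_mirs: Dict[str, str],
             plant_mirs: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Give each tag the first label it matches, in the order of steps 4 and 5.

    If several references share a sequence, only one of their names is kept.
    """
    animal_by_seq = {seq: name for name, seq in animal_mirs.items()}
    plant_by_seq = {seq: name for name, seq in plant_mirs.items()}
    labels: Dict[str, Tuple[str, str]] = {}
    for tag in tags:
        # Step 4A: tRNA/rRNA/snRNA/snoRNA fragment (brute-force substring scan).
        hit = next((name for name, seq in ncrna.items() if tag in seq), None)
        if hit is not None:
            labels[tag] = ("ncRNA", hit)
        elif tag in animal_by_seq:  # step 4B: known animal miRNA
            labels[tag] = ("animal_miRNA", animal_by_seq[tag])
        elif tag in plant_by_seq:  # step 5: only tags still unmapped get here
            labels[tag] = ("plant_miRNA", plant_by_seq[tag])
        else:
            labels[tag] = ("unmapped", "")
    return labels


if __name__ == "__main__":
    # Toy references and tags; the names are placeholders.
    ncrna = read_fasta([">toy_ncRNA", "GCATTGGTGGTTCAGTGGTAGAATTCTCGCC"])
    animal = read_fasta([">toy-animal-miR", "UAGCUUAUCAGACUGAUGUUGA"])
    plant = read_fasta([">toy-plant-miR", "UCGCUUGGUGCAGAUCGGGAC"])
    tags = ["GGTGGTTCAGTGGTAGAATT", "TAGCTTATCAGACTGATGTTGA",
            "TCGCTTGGTGCAGATCGGGAC", "ACGTACGTACGTACGTACGT"]
    for tag, (category, name) in annotate(tags, ncrna, animal, plant).items():
        print(tag, category, name)
```

Checking the animal sets before the plant set mirrors the question: a sequence shared by an animal and a plant miRNA is never counted as dietary.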

    I have a basic laptop with Linux 10.0.4 installed. My impression is that the kind of analysis I want to do does not need a server. Is that true?
    Thanks a lot!
    And sorry for the long post.

    Jian

  • #2
    Dear Jian,

    You could try the UEA sRNA toolkit: http://srna-tools.cmp.uea.ac.uk/plan...srna-tools.cgi

    You should be able to analyse most of your data with this.

    Ronny

    • #3
      Hi Ronny,

      Thanks a lot! Wow, I didn't know about that tool.
      The web-based version limits the FASTQ file size to 200 MB, but I saw there is also a downloadable version, which presumably doesn't have that limit (?).
      I will definitely try it out.

      Jian

      • #4
        I have more or less figured out how to do the tasks, thanks to all your help.
        I'd like to share some of my experiences for people who might have similar problems.

        1)
        I used "cutadapt" to trim adaptors.
        In my opinion, "cutadapt" is extremely flexible, easy to use, and fast.

        2)
        I used the FASTX-Toolkit to convert FASTQ to FASTA and to collapse identical reads. The read count for each unique sequence is appended to its sequence ID, and the ID itself is the sequence's rank by read count.

        3)
        If you have many files to process, you can write a shell script to automate the pipeline so that you don't have to type the commands for each tool and each file by hand.
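For readers who prefer Python to a shell script, the collapsing step and the length-distribution analysis from the original question can be sketched as follows. This is a minimal reimplementation of the idea, not the FASTX-Toolkit code: input is assumed to be FASTA, identical sequences are counted, ranked by abundance, and written with headers of the form `>rank-count` as described above. Ties are broken alphabetically, which is an arbitrary choice; the script name `collapse_reads.py` and the `.collapsed.fa` output naming are hypothetical.

```python
"""Sketch: collapse identical reads, rank them by abundance, tabulate lengths.

Input is assumed to be FASTA. Output headers follow the scheme described in
the post (">rank-count"); ties in count are broken alphabetically, which is an
arbitrary choice of this sketch. File names are hypothetical.
"""
import sys
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple


def read_fasta_seqs(lines: Iterable[str]) -> Iterator[str]:
    """Yield sequences from FASTA lines, joining multi-line records."""
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def collapse(seqs: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Count identical sequences; return (rank, count, sequence), most abundant first."""
    counts = Counter(seqs)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, n, seq) for rank, (seq, n) in enumerate(ordered, start=1)]


def length_distribution(collapsed: List[Tuple[int, int, str]]) -> Counter:
    """Number of reads (not unique tags) at each sequence length."""
    dist: Counter = Counter()
    for _, n, seq in collapsed:
        dist[len(seq)] += n
    return dist


def process_file(in_path: Path, out_dir: Path) -> Counter:
    """Write <stem>.collapsed.fa into out_dir and return the length distribution."""
    with in_path.open() as fh:
        collapsed = collapse(read_fasta_seqs(fh))
    out_path = out_dir / (in_path.stem + ".collapsed.fa")
    with out_path.open("w") as out:
        for rank, n, seq in collapsed:
            out.write(f">{rank}-{n}\n{seq}\n")
    return length_distribution(collapsed)


def main(args: List[str]) -> None:
    """Batch mode: the first argument is the output directory, the rest are inputs."""
    if len(args) < 2:
        print("usage: python collapse_reads.py OUT_DIR FILE.fa [FILE.fa ...]")
        return
    out_dir = Path(args[0])
    out_dir.mkdir(parents=True, exist_ok=True)
    for name in args[1:]:
        dist = process_file(Path(name), out_dir)
        print(name, " ".join(f"{length}:{dist[length]}" for length in sorted(dist)))


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python collapse_reads.py collapsed/ sample1.fa sample2.fa` would write `collapsed/sample1.collapsed.fa` and `collapsed/sample2.collapsed.fa` and print one line of length:read-count pairs per input file, which covers the "many files" case in a single command.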

        • #5
          shortRan

          Hi

          You could do all the steps you listed with shortRan. The article should be out in Bioinformatics any day now, or you can write to me at [email protected] and I can send you the package.
