  • #16
    Originally posted by andrea_maso
    Dear kmcarr,
    yes, I know that miRBase has the sequences and GFF coordinates, but they are in multi-FASTA format rather than a single-sequence file (I am thinking of MapView, which requires a single FASTA sequence...).
    I will try to use Bowtie and SAMtools to align and view the sequences, but I do not know which format they require.
    Do you have an idea?

    Thanks and bye for now.
    Andrea
    Andrea,
    have you tried mapping the reads to miRBase with Bowtie, as you said you would? If so, can you share your findings?
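    On the format question: as far as I know, bowtie-build indexes a multi-FASTA reference without trouble, so miRBase's multi-sequence files do not need to be merged into one sequence. One thing worth checking is that miRBase gives sequences in the RNA alphabet (U rather than T), while the reads use T. Below is a minimal Python sketch that converts U to T and optionally keeps a single species. It assumes the first word of each header is the miRNA name with a species prefix such as `hsa-`; the function name and the `species_prefix` parameter are hypothetical.

```python
import sys

def rna_fasta_to_dna(lines, species_prefix=None):
    """Convert an RNA multi-FASTA (U) to DNA (T), optionally keeping one species.

    Hypothetical helper: assumes the first word of each header is the sequence
    name (e.g. "hsa-let-7a-5p"), so species can be filtered by name prefix.
    """
    out = []
    keep = True
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = species_prefix is None or name.startswith(species_prefix)
            if keep:
                out.append(line)
        elif keep and line:
            # Sequence line: upper-case it and replace uracil with thymine.
            out.append(line.upper().replace("U", "T"))
    return "\n".join(out) + "\n" if out else ""

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file names are placeholders): python rna2dna.py mature.fa hsa- > mature_dna.fa
    prefix = sys.argv[2] if len(sys.argv) > 2 else None
    with open(sys.argv[1]) as fh:
        sys.stdout.write(rna_fasta_to_dna(fh, prefix))
```

    The converted file could then be indexed with something like `bowtie-build mature_dna.fa mirbase_index` (both names are placeholders).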



    • #17
      Can people share their experience with miRExpress (http://mirexpress.mbc.nctu.edu.tw/usage.php)?
      It seems many of you have your own script-based versions...
      --
      bioinfosm



      • #18
        Hi,

        is there a good tool that lets me collapse identical reads from Illumina Genome Analyzer or FASTA files? I have sequencing data from different small RNAs, with 10-20 million reads. I want to find identical reads, cluster them together, and count the number of reads in each cluster. Is there a tool that can do that?



        • #19
          Several tools for deep-sequencing-derived small RNAs are provided in deepBase (http://deepbase.sysu.edu.cn/). They were developed to map, store, retrieve, annotate, integrate and visualize deep-sequencing-derived small RNAs, and to facilitate transcriptomic research and the discovery of novel ncRNAs.



          • #20
            Originally posted by myrna
            I use a very similar approach, but I first collapse identical reads before aligning (to avoid aligning the same let-7 and other abundant miRNA reads hundreds of thousands of times). You can then count the number of reads in the original file to generate counts. The only problem with this is that you lose the sequence quality information (if you need it).

            Ryan
            Hi all,

            I have a similar question, and I found this post via Google. Could anybody explain what identical reads are (after trimming, I suppose?) and how to collapse them?

            Thanks,

            D.



            • #21
              In reply to dukevn (#20):
              Hi D.

              Identical reads are identical sequences. Collapsing them takes only simple scripting: you can write a program that counts how many times each sequence occurs.

              For example, given:
              AGCGT
              AGCGT
              AGCGT
              AGAAT
              you would write a new file containing:
              AGCGT 3
              AGAAT 1

              Simple, right?

              Demis001
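              A minimal Python sketch of the counting described above. The function name is hypothetical, and it keeps one dictionary entry per distinct sequence in memory.

```python
from collections import Counter

def collapse_reads(reads):
    """Return (sequence, count) pairs, most abundant first.

    Sequences with equal counts keep the order in which they were first seen.
    Blank lines are ignored.
    """
    counts = Counter(r.strip() for r in reads if r.strip())
    return counts.most_common()

if __name__ == "__main__":
    for seq, n in collapse_reads(["AGCGT", "AGCGT", "AGCGT", "AGAAT"]):
        print(seq, n)  # prints "AGCGT 3", then "AGAAT 1"
```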



              • #22
                In reply to demis001 (#21):
                Ah, got it now. But I don't see why we need the counting. Does it matter for the expression counts after alignment? If there are 3 identical reads, aligning just one is enough, isn't it?

                D.



                • #23
                  In reply to dukevn (#22):
                  Hi D.,

                  Yes, aligning one sequence suffices and saves a lot of time. The expression counting is done at the end, as you said. However, I found it useful to store the count as part of the FASTA header in case you need it further down the pipeline. If you are sure you won't need it, don't count during the initial collapse.

                  e.g.
                  >Seq1_$count
                  TTCCCCGGG

                  Demis001
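                  Storing the count in the header, as in the `>Seq1_$count` example, could look like this in Python. Here `$count` becomes the number of copies; the `Seq` prefix and numbering by abundance are illustrative choices, not a fixed convention.

```python
from collections import Counter

def to_collapsed_fasta(reads, prefix="Seq"):
    """Collapse identical reads into FASTA records named <prefix><rank>_<count>."""
    counts = Counter(r.strip() for r in reads if r.strip())
    records = []
    for rank, (seq, n) in enumerate(counts.most_common(), start=1):
        records.append(f">{prefix}{rank}_{n}\n{seq}\n")
    return "".join(records)

if __name__ == "__main__":
    # Prints ">Seq1_2", "TTCCCCGGG", ">Seq2_1", "AGAAT" on separate lines.
    print(to_collapsed_fasta(["TTCCCCGGG", "TTCCCCGGG", "AGAAT"]), end="")
```

                  Because the count sits after the last underscore, it can be read back from the read name in the alignment output.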



                  • #24
                    In reply to demis001 (#23):
                    I gave it a second thought: even though I agree that collapsing reads saves time, don't we lose coverage afterwards? Say that, in your example, AGCGT maps uniquely to region R of the genome: instead of 3X coverage (from that read), we get only 1X. And if you record the duplicate count, wouldn't it be more difficult to put those numbers back in after alignment? I think leaving the reads as they are would avoid that coverage issue.
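                    One way to keep the coverage, assuming each collapsed read carries its count in its name (e.g. `Seq1_3`, as suggested in #23), is to weight every alignment by that count when computing coverage. A minimal Python sketch: the input is a hypothetical list of already-parsed alignments as (read name, chromosome, 0-based start, length), not the output format of any particular aligner.

```python
from collections import defaultdict

def count_from_name(read_name):
    """Copy number stored after the last underscore, e.g. 'Seq1_3' -> 3."""
    return int(read_name.rsplit("_", 1)[1])

def weighted_coverage(alignments):
    """Per-base coverage in which each collapsed read counts as often as it was seen.

    alignments: iterable of (read_name, chrom, start, length), start 0-based.
    """
    coverage = defaultdict(int)
    for name, chrom, start, length in alignments:
        copies = count_from_name(name)
        for pos in range(start, start + length):
            coverage[(chrom, pos)] += copies
    return dict(coverage)

if __name__ == "__main__":
    # AGCGT collapsed as Seq1_3 and mapped uniquely to region R: 3X, not 1X.
    cov = weighted_coverage([("Seq1_3", "R", 100, 5)])
    print(cov[("R", 100)])  # 3
```

                    A read that maps to several loci adds its full count at each one in this sketch; how to split multi-mapping reads is a separate decision.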



                    • #25
                      Really easy to use microRNA tool

                      Caution: I work for CLC bio

                      If any of you would like to try a very easy-to-use tool for microRNA analysis, version 4.0 of CLC bio's Genomics Workbench includes a start-to-finish analysis pipeline, from trimming adapters to identifying microRNAs in miRBase. We have had a tremendously strong response from those who have tried it with both SOLiD and Illumina data.

                      If you are interested in giving the MicroRNA tool a whirl, please feel free to download a fully functional trial version. You can also download a tutorial that will walk you through all of the steps. This page has links to both the tutorial and the Genomics Workbench.



                      Whether you like it or not, I am curious to hear what you think.



                      • #26
                        In reply to kolja (#18):
                        Hi,
                        maybe FASTX Collapser is what you need.


                        Cheers,
                        Gabriele Bucci



                        • #27
                          Collapsing identical reads

                          I understand how and why the collapsing has to be done, but I'm not sure what to use (even in terms of writing a simple script in python or R) to get the job done, given the files we're talking about are several Gigs in size.

                          I won't be able to load such a file into memory in R to collapse it by miRNA 'species' and obtain counts for each of them.

                          I thought about using the FASTX-Toolkit's collapser program, but it wouldn't keep track of the counts, which I need when quantifying mapped miRNAs.

                          Anyone have any ideas? Any help would be greatly appreciated.
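                          Memory does not have to scale with file size: a dictionary keyed by sequence grows with the number of distinct sequences, which in small-RNA libraries dominated by a few abundant miRNAs is often much smaller than the number of reads. A minimal streaming Python sketch, assuming plain 4-line FASTQ records with no wrapped sequence lines; like the awk one-liner later in this thread, it takes the second line of each record. The function and script names are hypothetical.

```python
import sys
from collections import Counter

def count_fastq_sequences(handle):
    """Count identical read sequences while streaming through a FASTQ file.

    Assumes 4-line records: 0-based lines 1, 5, 9, ... hold the sequence.
    Only one entry per distinct sequence is kept in memory.
    """
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:
            counts[line.strip()] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python collapse_fastq.py reads.fastq > counts.tsv
    with open(sys.argv[1]) as fh:
        for seq, n in count_fastq_sequences(fh).most_common():
            print(f"{n}\t{seq}")  # count first, as in the awk/uniq -c output
```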



                          • #28
                            I don't know if you are still looking, but this simple one-liner should do the counts (I got it from this forum):

                            awk 'NR%4==2' input.fastq|sort |uniq -c|awk '{print $1"\t"$2}'



                            • #29
                              Hi all,

                              I searched online but couldn't find a solution for this. Do you use Bowtie to align reads after they have been collapsed into a format like this?

                              miRNA-1
                              AGCGT 3
                              AGAAT 1

                              miRNA-101
                              AGCGTAT 5
                              AGAATAA 1

                              It's no longer conventional FASTA format, so which option would you recommend using with Bowtie? Do I have to convert the file to FASTA before using it with Bowtie?

                              Thanks in advance for any help!

                              M
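                              Bowtie (version 1) reads FASTA input when given `-f`, so one option is to turn each `SEQUENCE COUNT` line into a FASTA record and keep the count in the read name, as suggested earlier in the thread. A minimal Python sketch, assuming whitespace-separated `SEQUENCE COUNT` lines; anything else, such as the `miRNA-1` group labels, is skipped, and the `read` prefix is arbitrary.

```python
def collapsed_to_fasta(lines, prefix="read"):
    """Turn 'SEQUENCE COUNT' lines into FASTA records named <prefix><n>_<count>."""
    records = []
    n = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 2 or not fields[1].isdigit():
            continue  # blank lines, group labels such as "miRNA-1", etc.
        seq, count = fields
        n += 1
        records.append(f">{prefix}{n}_{count}\n{seq}\n")
    return "".join(records)

if __name__ == "__main__":
    example = ["miRNA-1", "AGCGT 3", "AGAAT 1", "",
               "miRNA-101", "AGCGTAT 5", "AGAATAA 1"]
    print(collapsed_to_fasta(example), end="")
```

                              The result could then be aligned with something like `bowtie -f <index> collapsed.fa`. For the count-first output of the awk one-liner in #28, the two fields would need to be swapped.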
