  • Building bowtie index with mirBase hairpin.fa file

    Has anyone been able to build a bowtie index from the miRBase hairpin.fa file? I'm having some odd issues and can't work out the cause. I've built other indexes without trouble, but the miRBase hairpin file behaves strangely. When I run an alignment against the index, it misses the vast majority of the alignments. It finds only a small percentage, and these are overwhelmingly GC-rich with 1 or 2 mismatches. The index build reports no errors, although one part of its output confused me:

    fchr[A]: 0
    fchr[C]: 416893
    fchr[G]: 781895
    fchr[T]: 1191858
    fchr[$]: 1191858

    I'd be lying if I said I knew exactly what it meant, but it does seem a bit odd...

    Anyone make this index successfully?

    PS: miRBase is now on version 17, but I couldn't get this to work with version 16 either.

  • #2
    I'm having the same issue. Have there been any hints yet?
    Thanks!

    Alex



    • #3
      @ggy: Is the problem building the index or with alignments?



      • #4
        Sorry, I was not being clear.
        I was trying to build a bowtie2 index of mm10 from the mm10_all_chr.fa file.
        As far as I can tell, I did not get any error message (but I could be wrong). However, I saw something like this in the report after index building:

        fchr[A]: 0
        fchr[C]: 416893
        fchr[G]: 781895
        fchr[T]: 1191858
        fchr[$]: 1191858

        I haven't carried out alignment yet, but I don't feel right about these messages.
        What does the 'fchr[$]' mean? Why is there no 'A' detected if I understood it correctly?

        Thank you very much!



        • #5
          Those look like the starting indices for the different symbols in the bwt. $ is a fake symbol that indicates the end of a string, and it looks like this actually indicates there are 416893 A's, not zero. Generally, just ignore stuff a program prints to the screen that you don't understand unless it crashes or clearly states something went wrong; it probably means something to the programmer.
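
If the fchr values are read as cumulative starting offsets, as suggested above, the count for each symbol is the difference between its offset and the next symbol's offset. The following is a minimal sketch of that arithmetic, not anything produced by bowtie itself; the function name `counts_from_fchr` and the A, C, G, T, $ ordering are assumptions taken from the order of the printed lines.

```python
# Offsets copied from the index-build output quoted in this thread.
FCHR = {"A": 0, "C": 416893, "G": 781895, "T": 1191858, "$": 1191858}


def counts_from_fchr(fchr, order="ACGT$"):
    """Return per-symbol counts.

    Assumption: each fchr value is the cumulative start offset of that
    symbol, so a symbol's count is the next offset minus its own offset.
    """
    counts = {}
    for current, following in zip(order, order[1:]):
        counts[current] = fchr[following] - fchr[current]
    return counts


if __name__ == "__main__":
    for symbol, n in counts_from_fchr(FCHR).items():
        print(f"{symbol}: {n}")
```

Under this reading the quoted output gives 416893 A's, 365002 C's and 409963 G's, while T comes out as 0 because fchr[T] and fchr[$] are equal. That may be worth checking against the input FASTA, for instance whether it writes U where T is expected.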



          • #6
            Thanks for the advice. I proceeded with the alignment using Tophat2, and it completed with no errors. However, there seems to be a significant difference between the alignment results from Tophat2 and novoalign. I have two samples, and the results are:

            sample1 (17665866 reads): Tophat2 returns 10942864 mapped (61.9% of input); novoalign returns 15436347 mapped (87.4%)

            sample2 (23823043 reads): Tophat2 returns 16776971 mapped (70.4% of input); novoalign returns 21214876 mapped (89.1%)
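
The percentages above can be rechecked directly from the read counts. This is only a small sketch of the arithmetic; the helper name `mapped_percent` and the `SAMPLES` layout are made up for illustration, and the numbers are the ones quoted in this post.

```python
# Read counts as quoted in this post: (total reads, Tophat2 mapped, novoalign mapped).
SAMPLES = {
    "sample1": (17665866, 10942864, 15436347),
    "sample2": (23823043, 16776971, 21214876),
}


def mapped_percent(mapped, total):
    """Return mapped reads as a percentage of input reads, to one decimal place."""
    return round(100.0 * mapped / total, 1)


if __name__ == "__main__":
    for name, (total, tophat2, novoalign) in SAMPLES.items():
        print(
            f"{name}: Tophat2 {mapped_percent(tophat2, total)}%, "
            f"novoalign {mapped_percent(novoalign, total)}%"
        )
```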

            Should I be worried that Tophat2 was not run correctly, or that its results are less reliable?

            Thanks!

            Alex



            • #7
              Yes, you should be worried. No two mappers will give you identical output, and it's always best to use the one that gives the best results. However, sensitivity is not the same as accuracy, so technically you don't know that novoalign is giving you better results, but it probably is.

              In this case, I'm guessing that there's a problem with your data and you should probably quality-trim and/or adapter-trim before mapping. I suggest you run FastQC and post the results here.

              What kind of data are you mapping, and what are you mapping it to, anyway?
