  • Running TopHat on a small subset of reads

    Hello.

    I am trying to find out which library type to use for a stranded (strand-specific) RNA-seq run. The manual's FAQ asks, "I am not sure which library type to use (fr-firststrand or fr-secondstrand), what should I do?", and answers:

    "One possible way to figure out the correct library-type is to run TopHat with a small subset of the reads (e.g., 1M) as follows.

    run TopHat with fr-firststrand and count the number of junctions in junctions.bed (one of the output files from TopHat)
    run TopHat with fr-secondstrand and count the number of junctions in junctions.bed

    Since the splice junction finding algorithm of TopHat makes use of library-type information (if provided), one of the two TopHat runs would result in many more splice junctions than the other one. You can then use the library type that gives more junctions. If this is not the case TopHat might not work well with your sequencing protocol. Please let us know more details about your protocol so we can add support for new library types."
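    The manual's procedure can be sketched in Python. This is a minimal sketch, not part of TopHat: it assumes TopHat has already been run twice on the same read subset (once with --library-type fr-firststrand, once with fr-secondstrand), each into its own output directory. The function names and directory names are hypothetical. Taking the first N records, as head_fastq does, is the simplest way to subset; it is not a random sample.

```python
import itertools
from typing import Iterable, Iterator


def head_fastq(lines: Iterable[str], n_reads: int = 1_000_000) -> Iterator[str]:
    """Return an iterator over the first n_reads FASTQ records (4 lines each).

    Use it on both mate files: if the mates are stored in the same order,
    the two subsets stay paired.
    """
    return itertools.islice(lines, 4 * n_reads)


def count_junctions(bed_lines: Iterable[str]) -> int:
    """Count junction records in the lines of a TopHat junctions.bed file.

    The file starts with a 'track ...' header that is not a junction, so
    track/browser/comment lines and blank lines are skipped.
    """
    return sum(
        1
        for line in bed_lines
        if line.strip() and not line.startswith(("track", "browser", "#"))
    )


def pick_library_type(n_firststrand: int, n_secondstrand: int) -> str:
    """Apply the manual's rule: keep the library type with more junctions."""
    if n_firststrand == n_secondstrand:
        return "undetermined"
    return "fr-firststrand" if n_firststrand > n_secondstrand else "fr-secondstrand"


# Hypothetical usage; file and directory names are illustrative only.
# with open("reads_1.fq") as src, open("sub_1.fq", "w") as dst:
#     dst.writelines(head_fastq(src))
# with open("run_first/junctions.bed") as a, open("run_second/junctions.bed") as b:
#     print(pick_library_type(count_junctions(a), count_junctions(b)))
```

    An exact tie is reported as undetermined. Per the manual, a small difference in either direction is itself a warning sign that TopHat may not suit the protocol.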



    I have run TopHat with --library-type fr-firststrand on 10 samples; the alignments finished and produced their alignment reports.

    Now I am running fr-secondstrand on a single sample, and I will count the number of junctions in its junctions.bed file when it completes.

    My questions are these:

    1) Say that for this single sample, A, the fr-secondstrand run gives more junctions, so fr-secondstrand is the correct library type. Does the manual mean that it is the correct library type FOR ALL samples, or FOR JUST THIS ONE?

    2) If, for this single sample, the fr-secondstrand alignment produces fewer junctions in junctions.bed, does that mean fr-secondstrand is the wrong library type FOR ALL samples, or just for this one?

  • #2
    I think that if your samples were library-prepped and sequenced at the same time, then whatever the result is, it is likely to apply to all of them, since it is the library prep that determines which strandedness flag you need to use.
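    Following that reasoning, one cheap check is to run the two-way junction comparison on a few samples from the batch and adopt a single --library-type only if they all agree. A hypothetical Python sketch; the junction counts would come from junctions.bed files as the manual describes, and the numbers in the usage comment are placeholders, not results from this thread.

```python
from typing import Dict, Optional, Tuple


def verdict(n_firststrand: int, n_secondstrand: int) -> str:
    """Library type whose TopHat run reported more junctions for one sample."""
    if n_firststrand == n_secondstrand:
        return "undetermined"
    return "fr-firststrand" if n_firststrand > n_secondstrand else "fr-secondstrand"


def batch_library_type(counts: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return one library type for the whole batch, or None if samples disagree.

    counts maps a sample name to (junctions with fr-firststrand,
    junctions with fr-secondstrand).
    """
    verdicts = {verdict(first, second) for first, second in counts.values()}
    if len(verdicts) == 1 and "undetermined" not in verdicts:
        return verdicts.pop()
    return None


# Placeholder counts, for illustration only:
# batch_library_type({"A": (9000, 1200), "B": (8700, 1100)})  -> "fr-firststrand"
```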

    That said, there are threads on here that point to more straightforward ways of checking, but knowing which protocol was used for the library prep is essential. I would rather ask the lab what they did than waste time running all my samples through TopHat2 twice.
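    One of the more direct checks is to look at which genomic strand reads align to relative to annotated genes, the idea behind tools such as RSeQC's infer_experiment.py. This Python sketch shows only the counting logic. It assumes you have already extracted, for each aligned read overlapping a gene, the mate number, the read's strand and the gene's strand; the tuple format and the 0.8 cutoff are hypothetical choices. For dUTP-type libraries (fr-firststrand), mate 1 aligns to the strand opposite the gene; for fr-secondstrand libraries it aligns to the gene's own strand.

```python
from typing import Iterable, List, Tuple

# One observation per aligned read that overlaps an annotated gene:
# (mate number 1 or 2, strand the read aligned to, strand of the gene).
Observation = Tuple[int, str, str]


def strand_fractions(obs: Iterable[Observation]) -> Tuple[float, float]:
    """Return (secondstrand-like fraction, firststrand-like fraction).

    secondstrand-like: mate 1 on the gene's strand, mate 2 on the other one.
    firststrand-like (e.g. dUTP): mate 1 on the opposite strand, mate 2 on it.
    """
    same = opposite = 0
    for mate, read_strand, gene_strand in obs:
        # Reduce every read to "is mate 1 effectively on the gene's strand?"
        if (read_strand == gene_strand) == (mate == 1):
            same += 1
        else:
            opposite += 1
    total = same + opposite
    if total == 0:
        return 0.0, 0.0
    return same / total, opposite / total


def guess_library_type(obs: Iterable[Observation], min_fraction: float = 0.8) -> str:
    """Guess the TopHat --library-type; min_fraction is an arbitrary cutoff."""
    obs_list: List[Observation] = list(obs)
    if not obs_list:
        return "undetermined"
    second, first = strand_fractions(obs_list)
    if first >= min_fraction:
        return "fr-firststrand"
    if second >= min_fraction:
        return "fr-secondstrand"
    return "fr-unstranded"  # roughly half and half, as expected for unstranded
```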


    • #3
      For my alignment output, TopHat reports a concordant pair alignment rate of 41.6%:

      Left reads:
      Input : 35732348
      Mapped : 16332729 (45.7% of input)
      of these: 476549 ( 2.9%) have multiple alignments (4626 have >20)
      Right reads:
      Input : 35732348
      Mapped : 16383018 (45.8% of input)
      of these: 426962 ( 2.6%) have multiple alignments (4550 have >20)
      45.8% overall read mapping rate.

      Aligned pairs: 14952780
      of these: 374540 ( 2.5%) have multiple alignments
      80381 ( 0.5%) are discordant alignments
      41.6% concordant pair alignment rate.
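      As a sanity check, the percentages in this report can be recomputed from its raw counts. A minimal Python sketch: the regular expressions are written against the layout shown in this post (TopHat2 writes the same fields to align_summary.txt), and the concordant-rate formula (aligned pairs minus discordant pairs, divided by input pairs) is inferred because it reproduces the 41.6% here, not taken from TopHat's source.

```python
import re
from typing import Dict, List


def _ints(pattern: str, text: str) -> List[int]:
    """All integers captured by the single group of pattern in text."""
    return [int(m) for m in re.findall(pattern, text)]


def summarize(text: str) -> Dict[str, float]:
    """Recompute the percentages of a TopHat2 alignment summary from raw counts."""
    left_in, right_in = _ints(r"Input\s*:\s*(\d+)", text)
    left_mapped, right_mapped = _ints(r"Mapped\s*:\s*(\d+)", text)
    (pairs,) = _ints(r"Aligned pairs:\s*(\d+)", text)
    (discordant,) = _ints(r"(\d+)\s*\(\s*[\d.]+%\)\s*are discordant", text)
    return {
        "left_mapped_pct": 100 * left_mapped / left_in,
        "right_mapped_pct": 100 * right_mapped / right_in,
        # The overall rate pools both mates.
        "overall_mapped_pct": 100 * (left_mapped + right_mapped) / (left_in + right_in),
        # Pairs aligned concordantly, per input pair (discordant pairs excluded).
        "concordant_pct": 100 * (pairs - discordant) / left_in,
    }


# Usage: summarize(open("align_summary.txt").read())
```

      On these counts, 41.8% of input pairs aligned as pairs (14,952,780 / 35,732,348); excluding the discordant ones gives the reported 41.6%.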



      However, another manual I am reading says

      Accurate differential analysis depends on accurate spliced read alignments. Typically, at least 70% of RNA-seq reads should align to the genome, and lower mapping rates may indicate poor quality reads or the presence of contaminant.
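      Checking the alignment report against that guideline is simple arithmetic; this small Python helper (hypothetical, not from either manual) also shows how far short the run falls. It treats each mate as one read, an assumption, so the totals are the left plus right counts from the report.

```python
import math
from typing import Tuple


def mapping_rate_check(mapped: int, total: int, threshold_pct: float = 70.0) -> Tuple[float, bool, int]:
    """Compare a mapping rate with a minimum guideline (70% by default).

    Returns (rate in percent, whether it is below the guideline,
    how many more reads would have to map to reach the guideline).
    """
    rate = 100 * mapped / total
    needed = max(0, math.ceil(total * threshold_pct / 100) - mapped)
    return rate, rate < threshold_pct, needed


# Left + right counts from the report, treating each mate as one read.
rate, below, needed = mapping_rate_check(16332729 + 16383018, 35732348 + 35732348)
print(f"{rate:.1f}% mapped; below 70%: {below}; short by {needed} reads")
# prints: 45.8% mapped; below 70%: True; short by 17309541 reads
```

      So 32,715,747 of 71,464,696 reads mapped, roughly 17.3 million reads short of the 70% mark.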


      I have done QC and my reads looked good. So does this mean that I should perhaps try the other library type?



      • #4
        Thank you very much for your response. I contacted the lab and found that they used the TruSeq Stranded prep kit.

        According to the manual, the library type should therefore be set to fr-firststrand.
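        The protocol-to-flag mapping can be written down as a lookup table. This Python sketch encodes the protocol families the TopHat manual lists for each --library-type value (dUTP, NSR and NNSR for fr-firststrand; directional Illumina (ligation) and standard SOLiD for fr-secondstrand; standard Illumina for fr-unstranded). The TruSeq Stranded entry is an addition based on that kit's dUTP second-strand marking, so confirm it against the kit documentation. The dictionary keys are hypothetical normalized names.

```python
from typing import Optional

# --library-type values for the protocol families listed in the TopHat manual.
# "truseq stranded" is an added entry (that kit uses dUTP second-strand
# marking); check your kit's documentation before relying on it.
LIBRARY_TYPE_BY_PROTOCOL = {
    "standard illumina": "fr-unstranded",
    "dutp": "fr-firststrand",
    "nsr": "fr-firststrand",
    "nnsr": "fr-firststrand",
    "truseq stranded": "fr-firststrand",
    "directional illumina (ligation)": "fr-secondstrand",
    "standard solid": "fr-secondstrand",
}


def library_type_for(protocol: str) -> Optional[str]:
    """Look up the TopHat --library-type for a protocol name (case-insensitive)."""
    return LIBRARY_TYPE_BY_PROTOCOL.get(protocol.strip().lower())


print(library_type_for("TruSeq Stranded"))  # fr-firststrand
```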

        However, I still do not understand why my mapping rates are so low.



        • #5
          The lab also used the Ribo-Zero prep kit, which would also call for --library-type fr-firststrand.
