  • 454 read orientation

    Hi everyone,

    I am looking at a 454 dataset and I am wondering whether the read sequences (they are in a FASTA file) are usually in the same orientation as the original mRNAs, or whether they can be the reverse complement?

    This will determine which BLAT query type I use during alignment: either -q=rna or -q=dna. I think with standard (non-454) ESTs you don't know the orientation, so you have to use -q=dna. However, this can give you unwanted duplicate alignments.

  • #2
    The typical protocol for sequencing RNA with 454 is to make ds cDNA, fragment it (nebulizer, covaris, etc.) then use a standard genomic library prep kit from Roche. This means polishing (blunting) the ends and attaching the sequencing adapters in a non-directional manner. Thus the reads you get will be a mixture of both directions.
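To illustrate what "a mixture of both directions" means in practice, here is a minimal Python sketch. The function names `reverse_complement` and `same_fragment` are made up for this example and are not part of any 454 or BLAT tooling. It only shows that a read and its reverse complement represent the same cDNA fragment, so the strand of a read by itself says nothing about the strand of the original mRNA.

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def same_fragment(read_a: str, read_b: str) -> bool:
    """True if two reads are identical or reverse complements of each other."""
    a, b = read_a.upper(), read_b.upper()
    return a == b or a == reverse_complement(b)
```

With a non-directional library, both `ATGC` and `GCAT` could come from the same transcript, and `same_fragment("ATGC", "GCAT")` treats them as equivalent.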



    • #3
      Thanks! I guess I have to use -q=dna, then.

      The dataset I am looking at is a public 454 GS20 dataset from the paper "Sampling the Arabidopsis Transcriptome with massively parallel pyrosequencing" (Weber et al, Plant Physiology May 2007). Kmcarr, I think I remember from a previous post that you have some experience with this particular dataset.

      Do you have any guess whether the original researchers used -q=rna in the BLAT alignment? I recall that about 11% of their reads did not map to the genome. But if I use -q=dna, I get a larger percentage mapping to TAIR7.

      Also, if I do use -q=dna, I guess I will want to count a read only once when it maps to both a gene and its reverse complement. However, I would want to keep both matches when a read maps to multiple genes (say paralogs, or duplicate genes). I'm not sure how to tell these two cases apart. Does anyone have any suggestions?
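One possible way to tell the two cases apart is sketched below in Python, assuming each alignment has already been reduced to a target name, start, end and strand. The `Hit` type and the `group_by_locus` function are hypothetical names for this illustration. Hits that overlap on the same target sequence, whatever their strand, are merged into one locus and counted once; hits at different locations (for example paralogs) stay as separate loci.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One alignment of a read: target name, half-open interval, strand."""
    target: str
    start: int
    end: int
    strand: str


def group_by_locus(hits: List[Hit]) -> List[List[Hit]]:
    """Group the hits of a single read into loci.

    Hits on the same target whose coordinates overlap are placed in the
    same locus, ignoring strand. Non-overlapping hits form separate loci.
    """
    loci: List[List[Hit]] = []
    for hit in sorted(hits, key=lambda h: (h.target, h.start, h.end)):
        if loci:
            current = loci[-1]
            same_target = current[-1].target == hit.target
            locus_end = max(h.end for h in current)
            if same_target and hit.start < locus_end:
                current.append(hit)  # overlaps the current locus
                continue
        loci.append([hit])  # start a new locus
    return loci
```

A read with a forward and a reverse hit at the same coordinates plus one hit elsewhere would give two loci: the first counted once, the second kept as a separate match.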



      • #4
        Man! That dataset just won't die. When I said I had some familiarity with the data I was understating it a bit. I was one of the authors, performing all of the bioinformatics. I used the default BLAT settings for query and target type, i.e. both -q and -t set to dna. However, BLAT will only output a single alignment for a read at a given location; it will not report both the forward and reverse alignment of a read, so you don't have to worry about that.

        You are correct that you will find equally good alignments to paralogous genes. You will have to decide how you want to approach assigning or counting those reads.

        You will also find many poor alignments of reads to the genome. You should play with the pslReps program to filter your initial BLAT output. pslReps is meant to retain only the best alignment if a query sequence aligns to multiple target locations. If there is a group of alignments that are equally good (or nearly so), they will all be retained.
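The idea of keeping only the best alignment per read, plus any that are nearly as good, can be sketched in Python as follows. This is a simplified stand-in and not pslReps' actual algorithm: the score (matches minus mismatches), the `tolerance` parameter, and the names `parse_psl` and `keep_near_best` are assumptions made for this example. It reads the standard 21-column, tab-separated PSL output of BLAT.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def parse_psl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per PSL alignment row, skipping header lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Alignment rows have 21 columns and start with the match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        yield {
            "matches": int(fields[0]),
            "mismatches": int(fields[1]),
            "strand": fields[8],
            "query": fields[9],
            "target": fields[13],
            "t_start": int(fields[15]),
            "t_end": int(fields[16]),
        }


def keep_near_best(alignments: Iterable[dict],
                   tolerance: float = 0.01) -> Dict[str, List[dict]]:
    """Keep, per query, the alignments scoring within `tolerance` of the best.

    Score is matches minus mismatches (an assumption of this sketch).
    """
    by_query: Dict[str, List[dict]] = defaultdict(list)
    for aln in alignments:
        by_query[aln["query"]].append(aln)

    kept: Dict[str, List[dict]] = {}
    for query, alns in by_query.items():
        scores = [a["matches"] - a["mismatches"] for a in alns]
        best = max(scores)
        cutoff = best - tolerance * abs(best)
        kept[query] = [a for a, s in zip(alns, scores) if s >= cutoff]
    return kept
```

A read with two equally good hits (say, to paralogs) keeps both, while a clearly weaker third hit is dropped. How to count the reads that still have several hits remains a separate decision.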



        • #5
          Well, thanks again

          I guess I came to the right person! I suppose the good thing about a dataset that won't die is that you must get a ton of citations.

          Cheers,
          Brian
