454 read orientation

**kmcarr** · 04-20-2009, 10:28 AM

The typical protocol for sequencing RNA with 454 is to make ds cDNA, fragment it (nebulizer, Covaris, etc.), then use a standard genomic library prep kit from Roche. This means polishing (blunting) the ends and attaching the sequencing adapters in a non-directional manner, so the reads you get will be a mixture of both orientations relative to the original transcript.
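To make the "mixture of both orientations" concrete, here is a toy Python sketch (the helper names are hypothetical and not part of any Roche or BLAT tooling) that calls a read's orientation by checking both the read and its reverse complement against a known transcript. It uses exact substring matching only, which real 454 reads with homopolymer errors would often fail; in practice you would use an aligner that searches both strands, as BLAT does for DNA queries.

```python
from typing import Optional

# Complement table for DNA bases (upper and lower case); N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_orientation(read: str, transcript: str) -> Optional[str]:
    """Toy orientation call by exact matching (hypothetical helper).

    Returns '+' if the read occurs as-is in the transcript (sense strand),
    '-' if only its reverse complement occurs (antisense strand),
    and None if neither form is found.
    """
    if read in transcript:
        return "+"
    if reverse_complement(read) in transcript:
        return "-"
    return None


if __name__ == "__main__":
    transcript = "ATGCGTACCGATTGACAGTC"
    # One sense read, one antisense read, one read from elsewhere.
    for read in ["CGTACCGA", "ACTGTCAA", "GGGGGGGG"]:
        print(read, read_orientation(read, transcript))
```

With a non-directional library, roughly both outcomes ('+' and '-') should show up across a run, which is why the aligner has to consider both strands.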

**behoward** · 04-20-2009, 11:01 AM

Thanks! I guess I have to use q=dna, then.

The dataset I am looking at is a public 454 GS20 dataset from the paper "Sampling the Arabidopsis Transcriptome with massively parallel pyrosequencing" (Weber et al., Plant Physiology, May 2007). kmcarr, I think I remember from a previous post that you have some experience with this particular dataset.

Do you have any guess whether the original researchers used q=rna in the BLAT alignment? I remember they reported about 11% of reads not mapping to the genome, but if I use q=dna, I get a larger percentage mapping to TAIR7.

Also, if I do use q=dna, I guess I will only want to 'count' a read once when it maps to a gene and its reverse complement. However, I would want to keep both matches when a read maps to multiple genes (say, paralogs or duplicated genes). I'm not sure how to tell these two cases apart. Anyone have any suggestions?
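For comparing mapping percentages like the ones above, a minimal sketch, assuming BLAT's default PSL output: count the distinct read IDs (qName, column 10) that have at least one alignment and divide by the number of reads submitted. The function names and the toy PSL lines are hypothetical, and this is only "fraction of reads with any hit", not necessarily how the paper computed its figure.

```python
from typing import Iterable, List, Set


def aligned_read_ids(psl_lines: Iterable[str]) -> Set[str]:
    """Collect the distinct query names (PSL column 10, qName) that aligned."""
    ids: Set[str] = set()
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # PSL data lines have 21 tab-separated columns and begin with an
        # integer match count; this skips the header BLAT writes by default.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        ids.add(fields[9])
    return ids


def mapping_rate(psl_lines: Iterable[str], total_reads: int) -> float:
    """Fraction of submitted reads with at least one alignment."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return len(aligned_read_ids(psl_lines)) / total_reads


def toy_psl_line(q_name: str, t_name: str, matches: int) -> str:
    """Build a minimal, ungapped PSL line for demonstration (hypothetical)."""
    fields = [matches, 0, 0, 0, 0, 0, 0, 0, "+", q_name, matches, 0, matches,
              t_name, 100000, 0, matches, 1, f"{matches},", "0,", "0,"]
    return "\t".join(str(x) for x in fields)


if __name__ == "__main__":
    psl: List[str] = [
        "psLayout version 3",
        toy_psl_line("read1", "chr1", 100),
        toy_psl_line("read1", "chr2", 100),  # second hit, same read
        toy_psl_line("read2", "chr1", 95),
    ]
    # 4 reads were submitted; read3 and read4 produced no alignment.
    print(f"{mapping_rate(psl, total_reads=4):.0%} of reads aligned")
```

Counting distinct qNames rather than PSL lines keeps a read with several hits from inflating the rate.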

**kmcarr** · 04-20-2009, 12:10 PM


Man! That dataset just won't die. When I said I had some familiarity with the data I was understating it a bit: I was one of the authors and performed all of the bioinformatics. I used the default BLAT settings for query and target type, i.e. both -q=dna and -t=dna. BLAT will only output a single alignment for a read at a given location; it will not report both the forward and reverse alignments of the same read, so you don't have to worry about that.

You are correct that you will find equally good alignments to paralogous genes. You will have to decide how you want to approach assigning or counting those reads.
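Two common ways to handle reads with equally good hits to paralogs are to count only uniquely assigned reads, or to split each read's count evenly across its genes. The Python sketch below shows both; the function name, the `mode` values, and the gene IDs are hypothetical, and it assumes alignments have already been mapped to (read, gene) pairs. Collecting genes in a set also covers the "count a read once per gene" concern from the question.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set, Tuple


def count_reads(hits: Iterable[Tuple[str, str]], mode: str = "unique") -> Dict[str, float]:
    """Turn (read_id, gene_id) alignment pairs into per-gene counts.

    mode="unique":     count only reads that hit exactly one gene.
    mode="fractional": split each read's single count evenly across its genes.
    """
    if mode not in ("unique", "fractional"):
        raise ValueError(f"unknown mode: {mode!r}")

    genes_per_read: DefaultDict[str, Set[str]] = defaultdict(set)
    for read_id, gene_id in hits:
        # A set means a read counts at most once per gene, however many
        # alignments to that gene survived filtering.
        genes_per_read[read_id].add(gene_id)

    counts: DefaultDict[str, float] = defaultdict(float)
    for genes in genes_per_read.values():
        if mode == "unique":
            if len(genes) == 1:
                counts[next(iter(genes))] += 1.0
        else:
            for gene_id in genes:
                counts[gene_id] += 1.0 / len(genes)
    return dict(counts)


if __name__ == "__main__":
    hits = [
        ("r1", "geneA"), ("r1", "geneA"),  # two hits assigned to the same gene
        ("r2", "geneB"), ("r2", "geneC"),  # equally good hits to two paralogs
        ("r3", "geneC"),
    ]
    print(count_reads(hits, mode="unique"))
    print(count_reads(hits, mode="fractional"))
```

Unique counting underestimates expression of gene families; fractional counting keeps every read but assumes paralogs are equally expressed. Which bias is acceptable depends on the analysis.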

You will also find many poor alignments of reads to the genome. You should play with the pslReps program to filter your initial BLAT output. pslReps is meant to retain only the best alignment when a query sequence aligns to multiple target locations. If there is a group of alignments that are equally good (or nearly so), they will all be retained.
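The idea of "keep the best alignment, plus any that are nearly as good" can be sketched in Python as below. This is NOT a reimplementation of pslReps, which has its own scoring and options (such as -minAli, -minCover and -nearTop); it is a simplified illustration that scores each PSL line with the nucleotide formula UCSC uses in its pslScore function (matches + repMatches/2 - misMatches - qNumInsert - tNumInsert) and keeps every hit within a hypothetical `near_top` fraction of each read's best score.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Sequence, Tuple


def psl_score(fields: Sequence[str]) -> int:
    """Nucleotide PSL score: matches + repMatches/2 - misMatches - qNumInsert - tNumInsert."""
    matches, mis_matches, rep_matches = int(fields[0]), int(fields[1]), int(fields[2])
    q_num_insert, t_num_insert = int(fields[4]), int(fields[6])
    return matches + rep_matches // 2 - mis_matches - q_num_insert - t_num_insert


def best_hits(psl_lines: Iterable[str], near_top: float = 0.01) -> List[str]:
    """Keep, per query, every alignment scoring within `near_top` (a fraction)
    of that query's best score. Illustrative only; not pslReps itself."""
    by_query: DefaultDict[str, List[Tuple[int, str]]] = defaultdict(list)
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # header or malformed line
        by_query[fields[9]].append((psl_score(fields), line))

    kept: List[str] = []
    for scored in by_query.values():
        top = max(score for score, _ in scored)
        cutoff = top - abs(top) * near_top
        kept.extend(line for score, line in scored if score >= cutoff)
    return kept


def toy_psl_line(q_name: str, t_name: str, matches: int, mis_matches: int = 0) -> str:
    """Build a minimal, ungapped PSL line for demonstration (hypothetical)."""
    size = matches + mis_matches
    fields = [matches, mis_matches, 0, 0, 0, 0, 0, 0, "+", q_name, size, 0, size,
              t_name, 100000, 0, size, 1, f"{size},", "0,", "0,"]
    return "\t".join(str(x) for x in fields)


if __name__ == "__main__":
    psl = [
        toy_psl_line("read1", "chr1", 50),      # tie: both paralog hits kept
        toy_psl_line("read1", "chr2", 50),
        toy_psl_line("read2", "chr3", 48, 2),   # score 46, kept
        toy_psl_line("read2", "chr4", 30, 20),  # score 10, dropped
    ]
    for line in best_hits(psl):
        fields = line.split("\t")
        print(fields[9], fields[13])
```

As with pslReps, tied hits to paralogs survive the filter, so the counting decision discussed above still has to be made afterwards.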

**behoward** · 04-25-2009, 12:17 PM

Well, thanks again!

I guess I came to the right person! I suppose the good thing about a dataset that won't die is that you must get a ton of citations.

Cheers,
Brian
