**mnkyboy** · 07-12-2011, 07:52 AM

I did some QC runs using MAQC UHR and Human Brain RNA with three replicates for TruSeq Whole Transcriptome, and the libraries had an R-squared of 0.98 or higher when I compared their FPKM values as generated by Cufflinks. Granted, I did not do the mRNA selection step.

I have also made libraries from the same experimental sample using different ribosomal depletion methods, and when comparing the non-ribosomal FPKM values I also get a very high correlation (>0.9).

Were they prepared at different times? What kind of RNA is it, and how does the Bioanalyzer trace look?
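A minimal sketch of the replicate comparison described above, assuming each replicate's Cufflinks output has already been parsed into a dict mapping transcript or gene ID to FPKM (the parsing step and the function names are hypothetical). It computes the squared Pearson correlation on log2(FPKM + 1), a common choice because raw FPKM is dominated by a handful of very abundant transcripts; the thread does not say whether its R-squared values were computed on a log or linear scale.

```python
import math

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def replicate_r_squared(fpkm_a, fpkm_b, pseudocount=1.0):
    """R^2 of log2(FPKM + pseudocount) over IDs present in both replicates.

    fpkm_a, fpkm_b: dicts of ID -> FPKM (hypothetical inputs, e.g. parsed
    from Cufflinks tracking files). IDs found in only one dict are ignored.
    """
    shared = sorted(set(fpkm_a) & set(fpkm_b))
    xs = [math.log2(fpkm_a[k] + pseudocount) for k in shared]
    ys = [math.log2(fpkm_b[k] + pseudocount) for k in shared]
    return r_squared(xs, ys)

rep1 = {"g1": 1.0, "g2": 3.0, "g3": 7.0}
rep2 = {"g1": 1.0, "g2": 7.0, "g3": 3.0}
print(replicate_r_squared(rep1, rep1))  # 1.0
print(replicate_r_squared(rep1, rep2))  # 0.25
```

Note that IDs detected in only one replicate drop out of this calculation entirely, so they need to be counted separately when judging how similar two libraries are.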

**eab** · 07-12-2011, 08:07 AM

Hey mnkyboy, thanks for the superfast reply! Here are details.

Cells: sorted human naive T cells, approximately 15 million in one tube. Cells were aliquoted into 5 tubes: one tube of 1x10^7, two tubes of 2x10^6, and two tubes of 2x10^5.

Extractions: cells were pelleted and lysed in RNAzol RT immediately after aliquoting, then stored at -80 °C until total RNA extraction. RNA was extracted from all 5 tubes at the same time, using the same tubes of reagents.

Library prep: TruSeq RNA Sample Prep Kit A; all libraries were prepared together in a single 96-well plate using the high-throughput protocol (with a few minor modifications).

Library QC: completed, purified libraries were run on the Bioanalyzer and showed a peak of the appropriate size plus a large peak that I took to represent the "bubble" form Illumina describes. Libraries were quantified by Kapa qPCR with flow cell primers and a SYBR Green reporter.

Clustering: cBot, using the TruSeq PE Cluster Kit v2 (HiSeq).
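Since uneven read counts between libraries come up later in the thread, here is a sketch of how qPCR absolute quantification against a standard curve generally works: fit Cq against log10 concentration for the dilution series of standards, read off the library's concentration, and scale by the ratio of standard length to mean library fragment length (SYBR Green signal per molecule grows with fragment length, so longer fragments would otherwise be overestimated). The standard length, dilution factor, and function names are placeholders, not values from the Kapa protocol; the kit manual is the authority on the actual standard size and dilution scheme.

```python
import math

def fit_standard_curve(concentrations, cq_values):
    """Least-squares fit of Cq = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means perfect doubling (slope ~ -3.32)."""
    return 10 ** (-1.0 / slope) - 1.0

def library_concentration(cq, slope, intercept, standard_bp, library_bp, dilution=1.0):
    """Undiluted library concentration, in the same units as the standards.

    standard_bp and library_bp are assumptions supplied by the user: the length
    of the quantification standard and the mean library fragment length
    (e.g. from the Bioanalyzer trace).
    """
    apparent = 10 ** ((cq - intercept) / slope)
    return apparent * (standard_bp / library_bp) * dilution
```

Because the size correction uses the mean fragment length, a large secondary Bioanalyzer peak such as the "bubble" product can shift that estimate, so it is worth being deliberate about which peak(s) go into `library_bp`.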

I did not run the starting RNA on the BioA before library prep. The cells were handled as carefully as possible, so I figured that no matter what RIN the BioA gave me, I could not improve on it and just needed to go forward. I have some RNA saved that I can run on the BioA now, but I would be shocked if differential degradation were the problem.

Any ideas? We're especially wondering about trivial informatics issues that can lead to false differences.

Thanks!
Eli

**chadn737** · 07-12-2011, 08:22 AM

When you say they look radically different, what do you mean? Is this before alignment or after alignment?

**mnkyboy** · 07-12-2011, 08:28 AM

That is definitely a head-scratcher. How long were your reads? We have found that for RNA-seq, if we go over 75 bases we start reading into the adapter and our mapping goes awry. Did you multiplex? Did anything stand out across the lanes in your QC? We generally multiplex and spread samples across the flow cell to reduce lane-to-lane variation.

The only other thing I think could be an issue is if something odd happened during poly-A selection. One way to check this is to see whether reads map to any known non-polyadenylated non-coding RNAs, and whether that differs across samples.
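One way to run the poly-A check above: given per-gene read counts for each library, compute the fraction of reads falling on transcripts known to lack poly(A) tails and compare that fraction across libraries. A library where poly-A selection was less stringent should show a noticeably larger fraction. The marker set is an illustrative assumption (Pol III transcripts such as 7SK, RNase P, and RNase MRP RNAs are common examples, as are replication-dependent histone mRNAs), and the counts are toy numbers.

```python
def non_polya_fraction(gene_counts, non_polya_genes):
    """Fraction of counted reads assigned to genes expected to lack a poly(A) tail."""
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0
    hits = sum(gene_counts.get(g, 0) for g in non_polya_genes)
    return hits / total

# Illustrative marker set (assumption): Pol III transcripts without poly(A) tails.
NON_POLYA = {"RN7SK", "RPPH1", "RMRP"}

# Toy per-gene read counts for two libraries (made-up numbers).
libraries = {
    "lib_A": {"ACTB": 900, "GAPDH": 80, "RN7SK": 20},
    "lib_B": {"ACTB": 700, "GAPDH": 100, "RN7SK": 200},
}
for name, counts in libraries.items():
    print(name, round(non_polya_fraction(counts, NON_POLYA), 3))
```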

**chadn737** · 07-12-2011, 08:33 AM

Originally posted by mnkyboy

That is definitely a head-scratcher. How long were your reads? We have found that for RNA-seq, if we go over 75 bases we start reading into the adapter and our mapping goes awry.

This is exactly the problem I had with the TruSeq libraries, and I wonder if it is the problem now. We had 100 bp reads and I was only getting ~60% to map. When I BLASTed random reads, the last 25 or so bases often had no match at all and turned out to be adapter sequence. I have heard of other people having this problem even with correct size selection.
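A minimal sketch of 3' adapter trimming for the read-through described above, assuming the reads run into an adapter that begins with AGATCGGAAGAGC (the sequence shared by the starts of the standard Illumina TruSeq adapters). It uses exact matching only, to show the idea; a dedicated trimmer that tolerates sequencing errors in the adapter would be the practical choice.

```python
ADAPTER = "AGATCGGAAGAGC"  # common 5' start of TruSeq adapters (assumed here)

def trim_3prime_adapter(read, adapter=ADAPTER, min_overlap=3):
    """Remove adapter read-through from the 3' end of a read.

    1. If the full adapter occurs in the read, cut at its first occurrence.
    2. Otherwise, if the read ends with the first k bases of the adapter
       (k >= min_overlap), cut those k bases; the longest such k wins.
    Exact matches only: sequencing errors in the adapter are not tolerated.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read

insert = "ACGTTGCAAC"
print(trim_3prime_adapter(insert + "AGATCGGAAGAGCACACGTC"))  # ACGTTGCAAC
print(trim_3prime_adapter(insert + "AGATCG"))                # ACGTTGCAAC
```

The `min_overlap` of 3 is a trade-off: about 1 in 64 adapter-free reads will end in "AGA" by chance and lose 3 bases, while a larger value leaves short adapter fragments untrimmed.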

**mnkyboy** · 07-12-2011, 08:37 AM

Originally posted by chadn737

This is exactly the problem I had with the TruSeq libraries, and I wonder if it is the problem now. We had 100 bp reads and I was only getting ~60% to map. When I BLASTed random reads, the last 25 or so bases often had no match at all and turned out to be adapter sequence. I have heard of other people having this problem even with correct size selection.

Yeah, our standard WT or mRNA-seq run is now 2x75 bp, and 2x50 bp for FFPE samples.

**sdarko** · 07-12-2011, 09:49 AM

Originally posted by chadn737

When you say they look radically different, what do you mean? Is this before alignment or after alignment?

I'm the bioinformatician working on this.

They looked vastly different.

In the first image I uploaded, I had used the wrong GTF file (ucsc_all_known_mRNA, which contained multiple entries with the same transcript name) for the Cufflinks analysis, and that caused much of the disparity. The R^2 value was only about 0.60.

After realizing my error, I downloaded the RefSeq GTF file from the UCSC Genome Browser. Running Cufflinks with it produced the second image. The R^2 for that one is much better, at about 0.90, but it probably should be a bit higher.

Sam
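A quick way to screen an annotation for the problem Sam hit (the same transcript name used for several entries): scan the GTF and report transcript IDs whose exons sit on more than one chromosome or strand. GTF consumers generally group exon records by `transcript_id`, so an ID reused at several loci can end up stitched into one implausible transcript. This is a sketch assuming a standard 9-column, tab-separated GTF with a `transcript_id "..."` attribute; the function name and toy records are illustrative.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def ids_at_multiple_locations(gtf_lines):
    """Map transcript_id -> set of (chromosome, strand) for IDs seen at more than one.

    Only exon records are considered. IDs reused at two loci on the same
    chromosome and strand are NOT caught; that would need a coordinate check.
    """
    locations = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            locations[match.group(1)].add((fields[0], fields[6]))
    return {tid: locs for tid, locs in locations.items() if len(locs) > 1}

# Toy records: T1 appears on two chromosomes, T2 only once.
example = [
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "A"; transcript_id "T1";',
    'chr5\tdemo\texon\t900\t990\t.\t-\t.\tgene_id "A"; transcript_id "T1";',
    'chr2\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "B"; transcript_id "T2";',
]
print(ids_at_multiple_locations(example))
```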


**eab** · 07-12-2011, 10:22 AM

As Sam (sdarko) writes, changing the GTF improved the correlation between duplicate libraries, but we hope the actual correlation is even better. First, if you look at the right-hand plot in his post, a good number of transcripts are stacked up along the axes, meaning they were detected in only one of the two libraries. Second, among transcripts detected in both libraries, the agreement between libraries is not that close, especially in the middle and lower ranges of abundance.

**chadn737** · 07-12-2011, 10:26 AM

How deep was your sequencing? I almost always find a large number of genes with only 1 or 2 reads mapping that appear in one sample but not the other. Still, even 0.9 seems a bit low for technical replicates. We only do biological replicates, and there we usually get an r2 of around 0.96-0.97.
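The point about genes with only 1 or 2 reads can be made concrete with a simple sampling model. Assuming each gene's read count in a library is Poisson with mean λ (an idealization that ignores PCR duplication and other overdispersion), a gene gets zero reads with probability e^(-λ), so at low depth many genes will show up in one replicate but not the other purely by chance. Function names are illustrative.

```python
import math

def prob_zero_reads(mean_reads):
    """P(count == 0) for a Poisson-distributed count with the given mean."""
    return math.exp(-mean_reads)

def prob_in_exactly_one(mean_a, mean_b):
    """P(a gene has reads in exactly one of two independently sequenced libraries)."""
    zero_a = prob_zero_reads(mean_a)
    zero_b = prob_zero_reads(mean_b)
    return zero_a * (1.0 - zero_b) + zero_b * (1.0 - zero_a)

for mean in (0.5, 1, 2, 5, 10):
    print(f"expected reads per library = {mean:>4}: "
          f"P(seen in only one) = {prob_in_exactly_one(mean, mean):.3f}")
```

Under this model, genes expected to receive 1 to 2 reads per library land on one axis of the scatter plot roughly a quarter to a half of the time, while genes expected to receive 10 or more essentially never do.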

**sdarko** · 07-13-2011, 03:45 AM

Originally posted by chadn737

How deep was your sequencing? I almost always find a large number of genes with only 1 or 2 reads mapping that appear in one sample but not the other. Still, even 0.9 seems a bit low for technical replicates. We only do biological replicates, and there we usually get an r2 of around 0.96-0.97.

I think that one issue may be that one "identical" library has ~4 million reads (~83% aligning to the genome), while the other "identical" library has ~1 million reads (~71% aligning to the genome).

So we have more than 4x as many aligned reads in one library as in the other.

Sam
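The ">4x" figure follows directly from the numbers in the post, taking aligned reads as total reads times alignment rate:

```latex
N_{\text{deep}} \approx 4\times10^{6} \times 0.83 = 3.32\times10^{6},\qquad
N_{\text{shallow}} \approx 1\times10^{6} \times 0.71 = 0.71\times10^{6},\qquad
\frac{N_{\text{deep}}}{N_{\text{shallow}}} \approx \frac{3.32}{0.71} \approx 4.7
```

FPKM corrects for this difference on average, but sampling noise does not scale away: a gene expected to get 5 reads in the deeper library would expect only about 1 read in the shallower one.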

**Heisman** · 07-13-2011, 03:51 AM

Originally posted by sdarko

I think that one issue may be that one "identical" library has ~4 million reads (~83% aligning to the genome), while the other "identical" library has ~1 million reads (~71% aligning to the genome).

So we have more than 4x as many aligned reads in one library as in the other.

Sam

That can be a big deal. Since you're a bioinformatician who is presumably much better at programming than I am, can you take random samples of 1M reads from the total 4M, align them, and see how the R^2 looks? How much coverage did you get overall?

**sdarko** · 07-13-2011, 04:00 AM

Originally posted by Heisman

That can be a big deal. Since you're a bioinformatician who is presumably much better at programming than I am, can you take random samples of 1M reads from the total 4M, align them, and see how the R^2 looks? How much coverage did you get overall?

Taking a random subset is on the agenda for today. Will let you know.
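A sketch of the subsampling suggested above, using reservoir sampling (Algorithm R) so a FASTQ file can be sampled uniformly in a single pass without knowing its read count in advance. It assumes plain 4-line FASTQ records and keeps the sampled records in memory. For paired-end data the same positions must be kept in both mate files; because the random draws here depend only on the seed and record positions, running it with the same seed on two mate files with equal record counts selects the same positions. Function names, file names, and the seed are illustrative.

```python
import random
from itertools import islice

def fastq_records(lines):
    """Group an iterable of FASTQ lines into 4-line records; a trailing partial record is dropped."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        yield record

def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly at random from a stream (Algorithm R).

    If the stream has fewer than k records, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir

# Usage sketch (file names are placeholders):
# with open("deep_library.fastq") as src, open("subsample_1M.fastq", "w") as dst:
#     for record in reservoir_sample(fastq_records(src), 1_000_000, seed=42):
#         dst.writelines(record)
```

Repeating the subsample with several seeds and realigning each gives a sense of how much of the R^2 gap is due to depth alone.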

**eab** · 07-13-2011, 08:54 AM

We noticed that many of the species "unique" to one of the two duplicates appear to be ubiquitously expressed genes mapping to loci that encompass several possible transcripts, so there is no way they should have been unique to one of the starting RNA samples. Perhaps a single species is being called one thing in one duplicate library and something else in the other? Either that, or PCR is so chaotic that it completely loses large numbers of moderately abundant species in a somewhat random fashion, though I feel the field would be aware of that if it were the case.
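One way to test the first hypothesis (the same molecules assigned to different isoforms in the two libraries): sum transcript-level FPKM to the gene level and recount the "unique" species. If most of them disappear after collapsing, the differences reflect isoform assignment rather than missing RNA. This assumes a transcript-to-gene mapping is available (for example from the `gene_id` attribute of the GTF); the function names, the zero threshold, and the toy values are illustrative.

```python
from collections import defaultdict

def collapse_to_genes(transcript_fpkm, transcript_to_gene):
    """Sum transcript FPKM values per gene; unmapped transcripts keep their own ID."""
    gene_fpkm = defaultdict(float)
    for transcript_id, value in transcript_fpkm.items():
        gene_fpkm[transcript_to_gene.get(transcript_id, transcript_id)] += value
    return dict(gene_fpkm)

def detected_in_only_one(a, b, threshold=0.0):
    """IDs above threshold in one replicate and at or below it (or absent) in the other."""
    only_a = {k for k, v in a.items() if v > threshold and b.get(k, 0.0) <= threshold}
    only_b = {k for k, v in b.items() if v > threshold and a.get(k, 0.0) <= threshold}
    return only_a, only_b

# Toy example: the same GENE_X signal is credited to TX1 in one replicate, TX2 in the other.
rep1 = {"TX1": 5.0, "TX2": 0.0, "TX3": 2.0}
rep2 = {"TX1": 0.0, "TX2": 5.0, "TX3": 2.1}
tx2gene = {"TX1": "GENE_X", "TX2": "GENE_X", "TX3": "GENE_Y"}

print(detected_in_only_one(rep1, rep2))  # ({'TX1'}, {'TX2'})
print(detected_in_only_one(collapse_to_genes(rep1, tx2gene),
                           collapse_to_genes(rep2, tx2gene)))  # (set(), set())
```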

**chadn737** · 07-13-2011, 10:17 AM

Originally posted by sdarko

I think that one issue may be that one "identical" library has ~4 million reads (~83% aligning to the genome), while the other "identical" library has ~1 million reads (~71% aligning to the genome).

So we have more than 4x as many aligned reads in one library as in the other.

Sam

Yeah, that's not very deep, so I would expect a lot more singletons. If you set an arbitrary cutoff and filter out the singletons, I wonder whether your r2 will increase.
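A sketch of the cutoff suggestion: keep only genes with at least some minimum read count in both libraries, normalize each library to counts per million (CPM) so the roughly 4x depth difference does not dominate, and recompute R^2 on log2(CPM + 1). The cutoff of 5 reads and the function names are assumptions for illustration; inputs are per-gene raw read counts.

```python
import math

def log_cpm(counts, genes, pseudocount=1.0):
    """log2(counts-per-million + pseudocount); library size = all counted reads."""
    library_size = sum(counts.values())
    return [math.log2(counts.get(g, 0) * 1e6 / library_size + pseudocount) for g in genes]

def r_squared(xs, ys):
    """Squared Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def filtered_r_squared(counts_a, counts_b, min_count=5):
    """R^2 using only genes with >= min_count reads in both libraries.

    Returns (r_squared, number_of_genes_kept). min_count=0 keeps every gene.
    Needs at least two kept genes with non-constant values.
    """
    kept = sorted(g for g in set(counts_a) | set(counts_b)
                  if counts_a.get(g, 0) >= min_count and counts_b.get(g, 0) >= min_count)
    return r_squared(log_cpm(counts_a, kept), log_cpm(counts_b, kept)), len(kept)

# Toy counts: the shallow library is ~1/4 depth, plus two low-count "singleton" genes.
deep = {"g1": 1000, "g2": 2000, "g3": 4000, "g4": 2, "g5": 0}
shallow = {"g1": 250, "g2": 500, "g3": 1000, "g4": 0, "g5": 1}
print(filtered_r_squared(deep, shallow, min_count=0))
print(filtered_r_squared(deep, shallow, min_count=5))
```

Filtering changes which genes are being compared, so it is worth reporting the number of genes kept alongside the R^2.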

Intra-sample variability, Illumina TruSeq mRNA