
**liaojinyue** · 10-24-2014, 07:12 AM

data analysis

Hi Simone,

We have just received the sequencing results from Smart-seq2. The input material for each sample was 10 cells, and we got around 10M PE100 reads per sample. We then tried to analyze the data using the TopHat-cuffdiff workflow. The mapping rate is around 80%, which is quite good. However, far too many genes (more than 8,000) show differential expression. It looks like many genes are expressed in only one sample and have zero RPKM in the other. I wonder whether this is normal for this kind of experiment, and how I can make sense of the gene list. Thanks for the help.

Jason

Originally posted by Simone78

Hi,
What you saw here is nothing new for us, and we still can't explain it. We also saw a peak around 1.8 kb when optimizing the protocol using HEK293T (human) cells. When we looked for over-represented sequences in the data, we found that the peak corresponded to a specific transcript called "humanin", which is mitochondrial (!) but has several homologous copies in the genome. While it's annoying to have such a library, the results were not affected. Below please find an example of what we got when we were optimizing Smart-seq2. There you can see that we had not only this humanin peak but also a fair amount of primer dimers, yet the performance of our method was still superior to the Clontech kit (primer dimers get tagmented with the Nextera kit as well and end up in the final library, thus "wasting" sequencing reads). Sometimes we see it with mouse cells too, but this obviously can't be humanin, which, as the name says, is present only in humans. For the mouse we never found out what it is... what we know is that it's not a contamination, because some of our collaborators reported the same. On the other hand, even when working with single cells this peak doesn't always come up. Maybe it depends on how stressed the cells are or how they were sorted? I would also be interested in finding out!
In conclusion, Smart-seq2 is better than Smart-seq, but it's not perfect, sorry!

I saw that you tried to reduce the amount of TSO. I did a few trials varying the amounts of the different primers. While reducing ISPCR and SMART dT30VN sometimes (sometimes!) helps reduce primer dimers, decreasing the TSO usually leads to a lower cDNA yield after preamplification. Maybe the strand-switch reaction is inefficient, and you need such a huge amount of TSO even when working with single cells!
Have you tried sequencing some of these libraries? I don't know what you are interested in, but if you need a lot of reads you could simply multiplex fewer samples per lane. For most applications (differential expression, detection of isoforms/splice variants) 1 million reads per sample is sufficient, and you would get that amount even when pooling 96 samples on an Illumina HiSeq 2000.
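The multiplexing arithmetic behind this is plain division. Below is a minimal sketch, assuming a perfectly balanced pool and an illustrative lane yield of about 200 million reads. That yield is an assumed figure, not a spec: real yields vary by instrument and run, and pooled samples are rarely perfectly balanced. The function names are hypothetical helpers, not part of any tool.

```python
def reads_per_sample(reads_per_lane: float, n_samples: int) -> float:
    """Mean reads per sample, assuming the pool is perfectly balanced."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return reads_per_lane / n_samples


def max_samples_per_lane(reads_per_lane: float, target_reads: float) -> int:
    """Largest number of evenly pooled samples that still reach target_reads each."""
    if target_reads <= 0:
        raise ValueError("target_reads must be positive")
    return int(reads_per_lane // target_reads)


lane_yield = 200e6  # ASSUMED reads per lane, for illustration only
per_sample = reads_per_sample(lane_yield, 96)
print(f"{per_sample / 1e6:.2f} M reads/sample")  # 2.08 M reads/sample
print(max_samples_per_lane(lane_yield, 1e6))     # 200
```

Under that assumption, a 96-plex lane leaves roughly twice the 1 million reads per sample quoted above, which gives some headroom for an unevenly balanced pool.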

We had some concatemer problems in the beginning, but they were due to too much adaptor (TSO, ISPCR or oligo-dT) relative to the RNA in a cell, and we observed them only with very small cells (mostly immune cells, which have very little mRNA). If it is an issue, you might try blocking the TSO at the 5' end, as done by Kapteyn J et al. (BMC Genomics 2010, 11:413). Blocking the 5' end should prevent concatemerization of the TSO.

Best,
Simone

**DW-A290** · 01-28-2015, 03:42 AM

Hey everyone,

we are using the Smart-seq2 protocol in our lab for single-cell RNA-seq and it works great! However, I tried to use this protocol to generate cDNA from nuclear RNA, and I got a cDNA library with fairly short fragments (before tagmentation etc.). According to Picelli et al., 2014, the peak should be around 1.5-2 kb. Does anybody have experience with nuclear RNA-seq, or has anyone gotten a result like this? Does anybody have an idea why my peak is around 350 bp? I would highly appreciate any comments!

Best,
Damian


**Babooncanfly** · 02-08-2015, 08:19 PM

Originally posted by Simone78

Hi Simone, it may be helpful to know that I saw this 1.85 kb peak with bulk purified RNA as well as with single cells, so it's unlikely to be due to stress on the cells. Furthermore, we don't see it with Smart-seq1 when comparing side by side on the same input source, so it's likely a method-specific artifact. Having said that, I'm excited by the enhanced sensitivity and coverage offered by Smart-seq2, and am waiting for some pilot data to confirm. If there are any new updates/improvements to the method, please let us know! Thanks for the great protocol!

**Simone78** · 02-09-2015, 01:21 AM

Originally posted by Babooncanfly

As I said in another thread (http://seqanswers.com/forums/showthr...d=1#post159849), we don't know why this happens. We hypothesised it was because the cells were stressed but, from your results, that doesn't seem to be the case!
The protocol hasn't been updated since its publication (I was mainly working on the tagmentation protocol with Tn5 transposase to replace the Nextera kit, which was recently published). The only thing I would suggest, and which preliminary data indicate might help get rid of primer dimers/concatemers, is blocking the TSO at the 5' end (using biotin, for example). This was "forgotten" during the Smart-seq2 optimization, but it prevents the "loss" of many reads after sequencing. In fact, Tn5 cuts anything that is double-stranded down to about 40 bp, as shown by Adey et al. in their original Genome Biol paper in 2010. If you don't eliminate the dimers with the bead purification after the first PCR (before tagmentation), which is unlikely, you'll find them in the final library.
Best,
Simone

**KroSeq** · 03-04-2015, 07:25 AM

Hi Simone,
Thanks for the cool protocol and your advice on this platform. Following it, I got quite good data, at least for the part before sequencing (HiSeq run, 2x 150 bp, 400 million reads)...
Now I need to provide some additional information to the core facility staff, as they haven't run such libraries yet.
1. What should the actual concentration of the final pool be (ng/µl)?
2. What should the actual average fragment size be?
3. What should the actual molar concentration (nM) be, and should it be based on Qubit, Bioanalyzer or TapeStation?
4. What loading concentration (pM) did you use, and what cluster density did you achieve?
5. What percentage of PhiX spike-in did you use?
It would be very helpful if you could comment on that.

@2: all my libraries have a mean fragment length between 525 and 630 bp (in your paper you state 200-600 bp), so this should be OK, right?

Thanks

**Simone78** · 03-04-2015, 12:44 PM

Originally posted by KroSeq

I'll try to answer your questions, but for some of them the answer is not that straightforward.
1- The final concentration depends on many factors, such as the input DNA used in the tagmentation, the number of PCR cycles after tagmentation, the average size of the library, etc. In general, I get a broad range of concentrations and average sizes. What is important for us (since we send the libraries to a sequencing facility) is to reach a final concentration of 2 nM. As I said, the concentration of the final pool depends on the size, so it varies.
2- The average size depends on your application. If you do PE sequencing, then maybe a 200 bp library is too short. We do SE 50 bp sequencing and it works well. The Nextera adaptors are (if I am correct) around 120 bp.
3- It's 2 nM (see answer 1). We get the concentration of the pool from the Qubit and the size from the Bioanalyzer. It would be more precise to measure the concentration with the KAPA qPCR kit. The kit is fast and easy to use; I recommend it.
4- I think the sequencing facility loads 10 pM most of the time. Cluster density is around 900k/mm², and the number of reads varies but is 200-250M per lane.
5- I don't remember, sorry! I guess they use what Illumina recommends (1%).
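Going from a Qubit mass concentration and a Bioanalyzer mean size to the 2 nM target uses the standard conversion nM = (ng/µl × 10^6) / (660 × mean size in bp), where 660 g/mol is the usual approximate mass of one double-stranded base pair, followed by a C1V1 = C2V2 dilution. Below is a minimal sketch; the helper names and the example pool (2 ng/µl, 500 bp) are hypothetical. Since this estimate counts all DNA mass, including fragments that cannot cluster, it is an approximation; that is one reason the qPCR-based measurement mentioned above is more precise.

```python
from typing import Tuple

# Average mass of one base pair of double-stranded DNA (standard approximation).
BP_MASS_G_PER_MOL = 660.0


def library_nM(conc_ng_per_ul: float, avg_size_bp: float) -> float:
    """Mass concentration (e.g. Qubit) -> molarity, using the mean fragment size
    (e.g. from a Bioanalyzer trace). 1 ng/ul == 1 mg/L, hence the 1e6 factor for nM."""
    if avg_size_bp <= 0:
        raise ValueError("avg_size_bp must be positive")
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * avg_size_bp)


def dilution_volumes(stock_nM: float, target_nM: float, final_ul: float) -> Tuple[float, float]:
    """C1*V1 = C2*V2: return (stock volume, diluent volume) in microliters."""
    if target_nM > stock_nM:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_nM * final_ul / stock_nM
    return stock_ul, final_ul - stock_ul


# Hypothetical pool: 2 ng/ul by Qubit, 500 bp mean size by Bioanalyzer.
pool_nM = library_nM(2.0, 500.0)
stock, diluent = dilution_volumes(pool_nM, 2.0, 20.0)
print(f"pool = {pool_nM:.2f} nM")                            # pool = 6.06 nM
print(f"mix {stock:.1f} ul pool + {diluent:.1f} ul buffer")  # mix 6.6 ul pool + 13.4 ul buffer
```

The same mass concentration corresponds to fewer molecules when the library is longer, which is why the molarity of the final pool depends on its size.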

Your libraries have the right size. I wouldn't worry about the length, which is probably very similar to what you get with the Nextera XT kit (where we, at least, get a very broad distribution of fragments in the 200-1000 bp range).
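To see why a 200 bp library can be too short for paired-end reads while 525-630 bp libraries are fine for 2x 150 bp, subtract the adaptor length from the library length and compare the result with the read length. Below is a minimal sketch that treats the ~120 bp adaptor figure quoted above as an assumption rather than a specification, and applies it to the mean library size only. Real libraries are broad distributions, so part of any library will still be short. The function names are hypothetical.

```python
ADAPTOR_BP = 120.0  # ASSUMED total adaptor length (the rough figure quoted above)


def insert_size(library_bp: float, adaptor_bp: float = ADAPTOR_BP) -> float:
    """Approximate insert length: mean library length minus adaptor length."""
    return library_bp - adaptor_bp


def read_through_bp(insert_bp: float, read_len: float) -> float:
    """Adaptor bases sequenced at the 3' end of each read when the insert is
    shorter than the read (0 if the read ends inside the insert)."""
    return max(0.0, read_len - insert_bp)


def mate_overlap_bp(insert_bp: float, read_len: float) -> float:
    """Insert bases covered by both mates of a pair (0 if the mates do not meet)."""
    return max(0.0, min(insert_bp, 2 * read_len - insert_bp))


# 200 bp example vs. the 525-630 bp libraries, all read as 2x 150 bp.
for lib_bp in (200.0, 525.0, 630.0):
    ins = insert_size(lib_bp)
    print(lib_bp, ins, mate_overlap_bp(ins, 150), read_through_bp(ins, 150))
# 200.0 80.0 80.0 70.0
# 525.0 405.0 0.0 0.0
# 630.0 510.0 0.0 0.0
```

With a 200 bp library, each 150 bp read would run about 70 bp into the adaptor. The longer libraries leave inserts longer than the combined read length, so the mates do not overlap.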
best,
Simone

**eab** · 03-19-2015, 11:32 AM

rRNA signal in SMARTseq2

What rRNA read percentages do people typically get with this protocol? We recently made a batch of libraries with rRNA read percentages of ~50%, which seems too high. Does anyone have thoughts about potential causes?
Thanks!
Eli

**wishingfly** · 05-19-2015, 04:00 PM

Question from rookie

I wish I had found this website and read these threads earlier. I am something of a rookie in single-cell RNA-seq, and I am so excited to find such an informative place to discuss the technique. I have tried the SMARTer kit from Clontech, but the results were unsatisfactory. I then read the papers that Simone's lab published on Smart-seq2, and I am eager to try it out.

I have an entry-level question, which might seem stupid to you, but I really want to know. It is about the RT: nowadays we commonly use SuperScript III from Invitrogen in the lab, but this protocol still uses SuperScript II. Is there any special reason for this? Invitrogen has actually launched SuperScript IV, which is claimed to run RT in 10 min. That would be a real game changer for my experiment, because we have a particular reason to finish the entire process as quickly as possible. I understand that it is often better to follow a protocol without asking too many whys, but here we have a special need regarding timing.

I would really appreciate it if anyone could answer my question. Thanks in advance!

**kushald** · 05-19-2015, 10:06 PM

We (Genotypic Technology) are a genomics services provider based in India. We have used the SMARTer kit for sheep oocyte samples. Please write to us at [email protected] for more details.

Visit our website www.genotypic.co.in

**Simone78** · 05-19-2015, 10:34 PM

Originally posted by wishingfly

Hi,
We used SuperScript II simply because it has strand-switch activity, while SuperScript III does not (or, at least, it is much less efficient).
Recently I also did a test with SuperScript IV and didn't get a satisfactory cDNA yield when using the standard protocol and following the manufacturer's instructions. However, after adding betaine and some extra MgCl2 and following my Smart-seq2 protocol, I got yields perfectly comparable with SuperScript II. I haven't sequenced these samples yet, and I did the test only with 10 pg of total RNA, but I believe there won't be much difference. For SuperScript IV I did the RT at 50 °C for 15 min (they suggest 50-55 °C for 10-15 min).
Good luck with your experiments!

Best,
Simone

**jwfoley** · 05-20-2015, 04:38 AM

Often when a company claims that a new recombinant enzyme has improved processivity compared to the wild-type (or in this case, we're comparing with the standard mutant that lacks the RNase H domain), what it really means is that they've just made it more temperature-resistant, so that you can run the reaction at a higher temperature, which makes it faster. The old enzyme might be just as fast at that temperature, except it would degrade too quickly to finish the job. Note the recommended reaction temperatures: SuperScript II, 42 °C (like most other MMLV-derived RTases); SuperScript III, 50 °C; SuperScript IV, 50–55 °C (70% activity at 65 °C!). You could take one of the fancier enzymes and cool it down to 42 °C like the regular ones, but then it would lose its processivity advantage and defeat the purpose.

Template-switching reverse transcription has a peculiar problem with this: it hinges (literally?) on the annealing of a three-base overhang, so running up the reaction temperature would speed up the enzyme but could also reduce the template-switching (the LNA would help with this). Unfortunately IDT's OligoAnalyzer doesn't want to give numbers for a trinucleotide, but even the 20T primer included in the SuperScript IV kit has a melting temperature of only 49.4 °C in realistic conditions. (I used the recommended oligo and dNTP concentrations in the SSIV manual, but it doesn't say the buffer composition so I had to use NEB's for that: 75 mM monovalent cation, 3 mM magnesium. I wouldn't be surprised if the SSIV buffer is rather different considering its special reaction conditions and claimed benefits; often the difference between commercial enzyme kits isn't in the enzymes themselves, just the reaction buffers.)

This poses even worse concerns for standard random-primed RNA-seq, because the higher reaction temperature is going to (further) bias the priming against A:T pairs, which only have two hydrogen bonds instead of three. Illumina uses SuperScript II in its RNA-seq kits, which seems even more surprising since there's no template-switching required (and since it's sold by Life Tech, which is the only company that's remotely threatening to Illumina's sequencing monopoly at the moment, but still not very). This thread concluded that's just because of institutional inertia, but I wonder whether someone thought of this problem.
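As a back-of-the-envelope illustration of why such short duplexes are temperature-sensitive, and of the A:T vs G:C point, the crude Wallace rule (Tm ≈ 2 °C per A:T pair + 4 °C per G:C pair) can be applied. This is a swapped-in approximation, not the nearest-neighbour model OligoAnalyzer uses. It is only intended for short oligos in high salt and is outside its valid range for a trinucleotide, so its numbers will not match the 49.4 °C figure above. The three-G overhang is an assumption about the TSO's 3' end.

```python
def wallace_tm(seq: str) -> int:
    """Wallace-rule estimate of melting temperature in deg C: 2 per A/T(U), 4 per G/C.

    Only a rough guide for short oligos in high salt; it ignores nearest-neighbour
    stacking, strand concentration and Mg2+/monovalent-salt corrections.
    """
    s = seq.upper()
    weak = sum(s.count(b) for b in "ATU")   # A:T / A:U pairs (two hydrogen bonds)
    strong = sum(s.count(b) for b in "GC")  # G:C pairs (three hydrogen bonds)
    if weak + strong != len(s):
        raise ValueError(f"non-ACGTU character in {seq!r}")
    return 2 * weak + 4 * strong


print(wallace_tm("T" * 20))  # 40  (20T RT primer)
print(wallace_tm("GGG"))     # 12  (three-G overhang ASSUMED for the TSO 3' end)
print(wallace_tm("AAA"))     # 6   (hypothetical A/T-only trimer, for contrast)
```

Even this crude estimate puts a three-base duplex far below the 42-55 °C RT temperatures discussed here, and an A/T-rich trimer lower still.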

And another problem is error rate: polymerases may go faster at higher temperatures, but they also tend to make more mis-pairing errors. TS-RT is already at a disadvantage compared to standard RNA-seq in this regard because it uses error-prone RTase to synthesize both cDNA strands instead of just one. This probably isn't a big deal for gene-expression profiling, where you're just counting aligned reads and an error or two won't make the read unalignable, but it matters more for base-counting applications like detecting RNA editing or allele-specific expression.

Anyway, if you really want to try different temperatures or enzymes, be sure to include no-template controls with the RT primer alone, the TS primer alone, and both together. Depending on how you measure (be sure to check not just the quantity of product molecules but their size distribution), what looks like a high yield of cDNA might actually be a high yield of primer dimers from a reaction that completely failed at template-switching.

**Simone78** · 05-20-2015, 05:15 AM

Interesting comment, thanks!
I just want to add that a visual inspection and a quick dispensing test with a liquid-handling robot showed that the viscosity of the reaction buffers is clearly different (SSRT II vs IV).
As I said, I haven't sequenced any of my samples, but the size distribution is exactly the same, with just a slightly higher "background" of short fragments. That's why I think the two enzymes are comparable (no primer dimers here, see attachment).
And of course, as said above, SSRT IV is still a retroviral RT and suffers from all the limitations of the previous versions, I am afraid.

**wishingfly** · 05-21-2015, 01:37 PM

I really appreciate Simone's hands-on experience and jwfoley's detailed explanation of how to choose the right RTase. Based on my understanding, I will go with the classic SuperScript II for the majority of my experiments, and keep SuperScript IV as an option for future optimization. Thank you so much!

**VerhTwente** · 05-22-2015, 01:37 AM

Hi everyone,

Here in the lab we've been trying to do TS experiments with set amounts of RNA input (500 pg), using the STRT method. Our negative controls, where we don't add TSO, yield cDNA amounts similar to those of the regular conditions. Has anyone devised a good way to evaluate TS efficiency without actually sequencing?

**jwfoley** · 05-22-2015, 03:23 AM

Originally posted by VerhTwente

Run your PCR product on a Bioanalyzer or equivalent. cDNA produces a nice wide size distribution, while primer dimers should be one tight band of predictable size. Or maybe a series of tight bands at multiples of that size (concatemers), in which case you should switch to 5'-blocked oligonucleotides (it's as simple as adding a 5' biotin).

Also, as I said above, consider doing NTCs that only use one primer or the other instead of both.
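The Bioanalyzer rule of thumb in this post (broad smear means cDNA, one tight band means primer dimer, tight bands at multiples of that size mean concatemers) can be written as a small check on a peak table. Below is a minimal sketch, assuming peak sizes and approximate widths have been exported from the electrophoresis software by hand. The `Peak` class, the 110 bp dimer size and both tolerances are hypothetical placeholders to tune against your own no-template controls; they are not values from the thread or from Agilent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Peak:
    size_bp: float   # peak position reported by the electrophoresis software
    width_bp: float  # approximate width of the peak or smear


def classify_peaks(peaks: List[Peak], dimer_bp: float,
                   size_tol_bp: float = 15.0,
                   max_tight_width_bp: float = 50.0) -> List[Tuple[float, str]]:
    """Tight peaks near n * dimer_bp -> dimer (n == 1) or concatemer (n > 1);
    wide peaks -> cDNA-like smear. Thresholds are arbitrary placeholders."""
    results = []
    for pk in peaks:
        n = round(pk.size_bp / dimer_bp)  # nearest integer multiple of the dimer unit
        near_multiple = n >= 1 and abs(pk.size_bp - n * dimer_bp) <= size_tol_bp
        tight = pk.width_bp <= max_tight_width_bp
        if not tight:
            label = "broad (cDNA-like)"
        elif near_multiple:
            label = "primer dimer" if n == 1 else f"concatemer ({n}x)"
        else:
            label = "tight, unexplained"
        results.append((pk.size_bp, label))
    return results


# Hypothetical trace: ~110 bp dimer unit, its 2x/3x ladder, and a broad cDNA smear.
trace = [Peak(112, 8), Peak(221, 10), Peak(335, 12), Peak(1500, 1400)]
for size, label in classify_peaks(trace, dimer_bp=110):
    print(size, label)
```

Running the same check on the RT-primer-only, TSO-only and both-primer no-template controls shows which oligo the tight bands come from.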
