SEQanswers

Old 02-16-2017, 04:57 PM   #1
Rashmi007
Junior Member
 
Location: UK

Join Date: Feb 2017
Posts: 6
Need help with single cell RNA-seq.. how to begin the analysis from the raw reads?

Hi,

I am Rashmi Kulkarni, and I am looking for some initial guidance on working with single-cell RNA-seq data generated with 10x Genomics and sequenced on an Illumina NextSeq. I have the raw data and am struggling to understand where to go from here. I am looking at open-source programs to handle this. Can anybody direct me to resources on how to do this analysis from scratch, starting with the raw reads?

Thanks,
Rashmi

Last edited by GenoMax; 02-17-2017 at 03:54 AM. Reason: as suggested in the reply to earlier post!
Old 02-16-2017, 06:12 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,448

I recommend adapter-trimming with BBDuk and then mapping with BBMap to produce a SAM file. What you do with the SAM file depends on your experiment. There are programs like edgeR and DESeq for differential expression analysis between samples, but I'm not sure whether they were designed with single-cell data in mind.

You may want to read about RNA-seq in the Biostar Handbook for starters.
Old 02-16-2017, 06:22 PM   #3
nucacidhunter
Senior Member
 
Location: Iran

Join Date: Jan 2013
Posts: 905

The analysis workflow depends on the scRNA-Seq protocol used for library prep. Some methods, such as Drop-Seq and 10x Genomics, require starting the analysis with open-source or free software. These may require other software at the final steps for data presentation, but generally they offer an end-to-end solution. For other methods, it would be best to look at the papers that describe the method.

PS. You will get better responses if you change the title of your post to represent the question.

Last edited by nucacidhunter; 02-16-2017 at 06:35 PM.
Old 02-17-2017, 03:24 AM   #4
Rashmi007
Junior Member
 
Location: UK

Join Date: Feb 2017
Posts: 6

Yes, I have updated my title, but it doesn't show up. Anyway, the scRNA-seq library was prepared using 10x Genomics technology. Could you please direct me to relevant resources on how, in general, to start from the raw reads? There are tutorials that start from count matrices, but hardly any show the initial steps.

Thanks a lot,
Rashmi
Old 02-17-2017, 03:58 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,190

Edit: I will leave this post here, but it is not applicable to the original question, since the original data is from 10x Genomics, not plain Illumina.

The procedure of QC, scanning for and trimming adapters, and alignment is more or less the same for most *-seq analyses one may be doing.

You can see sections 4 and 5 in this WikiBook for a general idea of the process. I recommend that you give FastQC (for QC) and then the BBMap suite a try for the steps noted above. Both tools are easy to find and use, and both have extensive support here if you run into questions.

Last edited by GenoMax; 02-17-2017 at 04:57 AM.
Old 02-17-2017, 04:02 AM   #6
Rashmi007
Junior Member
 
Location: UK

Join Date: Feb 2017
Posts: 6

OK, I will follow those steps and post a question if required.

Thanks,
Rashmi
Old 02-17-2017, 04:29 AM   #7
nucacidhunter
Senior Member
 
Location: Iran

Join Date: Jan 2013
Posts: 905

Quote:
Originally Posted by Rashmi007
Yes, I have updated my title, but it doesn't show up. Anyway, the scRNA-seq library was prepared using 10x Genomics technology. Could you please direct me to relevant resources on how, in general, to start from the raw reads? There are tutorials that start from count matrices, but hardly any show the initial steps.

Thanks a lot,
Rashmi
10x Genomics scRNA-Seq analysis requires BCL files (FASTQ files can be used with some extra steps). The data is processed through the Cell Ranger pipeline, and the results can be presented with Loupe Cell Browser (both are free and supported). The download and guide are at the following link:

https://support.10xgenomics.com/sing...erview/welcome

They also have data sets that have been processed through their pipelines, which can be found at the following link:

https://support.10xgenomics.com/single-cell/datasets

The following files from the run folder are required for the 10x Cell Ranger pipeline:
Data directory (BCL files for the lane)
InterOp directory
runParameters.xml
RTAComplete.txt
RunInfo.xml
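
A quick way to check a run folder against the list above before starting, as a minimal sketch. The function name `missing_run_items` is made up for illustration, and it assumes the two directories are literally named "Data" and "InterOp" at the top level of the run folder:

```python
from pathlib import Path

# Entries taken from the list above. Assumption: "Data" and "InterOp" are
# directory names exactly as written, located directly in the run folder.
REQUIRED_DIRS = ("Data", "InterOp")
REQUIRED_FILES = ("runParameters.xml", "RTAComplete.txt", "RunInfo.xml")


def missing_run_items(run_dir):
    """Return the required entries that are absent from run_dir."""
    root = Path(run_dir)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing
```

An empty list means every entry in the list was found; anything returned is what still needs to be copied from the sequencer.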

Using any third-party software for initial data processing will give non-optimal results: reads originating from a single cell share a 16-base cell barcode, each transcript is tagged with a 10-base UMI, and the barcodes are not random sequences but are checked and corrected against a whitelist. 10x also has very responsive tech support.
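
To give a feel for what "checked and corrected against a whitelist" means, here is a deliberately simplified sketch. It is not Cell Ranger's algorithm (which is more involved); it only rescues an observed barcode when exactly one whitelist entry is a single substitution away. The function name `correct_barcode` is hypothetical:

```python
def correct_barcode(barcode, whitelist):
    """Return the whitelisted barcode for an observed one, or None.

    Simplified rule (an assumption for illustration): accept exact matches,
    otherwise accept a one-substitution neighbour only if it is unique.
    """
    if barcode in whitelist:
        return barcode
    candidates = set()
    for i, base in enumerate(barcode):
        for alt in "ACGT":
            if alt == base:
                continue
            candidate = barcode[:i] + alt + barcode[i + 1:]
            if candidate in whitelist:
                candidates.add(candidate)
    # Ambiguous or unmatched barcodes are dropped rather than guessed.
    return candidates.pop() if len(candidates) == 1 else None
```

Pass the whitelist as a `set` so each lookup is constant time; a barcode with two equally close whitelist neighbours returns `None`.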

Last edited by nucacidhunter; 02-19-2017 at 04:08 PM. Reason: corrected required files name
Old 02-17-2017, 04:56 AM   #8
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,190

@Rashmi: I only read your last post and missed the critical part that this is 10x data (Thanks @nucacidhunter for quoting the original post).

I have not personally used cellranger, but if it is anything like their other software (longranger and supernova, which I have used), then it would not be trivial to get going. You would need a good bit of hardware (preferably access to a cluster). If you are not tech-savvy, then definitely talk with your local IT support first.

Alternatively, you may want to see if the facility that did your 10x work would be willing to run some of these analyses and give you analyzed data that you can look at locally (with Loupe Cell Browser).

Last edited by GenoMax; 02-17-2017 at 04:58 AM.
Old 02-17-2017, 11:19 AM   #9
Rashmi007
Junior Member
 
Location: UK

Join Date: Feb 2017
Posts: 6

@GenoMax: Yes, that is precisely the problem. I did look at Cell Ranger as an option, but its system requirements are too much for my personal PC. We do have a CGC account where we could run it, but that means wrapping the tool first, which is a bit tedious. That is why I was looking for options other than Cell Ranger.

@nucacidhunter: We have FASTQ files, not BCL files. What can be done in that case?

I am particularly interested in this, just to understand the basics: how do I find the UMIs and barcodes in a given sequence? Can I write a program to do that? Maybe this is too ambitious, but I just want to know where to get that information.

Thanks,
Rashmi

Last edited by Rashmi007; 02-17-2017 at 11:22 AM.
Old 02-17-2017, 11:43 AM   #10
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,190

I think your best option is to go back to whoever did the 10x work for you and see if they will analyze the data (you may have to pay if this was done at a service facility). If this is a one-time run, it would be the most cost- and time-effective solution for you.
Old 02-17-2017, 01:49 PM   #11
nucacidhunter
Senior Member
 
Location: Iran

Join Date: Jan 2013
Posts: 905

Quote:
Originally Posted by Rashmi007
@nucacidhunter: We have FASTQ files, not BCL files. What can be done in that case?

I am particularly interested in this, just to understand the basics: how do I find the UMIs and barcodes in a given sequence? Can I write a program to do that? Maybe this is too ambitious, but I just want to know where to get that information.
Instructions for using FASTQ files with Cell Ranger:

https://support.10xgenomics.com/sing...l2fastq-direct

Position of UMIs and barcodes: in the V1 kit, the barcode is read as index 1 (i7) and the UMI is the 10-base Read 2. In the V2 kit, the barcode is bases 1-16 and the UMI is bases 17-26 of Read 1.
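
If you want to write a small program just to see this for yourself, the V2 layout described above can be sliced directly out of Read 1. This is a minimal sketch, assuming a standard four-line-per-record FASTQ file (optionally gzipped); the function names `split_read1` and `iter_barcode_umi` are made up for illustration, and no whitelist correction is applied:

```python
import gzip

BARCODE_LEN = 16  # bases 1-16 of Read 1 (V2 kit, as described above)
UMI_LEN = 10      # bases 17-26 of Read 1


def split_read1(seq):
    """Split a V2 Read 1 sequence into (cell barcode, UMI)."""
    if len(seq) < BARCODE_LEN + UMI_LEN:
        raise ValueError("Read 1 is shorter than barcode + UMI")
    return seq[:BARCODE_LEN], seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]


def iter_barcode_umi(fastq_path):
    """Yield (barcode, UMI) for every record in a Read 1 FASTQ file."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Assumption: 4 lines per record; the sequence is the 2nd line.
            if line_number % 4 == 1:
                yield split_read1(line.strip())
```

For example, `collections.Counter(bc for bc, _ in iter_barcode_umi(path))` would give a rough count of reads per raw barcode, which is a useful first look even without any correction.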

For troubleshooting and more information I would suggest getting in contact with 10x tech support.

As GenoMax has pointed out, the easiest way would be to ask the place that prepared and sequenced the libraries to do a preliminary analysis; you can then use the Cell Ranger output files for further analysis if you need to. They should do it for free, as they would have some interest in the performance of the platform and in evaluating their own technical skills.
Old 02-17-2017, 02:08 PM   #12
Rashmi007
Junior Member
 
Location: UK

Join Date: Feb 2017
Posts: 6

I was reading about other sources and came across kallisto. By pseudoaligning reads to a model transcriptome, kallisto generates a transcript count matrix. They give an example for FASTQ files from the 10x Genomics platform (https://pachterlab.github.io/kallisto/10xstarting.html), and they have developed Python scripts to take care of the barcodes and UMIs in 10x Genomics data. Can I use the kallisto pipeline to generate count matrices? Any suggestions on this?

Thanks,
Rashmi
Old 03-16-2017, 01:30 PM   #13
hideandSEQ
Junior Member
 
Location: New Haven

Join Date: Mar 2016
Posts: 8

Your best option is definitely to process the raw data with Cell Ranger, the free proprietary software from 10x Genomics. It uses STAR for mapping. I would create a new genome index, because the one they offer for download on their website is generated from outdated builds.

I would not trust Cell Ranger's analysis beyond the QC readouts. After generating the raw and filtered UMI matrices, process the unfiltered matrix with an R package like Seurat or Monocle 2. Both are very easy to use, were designed to be compatible with data from droplet devices like 10x, and can give you more reliable results and more control over your workflow than Cell Ranger will.
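
To make "process the unfiltered matrix" a little more concrete, here is a rough sketch of the kind of per-barcode QC one would compute before filtering. It is written in plain Python only to show the idea and is not how Seurat or Monocle 2 implement it. The function names, the `MT-` prefix for mitochondrial genes, and all thresholds are placeholder assumptions to be tuned per data set:

```python
def cell_qc(counts, mito_prefix="MT-"):
    """Compute simple QC metrics per barcode.

    counts: {barcode: {gene_name: umi_count}} (hypothetical in-memory layout)
    mito_prefix: assumed naming convention for mitochondrial genes
    """
    metrics = {}
    for barcode, genes in counts.items():
        total = sum(genes.values())
        detected = sum(1 for c in genes.values() if c > 0)
        mito = sum(c for g, c in genes.items() if g.startswith(mito_prefix))
        metrics[barcode] = {
            "total_umis": total,
            "genes_detected": detected,
            "mito_fraction": mito / total if total else 0.0,
        }
    return metrics


def keep_cells(metrics, min_umis=500, min_genes=200, max_mito=0.1):
    """Return barcodes passing the (placeholder) thresholds."""
    return [
        b for b, m in metrics.items()
        if m["total_umis"] >= min_umis
        and m["genes_detected"] >= min_genes
        and m["mito_fraction"] <= max_mito
    ]
```

Starting from the unfiltered matrix means you choose these cutoffs yourself after looking at the distributions, instead of inheriting the pipeline's cell calls.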
Old 03-20-2017, 04:17 PM   #14
Rashmi007
Junior Member
 
Location: UK

Join Date: Feb 2017
Posts: 6

Hi hideandSEQ,

We now have the matrices. I have a question about QC of the data. Am I right that we cannot do QC on how the alignment was performed? You also mention starting with the unfiltered matrices, but I thought the starting point would be the filtered matrices. Where can I find a logical explanation of which QC steps are required and why? I am already working through the Seurat demo tutorial, and I will have a look at Monocle.

Thanks,
Rashmi