**Brian Bushnell** · 02-16-2017, 07:12 PM

I recommend adapter-trimming with BBDuk and then mapping with BBMap to produce a sam file. What you do with the sam file depends on your experiment. There are programs like edgeR and DESeq for differential expression analysis between samples, but I'm not sure if they were designed with single-cell in mind.
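
As a rough sketch of those two steps, the Python below only builds the command lines for BBDuk and BBMap and does not run them. The file names are hypothetical, and the trimming parameters (`ktrim=r k=23 mink=11 hdist=1`) are commonly used starting values rather than settings taken from this thread, so check them against the BBTools documentation for your data.

```python
from typing import List


def bbduk_trim_cmd(reads: str, trimmed: str, adapters: str) -> List[str]:
    """Build a BBDuk adapter-trimming command (right-end k-mer trimming)."""
    return [
        "bbduk.sh",
        f"in={reads}",
        f"out={trimmed}",
        f"ref={adapters}",  # FASTA file of adapter sequences
        "ktrim=r",          # trim adapters from the 3' (right) end
        "k=23",             # assumed starting values, tune for your reads
        "mink=11",
        "hdist=1",
    ]


def bbmap_cmd(trimmed: str, reference: str, sam: str) -> List[str]:
    """Build a BBMap command that maps trimmed reads and writes a SAM file."""
    return ["bbmap.sh", f"in={trimmed}", f"ref={reference}", f"out={sam}"]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(bbduk_trim_cmd("reads.fq", "trimmed.fq", "adapters.fa")))
    print(" ".join(bbmap_cmd("trimmed.fq", "genome.fa", "mapped.sam")))
```

The lists can be passed to `subprocess.run(cmd, check=True)` once BBTools is on your `PATH`.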

You may want to read about RNA-seq in the Biostar Handbook for starters.

**nucacidhunter** · 02-16-2017, 07:22 PM

The analysis workflow depends on the scRNA-Seq protocol used for library prep. Some methods, such as Drop-Seq and 10x Genomics, require starting the analysis with specific open-source or free software. These may need other software at the final steps for data presentation, but generally they offer an end-to-end solution. For other methods, it is best to look at the papers that describe the method.

PS. You will get better responses if you change the title of your post to represent the question.

**Rashmi007** · 02-17-2017, 04:24 AM

Yes, I have updated the title, but it doesn't show up. Anyway, the scRNA-seq library was prepared using 10x Genomics technology. Could you please direct me to resources on how to start from the raw reads? There are tutorials that start from count matrices, but hardly any show the initial steps.

Thanks a lot,
Rashmi

**GenoMax** · 02-17-2017, 04:58 AM

Edit: I will leave this post here, but it is not applicable to the original question, since the original data is from 10x Genomics, not plain Illumina.

The procedure of QC, scanning for and trimming adapters, and alignment is more or less the same for most *-seq analyses one may be doing.

You can see sections 4 and 5 in this WikiBook for a general idea of the process. I recommend that you give FastQC (for QC) and then the BBMap suite a try for the steps noted above. Both tools are easy to find and use, and have extensive support here if you run into questions.

**Rashmi007** · 02-17-2017, 05:02 AM

Ok.. will follow those steps and post a question, if required.

Thanks,
Rashmi

**nucacidhunter** · 02-17-2017, 05:29 AM

Originally posted by Rashmi007:

Yes, I have updated the title, but it doesn't show up. Anyway, the scRNA-seq library was prepared using 10x Genomics technology. Could you please direct me to resources on how to start from the raw reads? There are tutorials that start from count matrices, but hardly any show the initial steps.

Thanks a lot,
Rashmi

10x Genomics scRNA-Seq analysis requires BCL files (FASTQ files can be used with some extra steps). The data are processed through the Cell Ranger pipeline, and the results can be viewed in Loupe Cell Browser (both are free and supported). Use the following link for the download and guide:

https://support.10xgenomics.com/sing...erview/welcome

They also have data sets that have been processed through their pipelines, which can be found at the following link:

https://support.10xgenomics.com/single-cell/datasets

The following files from the run folder are required for the 10x Cell Ranger pipeline:
Data directory (BCL files for the lane)
InterOp directory
runParameters.xml
RTAComplete.txt
RunInfo.xml
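
If you want to check a run folder before handing it to the pipeline, a small Python sketch like the one below reports which of the items listed above are missing. It assumes the directories are named exactly `Data` and `InterOp` at the top level of the run folder. The function name is made up for this example and is not part of Cell Ranger.

```python
from pathlib import Path
from typing import List

# Items listed above; exact names and top-level placement are assumptions.
REQUIRED_DIRS = ("Data", "InterOp")
REQUIRED_FILES = ("runParameters.xml", "RTAComplete.txt", "RunInfo.xml")


def missing_run_folder_items(run_folder: str) -> List[str]:
    """Return the required directories and files absent from run_folder."""
    root = Path(run_folder)
    # Directories are reported with a trailing slash to tell them from files.
    missing = [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    missing += [name for name in REQUIRED_FILES if not (root / name).is_file()]
    return missing
```

An empty list means all five items were found. It says nothing about whether the BCL files inside `Data` are complete.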

Using third-party software for the initial data processing will give suboptimal results. Reads originating from a single cell share a cell barcode and each transcript is marked with a UMI, and Cell Ranger checks and corrects the 16-base barcodes against a whitelist (they are not random sequences). 10x also has very responsive tech support.
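
To illustrate what "corrected against a whitelist" can mean, here is a minimal Python sketch that accepts a barcode if it is on the whitelist, or rescues it if exactly one whitelist entry is a single substitution away. This is a simplified stand-in and not Cell Ranger's actual correction procedure. The toy 4-base whitelist in the usage note is invented for the example.

```python
from typing import Optional, Set

BASES = "ACGT"


def correct_barcode(barcode: str, whitelist: Set[str]) -> Optional[str]:
    """Return a whitelisted barcode for an observed barcode, or None.

    Exact matches are returned unchanged. Otherwise every single-base
    substitution is tried, and the barcode is corrected only when exactly
    one whitelist entry is reachable (ambiguous cases are rejected).
    """
    if barcode in whitelist:
        return barcode
    candidates = set()
    for i, original in enumerate(barcode):
        for base in BASES:
            if base == original:
                continue
            candidate = barcode[:i] + base + barcode[i + 1:]
            if candidate in whitelist:
                candidates.add(candidate)
    return candidates.pop() if len(candidates) == 1 else None
```

With a toy whitelist `{"AAAA", "CCCC"}`, the observed barcode `"AAAT"` is corrected to `"AAAA"`, while `"AACC"` is two substitutions from both entries and is rejected.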

**GenoMax** · 02-17-2017, 05:56 AM

@Rashmi: I only read your last post and missed the critical part that this is 10x data (Thanks @nucacidhunter for quoting the original post).

I have not personally used cellranger, but if it is anything like their other software (longranger and supernova, which I have used), then it will not be trivial to get going. You would need a good bit of hardware (preferably access to a cluster). If you are not tech-savvy, then definitely talk with your local IT support first.

Alternatively, you may want to see if the facility that did your 10x work would be willing to run some of these analyses and give you analyzed data that you can look at locally (with Loupe browser).

**Rashmi007** · 02-17-2017, 12:19 PM

@GenoMax.. Yes, that is precisely the problem. I did look at Cell Ranger as an option, but its system requirements are too much for my personal PC. We do have a CGC account where we could use it, but that means wrapping the tool first, which is a bit tedious. That is why I was looking for options other than Cell Ranger.

@nucacidhunter.. We have fastq files and not BCL files. What can be done in that case?

What I am particularly after, just to understand the basics, is how to find the UMIs and barcodes in a given sequence. Can I write a program to do that? Maybe this is too ambitious, but I just want to know where to get that information.

Thanks,
Rashmi

**GenoMax** · 02-17-2017, 12:43 PM

I think your best option is to go back to whoever did the 10x for you and see if they will analyze the data (you may have to pay if this was done at a service facility). If this is a one-time run it would be the most cost/time effective solution for you.

**nucacidhunter** · 02-17-2017, 02:49 PM

Originally posted by Rashmi007:

@nucacidhunter.. We have fastq files and not BCL files. What can be done in that case?

What I am particularly after, just to understand the basics, is how to find the UMIs and barcodes in a given sequence. Can I write a program to do that? Maybe this is too ambitious, but I just want to know where to get that information.

Instructions for using FASTQ files with Cell Ranger:

https://support.10xgenomics.com/sing...l2fastq-direct

Position of UMIs and barcodes: in the V1 kit, the barcode is read as index 1 (i7) and the UMI is the 10-base Read 2; in the V2 kit, the barcode is bases 1-16 and the UMI is bases 17-26 of Read 1.
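
To answer the "can I write a program" part for the V2 layout, a minimal Python sketch is below. It only slices Read 1 at the positions given above and does no whitelist checking or error correction, so it is for understanding the read structure and not a replacement for Cell Ranger. The function and type names are made up for this example.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple

BARCODE_LEN = 16  # V2: cell barcode is bases 1-16 of Read 1
UMI_LEN = 10      # V2: UMI is bases 17-26 of Read 1


class TaggedRead(NamedTuple):
    barcode: str
    umi: str


def split_v2_read1(seq: str) -> TaggedRead:
    """Split a V2 Read 1 sequence into (cell barcode, UMI)."""
    needed = BARCODE_LEN + UMI_LEN
    if len(seq) < needed:
        raise ValueError(f"Read 1 has {len(seq)} bases, need at least {needed}")
    return TaggedRead(seq[:BARCODE_LEN], seq[BARCODE_LEN:needed])


def read1_tags(fastq_lines: Iterable[str]) -> Iterator[Tuple[str, TaggedRead]]:
    """Yield (read name, TaggedRead) from well-formed 4-line FASTQ records."""
    # Group the line stream into records of four lines each.
    for header, seq, _plus, _qual in zip(*[iter(fastq_lines)] * 4):
        name = header.split()[0][1:]  # drop the leading '@'
        yield name, split_v2_read1(seq.strip())
```

Typical use would be `for name, tags in read1_tags(open("R1.fastq")): ...`, where `R1.fastq` is a hypothetical uncompressed Read 1 file. For gzipped files, open them with `gzip.open(path, "rt")` instead.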

For troubleshooting and more information I would suggest getting in contact with 10x tech support.

As GenoMax has pointed out, the easiest way would be to ask the place that prepared and sequenced the libraries to do the preliminary analysis; you can then use the Cell Ranger output files for further analysis if needed. They should do it for free, as they would have some interest in the performance of the platform and in evaluating their technical skills.

**Rashmi007** · 02-17-2017, 03:08 PM

I was reading about other sources and came across kallisto. By pseudoaligning reads to a model transcriptome, kallisto generates a transcript count matrix. They give an example for FASTQ files from the 10x Genomics platform (https://pachterlab.github.io/kallisto/10xstarting.html). They have developed Python scripts to handle the barcodes and UMIs in 10x Genomics data. Can I use the kallisto pipeline to generate count matrices? Any suggestions on this?

Thanks,
Rashmi

**hideandSEQ** · 03-16-2017, 01:30 PM

Your best option is definitely to process the raw data using Cell Ranger, the free proprietary software from 10x Genomics. It uses STAR to map. I would create a new genome index, because the one they offer for download on their website is generated from outdated builds.

I would not trust Cell Ranger's analysis beyond the QC readouts. After generating the raw and filtered UMI matrices, process the unfiltered matrix with an R package like Seurat or Monocle 2. Both are very easy to use, were designed to be compatible with data from droplet devices like 10x, and can give you more reliable results and more control over your workflow than Cell Ranger will.
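
To give an idea of what starting from the unfiltered matrix involves, here is a minimal Python sketch that computes two per-barcode QC values (total UMIs and number of genes detected) and keeps the barcodes passing cutoffs you choose yourself. It is plain Python on a toy sparse dictionary and does not use the Seurat or Monocle API. The function names and the cutoffs in the usage note are hypothetical and carry no recommendation.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# Sparse UMI count matrix: (gene, barcode) -> UMI count.
Counts = Dict[Tuple[str, str], int]


def barcode_qc(counts: Counts) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (total UMIs per barcode, genes detected per barcode)."""
    total_umis: Dict[str, int] = defaultdict(int)
    genes_detected: Dict[str, int] = defaultdict(int)
    for (_gene, barcode), n in counts.items():
        if n > 0:
            total_umis[barcode] += n
            genes_detected[barcode] += 1
    return dict(total_umis), dict(genes_detected)


def keep_barcodes(counts: Counts, min_umis: int, min_genes: int) -> Set[str]:
    """Keep barcodes with at least min_umis UMIs and min_genes detected genes."""
    total_umis, genes_detected = barcode_qc(counts)
    return {
        bc for bc in total_umis
        if total_umis[bc] >= min_umis and genes_detected[bc] >= min_genes
    }
```

For example, with counts `{("G1", "bc1"): 5, ("G2", "bc1"): 3, ("G1", "bc2"): 1}`, calling `keep_barcodes(counts, min_umis=4, min_genes=2)` keeps only `bc1`. Doing this yourself on the unfiltered matrix means the cutoffs are yours and not whatever the pipeline chose.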

**Rashmi007** · 03-20-2017, 04:17 PM

Hi hideandSEQ,

Now we have got hold of the matrices. I have a question about QC of the data. Is it right that we cannot QC how the alignment was performed? You also mention starting with the unfiltered matrices, but I thought the starting point would be the filtered matrices. Where can I find a logical explanation of which QC steps are required and why? I am already working through the Seurat demo tutorial, and I will have a look at Monocle.

Thanks,
Rashmi
