SEQanswers




Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Bowtie, an ultrafast, memory-efficient, open source short read aligner | Ben Langmead | Bioinformatics | 513 | 05-14-2015 03:29 PM
Introducing BBMap, a new short-read aligner for DNA and RNA | Brian Bushnell | Bioinformatics | 24 | 07-07-2014 10:37 AM
Miso's open source | joyce kang | Bioinformatics | 1 | 01-25-2012 07:25 AM
Targeted resequencing - open source | stanford_genome_tech | Genomic Resequencing | 3 | 09-27-2011 04:27 PM
EKOPath 4 going open source | dnusol | Bioinformatics | 0 | 06-15-2011 02:10 AM

11-03-2017, 09:26 AM   #581
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,583

While I have not worked with cellranger (I have only used longranger for WGS data), this sounds like odd behavior. 10x's software should account for the presence of Ns in the reads.

If you have not talked with their tech support yet, I would suggest giving that a try. They may have specific suggestions, or they may need to implement a fix in their software. Support has been responsive to me in the past.
11-03-2017, 09:35 AM   #582
santiagorevale
Member
 
Location: UK

Join Date: Dec 2016
Posts: 17

Regarding the issue that produces the Ns: it isn't supported by them, because it occurs when sequencing on a HiSeq 4000 with a 75,8,0,75 cycle pattern instead of their recommended 26,8,98 pattern.

Regarding the N filtering (or any other quality filtering), I agree with you that they should be doing it upfront and incorporating it into their pipeline. I know it isn't being done so far, and I have just sent them an email about it.

I'll get back to you with their answer once I hear from them.

In the meantime, do you think this would be complicated to implement? If not as a permanent feature in bbduk, could you walk me through how to make it work for me? I was unable to match your scripts' performance (I tried coding it in Perl, Python and Bash).
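For illustration, here is a minimal Python sketch of the kind of N filtering being asked about. It is not bbduk and not how bbduk implements it: it reads the R1/R2 (and, if given, I1) FASTQ files in lockstep and drops the whole read set when any mate has more than a chosen fraction of Ns, so the output files stay synchronized. Assumptions: uncompressed 4-line FASTQ, all files in identical read order, and an illustrative cutoff (`max_n_fraction=0.5`) rather than a recommended value; the function names are hypothetical. A single-threaded Python loop like this is unlikely to approach bbduk's speed.

```python
import io
from itertools import zip_longest
from typing import Iterator, List, TextIO, Tuple

# (header, sequence, separator, quality) for one FASTQ record
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a plain 4-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, sep, qual


def n_fraction(seq: str) -> float:
    """Fraction of N calls in a read; an empty read counts as all-N."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def filter_synced(inputs: List[TextIO], outputs: List[TextIO],
                  max_n_fraction: float = 0.5) -> Tuple[int, int]:
    """Keep a read set only if every mate is at or below the N cutoff.

    Records are consumed in lockstep, so R1/R2/I1 outputs stay in sync.
    Returns (kept, dropped) counts.
    """
    kept = dropped = 0
    for records in zip_longest(*(read_fastq(h) for h in inputs)):
        if any(rec is None for rec in records):
            raise ValueError("input files contain different numbers of reads")
        if any(n_fraction(rec[1]) > max_n_fraction for rec in records):
            dropped += 1
            continue
        for out, rec in zip(outputs, records):
            out.write("\n".join(rec) + "\n")
        kept += 1
    return kept, dropped


if __name__ == "__main__":
    # Tiny in-memory demo: read "b" is 3/4 N in R1, so the pair is dropped.
    r1 = io.StringIO("@a\nACGT\n+\nIIII\n@b\nNNNA\n+\nIIII\n")
    r2 = io.StringIO("@a\nTTGC\n+\nIIII\n@b\nACGT\n+\nIIII\n")
    out1, out2 = io.StringIO(), io.StringIO()
    print(filter_synced([r1, r2], [out1, out2]))  # (1, 1)
```

For real files you would open each input with `gzip.open(path, "rt")` (or plain `open`) and pass the handles in the same order as the output handles.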

Thanks for the quick replies!
11-03-2017, 09:51 AM   #583
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,583

Yikes! You are certainly doing something completely off-label here :-)

In the cellranger mkfastq run, you could use a base mask such as --use-bases-mask=Y26n*,I8,Y*. That should produce reads in the format 10x wants. If this is a dual-indexed HiSeq 4000 run, you may have to use --use-bases-mask=Y26n*,I8,n*,Y* instead. Your second read is going to be short by about 20 bases, and I am not sure whether the software will like that.
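To make the arithmetic behind these masks concrete, here is a small Python sketch that expands a `--use-bases-mask` string against a run's cycle counts and reports what each read ends up as. It follows the bcl2fastq mask convention as I understand it (`Y` = keep as read, `I` = keep as index, `N`/`n` = ignore, a number repeats the preceding letter, `*` fills the remaining cycles). It is a simplified re-implementation for illustration, not the parser that bcl2fastq or cellranger mkfastq actually use, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# One mask token: a letter, optionally followed by a repeat count or '*'.
TOKEN = re.compile(r"([YINyin])(\d+|\*)?")


def expand_segment(segment: str, cycles: int) -> str:
    """Expand one mask segment (e.g. 'Y26n*') to one letter per cycle."""
    tokens = TOKEN.findall(segment)
    if "".join(code + count for code, count in tokens) != segment:
        raise ValueError(f"cannot parse mask segment {segment!r}")
    expanded = ""
    for pos, (code, count) in enumerate(tokens):
        code = code.upper()
        if count == "*":
            if pos != len(tokens) - 1:
                raise ValueError("'*' is only allowed on the last token")
            expanded += code * max(cycles - len(expanded), 0)
        else:
            expanded += code * int(count or 1)
    if len(expanded) != cycles:
        raise ValueError(f"{segment!r} covers {len(expanded)} cycles, run has {cycles}")
    return expanded


def read_layout(mask: str, run_cycles: List[int]) -> List[Tuple[str, int]]:
    """Return (kind, length) per read, where kind is 'read', 'index' or 'ignored'."""
    segments = mask.split(",")
    if len(segments) != len(run_cycles):
        raise ValueError("mask and run must describe the same number of reads")
    layout = []
    for segment, cycles in zip(segments, run_cycles):
        expanded = expand_segment(segment, cycles)
        kept_y, kept_i = expanded.count("Y"), expanded.count("I")
        if kept_y and kept_i:
            raise ValueError("mixed read/index segments are not handled here")
        kind = "read" if kept_y else "index" if kept_i else "ignored"
        layout.append((kind, kept_y or kept_i))
    return layout


print(read_layout("Y26n*,I8,Y*", [75, 8, 75]))
# [('read', 26), ('index', 8), ('read', 75)]
```

On a 75,8,75 run, Y26n*,I8,Y* yields a 26 bp Read 1, an 8 bp index and a 75 bp Read 2; compared with the 98 cycles 10x recommends for Read 2, that is 23 bases short.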

@Brian would have to comment on your original request. Being the programmer, he can implement just about anything.
11-03-2017, 10:02 AM   #584
santiagorevale
Member
 
Location: UK

Join Date: Dec 2016
Posts: 17

Oh, sorry if I was not clear enough. Those are the sequencing cycles, not how I'm doing the base calling.

I've been working on 10X projects for several months now and evaluating them on different platforms. That's why I also included the additional question above.
11-03-2017, 10:08 AM   #585
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,583

So how are you doing the base calling then?

10x recommends running 26x8x98, but you are running this as a 75x8x75 run instead? If that is the case, my recommendation above would cut Read 1 down to the 26 bp that 10x expects. The index read is the correct length, and the last read, which should be 98 bp, is only 75 bp in your case. Is this how you are running cellranger mkfastq?

With the above base mask you should get reads that don't have any Ns. Or so I would think.
11-06-2017, 04:31 AM   #586
santiagorevale
Member
 
Location: UK

Join Date: Dec 2016
Posts: 17

Yes, that's how I'm running mkfastq. The sequencing currently produces 2x75 bp reads, but by running mkfastq with the above-mentioned base mask, Read 1 is trimmed to 26 bp, and for Read 2 we only get 75 bp, which is still enough data for alignment. Even though this is not the recommended way of sequencing these 10X libraries, the 10X support team told us they have had other customers doing it this way with good results.

Regarding the Ns, that's not the case. Reads with a high proportion of Ns are not filtered at all by the first step (neither by bcl2fastq nor by the mkfastq wrapper).

I have contacted 10X, and they say these reads are not filtered, but they don't get mapped either, so downstream analysis is not affected, which is something we already knew.

However, they told me that with v2 chemistry the index read is no longer used after the demultiplexing step. I replied asking how they calculate the "Q30 Bases in Sample Index" metric without that file. In the meantime, I'm running a test, so I'll be back later with both the results and their reply. If this works, I won't need any change to bbduk. Let's hope for the best!
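For context on why that metric needs the index file: judging by its name, "Q30 Bases in Sample Index" is the fraction of index-read base calls with a Phred quality of 30 or more. The Python sketch below computes that fraction from Phred+33 FASTQ quality strings. This is an assumption about what the metric measures, not 10x's actual implementation, and the function names are hypothetical.

```python
from typing import Iterable, Iterator


def fastq_qualities(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality line (every 4th line) of a plain 4-line FASTQ."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def q30_fraction(qualities: Iterable[str], offset: int = 33, threshold: int = 30) -> float:
    """Fraction of base calls with Phred score >= threshold (Phred+33 by default)."""
    total = passing = 0
    for qual in qualities:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


# 'I' = Q40, '?' = Q30, '#' = Q2, so 3 of 4 calls pass.
print(q30_fraction(fastq_qualities(["@r1\n", "ACGT\n", "+\n", "II?#\n"])))  # 0.75
```

With an index FASTQ on disk you would pass `open(path)` (or `gzip.open(path, "rt")`) to `fastq_qualities`.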

Cheers!
11-07-2017, 10:06 AM   #587
santiagorevale
Member
 
Location: UK

Join Date: Dec 2016
Posts: 17

So, I've spoken with 10X support. For now they are not doing, or planning to do, any filtering step on the raw data. However, thanks to my email, they've opened a software feature request. Running "cellranger count" without the index file will just skip the metric computed from that file, nothing else.

In summary, I'm sorry for wasting your time! If appropriate, we could even delete these posts, since we have talked more about 10X than about bbduk.

Thanks GenoMax for your feedback! Cheers!
12-05-2017, 07:30 AM   #588
jweger1988
Member
 
Location: Paris, France

Join Date: Apr 2017
Posts: 37

Hi Brian,

I'm using BBtools to call indels from small RNA virus genomes using both bbmap and callvariants. It's doing a great job of calling these. We are interested in the possibility of also calling inversion events from this data. I know this is somewhat common in transcriptomics data. Do these programs have this functionality?

To be more specific, is it possible to identify reads that are aligning to both the sense and antisense of the given reference?
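One way to look for such reads, sketched in Python and independent of BBTools: parse the SAM output and report read segments that have alignment records on both strands of the same reference (the reverse-strand bit is FLAG 0x10). This only finds anything if the aligner writes more than one record per read, for example supplementary (0x800) split alignments across an inversion junction or, optionally, secondary (0x100) alignments. I am not certain whether bbmap emits split alignments, so treat the choice of aligner as an assumption; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# SAM FLAG bits (see the SAM format specification)
UNMAPPED = 0x4
REVERSE = 0x10
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100


def both_strand_reads(sam_lines: Iterable[str],
                      include_secondary: bool = False) -> List[Tuple[str, int]]:
    """Return (read name, mate) for segments aligned to both strands of one reference.

    mate is 1 or 2 for paired reads and 0 for unpaired reads, so a normal
    forward/reverse pair is not reported (its mates are tracked separately).
    """
    strands: Dict[Tuple[str, int, str], Set[str]] = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & UNMAPPED or rname == "*":
            continue
        if flag & SECONDARY and not include_secondary:
            continue
        mate = 1 if flag & FIRST_IN_PAIR else 2 if flag & SECOND_IN_PAIR else 0
        strands[(qname, mate, rname)].add("-" if flag & REVERSE else "+")
    return sorted({(qname, mate) for (qname, mate, _), seen in strands.items()
                   if len(seen) == 2})
```

Usage would be something like `both_strand_reads(open("aligned.sam"))`, where the file name is a placeholder. The reads it returns are candidates to inspect, not confirmed inversions.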

Thanks in advance!

James

Tags
bbmap, metagenomics, rna-seq aligners, short read alignment





All times are GMT -8.

