
**lh3** · 08-03-2013, 04:14 AM

Bowtie2 does not work with >4GB genomes yet, so far as I know. It is limited by its internal integer types. Correct me if I am wrong. I don't know about STAR. If you are working with genomic reads (not RNA-seq), try BWA. A 10GB genome should need <20GB RAM.
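The 4GB ceiling mentioned here is what 32-bit arithmetic would produce: an unsigned 32-bit integer can address at most 2^32 positions. The following is a minimal sketch of that check, assuming (as the post suggests, but not verified here) that the index stores coordinates as unsigned 32-bit integers. `fits_in_uint32` is a hypothetical helper name, not part of any aligner.

```python
UINT32_MAX = 2**32 - 1  # largest value an unsigned 32-bit integer can hold

def fits_in_uint32(genome_length_bp: int) -> bool:
    """Return True if every 0-based position in the genome fits in a uint32."""
    return genome_length_bp - 1 <= UINT32_MAX

print(fits_in_uint32(3_100_000_000))   # a roughly human-sized genome: True
print(fits_in_uint32(10_000_000_000))  # a 10 Gbp genome, as in this thread: False
```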

**Brett_CCG** · 08-03-2013, 06:09 PM

Thanks for your feedback. Yes, that's what I found with Bowtie2. I'm working with RNA-seq reads and I need to use an aligner which does spliced alignment, that's why I chose TopHat2 and STAR. Is there a way to get BWA to do spliced alignments?

**dariober** · 08-03-2013, 11:23 PM

Originally posted by Brett_CCG:

> I'm working with RNA-seq reads and I need to use an aligner which does spliced alignment

What about Subread/Subjunc? It handles RNA-seq data and it might be able to work with your reference. (But I've never used it.)

Dario

**Brett_CCG** · 08-04-2013, 02:06 AM

Thanks. I've heard of subread. I'll try it out over the next few days and update this thread.

**arolfe** · 08-04-2013, 09:27 AM

I'd just split the genome into chunks, build the index on each chunk, align to each, and merge the results. You'll need to do a little post-processing if you're looking to find a single best hit for each read, but sorting reads by score isn't that hard.

> I could also just align to each half, but this will result in biases and an increase in false positives with the alignments, which I would prefer to avoid.

How does aligning to each half cause false positives or bias? You need to merge your results properly (e.g., decide which half has the better score and whether it's enough better to count as unique), but that's all doable.
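The merge step described in this post could look roughly like the sketch below. It is only an illustration under stated assumptions: each chunk's alignments have already been reduced to `(read_name, chunk_name, score)` tuples (for example from an alignment-score tag), higher scores are better, and `min_margin` is a hypothetical threshold for deciding that the best hit is "better enough to be unique". None of these names come from any aligner.

```python
from collections import defaultdict

def merge_best_hits(hits, min_margin=5):
    """Pick the best-scoring chunk per read and flag whether it is unique.

    hits: iterable of (read_name, chunk_name, score) tuples, one per
          alignment, pooled across all genome chunks (assumed input format).
    Returns {read_name: (best_chunk, best_score, is_unique)}.
    """
    by_read = defaultdict(list)
    for read, chunk, score in hits:
        by_read[read].append((score, chunk))

    merged = {}
    for read, scored in by_read.items():
        scored.sort(reverse=True)  # best score first
        best_score, best_chunk = scored[0]
        # Unique only if the runner-up trails by at least min_margin.
        runner_up = scored[1][0] if len(scored) > 1 else None
        is_unique = runner_up is None or best_score - runner_up >= min_margin
        merged[read] = (best_chunk, best_score, is_unique)
    return merged

# Toy input: read1 clearly prefers chunkA, read2 is ambiguous.
hits = [
    ("read1", "chunkA", 98), ("read1", "chunkB", 60),
    ("read2", "chunkA", 90), ("read2", "chunkB", 89),
]
for read, result in sorted(merge_best_hits(hits).items()):
    print(read, result)
```

Keeping the ambiguous reads flagged rather than dropping them leaves the filtering decision to a later step.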

**Brett_CCG** · 08-04-2013, 06:47 PM

Thanks. I didn't think of this approach. Although I may end up filtering out true-positives by filtering on score. I'll try both approaches: subread (if it runs with whole genome) and running STAR on a split genome with quality score filtering.

**Brett_CCG** · 08-14-2013, 10:14 PM

OK. I've tried Subread, and it has a limit of 4Gb for the genome, so I can't use that. I'm now using STAR, splitting up the genome (the genome is hexaploid, so I'm splitting on each genome) and running alignments on each. I won't merge them and filter on score; instead I'll work with each individually and identify splice junctions in each.

Now my main concern: I'm using annotation-guided alignment, but the annotation contains a number of exons/CDS in the wrong frame because their start/end positions are out by 1-2 bp. Will this affect alignments? My understanding is that STAR would align to regions based on the annotated GFF, and then attempt realignment of the reads which didn't align using the annotated GFF file. From what I've read, this is what TopHat2 does. Is this the case for all GFF-guided RNA-seq alignments? I cannot get a better annotated GFF unless I wait for the group who did the assembly and annotation to improve it. I need to wrap up this project and move on. It's a part of my PhD (I have 1 year left on my scholarship), so I don't have the luxury of waiting around.

Thanks for any help anyone can provide.

**Kennels** · 08-14-2013, 10:37 PM

I'm not sure if this is what you want, but BWA-MEM (v0.7.5a) reports chimeric alignments in the SAM output with SA tags. For example:

Code:

HWI-ST226:220:D0AU7ACXX:5:1101:11456:146339	2193	Oa_Locus_2615_Transcript_18	2194	44	42M46H	=	2182	-54	AATTGAGCTACCAAAAACCCTAACCCAAAAATTTGTAGCGTC	*	NM:i:2	AS:i:35	XS:i:0	SA:Z:Oa_Locus_2615_Transcript_14,1923,+,21S43M24S,60,0;Oa_Locus_2615_Transcript_14,1976,-,50S38M,55,0;

See the BWA manual and the latest SAM format specification document for details about the SA tag.
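For readers who want to use the SA tag programmatically, here is a minimal parsing sketch. It relies on the field layout given in the SAM specification (`rname,pos,strand,CIGAR,mapQ,NM;` repeated) and assumes reference names contain no commas or semicolons. `SupplementaryAlignment` and `parse_sa_tag` are hypothetical names, not part of BWA or any library.

```python
from typing import NamedTuple

class SupplementaryAlignment(NamedTuple):
    rname: str
    pos: int      # 1-based leftmost position
    strand: str   # "+" or "-"
    cigar: str
    mapq: int
    nm: int       # edit distance

def parse_sa_tag(field):
    """Parse a raw 'SA:Z:...' SAM field into a list of records."""
    prefix = "SA:Z:"
    if not field.startswith(prefix):
        raise ValueError("not an SA tag: " + field)
    records = []
    for entry in field[len(prefix):].split(";"):
        if not entry:  # the tag ends with ';', leaving an empty last piece
            continue
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        records.append(
            SupplementaryAlignment(rname, int(pos), strand, cigar, int(mapq), int(nm))
        )
    return records

# The SA tag from the example record above.
tag = ("SA:Z:Oa_Locus_2615_Transcript_14,1923,+,21S43M24S,60,0;"
       "Oa_Locus_2615_Transcript_14,1976,-,50S38M,55,0;")
for rec in parse_sa_tag(tag):
    print(rec.rname, rec.pos, rec.strand, rec.cigar)
```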

BWA should also be able to index your large genome.

**lh3** · 08-15-2013, 12:53 PM

I think bwa-mem works with RNA-seq to a lesser extent. It might be useful in certain non-typical analyses. However, for typical RNA-seq, it would be good to use a standard RNA-seq mapper if possible.

**Brett_CCG** · 08-15-2013, 09:52 PM

Thanks for your input. I'm going to be using particular scripts to extract out spliced reads. These scripts look for the CIGAR string, so for now at least I can't use BWA since it uses SA tags.
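As an illustration of what a CIGAR-based script of this kind might do (this is not the poster's actual script), spliced-aware aligners conventionally mark introns with the `N` operation in the CIGAR string. The sketch below walks a CIGAR string and reports the reference interval of each `N` operation. `introns_from_cigar` is a hypothetical name, and the operation set is taken from the SAM specification.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance the reference position

def introns_from_cigar(pos, cigar):
    """Return 1-based inclusive (start, end) reference intervals for each N op.

    pos: 1-based leftmost mapping position (the SAM POS field).
    """
    introns = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns

print(introns_from_cigar(100, "50M200N50M"))  # [(150, 349)]
print(introns_from_cigar(2194, "42M46H"))     # [] (no N operation, so no splice)
```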

What I'd really like to know, though, is whether guided RNA-seq alignment is a problem when the GFF file is based on a rough draft annotation containing errors, because this is all I have to work with. The details are in my post above. If it is a problem, I'll take the unguided approach. Thanks for your help.


Indexing very large genomes
