
Amative 04-02-2013 01:43 PM

RNA-Seq alignment issue
Hello all,

I have been given paired-end RNA-Seq files to align against a couple of references. I used Bowtie2 to do the job. The overall alignment rate was very low in most cases (less than 5%). We now think this might be caused by either contamination or mixed-up samples.

Any suggestions on what to do in such a case?
Thank you in advance :)

swbarnes2 04-02-2013 02:47 PM

Bioinformatically, there's nothing you can do other than help the people involved work out what went wrong.

For starters, spot-check some random high-quality reads: BLAST them against nr and see if you can determine what they are.
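One way to pull such a spot-check sample is sketched below. It is a minimal sketch, not a definitive tool: it assumes a plain 4-line-per-record FASTQ with Phred+33 qualities, and the function names, the mean-quality cutoff of 30, and the reservoir-sampling approach are all illustrative choices, not something prescribed in this thread.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        header, seq, _, qual = (line.rstrip("\n") for line in record)
        yield header, seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def sample_high_quality(records, k, min_mean_q=30, seed=0):
    """Reservoir-sample up to k reads whose mean quality is >= min_mean_q."""
    rng = random.Random(seed)
    reservoir = []
    seen = 0  # number of reads that passed the quality filter so far
    for header, seq, qual in records:
        if mean_phred(qual) < min_mean_q:
            continue
        seen += 1
        if len(reservoir) < k:
            reservoir.append((header, seq))
        else:
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = (header, seq)
    return reservoir


def to_fasta(reads):
    """Format (header, sequence) pairs as FASTA text, ready to paste into BLAST."""
    return "".join(f">{h.lstrip('@')}\n{s}\n" for h, s in reads)
```

Usage would look something like `print(to_fasta(sample_high_quality(read_fastq(open("sample_R1.fastq")), k=20)))`, where `sample_R1.fastq` is a placeholder file name.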

Try aligning to the whole genome to see how much of the library was genomic.

See if certain highly repetitive sequences (like Illumina adapters) account for a large share of the reads.
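A rough way to check for this is to count exact duplicate reads and look for a known adapter fragment. The sketch below assumes a 4-line-per-record FASTQ; the function names are made up for illustration, and `AGATCGGAAGAGC` (the common start of Illumina TruSeq adapters) is only an example motif, so substitute whatever adapter your library prep actually used.

```python
from collections import Counter
from itertools import islice


def fastq_sequences(handle):
    """Yield only the sequence lines (line 2 of every 4-line FASTQ record)."""
    return (line.strip() for line in islice(handle, 1, None, 4))


def top_sequences(seqs, n=10):
    """Return the n most frequent sequences as (sequence, count, fraction)."""
    counts = Counter(seqs)
    total = sum(counts.values())
    return [(s, c, c / total) for s, c in counts.most_common(n)]


def fraction_containing(seqs, motif):
    """Fraction of reads that contain the given motif anywhere in the read."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    return sum(motif in s for s in seqs) / len(seqs)


# Example motif only; an assumption, not something confirmed for this data set.
EXAMPLE_ADAPTER = "AGATCGGAAGAGC"
```

If one sequence, or the adapter motif, turns up in a large fraction of reads, that points towards a library problem rather than an alignment problem.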

And of course see if the run overall was of good enough quality for you to believe that your reads are accurate.

rboettcher 04-03-2013 02:57 AM

Hi Amative,

What kind of reference did you provide? Bowtie2 is not splice-aware, so it cannot handle reads spanning splice junctions. For RNA-Seq it should therefore only be used to align against the transcriptome. TopHat was created for aligning RNA-Seq reads against the whole genome.


Amative 04-03-2013 08:11 AM

Thanks swbarnes2 & rboettcher,

  • I tried to blast the first ten reads from one of the samples I have, blast results were not that good. I then aligned against the available sequences for two of the top BLAST hits and got the same low alignment rate.
  • I checked for adapters; the sequences are already trimmed.

Yes, I am aligning against the transcriptome sequences.

GenoMax 04-03-2013 08:39 AM


Originally Posted by Amative (Post 100642)
I tried to blast the first ten reads from one of the samples I have, blast results were not that good.

You probably want to go some way into the file. With Illumina at least, the first hundred (or more) sequences may not be the best of the lot, since they generally come from the edge of the flowcell/start of the lane.
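Skipping ahead can be done without loading the whole file. This is a minimal sketch assuming a 4-line-per-record FASTQ; the function name and the default of skipping 100,000 records are arbitrary illustrative choices.

```python
from itertools import islice


def reads_after_skip(handle, skip=100_000, take=10):
    """Return up to `take` sequences after skipping the first `skip` records."""
    # Sequence lines are line 2 of each 4-line record: indices 1, 5, 9, ...
    seq_lines = islice(handle, 1, None, 4)
    return [line.strip() for line in islice(seq_lines, skip, skip + take)]
```

If the file has fewer than `skip` records, the result is simply an empty list.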

You may also want to use this tool to do some screening: http://www.bioinformatics.babraham.a.../fastq_screen/

Amative 04-08-2013 12:21 PM

Thanks for the suggestion, GenoMax. I am working on it.

I like fastq_screen. It saves some time!
