**sasignor** · 05-15-2012, 06:09 PM

The verbose option crashes the terminal every time I use it (after a few hours, but before the mapping is done), so I don't know about that.

The FastQC plots of k-mer content and overrepresented sequences look rather odd, but since this is a transcriptome it is unclear to me what to expect. From what I can tell by googling around, what I have is not unusual for a transcriptome. I did not use barcodes for this dataset.

I have tried doing some additional quality filtering to see if that makes a difference. I will report back on that.

Just mapping the reads that do map and completing the pipeline does produce contigs that blast appropriately.

**sasignor** · 05-15-2012, 06:39 PM

Additional QC and the addition of the -v 3 option resulted in 0.034% of the reads mapping in bowtie, so I still have no idea what's wrong.

**arvid** · 05-16-2012, 12:59 AM

> Originally posted by sasignor
>
> Just mapping the reads that do map and completing the pipeline does produce contigs that blast appropriately.

Well, that's not the point - obviously these came from the right species. I rather meant to take the reads that do not map (or all, for simplicity), and feed them through e.g. Trinity. Then take some of the contigs and blast them against nt to see what species you have sequenced. If you do it for, let's say, 10 million reads, this goes really fast and should give you an idea whether contamination is a problem...
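
To pull a random subset of reads for such a quick test assembly, something like the following could work. This is only a sketch using reservoir sampling over any iterable of records; the function name `reservoir_sample` and the fixed seed are assumptions for illustration, and for paired-end data each record would have to be a mate pair so that mates stay together.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], n: int, seed: int = 0) -> List[T]:
    """Return up to n records chosen uniformly at random in a single pass.

    Works on a stream, so the whole read file never has to sit in memory.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    sample: List[T] = []
    for i, rec in enumerate(records):
        if i < n:
            # Fill the reservoir with the first n records.
            sample.append(rec)
        else:
            # Replace an existing entry with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                sample[j] = rec
    return sample
```

The sampled records can then be written back out as FASTQ and passed to the assembler.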

**sasignor** · 05-17-2012, 06:35 AM

This is the output of FastQC for one of the files I am using - the other is comparable. Again - it does look unusual for a genome but as far as I can tell not for a transcriptome, and not for transcriptomes I have successfully aligned in the past.

Attached Files

**westerman** · 05-17-2012, 07:22 AM

Lots of poly-T/A in there.

Try using bowtie2 (not bowtie) in '--local' mode.

**arvid** · 05-18-2012, 12:10 AM

Yeah, lots of poly-T - my transcriptome data doesn't look like that, for sure. The weird GC-content can't be biological IMHO - everything up to base ~55 looks crazy.
Did you look hard for adaptors? Perhaps you have a lot of weirdly ligated fragments in there, or the sequencing run had problems - talk to your provider.
A shot in the dark may be to try to clip the sequences up to 55 (cut out 55-95 or so) and try to map that...
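
The clipping idea can be tried without any special tool. A minimal sketch, assuming plain four-line FASTQ records with no blank lines; the names `parse_fastq` and `clip_reads` are made up for illustration, and 55 is just the cutoff guessed at above, not a recommended value:

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from strict 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups the lines in fours.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header, seq, qual


def clip_reads(records: Iterable[Record], keep: int = 55) -> Iterator[Record]:
    """Keep only the first `keep` bases (and quality values) of each read."""
    for header, seq, qual in records:
        yield header, seq[:keep], qual[:keep]


def format_fastq(record: Record) -> str:
    """Turn a record back into FASTQ text."""
    header, seq, qual = record
    return f"{header}\n{seq}\n+\n{qual}\n"
```

Sequence and quality are cut at the same position so the two strings stay the same length, which aligners expect.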

**Mali Salmon** · 05-23-2012, 12:28 AM

Hello sasignor,
I am wondering if you diagnosed the problem.
I am facing a similar issue and would love to hear about your progress.
Thanks

**fjrossello** · 07-31-2012, 07:44 PM

> Originally posted by sasignor
>
> I am attempting to align a transcriptome sequenced with HiSeq to a reference using bowtie. The parameters I am using are:
>
> bowtie -S -p 2 reference -q --phred64-quals
>
> And none of the reads align. They also do not align if I do not include the quality parameter, or any modification such as the --ff suggested in related postings.
>
> I have checked for adapter contamination and found very little, the reads were cleaned using ngs backbone, although I am not using that pipeline for anything downstream of cleaning. It has also been reported by some that they do not align in paired end but do in single end, my reads do not align in either case. Around 30% of the reads align in bwa.
>
> Does anyone have any idea why this is the case?
>
> Thanks!
> Sarah

Hi Sarah,

Your reads look as if they were produced with CASAVA v1.8, which reports Phred+33 Q-scores (Illumina 1.9/Sanger). If that is the case, removing the --phred64-quals option from the bowtie command may do the trick (phred 33 is default).

Cheers,

Fernando
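
One way to check the encoding before choosing the option is to look at the lowest quality character in a sample of reads. The sketch below is a rough heuristic, not a definitive test: the function name `guess_phred_offset` is hypothetical, and the thresholds assume that Phred+64 data never contains characters below `@` (ASCII 64) while the older Solexa+64 scale can go down to `;` (ASCII 59).

```python
from typing import Iterable, Optional


def guess_phred_offset(quality_strings: Iterable[str]) -> Optional[int]:
    """Guess the FASTQ quality offset (33 or 64) from quality lines.

    Returns None when there is no data or the sample is ambiguous.
    """
    lowest = min(
        (ord(ch) for qual in quality_strings for ch in qual),
        default=None,
    )
    if lowest is None:
        return None  # no quality values seen
    if lowest < 59:
        # Characters this low cannot occur in any +64 encoding.
        return 33
    if lowest >= 64:
        # Only high characters seen: most likely Phred+64, but a small
        # sample of very high quality Phred+33 reads could look the same.
        return 64
    return None  # 59..63: Solexa+64 or high-quality Phred+33, cannot tell
```

Feeding it the quality lines of a few thousand reads is usually more informative than a handful, since a short sample may not contain any low-quality bases.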

**Sun-SEQ** · 08-01-2012, 12:42 AM

It looks to me that the first 10 bases of all your reads are similar if not identical sequences. Have you checked overrepresented sequences output from FastQC? It may also give you information about which kind of contamination (adaptors) the overrepresented sequences might be.
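
A quick way to confirm this outside FastQC is to tally the first 10 bases of each read. A minimal sketch, assuming the read sequences are already available as strings; the function name `common_prefixes` is made up for illustration:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def common_prefixes(
    sequences: Iterable[str], k: int = 10, top: int = 5
) -> List[Tuple[str, int]]:
    """Return the `top` most frequent k-base read prefixes with their counts."""
    # Reads shorter than k are skipped so all prefixes are comparable.
    counts = Counter(seq[:k] for seq in sequences if len(seq) >= k)
    return counts.most_common(top)
```

If one or two prefixes account for most of the reads, that prefix is worth checking against known adaptor and primer sequences.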

Try cleaning up reads before aligning them with bowtie, e.g. clip adaptors, trim low-quality bases, trim polyA tails, remove reads with low-complexity regions, etc. Prinseq or seqclean can do this job.

Sunny
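
For illustration only, the poly-A/poly-T part of that cleanup could look like the sketch below. It is not how Prinseq or seqclean work internally; the function names and the minimum run length of 10 are assumptions, and it only handles exact homopolymer runs at the very end or start of a read.

```python
from typing import Tuple


def trim_poly_tail(
    seq: str, qual: str, base: str = "A", min_run: int = 10
) -> Tuple[str, str]:
    """Remove a trailing run of `base` if it is at least `min_run` long."""
    stripped = seq.rstrip(base)
    run = len(seq) - len(stripped)
    if run >= min_run:
        # Cut the quality string at the same position as the sequence.
        return stripped, qual[: len(stripped)]
    return seq, qual


def trim_poly_head(
    seq: str, qual: str, base: str = "T", min_run: int = 10
) -> Tuple[str, str]:
    """Remove a leading run of `base` if it is at least `min_run` long."""
    stripped = seq.lstrip(base)
    run = len(seq) - len(stripped)
    if run >= min_run:
        return stripped, qual[run:]
    return seq, qual
```

Reads that become very short after trimming would normally be dropped before mapping.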

**aeonsim** · 08-01-2012, 01:34 AM

Also might be worth checking with your sequence provider to make sure they sent you the right dataset.
