**GenoMax** · 08-22-2013, 06:32 AM

Have you taken a sample of reads and blasted them to see what comes back? That may give some clue as to what is happening (e.g. the data is not what you expect to have). Have you done any other QC on this data?
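For anyone following along, the sampling step could be sketched roughly as below. This is only an illustration under stated assumptions: the FASTQ is uncompressed with exactly four lines per read, and the function names (`read_fastq`, `sample_reads`, `to_fasta`) and the file name in the usage comment are made up for this example. It picks a uniform random subset with reservoir sampling and prints it as FASTA, which can then be pasted into BLAST or BLAT.

```python
import random
from itertools import islice


def read_fastq(lines):
    """Yield (name, sequence) from an iterable of FASTQ lines (4 lines per read)."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4))
        if len(chunk) < 4:
            return  # end of input (or a truncated final record)
        header, seq = chunk[0].strip(), chunk[1].strip()
        # Keep only the read ID: drop the leading '@' and any comment after whitespace.
        yield header[1:].split()[0], seq


def sample_reads(lines, n=100, seed=0):
    """Reservoir-sample up to n reads without loading the whole file into memory."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(read_fastq(lines)):
        if i < n:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = record
    return reservoir


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Hypothetical usage (file name is an assumption):
# with open("reads.fastq") as fq:
#     print(to_fasta(sample_reads(fq, n=100)), end="")
```

A fixed `seed` keeps the sample reproducible, so the same reads can be re-checked later.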

**daysfoot** · 08-22-2013, 07:01 AM

Not yet. I am just now trying to BLAST them as you suggested.

**daysfoot** · 08-26-2013, 06:11 AM

I ran BLAT (UCSC) on some of the sequences from the FASTQ file, and it returns hits with information about each sequence. So the sequences are good?
As for the QC, I am working on it now.

**GenoMax** · 08-26-2013, 06:15 AM

Are the BLAT hits going to the right organism/genome, i.e. there is no unexpected contamination in the data? If so, then it may just be that you need to trim the sequences (to remove adapters etc.).

Use FastQC as a simple option (if you have not done any QC on your data).

**daysfoot** · 08-26-2013, 11:58 AM

According to FastQC, the "Per base sequence content" of the first 10 bp looks strange, so maybe those bases are adapters in the FASTQ file.

**GenoMax** · 08-26-2013, 12:02 PM

RNA-seq data generally has that signature for the first few base pairs. This is a known bias and does not affect alignments or downstream analysis.

See this thread and others within.

**daysfoot** · 08-28-2013, 07:04 AM

The technology is CAGE, not RNA-seq; I just mixed them up.

**GenoMax** · 08-28-2013, 07:09 AM

What did you find in the blat results? Expected genome matches or otherwise?

**Gonza** · 04-10-2015, 05:51 PM

GenoMax, I am glad to see you in this forum; I'd like to get your opinion.

2 out of 15 of my libraries have low mapping rates (~60%) when using TopHat. The rest are good (~94%). For these 2 samples, 60% translates to ~8 million reads mapped to the reference genome, which isn't that bad considering how much I lost. I looked at the 'unmapped.bam' file, and most of these reads map to anything but my model organism.
My question: do you think this will affect downstream analysis? I am assuming that with ~8 million reads mapped to the genome, it is not a total waste?

Thank you kindly.
G

**Brian Bushnell** · 04-10-2015, 09:15 PM

1) What are you mapping to what? What is the data source (platform, chemistry) and type (read length, etc), and what is the reference?
2) What do the unmapped reads map to? Human, for example?
3) And what percent of the unmapped reads map to other organisms?
4) By Tophat, do you mean Tophat1 or Tophat2?
5) What kind of QC are you doing? Removing chastity-filter-failed reads, reads that don't exactly match the right barcode (assuming you are multiplexing), adapter-trimming, etc.

Also, it's never a bad idea to post a FastQC report when you have low mapping rates.
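As a rough illustration of the chastity-filter check in point 5, here is a minimal sketch. It assumes the FASTQ headers follow the Illumina CASAVA 1.8+ layout, where the comment after the space looks like `read:is_filtered:control:index` and `Y` in the second field means the read failed the filter. It also assumes four lines per record. The function names are invented for this example.

```python
def filter_flag(header):
    """Return 'Y' (failed filter), 'N' (passed), or None if the header has no such field."""
    parts = header.strip().split()
    if len(parts) < 2:
        return None  # no comment field after the read ID
    fields = parts[1].split(":")
    if len(fields) < 2 or fields[1] not in ("Y", "N"):
        return None
    return fields[1]


def count_filter_failed(lines):
    """Return (total_reads, failed_reads, reads_without_flag) for an iterable of FASTQ lines."""
    total = failed = unknown = 0
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # only every fourth line is a header
        total += 1
        flag = filter_flag(line)
        if flag == "Y":
            failed += 1
        elif flag is None:
            unknown += 1
    return total, failed, unknown
```

If the sequencing provider already removed filter-failed reads, the failed count should come back as zero; a large `unknown` count suggests the headers are in a different format and the assumption above does not hold.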

**Gonza** · 04-12-2015, 05:01 AM

Thanks Brian. Here are the details. Same question: will this affect downstream analysis? (I ask because I have ~8 million reads mapped to my reference genome.)

--

1) Mapping to TAIR10. Illumina HiSeq 2500, single-end sequencing.
2) Unmapped reads --> Platynereis dumerilii
3) ~40% mapping to Platynereis dumerilii
4) TopHat2
5) I cleaned the reads using fastq-mcf. After filtering, all the reads have Q values >30.

https://code.google.com/p/ea-utils/

**GenoMax** · 04-12-2015, 05:08 AM

Did you make the libraries or did the sequence provider make them? If you made the libraries were they pooled by the provider? Were these samples multiplexed and if so do the contaminant reads have the barcode you expected for your own sample?

**Gonza** · 04-12-2015, 06:18 AM

The facility made and sequenced them. They were multiplexed and pooled in a single lane.
I have not looked at the barcodes in the contaminating reads; I guess that is a good idea. (Those barcodes should be in the unmapped.bam, right?)

THANKS !

**GenoMax** · 04-12-2015, 06:36 AM

If the contaminating reads were in the same file as your real sample then they must have the same barcode. Since you did not make the libraries and the contamination is not in all lanes something must have gone wrong with those two libraries.
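A quick way to confirm this is to tally the index sequences in the read headers. The sketch below is only an illustration and rests on assumptions: the headers are in the Illumina CASAVA 1.8+ style, with the index as the fourth colon-separated field of the comment, and the records come from the original FASTQ (the BAM may not carry the header comment, since aligners often keep only the read ID). The function name `barcode_counts` is made up for this example.

```python
from collections import Counter


def barcode_counts(lines):
    """Count index sequences found in the header comments of FASTQ lines (4 lines per read)."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # skip sequence, '+', and quality lines
        parts = line.strip().split()
        fields = parts[1].split(":") if len(parts) > 1 else []
        # CASAVA 1.8+ comment layout: read:is_filtered:control:index
        index = fields[3] if len(fields) >= 4 and fields[3] else "unknown"
        counts[index] += 1
    return counts


# Hypothetical usage (file name is an assumption):
# with open("contaminant_reads.fastq") as fq:
#     for index, n in barcode_counts(fq).most_common(5):
#         print(index, n)
```

If nearly all contaminant reads carry the exact barcode expected for the sample, the foreign material was most likely in the library itself rather than introduced by demultiplexing.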

A marine annelid worm is about as far as one can be from Arabidopsis .. unless you have very diverse research interests!

Low percentage of mapped reads(Tophat)
