**Ajayi Oyeyemi** · 04-01-2013, 05:44 PM

Hey Guys please help me. My question hasn't been answered. Can someone help me out?

**ECO** · 04-01-2013, 08:41 PM

Please don't post specific inquiries in major threads and then bump them. This is your own thread for this query.

**eszter.ari** · 04-03-2013, 05:28 AM

Try to run HTSeq without the header of the SAM file.
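
One way this suggestion could be tried is sketched below. It assumes a plain-text SAM file in which header lines are exactly the lines starting with `@`; the function name `strip_sam_header` and the tiny in-memory example are made up for illustration and are not part of HTSeq.

```python
import io

def strip_sam_header(lines):
    """Yield only alignment records, skipping SAM header lines (those starting with '@')."""
    for line in lines:
        if not line.startswith("@"):
            yield line

# Hypothetical three-line SAM file: two header lines and one alignment record.
sam_text = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "read1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\n"
)
records = list(strip_sam_header(io.StringIO(sam_text)))
print(len(records))  # number of alignment records left after removing the header
```

For a real file, the same generator could be fed an open file handle and its output written to a new file. If the alignments are in BAM format, `samtools view` without the `-h` option also prints the records without the header.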

**Simon Anders** · 04-03-2013, 05:42 AM

The vast majority of your reads seem to map to more than one locus. How come?

Maybe visualize your BAM files with a genome browser and investigate.

**Ajayi Oyeyemi** · 04-03-2013, 11:02 AM

Originally posted by eszter.ari:

Try to run HTSeq without the header of the SAM file.

Sorry, I'm a newbie. Can you be more explicit? How can I run HTSeq without the header of the SAM file, and how can the header be removed?
I'm sorry if this question is naive.

**Ajayi Oyeyemi** · 04-03-2013, 11:05 AM

Originally posted by Simon Anders:

The vast majority of your reads seem to map to more than one locus. How come?

Maybe visualize your BAM files with a genome browser and investigate.

Thanks Simon; I have IGV installed on my system. What should I watch out for? How can I tell in IGV whether a read maps to multiple regions? I'm sorry, I'm just a newbie...

**Ajayi Oyeyemi** · 04-04-2013, 07:08 AM

htseq

@Simon and All,

I investigated my BAM files as advised by Simon and took some snapshots. Could you take a look? I could see some of my reads in intergenic regions, with most of them strongly enriched at the 5' and 3' ends.

I'd appreciate your comments. Thanks in advance.

Yemi.

**Simon Anders** · 04-04-2013, 07:57 AM

Your third screenshot is how things are supposed to look. These heaps way beyond the 3' ends in the other screenshots look quite unusual. They are probably what gives rise to all the "no_feature" counts.

Are there more such heaps in regions even further away from the genes?

For anybody to make a guess what is going on, you'll need to tell us more about your experiment. (Which organism? What kind of samples? Which wet-lab protocol? What biological question? Anything non-standard in your procedure or samples?)

About the reads with "alignment_not_unique": We cannot see from the screenshot which ones these are. If you hover your mouse over a read, you get the full information on it from the SAM file. Look for the optional field "NH". It tells you how many places this read was mapped to. Are, for example, the uniquely mapping reads ("NH:i:1") in the genes and the multireads (NH>1) in these intergenic heaps?
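
Besides hovering over single reads, the NH values could be tallied across the whole file. The following is a minimal sketch, assuming a plain-text SAM file in which the aligner writes the optional tag as `NH:i:<n>` after the 11 mandatory columns; the helper names `nh_value` and `nh_histogram` and the example records are hypothetical.

```python
from collections import Counter

def nh_value(sam_line):
    """Return the NH tag of one SAM alignment line as an int, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional fields follow the 11 mandatory columns
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None

def nh_histogram(lines):
    """Count alignment records per NH value, skipping header and empty lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        counts[nh_value(line)] += 1
    return counts

# Made-up records: two uniquely mapped alignments and one alignment of a multiread.
example = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "r1\t0\tchr1\t10\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\n",
    "r2\t0\tchr1\t20\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\n",
    "r3\t0\tchr1\t30\t0\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:20\n",
]
histogram = nh_histogram(example)
print(dict(histogram))
```

Note that this counts alignment records, not reads: a read reported at 20 loci contributes up to 20 records with NH:i:20.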

**Darwin** · 04-04-2013, 08:20 AM

The screenshots suggest massive RNA degradation, causing 3' bias. Three prime UTRs are often repetitive, and since most of your reads align there, that would explain why your alignments are not unique. Did the total RNA look good on the Bioanalyzer prior to library synthesis?

**Ajayi Oyeyemi** · 04-05-2013, 06:13 AM

Originally posted by Simon Anders:

Your third screenshot is how things are supposed to look. These heaps way beyond the 3' ends in the other screenshots look quite unusual. They are probably what gives rise to all the "no_feature" counts.

Are there more such heaps in regions even further away from the genes?

For anybody to make a guess what is going on, you'll need to tell us more about your experiment. (Which organism? What kind of samples? Which wet-lab protocol? What biological question? Anything non-standard in your procedure or samples?)

About the reads with "alignment_not_unique": We cannot see from the screenshot which ones these are. If you hover your mouse over a read, you get the full information on it from the SAM file. Look for the optional field "NH". It tells you how many places this read was mapped to. Are, for example, the uniquely mapping reads ("NH:i:1") in the genes and the multireads (NH>1) in these intergenic heaps?

Thanks Simon and all. I had a second look at the alignment and observed that, while some reads aligned within genes, the vast majority aligned to regions farther away from known genes, with most extending beyond the 3' end.

As to the other questions posed: I'm working with skin samples obtained from cattle, and I used the Illumina TruSeq kit to make the libraries. Our study investigates cattle species that were raised in different environmental conditions (tropically adapted and temperate adapted).

I investigated my SAM files. While some reads had NH:1, the vast majority had more than 1, with some having NH:20. I checked the TopHat manual and realised that the default maximum number of alignments reported per read is 20. Is there any way this can be resolved, given that there are many paralogous genes in this species?

As for the RNA quality, it ranged from 6.8 to 7.4. We decided to give it a shot since the samples were so hard to get, all the more so because the RNA was extracted from skin.

Please let me know your thoughts...

Yemi.

**Ajayi Oyeyemi** · 04-05-2013, 06:17 AM

@Darwin,
Thanks Darwin. The Agilent readings were between 6.8 and 7.4. We decided to give this a shot because the samples were so hard to get.

Is there any way to work around this?

**Simon Anders** · 04-05-2013, 06:21 AM

"NH:20" means that these reads mapped to 20 or more loci, all with extremely similar sequences. This must be some highly repetitive feature that is all over the genome. So, have a look at some of those, and check in Ensembl or wherever what kind of repetitive elements the reads map to. I guess it will not be genes, because at least in the species I work with, there are few genes with that many paralogous copies (especially not copies so similar that TopHat cannot decide between them). Of course, there are many repetitive elements with thousands of copies in mammalian genomes, but they should not be transcribed and hence not turn up in RNA-Seq data. So, have a look and see what exactly all these multireads map to.

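To pick regions worth inspecting in a genome browser, the multiread alignments could be binned along the genome and the densest bins listed first. This is only a sketch under stated assumptions: plain-text SAM input, NH written as `NH:i:<n>`, and a missing NH tag treated as a unique alignment. The function name `multiread_hotspots`, the bin size, and the contig name `chrA` in the example are all hypothetical.

```python
from collections import Counter

def multiread_hotspots(lines, bin_size=1000, min_nh=2):
    """Count alignments with NH >= min_nh per (reference, bin start), densest bins first."""
    bins = Counter()
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        if len(f) < 11 or f[2] == "*":
            continue  # malformed or unmapped record
        # Assumption: a record without an NH tag is treated as uniquely mapped.
        nh = next((int(t[5:]) for t in f[11:] if t.startswith("NH:i:")), 1)
        if nh >= min_nh:
            # POS is 1-based; convert to the 0-based start of its bin.
            start = (int(f[3]) - 1) // bin_size * bin_size
            bins[(f[2], start)] += 1
    return bins.most_common()

# Made-up records: two multiread alignments in the same bin and one unique alignment.
example = [
    "r1\t0\tchrA\t5200\t0\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:20\n",
    "r2\t0\tchrA\t5900\t0\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:20\n",
    "r3\t0\tchrA\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\n",
]
hotspots = multiread_hotspots(example)
print(hotspots)
```

The top entries give coordinates that can be pasted into IGV or Ensembl to see which annotated repeats or genes, if any, lie under the heaps.
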
**Ajayi Oyeyemi** · 04-05-2013, 07:25 AM

HTSeq

@Simon,

I took a snapshot of one of the regions where a very large number of reads map (see the repeat_igv_file). There isn't any gene lying in that region. I checked the region in Ensembl as advised, since I used Ensembl for the TopHat alignments. The ViewBottom file shows the region in Ensembl.

Any clues?

**Ajayi Oyeyemi** · 04-05-2013, 07:35 AM

Originally posted by Simon Anders:

"NH:20" means that these reads mapped to 20 or more loci, all with extremely similar sequences. This must be some highly repetitive feature that is all over the genome. So, have a look at some of those, and check in Ensembl or wherever what kind of repetitive elements the reads map to. I guess it will not be genes, because at least in the species I work with, there are few genes with that many paralogous copies (especially not copies so similar that TopHat cannot decide between them). Of course, there are many repetitive elements with thousands of copies in mammalian genomes, but they should not be transcribed and hence not turn up in RNA-Seq data. So, have a look and see what exactly all these multireads map to.

I clicked on the link that connects Ensembl to UCSC and NCBI. Interestingly, in NCBI Map Viewer (version 6.1), region 79,025,650-79,027,700 bp, there seems to be a gene, LOC100847108, lying between LPP and TPRG1, which was missed by Ensembl. Can you please help me check the attached ViewTop file (this was the image on top of the ViewBottom file I previously posted)? I appreciate your efforts...

Attached Files

ViewTop.pdf (5.7 KB, 4 views)

HTseq help
