Seqanswers Leaderboard Ad

**Michael.Ante** · 04-12-2016, 12:34 AM

Hi Ram,

could you check if the chromosomes' name in the bam files are the same as the ones in the gtf?

Code:

samtools view -H my_bam.bam | head

and

Code:

head Homo_sapiens.GRCh37.75.gtf

.

Often, one annotation is 'chr1', another 'Chr1', and sometimes just '1'. If you are mixing these up, no program or tool will find an overlap.

Cheers,

Michael

**Rammohan** · 04-12-2016, 06:00 AM

Thanks a lot Michael,

below is the output of the codes you suggested:

[rshukla@dev01 test]$ samtools view -H Ctrl13058.bam | head
@HD VN:1.0 SO:coordinate
@RG ID:0 SM:Ctrl13058-Trimmed PI:50
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663

[rshukla@dev01 test]$ head Homo_sapiens.GRCh37.75.gtf
#!genome-build GRCh37.p13
#!genome-version GRCh37
#!genome-date 2009-02
#!genome-build-accession NCBI:GCA_000001405.14
#!genebuild-last-updated 2013-09
1 pseudogene gene 11869 14412 . + . gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene";
1 processed_transcript transcript 11869 14409 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; g ene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana";
1 processed_transcript exon 11869 12227 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_numb er "1"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon _id "ENSE00002234944";
1 processed_transcript exon 12613 12721 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_numb er "2"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon _id "ENSE00003582793";
1 processed_transcript exon 13221 14409 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_numb er "3"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon _id "ENSE00002312635";

The format in .bam file is chr1 where as in .gtf I guess it is just the number.

Do you have any suggestion how to proceed further, I am very new to all these.

Thanks again,

Ram

**GenoMax** · 04-12-2016, 06:09 AM

Can you download a corresponding GTF file from BaseSpace? That should sort this problem out.

It should be possible to change the GTF file you have (from 1 to chr1 etc) but if the above works then you would know that you have a file that exactly corresponds to the genome used for alignment. Using an incorrect GTF file will lead to erroneous results.

Edit: If changing 1 to chr 1 is the preferred option then use the solution suggested by @vivek in this thread.

Copying it here for reference (use your file names) :

Code:

$ awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' no_chr.file> with_chr.file

**Rammohan** · 04-12-2016, 08:24 AM

Thanks GenoMax and Micheal

I tried the other reference genome with same chromosome annotation but it still did not work

Below is the error message

> se <- summarizeOverlaps(features=ebg, reads=bamfiles,mode="Union",singleEnd=FALSE,ignore.strand=TRUE,fragments=TRUE,suppressWarnings())
Error in validObject(.Object) :
invalid class “SummarizedExperiment0” object: 'assays' nrow differs from 'mcols' nrow

the heads of both the bam and the gtf files are as below:

[rshukla@dev01 test]$ samtools view -H Ctrl13058.bam | head
@HD VN:1.0 SO:coordinate
@RG ID:0 SM:Ctrl13058-Trimmed PI:50
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
[rshukla@dev01 test]$ head Human_hg19.gtf
chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 12613 12721 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
chr1 hg19_knownGene exon 12646 12697 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
chr1 hg19_knownGene start_codon 12190 12192 0.000000 + . gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
chr1 hg19_knownGene CDS 12190 12227 0.000000 + 0 gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
chr1 hg19_knownGene CDS 12595 12721 0.000000 + 1 gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
[rshukla@dev01 test]$

As per Genomaxs suggestion I am also downloading reference genome from basespace, surprisingly it is 42.3GB !

any further suggestions?

Ram

**GenoMax** · 04-12-2016, 09:41 AM

@Ram you should not need to get the reference genome. The error above is not related to the original issue but an error with your DESeq2 command.

**Rammohan** · 04-12-2016, 11:11 AM

Thanks, what changes should I make to the command?

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 18 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 22 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 17 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 49 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

BAM files from RNAseq -Alignment (Basespace) not working with DESEq2 in R

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News