
**areyes** · 07-18-2012, 04:42 AM

Hi senkewiczs,

Do you get a message when running "dexseq_count.py" saying: "X number of reads processed"?

Could you post the first few lines of your files Homo_sapiens.GRCh37.67_DEXSeq.gff and human1_sorted.sam?

Thanks,
Alejandro

**senkewiczs** · 07-18-2012, 08:40 AM

Hi Alejandro,

Yes, I was getting 'xxxx number of reads processed.'

Here is the head of human1_sorted.sam:


1000_1000_1096    73    chr2    110601762    0    50M    *    0    0    CTGTCCAAATGGAAAAATTATTAAACAAATCTTTTTTAAAATAAAATGCT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII:    RG:Z:20110216225354574    NH:i:0    CM:i:0    SM:i:1    CQ:Z:BBABBBA?@ABBA@?@@A@BB?B@A>=@>A=AA??@@@><?@?==@;<6:    CS:Z:T22112010031020000303303001100322000003000330003132

1000_1000_1096    133    *    0    0    *    chr2    108495505    0    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN    *    RG:Z:20110216225354574    NH:i:0    CQ:Z:BBA@BB@+<?B><>A>2'%@A?13:==3%@BA)5%    CS:Z:G00103210022022202223321212322331323

1000_1000_1096    133    *    0    0    *    chr2    110601762    0    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN    *    RG:Z:20110216225354574    NH:i:0    CQ:Z:BBA@BB@+<?B><>A>2'%@A?13:==3%@BA)5%    CS:Z:G00103210022022202223321212322331323

1000_1000_1587    81    chr3    96336408    0    25H25M    =    96336433    86    AGGCTTATGCGGAGGAGAATGTTTT    FIIIIIIIIIIIIIIIIIIIIIIII    RG:Z:20110216225354574    NH:i:2    CM:i:0    SM:i:3    CQ:Z:@<@;<;>;>?>?==>@:5@=85:>-:>8:99=.?9><:?9>:1?69><93    CS:Z:T30001130222022033133023021331212230300011123012202

1000_1000_1587    161    chr3    96336433    0    33M2H    =    96336408    -86    CATGTTACTTATACTAACATTAGTTCTTCTATA    IH8IHGII>.<H=7AIII=2?IE7.CD0+>II8    RG:Z:20110216225354574    NH:i:2    CM:i:0    SM:i:3    CQ:Z:B@)0A(@;9&)45)/3@>1-&:42&);*'%:@/*)    CS:Z:G31311031203331230113032102202233321

1000_1000_1673    99    chr14    79703924    51    50M    =    79703974    84    AGAGACACTGGGCCTTTTCTCCTTTACTCCAAGAAGAAATGCTCTTTTTT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII=    RG:Z:20110216225354574    NH:i:1    CM:i:0    SM:i:100    CQ:Z:BBBBAA<BBA@ABBBBABBB@BAA>BAB?B@>BAAB?@B>;@A@<BB>==    CS:Z:T32222111210030200022202003122010220220031322200000

1000_1000_1673    147    chr14    79703974    51    35M    =    79703924    -84    CATATATCTTTATTGAATACCTATTATGGTCCATG    @IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    RG:Z:20110216225354574    NH:i:1    CM:i:0    SM:i:65    CQ:Z:BBB?BBBBABB)BABBBAA=AA7AA=;A@AA??=@    CS:Z:G31310210133033201330210330022333331

1000_1000_1921    83    chr10    15880069    45    6H44M    =    15879996    -116    TAATTTAAAGGCAAATTAATCCTGAAGCAATGGATTTAAAATTT    EIIBIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    RG:Z:20110216225354574    NH:i:1    CM:i:0    SM:i:97    CQ:Z:BBBB=>@BB:A>BB@97BB;;=BB<;0BB@99>@<4>@@=03<A%%)%;8    CS:Z:T30030003003201301320212023030300130200300303333030

1000_1000_1921    163    chr10    15879996    45    33M2H    =    15880069    116    TTAATGTTTTTGAAAAGCGTATCTGGGTAGTTA    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIC    RG:Z:20110216225354574    NH:i:1    CM:i:0    SM:i:58    CQ:Z:B6BBBB7ABB?9BABB)BBAB.@<B?4A5=;2<(B    CS:Z:G10303110000120002331332210013210333

1000_1000_260    99    chr2    145882329    42    50M    =    145882410    112    AATGTATACAGGACTTATATGCTGAAAACATGTTGCTGAAAAAATCAAAG    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII7AIIIIIIII1    RG:Z:20110216225354574    NH:i:1    CM:i:0    SM:i:100    CQ:Z:BBBA@BABA=7@@A<<ABA?@B@@<;4?@=B=>98B=@83%=A@=0?@?1    CS:Z:T30311333112021203333132120001131101321200000321002

Here is the head of Homo_sapiens.GRCh37.67_DEXSeq.gff:


1    Homo_sapiens.GRCh37.67.gtf    aggregate_gene    11869    14412    .    +    .    gene_id "ENSG00000223972"

1    Homo_sapiens.GRCh37.67.gtf    exonic_part    11869    11871    .    +    .    transcripts "ENST00000456328"; exonic_part_number "001"; gene_id "ENSG00000223972"

1    Homo_sapiens.GRCh37.67.gtf    exonic_part    11872    11873    .    +    .    transcripts "ENST00000456328+ENST00000515242"; exonic_part_number "002"; gene_id "ENSG00000223972"

1    Homo_sapiens.GRCh37.67.gtf    exonic_part    11874    12009    .    +    .    transcripts "ENST00000456328+ENST00000515242+ENST00000518655"; exonic_part_number "003"; gene_id "ENSG00000223972"

1    Homo_sapiens.GRCh37.67.gtf    exonic_part    12010    12057    .    +    .    transcripts "ENST00000456328+ENST00000515242+ENST00000450305+ENST00000518655"; exonic_part_number "004"; gene_id "ENSG00000223972"

1    Homo_sapiens.GRCh37.67.gtf    exonic_part    12058    12178    .    +    .    transcripts "ENST00000456328+ENST00000515242+ENST00000518655"; exonic_part_number "005"; gene_id "ENSG00000223972"

1    Homo_sapiens.GRCh37.67.gtf    exonic_part    12179    12227    .    +    .    transcripts "ENST00000456328+ENST00000515242+ENST00000450305+ENST00000518655"; exonic_part_number "006"; gene_id "ENSG00000223972"

1    Homo_sapiens.GRCh37.67.gtf    exonic_part    12595    12612    .    +    .    transcripts "ENST00000518655"; exonic_part_number "007"; gene_id "ENSG00000223972"

1    Homo_sapiens.GRCh37.67.gtf    exonic_part    12613    12697    .    +    .    transcripts "ENST00000456328+ENST00000515242+ENST00000450305+ENST00000518655"; exonic_part_number "008"; gene_id "ENSG00000223972"

1    Homo_sapiens.GRCh37.67.gtf    exonic_part    12698    12721    .    +    .    transcripts "ENST00000456328+ENST00000515242+ENST00000518655"; exonic_part_number "009"; gene_id "ENSG00000223972"

It appears that part of the problem is that the two files name chromosomes differently ("chr2" in the SAM file versus "1" in the GFF file)?

Thanks in advance for any advice.

**areyes** · 07-18-2012, 08:44 AM

Indeed, I believe that is the problem! The chromosome names in the two files should match.

Alejandro
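
A quick way to confirm such a mismatch is to compare the sequence names used in the two files. The following is a minimal sketch, not part of DEXSeq: the function names are hypothetical, and it assumes both files are tab-delimited (the excerpts above are shown with spaces) with the sequence name in column 1 of the GFF and column 3 of the SAM.

```python
def gff_seqnames(lines):
    """Collect the sequence names from column 1 of GFF/GTF lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def sam_seqnames(lines):
    """Collect reference names from column 3 (RNAME) of SAM alignment lines."""
    names = set()
    for line in lines:
        # Lines starting with "@" are treated as header lines here.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) > 2 and fields[2] != "*":  # "*" means no reference
            names.add(fields[2])
    return names


# Shortened records based on the excerpts posted above.
gff_example = [
    '1\tHomo_sapiens.GRCh37.67.gtf\texonic_part\t11869\t11871\t.\t+\t.\t'
    'gene_id "ENSG00000223972"'
]
sam_example = ["1000_1000_1096\t73\tchr2\t110601762\t0\t50M\t*\t0\t0\tCTGT\tIIII"]

shared = gff_seqnames(gff_example) & sam_seqnames(sam_example)
print("GFF names:", gff_seqnames(gff_example))
print("SAM names:", sam_seqnames(sam_example))
print("shared:", shared)
```

With real files, pass open file handles instead of the example lists. An empty set of shared names means no read can be assigned to any exon.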

**Ashwini Kumar** · 08-21-2012, 06:48 AM

Hi there,

I am new to DEXSeq. I want to know how to count the number of reads mapping to each exon of a genome, that is, how to prepare the input for a DEXSeq analysis.

Many thanks
Ashwini

**areyes** · 08-21-2012, 06:50 AM

Hi Ashwini,

The pasilla package provides the example dataset for the DEXSeq vignette, and the pasilla vignette describes the steps to create an ExonCountSet object.

Alejandro

**Ashwini Kumar** · 08-21-2012, 07:00 AM

Hi Alejandro,

Thanks for your quick response. OK, I think section 9 covers creating ExonCountSet objects. Could you also let me know whether I need to write a Python script myself, or whether I can find one in the DEXSeq package?

Thanks again
Ashwini

**areyes** · 08-21-2012, 07:03 AM

You will find these scripts in the DEXSeq package directory!

**Ashwini Kumar** · 08-21-2012, 07:05 AM

Many thanks, Alejandro! You saved me.

**Ashwini Kumar** · 08-27-2012, 11:27 PM

Hi Alejandro,

I have run both Python scripts:
1. [akumar@mars python_scripts]$ /apps/python262/bin/python dexseq_prepare_annotation.py /homes/akumar/PROJECT_FOLDER/Homo_sapiens.GRCh37.68.gtf Homo_sapiens.GRCh37.68.gff

2. [akumar@mars python_scripts]$ /apps/python262/bin/python dexseq_count.py Homo_sapiens.GRCh37.68.gff /homes/akumar/PROJECT_FOLDER/tophat_paired_sorted.sam Output.txt

Now the output is: 48100000 reads processed.

[akumar@mars python_scripts]$ head Output.txt
ENSG00000000003:001 6
ENSG00000000003:002 0
ENSG00000000003:003 1
ENSG00000000003:004 1
ENSG00000000003:005 0
ENSG00000000003:006 0
ENSG00000000003:007 0
ENSG00000000003:008 0
ENSG00000000003:009 1
ENSG00000000003:010 0

So what do I have to do now? How do I create an ExonCountSet and carry out the further analysis? I am stuck here, please help me.

Many thanks
Ashwini

**areyes** · 08-27-2012, 11:33 PM

Check the function read.HTSeqCounts from DEXSeq. It will read these files, parse the information relevant to the package, and return an ExonCountSet object in your R session.
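
Before loading the count files in R, it can help to sanity-check them, for example by summing the exon counts per gene. The sketch below is only an illustration and not part of DEXSeq: `gene_totals` is a hypothetical helper, and it assumes each line holds a `gene:exon` ID followed by a count, separated by whitespace, as in the Output.txt excerpt above. Lines that do not fit this pattern are skipped.

```python
from collections import defaultdict


def gene_totals(lines):
    """Sum the per-exon counts of a dexseq_count.py output file by gene ID."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        # Skip anything that is not "<gene>:<exon> <count>".
        if len(parts) != 2 or ":" not in parts[0]:
            continue
        gene_id, _exon_number = parts[0].rsplit(":", 1)
        totals[gene_id] += int(parts[1])
    return dict(totals)


# First three lines of the Output.txt excerpt posted above.
example = [
    "ENSG00000000003:001 6",
    "ENSG00000000003:002 0",
    "ENSG00000000003:003 1",
]
totals = gene_totals(example)
print(totals)
```

For a real file, call `gene_totals(open("Output.txt"))`. If every gene total is zero, the chromosome names in the SAM and GFF files are worth checking first.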

**Ashwini Kumar** · 08-27-2012, 11:42 PM

Thanks Alejandro

**Ashwini Kumar** · 09-04-2012, 05:52 AM

Hi Alejandro,

I am wondering whether the SAM file needs to be sorted when I convert the BAM file to SAM using samtools. If so, which command should I use?

Thanks
Ashwini

**Simon Anders** · 09-04-2012, 10:22 PM

Originally posted by Ashwini Kumar:

I am wondering whether the SAM file needs to be sorted when I convert the BAM file to SAM using samtools. If so, which command should I use?

Sorting is required only for paired-end data; for single-end data, there is no need for sorting. To sort by read name, use 'samtools sort -n'.
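
As a rough sketch of the two steps (sort by read name, then convert to SAM), the commands can be assembled and run from Python as below. This assumes samtools 1.3 or later, where `sort` takes `-o` for the output file; older releases expect an output prefix instead. The file names and the helper functions are hypothetical.

```python
import subprocess


def name_sort_commands(bam_path, sorted_bam, sam_path):
    """Build the samtools calls: name-sort a BAM file, then convert it to SAM."""
    # -n sorts by read name, so that mates end up next to each other.
    sort_cmd = ["samtools", "sort", "-n", "-o", sorted_bam, bam_path]
    # -h keeps the header in the SAM output.
    view_cmd = ["samtools", "view", "-h", "-o", sam_path, sorted_bam]
    return [sort_cmd, view_cmd]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = name_sort_commands(
    "tophat_paired.bam",          # hypothetical input file
    "tophat_paired_sorted.bam",   # hypothetical intermediate file
    "tophat_paired_sorted.sam",   # hypothetical output file
)
for cmd in commands:
    print(" ".join(cmd))
# run_all(commands)  # uncomment once samtools is installed and the paths exist
```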

**gokhulkrishnakilaru** · 10-09-2012, 11:29 AM

Originally posted by areyes:

Indeed, I believe that is the problem! The chromosome names should match

Alejandro

So should we add the "chr" prefix to the chromosome names in the GFF file?
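
The thread does not settle this, but one possible way to make the names agree is to rewrite the first column of the GFF. The sketch below is an assumption-laden illustration: `add_chr_prefix` is a hypothetical helper, the file is assumed to be tab-delimited, and only the Ensembl mitochondrial name "MT" is mapped to the UCSC-style "chrM". Unplaced scaffolds may still fail to match and would need separate handling.

```python
def add_chr_prefix(line):
    """Return a GFF line with a "chr" prefix added to the sequence name."""
    if line.startswith("#") or "\t" not in line:
        return line  # leave comments, blank lines, and malformed lines alone
    seqname, rest = line.split("\t", 1)
    if seqname.startswith("chr"):
        return line  # already prefixed
    if seqname == "MT":
        seqname = "M"  # Ensembl "MT" corresponds to UCSC "chrM"
    return "chr" + seqname + "\t" + rest


# Shortened record based on the GFF excerpt posted earlier in the thread.
example = (
    '1\tHomo_sapiens.GRCh37.67.gtf\taggregate_gene\t11869\t14412\t.\t+\t.\t'
    'gene_id "ENSG00000223972"'
)
converted = add_chr_prefix(example)
print(converted)
```

The alternative is to strip the prefix from the alignments instead; either way, the goal is that both files use identical sequence names.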

Problem working with DEXSeq package