SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
tophat Error running running 'prep_reads' victoryhe Bioinformatics 2 10-17-2011 05:53 AM
library issues jjenrette Illumina/Solexa 1 07-05-2011 01:50 PM
issues with SOAPdenovo bioBob Bioinformatics 2 04-20-2011 01:17 PM
Issues in running sff_extract yy01 Bioinformatics 3 09-23-2010 08:58 AM
gsMapper issues mjleaks 454 Pyrosequencing 1 05-12-2009 07:13 AM

Reply
 
Thread Tools
Old 04-09-2012, 09:11 AM   #1
emilyjia2000
Member
 
Location: usa

Join Date: May 2011
Posts: 59
Default issues on running ExomeCNV

Hello,

I am trying to use ExomeCNV to do copy number analysis on exome sequencing data. I have bam files sorted already. I tried to use shell scripts for generating coverage files, one error message came out:

making pileup
[main] The `pileup' command has been removed. Please use `mpileup' instead.

I checked on all of the scripts provided by author, I saw 'pileup' command there, I don't know what that's supposed to mean. Also the coverage files generated from this run are "0" in the last four columns.

command Line I used:
sh bam2coverage.sh sorted.bam exome.bed hg19.fasta script-directory

Thanks,
emilyjia2000 is offline   Reply With Quote
Old 04-13-2012, 04:04 AM   #2
spoonman
Junior Member
 
Location: Austria, Vienna

Join Date: Jul 2011
Posts: 4
Default

Hi Emily,

It think your samtools version is causing all the trouble. ExomeCNV wiki page is recommending "v0.1.16 as newer version may not support pileup function/files". If your samtools version is newer you'll run into trouble, cause the pileup command has been replaced with mpileup. What you can do is the following:
1.) Temporily replace samtools with version 0.1.16 OR
2.) Manually edit the relevant scripts (those containing the line samtools pileup and replace it with mpileup OR
3.) I recommend the following: Install GATK and run it as explained in their wiki: https://secure.genome.ucla.edu/index...CNV_User_Guide

Best,
spoonman
spoonman is offline   Reply With Quote
Old 04-17-2012, 08:48 AM   #3
emilyjia2000
Member
 
Location: usa

Join Date: May 2011
Posts: 59
Default

Thanks Spoonman! it is version issue.

Now I sort of have another problem. I got it run through, but the results don't look right. only chromosome1 has deletion/amplifications, on other chromosomes, there is really nothing happened. please see the attached output file.
Do you happen to know what's wrong with it?

Thanks again.
Attached Files
File Type: pdf Rplots.pdf (22.1 KB, 91 views)
emilyjia2000 is offline   Reply With Quote
Old 04-17-2012, 10:33 AM   #4
spoonman
Junior Member
 
Location: Austria, Vienna

Join Date: Jul 2011
Posts: 4
Default

Hi Emily,
It' kind of hard to troubleshoot based on that picture. I personally never ran into that problem. I'll try to give you some hints:

1.) Check your input file. Did the coverage calculation for your cell lines work properly? Do you have coverage information for all chromosomes?
The output file should look similar to this: http://genome.ucla.edu/~fah/ExomeCNV...erage.gatk.txt

2.) In R you have to define the chromosomes you want to analyse. Are you sure you included all in your list? [chr.list = c("chr1","chr2","chr3",......)]

3.) A more general recommendation: If you run ExomeCNV, I would run each command individually and check what the output variable looks like. Makes it easier to pinpoint where something started going wrong.

Best,
spoonman
spoonman is offline   Reply With Quote
Old 04-18-2012, 02:09 PM   #5
emilyjia2000
Member
 
Location: usa

Join Date: May 2011
Posts: 59
Default

Thanks spoonman for the good tips!

I used the author shell script to do the coverage, from the overview of coverage, the chromosome1 looks the best coverage compared to other chromosomes, I don't know why, it looks strange.
If I want to run through R script, I have to add chrmosome 1 in the first option of chromosome list, otherwise it won't work, the error will be generated.

Thanks,
emilyjia2000 is offline   Reply With Quote
Old 05-14-2012, 12:33 PM   #6
emilyjia2000
Member
 
Location: usa

Join Date: May 2011
Posts: 59
Default

Hi Spoonman,

I tried to use GATK to get the coverage file, from the user guide, there is one file unclear to me, which is "-L <exome.interval_list>". how did you get exome.interval_list using GATK?

My output file only has 9 columns (no last column), but the sample file has 10 columns. What's yours?

Thanks,

Last edited by emilyjia2000; 05-14-2012 at 01:03 PM.
emilyjia2000 is offline   Reply With Quote
Old 05-14-2012, 01:47 PM   #7
spoonman
Junior Member
 
Location: Austria, Vienna

Join Date: Jul 2011
Posts: 4
Default

Hi Emily,
-L <exome.interval_list> is the list of your targeted exons. I don't know what exons/genes you targeted (bait design). Each experiment has its own setup. You should probably ask the person that runs the NGS machine, I'm sure he/she knows what you need. But to be clear, GATK is not needed for "getting" the exon.interval_list. The exon.interval_list is information GATK needs so it can calculate read depth within the specified chromosomal range - listed in your exome.interval_list file. Be carful about the format of that file, as explained in their wiki: https://secure.genome.ucla.edu/index...CNV_User_Guide the file needs the following format:
chr#:start-end
I think my file had all 10 columns, but I wouldn't be worried about that, as long as the first 3 column are there, once you read in the file in R the rest of the file information get's lost anyway.
Read this:
"Q: I noticed that when running function "read.coverage.gatk(file)" to reformat GATK output, these following two fields "sequenced.base" and "bease.with..10.coveratge" are all with "NA". Do you think "NA" in these two fields will affect the detection results?
A: No, the fields "sequenced.base" and "bases.with..10.coverage" are degenerate and are not important for ExomeCNV to function. They are there for a historical reason."

Good luck!
spoonman
spoonman is offline   Reply With Quote
Old 06-29-2012, 02:42 AM   #8
marcela
Junior Member
 
Location: Sweden

Join Date: Feb 2011
Posts: 7
Question One single chromosome

Hi!

I have a bam file that containg only information for one chromosome (chr6).
When I run

% sh ~/programs/CNA/bam2coverage.sh bamFile exons.bed genome.fa ~/programs/CNA/

I got the coverage files for all the chromosomes with 0 values, while all the wig files refere to chr1


probe1 chr1 67000041 67000051 11 0 0 0 0
probe35 chr1 48999844 48999965 122 0 0 0 0
probe36 chr1 49000561 49000588 28 0 0 0 0
probe37 chr1 49005313 49005410 98 0 0 0 0
probe38 chr1 49052675 49052838 164 0 0 0 0


So I run it with bed/fasta files including only data from chr6, but I still get 0 values in the coverage files, and all the wig files had info from chr6.

Any ideas on how to solve this?

Thanks!
marcela is offline   Reply With Quote
Old 04-18-2013, 08:40 AM   #9
shruti
Member
 
Location: Milan

Join Date: Mar 2010
Posts: 35
Default

Hi I'm running ExomeCNV for the my data.
I managed to get the coverage output though I get error while running the R script that they provide.

========================================
Error in if (lim.logR[1] == -Inf) lim.logR[1] = min(all.ecnv$logR[all.ecnv$logR != :
missing value where TRUE/FALSE needed
Calls: do.plot.eCNV
Execution halted
========================================

has any of you got this error and could suggest a solution?

thanks
shruti
shruti is offline   Reply With Quote
Old 03-17-2014, 02:32 AM   #10
haojam
Member
 
Location: seoul, korea

Join Date: Jun 2010
Posts: 12
Default

Hi All,

I have a bam file that containg only information for chr21.
When I run sh bam2coverage.sh

command :
sh bam2coverage.sh /path for bam file/chr21.sortedpicard.rmdup.realigned.recal.bam /path for chr21.bed/chr21.CapturedRegion.bed /path for reference/chr21.fa

Output :
probe chr probe_start probe_end targeted base sequenced base coverage average coverage base with >10 coverage
probe1 chr21 9907173 9907501 329 0 0 0 0

I am running this bam2coverage script under samtools-0.1.16 which supports bam2coverage.sh

Any suggestion and response would be highly accepted.

Thanks,
Rocky
haojam is offline   Reply With Quote
Old 01-26-2015, 09:48 AM   #11
vd4mindia
Member
 
Location: Milan

Join Date: May 2013
Posts: 40
Default

@spoonman

I would like to know how to make the files read if I take the output of GATK depthofcoverage, since it contains one file that contains all the chromosome summary together. How do I use the GATK files as input for ExomeCNV, I could only see that the command read.gatk.coverage but then what are the parameters?

Thanks
vd4mindia is offline   Reply With Quote
Old 01-27-2015, 06:16 AM   #12
vd4mindia
Member
 
Location: Milan

Join Date: May 2013
Posts: 40
Default

am using the GATK DepthofCoverage output for exomeCNV, but am getting some error in dataframe.
My commands for normal and tumor for 3 chromosomes
chr.list = c("chr19","chr20","chr21")
normal = read.coverage.gatk("~/Desktop/Whole Genome Sequencing/CNV_detection_2015/ExomeCNV/GATK_DepthofCov_out/summary_coverage/S_313_N_S8980.coverage.sample_interval_summary")

tumor = read.coverage.gatk("/Users/vdas/Desktop/Whole Genome Sequencing/CNV_detection_2015/ExomeCNV/GATK_DepthofCov_out/summary_coverage/S_313_T_S7998.coverage.sample_interval_summary")



demo.logR = calculate.logR(normal, tumor)

demo.eCNV = c()
for (i in 1:length(chr.list)) {
idx = (normal$chr == chr.list[i])
ecnv = classify.eCNV(normal=normal[idx,], tumor=tumor[idx,], logR=demo.logR[idx], min.spec=0.9999, min.sens=0.9999, option="spec", c=0.5, l=70)
demo.eCNV = rbind(demo.eCNV, ecnv)
}

When am executing demo.eCNV runs with an error
Error in `$<-.data.frame`(`*tmp*`, "copy.number", value = NA) :
replacement has 1 row, data has 0

Also in the website it is written last column can be NA but I am having 2 columns as NA if I read the coverage file of gatk output. Once is the column of sequenced.base and the other is the last one. Also my chr column is in order or chr# but one calling the read.coverage.gatk it seems it expects chr1 should be like just 1 and chr string will be appended to it so the typical file uploaded is shown like below

Quote:
probe chr probe_start probe_end targeted.base sequenced.base coverage average.coverage base.with..10.coverage
1 chr1:762098-762270 chrchr1 762098 762270 173 NA 141772 819.49 NA
2 chr1:861282-861490 chrchr1 861282 861490 209 NA 21078 100.85 NA
3 chr1:865592-865791 chrchr1 865592 865791 200 NA 26005 130.03 NA
4 chr1:866326-866498 chrchr1 866326 866498 173 NA 9629 55.66 NA
5 chr1:871060-871244 chrchr1 871060 871244 185 NA 52937 286.15 NA
6 chr1:874365-874774 chrchr1 874365 874774 410 NA 66160 161.37 NA
So there is a discrepancy in the dataframe. Can anyone suggest me how to get rid of this problem
vd4mindia is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 12:29 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO