Hi Ies,
Sorry I missed your post earlier. There isn't a verbose logging mode in CNVkit, but the messages on standard error are fairly verbose already and should always report when there is an error. In particular, if something crashes then you'll see a Python traceback message. However, if you parallelized the batch run (-p >1), the messages from each process will be interleaved, which makes them somewhat harder to interpret.
The scatter and diagram PDFs should always be generated in a batch run; there isn't a special code path where they would be skipped. Does the log say something like "Wrote MySample-scatter.pdf" for the missing PDFs, or not?
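If you captured standard error to a file, a quick way to answer that is to scan it for the "Wrote ..." lines. Here is a minimal Python sketch, assuming the messages look like the example above and that the files are named <sample>-scatter.pdf and <sample>-diagram.pdf; the function name and the sample IDs are made up for illustration:

```python
import os
import re

# Assumed message format, based on the example above: "Wrote <name>.pdf".
WROTE_PDF = re.compile(r"Wrote\s+(\S+\.pdf)")


def missing_pdfs(log_text, sample_ids, kinds=("scatter", "diagram")):
    """Return the expected PDF names that the log never reports as written."""
    # Compare on file names only, in case the log reports full paths.
    written = {os.path.basename(path) for path in WROTE_PDF.findall(log_text)}
    expected = [f"{sample}-{kind}.pdf" for sample in sample_ids for kind in kinds]
    return [name for name in expected if name not in written]


# Hypothetical log excerpt; in practice, read the file where stderr was captured.
example_log = """\
Wrote SampleA-scatter.pdf
Wrote SampleA-diagram.pdf
Wrote SampleB-diagram.pdf
"""
print(missing_pdfs(example_log, ["SampleA", "SampleB"]))
```

Because it only looks for whole "Wrote ..." lines, this should still work on the interleaved output of a parallel run, as long as individual lines are not split.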
The --scatter option uses matplotlib to generate a PDF. On a cluster node without a display, the default interactive matplotlib backend (e.g. Wx or Gtk) might not be available, so I guess it's possible that matplotlib's plotting engine gets confused and silently fails to write the file. You could address that by setting a non-interactive backend on your cluster: create a file called "matplotlibrc" in the current working directory or in your matplotlib configuration directory (~/.config/matplotlib/ on Linux, or ~/.matplotlib/ on older versions and other platforms), with the contents:
backend : pdf
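If you'd rather script that step, e.g. as part of a job setup, something like the following Python sketch would do it. The function name is hypothetical; the only thing it does is write the one-line file shown above into the directory you give it:

```python
from pathlib import Path


def write_matplotlibrc(directory="."):
    """Create a matplotlibrc in `directory` that selects the PDF backend."""
    rc_path = Path(directory) / "matplotlibrc"
    rc_path.write_text("backend : pdf\n")
    return rc_path


# Usage: call write_matplotlibrc() from the directory where the batch job
# will be launched, before starting the run.
```

Depending on your matplotlib version, setting the MPLBACKEND environment variable to pdf in the job script may work as well, without creating any file.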
The --diagram option uses a different backend, Reportlab, which always generates a PDF from scratch and does not have an interactive mode. I can't think of a reason why this one would occasionally fail to write a file. Can you suggest anything unusual about your system's configuration? Outdated software versions, maybe?
If the diagram is showing labels for hundreds of genes, that means:
(a) you did exome sequencing, so there's lots to show;
(b) significant copy number alterations cover large regions of chromosomes in your sample; and/or
(c) the purity of your tumor samples is fairly high.
You can:
- Thin out the labeling to some extent by specifying a higher threshold (-t) log2 ratio value in the diagram command; the default is 0.6, so try 0.8 or 0.9 to only show the higher-amplitude CNAs.
- Drop the labels altogether by specifying a high value for -t or just passing the .cns segment file (with -s), without the .cnr.
- Use the "heatmap" command instead to view the unlabeled CNA regions for many samples at once.
- Use the "gainloss" command to list all genes with log2 ratio amplitudes beyond a given threshold, essentially the labels you're currently seeing on the diagram but in a more manageable plain-text, tabular format.
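To make those options concrete, here is a small Python sketch that only builds the command lines and prints them, without running anything. It assumes the executable is called cnvkit.py and is on your PATH, that the .cnr file is given as a positional argument, and that "gainloss" also takes -t for its threshold; the file names are placeholders, so check each command's --help before relying on it:

```python
import shlex

CNVKIT = "cnvkit.py"  # assumed name of the CNVkit executable on your PATH


def diagram_cmd(cnr=None, cns=None, threshold=None):
    """Build a 'diagram' command line; leave out `cnr` to drop the gene labels."""
    cmd = [CNVKIT, "diagram"]
    if cnr:
        cmd.append(cnr)
    if cns:
        cmd += ["-s", cns]
    if threshold is not None:
        cmd += ["-t", str(threshold)]
    return cmd


def heatmap_cmd(cns_files):
    """Build a 'heatmap' command line covering several samples at once."""
    return [CNVKIT, "heatmap", *cns_files]


def gainloss_cmd(cnr, threshold=None):
    """Build a 'gainloss' command line (the -t flag here is an assumption)."""
    cmd = [CNVKIT, "gainloss", cnr]
    if threshold is not None:
        cmd += ["-t", str(threshold)]
    return cmd


# Placeholder file names, for illustration only.
for cmd in (
    diagram_cmd("Sample.cnr", "Sample.cns", threshold=0.8),  # fewer labels
    diagram_cmd(cns="Sample.cns"),                           # no labels
    heatmap_cmd(["SampleA.cns", "SampleB.cns"]),
    gainloss_cmd("Sample.cnr", threshold=0.8),
):
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run() as-is once you've confirmed the arguments match your CNVkit version.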
The diagram is based on Biopython's Graphics/BasicChromosome module. If you're handy with Python and have a specific modification in mind, you could edit cnvlib/diagram.py (202 lines) to do it. For example, you can change PAGE_SIZE to much larger dimensions like 22x17" and the chromosomes will scale proportionally, but the gene labels stay the same size, so labels that overlapped before will be easier to read.
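As a rough sketch of that edit, assuming PAGE_SIZE is a (width, height) pair measured in PDF points, which is how ReportLab expresses page sizes (72 points per inch), the new value could be worked out like this. The helper function is just for illustration and is not part of CNVkit:

```python
# PDF page dimensions are measured in points, 72 per inch
# (the same value as reportlab.lib.units.inch).
POINTS_PER_INCH = 72.0


def page_size(width_in, height_in):
    """Convert a page size in inches to a (width, height) tuple in points."""
    return (width_in * POINTS_PER_INCH, height_in * POINTS_PER_INCH)


# 22x17" landscape page; assumes PAGE_SIZE is (width, height) in points.
PAGE_SIZE = page_size(22, 17)
print(PAGE_SIZE)  # (1584.0, 1224.0)
```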
Thanks for the suggestion on sample.bai, I'll look into it.
Cheers,
Eric