Converting tab-delimited text file into HTML/PDF/latex/knitr report.

Anil K

Junior Member

Join Date: Apr 2015

Posts: 2
#1


06-19-2015, 04:44 AM

I have a bash script that takes an ABI file as input and uses ANNOVAR to annotate the variants, producing a tab-delimited text file of the annotated variants. Every time the script is run on a different ABI file, the number of columns in the output stays the same, but the number of rows and the individual annotations vary from variant to variant.

Please see "Annovar_Result.txt" for an example of ANNOVAR's tab-delimited output.

Attempts so far-->

I have tried writing a bash script that, for the first variant, extracts the different fields from the tab-delimited file, saves each one as a text file and combines the resulting files. An AWK script then assigns a variable to each field in the combined text file. I used AWK to generate an HTML page that prints these variables inside the appropriate HTML tags, and this works fine for any file that follows the same pattern. However, when a particular field is missing from another variant's annotation, the remaining fields shift, and the script prints a different field than the one the variable was meant to hold.

In the attached example, the first variant is a clinically significant mutation, because the "clinvar" column contains an annotation, so it needs to be reported in a separate section along with its other details.

Please see "combined_fields.png" for the combined text file.

The order of the fields in the combined text file differs from variant to variant, so the generated report is incorrect.
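Instead of extracting fields into separate files and stitching them back together, each row can be read whole and every field looked up by its header name, so an empty field never shifts the others. Below is a minimal Python sketch of that idea (a swap from the bash/AWK approach described above), assuming the first line of the ANNOVAR file is a header row; the sample rows and the `clinvar` column name are hypothetical placeholders.

```python
import csv
import io

# Hypothetical excerpt of an ANNOVAR table; real output has many more columns,
# and the exact ClinVar column name depends on the database version used.
SAMPLE = (
    "Chr\tStart\tGene.refGene\tsnp138\tclinvar\n"
    "1\t1000\tGENE_A\t.\tsome_clinvar_annotation\n"
    "2\t2000\tGENE_B\trs12345\t\n"
)


def read_variants(handle):
    """Return one dict per data row, keyed by the column names in the header."""
    reader = csv.DictReader(
        handle,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
        restval="",              # rows with fewer fields get "" for the rest
    )
    return list(reader)


rows = read_variants(io.StringIO(SAMPLE))
for row in rows:
    # An empty clinvar field stays empty instead of shifting later fields left.
    print(row["Gene.refGene"], repr(row["clinvar"]))
```

The same principle works in AWK: split on tabs with `-F'\t'`, record each header name's field number from the first line, and refer to fields through that mapping rather than through fixed positions.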

Expected Result-->

Since the format of the tab-delimited file is not uniform, is there any way to set multiple conditions for each row? For example: if a specific column (e.g. clinvar) has a value, print it between one set of HTML tags; if it does not, check another column (e.g. rsID) and, if that has a value, print it between a different set of HTML tags; and so on for the other columns.

Please see "expected_result.png" for the expected output.
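The "check clinvar first, then rsID, and so on" logic maps naturally onto an ordered chain of `if` tests where the first matching rule wins. The following is a minimal Python sketch, assuming a header row, that ANNOVAR marks absent annotations with "." or an empty field, and that the rsID lives in the `snp138` column; the column name `clinvar`, the CSS class names and the sample rows are hypothetical and should be adjusted to your own file.

```python
import csv
import html
import io

# Assumptions: the file has a header row, ANNOVAR writes "." or nothing for
# absent annotations, and the ClinVar column is literally named "clinvar"
# (adjust to the exact header in your own file).
MISSING = {"", "."}
CLINVAR_COL = "clinvar"
DBSNP_COL = "snp138"


def has_value(row, column):
    """True if the column exists in this row and holds a real annotation."""
    return (row.get(column) or "").strip() not in MISSING


def render_row(row):
    """Choose an HTML snippet for one variant; the first matching rule wins."""
    gene = html.escape(row.get("Gene.refGene") or "")
    if has_value(row, CLINVAR_COL):
        note = html.escape(row[CLINVAR_COL])
        return f'<div class="clinical"><b>{gene}</b>: {note}</div>'
    if has_value(row, DBSNP_COL):
        note = html.escape(row[DBSNP_COL])
        return f'<div class="known"><b>{gene}</b>: {note}</div>'
    return f'<div class="other"><b>{gene}</b>: no ClinVar or dbSNP annotation</div>'


def build_report(handle):
    """Render every data row of a tab-delimited ANNOVAR table as one HTML page."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE, restval="")
    body = "\n".join(render_row(row) for row in reader)
    return f"<html><body>\n{body}\n</body></html>"


# Hypothetical three-variant excerpt used only to exercise each branch.
SAMPLE = (
    "Gene.refGene\tsnp138\tclinvar\n"
    "GENE_A\t.\tsome_clinvar_annotation\n"
    "GENE_B\trs12345\t.\n"
    "GENE_C\t.\t.\n"
)

report = build_report(io.StringIO(SAMPLE))
print(report)
```

Further conditions can be added as extra `if` branches in `render_row`, placed in the order they should take priority.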

In a similar manner, for a novel variant, where the ExonicFunc.refGene column contains "non-synonymous" and the snp138 column is empty, the script should print the SIFT_score along with the other details between HTML tags. These are just some of the conditions needed; any ideas on how to approach all of this would be really helpful!
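A rule like "nonsynonymous and not in dbSNP" is just a combined condition on two columns. Here is a minimal Python sketch, assuming a header row and that "." or an empty field means "no annotation"; ANNOVAR usually spells the value `nonsynonymous SNV`, so the check ignores hyphens and case to also accept "non-synonymous". The sample rows and SIFT values are made-up placeholders, not real data.

```python
import csv
import html
import io

MISSING = {"", "."}  # assumption: how ANNOVAR marks an absent annotation


def is_missing(value):
    return value is None or value.strip() in MISSING


def is_novel_nonsynonymous(row):
    """Nonsynonymous in ExonicFunc.refGene and no dbSNP (snp138) entry."""
    # Drop hyphens so both "nonsynonymous SNV" and "non-synonymous" match.
    func = (row.get("ExonicFunc.refGene") or "").replace("-", "").lower()
    return "nonsynonymous" in func and is_missing(row.get("snp138"))


def novel_section(rows):
    """HTML list of novel nonsynonymous variants with their SIFT_score."""
    items = []
    for row in rows:
        if is_novel_nonsynonymous(row):
            gene = html.escape(row.get("Gene.refGene") or "")
            sift = row.get("SIFT_score")
            sift = "NA" if is_missing(sift) else html.escape(sift)
            items.append(f"<li>{gene}: SIFT_score {sift}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical rows covering each case; the values are placeholders, not real data.
SAMPLE = (
    "Gene.refGene\tExonicFunc.refGene\tsnp138\tSIFT_score\n"
    "GENE_A\tnonsynonymous SNV\t.\t0.01\n"
    "GENE_B\tnonsynonymous SNV\trs12345\t0.20\n"
    "GENE_C\tsynonymous SNV\t.\t.\n"
    "GENE_D\tnon-synonymous\t\t.\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t",
                           quoting=csv.QUOTE_NONE, restval=""))
section = novel_section(rows)
print(section)
```

Only GENE_A and GENE_D pass the filter here: GENE_B has a dbSNP ID and GENE_C is synonymous. Each report section can be built from its own predicate function in this way.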

Thank you for reading such a long issue and any help on this problem would be greatly appreciated.
Attached Files

expected_result.png (14.9 KB)

combined_fields.png (13.4 KB)

Annovar_Result.txt (1.2 KB)
Tags: annovar annotation, awk, html, parsing, reporting variants
GenoMax

Senior Member

Join Date: Feb 2008

Posts: 7140
#2

06-19-2015, 04:53 AM

Cross-posted on Biostars: https://www.biostars.org/p/147283/

Since this is not strictly a bioinformatics issue (but a formatting one), you may want to post it on another site such as Stack Exchange.
Anil K

Junior Member

Join Date: Apr 2015

Posts: 2
#3

06-19-2015, 08:44 PM

Well, I am not able to delete posts either here or on Biostars!