  • TCGA cancer data and bioinformatics design questions for SNP/miRNA analysis

    I'm looking for help either developing a pipeline or choosing the right tools and data sets for the goal described below.

    My languages of choice are Python and R.

    Goal: I'd like to create a disease-specific profile of SNPs, including SNPs in miRNAs and in miRNA target sites. Ideally I would get the chromosome and location of each SNP, and see how these SNPs interact to form a disease profile.

    PART 1: TCGA
    My first problem is using TCGA data, which lists a large number of aberrant mutations in LOH .txt files. I'd like to map those mutations to SNPs, genes, or miRNAs (whatever entities they belong to). The TCGA datasheet is here. Example data is here for breast cancer. I guess I can also use the miRNA and mRNA data from there.

    Questions here:
    1. How do I decipher the LOH data to figure out whether it's meaningful and where it maps?
    2. Which tools should I use for mapping, and what format should the final data be in? FASTA?
    3. miRNA/targets and SNPs: next up is getting cancer-specific miRNAs and mRNAs and mapping SNPs to them. I'm assuming I would use dbSNP or the Sanger miRNA databases to get miRNAs, targets, and seed sequences.
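
To make the mapping step in question 3 concrete, here is a minimal Python sketch of the kind of lookup I have in mind: given features (miRNAs, seed regions, target sites) as genomic intervals, report which features each SNP position falls in. The feature names, SNP IDs, and coordinates are made up for illustration, and the helper names `build_index` and `features_at` are hypothetical; real coordinates would have to come from an annotation source and use the same genome build as the SNPs.

```python
from collections import defaultdict

def build_index(features):
    """Group (chrom, start, end, name) features by chromosome, sorted by start."""
    index = defaultdict(list)
    for chrom, start, end, name in features:
        index[chrom].append((start, end, name))
    for chrom in index:
        index[chrom].sort()
    return index

def features_at(index, chrom, pos):
    """Return names of features on chrom whose closed interval [start, end] contains pos."""
    # Linear scan per chromosome; fine for a sketch, too slow for genome-scale data.
    return [name for start, end, name in index.get(chrom, []) if start <= pos <= end]

# Toy data only: these coordinates and names are invented, not real annotations.
features = [
    ("chr1", 100, 180, "mirna_A"),
    ("chr1", 150, 160, "mirna_A_seed"),
    ("chr2", 500, 520, "target_site_B"),
]
snps = [("snp1", "chr1", 155), ("snp2", "chr2", 510), ("snp3", "chr2", 900)]

index = build_index(features)
hits = {snp_id: features_at(index, chrom, pos) for snp_id, chrom, pos in snps}
for snp_id, names in hits.items():
    print(snp_id, names)
```

For real data sizes an interval tree or a dedicated interval tool would presumably replace the linear scan; the sketch only shows the shape of the input and output I'm after.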


    PART 2:
    I'm a bit lost as to how to combine all these pieces of information, what output formats to use (linked to the individual pieces), and which tools, if any, to use to gather all this data in Python. I think this tool, mirdsnp, is useful as well.

    Any help would be appreciated on how to combine all this data, on best practices for mapping SNPs and miRNAs, and on any Biopython/Bioconductor tools or approaches. I'm having trouble knowing where to start, how to parse LOH files to get meaningful data out, and how to combine that with the other tools.
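
As a starting point for the parsing step, this is roughly what I would try, assuming the LOH .txt files are tab-delimited with a header row. The column names `sample`, `chrom`, `start`, and `end` are placeholders I made up (I have not confirmed the actual TCGA column layout), and `read_segments` is a hypothetical helper name.

```python
import csv
import io

def read_segments(handle, chrom_col="chrom", start_col="start", end_col="end"):
    """Read a tab-delimited file with a header row into (chrom, start, end) tuples.

    The default column names are assumptions; pass the real ones from the file header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    segments = []
    for row in reader:
        segments.append((row[chrom_col], int(row[start_col]), int(row[end_col])))
    return segments

# Invented example content standing in for a real LOH file.
example = "sample\tchrom\tstart\tend\nS1\tchr1\t100\t200\nS1\tchr2\t50\t75\n"
segments = read_segments(io.StringIO(example))
print(segments)
```

With a real file, `io.StringIO(example)` would be replaced by `open(path, newline="")`, and the resulting segments could then be intersected with SNP and miRNA coordinates.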

    I'm doing this in an exploratory way so that I can use the information to understand and design experiments later on.
