Hi,
I hope this question has not already been posted here; at least I haven't found it. I would like to generate a kind of replicate dataset from an existing 454 Junior NGS dataset, based on known error rates. The problem is that my reads come from a project that aims to analyse the B-cell repertoire, which means I have no real reference. (For those who are not familiar, in short: the reads essentially represent a combination of different genes that are mixed during an infection and additionally affected by mutations, so neither an alignment nor the generation of synthetic reads from a reference genome helps me.)
I would like to do something quite simple:
- take the reads I have
- build some kind of error model with the respective error rates
- generate a new NGS dataset that represents the original one, but modified according to those error rates.
Is there anything out there that can do this already? I had a look at ART, but I don't think it helps me here.
Alternatively, I could implement something myself... but then I was wondering how to practically apply, for example, an indel error rate of 0.38/100 bp. I have reads that are ~220 bp long --> error rate ≈ 0.84/read
I am wondering what I can do with this number... is it valid to say:
0.84 indel errors / read --> 84 indel error events / 100 reads
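In expectation, at least, this scaling should hold: with independent errors, the expected count is simply rate × length × number of reads. A quick sanity check with the figures above:

```python
indel_rate = 0.38 / 100   # indels per bp (454 Junior figure from above)
read_len   = 220          # approximate read length
n_reads    = 20_000       # approximate dataset size

per_read = indel_rate * read_len    # ~0.836 indels per read
total    = per_read * n_reads       # ~16,720 indel events in total
print(per_read, total)
```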
My dataset has ~20,000 reads, so I would have to perform ~16,700 indel events. I could select reads at random, check each for homopolymers (if there is more than one homopolymer, pick one at random), and then insert or delete, say, between 2 and 4 nt. Substitution errors I would also add randomly, at a very low rate. A rough sketch of what I have in mind is below.
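Here is a minimal Python sketch of that procedure. The homopolymer threshold (runs of ≥3), the 50/50 insertion/deletion split, and the substitution rate are all placeholder assumptions that would have to be replaced with measured 454 values:

```python
import random

BASES = "ACGT"
INDEL_RATE_PER_BP = 0.38 / 100   # indel rate from above
SUBST_RATE_PER_BP = 0.01 / 100   # assumed low substitution rate (placeholder)

def homopolymer_starts(read, min_run=3):
    """Return start indices of homopolymer runs (>= min_run identical bases)."""
    starts, i = [], 0
    while i < len(read):
        j = i
        while j < len(read) and read[j] == read[i]:
            j += 1
        if j - i >= min_run:
            starts.append(i)
        i = j
    return starts

def inject_indel(read):
    """Insert or delete 2-4 nt, preferring a homopolymer site if one exists."""
    sites = homopolymer_starts(read)
    pos = random.choice(sites) if sites else random.randrange(len(read))
    n = random.randint(2, 4)
    if random.random() < 0.5:
        # insertion: extend the base at pos (mimics a 454 homopolymer overcall)
        return read[:pos] + read[pos] * n + read[pos:]
    # deletion: drop n bases starting at pos (undercall)
    return read[:pos] + read[pos + n:]

def make_replicate(reads):
    """Return a copy of the dataset with indels and substitutions injected."""
    reads = list(reads)
    n_indels = round(INDEL_RATE_PER_BP * sum(len(r) for r in reads))  # ~16,700
    for _ in range(n_indels):
        i = random.randrange(len(reads))          # pick a read at random
        reads[i] = inject_indel(reads[i])
    # substitutions: independent per-base coin flip at the low rate
    for i, r in enumerate(reads):
        reads[i] = "".join(
            random.choice(BASES.replace(b, "")) if random.random() < SUBST_RATE_PER_BP
            else b
            for b in r
        )
    return reads
```

Instead of distributing a fixed total number of events, one could also draw each read's indel count from a Poisson distribution with mean rate × read length; for a rough robustness check the difference should be negligible.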
My aim is to get a rough idea of the robustness of my analysis (and to convince the biologist that a technical replicate or a control sample might make sense for his particular research question). I know this is not 100% correct, since my data already contain errors.
I would be happy about any suggestions.
Thanks in advance.