  • Generating "artificial" technical replicates of NGS data in Immunogenetics

    Hi,

    I hope this question has not already been posted here without my finding it. I would like to generate a kind of replicate dataset from an existing 454 Junior NGS dataset, based on known error rates. The problem is that my reads come from a project that aims to analyse the B-cell repertoire, which means I have no real reference. (For those who are not familiar with this, in short: the reads represent a combination of different genes which are mixed during an infection and additionally affected by mutations, so neither an alignment nor the generation of synthetic reads from the reference genome helps me.)

    I would like to do something quite simple:
    - take the reads I have
    - build some kind of error model, including the respective error rates
    - generate a new NGS dataset that represents the original one, but modified according to those error rates.

    Is there anything out there that can do this already? I had a look at Art, but I guess it does not help me.

    Alternatively, I could implement something myself... But then I wonder how, in practice, to apply for example an indel error rate of 0.38/100 bp. My reads are ~220 bp long --> error rate = 0.84/read

    I am wondering what I can do with this number... Is it valid to say:
    0.84 indel_error / read --> 84 indel_error_events / 100 reads
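    To make the arithmetic explicit, here is a minimal sketch using only the numbers from this post (0.38 indels per 100 bp, ~220 bp reads, ~20,000 reads). The function name is made up for illustration. Note that the result is an average: it gives the expected number of events, so individual reads may carry 0, 1, 2 or more indels.

```python
def expected_indel_events(rate_per_100bp, read_length, n_reads):
    """Return (expected indel events per read, expected events in the dataset).

    Hypothetical helper: it only scales a per-100-bp rate to the read length
    and then to the number of reads.
    """
    per_read = rate_per_100bp / 100.0 * read_length
    return per_read, per_read * n_reads


# Values taken from the post: 0.38 indels / 100 bp, ~220 bp reads, ~20,000 reads.
per_read, total = expected_indel_events(0.38, 220, 20000)
print(f"{per_read:.3f} indel events per read, about {total:.0f} in total")
```

    So 0.84 events per read is the same statement as roughly 84 events per 100 reads, as long as it is read as an expectation and not as "84 of every 100 reads get exactly one indel".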

    My dataset has ~20,000 reads, so I would have to perform ~16,700 indel events. I could select reads randomly, check them for homopolymers (if a read has more than one homopolymer, pick one at random) and then insert or delete, let's say, between 2 and 4 nt. Substitution errors I would also add randomly, at a very low rate.
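    The procedure described above could be sketched roughly as follows. This is only an illustration under several assumptions that are not part of the plan itself: instead of drawing a fixed total number of events and picking reads at random, it draws the number of indels per read from a Poisson distribution whose mean scales with read length; insertions and deletions are taken as equally likely; a deletion never removes a whole homopolymer run; and the substitution rate of 0.001 per base is just a placeholder for "very low". All function names are made up.

```python
import math
import random
import re

# A homopolymer here is any run of two or more identical bases.
HOMOPOLYMER = re.compile(r"([ACGT])\1+")


def poisson(rng, lam):
    """Draw a Poisson variate with Knuth's method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def add_indel(read, rng, min_len=2, max_len=4):
    """Lengthen or shorten one randomly chosen homopolymer run by 2-4 nt."""
    runs = list(HOMOPOLYMER.finditer(read))
    if not runs:
        return read  # no homopolymer: leave the read unchanged
    run = rng.choice(runs)
    base = read[run.start()]
    n = rng.randint(min_len, max_len)
    if rng.random() < 0.5:  # assumption: insertion and deletion equally likely
        return read[:run.start()] + base * n + read[run.start():]
    # Assumption: keep at least one base of the run when deleting.
    n = min(n, run.end() - run.start() - 1)
    return read[:run.start()] + read[run.start() + n:]


def add_substitutions(read, rng, rate):
    """Replace each base with a different one with probability `rate`."""
    out = []
    for b in read:
        if b in "ACGT" and rng.random() < rate:
            out.append(rng.choice([x for x in "ACGT" if x != b]))
        else:
            out.append(b)
    return "".join(out)


def make_replicate(reads, indel_rate_per_100bp=0.38, sub_rate=0.001, seed=0):
    """Return a perturbed copy of `reads` (a list of sequence strings)."""
    rng = random.Random(seed)
    replicate = []
    for read in reads:
        n_events = poisson(rng, indel_rate_per_100bp / 100.0 * len(read))
        for _ in range(n_events):
            read = add_indel(read, rng)
        replicate.append(add_substitutions(read, rng, sub_rate))
    return replicate


# Toy usage with made-up sequences; real reads would come from the 454 files.
toy_reads = ["ACGGTTTACCAGTTAAACGT" * 11 for _ in range(5)]
for original, perturbed in zip(toy_reads, make_replicate(toy_reads, seed=42)):
    print(len(original), "->", len(perturbed))
```

    Running this with several different seeds gives several "replicates", and the downstream analysis can then be compared across them.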

    My aim is to get a rough idea of the robustness of my analysis (and to convince the biologist that a technical replicate or a control sample might make sense for his particular research question). I know that this is not 100% correct, since my data already include errors.

    I would be happy for any suggestion.
    Thanks in advance.
