  • 1000 genomes - a beginner question

    Hello,

    First, thank you for your time!

    I work on yeast NGS. I would like to run an analysis based on the 1000 Genomes data.
    What kind of computer/facility do I need?

    I would like to analyze reads that map to a specific 20 kb genomic region. I want to find all read pairs in which at least one mate maps to this 20 kb region, and then be able to reassemble them.
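    A minimal sketch of the selection step, assuming the alignments are already available as SAM text (for example the output of `samtools view` on a BAM file). It keeps a read name if either the read or its mate starts inside the region, using the FLAG bits 0x4 (read unmapped) and 0x8 (mate unmapped) and the RNEXT/PNEXT mate columns. The function name and the coordinates in any call are illustrative assumptions, and only start positions are checked, not full alignment spans.

```python
from typing import Iterable, Set


def pairs_touching_region(sam_lines: Iterable[str], chrom: str,
                          start: int, end: int) -> Set[str]:
    """Collect read names (QNAME) where the read or its mate starts in [start, end].

    Only the 1-based leftmost positions (POS, PNEXT) are checked, so a read
    that starts just before the region and runs into it is not caught.
    """
    names: Set[str] = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        rname, pos = fields[2], int(fields[3])
        rnext, pnext = fields[6], int(fields[7])
        # RNEXT "=" means the mate is on the same reference sequence as this read
        mate_chrom = rname if rnext == "=" else rnext
        read_in = (flag & 0x4) == 0 and rname == chrom and start <= pos <= end
        mate_in = (flag & 0x8) == 0 and mate_chrom == chrom and start <= pnext <= end
        if read_in or mate_in:
            names.add(qname)
    return names
```

    Both mates share a QNAME, so the returned set can be used to pull both reads of each pair from the same SAM stream before handing them to an assembler.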

    Also, if someone can point me to a practical guide to working with the 1000 Genomes data (what you need to have before you start), that would be great.

    Thanks!
    EHC

  • #2
    The sort of compute facility you need very much depends on what you want to do.

    Getting the variants that have been discovered in a particular 20 kb region is relatively easy and is doable with limited disc space (say 500 GB).
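    As a rough sketch of what "getting the variants in a region" means, the function below keeps the VCF data lines whose CHROM and POS fall inside a window. It assumes plain-text VCF lines; if the VCF is bgzip-compressed with a tabix index (.tbi), a query such as `tabix file.vcf.gz 20:1000000-1020000` returns the same records without scanning the whole file. The function name and region are illustrative.

```python
from typing import Iterable, List


def variants_in_region(vcf_lines: Iterable[str], chrom: str,
                       start: int, end: int) -> List[str]:
    """Return VCF data lines whose CHROM matches and whose 1-based POS is in [start, end]."""
    hits: List[str] = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # meta-information and column header lines
        fields = line.split("\t", 2)  # only CHROM and POS are needed here
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            hits.append(line.rstrip("\n"))
    return hits
```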

    Getting the alignments from the BAM files for such a region is going to take a lot longer, as you have to use samtools to download your particular 20 kb region over FTP for more than 1000 individuals. This is going to cost both time and disk; I would want to have at least 5 TB available for your analysis.

    If you also want all the unmapped reads for the 1094 low-coverage individuals, you will need 5 TB of disc for those files alone, before you even start extracting the subsections for your 20 kb region of the genome.
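    Taking the figures above at face value (5 TB of unmapped-read files for 1094 individuals), a back-of-envelope calculation gives roughly 4.6 GB per individual, which makes it easier to scale the estimate if you only need a subset of samples. A minimal sketch, assuming decimal units (1 TB = 1000 GB); the helper names are illustrative.

```python
def per_sample_gb(total_tb: float, n_samples: int) -> float:
    """Average size per sample in GB, using decimal units (1 TB = 1000 GB)."""
    return total_tb * 1000 / n_samples


def total_tb(gb_per_sample: float, n_samples: int) -> float:
    """Total size in TB for n_samples files of gb_per_sample GB each."""
    return gb_per_sample * n_samples / 1000


if __name__ == "__main__":
    avg = per_sample_gb(5, 1094)  # 5 TB of unmapped-read files over 1094 individuals
    print(f"{avg:.2f} GB per individual")
    print(f"{total_tb(avg, 100):.2f} TB for 100 individuals")
```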

    Then, depending on what sort of assembly tools you wish to use, you may need a combination of a lot of compute nodes and at least one machine with a lot of memory.

    Can you give us details of what you are looking for in these assemblies?



    • #3
      Thank you, Laura. I am interested in reads that do not behave as expected (mainly broken reads). Is there an easy way to get these for a specific region?
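      "Broken reads" is not a standard SAM term, so this sketch assumes it covers unmapped reads, pairs with an unmapped mate, pairs not flagged as properly paired, mates on another chromosome, and soft-clipped (possibly split) alignments. It labels one SAM record using the FLAG bits from the SAM specification (0x1 paired, 0x2 proper pair, 0x4 unmapped, 0x8 mate unmapped) and the CIGAR string; the labels and function name are illustrative.

```python
import re
from typing import List

SOFT_CLIP = re.compile(r"\d+S")  # a soft-clipped segment in a CIGAR string


def anomaly_labels(sam_line: str) -> List[str]:
    """Label a SAM record with the (assumed) kinds of 'broken' alignment it shows."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, cigar, rnext = int(fields[1]), fields[2], fields[5], fields[6]
    paired = (flag & 0x1) != 0
    both_mapped = (flag & 0xC) == 0  # neither 0x4 (read) nor 0x8 (mate) is set
    labels: List[str] = []
    if flag & 0x4:
        labels.append("read_unmapped")
    if paired and (flag & 0x8):
        labels.append("mate_unmapped")
    if paired and both_mapped and (flag & 0x2) == 0:
        labels.append("improper_pair")
    if paired and both_mapped and rnext not in ("=", "*", rname):
        labels.append("mate_other_chrom")
    if SOFT_CLIP.search(cigar):
        labels.append("soft_clipped")
    return labels
```

      Records with a non-empty label list are candidates. Some of this can also be done directly with samtools flag filters, e.g. `samtools view -f 8` (keep reads whose mate is unmapped) or `samtools view -F 2` (drop properly paired reads) on a region.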



      • #4
        Our FAQ describes how to get subsections of BAM files: http://www.1000genomes.org/faq/how-d...ction-bam-file
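        Since the same region has to be fetched for more than 1000 individuals, it can help to generate the commands programmatically. A minimal sketch, assuming you already have a mapping from sample ID to remote BAM URL (the URLs, sample names and `.region.bam` output naming are hypothetical) and a samtools build that can read remote files and their indexes; it builds one `samtools view -b -o <out> <bam> <region>` command per sample without running anything.

```python
import shlex
from typing import Dict, List


def region_commands(bam_urls: Dict[str, str], region: str) -> List[str]:
    """Build one `samtools view` command per sample that saves only `region` as BAM."""
    commands: List[str] = []
    for sample, url in sorted(bam_urls.items()):
        out_bam = f"{sample}.region.bam"  # hypothetical output naming scheme
        argv = ["samtools", "view", "-b", "-o", out_bam, url, region]
        commands.append(" ".join(shlex.quote(arg) for arg in argv))
    return commands
```

        Writing the commands to a file keeps the download step separate from scheduling, so they can be run serially, in a shell loop, or spread over compute nodes.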
