
  • Problem: "Paired-end" reads in scRNA-seq data (Drop-seq)

    Drop-seq produces paired-end data:

    Read 1: Cell barcode + UMI (unique molecular identifier)

    Read 2: The transcript information
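To make the read layout concrete, here is a minimal sketch of splitting read 1 into its two parts. The lengths (12 bases of cell barcode followed by 8 bases of UMI) are an assumption based on the usual Drop-seq design and are not stated in this thread; the function name and the example sequence are hypothetical.

```python
# Assumed Drop-seq read 1 layout: bases 1-12 cell barcode, bases 13-20 UMI.
CELL_BARCODE_LEN = 12  # assumption, not stated in this thread
UMI_LEN = 8            # assumption, not stated in this thread


def split_read1(seq):
    """Return (cell_barcode, umi) taken from the start of a read 1 sequence."""
    needed = CELL_BARCODE_LEN + UMI_LEN
    if len(seq) < needed:
        raise ValueError(f"read 1 is shorter than {needed} bases")
    return seq[:CELL_BARCODE_LEN], seq[CELL_BARCODE_LEN:needed]


# Made-up 20-base read 1, for illustration only
cell_barcode, umi = split_read1("ACGTACGTACGTTTGGCCAA")
print(cell_barcode, umi)
```

Read 2 is left untouched by this step; it is the read that gets aligned to the genome.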

    But I have a doubt about the sample I am working on.

    The sample I am using is the following:


    (Check the "Reads" tab)

    Since Drop-seq is paired-end, we expect to see two reads per spot. Although this sample is labeled paired-end, it has only one read per spot.

    For comparison, here is a link to a different scRNA-seq dataset in which each spot does have two reads.

    Example sample:

    https://trace.ncbi.nlm.nih.gov/Trace...run=SRR8086553 (Check the "Reads" tab)

    Where am I going wrong?


    I asked one of the main authors of the paper and got the following reply:

    "I recommend that you download the aligned BAM files that are hosted in the same GEO record. Read 1 is already processed into the cell and UMI barcodes and held as custom tags (XC and XM) in the BAM files. The cells are already barcode-corrected, so if you use those files, your cell barcodes will line up with mine; if you start from FASTQs, they will not. For most aligners, you can just use the BAM file as input to realign. (It has all reads, even those that did not align.)"

    But I could not find any "XC" or "XM" tags in the BAM file.
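One way to check this systematically is to tally the optional tags that actually occur in the records. The sketch below works on SAM text lines (for example, the output of `samtools view -h`) and assumes tab-separated fields as in the SAM format. The helper names and the two example records are hypothetical; the XC and XM values in the tagged record are made up.

```python
from collections import Counter


def optional_tags(sam_line):
    """Return {tag: value} for the optional fields (column 12 onward) of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        # Optional fields have the form TAG:TYPE:VALUE; VALUE may itself contain ':'
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def tag_counts(sam_lines):
    """Count how many alignment records carry each optional tag, skipping header lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        counts.update(optional_tags(line).keys())
    return counts


# Placeholder sequence and quality strings, for illustration only
core = ["7", "0", "1", "110", "1", "4M", "*", "0", "0", "ACGT", "EEEE"]
untagged = "\t".join(core + ["RG:Z:A", "NH:i:1", "NM:i:0"])
tagged = "\t".join(core + ["XC:Z:ACGTACGTACGT", "XM:Z:TTGGCCAA"])  # made-up tag values
header = "@HD\tVN:1.4\tSO:coordinate"

counts = tag_counts([header, untagged, tagged])
print(sorted(counts.items()))
```

If XC and XM never appear in the tally for the whole file, the barcode tags are not in that BAM.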

    To understand his reply, you have to be familiar with the Drop-seq processing steps: https://github.com/broadinstitute/Dr...1.2Jan2016.pdf

    It looks like they submitted partially processed data. I am trying to use the data starting from different processing steps, but I could not figure out how far it has already been processed.

  • #2
    This type of processing (parsing the barcode and UMI) is standard for scRNA-Seq data. Would you post the header and first 10 lines of the BAM file? That would help us to troubleshoot your problem.
    Last edited by HESmith; 11-13-2018, 05:21 AM.



    • #3
      Header of the SAM file

      The first 10 lines:
      @HD VN:1.4 SO:coordinate
      @SQ SN:1 LN:58871917 M5:4ec834d5c957b0204ffb37ac619ac286 UR:file:/ahg/regevdata/projects/vertebrate_sc/dstools/metadata/dr82/dr82spike.fasta
      @SQ SN:10 LN:45574255 M5:07b063dca6221fc12a0c7af99a693a0a UR:file:/ahg/regevdata/projects/vertebrate_sc/dstools/metadata/dr82/dr82spike.fasta
      @SQ SN:11 LN:45107271 M5:34028488116d0ce140a9651e56b3361f UR:file:/ahg/regevdata/projects/vertebrate_sc/dstools/metadata/dr82/dr82spike.fasta
      @SQ SN:12 LN:49229541 M5:b93ef3975b8f9e3e291bf14fa725a87b UR:file:/ahg/regevdata/projects/vertebrate_sc/dstools/metadata/dr82/dr82spike.fasta
      @SQ SN:13 LN:51780250 M5:16c8bde090ec09d34d473ee462e266f8 UR:file:/ahg/regevdata/projects/vertebrate_sc/dstools/metadata/dr82/dr82spike.fasta
      @SQ SN:14 LN:51944548 M5:d3684e66d05aeeddfef5a365ed1d44ff UR:file:/ahg/regevdata/projects/vertebrate_sc/dstools/metadata/dr82/dr82spike.fasta
      @SQ SN:15 LN:47771147 M5:20a0e5e9ea8953e48ce8e93117a406a4 UR:file:/ahg/regevdata/projects/vertebrate_sc/dstools/metadata/dr82/dr82spike.fasta
      @SQ SN:16 LN:55381981 M5:85d5826023b6bde850fea5e42b0d22b5 UR:file:/ahg/regevdata/projects/vertebrate_sc/dstools/metadata/dr82/dr82spike.fasta
      @SQ SN:17 LN:53345113 M5:128b86b035cfaa4c62018d7bc2978024 UR:file:/ahg/regevdata/projects/vertebrate_sc/dstools/metadata/dr82/dr82spike.fasta


      Further down in the header (this suggests that the reads have already been aligned to the reference genome with Bowtie2):

      @SQ SN:KN150247.1 LN:728 M5:35c483ee725789db7c67a0acd4ec7cb7 UR:file:/ahg/regevdata/projects/vertebrate_sc/dstools/metadata/dr82/dr82spike.fasta
      @SQ SN:KN150525.1 LN:650 M5:66c855181f1c9a7e22d6061d7c43964b UR:file:/ahg/regevdata/projects/vertebrate_sc/dstools/metadata/dr82/dr82spike.fasta
      @SQ SN:mCherry LN:1198 M5:05f1786feb0995593cbbb0f2e0822bfb UR:file:/ahg/regevdata/projects/vertebrate_sc/dstools/metadata/dr82/dr82spike.fasta
      @SQ SN:GFP LN:1699 M5:6d8fabfb60ba8de2f53ced3d571125a3 UR:file:/ahg/regevdata/projects/vertebrate_sc/dstools/metadata/dr82/dr82spike.fasta
      @RG ID:A SM:ZF6S-DS5b_S3
      @RG ID:A-2F1E5D7C SM:ZF6S-DS5b_S3
      @PG ID:bowtie2 PN:bowtie2 VN:2.2.1 CL:"/broad/software/free/Linux/redhat_6_x86_64/pkgs/bowtie2_2.2.1/bowtie2-align-s --wrapper basic-0 --phred33 --reorder -p 8 -x /ahg/regevdata/projects/vertebrate_sc/dstools/metadata/dr82/dr82 -S /broad/hptmp/jfarrell_dropseq/ZF6S-DS5b_S3//ZF6S-DS5b_S3.5.aligned.sam -U /broad/hptmp/jfarrell_dropseq/ZF6S-DS5b_S3//ZF6S-DS5b_S3.4.aligner.fastq"
      @PG ID:bowtie2-7C029FB6 PN:bowtie2 VN:2.2.1 CL:"/broad/software/free/Linux/redhat_6_x86_64/pkgs/bowtie2_2.2.1/bowtie2-align-s --wrapper basic-0 --phred33 --reorder -p 8 -x /ahg/regevdata/projects/vertebrate_sc/dstools/metadata/dr82/dr82 -S /broad/hptmp/jfarrell_dropseq/ZF6S-DS5b_S3//ZF6S-DS5b_S3.5.aligned.sam -U /broad/hptmp/jfarrell_dropseq/ZF6S-DS5b_S3//ZF6S-DS5b_S3.4.aligner.fastq"
      1 16 1 85 11 35M1D27M * 0 0 ACAACATACGACCTCTAAAAAAGGTGCTGTAACATTACCTATATGCAGCACCACTATATGAG E/EE/EEEAEE/EA/<EEEEEEA/<</EE/</AAEEEEEEEEE/E6E6/E///EEEEAAAAA RG:Z:A NH:i:1 NM:i:1
      2 16 1 85 15 35M1D27M * 0 0 ACAACATACGACCTCTAAAAAAGGTGCTGTAACATTACCTATATGCAGCACCACTATATGAG A/EE/EE/EEEE/E/EA66E/AA6AA</EEEEEE/EE//E/AA/EEEEAEE/E/EEE//AAA RG:Z:A NH:i:1 NM:i:1
      3 16 1 99 6 62M * 0 0 CTAAAAAAGGTGCTGTAACATGTACCTATATGCAGCACCACTATATGAGAGCAGCATAGCAG EEE6A//E<</EA/E/EAEEAEAEAEEEE/<EEEEEEEAEE/EAEEEEEEEEEEEEAAA/AA RG:Z:A NH:i:1 NM:i:1
      4 16 1 99 6 62M * 0 0 CTAAAAAAGGTGCTGTAACATGTACCTATATGCAGCACCACTATATGAGAGCAGCATAGCAG 6/EEEE/EEEA/AAEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEAAAAA RG:Z:A NH:i:1 NM:i:1
      5 16 1 99 6 62M * 0 0 CTAAAAAAGGTGCTGTAACATGTACCTATATGCAGCACCACTATATGAGAGCAGCATAGCAG <A/EEEEA/EEEEEE/EEAEEEEE/AEEEEEEEEEEEAEAEEEAEEEEEEEEEEEEEAAAAA RG:Z:A NH:i:1 NM:i:1
      6 0 1 108 1 55M * 0 0 GTGCTGTAACATGTACCTATATGCAGCACCACTATATGAGAGCGGCATAGCAGTG AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/AEEAEAEEAEE RG:Z:A-2F1E5D7C NH:i:1 NM:i:0
      7 0 1 110 1 62M * 0 0 GCTGTAACATGTACCTATATGCAGCACCACTATATGAGAGCGGCATAGCAGTGTTTAGTCAC AAAAAEEEEE/EEEEEEEAEEEEEEEEEEEEEEEEEEE/EE/EEEEEAEAAEEEAEEAEEA< RG:Z:A NH:i:1 NM:i:0
      8 16 1 185 35 62M * 0 0 TTATATTAACTTGAAAGTGTGTTTTAGCTATTGAGTTTAAACAAAGGGAGCGGTTTACATTG AEEEEEEEAAEAEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA RG:Z:A NH:i:1 NM:i:0
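The second column of each record above is the SAM FLAG, and it shows whether a read was aligned as part of a pair. Below is a minimal sketch of decoding it, using the bit meanings from the SAM format; the helper names are hypothetical. The FLAG values in this excerpt are 0 and 16, and the `@PG` lines show bowtie2 being run with `-U`, its option for unpaired input, which would be consistent with a single read per spot having been aligned.

```python
# FLAG bit meanings as defined by the SAM format
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def is_paired(flag):
    """True if the read was flagged as part of a pair (bit 0x1)."""
    return bool(flag & 0x1)


# The two FLAG values that occur in the records posted above
for flag in (0, 16):
    print(flag, decode_flag(flag), is_paired(flag))
```

Neither value has the paired bit set: 0 is an unpaired read on the forward strand and 16 is an unpaired read on the reverse strand.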
