  • Variant Calling from paired end RNAseq data

    Hello everybody, since everybody is getting on the variant-calling-from-RNAseq bandwagon, my P.I. wants to get in on the party as well. Too bad for overworked people like me. So I have this huge dataset of Illumina paired-end RNAseq, which I am trying to run through GATK. I am taking the accepted_hits.bam file from TopHat/Cufflinks as the input file for GATK. My reference for the TopHat/Cufflinks pipeline was the UCSC annotated genes (knownGenes.gtf) and the UCSC hg19 reference.

    My pipeline for GATK is:
    convert bam to sam > sort sam > insert read groups > fix mates using Picard > sam to bam > remove duplicates > reindex bam > realign indels

    When I run the indel realignment in GATK, I get the following error: contig chr 1 missing from reference. So I looked it up and used the ReorderSam option in Picard tools. I modified my pipeline as follows:

    convert bam to sam > sort sam > insert read groups > fix mates using Picard > CreateSequenceDictionary.jar for the hg19 reference using Picard > reorder sam > sam to bam > remove duplicates > reindex bam > realign indels

    My error is not solved even after reordering the SAM file, and I still get the same error: "chr1 contig not found in your reference".

    Is it something to do with the references I have been using? I have used the hg19 reference for both my TopHat/Cufflinks and GATK pipelines.

    Thanks a ton in advance!

  • #2
    How are you making your .sam? I'm pretty sure that bwa sampe will add read group info with the -r option.

    And I think Picard would add them to a .bam. I'm pretty sure you do NOT have to expand your .bam to a .sam in order to do that.
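As a rough sketch of adding read groups directly to the .bam, the Python helper below only builds the command line for Picard's AddOrReplaceReadGroups tool. The helper name, the `picard.jar` path, and every read group value (`RGID=1`, `RGLB=lib1`, `RGPU=unit1`) are placeholders assumed for illustration, not values from this thread. Older Picard releases shipped each tool as its own jar (e.g. AddOrReplaceReadGroups.jar), so the invocation may need adjusting for your install.

```python
import subprocess


def add_read_groups_cmd(in_bam, out_bam, sample, picard_jar="picard.jar"):
    """Build a Picard AddOrReplaceReadGroups command that works on a BAM directly.

    All read group values below are placeholders; replace them with values
    that describe your own library and sequencing run.
    """
    return [
        "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
        f"I={in_bam}",
        f"O={out_bam}",
        "RGID=1",          # placeholder read group ID
        "RGLB=lib1",       # placeholder library name
        "RGPL=illumina",   # sequencing platform
        "RGPU=unit1",      # placeholder platform unit
        f"RGSM={sample}",  # sample name
    ]


def run(cmd):
    """Run the command and raise if the tool exits with an error."""
    subprocess.run(cmd, check=True)
```

Calling `run(add_read_groups_cmd("accepted_hits.bam", "accepted_hits.rg.bam", "sample1"))` would then skip the BAM to SAM round trip entirely; the output file name and sample name here are made up.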

    And you double-checked to make sure that the name of chr1 is exactly the same between your .bam and your reference genome?
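One way to make that check concrete is sketched below in Python. It assumes you have saved the BAM header as text (for example from `samtools view -H accepted_hits.bam`) and have the `.fai` index that `samtools faidx` writes next to the reference FASTA. The function names are made up for this example, and it only compares contig names and their order, not sequence lengths.

```python
def sam_header_contigs(header_text):
    """Return contig names from the @SQ lines of a SAM/BAM header, in order."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs; SN holds the name.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def fai_contigs(fai_text):
    """Return contig names from a .fai index (first tab-separated column)."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def compare_contigs(bam_names, ref_names):
    """Report names present on only one side and whether the order matches."""
    ref_set = set(ref_names)
    bam_set = set(bam_names)
    return {
        "missing_from_reference": [n for n in bam_names if n not in ref_set],
        "missing_from_bam": [n for n in ref_names if n not in bam_set],
        "same_order": bam_names == ref_names,
    }
```

If `missing_from_reference` lists names like `chr1` while `missing_from_bam` lists `1`, the two files use different naming schemes, and reordering alone will not resolve that.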

    • #3
      Hey, thanks for the quick reply. Like I said, I am not redoing any alignments here. I have already run the TopHat/Cufflinks pipeline on my data to assemble it into known transcripts, using UCSC genes and hg19 as a reference. So I already have an "accepted_hits.bam" as output from the TopHat runs. I am using this accepted_hits.bam as my alignment file for GATK and converting it to a SAM file using Picard tools. I am cutting down on my analysis time by not redoing the alignments.

      To your query about whether I checked the reference and my SAM file for chr1: yes, I did. And this is what is troubling me. I am using the same reference for both my RNAseq and variant calling, so logically there shouldn't be any discrepancies. I was wondering if anybody has faced the same issue when using GATK for variant calling in RNAseq? I am pretty new to this, so I might be messing up somewhere in the pipeline.
