  • Abhishek
    Junior Member
    • Feb 2011
    • 3

    Looking for Known Gene Mutation in ELAND export file

    Hello All,

    I am working on NGS data analysis and am relatively new to the field compared with others here.

    I have a question about the ELAND export file (the file that, after further processing, gives me RPKM values for each gene in the cell on which RNA-seq was performed). I want to know whether gene X in that cell carries a known mutation, say D314K, where D and K are amino acids.
    Please give a detailed answer with steps. I am very confused as to how I can get this information from my RNA-seq data.

    Thank you,
    Last edited by Abhishek; 03-24-2012, 01:33 AM.
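A protein-level change such as D314K has to be translated into nucleotide positions before it can be looked up in aligned reads. The sketch below only does the arithmetic for the coding sequence (CDS): it assumes 1-based CDS coordinates counted from the first base of the start codon, and it uses the standard genetic code for aspartate (D) and lysine (K). The function names are illustrative. Mapping the CDS positions onto genomic coordinates additionally needs the gene's exon structure and strand, which is not shown here.

```python
# Standard genetic code entries for the two residues in the example.
CODONS = {
    "D": {"GAT", "GAC"},  # aspartate
    "K": {"AAA", "AAG"},  # lysine
}


def codon_cds_range(residue_number):
    """Return the 1-based (first, last) CDS positions of a residue's codon."""
    first = 3 * (residue_number - 1) + 1
    return first, first + 2


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def min_base_changes(ref_aa, alt_aa):
    """Smallest number of base changes turning a ref codon into an alt codon."""
    return min(hamming(r, a) for r in CODONS[ref_aa] for a in CODONS[alt_aa])


print(codon_cds_range(314))        # (940, 942)
print(min_base_changes("D", "K"))  # 2
```

So residue 314 corresponds to CDS bases 940 to 942, and a D to K change needs at least two base substitutions within that codon (the first and the third base).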
  • severin
    Genome Informatics Facility
    • Sep 2009
    • 105

    #2
    What does an Eland export file look like? Can you cut and paste a sample?


    • Abhishek
      Junior Member
      • Feb 2011
      • 3

      #3
      This is the format of the file:

      The number of fields per line remains a constant 22; any not relevant to a particular read are left blank (the empty string ""). In particular, for a single-read analysis the Read Number, Paired Read Alignment Score and Partner Chromosome/Contig/Offset/Strand fields will all be blank.

      1. Machine (as parsed from run folder name).
      2. Run Number (as parsed from run folder name).
      3. Lane.
      4. Tile.
      5. X Coordinate of cluster.
      6. Y Coordinate of cluster.
      7. Index String (blank for a non-indexed run).
      8. Read Number ("1" or "2" for paired read, blank for a single read).
      9. Read.
      10. Quality String - in symbolic ASCII format (ASCII character code = quality value + 64) by default; set QUALITY_FORMAT --numeric in the GERALD config file to get numeric values instead.
      11. Match Chromosome - name of chromosome match was to OR code indicating why no match was done.
      12. Match Contig (blank if no match found) - gives contig name if there is a match and the match chromosome is split into contigs.
      13. Match Position (always with respect to forward strand, numbering starts at 1).
      14. Match Strand ("F" for forward or "R" for reverse, blank if no match).
      15. Match Descriptor - concise description of alignment. A numeral denotes a run of matching bases, a letter denotes substitution of a nucleotide, so e.g. for a 35 base read, "35" denotes an exact match and "32C2" denotes substitution of a "C" at the 33rd position.
      16. Single Read Alignment Score - alignment score of single read match (if a paired read, gives alignment score of read if it were to be treated as a single read).
      17. Paired Read Alignment Score - alignment score of read pair (alignment score of a paired read and its partner, taken as a pair. Blank for a single read run).
      18. Partner Chromosome - not blank only if read is paired and its partner aligns to another chromosome, in which case it gives the name of the chromosome.
      19. Partner Contig - not blank only if read is paired and its partner aligns to another chromosome and that partner is split into contigs.
      20. Partner Offset - if a paired read's partner hits to the same chromosome (as it will in the vast majority of cases) and contig (if the chromosome is split into contigs) then this number added to Match Position gives the alignment position of the read's partner.
      21. Partner Strand - which strand did the paired read's partner hit to ("F" for forward or "R" for reverse, blank if no match).
      22. Filtering. Did the read pass quality filtering? "Y" for yes, "N" for no.
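The field list above can be turned into a small parser. This is a minimal sketch, not a full reader: it assumes the export file is tab-delimited (so blank fields survive splitting), it uses the default quality encoding described in field 10 (character code minus 64), and it only understands match descriptors made of numerals and single letters as described in field 15. The dictionary keys and function names are made up for illustration.

```python
import re

# Names for the 22 fields, in the order listed above (illustrative keys).
FIELDS = [
    "machine", "run_number", "lane", "tile", "x", "y", "index_string",
    "read_number", "read", "quality", "match_chromosome", "match_contig",
    "match_position", "match_strand", "match_descriptor", "single_score",
    "paired_score", "partner_chromosome", "partner_contig", "partner_offset",
    "partner_strand", "filtering",
]


def parse_export_line(line):
    """Split one export line into a dict keyed by field name."""
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def decode_quality(quality_string):
    """Convert the symbolic quality string to numeric values (code - 64)."""
    return [ord(c) - 64 for c in quality_string]


def substitutions(descriptor):
    """Return (1-based read position, letter) for each substitution.

    Handles only numerals and single letters, as in "32C2".
    """
    found = []
    position = 0
    for run, letter in re.findall(r"(\d+)|([ACGTN])", descriptor):
        if run:
            position += int(run)
        else:
            position += 1
            found.append((position, letter))
    return found


print(substitutions("32C2"))  # [(33, 'C')]
print(substitutions("35"))    # []
```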


      • chadn737
        Senior Member
        • Jan 2009
        • 392

        #4
        There are probably more direct ways, but I would use the export2sam.pl script that comes with samtools to convert the export file to a SAM file. You can then use samtools mpileup or another variant caller such as GATK to find mutations.


        • Abhishek
          Junior Member
          • Feb 2011
          • 3

          #5
          Hi!

          Below are the steps I am following and the error I am getting. Please take a look and suggest a remedy:

          Step 1. export2sam.pl --read1=RNA_export.txt | perl -wpe 's/(chr.*)\.fa/$1/' > aln.sam

          Step 2. Since aln.sam does not have the @SQ header lines, I did the following to get the header and alignments:

          samtools faidx ref.fa (where ref.fa is my reference file)
          samtools view -bt ref.fa.fai aln.sam > aln.bam

          As soon as I enter the second command I get this error:
          [sam_read1] reference 'chr21_random.fa' is recognized as '*'.

          Kindly help.

          Thank you
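The samtools message says that the reference name in a SAM record was not found in the list built from ref.fa.fai, so the read is treated as unmapped. One possible cause, and this is an assumption rather than a diagnosis, is that some records still carry names such as chr21_random.fa while the index lists chr21_random. The sketch below strips a trailing ".fa" from the RNAME and RNEXT columns (columns 3 and 7 of a SAM record) and leaves header lines untouched. The function name is hypothetical.

```python
def strip_fa_suffix(sam_line, suffix=".fa"):
    """Remove a trailing suffix from RNAME (col 3) and RNEXT (col 7).

    Header lines (starting with '@') are returned unchanged.
    """
    line = sam_line.rstrip("\r\n")
    if line.startswith("@"):
        return line
    fields = line.split("\t")
    for i in (2, 6):  # 0-based indexes of RNAME and RNEXT
        if i < len(fields) and fields[i].endswith(suffix):
            fields[i] = fields[i][: -len(suffix)]
    return "\t".join(fields)


record = "r1\t0\tchr21_random.fa\t100\t255\t35M\t*\t0\t0\tACGT\tIIII"
print(strip_fa_suffix(record).split("\t")[2])  # chr21_random
```

Whichever way the names are fixed, it is worth comparing the names in the first column of ref.fa.fai with the third column of aln.sam before running samtools view again.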


          • severin
            Genome Informatics Facility
            • Sep 2009
            • 105

            #6
            solutions

            There are several solutions you can try.

            One has already been posted above.

            You can also convert your output to a BAM file, load it into the Integrative Genomics Viewer (IGV), and then look at the region you are interested in.

            You can also use awk to filter the reads that overlap the location of your mutation and check manually whether that position shows the expected nucleotide change. Field 12 is the contig name and field 13 is the match position; the export file has blank fields, so split on tabs to keep the field numbers stable.

            awk -F'\t' '$12=="contigname" && ($13>startpositionbeforedeletion && $13<startpositionbeforedeletion+lengthofreads)' NameofElandFile | more

            Good luck!
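The same filtering idea can be sketched in Python. The assumptions here: the export file is tab-delimited with 22 fields, the match position (field 13) is the leftmost base on the forward strand as stated in the field description, and the read length is taken from the read sequence in field 9. The function names and the `name_field` parameter are illustrative; it defaults to field 12 to mirror the awk command, and can be set to 11 when the name you want to match is in the Match Chromosome field.

```python
def overlaps(match_position, read_length, site):
    """True if a read starting at match_position (1-based) covers site."""
    return match_position <= site < match_position + read_length


def reads_covering(lines, name, site, name_field=12):
    """Yield the field lists of export records that cover a given site.

    name_field is the 1-based field compared with `name`.
    """
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 22:
            continue  # not a well-formed export record
        if fields[name_field - 1] != name:
            continue
        if not fields[12].isdigit():
            continue  # unmatched reads have no usable position
        if overlaps(int(fields[12]), len(fields[8]), site):
            yield fields


print(overlaps(100, 10, 105))  # True
print(overlaps(95, 10, 105))   # False
```

With a real file, `reads_covering(open("NameofElandFile"), "contigname", site)` would iterate over the matching records so the base at the site can be inspected in each read.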
