  • Virus integration sites

    Hi all,
    I have run an RNA-seq experiment with the following features:

    - Samples from Mus musculus
    - ~20 million single-end reads, 75 bp long (good quality)
    - Technology: Illumina 1.9

    Now I need to find the sites where the virus has integrated into the
    genome of our mouse model. The viral genome sequence is known. I have tried several procedures to locate the sites, without success:

    - Generating a consensus sequence with Samtools from reads mapped with lax parameters (BWA), then searching it for the virus sequence (BLAST).
    - De novo assembly (Trinity), then searching the assembly for the virus sequence (BLAST).
    - TopHat-Fusion and FusionMap, with the known virus sequence included as a fake chromosome, to search for fusion events across the whole genome.

    Any other ideas or hints for finding viral integration sites in this kind of data would be very welcome.

    Thanks a lot in advance for your replies.

  • #2
    Have you tried aligning your reads to the viral genome?

    • #3
      Originally posted by molecules:
      "Have you tried aligning your reads to the viral genome?"

      Thanks for replying.

      Yes, I have. I then ran BLAST with those reads against the mouse genome to look for regions of the reads (the mismatched parts) that align to the genome, but the results were not what I had hoped for.

      What approach did you have in mind?

      • #4
        Sorry, I should have first mentioned that you probably should be using short read aligners (e.g. bowtie), instead of BLAST. They are optimized for this type of work.

        After installing bowtie and creating indices for the mouse and virus genomes, you would run something like the following:

        bowtie --al virus_aligned.fq --un not_aligned_to_virus.fq virus_index mouse_reads.fq

        bowtie --al mouse_aligned.fq --un not_aligned_to_mouse.fq mouse_index mouse_reads.fq

        (see the bowtie manual http://bowtie-bio.sourceforge.net/manual.shtml)

        This will create four files:

        virus_aligned.fq containing reads aligned to the virus genome
        mouse_aligned.fq containing reads aligned to the mouse genome
        not_aligned_to_virus.fq containing reads that did not align to the virus genome
        not_aligned_to_mouse.fq containing reads that did not align to the mouse genome
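
One possible way to relate these files, offered as an assumption rather than something bowtie does for you: a read that spans a virus/mouse junction is made of two pieces, so an end-to-end alignment may fail against both references. Such reads would then show up in both not_aligned_to_virus.fq and not_aligned_to_mouse.fq. The Python sketch below collects the read names common to the two files. The function names are hypothetical, and the result will also include low-quality or contaminant reads, so treat it as a list of candidates only.

```python
def fastq_names(lines):
    """Collect read names from FASTQ lines (4 lines per record assumed)."""
    names = set()
    for i, line in enumerate(lines):
        # Every fourth line, starting with the first, is a '@name ...' header.
        if i % 4 == 0 and line.startswith("@"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def unaligned_to_both(virus_unaligned, mouse_unaligned):
    """Names of reads that aligned to neither the virus nor the mouse index."""
    return fastq_names(virus_unaligned) & fastq_names(mouse_unaligned)


def main():
    # File names as produced by the two bowtie commands in this post.
    with open("not_aligned_to_virus.fq") as v, open("not_aligned_to_mouse.fq") as m:
        for name in sorted(unaligned_to_both(v, m)):
            print(name)

# Call main() to run this on the real files.
```

The candidate reads could then be inspected individually, for example by checking whether one part matches the virus and the rest matches the mouse genome.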

        • #5
          I did something similar with BWA, but I could not get what I needed from it.

          With your suggestion, how would you relate these files to each other to find the viral integration site?

          • #6
            Have you tried mapping integration sites as in:

            The question of where retroviral DNA becomes integrated in chromosomes is important for understanding (i) the mechanisms of viral growth, (ii) devising new anti-retroviral therapy, (iii) understanding how genomes evolve, and (iv) developing safer methods for gene therapy. With the completion of geno …


            Basically, you fragment the DNA, end-repair it, add an adapter, and then run PCR with a virus-specific primer and an adapter-specific primer. Bushman et al. have been using this technique to successfully map retrovirus insertion sites with 454.

            From your description, it sounds to me like you are doing de novo assembly of your mouse genome, and then looking for virus sequence in that. Is that correct? If so, I don't think that will work because I wouldn't expect your virus sequences to show up in the consensus genome, as they will be in a minority of reads for any given genomic region.

            • #7
              You're right about one of my attempts: I used Trinity for de novo assembly but, as you say, it is not a good solution.

              I'm trying to find a bioinformatic method to solve this problem. If you have any suggestions, I would really appreciate them.

              • #8
                Do you know whether there is a fusion transcript between the virus and the host? If there isn't, I don't know if this is possible from RNA-seq data. The best option would be DNA-seq.

                • #9
                  I'm also curious - you have a mouse model with some viruses inserted into the genome - and you would like to get all the integration sites?

                  Now using RNA-seq data you would like to recover the sites that fall in rna-coding regions (I guess)?

                  I think you can map with tophat2 and softclip the reads - then search all the softclipped parts for the virus sequence - but it seems very far from the few integration site analyses I have done using NGS (using some viral primer + random genomic primer to generate fragments for NGS).
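
A minimal sketch of the "search the soft-clipped parts for the virus sequence" idea, assuming the aligner writes SAM records whose CIGAR strings contain soft clips (S). The function names and the min_len cutoff are hypothetical, and the virus lookup is an exact substring match on either strand, which is stricter than a real alignment would be.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def soft_clips(sam_line, min_len=15):
    """Return (read, chrom, breakpoint, side, clipped_seq) for long soft clips.

    breakpoint is the 1-based reference position of the aligned base
    adjacent to the clipped segment.
    """
    f = sam_line.rstrip("\n").split("\t")
    name, flag, chrom, pos, cigar, seq = f[0], int(f[1]), f[2], int(f[3]), f[5], f[9]
    if flag & 4 or cigar == "*":  # unmapped read: no clip information
        return []
    # Hard-clipped bases are absent from SEQ, so drop H operations first.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    found = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_len:
        found.append((name, chrom, pos, "left", seq[:ops[0][0]]))
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_len:
        found.append((name, chrom, pos + ref_len - 1, "right", seq[-ops[-1][0]:]))
    return found


def matches_virus(clip, virus_seq):
    """True if the clipped segment occurs exactly in the virus, on either strand."""
    virus = virus_seq.upper()
    clip = clip.upper()
    return clip in virus or reverse_complement(clip) in virus
```

The min_len cutoff matters: a clip of only a few bases will match many places in the virus by chance, so short clips are skipped here.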

                  • #10
                    Another method:

                    You know what the virus sequence is at the virus/genome junction. Search your fastq for all the reads that have, say, the last 20 bases of virus. Then cut the virus off of all those reads, and align what's left to the genome.
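
A rough Python sketch of this approach, under stated assumptions: virus_end is the last 20 bases of the virus (supplied by you), reads may come from either strand, and only exact matches to the tag are considered. All function names and the min_flank cutoff are hypothetical. The trimmed records it yields are FASTQ text that could be written to a file and aligned to the mouse genome.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def read_fastq(lines):
    """Yield (header, sequence, quality) from FASTQ lines, 4 lines per record."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it, "").strip()
        next(it, "")  # '+' separator line
        qual = next(it, "").strip()
        yield header.strip(), seq, qual


def host_flank(seq, qual, virus_end, min_flank=20):
    """Return (flank_seq, flank_qual) next to the virus tag, or None."""
    # Forward strand: the host sequence follows the end of the virus.
    pos = seq.find(virus_end)
    if pos != -1:
        start = pos + len(virus_end)
        if len(seq) - start >= min_flank:
            return seq[start:], qual[start:]
    # Reverse strand: the host sequence precedes the reverse-complemented tag.
    pos = seq.find(reverse_complement(virus_end))
    if pos >= min_flank:
        return seq[:pos], qual[:pos]
    return None


def junction_reads(lines, virus_end, min_flank=20):
    """Yield trimmed FASTQ records for reads that contain the virus tag."""
    for header, seq, qual in read_fastq(lines):
        flank = host_flank(seq, qual, virus_end, min_flank)
        if flank:
            yield f"{header}\n{flank[0]}\n+\n{flank[1]}\n"
```

For the junction at the other end of the virus, the same idea would apply with the first 20 bases of the virus and the host flank on the opposite side of the tag.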

                    • #11
                      Originally posted by pallevillesen:
                      "I'm also curious - you have a mouse model with some viruses inserted into the genome - and you would like to get all the integration sites?

                      "Now using RNA-seq data you would like to recover the sites that fall in rna-coding regions (I guess)?

                      "I think you can map with tophat2 and softclip the reads - then search all the softclipped parts for the virus sequence - but it seems very far from the few integration site analyses I have done using NGS (using some viral primer + random genomic primer to generate fragments for NGS)."

                      Yes, that is correct. I tried your suggestion, but the soft-clipped reads have at most 4S (four clipped bases) in the CIGAR string, and a sequence that short matches many places in the virus, so I can't use this method.

                      Thanks for your help.

                      • #12
                        Originally posted by swbarnes2:
                        "Another method:

                        "You know what the virus sequence is at the virus/genome junction. Search your fastq for all the reads that have, say, the last 20 bases of virus. Then cut the virus off of all those reads, and align what's left to the genome."

                        I tried this method at the beginning of my analysis, but I didn't get the results I had hoped for. As a reference, I used a particular position on chromosome 2 where the integration site could be (according to the experimental method).

                        • #13
                          Like I said, maybe there is no fusion transcript between the virus and the host, in which case you will never see any read that aligns to both the virus and the host.

                          • #14
                            Originally posted by NicoBxl:
                            "Like I said maybe there is no fusion transcript between the virus and the host, so you will never see any reads aligning on the virus and the host."

                            The virus is transcribed: I found it both when I aligned the reads to the virus sequence and when I did the de novo assembly. Is that what you mean?

                            • #15
                              If you want to know where the virus is integrated, you have to know whether there are fusion transcripts between the virus and the host. If there are none, you'll only see the viral transcripts and the host transcripts separately in your data.
