  • michaellim
    replied
    Dear All,

    May I also ask: my RNA-seq libraries were about 260 bp in size according to Illumina's preparation protocol. For the FASTQ files I currently have, do I need to remove the adapter (index) sequences before mapping to the reference genome?

    Many thanks.
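Adapter sequence only shows up in a read when the insert is shorter than the read length, so one way to decide whether trimming is needed is to measure how many reads contain adapter. Below is a minimal sketch that does this. It assumes standard Illumina TruSeq-style adapters, which begin with `AGATCGGAAGAGC`; other kits use different sequences. The helper names and the file name are hypothetical. Dedicated trimming tools do this much more thoroughly, including partial adapter matches at the very end of a read.

```python
import gzip
from typing import Iterable, Iterator, TextIO

# Assumption: TruSeq-style adapters, which share this 13-nt prefix.
# Replace it if your library kit uses a different adapter.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def iter_fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = next(it, "").strip()
        next(it, None)  # '+' separator line
        next(it, None)  # quality line
        yield seq


def adapter_fraction(sequences: Iterable[str], adapter: str = ADAPTER_PREFIX) -> float:
    """Fraction of reads that contain the full adapter prefix anywhere.

    Reads in which only a few adapter bases remain at the 3' end are not
    counted, so this is a lower bound on adapter contamination.
    """
    total = hits = 0
    for seq in sequences:
        total += 1
        if adapter in seq:
            hits += 1
    return hits / total if total else 0.0


# Usage (hypothetical file name):
# with open_fastq("sample_R1.fastq.gz") as fh:
#     print(adapter_fraction(iter_fastq_sequences(fh)))
```

If the fraction is close to zero, trimming changes little. If it is substantial, trimming before mapping avoids reads failing to align because of their adapter tails.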


  • michaellim
    replied
    Originally posted by piet:
    I use 'bwa mem', but my use case is processing DNA sequencing data. It is very fast and reliable with default settings. Nevertheless, bwa and similar mappers should also be suited to bacterial RNA sequencing, since bacteria do not splice their messenger RNA.

    In the beginning it took me quite a while to figure out how to write shell scripts that start bwa runs conveniently and handle the resulting SAM files. You will definitely need to learn some kind of shell or script programming if you want to go that route.

    Why don't you do a DNA sequencing run of your particular isolate before you go into RNA sequencing?
    --
    piet

    Hi Piet,

    I see. I have almost no coding/programming knowledge, so maybe BWA is not suitable for me. But I will check out the website for more info about it.

    I did consider sequencing the genome of my own strain, but the lab has limited funds.

    Thank you very much.


  • piet
    replied
    Originally posted by michaellim:
    May I know what kind of alignment/mapping software you use?

    I use 'bwa mem', but my use case is processing DNA sequencing data. It is very fast and reliable with default settings. Nevertheless, bwa and similar mappers should also be suited to bacterial RNA sequencing, since bacteria do not splice their messenger RNA.

    In the beginning it took me quite a while to figure out how to write shell scripts that start bwa runs conveniently and handle the resulting SAM files. You will definitely need to learn some kind of shell or script programming if you want to go that route.

    Why don't you do a DNA sequencing run of your particular isolate before you go into RNA sequencing?
    --
    piet
    Last edited by piet; 12-19-2014, 02:57 PM.
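The shell-script workflow piet describes comes down to two bwa commands. First, `bwa index` is run once on the reference FASTA. Then `bwa mem` is run with the reference and one or two FASTQ files, and it writes SAM to standard output. Below is a minimal Python sketch of those two steps. It assumes `bwa` is installed and on the `PATH`. The helper names and file names are hypothetical.

```python
import shlex
import subprocess
from typing import List, Sequence


def bwa_commands(reference: str, reads: Sequence[str]) -> List[List[str]]:
    """Return the `bwa index` and `bwa mem` command lines for one sample.

    `reads` is one FASTQ file (single-end) or two (paired-end R1, R2).
    """
    if len(reads) not in (1, 2):
        raise ValueError("expected one or two FASTQ files")
    index_cmd = ["bwa", "index", reference]      # writes index files next to the FASTA
    mem_cmd = ["bwa", "mem", reference, *reads]  # SAM goes to standard output
    return [index_cmd, mem_cmd]


def run_bwa(reference: str, reads: Sequence[str], sam_out: str) -> None:
    """Run both steps, raising CalledProcessError if either command fails."""
    index_cmd, mem_cmd = bwa_commands(reference, reads)
    print("+", shlex.join(index_cmd))
    subprocess.run(index_cmd, check=True)
    print("+", shlex.join(mem_cmd), ">", shlex.quote(sam_out))
    with open(sam_out, "w") as out:
        subprocess.run(mem_cmd, check=True, stdout=out)


# Usage (hypothetical file names):
# run_bwa("reference.fasta", ["sample_R1.fastq", "sample_R2.fastq"], "sample.sam")
```

When mapping many samples against the same reference, the index only needs to be built once, so a loop over samples would call `bwa mem` repeatedly and skip the indexing step.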


  • michaellim
    replied
    Originally posted by piet:
    Multilocus sequence typing (MLST) is a method frequently used to characterize bacterial genomes. MLST schemes have been published for most pathogenic bacteria. For the species Escherichia coli (including Shigella) there are even three competing schemes. In the scheme maintained at Cork University, sequence type 11 (ST11) refers to isolates typically found in cattle (serotype O157:H7), while strains belonging to ST131 are uropathogenic, which means they are associated with urinary tract infections in humans. The chromosome of E. coli encodes more than 4000 proteins. Perhaps half of them belong to the accessory genome, which means they are only found in some strains or clonal groups.

    If you want to map your reads from RNA sequencing, I would recommend using a genome from the same or a very closely related sequence type. Otherwise you will miss several genes from the accessory genome. For E. coli ST131 there are several genomes available in GenBank, even fully finished ones (AP009378.1 and plasmid AP009379.1, CP002797.2). Sequences for ST131 isolates KTE173, KTE49, KTE162, KTE6, KTE211, KTE175, KTE178, KTE216, KTE148 and KTE139 are available as WGS contigs.

    I would recommend trying several reference genomes. A mapping run usually takes only a few minutes on a desktop PC.
    --
    piet

    Hi Piet,

    Many thanks for the clarification. I will give it a try with different genomes, then, if it doesn't take too long. May I know what kind of alignment/mapping software you use? Are there any particular reasons for that choice?

    Cheers.


  • piet
    replied
    Originally posted by michaellim:
    For example, E. coli ST11 will be different from ST131. However, we aren't certain whether there are any genes specific to ST131 that cannot be found in other E. coli sequence types.

    So, if ST11 has a completed genome but ST131 is only available as contigs, and my current RNA-seq data are from ST131, should I use the ST131 genome (multiple contigs) as the reference, or the completed genome of ST11, which is less closely related? That was my question. I hope that makes it clearer.

    Multilocus sequence typing (MLST) is a method frequently used to characterize bacterial genomes. MLST schemes have been published for most pathogenic bacteria. For the species Escherichia coli (including Shigella) there are even three competing schemes. In the scheme maintained at Cork University, sequence type 11 (ST11) refers to isolates typically found in cattle (serotype O157:H7), while strains belonging to ST131 are uropathogenic, which means they are associated with urinary tract infections in humans. The chromosome of E. coli encodes more than 4000 proteins. Perhaps half of them belong to the accessory genome, which means they are only found in some strains or clonal groups.

    If you want to map your reads from RNA sequencing, I would recommend using a genome from the same or a very closely related sequence type. Otherwise you will miss several genes from the accessory genome. For E. coli ST131 there are several genomes available in GenBank, even fully finished ones (AP009378.1 and plasmid AP009379.1, CP002797.2). Sequences for ST131 isolates KTE173, KTE49, KTE162, KTE6, KTE211, KTE175, KTE178, KTE216, KTE148 and KTE139 are available as WGS contigs.

    I would recommend trying several reference genomes. A mapping run usually takes only a few minutes on a desktop PC.
    --
    piet
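Trying several reference genomes requires some way to compare the runs. One simple criterion is the fraction of reads that map to each candidate; this is an assumption, not something stated in the thread. Below is a minimal sketch that reads SAM output (for example from `bwa mem`) and counts primary records. It uses the standard SAM FLAG bits 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary). The helper names are hypothetical. If samtools is installed, `samtools flagstat` reports comparable counts.

```python
from typing import Dict, Iterable

FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def mapping_rate(sam_lines: Iterable[str]) -> float:
    """Fraction of primary SAM records that are mapped.

    Secondary and supplementary records are skipped so each read is counted
    once; with paired-end data each mate counts as its own read.
    """
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        total += 1
        if not flag & FLAG_UNMAPPED:
            mapped += 1
    return mapped / total if total else 0.0


def rank_references(sam_paths: Dict[str, str]) -> Dict[str, float]:
    """Map each reference label to its mapping rate, best first."""
    rates = {}
    for label, path in sam_paths.items():
        with open(path) as fh:
            rates[label] = mapping_rate(fh)
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))
```

A higher mapping rate suggests a closer reference. It does not by itself show which accessory genes are covered, so it complements, rather than replaces, a look at the annotation.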


  • michaellim
    replied
    Originally posted by Brian Bushnell:
    All aligners are designed to handle references with multiple contigs; you don't need to combine anything (nor should you). You just need to index it.

    Well, since you ask me, I would recommend BBMap, which also handles RNA-seq data but is faster and more sensitive than TopHat. But bacteria generally lack introns; when they are present, they are very short and occur in only a handful of genes. So it's not strictly necessary to use a splice-aware aligner for bacterial RNA-seq, though I would still recommend it.

    Thanks, Brian, for the info.

    I will give it a go first and see what happens.


  • michaellim
    replied
    Originally posted by Sergioo:
    What do you mean exactly by sequence type? Maybe those assigned by MLST typing?

    Hi Sergioo,

    Yes, MLST. For example, E. coli ST11 will be different from ST131. However, we aren't certain whether there are any genes specific to ST131 that cannot be found in other E. coli sequence types.

    So, if ST11 has a completed genome but ST131 is only available as contigs, and my current RNA-seq data are from ST131, should I use the ST131 genome (multiple contigs) as the reference, or the completed genome of ST11, which is less closely related? That was my question. I hope that makes it clearer.

    Thank you.


  • Sergioo
    replied
    Originally posted by michaellim:
    Dear everyone,

    A complete and annotated reference genome of a bacterium from a different sequence type.

    Which would be more appropriate? Would appreciate some advice.

    Thank you.

    What do you mean exactly by sequence type? Maybe those assigned by MLST typing?


  • Brian Bushnell
    replied
    Originally posted by michaellim:
    Hi Brian,

    So if I were to use the multiple contigs as my reference when aligning my RNA-seq data, how should I do this? Do I first need to combine all the contigs (and if so, how)?

    All aligners are designed to handle references with multiple contigs; you don't need to combine anything (nor should you). You just need to index it.

    And for the alignment step, which aligner is best for bacterial RNA-seq: TopHat, BWA or Bowtie? I heard TopHat is used a lot in eukaryotic RNA-seq because it looks for splice junctions.

    Thank you very much.

    Well, since you ask me, I would recommend BBMap, which also handles RNA-seq data but is faster and more sensitive than TopHat. But bacteria generally lack introns; when they are present, they are very short and occur in only a handful of genes. So it's not strictly necessary to use a splice-aware aligner for bacterial RNA-seq, though I would still recommend it.
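No merging is needed because a multi-contig reference is simply a multi-FASTA file. The aligner indexes every record as its own sequence and reports the record name in the RNAME column of its SAM output. Below is a minimal sketch that lists the records and their lengths, so you can check what the aligner will see. It is similar in spirit to the `.fai` index that `samtools faidx` writes. The helper name is hypothetical and is not part of any aligner.

```python
from typing import Dict, Iterable, Optional


def contig_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Map each contig name (first word of its '>' header) to its sequence length."""
    lengths: Dict[str, int] = {}
    name: Optional[str] = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # aligners usually report this first word
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequence may be wrapped over many lines
    return lengths


# Usage (hypothetical file name):
# with open("ST131_contigs.fasta") as fh:
#     for name, length in contig_lengths(fh).items():
#         print(name, length)
```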


  • michaellim
    replied
    Originally posted by GenoMax:
    If the overall organization of the genomes is similar, then whole-genome comparison can be informative. Mauve is designed for these kinds of comparisons, which can help locate genome-level rearrangements. Comparing multiple E. coli strains would be appropriate, as in this example from Yersinia: http://asap.genetics.wisc.edu/softwa...creenshots.php

    Hi GenoMax,

    Thanks for the info. Could you please advise how I can compare the published completed genome with the other published genome, which is in contigs? Do I need to merge the contigs before using Mauve (and if so, how can I do that)?

    Many thanks.


  • michaellim
    replied
    Originally posted by Brian Bushnell:
    It's difficult to get single-contig assemblies (unless you use PacBio data). Multiple contigs typically mean that the coverage was too low in places to assemble correctly, or that there were long repeats that confused the assembler. When we assemble a microbe from Illumina data, we might get 50 contigs or more. Probably 99%+ of the genome is there, but typically the order and orientation of the contigs is not known. There are not necessarily gaps, but there may be.

    As for "ST", I've just never heard that terminology before; people I work with normally refer to those as "strains". And yes, I think it's still best to use the genome that is most closely related to your organism unless the assembly is really bad (hundreds of small contigs).

    Edit: also, as GenoMax pointed out, plasmids mean that even a correct assembly will have multiple contigs.

    Hi Brian,

    So if I were to use the multiple contigs as my reference when aligning my RNA-seq data, how should I do this? Do I first need to combine all the contigs (and if so, how)?

    And for the alignment step, which aligner is best for bacterial RNA-seq: TopHat, BWA or Bowtie? I heard TopHat is used a lot in eukaryotic RNA-seq because it looks for splice junctions.

    Thank you very much.


  • GenoMax
    replied
    Originally posted by michaellim:
    Hi Brian,

    For example, E. coli is ONE species, but there are various versions of it, i.e. sequence types (STs). The human-adapted E. coli that causes problematic infections around the world, for instance, is ST131. Between the different sequence types, there might be mutations/genes specific to each of them.

    If the overall organization of the genomes is similar, then whole-genome comparison can be informative. Mauve is designed for these kinds of comparisons, which can help locate genome-level rearrangements. Comparing multiple E. coli strains would be appropriate, as in this example from Yersinia: http://asap.genetics.wisc.edu/softwa...creenshots.php
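Mauve produces a full alignment that shows rearrangements. A much cruder, different technique can give a rough number for how similar two genomes are. It compares the sets of k-mers (substrings of length k) that the genomes share, which is the idea behind tools such as Mash. Below is a minimal sketch of that approach; it is not what Mauve does. It assumes each genome is given as a list of contig sequences, so a draft assembly needs no merging, and it uses k = 21 as an arbitrary default. The helper names are hypothetical. Exact k-mer sets of whole bacterial genomes take considerable memory and time in pure Python. Mash avoids this with MinHash sketches.

```python
from typing import Iterable, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def canonical_kmers(seq: str, k: int = 21) -> Set[str]:
    """All k-mers of `seq`, each stored as the smaller of itself and its reverse complement.

    The canonical form makes the result independent of which strand a contig
    was deposited on. k-mers containing N or other ambiguity codes are skipped.
    """
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if not _BASES.issuperset(kmer):
            continue
        revcomp = kmer.translate(_COMPLEMENT)[::-1]
        kmers.add(min(kmer, revcomp))
    return kmers


def kmer_jaccard(contigs_a: Iterable[str], contigs_b: Iterable[str], k: int = 21) -> float:
    """Jaccard similarity (shared k-mers / all distinct k-mers) of two genomes."""
    set_a: Set[str] = set()
    for contig in contigs_a:
        set_a |= canonical_kmers(contig, k)
    set_b: Set[str] = set()
    for contig in contigs_b:
        set_b |= canonical_kmers(contig, k)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0
```

To rank candidate references against an ST131 genome, you would compute `kmer_jaccard` for each candidate and prefer the highest value. Mauve is still the tool for seeing where the genomes differ.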


  • Brian Bushnell
    replied
    Originally posted by michaellim:
    Hi Brian,

    For example, E. coli is ONE species, but there are various versions of it, i.e. sequence types (STs). The human-adapted E. coli that causes problematic infections around the world, for instance, is ST131. Between the different sequence types, there might be mutations/genes specific to each of them.

    I'm totally new to sequencing. When a genome is in several contigs, does that mean there are gaps between the sequences, which is why the authors deposited it as contigs rather than as a circular 4 Mb chromosome?

    Many thanks for the advice.

    It's difficult to get single-contig assemblies (unless you use PacBio data). Multiple contigs typically mean that the coverage was too low in places to assemble correctly, or that there were long repeats that confused the assembler. When we assemble a microbe from Illumina data, we might get 50 contigs or more. Probably 99%+ of the genome is there, but typically the order and orientation of the contigs is not known. There are not necessarily gaps, but there may be.

    As for "ST", I've just never heard that terminology before; people I work with normally refer to those as "strains". And yes, I think it's still best to use the genome that is most closely related to your organism unless the assembly is really bad (hundreds of small contigs).

    Edit: also, as GenoMax pointed out, plasmids mean that even a correct assembly will have multiple contigs.
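Brian's rule of thumb is to use the closest relative unless its assembly is really bad, such as hundreds of small contigs. A quick summary of a draft assembly helps apply it. Below is a minimal sketch that assumes the contigs are already loaded into a dict of name to sequence. It reports four numbers:

- the contig count;
- the total length;
- the N50, a standard contiguity statistic not mentioned in the thread: the length L such that contigs of length L or longer make up at least half of the assembly;
- the number of runs of N, which is how gaps of unknown sequence are usually written inside scaffolds.

The helper names are hypothetical, and no particular threshold for "really bad" is implied.

```python
import re
from typing import Dict, Iterable

_GAP_RUN = re.compile(r"[Nn]+")


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def assembly_summary(contigs: Dict[str, str]) -> Dict[str, int]:
    """Contig count, total length, N50 and number of N-runs (gaps) in a draft genome."""
    lengths = [len(seq) for seq in contigs.values()]
    gap_runs = sum(len(_GAP_RUN.findall(seq)) for seq in contigs.values())
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "n50": n50(lengths),
        "gap_runs": gap_runs,
    }
```

Comparing these numbers for the candidate ST131 drafts (and the finished genomes) gives a concrete basis for deciding whether a closely related draft is good enough to use as the reference.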


  • GenoMax
    replied
    Originally posted by michaellim:

    I'm totally new to sequencing. When a genome is in several contigs, does that mean there are gaps between the sequences, which is why the authors deposited it as contigs rather than as a circular 4 Mb chromosome?

    Many thanks for the advice.

    That is a likely explanation. If the submitters are not completely sure that the contigs go together (some bacteria carry multiple plasmids, so separate pieces may be real), they would leave them in that state.


  • michaellim
    replied
    Originally posted by AntonioRFranco:
    You can do a whole-genome comparison with programs such as Mauve or ACT. There are tutorials around explaining how to use them.

    Hi Antonio,

    Do you mean compare the two options first? What if there's a difference between the two genomes? What do you suggest I do then?

    Many thanks.
