phylogenetic analysis and identification of dioxane degrader

Upload: keith-sanders

Post on 16-Jan-2017

Page 1: Phylogenetic Analysis and Identification Of Dioxane Degrader

Phylogenetic Analysis and Identification of 1,4-Dioxane Degrading Genes

Keith Sanders May 9th 2016 Dr. Iyer and Brian Iken

Abstract

1,4-Dioxane is a substance that was used as a solvent for other organic compounds. Exposure to this compound can have numerous deleterious effects on living organisms, and it is a suspected carcinogen. Originally, this substance was known only as an occupational hazard. Unfortunately, 1,4-dioxane has also been found to contaminate groundwater. After a brief analysis of the compound, bioremediation emerged as a key possibility for degrading the substance in contaminated areas. In the search for suitable organisms, the bacterium Pseudonocardia dioxanivorans was identified. Pseudonocardia dioxanivorans can degrade 1,4-dioxane thanks to its multicomponent monooxygenase, which contains specific genes that work together with the monooxygenase. It is my hypothesis that organisms with a multicomponent monooxygenase system phylogenetically similar to that of Pseudonocardia dioxanivorans will also be effective 1,4-dioxane degraders. After the particular genes of interest were identified, programs like BLAST were used to find similar sequences in many biotechnology databases. Next, Clustal Omega was used to create multiple sequence alignments, as well as output data that provided further phylogenetic information about the sequences gathered. The results of the project showed that while many organisms contained components of the monooxygenase or notable biomarkers, three organisms provided sufficient evidence of being true 1,4-dioxane degraders. These organisms are Rhodococcus sp. YYL, Pseudonocardia tetrahydrofuranoxydans, and Pseudonocardia sp. ENV478.

Introduction

1,4-Dioxane was used as a solvent for numerous organic and inorganic compounds. This compound is a clear, colorless liquid with an odor similar to ether. It is soluble in water and is known to be highly flammable in both its liquid and vapor states. 1,4-Dioxane is hazardous to humans. Short-term exposure, such as inhalation of this chemical, can cause minor ailments such as dizziness and headaches, or major ailments such as irritation of the throat, lungs, and eyes. 1,4-Dioxane can also be absorbed through the skin, causing mild to severe skin irritation. Chronic exposure to 1,4-dioxane can be extremely detrimental, and even lethal. Studies have shown that long-term exposure to 1,4-dioxane can damage the kidneys and the liver. In multiple studies, rats exposed to 1,4-dioxane in both their drinking water and as vapor suffered damage to the organs of their endocrine system (Kasai, T., Kano, H., Umeda, Y., Sasaki, T., Ikawa, N., Nishizawa, T., . . . Fukushima, S. (2009)). These rats also developed cancerous cells. These studies led 1,4-dioxane to be classified as a probable human carcinogen. Typically, humans only come in contact with this substance as an occupational hazard. However, 1,4-dioxane has been detected as a contaminant in both surface water and groundwater. 1,4-Dioxane is a very dangerous chemical and, unfortunately, it is also difficult to get rid of. The purpose of this study is to find bio-degraders, organisms which can perform bioremediation by degrading one substance and converting it into a different product. These bio-degraders are often favored for remediation problems because they are easy to maintain and generally less harmful to the environment.

After performing a literature review on 1,4-dioxane, I started to search for literature about organisms. More specifically, I was looking for genes that had the ability to degrade this substance and the organisms they belong to. After reviewing articles, I discovered that a key gene of interest was a monooxygenase component MmoB/DmpM. This gene was reported in the organism Pseudonocardia dioxanivorans strain CB1190. This monooxygenase was particularly interesting because it did not require other organic substrates to degrade 1,4-dioxane (Gedalanga, P. B., Pornwongthong, P., Mora, R., Chiang, S. D., Baldwin, B., Ogles, D., & Mahendra, S. (2014)). Moreover, the monooxygenase found in Pseudonocardia dioxanivorans belonged to a multi-component gene cluster which aided its ability to degrade 1,4-dioxane. These components included an alpha subunit, a beta subunit, and a reductase. The other genes in the complex were evaluated and used to confirm or scrutinize the results; however, the monooxygenase MmoB/DmpM was the target gene. Another article led me to examine biomarkers which showed promise for indicating 1,4-dioxane degradation. This article gave me the means to search for genes like phenol 2-monooxygenase and propane monooxygenase, which need specific substrates to operate. This article also guided me to look into alcohol dehydrogenase genes. Since there was a lot of information on Pseudonocardia dioxanivorans, I used the genes from this organism as a comparative measure against new information (Gedalanga, P. B., Pornwongthong, P., Mora, R., Chiang, S. D., Baldwin, B., Ogles, D., & Mahendra, S. (2014)).

Going into the research project, I wanted to make sure I had enough information to evaluate the results of my search. I gathered information about the degradation pathway in order to get a better idea of which target genes and organisms to look into further (Stevenson, E., & Turnbull, M. (2013, April 17)). This article also pointed me to different avenues which could be revisited for additional experimentation.

In this project I will use principal bioinformatics techniques to approach and analyze genes capable of degrading 1,4-dioxane. Starting the project, I already know of one organism which can perform 1,4-dioxane degradation, so there are only a few possible outcomes. One is that Pseudonocardia dioxanivorans is alone in its degradation ability, while the other is that a large number of organisms can perform this task. A compromise between the two possible outcomes is that while the genes themselves are not exclusive to Pseudonocardia dioxanivorans, there is a system at work in this organism which makes it more effective than most organisms. It is my hypothesis that organisms with monooxygenase systems phylogenetically similar to that of Pseudonocardia dioxanivorans will also be effective 1,4-dioxane degraders.

Materials

The materials used in this project consisted entirely of bioinformatics tools: computational applications and databases. As a result, no chemical reagents were used. Although the specifics of the hardware are not important, it is worth noting that most of the work for this project was done at the College of Technology computer lab and the library computer lab at the University of Houston.

The key materials of this project are the sequences. These sequences come in FASTA format and were obtained from the National Center for Biotechnology Information (NCBI). FASTA, in terms of this project, is a text-based format used to represent DNA, RNA, and protein sequences. The sequences are kept in FASTA format because it is nearly universal among bioinformatics applications. Most of the work in this project was conducted on, translated to, or produced in FASTA format.
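As a rough illustration of the FASTA format described above, the following Python sketch reads FASTA text into a dictionary of headers and sequences. The function name `parse_fasta` and the two example records are made up for illustration; they are not sequences or tools from this project.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # a record starts with '>' followed by its description
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()  # a sequence may span several lines
    return records


# Made-up example records (assumption: not real project data).
example = """>seq1 hypothetical protein
MKTAYIAK
QRQISFVK
>seq2 hypothetical DNA
atgcgtacg
"""
records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq))
```

Each record is a header line beginning with ">" followed by one or more lines of sequence, which is why the same file can hold DNA, RNA, or protein data.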

NCBI also plays a critical role in this process. NCBI is the central hub for many of the databases that produced the information this project builds on. Other databases such as PDB play a role in the analysis of the proteins encoded by the genes. NCBI hosts the PubMed database, which was used during most of the literature review. It also provides accession numbers, which allow sequences to be retrieved and cross-referenced across other databases.

ExPASy and EBI were also used as sources of applications. ExPASy is a bioinformatics resource portal. It was used as a source for other bioinformatics applications, including GENIO/LOGO, T-Coffee, and the PHYLIP tools. GENIO/LOGO was used to create the consensus sequence logos. T-Coffee was the secondary tool used to create multiple sequence alignments, and PHYLIP is a set of programs for analyzing DNA and protein sequences and for building phylogeny trees. EBI stands for the European Bioinformatics Institute. It hosts many of the programs used to produce multiple sequence alignments (MSAs). The primary EBI program used in the project was Clustal Omega. This program was able to create MSAs and produce output in both visual and FASTA formats. Clustal Omega was also able to create phylogeny trees and tree file output data, which could be used in other programs.

The final program used was TreeView. This program can read tree files, including PHYLIP tree file output, and convert them into visual images of the phylogeny tree. It also offers different styles of phylogeny tree. Ultimately, this tool was used to create the phylogeny trees seen in the Results section.

Methods

Literature Review

This project has three main goals. The first is to discover the identity of genes which could degrade 1,4-dioxane. The second is to provide an MSA of the genetic sequence of each gene in question. The last is to construct a phylogenetic tree of the genes together with the organisms they belong to.

To accomplish my first task, I conducted a literature review. This simply means that I searched my resources to find publications pertaining to the scope of my study. In this case, the target was the identity of a gene which could degrade 1,4-dioxane. The identity of a gene and organism was found using articles from PubMed. PubMed is one of the many databases located on the NCBI website. Likewise, the other NCBI databases, such as Gene, Protein, Nucleotide, and GenBank, were used to find and record new sequences.

During my initial searches, articles eventually led me to the organism Pseudonocardia dioxanivorans. More importantly, this led me to my first gene of interest, monooxygenase component MmoB/DmpM. I found information on the gene using the Gene database on NCBI. With this information I was able to gather key features of the gene. The most important of these features were its family identifiers and its FASTA sequence. Still using the Gene database, I found more genes belonging to the MmoB/DmpM gene family. The search for these monooxygenase genes also led to the discovery of many different monooxygenases which, unlike the monooxygenase component MmoB/DmpM, require other organic substrates in order to degrade 1,4-dioxane. These searches returned monooxygenases that used propane, phenol, and toluene as substrates. My literature review led me to believe that some substrate-dependent monooxygenases were viable options while others were not.

BLAST

After I had found all the genes I could using the NCBI search, I used the BLAST algorithm to analyze each FASTA sequence and compare it with sequences in other databases. BLAST stands for Basic Local Alignment Search Tool. This tool led me to a few genes that I had missed during my previous search using only the NCBI databases.

This tool also has parameters which allowed me to control my searches. When I searched using the nucleotide sequence of a gene, I used the Megablast option. I also excluded models and uncultured/environmental samples from my search because I felt it was important to the project that I obtain non-hypothetical results. Furthermore, this search was conducted against the nucleotide collection (nr/nt) database because it contained the largest source of DNA sequence information. Whenever I used a protein sequence in BLAST, I used protein BLAST with the DELTA-BLAST option. These searches were run against the UniProt/SwissProt database, mostly because I was more familiar with this database. The DELTA-BLAST option helped validate my results by excluding matches with low similarity. As before, I also excluded models and uncultured/environmental samples from this search.
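The exclusions described above were set in the BLAST interface. As a hedged sketch only, a comparable filter could be applied to a list of hits after the search. The hit records, accession numbers, and function names below are hypothetical; the one outside fact relied on is that RefSeq accessions for predicted models begin with XM_, XR_, or XP_.

```python
from typing import Dict, List

MODEL_PREFIXES = ("XM_", "XR_", "XP_")  # RefSeq accession prefixes for predicted models
EXCLUDED_WORDS = ("uncultured", "environmental sample")


def keep_hit(hit: Dict[str, str]) -> bool:
    """True if the hit is neither a predicted model nor an uncultured/environmental sample."""
    if hit["accession"].startswith(MODEL_PREFIXES):
        return False
    title = hit["title"].lower()
    return not any(word in title for word in EXCLUDED_WORDS)


def filter_hits(hits: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the hits that pass keep_hit, preserving their order."""
    return [h for h in hits if keep_hit(h)]


# Hypothetical hit records for illustration only (made-up accessions and titles).
hits = [
    {"accession": "AB000001.1", "title": "Example bacterium monooxygenase gene"},
    {"accession": "XM_000002.1", "title": "PREDICTED: example monooxygenase mRNA"},
    {"accession": "AB000003.1", "title": "Uncultured bacterium clone X monooxygenase"},
]
kept = filter_hits(hits)
print([h["accession"] for h in kept])
```

Filtering on the title text is a simplification; the BLAST interface applies these exclusions on the server side before results are returned.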

Multiple Sequence Alignments

After more genes were discovered and compiled, it was time to perform MSAs. To do this I used EBI's Clustal Omega program. I kept the parameters at their defaults for all my alignments. I broke my sequence analysis down in a way that allowed me to look at each type of gene separately before compiling everything together. The first group examined was the monooxygenase component MmoB/DmpM genes. Next came the individual components of the gene cluster associated with the monooxygenase component MmoB/DmpM: the alpha subunit, the beta subunit, and the reductase. The next genes evaluated were the propane monooxygenase, the phenol 2-monooxygenase, and alcohol dehydrogenase. Once the MSAs were completed, the output was saved as identity matrix scores, FASTA MSAs, visual MSAs, and phylogeny tree files. All four of these output files were created by Clustal Omega. If a particular gene showed a score of 60% or below in the MSA identity matrix, it was excluded from further processing. Exceptions were made when a gene scored above the threshold on the DNA score but failed the protein threshold, or vice versa. Another program, T-Coffee, was also used to conduct the MSAs. This program was run using its default parameters. I thought it was important to get a second opinion on the MSAs. Although the algorithms used by T-Coffee and Clustal Omega might be slightly different, I was mostly looking for large changes in MSA scores rather than small ones. In the end I decided to stick with the Clustal Omega MSAs.
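The 60% cutoff can be illustrated with a small Python sketch. It assumes that percent identity is counted over alignment columns where neither sequence has a gap, and that each sequence is compared against a single reference; both are assumptions made for illustration, not a description of how Clustal Omega computes its matrix. The sequences are toy data.

```python
from typing import Dict


def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore gapped columns (assumption)
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def passing_sequences(alignment: Dict[str, str], reference: str,
                      cutoff: float = 60.0) -> Dict[str, float]:
    """Scores of sequences whose identity to the reference is above the cutoff."""
    ref_seq = alignment[reference]
    scores = {name: percent_identity(ref_seq, seq)
              for name, seq in alignment.items() if name != reference}
    return {name: score for name, score in scores.items() if score > cutoff}


# Toy aligned sequences (made up for illustration).
alignment = {
    "reference": "ATGCATGCAT",
    "close":     "ATGCATGCAA",
    "gapped":    "ATG-ATGCTT",
    "distant":   "TTTTATGAAA",
}
kept = passing_sequences(alignment, "reference")
print(sorted(kept))
```

Here the "distant" sequence scores exactly 50% against the reference and is dropped, while the other two stay above the cutoff.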

Phylogenetic Tree

After the MSAs were completed, it was time to move into the final phase of the analysis for the project. Phylogenetic trees were constructed using the sequences discovered. Even though both Clustal Omega and T-Coffee produce a phylogenetic tree from their results, I decided to use a different program for this. Using the PHYLIP tree file from the Clustal Omega output, I constructed phylogenetic trees with the TreeView program. Trees were completed using TreeView's default parameters. The style in which the phylogenetic trees are drawn is known as a phylogram. As before, I started by making tree files for the gene groups individually. This means the first group of trees contained just monooxygenase component MmoB/DmpM genes, and the next contained the substrate-dependent monooxygenase genes. Finally, a phylogenetic tree using all the sequences I compiled was produced. The trees produced by TreeView lost the distance information that would otherwise be displayed on the image. To compensate for that, there is a distance scale in the bottom left-hand corner.
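The Results describe the trees as built with average distance on percent identity. As a minimal sketch of that kind of clustering (commonly called UPGMA), the Python below merges the closest clusters step by step and writes the tree as a Newick string, the parenthesized text notation used in PHYLIP-style tree files. The distances (assumed here to be 100 minus percent identity) and the taxon names A, B, and C are made up, and this is not a description of what Clustal Omega or TreeView do internally.

```python
from itertools import combinations
from typing import Dict, Tuple


def upgma(distances: Dict[Tuple[str, str], float]) -> str:
    """Average-linkage clustering of a full pairwise distance table into a Newick string."""
    d = {frozenset(pair): float(v) for pair, v in distances.items()}
    # cluster label -> (number of leaves, height of the cluster's node)
    clusters = {name: (1, 0.0) for pair in distances for name in pair}
    while len(clusters) > 1:
        # pick the closest pair of current clusters
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: d[frozenset(p)])
        (na, ha), (nb, hb) = clusters.pop(a), clusters.pop(b)
        height = d[frozenset((a, b))] / 2.0
        merged = f"({a}:{height - ha:.2f},{b}:{height - hb:.2f})"
        # distance to every other cluster is the size-weighted average
        for other in clusters:
            d[frozenset((merged, other))] = (
                na * d[frozenset((a, other))] + nb * d[frozenset((b, other))]
            ) / (na + nb)
        clusters[merged] = (na + nb, height)
    (tree,) = clusters
    return tree + ";"


# Made-up distances between three hypothetical sequences.
tree = upgma({("A", "B"): 10, ("A", "C"): 30, ("B", "C"): 30})
print(tree)
```

The numbers after each colon are branch lengths, which is the distance information a phylogram draws to scale.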

Results

The project yielded results for the genes alcohol dehydrogenase, phenol 2-monooxygenase, propane monooxygenase, and multi-component monooxygenase MmoB/DmpM, as well as the other components of the multi-component monooxygenase complex. These additional components include an alpha subunit, a beta subunit, and a reductase. I thought it was also important to evaluate the multi-component unit as a whole.

When applicable, results for both DNA and protein sequences are presented. However, there were situations where either the DNA or the protein sequences could not be obtained.

Alcohol Dehydrogenase

Protein MSA Score:

This contains the score of each protein sequence used from the multiple sequence alignment.

Figure 1

Protein Multiple Sequence Alignment (See Attachment):

Figure 2

Protein sequence consensus Logo (See Attachment):

Using multiple amino acid residue sequences, a consensus sequence is created. A logo visually represents this sequence: the height of each residue reflects how often it appears at the given position. The taller the residue, the more often it appears in that position.

Figure 3
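The idea behind the logo can be sketched in a few lines of Python: count how often each residue occurs in each alignment column, then take the most frequent one as the consensus. This follows the frequency-based description given in the text; logo tools may scale letter heights differently, and the aligned sequences below are made up.

```python
from collections import Counter
from typing import Dict, List


def column_frequencies(aligned: List[str]) -> List[Dict[str, float]]:
    """For each alignment column, the fraction of sequences carrying each residue (gaps ignored)."""
    freqs = []
    for column in zip(*aligned):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        freqs.append({res: n / total for res, n in counts.items()} if total else {})
    return freqs


def consensus(aligned: List[str]) -> str:
    """Most frequent residue per column; '-' when a column contains only gaps."""
    out = []
    for col in column_frequencies(aligned):
        # sorted() makes ties resolve alphabetically
        out.append(max(sorted(col), key=col.get) if col else "-")
    return "".join(out)


# Made-up aligned protein fragments for illustration.
aligned = ["MKTL", "MKSL", "MRTL", "MKT-"]
freqs = column_frequencies(aligned)
print(consensus(aligned), freqs[1])
```

In the second column, K appears in three of four sequences, so it would be drawn three times as tall as R under a frequency-based logo.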

Protein Phylogeny Tree (See Attachment):

This is a phylogeny tree created from the amino acid residue sequences. The phylogenetic trees were created using average distance based on % identity.

Figure 4

Phenol 2-monooxygenase

DNA MSA Score:

This contains the score of each DNA nucleotide sequence used from the multiple sequence alignment.

Figure 5

DNA Multiple Sequence Alignment (See Attachment):

Figure 6

DNA nucleotide sequence consensus logo (See Attachment):

Using multiple DNA sequences, a consensus sequence is created. A logo visually represents this sequence: the height of each nucleotide reflects how often it appears at the given position. The taller the nucleotide, the more often it appears in that position.

Figure 8

Protein MSA Score:

This contains the score of each protein sequence used from the multiple sequence alignment.

Figure 9

Protein sequence consensus Logo (See Attachment):

Using multiple amino acid residue sequences, a consensus sequence is created. A logo visually represents this sequence: the height of each residue reflects how often it appears at the given position. The taller the residue, the more often it appears in that position.

Figure 10

Protein Multiple Sequence Alignment (See Attachment):

Figure 12

Propane Monooxygenase

DNA MSA Score:

This contains the score of each DNA nucleotide sequence used from the multiple sequence alignment.

Figure 13

DNA nucleotide sequence consensus logo (See Attachment):

Using multiple DNA sequences, a consensus sequence is created. A logo visually represents this sequence: the height of each nucleotide reflects how often it appears at the given position. The taller the nucleotide, the more often it appears in that position.

Figure 15

Protein MSA Score:

This contains the score of each protein sequence used from the multiple sequence alignment.

Figure 17

Protein Multiple Sequence Alignment (See Attachment):

Figure 18

Protein sequence consensus Logo (See Attachment):

Using multiple amino acid residue sequences, a consensus sequence is created. A logo visually represents this sequence: the height of each residue reflects how often it appears at the given position. The taller the residue, the more often it appears in that position.

Figure 19

Protein Phylogeny Tree (See Attachment):

This is a phylogeny tree created from the amino acid residue sequences. The phylogenetic trees were created using average distance based on % identity.

Figure 20

Multi-component monooxygenase

DNA MSA Score:

This contains the score of each DNA nucleotide sequence used from the multiple sequence alignment.

Figure 21

DNA Multiple Sequence Alignment (See Attachment):

Figure 22

DNA nucleotide sequence consensus logo (See Attachment):

Using multiple DNA sequences, a consensus sequence is created. A logo visually represents this sequence: the height of each nucleotide reflects how often it appears at the given position. The taller the nucleotide, the more often it appears in that position.

Figure 23

DNA Phylogeny Tree (See Attachment):

This is a phylogeny tree created from the DNA nucleotide sequences. The phylogenetic trees were created using average distance based on % identity.

Figure 24

Protein MSA Score:

This contains the score of each protein sequence used from the multiple sequence alignment.

Figure 25

Protein Multiple Sequence Alignment (See Attachment):

Figure 26

Protein sequence consensus Logo (See Attachment):

Using multiple amino acid residue sequences, a consensus sequence is created. A logo visually represents this sequence: the height of each residue reflects how often it appears at the given position. The taller the residue, the more often it appears in that position.

Figure 27

Protein Phylogeny Tree (See Attachment):

This is a phylogeny tree created from the amino acid residue sequences. The phylogenetic trees were created using average distance based on % identity.

Figure 28

Alpha Subunit

DNA MSA Score:

This contains the score of each DNA nucleotide sequence used from the multiple sequence alignment.

Figure 29

DNA Multiple Sequence Alignment (See Attachment):

Figure 30

DNA nucleotide sequence consensus logo (See Attachment):

Using multiple DNA sequences, a consensus sequence is created. A logo visually represents this sequence: the height of each nucleotide reflects how often it appears at the given position. The taller the nucleotide, the more often it appears in that position.

Figure 31

DNA Phylogeny Tree (See Attachment):

This is a phylogeny tree created from the DNA nucleotide sequences. The phylogenetic trees were created using average distance based on % identity.

Figure 32

Protein MSA Score:

This contains the score of each protein sequence used from the multiple sequence alignment.

Figure 33

Protein Multiple Sequence Alignment (See Attachment):

Figure 34

Protein sequence consensus Logo (See Attachment):

Using multiple amino acid residue sequences, a consensus sequence is created. A logo visually represents this sequence: the height of each residue reflects how often it appears at the given position. The taller the residue, the more often it appears in that position.

Figure 35

Protein Phylogeny Tree (See Attachment):

This is a phylogeny tree created from the amino acid residue sequences. The phylogenetic trees were created using average distance based on % identity.

Figure 36

Beta Subunit

DNA MSA Score:

This contains the score of each DNA nucleotide sequence used from the multiple sequence alignment.

Figure 37

DNA Multiple Sequence Alignment (See Attachment):

Figure 38

DNA nucleotide sequence consensus logo (See Attachment):

Using multiple DNA sequences, a consensus sequence is created. A logo visually represents this sequence: the height of each nucleotide reflects how often it appears at the given position. The taller the nucleotide, the more often it appears in that position.

Figure 39

DNA Phylogeny Tree (See Attachment):

This is a phylogeny tree created from the DNA nucleotide sequences. The phylogenetic trees were created using average distance based on % identity.

Figure 40

Protein MSA Score:

This contains the score of each protein sequence used from the multiple sequence alignment.

Figure 41

Protein Multiple Sequence Alignment (See Attachment):

Figure 42

Protein sequence consensus Logo (See Attachment):

Using multiple amino acid residue sequences, a consensus sequence is created. A logo visually represents this sequence: the height of each residue reflects how often it appears at the given position. The taller the residue, the more often it appears in that position.

Figure 43

Protein Phylogeny Tree (See Attachment):

This is a phylogeny tree created from the amino acid residue sequences. The phylogenetic trees were created using average distance based on % identity.

Figure 44

Reductase

DNA MSA Score:

This contains the score of each DNA nucleotide sequence used from the multiple sequence alignment.

Figure 45

DNA Multiple Sequence Alignment (See Attachment):

Figure 46

DNA nucleotide sequence consensus logo (See Attachment):

Using multiple DNA sequences, a consensus sequence is created. A logo visually represents this sequence: the height of each nucleotide reflects how often it appears at the given position. The taller the nucleotide, the more often it appears in that position.

Figure 47

DNA Phylogeny Tree (See Attachment):

This is a phylogeny tree created from the DNA nucleotide sequences. The phylogenetic trees were created using average distance based on % identity.

Figure 48

Protein MSA Score:

This contains the score of each protein sequence used from the multiple sequence alignment.

Figure 49

Protein Multiple Sequence Alignment (See Attachment):

Figure 50

Protein sequence consensus Logo (See Attachment):

Using multiple amino acid residue sequences, a consensus sequence is created. A logo visually represents this sequence: the height of each residue reflects how often it appears at the given position. The taller the residue, the more often it appears in that position.

Figure 51

Protein Phylogeny Tree (See Attachment):

This is a phylogeny tree created from the amino acid residue sequences. The phylogenetic trees were created using average distance based on % identity.

Figure 52

Multi-component Gene complex

DNA MSA Score:

This contains the score of each DNA nucleotide sequence used from the multiple sequence alignment.

Figure 53

DNA nucleotide sequence consensus logo (See Attachment):

Using multiple DNA sequences, a consensus sequence is created. A logo visually represents this sequence: the height of each nucleotide reflects how often it appears at the given position. The taller the nucleotide, the more often it appears in that position.

Figure 54

DNA Phylogeny Tree (See Attachment):

This is a phylogeny tree created from the DNA nucleotide sequences. The phylogenetic trees were created using average distance based on % identity.

Figure 55

Master Phylogeny Tree (See Attachment):

This is the phylogeny tree created from all the sequences collected during the project. It is annotated with color to make navigation easier.

The first tree contains the propane monooxygenase and the regular monooxygenase gene clusters along with the phenol 2-monooxygenase. This tree does not include the alcohol dehydrogenase.

Figure 56

The last figure shows the complete phylogeny tree with all the sequences used for the project.

Figure 57

Conclusion

At the start of my project, I found literature that led me to believe Pseudonocardia dioxanivorans was an organism with the ability to degrade 1,4-dioxane. Furthermore, the identity of the gene that made this possible was discovered. Monooxygenase MmoB/DmpM was the target gene which started much of the research. After more research was conducted, it was discovered that the monooxygenase MmoB/DmpM worked within a gene complex. This complex contained a reductase, an alpha subunit, and a beta subunit. The complex was then analyzed both as individual components and as a whole. Part of the reason for analyzing the parts individually was to find sequences that might be lost in the overall gene cluster. When this was performed, only the monooxygenase alpha subunit yielded new results.

The initial analysis provided me with the organisms Pseudonocardia sp. K1, Pseudonocardia sp. ENV478, and Rhodococcus sp. YYL. It is also important to note that the DNA sequences were labeled with the organism name Pseudonocardia sp. K1, whereas the protein sequences were labeled Pseudonocardia tetrahydrofuranoxydans. Pseudonocardia tetrahydrofuranoxydans and Pseudonocardia sp. K1 are, however, the same organism. Looking at the percent identity scores, you can see that these organisms have the strongest identity scores with our target organism, Pseudonocardia dioxanivorans. The gene of interest was analyzed along with the individual components of the gene complex and the complex as a whole. For the monooxygenase complex, both as individual components and as a whole, the percent identity score never drops below 90%. This is a very strong indicator of functional similarity. The percent identity score of the propane monooxygenase stayed mostly in the 70% range. This implied an identity score strong enough to be relevant. Alcohol dehydrogenase percent identity scores ranged from the high 60s to the mid-80s. This range provided me with results significant enough to continue the project. The phenol 2-monooxygenase scores never rose above 70% but never fell below 60%. This, coupled with the suggestion from my literature review to investigate it, is what kept these genes in for further evaluation.

The alcohol dehydrogenase, phenol 2-monooxygenase, and propane monooxygenase were all evaluated as well using the same analytical techniques. The propane monooxygenase was the only gene found to have a gene cluster similar to the previous monooxygenase gene cluster. This prompted me to exclude propane monooxygenases that were not part of such a cluster, because overall they had low percent identity or they were related to only one part of the cluster and not to the gene cluster as a whole. The only reason the monooxygenase alpha subunits were allowed to keep their single-component matches was that their percent identity scores were still much too high to exclude. The overall identity scores of the propane monooxygenase, phenol 2-monooxygenase, and alcohol dehydrogenase were high enough to be significant, but not as high as those of the monooxygenase within the gene cluster discussed previously.

The consensus logo is a way to visualize the results of the MSA and the percent identity score. At the DNA level, the individual components of the monooxygenase gene cluster shared the strongest consensus, with many positions matching at a 100% frequency. Oddly enough, although the complete gene cluster still holds a percent identity score above 90%, its consensus sequence frequently varies between two different nucleotides. At the protein level, the consensus sequence also varied, with at least two amino acids sharing a 50% frequency each at the variable positions.
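The frequencies read off a consensus logo can be illustrated with a short sketch that counts how often each symbol appears in every column of an alignment. This only computes raw frequencies. Logo tools often scale letters by information content as well, which is not reproduced here. The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def column_frequencies(aligned_seqs):
    """Return, for each alignment column, a dict of symbol -> frequency."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    frequencies = []
    # zip(*aligned_seqs) walks the alignment one column at a time.
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        total = len(column)
        frequencies.append({sym: n / total for sym, n in counts.items()})
    return frequencies


# Toy alignment of four sequences (hypothetical data).
toy = ["ATGA", "ATGC", "ATCA", "ATCC"]
for position, freqs in enumerate(column_frequencies(toy), start=1):
    print(position, freqs)
```

In this toy alignment the first two columns are at a 100% frequency, while the last two are split between two nucleotides at 50% each, which is the pattern described for the complete gene cluster.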

The Phen-2 monooxygenase DNA logo showed a mix of strong and split consensus. Its protein logo showed mixed consensus, with three amino acids usually competing at a position. For the propane monooxygenase, the nucleotide logo covers the entire gene cluster; however, because creating a protein logo of the whole cluster was problematic, only the propane monooxygenase itself was given a protein logo. The DNA logo of the cluster shows plenty of conflicting consensus and very little 100% frequency. The protein logo showed much stronger consensus, with most positions at a 50% frequency. The alcohol dehydrogenase has only a protein logo because not all of its nucleotide sequences could be found. That logo shows strong consensus, with many positions at a 100% frequency.

Multiple sequence alignments were also produced. As is conventional, a “*” marks a fully conserved residue, a “:” marks a conserved residue, a “.” marks a semi-conserved residue, and a blank marks a position with no conservation. MSAs were conducted for all sequences; however, it was impractical to exhibit the MSAs for the complete monooxygenase gene cluster and the complete propane monooxygenase gene cluster because of the enormous size of that data. Individual DNA and protein MSAs for each component of the monooxygenase have been provided. Both the individual components and the gene cluster show very strong fully conserved regions, which matches their percent identity scores. For the propane monooxygenase, the MSA of the propane monooxygenase portion of its gene cluster has been provided; it shows a mixture of fully conserved and, to a lesser extent, conserved regions. The Phen-2 monooxygenase and the alcohol dehydrogenase show similar results, with a mix of fully conserved, conserved, and semi-conserved regions, most of them fully conserved.
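The conservation symbols can also be summarized numerically. The sketch below tallies the symbols in one conservation line of an alignment. It assumes the line has already been cut down to the alignment columns only, with the leading name padding removed. The function name and the example line are hypothetical.

```python
from collections import Counter

# Symbol meanings as described in the text above.
LABELS = {
    "*": "fully conserved",
    ":": "conserved",
    ".": "semi-conserved",
    " ": "not conserved",
}


def tally_conservation(conservation_line: str) -> dict:
    """Count how many alignment columns fall into each conservation class."""
    counts = Counter(conservation_line)
    return {label: counts.get(symbol, 0) for symbol, label in LABELS.items()}


# Hypothetical conservation line covering seven alignment columns.
print(tally_conservation("**:*. *"))
```

A tally dominated by the fully conserved class corresponds to the "mostly fully conserved" pattern reported for these alignments.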

Phylogeny trees were constructed for every type of gene; however, only trees for genes with more than three entries are provided, because a tree with three or fewer entries gives little to no practical information, especially within the scope of this project. For the monooxygenase, it is important to note that the genes of the target organism, Pseudonocardia dioxanivorans, are usually most closely related to those of Rhodococcus sp. YYL. This can be observed in both the individual genes and the gene cluster. The alcohol dehydrogenase genes show considerable diversity among themselves. An important point is that an alcohol dehydrogenase from Pseudonocardia dioxanivorans was located. In that phylogeny tree, this organism is more closely related to other Pseudonocardia than to the Rhodococcus. Comparing this to the monooxygenase tree suggests that, while these two organisms have strong similarities within the monooxygenase gene, there are still many respects in which they differ. The final tree includes all the genes examined in this project and is based on their protein sequences. As before, Pseudonocardia dioxanivorans and Rhodococcus sp. YYL are each other's closest relatives, the exception being the alpha subunit. The alcohol dehydrogenase genes are isolated furthest from the rest of the genes, which could suggest that their role in 1, 4-dioxane degradation is entirely different from that of the other genes.
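To illustrate how "closest related" can be read from sequence data, the sketch below builds a small tree with UPGMA clustering on pairwise p-distances (the fraction of differing positions). UPGMA is swapped in here because it is simple to show. It is not claimed to be the method behind the trees in this project, and the names and sequences are hypothetical toy data.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions, ignoring columns with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, aligned_seqs) -> str:
    """Cluster aligned sequences with UPGMA; return a Newick topology."""
    # Each cluster id maps to (Newick text, number of sequences inside).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            dist[frozenset((i, j))] = p_distance(aligned_seqs[i], aligned_seqs[j])
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair; ties are broken by cluster id.
        pair = min(dist, key=lambda p: (dist[p], sorted(p)))
        i, j = sorted(pair)
        (text_i, size_i), (text_j, size_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster is the size-weighted average.
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (
                size_i * d_ik + size_j * d_jk
            ) / (size_i + size_j)
        del dist[pair]
        clusters[next_id] = (f"({text_i},{text_j})", size_i + size_j)
        next_id += 1
    (newick, _), = clusters.values()
    return newick + ";"


# Hypothetical aligned toy sequences, not project data.
names = ["target", "near", "mid", "far"]
seqs = ["ATGCATGCAT", "ATGCATGCAA", "ATGCTTGCTA", "TTGGTTCCTA"]
print(upgma(names, seqs))
```

In the printed topology, the two sequences joined in the innermost parentheses are the closest relatives, which is how the pairing of Pseudonocardia dioxanivorans with Rhodococcus sp. YYL is read from the trees.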

The results of this tree prompted me to make another tree without the alcohol dehydrogenase. This tree uses the entire gene cluster of the primary monooxygenase and of the propane monooxygenase, because the results for the separated components of the gene complex were mostly redundant. These results showed that Pseudonocardia dioxanivorans and Rhodococcus sp. YYL are still the closest relatives.

Several errors and limitations occurred during the project. A major limitation was the number of sequences available. I could work only with sequences that were available from the databases and that I was able to find. This means that I could have missed a sequence, or that there could be more organisms that are 1, 4-dioxane degraders but whose genetic sequences are not yet available. Another source of error is my own judgment about what counts as valuable information. I wanted to include the sequences that I thought were relevant, but I may have excluded some relevant sequences through my own exclusion criteria.

The information gained in this project has many useful applications. First, it increases the number of organisms suspected to be 1, 4-dioxane degraders. While more experiments are needed to evaluate their effectiveness, phylogenetic analysis does provide evidence to support further study. Having multiple organisms that can perform this task makes using them for bioremediation more feasible. Furthermore, while researching 1, 4-dioxane degradation, I discovered articles about fungi that could perform this task (Kinne et al., 2009). This means that more information about different organisms with this ability may still be out there. That information can also be compared with the results of this project to examine the different or similar processes by which both kinds of organism degrade 1, 4-dioxane.

In summation, propane monooxygenase and Phen-2 monooxygenase have the ability to degrade 1, 4-dioxane, but only when their particular substrates are available. This makes them less optimal than the other monooxygenase examined in this project. Alcohol dehydrogenase is the least related to any of the genes, which suggests that its role in dioxane degradation is not as direct as that of the other genes. The key result is that three organisms, Pseudonocardia sp. K1, Pseudonocardia sp. ENV478, and Rhodococcus sp. YYL, have genes most closely related to the gene of interest. The gene complex, which contains the alpha subunit, beta subunit, reductase, and monooxygenase component, is what gives these organisms more affinity for the degradation task. This supports the idea that these organisms are true 1, 4-dioxane degraders. Furthermore, the phylogeny trees place the genes associated with Rhodococcus sp. YYL closest to those of Pseudonocardia dioxanivorans.

References

1. Sales, C. M., Mahendra, S., Grostern, A., Parales, R. E., Goodwin, L. A., Woyke, T., . . . Alvarez-Cohen, L. (2011). Genome Sequence of the 1,4-Dioxane-Degrading Pseudonocardia dioxanivorans Strain CB1190. Journal of Bacteriology, 193(17), 4549-4550. doi:10.1128/jb.00415-11

2. Gedalanga, P. B., Pornwongthong, P., Mora, R., Chiang, S. D., Baldwin, B., Ogles, D., & Mahendra, S. (2014). Identification of Biomarker Genes To Predict Biodegradation of 1,4-Dioxane. Applied and Environmental Microbiology, 80(10), 3209-3218. doi:10.1128/aem.04162-13

3. 1,4-Dioxane (1,4-Diethyleneoxide). (n.d.). Retrieved April 30, 2016, from https://www3.epa.gov/airtoxics/hlthef/dioxane.html

4. Kasai, T., Kano, H., Umeda, Y., Sasaki, T., Ikawa, N., Nishizawa, T., . . . Fukushima, S. (2009). Two-year inhalation study of carcinogenicity and chronic toxicity of 1,4-dioxane in male rats. Inhalation Toxicology, 21(11), 889-897. doi:10.1080/08958370802629610

5. Stevenson, E., & Turnbull, M. (2013, April 17). 1,4-Dioxane Pathway Map. Retrieved May 09, 2016, from http://eawag-bbd.ethz.ch/diox/diox_map.html

6. Kinne, M., Poraj-Kobielska, M., Ralph, S. A., Ullrich, R., Hofrichter, M., & Hammel, K. E. (2009). Oxidative Cleavage of Diverse Ethers by an Extracellular Fungal Peroxygenase. Journal of Biological Chemistry, 284(43), 29343-29349. doi:10.1074/jbc.m109.040857

Attachments

Alcohol Dehydrogenase

Protein Multiple Sequence Alignment (See Attachment)

Figure 2

Protein sequence consensus Logo (See Attachment):

Figure 3

Protein Phylogeny Tree (See Attachment):

Figure 4

Phenol 2-monooxygenase

DNA Multiple Sequence Alignment (See Attachment):

Figure 6

DNA nucleotide sequence consensus logo (See Attachment):

Figure 7

Protein Multiple Sequence Alignment (See Attachment):

Figure 10

Protein sequence consensus Logo (See Attachment):

Figure 11

Propane Monooxygenase

Figure 15

Protein Multiple Sequence Alignment (See Attachment):

Figure 18

Protein sequence consensus Logo (See Attachment):

Figure 19

Protein Phylogeny Tree (See Attachment):

Figure 20

Multi-component monooxygenase

DNA Multiple Sequence Alignment (See Attachment):

Figure 22

DNA nucleotide sequence consensus logo (See Attachment):

Figure 23

DNA Phylogeny Tree (See Attachment):

Figure 24

Protein Multiple Sequence Alignment (See Attachment):

Figure 26

Protein sequence consensus Logo (See Attachment):

Figure 27

Protein Phylogeny Tree (See Attachment):

Figure 28

Alpha Subunit

DNA Multiple Sequence Alignment (See Attachment):

Figure 30

DNA nucleotide sequence consensus logo (See Attachment):

Figure 31

DNA Phylogeny Tree (See Attachment):

Figure 32

Protein Multiple Sequence Alignment (See Attachment):

Figure 34

Protein sequence consensus Logo (See Attachment):

Figure 35

Protein Phylogeny Tree (See Attachment):

Figure 36

Beta Subunit

DNA Multiple Sequence Alignment (See Attachment):

Figure 38

DNA nucleotide sequence consensus logo (See Attachment):

Figure 39

DNA Phylogeny Tree (See Attachment):

Figure 40

Protein Multiple Sequence Alignment (See Attachment):

Figure 42

Protein sequence consensus Logo (See Attachment):

Figure 43

Protein Phylogeny Tree (See Attachment):

Figure 44

Reductase

DNA Multiple Sequence Alignment (See Attachment):

Figure 46

DNA nucleotide sequence consensus logo (See Attachment):

Figure 47

DNA Phylogeny Tree (See Attachment):

Figure 48

Protein Multiple Sequence Alignment (See Attachment):

Figure 50

Protein sequence consensus Logo (See Attachment):

Figure 51

Protein Phylogeny Tree (See Attachment):

Figure 52

Multi-component Gene complex

DNA nucleotide sequence consensus logo (See Attachment):

Figure 54

DNA Phylogeny Tree (See Attachment):

Figure 55

Master Phylogenetic Tree

Figure 56

Figure 56

Master Sequence Collection

(Protein)

Propane Monooxygenase

>gi|323461805|dbj|BAJ76721.1| phenol and propane monooxygenase coupling protein [Mycobacterium goodii]

>gi|38678096|dbj|BAD03959.1| propane monooxygenase coupling protein [Gordonia sp. TY-5]

>gi|115511403|dbj|BAF34311.1| propane monooxygenase coupling protein [Pseudonocardia sp. TY-7]

Alcohol Dehydrogenase

>gi|503437687|ref|WP_013672348.1| alcohol dehydrogenase [Pseudonocardia dioxanivorans]

>gi|739178298|ref|WP_037042195.1| alcohol dehydrogenase [Pseudonocardia autotrophica]

>gi|502712806|ref|WP_012947904.1| alcohol dehydrogenase [Geodermatophilus obscurus]

>gi|655587148|ref|WP_028934354.1| alcohol dehydrogenase [Pseudonocardia spinosispora]

>gi|655567920|ref|WP_028921610.1| alcohol dehydrogenase [Pseudonocardia acaciae]

>gi|657222806|ref|WP_029336517.1| alcohol dehydrogenase [Geodermatophilaceae bacterium URHB0048]

>gi|655577420|ref|WP_028928735.1| alcohol dehydrogenase [Pseudonocardia asaccharolytica]

>gi|664302843|ref|WP_030832244.1| alcohol dehydrogenase [Streptomyces hygroscopicus]

>gi|739374752|ref|WP_037235711.1| alcohol dehydrogenase [Rhodococcus wratislaviensis]

>gi|1005622338|ref|WP_061698619.1| alcohol dehydrogenase [Rhodococcus sp. LB1]

>gi|522114292|ref|WP_020625501.1| alcohol dehydrogenase [Pseudonocardia sp. P2]

>gi|983563300|ref|WP_060713791.1| alcohol dehydrogenase [Pseudonocardia sp. HH130629-09]

>gi|517141860|ref|WP_018330678.1| alcohol dehydrogenase [Actinomycetospora chiangmaiensis]

>gi|652459636|ref|WP_026854440.1| alcohol dehydrogenase [Geodermatophilaceae bacterium URHB0062]

Alpha-Subunit

>gi|10443292|emb|CAC10506.1| alpha-subunit of multicomponent tetrahydrofuran monooxygenase [Pseudonocardia tetrahydrofuranoxydans]

>gi|338794148|gb|AEI99544.1| tetrahydrofuran monooxygenase oxygenase component alpha subunit [Pseudonocardia sp. ENV478]

>gi|193888337|gb|ACF28534.1| multicomponent tetrahydrofuran-degrading monooxygenase alhpa-subnit [Rhodococcus sp. YYL]

>gi|975830114|dbj|BAU36821.1| soluble di-iron monooxygenase alpha subunit, partial [Rhodococcus ruber]

>gi|975830092|dbj|BAU36810.1| soluble di-iron monooxygenase alpha subunit, partial [Pseudonocardia dioxanivorans]

>gi|975830110|dbj|BAU36819.1| soluble di-iron monooxygenase alpha subunit, partial [Pseudonocardia sp. D17]

Beta-Subunit

>gi|315936315|gb|ADU55885.1| ThmB [Rhodococcus sp. YYL]

>gi|10443295|emb|CAC10509.1| beta-subunit of multicomponent tetrahydrofuran monooxygenase [Pseudonocardia tetrahydrofuranoxydans]

>gi|338794150|gb|AEI99546.1| tetrahydrofuran monooxygenase oxygenase component beta subunit [Pseudonocardia sp. ENV478]

>gi|326955346|gb|AEA29039.1| methane/phenol/toluene hydroxylase (plasmid) [Pseudonocardia dioxanivorans CB1190]

Monooxygenase component

>gi|503969935|ref|WP_014203929.1| monooxygenase component MmoB/DmpM [Pseudonocardia dioxanivorans]

>gi|10443296|emb|CAC10510.1| regulatory protein of multicomponent tetrahydrofuran monooxygenase [Pseudonocardia tetrahydrofuranoxydans]

>gi|315936316|gb|ADU55886.1| ThmC [Rhodococcus sp. YYL]

>gi|338794151|gb|AEI99547.1| tetrahydrofuran monooxygenase coupling protein [Pseudonocardia sp. ENV478]

Phen-2 Monooxygenase

>gi|326949330|gb|AEA23027.1| Phenol 2-monooxygenase [Pseudonocardia dioxanivorans CB1190]

>gi|93354574|gb|ABF08663.1| Phenol hydroxylase P3 protein (Phenol 2-monooxygenase P3 component) [Cupriavidus metallidurans CH34]

>gi|187728549|gb|ACD29713.1| methane/phenol/toluene hydroxylase [Ralstonia pickettii 12J]

Reductase

>gi|10443294|emb|CAC10508.1| reductase component of multicomponent terahydrofuran monooxygenase [Pseudonocardia tetrahydrofuranoxydans]

>gi|338794149|gb|AEI99545.1| tetrahydrofuran monooxygenase reductase component [Pseudonocardia sp. ENV478]

>gi|193888338|gb|ACF28535.1| multicomponent terahydrofuran-degrading monooxygenase reductase component [Rhodococcus sp. YYL]

>gi|375129130|ref|YP_004991225.1| Ferredoxin--NAD(+) reductase (plasmid) [Pseudonocardia dioxanivorans CB1190]

(DNA)

Gene Cluster

>gi|10443289|emb|AJ296087.1| Pseudonocardia sp. K1 ORF y, thmS gene, thma gene, ORF x, thmD gene, thmB gene, thm C gene, ORF Q, ORF Z and thm H gene

>gi|338794146|gb|HQ699618.1| Pseudonocardia sp. ENV478 tetrahydrofuran degradation gene cluster, complete sequence

>gi|315936312|gb|EU732588.2| Rhodococcus sp. YYL tetrahydrofuran-degrading gene cluster, partial sequence

>gb|CP002597.1|:28891-38076 Pseudonocardia dioxanivorans CB1190 plasmid pPSED02 genomic sequence

Alpha Subunit

>gi|10443289:2945-3326 Pseudonocardia sp. K1 ORF y, thmS gene, thma gene, ORF x, thmD gene, thmB gene, thm C gene, ORF Q, ORF Z and thm H gene

>gb|HQ699618.1|:3382-3763 Pseudonocardia sp. ENV478 tetrahydrofuran degradation gene cluster, complete sequence

>gb|EU732588.2|:2916-3297 Rhodococcus sp. YYL tetrahydrofuran-degrading gene cluster, partial sequence

>gi|975830109|dbj|LC114144.1| Pseudonocardia sp. D17 SDIMO gene for soluble di-iron monooxygenase alpha subunit, partial cds

>gi|975830113|dbj|LC114146.1| Rhodococcus ruber SDIMO gene for soluble di-iron monooxygenase alpha subunit, partial cds, strain: T5

>gb|CP002597.1|:31833-32214 Pseudonocardia dioxanivorans CB1190 plasmid pPSED02 genomic sequence

Beta Subunit

>gi|10443289:5108-6148 Pseudonocardia sp. K1 ORF y, thmS gene, thma gene, ORF x, thmD gene, thmB gene, thm C gene, ORF Q, ORF Z and thm H gene

>gb|HQ699618.1|:5547-6590 Pseudonocardia sp. ENV478 tetrahydrofuran degradation gene cluster, complete sequence

>gb|EU732588.2|:5083-6123 Rhodococcus sp. YYL tetrahydrofuran-degrading gene cluster, partial sequence

>gb|CP002597.1|:34003-35043 Pseudonocardia dioxanivorans CB1190 plasmid pPSED02 genomic sequence

Monooxygenase Component

>gb|CP002597.1|:35043-35396 Pseudonocardia dioxanivorans CB1190 plasmid pPSED02 genomic sequence

>gb|EU732588.2|:6123-6476 Rhodococcus sp. YYL tetrahydrofuran-degrading gene cluster, partial sequence

>gb|HQ699618.1|:6587-6940 Pseudonocardia sp. ENV478 tetrahydrofuran degradation gene cluster, complete sequence

>gi|10443289:6148-6501 Pseudonocardia sp. K1 ORF y, thmS gene, thma gene, ORF x, thmD gene, thmB gene, thm C gene, ORF Q, ORF Z and thm H gene

Reductase

>gi|10443289:3995-5077 Pseudonocardia sp. K1 ORF y, thmS gene, thma gene, ORF x, thmD gene, thmB gene, thm C gene, ORF Q, ORF Z and thm H gene

>gb|HQ699618.1|:4434-5516 Pseudonocardia sp. ENV478 tetrahydrofuran degradation gene cluster, complete sequence

>gb|EU732588.2|:3964-5052 Rhodococcus sp. YYL tetrahydrofuran-degrading gene cluster, partial sequence

>gi|375129105:32884-33972 Pseudonocardia dioxanivorans CB1190 plasmid pPSED02, complete sequence

Phen-2 Monooxygenase

>gb|CP002593.1|:820783-822312 Pseudonocardia dioxanivorans CB1190, complete genome

>gb|CP000352.1|:1935574-1937121 Cupriavidus metallidurans CH34, complete genome

>gb|CP001069.1|:954525-956069 Ralstonia pickettii 12J chromosome 2, complete sequence

Propane Monooxygenase

>gi|115511399|dbj|AB250942.1| Pseudonocardia sp. TY-7 prm2A, prm2B, prm2C, prm2D genes for propane monooxygenase

>gi|38678092|dbj|AB112920.1| Gordonia sp. TY-5 prmA, prmB, prmC, prmD, orf1, orf2, adh1, orf3 genes for propane monooxygenase

>gi|323461801|dbj|AB568291.1| Mycobacterium goodii mimA, mimB, mimC, mimD genes, complete cds, strain: 12523