The National Institutes of Health National Library of Medicine integrates and distributes the Unified Medical Language System (UMLS). UMLS is a set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems. Unfortunately, UMLS is not distributed using a semantic specification like RDF or OWL.
The OntoPortal Virtual Appliance supports OBO and OWL ontology formats, but does not support UMLS in its native form. To bridge this gap, we have developed a project called UMLS2RDF that transforms UMLS ontologies into OWL/RDF.
There is no automated way to convert UMLS content to RDF. You must have your own UMLS MySQL installation and an OSX/Linux/Unix machine with 8GB+ of RAM in order for the conversion process to work.
The scripts to convert UMLS to RDF are available on GitHub (see below). Once you have converted UMLS to RDF, you will get Turtle (.ttl) files that can be uploaded using the BioPortal Web UI. Please select UMLS as the format for these ontologies.
2 Install UMLS2RDF
3 Configure UMLS2RDF
4 Run UMLS2RDF
5 Upload files to the NCBO Virtual Appliance
6 Hardware Considerations
7 Example Workflow
7.1 Install UMLS using mmsys
7.2 Load subset in MySQL
7.3 Generate RDF from MySQL
To import UMLS ontologies, a local installation of the UMLS MySQL release needs to be available. Please refer to the UMLS documentation for instructions on how to install the UMLS MySQL distribution.
umls2rdf is a Python script that connects to a UMLS MySQL installation and extracts the UMLS ontologies in a format that the Appliance can work with.
First, clone the GitHub project:
git clone https://github.com/ncbo/umls2rdf/
Install the MySQL Python driver. We recommend using pip for this:
pip install MySQL-python
umls2rdf has two configuration files:
conf.py, where the database configuration (host, name, user, and password) and the output folder are specified.
umls.conf, where you specify the UMLS ontologies to be extracted. This is a comma-separated file with the following 4 fields:
! Only 3 fields specified above?
In our configuration file, you can see the settings used by our production system. These are all the UMLS ontologies that are publicly available in BioPortal.
Once the configuration files contain your settings, run the command:
Depending on how many ontologies are extracted, the run time can range from a few minutes to four hours. This process is memory intensive: transforming the largest UMLS ontologies (in particular, SNOMED) requires at least 16 GB of available RAM.
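Because the memory requirement is easy to overlook, it can help to check the machine before starting a long run. The following is a minimal sketch, not part of umls2rdf: the function names and the 16 GiB threshold constant are illustrative, and it assumes a Linux or Unix system where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`. It reports total installed memory, which is only an upper bound on what is actually free.

```python
import os

# Threshold taken from the text above; adjust for smaller ontologies.
REQUIRED_RAM_GIB = 16


def total_ram_gib():
    """Return total physical memory in GiB (assumes a Linux/Unix host)."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / 1024 ** 3


def has_enough_ram(required_gib, installed_gib):
    """Compare installed memory against the required amount."""
    return installed_gib >= required_gib


if __name__ == "__main__":
    installed = total_ram_gib()
    if has_enough_ram(REQUIRED_RAM_GIB, installed):
        print("Installed RAM: %.1f GiB, looks sufficient." % installed)
    else:
        print("Installed RAM: %.1f GiB, the largest ontologies may fail." % installed)
```

Note that a machine sold as "16 GB" often reports slightly less than 16 GiB, so treat a narrow miss as a warning rather than a hard failure.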
The output files will be located in the folder specified in
To submit the extracted ontologies,
use the OntoPortal Web form available in your appliance,
or other upload approaches described in
IMPORTANT: The specified ontology format in the submission process should be
The BioPortal system dedicates powerful servers to handle many of the UMLS ontologies; some of the ontologies contain millions of classes. To import the largest UMLS ontologies (e.g., RXNORM or SNOMEDCT) users will have to run the Appliance in a powerful dedicated environment with 8GB RAM and 5GB hard disk space available.
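The disk space figure can be checked in a similar way before uploading. This is a small sketch under stated assumptions: the helper names are hypothetical, the 5 GiB default mirrors the figure in the paragraph above, and the path you pass should be on the volume the Appliance actually writes to, which this page does not specify.

```python
import shutil

# Minimum free space suggested in the text above.
REQUIRED_FREE_GIB = 5


def free_disk_gib(path):
    """Return free space in GiB on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_enough_disk(path, required_gib=REQUIRED_FREE_GIB):
    """True if the filesystem containing `path` has the required free space."""
    return free_disk_gib(path) >= required_gib


if __name__ == "__main__":
    # "." is a placeholder; point this at the Appliance's data volume.
    print("Free space: %.1f GiB" % free_disk_gib("."))
    print("Enough for large UMLS ontologies:", has_enough_disk("."))
```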
This workflow for importing UMLS data has been provided by Vincent Emonet. It is provided here without testing by the OntoPortal team. Descriptive comments throughout this section are provided by Vincent.
This offers more details on how to generate UMLS Turtle files. It is simplified to provide the essentials needed to generate the RDF files you want. Note that I am using Linux and the 2015AB release for this tutorial.
>create database umls2015ab;
>ALTER DATABASE umls2015ab CHARACTER SET utf8 COLLATE utf8_unicode_ci;
MYSQL_HOME=/usr
user=<username>
password=<password>
db_name=umls2015ab
#Folder to dump the RDF files.
OUTPUT_FOLDER = "output"
#DB Config
DB_HOST = "localhost"
DB_NAME = "umls2015ab"
DB_USER = "root"
DB_PASS = "<password>"
UMLS_VERSION = "2015ab"
UMLS_BASE_URI = "http://purl.bioontology.org/ontology/"
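After the conversion finishes, a quick look at the output folder confirms that Turtle files were actually produced before you upload them. The sketch below is a standalone helper, not part of umls2rdf; the function name is hypothetical, and the default folder simply mirrors the `OUTPUT_FOLDER = "output"` setting shown above.

```python
from pathlib import Path


def list_turtle_files(output_folder="output"):
    """Return (file name, size in bytes) for each .ttl file, sorted by name.

    Returns an empty list if the folder is missing or holds no .ttl files.
    """
    folder = Path(output_folder)
    return sorted((p.name, p.stat().st_size) for p in folder.glob("*.ttl"))


if __name__ == "__main__":
    files = list_turtle_files("output")
    if not files:
        print("No .ttl files found; check OUTPUT_FOLDER in conf.py.")
    for name, size in files:
        print("%s\t%.1f MiB" % (name, size / 1024 ** 2))
```

An empty or very small file usually means the ontology was not present in the loaded MySQL subset, so it is worth checking before submitting.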