Introduction
This is an advanced reference document on the ncbo-cron utility.
Debug Cron Job
- The code is deployed to /srv/ontoportal/ncbo_cron.
- There is a log for the scheduler under logs/scheduler.log.
- Each ontology gets its own log under the ontology’s repo folder, for example /srv/ontoportal/data/repository/STY/1/STY_parsing.log.
- This file name is output to the scheduler.log file when parsing starts.
- Useful log file greps:
# get a list of new ontology submissions
grep 'new submission' /srv/ontoportal/ncbo_cron/logs/scheduler-pull.log
# get a list of errors
grep 'ERROR' /srv/ontoportal/ncbo_cron/logs/scheduler-pull.log
# get a list of errors with 20 lines of traceback for each
grep -A20 'ERROR' /srv/ontoportal/ncbo_cron/logs/scheduler-pull.log
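The greps above can be combined into a quick summary. The following is a minimal sketch, not part of ncbo_cron: the function name summarize_log is hypothetical, and it only counts lines matching the same two patterns used above.

```shell
# Summarize a scheduler log: count 'new submission' and 'ERROR' lines.
# Usage: summarize_log /srv/ontoportal/ncbo_cron/logs/scheduler-pull.log
summarize_log() {
  log_file="$1"
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 1
  fi
  # grep -c prints the number of matching lines; it exits non-zero when the
  # count is 0, so '|| true' keeps the function usable under 'set -e'.
  new_count=$(grep -c 'new submission' "$log_file" || true)
  err_count=$(grep -c 'ERROR' "$log_file" || true)
  echo "new submissions: $new_count"
  echo "errors: $err_count"
}
```

Passing the log path as an argument lets the same function be pointed at either the scheduler log or an individual ontology parsing log.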
Check processing queue
To show cron queued jobs: bundle exec bin/ncbo_cron --view-queue
Alternatively, you can check the parseQueue key directly in Redis:
redis-cli hgetall parseQueue
1) "sub:http://data.bioontology.org/ontologies/STY/submissions/1"
2) "{\"process_rdf\":true,\"index_search\":true,\"index_properties\":true,\"run_metrics\":true,\"process_annotator\":true,\"diff\":true}"
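For scripting, the hash can be post-processed with awk. This is a rough sketch under two assumptions: that redis-cli is piped (it then prints raw output, with field and value on alternating lines and no numbering), and that fields carry the sub: prefix shown above. The function names are hypothetical.

```shell
# Read raw 'redis-cli hgetall parseQueue' output on stdin and print one
# submission ID per line (odd lines are hash fields, 'sub:' prefix removed).
queued_submission_ids() {
  awk 'NR % 2 == 1 { sub(/^sub:/, ""); print }'
}

# Count queue entries whose JSON options request RDF processing
# (even lines are the hash values).
count_process_rdf() {
  awk 'NR % 2 == 0 && /"process_rdf":true/ { n++ } END { print n + 0 }'
}

# Typical use on a host that can reach the Redis instance:
#   redis-cli hgetall parseQueue | queued_submission_ids
#   redis-cli hgetall parseQueue | count_process_rdf
#   redis-cli hlen parseQueue    # number of queued submissions
```

The functions read stdin rather than calling redis-cli themselves, so they can be tried against saved output before being used on a live queue.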
Run ncbo-cron jobs
Default production job
The production scheduler start command is something like:
bin/ncbo_cron -d -c "30 */4 * * *" -h "ncboprod-redis1" -l debug -f "00 17 * * 4" -w "00 * * * *"
This does the following operations:
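- The ontology pull (-c) runs every 4 hours, at 30 minutes past the hour.
- The flush of archived class graphs (-f) runs at 5 pm on day-of-week 4, which is Thursday in standard cron numbering (Sunday is 0); use 5 for Friday.
- The warm-up of long-running queries (-w) runs every hour, on the hour.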
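To double-check a schedule string before passing it to -c, -f, or -w, it can help to split it into its five fields. The helpers below are an illustrative sketch with hypothetical names; they assume standard five-field cron syntax and do not validate ranges or steps.

```shell
# Split a five-field cron expression into labeled fields.
# 'set -f' disables pathname expansion so literal '*' fields survive
# word splitting; the subshell keeps that setting local.
describe_cron() {
  (
    set -f
    set -- $1
    if [ "$#" -ne 5 ]; then
      echo "expected 5 fields, got $#" >&2
      exit 1
    fi
    echo "minute=$1 hour=$2 day-of-month=$3 month=$4 day-of-week=$5"
  )
}

# Map a numeric day-of-week field to a name.
# 0 is Sunday; common cron implementations also accept 7 for Sunday.
cron_day_name() {
  case "$1" in
    0|7) echo Sunday ;;
    1) echo Monday ;;
    2) echo Tuesday ;;
    3) echo Wednesday ;;
    4) echo Thursday ;;
    5) echo Friday ;;
    6) echo Saturday ;;
    '*') echo "every day" ;;
    *) echo "unrecognized: $1" ;;
  esac
}
```

For example, describe_cron "00 17 * * 4" labels the last field as day-of-week=4, and cron_day_name 4 prints Thursday.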
See below for additional ncbo-cron invocations.
By default, ncbo_cron will not process UMLS ontologies. To enable UMLS processing, use --enable-umls. This option can trigger heavy parsing, so it should be used with care.
More information about running ncbo-cron
# See script to run the scheduler as a daemon in the bin folder:
/var/lib/ncbo-deployer/ncbo_cron/bin/ncbo_cron --help
#
# Only run it as the ncbo-deployer user.
# The option to stop the daemon is -k
# The startup invocation is more complicated; try searching the ncbo-deployer user’s history for any
# command with 'ncbo_cron -d' in it (that’s the daemonize option).
#
#
./bin/ncbo_cron --help
(LD) >> Using rdf store ncboprod-4store1:8080
(LD) >> Using search server at http://ncboprod-solr1.stanford.edu:8080/solr/
(LD) >> Using HTTP Redis instance at ncboprod-redis2:6380
(LD) >> Using Goo Redis instance at ncboprod-redis1:6380
(LD) >> Enable SPARQL monitoring with cube ncbo-cube1:1180
Using cube options in Goo {:host=>"ncbo-cube1", :port=>1180}
(AN) >> Using ANN Redis instance at ncboprod-redis1:6379
ncbo_cron - This will run a scheduled job for NCBO-related processing
Usage: ncbo_cron [-p port] [-P file] [-d] [-k]
ncbo_cron --help
-p, --port PORT Specify port
(default: )
-d, --daemon Daemonize mode
--log FILE Logfile for output
-k, --kill [PORT] Kill specified running daemons - leave blank to kill all.
-u, --user USER User to run as
-G, --group GROUP Group to run as
-P, --pid FILE save PID in FILE when using -d option
(default: ./bin/ncbo_cron.pid)
-h, --redis-host HOST redis host (for shared locking)
(default: localhost)
-p, --redis-port PORT redis port (for shared locking)
(default: 6379)
-m, --minutes MIN minutes between process queue checks (override seconds)
-s, --seconds SEC seconds between process queue checks
-c, --pull-cron SCHED cron schedule for ontology pull
-l, --log-level LEVEL set the log level (debug, info, error)
(default: info)
--disable-processing disable ontology processing
--disable-pull disable ontology pull
--disable-flush disable flush archive class graphs
-v, --view-queue view queued jobs
-a, --add-submission ID submission id to add to the queue
--console REPL for working with scheduler
-f FSCHED, Delete class graphs of archive submissions
--flush-old-graphs
-?, --help Display this usage information.
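The help output above gives ./bin/ncbo_cron.pid as the default PID file for daemon mode. A small check like the following can tell whether that daemon is still alive before starting or killing one. It is a sketch only: the function name is hypothetical, and it assumes the PID file contains a single numeric PID, which is not confirmed here.

```shell
# Report whether the process recorded in a PID file is alive.
# Assumption: the file holds one numeric PID. Run as the daemon's user,
# since signalling another user's process fails and looks like "not running".
daemon_status() {
  pid_file="${1:-./bin/ncbo_cron.pid}"
  if [ ! -r "$pid_file" ]; then
    echo "no pid file at $pid_file"
    return 1
  fi
  pid=$(tr -d '[:space:]' < "$pid_file")
  case "$pid" in
    ''|*[!0-9]*)
      echo "pid file does not contain a numeric PID"
      return 1
      ;;
  esac
  # kill -0 sends no signal; it only tests whether the process exists
  # and whether we are allowed to signal it.
  if kill -0 "$pid" 2>/dev/null; then
    echo "running (pid $pid)"
  else
    echo "not running (stale pid $pid)"
    return 1
  fi
}
```

Called without arguments from the ncbo_cron project directory, it checks the default PID file; pass a path if the daemon was started with -P.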
Example ncbo-cron Invocations
cd /var/lib/ncbo-deployer/ncbo_cron
# Then run commands like the following:
bin/ncbo_cron -a "http://data.bioontology.org/ontologies/WB-BT/submissions/88"
bin/ncbo_cron -d -c "30 */4 * * *" -h "ncboprod-redis1" -f "00 17 * * 4"
bin/ncbo_cron -d -c "30 */4 * * *" -h "ncboprod-redis1" -l debug
bin/ncbo_cron -d -c "30 */4 * * *" -h "ncboprod-redis1" -l debug -f "00 * * * *"
bin/ncbo_cron -d -c "30 */4 * * *" -h "ncboprod-redis1" -l debug -f "00 17 * * 4"
bin/ncbo_cron -d -c "30 */4 * * *" -h "ncboprod-redis1" -l debug -f "00 17 * * 4" -m 1
bin/ncbo_cron -d -c "30 */4 * * *" -h "ncboprod-redis1" -l debug -f "01 * * * *"
bin/ncbo_cron -d -c "30 */4 * * *" -h "ncboprod-redis1" -l debug -f "10 * * * *"
bin/ncbo_cron -d -c "30 */4 * * *" -h "ncboprod-redis1" -l debug -f "21 * * * *"
bin/ncbo_cron -d -c "30 */4 * * *" -h "ncboprod-redis1" -l debug -f "30 * * * *"
bin/ncbo_cron -d -c "30 */4 * * *" -h "ncboprod-redis1" -l debug -f "39 * * * *"
bin/ncbo_cron -d -c "30 */4 * * *" -h "ncboprod-redis1" -l debug -f "41 * * * *"
bin/ncbo_cron -d -c "30 */4 * * *" -h "ncboprod-redis1" -l debug -f "43 * * * *"
bin/ncbo_cron -d -c "30 */4 * * *" -h "ncboprod-redis1" -l debug -f "45 17 11 10 *"
bin/ncbo_cron -d -c "30 */4 * * *" -h "ncboprod-redis1" -l debug -f "55 * * * *"
bin/ncbo_cron -d -c "30 */4 * * *" -m 1 -h ncbo-stg-app-21 -l debug
bin/ncbo_cron -d -c "57 * * * *" -h "ncboprod-redis1" -l debug
bin/ncbo_cron -d --disable-processing --disable-pull -h "ncboprod-redis1" -l debug -f "38 21 * * 5"
bin/ncbo_cron -d --disable-pull -h "ncboprod-redis1" -l debug -f "41 7 * * 5"
bin/ncbo_cron -d --disable-pull -h "ncboprod-redis1" -l debug -f "44 19 * * 5"
bin/ncbo_cron -h
bin/ncbo_cron -h "ncboprod-redis1" --console
bin/ncbo_cron -k
bin/ncbo_cron -q
bin/ncbo_cron -q -h ncboprod-redis1
bin/ncbo_cron -q -h ncboprod-redis1 | grep process_rdf | wc -l
bin/ncbo_cron -q -h ncboprod-redis1 | wc -l
bin/ncbo_cron -v
bin/ncbo_cron -v -h ncboprod-redis1
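When several submissions need to be queued, the -a invocation above can be wrapped in a loop. The sketch below is hypothetical and not part of ncbo_cron; it defaults to a dry run that only prints the commands, and it assumes it is run from the ncbo_cron project directory.

```shell
# Read submission IDs from stdin (one per line) and queue each with -a.
# DRY_RUN defaults to 1, so commands are printed rather than executed;
# set DRY_RUN=0 to actually add the submissions to the queue.
queue_submissions() {
  while IFS= read -r id; do
    # Skip blank lines
    [ -n "$id" ] || continue
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf 'bin/ncbo_cron -a "%s"\n' "$id"
    else
      bin/ncbo_cron -a "$id"
    fi
  done
}

# Example (prints the command without running it):
#   echo "http://data.bioontology.org/ontologies/WB-BT/submissions/88" | queue_submissions
```

Keeping the dry run as the default makes it harder to queue a long list of submissions by accident.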
ncbo-cron scripts
These scripts should be considered experimental until this text is removed. Please use them at your own risk.
- Evaluate likely operational state of these scripts
Connecting to ncbo-cron
After connecting to your system with ssh, log in as ncbo-deployer and go to the ncbo_cron project.
sudo su - ncbo-deployer
cd /srv/ncbo/ncbo_cron/
For easy access to the above, start a screen session after the ssh connection, then use screen -r to reconnect to the same login session every time.
The ncbo-cron scripts
These scripts work on the most recent submission, which is not necessarily the latest ‘ready’ submission.
Read-only scripts
- Script to run diagnostics on all ontologies (most recent submissions)
# Creates output files in logs/submission_status*
./bin/ncbo_ontology_diagnostics.sh
- Report format for ontologies
./bin/ncbo_ontology_format -h
- Generic ontology inspector
! This script was used by the diagnostics script; it was a work in progress, as of March 2014.
./bin/ncbo_ontology_inspector -h
- Create/update a ticket for ontology submissions that fail to parse
- It is not clear whether this script is currently suitable.
- Log into ncbo_cron system (stage or prod) and run:
$ sudo su - ncbo-deployer
$ cd /srv/ncbo/ncbo_cron
$ bundle exec ./bin/ncbo_ontology_inspector -p submissionStatus -o {ONT_ACRONYM} --get_parsing_logs
Notes:
- the last option ‘--get_parsing_logs’ will retrieve the logs and post them to a JIRA issue automatically; the issue summary/title will be: “{ONT_ACRONYM} submission has ERROR_RDF”; the attachments and comments will indicate which submission fails to parse (and whether it is in stage or prod).
- the automated JIRA update will reopen a resolved or closed issue (it will not create a new issue every time an ontology has a submission that fails to parse).
- this option is not enabled by default in the daily diagnostics script; it could be, but if it were enabled it could duplicate data in JIRA every day until the ERROR_RDF status is fixed.
- Find the relevant JIRA issue by typing ‘submission has ERROR_RDF’ into the ‘Quick Search’ field at top right, i.e.
https://bmir-jira.stanford.edu/issues/?jql=summary%20~%20%22submission%20has%20ERROR_RDF%22
The issue will contain comments and parsing logs (reported by the Jenkins user).
Read-Write scripts
- Process an ontology
./bin/ncbo_ontology_process -h
- Annotate an ontology
./bin/ncbo_ontology_annotate_generate_cache -h
- Calculate and save the ontology index in Solr (see the option to index all ontologies)
./bin/ncbo_ontology_index -h
- Calculate and save ontology metrics
./bin/ncbo_ontology_metrics -h
- Generate a new annotation dictionary file
./bin/ncbo_ontology_annotate_generate_dictionary -h