Dropboxes
=========
Jython Dropboxes
----------------
### Introduction
The jython dropbox feature makes it possible for a script written in the
Python language to control the data set registration process of the
openBIS Data Store Server. A script can modify the files in the dropbox
and register data sets, samples, and experiments as part of its
processing. The framework provides tools to track file operations and,
if necessary, revert them, ensuring that the incoming file or directory
is returned to its original state in the event of an error.
By default, Jython 2.5 is used, but it is possible to use Jython 2.7
instead.
Dropboxes are DSS core plugins; see [Core Plugins](https://openbis.readthedocs.io/en/latest/software-developer-documentation/server-side-extensions/core-plugins.html).
### Simple Example
Here is an example that registers files that arrive in the drop box as
data sets. They are explicitly attached to the experiment "JYTHON" in
the project "TESTPROJ" and space "TESTGROUP".
**data-set-handler-basic.py**
```py
def process(transaction):
    # Create a data set
    dataSet = transaction.createNewDataSet()
    # Reference the incoming file that was placed in the dropbox
    incoming = transaction.getIncoming()
    # Add the incoming file into the data set
    transaction.moveFile(incoming.getAbsolutePath(), dataSet)
    # Get an experiment for the data set
    exp = transaction.getExperiment("/TESTGROUP/TESTPROJ/JYTHON")
    # Set the owner of the data set -- the specified experiment
    dataSet.setExperiment(exp)
```
This example is unrealistically simple, but it contains all the elements
necessary to implement a jython dropbox. The main idea is to perform
several operations on the data and metadata within the bounds of a
transaction. The transaction tracks the changes made so that they can be
executed together, or all reverted if a problem occurs.
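The transaction does this bookkeeping for you. Purely to illustrate the idea, here is a minimal sketch in plain Python of a journal that records file moves and undoes them in reverse order if a later step fails. `MoveJournal`, `register` and the `register_metadata` callback are hypothetical names, not part of the openBIS API, and the sketch makes no claim about how the DSS implements this internally.

```python
import os
import shutil


class MoveJournal(object):
    """Hypothetical helper: records file moves so they can be undone."""

    def __init__(self):
        self._moves = []  # (source, destination) pairs, in the order performed

    def move(self, src, dst):
        shutil.move(src, dst)
        self._moves.append((src, dst))

    def rollback(self):
        # Undo in reverse order so the most recent move is reverted first
        while self._moves:
            src, dst = self._moves.pop()
            shutil.move(dst, src)


def register(journal, incoming, store_dir, register_metadata):
    """Move `incoming` into `store_dir`, then register its metadata.

    If metadata registration raises, every recorded move is reverted and the
    exception is re-raised, so the incoming file ends up where it started.
    """
    target = os.path.join(store_dir, os.path.basename(incoming))
    try:
        journal.move(incoming, target)
        register_metadata(target)
    except Exception:
        journal.rollback()
        raise
    return target
```

The all-or-nothing behaviour comes from the `try`/`except`: either both steps succeed, or the file system is put back in its original state before the error propagates.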
### More Realistic Example
The example above demonstrates the concept. In general, however, we want
to determine and specify the experiment or sample for a data set, and to
set the data set type explicitly as well.
In this example, we handle a usage scenario where there is one
experiment done every day. All data produced on a single day is
associated with the experiment for that date. If the experiment for a
given day does not exist, it is created.
**data-set-handler-experiment-reg.py**
```py
from datetime import datetime

def process(transaction):
    # Try to get the experiment for today
    today = datetime.today()
    now_str = today.strftime('%Y%m%d')
    expid = "/TESTGROUP/TESTPROJ/" + now_str
    exp = transaction.getExperiment(expid)
    # Create the experiment if it does not exist yet
    if exp is None:
        exp = transaction.createNewExperiment(expid, "COMPOUND_HCS")
        exp.setPropertyValue("DESCRIPTION", "An experiment created on " + today.strftime('%Y-%m-%d'))
        exp.setPropertyValue("COMMENT", now_str)
    # Put the incoming file or folder into a new HCS_IMAGE data set
    dataSet = transaction.createNewDataSet()
    incoming = transaction.getIncoming()
    transaction.moveFile(incoming.getAbsolutePath(), dataSet)
    dataSet.setDataSetType("HCS_IMAGE")
    dataSet.setExperiment(exp)
```
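The get-or-create step is easy to get wrong, so it can pay to factor it into a small helper and exercise it outside the DSS. The sketch below relies only on the two transaction methods used in the example above (`getExperiment`, `createNewExperiment`). `daily_experiment_id`, `get_or_create_experiment` and `StubTransaction` are hypothetical names; the stub merely mimics the behaviour the example depends on, namely that `getExperiment` returns `None` for an unknown experiment.

```python
from datetime import date


def daily_experiment_id(day, prefix="/TESTGROUP/TESTPROJ/"):
    """Build the experiment identifier for one day, e.g. /TESTGROUP/TESTPROJ/20240131."""
    return prefix + day.strftime('%Y%m%d')


def get_or_create_experiment(transaction, expid, experiment_type):
    """Return (experiment, created); create the experiment only if it is missing."""
    exp = transaction.getExperiment(expid)
    if exp is not None:
        return exp, False
    return transaction.createNewExperiment(expid, experiment_type), True


class StubTransaction(object):
    """Hypothetical test double for getExperiment/createNewExperiment.

    It is NOT the openBIS transaction; experiments are plain dicts here.
    """

    def __init__(self):
        self.experiments = {}
        self.created = []

    def getExperiment(self, expid):
        # Like the call in the example, return None for unknown experiments
        return self.experiments.get(expid)

    def createNewExperiment(self, expid, experiment_type):
        exp = {"id": expid, "type": experiment_type}
        self.experiments[expid] = exp
        self.created.append(expid)
        return exp
```

Inside `process`, the helper could replace the inline lookup with `exp, created = get_or_create_experiment(transaction, daily_experiment_id(date.today()), "COMPOUND_HCS")`, followed by the `setPropertyValue` calls only when `created` is true.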
More complex processing is also possible. In the following sections, we
explain how to configure a jython dropbox and describe the API in
greater detail.
### Model
The model underlying dropbox registration is the following: when a new
file or folder is found in the dropbox folder, the process function of
the script file is invoked with a [data set registration transaction](./dss-dropboxes.md#idatasetregistrationtransaction) as an argument.
The process function has the responsibility of looking at the incoming
file or folder and determining what needs to be registered or modified
in the metadata database and what data needs to be stored on the file
system. The
[IDataSetRegistrationTransaction](https://openbis.readthedocs.io/en/latest/software-developer-documentation/server-side-extensions/dss-dropboxes.html#idatasetregistrationtransaction) interface
defines the API for specifying entities to register and update.
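As an illustration of the "look at the incoming file and decide" step, the sketch below picks a data set type from the incoming name. The extension-to-type mapping, the `UNKNOWN` fallback and the name `choose_data_set_type` are hypothetical; the type codes must exist in your openBIS instance. It also assumes that `getIncoming()` returns a `java.io.File` (the examples above call `getAbsolutePath()` on it), so `getName()` and `isDirectory()` are available.

```python
import os

# Hypothetical mapping from file extension to data set type code.
# HCS_IMAGE is the type used in the examples above; UNKNOWN is an
# assumed fallback that must exist in your openBIS instance.
EXTENSION_TO_TYPE = {
    ".tif": "HCS_IMAGE",
    ".tiff": "HCS_IMAGE",
}
FALLBACK_TYPE = "UNKNOWN"


def choose_data_set_type(name, is_directory):
    """Decide the data set type from the incoming file or folder name."""
    if is_directory:
        # Folders are not inspected in this sketch
        return FALLBACK_TYPE
    extension = os.path.splitext(name)[1].lower()
    return EXTENSION_TO_TYPE.get(extension, FALLBACK_TYPE)


def process(transaction):
    incoming = transaction.getIncoming()
    dataSet = transaction.createNewDataSet()
    dataSet.setDataSetType(
        choose_data_set_type(incoming.getName(), incoming.isDirectory()))
    transaction.moveFile(incoming.getAbsolutePath(), dataSet)
```

Keeping the decision in a plain function that only takes strings and booleans means it can be checked without a running DSS.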
Committing a transaction is actually a two-part process. The metadata is
stored in the openBIS application server's database; the data is kept on
the file system in a sharded directory structure beneath the data store
server's *store* directory. All modifications requested as part of a
transaction are committed atomically — they either all succeed or all
fail.
Several [Events](https://openbis.readthedocs.io/en/latest/software-developer-documentation/server-side-extensions/dss-dropboxes.html#events-registration-process-hooks) occur in the process of committing a transaction. By defining jython functions, it is possible to be notified
and intervene when an event occurs. Because the infrastructure reserves
the right to delay or retry actions if resources become unavailable, the
process function and event functions cannot use global variables to
communicate with each other. Instead, they should use the registration
context object to communicate. Anything stored in the registration
context must, however, be serializable by Java serialization.
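A minimal sketch of passing a value from `process` to an event hook through the registration context instead of a global variable follows. The accessor names `getRegistrationContext()` and `getPersistentMap()` and the hook name `post_metadata_registration(context)` reflect our reading of the V2 API and are assumptions; check them against the Events documentation before relying on them. The stub classes are hypothetical and only exist so the sketch can run outside the DSS; they assume the hook receives the same context object that the transaction exposes.

```python
# ASSUMPTION: getRegistrationContext(), getPersistentMap() and the hook name
# post_metadata_registration(context) follow the V2 API; verify before use.

CONTEXT_KEY = "incoming-name"  # hypothetical key


def process(transaction):
    incoming = transaction.getIncoming()
    # Store only simple values (strings, numbers) so they remain serializable
    context_map = transaction.getRegistrationContext().getPersistentMap()
    context_map.put(CONTEXT_KEY, str(incoming.getName()))


def post_metadata_registration(context):
    name = context.getPersistentMap().get(CONTEXT_KEY)
    # Returned only so this sketch can be checked offline
    return "Registered metadata for " + str(name)


# --- Hypothetical stand-ins for running the sketch outside the DSS ---
class StubMap(object):
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class StubContext(object):
    def __init__(self):
        self._map = StubMap()

    def getPersistentMap(self):
        return self._map


class StubFile(object):
    def __init__(self, name):
        self._name = name

    def getName(self):
        return self._name


class StubTransaction(object):
    def __init__(self, incoming_name):
        self._incoming = StubFile(incoming_name)
        self._context = StubContext()

    def getIncoming(self):
        return self._incoming

    def getRegistrationContext(self):
        return self._context
```

Because the hook reads the value from the context rather than from module state, it still works if the infrastructure runs the hook in a later or retried invocation.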
Details
-------
### Dropbox Configuration
A jython dropbox is typically distributed as a [core
plugin](https://openbis.readthedocs.io/en/latest/software-developer-documentation/server-side-extensions/core-plugins.html) and configured in its
plugin.properties file. The jython script is usually kept in the same
directory as plugin.properties. At a minimum, the configuration names the
incoming directory, the handler class, the script, and a storage
processor (a full path to the script is not necessary if it is in the
same directory as plugin.properties). Here is an example configuration
for a dropbox that uses the jython handler.
**plugin.properties**
```
#
# REQUIRED PARAMETERS
#
# The directory to watch for new data sets
incoming-dir = ${root-dir}/incoming-jython
# The handler class. Must be either ch.systemsx.cisd.etlserver.registrator.api.v2.JythonTopLevelDataSetHandlerV2 or a subclass thereof
top-level-data-set-handler = ch.systemsx.cisd.etlserver.registrator.api.v2.JythonTopLevelDataSetHandlerV2
# The script to execute, reloaded and recompiled each time a file/folder is placed in the dropbox
script-path = ${root-dir}/data-set-handler.py
# The appropriate storage processor
storage-processor = ch.systemsx.cisd.etlserver.DefaultStorageProcessor
# Specify jython version. Default is whatever is specified in datastore server service.properties under property "jython-version"
plugin-jython-version=2.5
#
# OPTIONAL PARAMETERS
#
# False if incoming directory is assumed to exist.
# Default - true: Incoming directory will be created on start up if it doesn't exist.
incoming-dir-create = true
# Defines how the dropbox decides whether a folder is ready to be processed: either by a 'marker-file' or by a timeout, called 'auto-detection'.
# The timeout is set globally in service.properties as 'quiet-period' (in seconds): once that many seconds have passed without any change
# to the incoming folder, the dropbox starts the registration. The marker file must be named '.MARKER_is_finished_<incoming_folder_name>'.
incoming-data-completeness-condition = marker-file
# Defines whether the dropbox should handle .h5 archives as folders (true) or as files (false). Default is true.
h5-folders = true
# Defines whether the dropbox should handle .h5ar archives as folders (true) or as files (false). Default is true.
h5ar-folders = true
```
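With `incoming-data-completeness-condition = marker-file`, whatever delivers data to the dropbox has to create the marker after the data is complete. The sketch below copies a file or folder into the incoming directory and only then creates an empty `.MARKER_is_finished_<name>` file, following the naming schema given in the comments above. It assumes the marker is expected next to the data in the incoming directory; `drop_with_marker` is a hypothetical helper name.

```python
import os
import shutil

MARKER_PREFIX = ".MARKER_is_finished_"


def drop_with_marker(source, incoming_dir):
    """Copy a file or folder into the dropbox, then create its marker file.

    The marker is written last, so a dropbox configured with
    incoming-data-completeness-condition = marker-file never sees
    a half-copied folder as ready.
    """
    name = os.path.basename(os.path.normpath(source))
    target = os.path.join(incoming_dir, name)
    if os.path.isdir(source):
        shutil.copytree(source, target)
    else:
        shutil.copy2(source, target)
    marker = os.path.join(incoming_dir, MARKER_PREFIX + name)
    open(marker, "w").close()  # an empty file is enough
    return target, marker
```

The ordering is the whole point: if the marker were created first, the dropbox could start registering before the copy finished.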
#### Development mode
Set the property `development-mode = true` in your dropbox to get a quick
feedback loop while developing it. By default, dropboxes run a complex
auto-recovery mechanism that, on errors, waits and retries the
registration several times. This helps with short network outages and
other transient problems, but it also means that a long time can pass
between the dropbox's attempt to register something and the actual
error report. During development, you need quick feedback on whether
your dropbox does what it should. Therefore, enable development mode
while you are modifying your script, and remember to set it back when
you are done.
#### Jython version
Set the property `plugin-jython-version=2.7` in your dropbox's
plugin.properties to change the default Jython version for that single
dropbox. The available versions are 2.5 and 2.7.
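A script that has to run under both versions must stay within the Python 2.5 language level. For example, in 2.5 the `with` statement is only available after `from __future__ import with_statement`, and set literals and dict or set comprehensions only arrive in 2.7. The helpers below are a hypothetical sketch written that way: `try`/`finally` instead of `with`, and `dict()` over a generator expression instead of a dict comprehension.

```python
import os


def count_lines(path):
    """Count the lines of a text file using only Python 2.5 syntax."""
    handle = open(path)
    try:
        return len(handle.readlines())
    finally:
        # Close the file even if reading fails (no `with` needed)
        handle.close()


def sizes_by_name(directory):
    """Map each entry name in `directory` to its size in bytes."""
    # dict() over a generator works in 2.5; {k: v for ...} needs 2.7
    return dict((name, os.path.getsize(os.path.join(directory, name)))
                for name in os.listdir(directory))
```

If a dropbox only ever runs on 2.7, the more idiomatic `with` statement and comprehensions can be used directly.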
Jython API
----------