defines the API for specifying entities to register and update.
Committing a transaction is actually a two-part process. The metadata is
stored in the openBIS application server's database; the data is kept on
the file system in a sharded directory structure beneath the data store
server's *store* directory. All modifications requested as part of a
transaction are committed atomically — they either all succeed or all
fail.
Several [Events](./dss-dropboxes.md#events-registration-process-hooks) occur in the process of committing a transaction. By defining jython functions, it is possible to be notified
and intervene when an event occurs. Because the infrastructure reserves
the right to delay or retry actions if resources become unavailable, the
process function and event functions cannot use global variables to
communicate with each other. Instead, they should use the registration
context object to communicate. Anything stored in the registration
context must, however, be serializable by Java serialization.
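A minimal sketch of this pattern follows. It assumes that the transaction returns the registration context via `getRegistrationContext()` and that the context exposes a map via `getPersistentMap()`; check both against the API documentation for your openBIS version. The `_Fake...` classes, the `send_notification` helper and the key name are hypothetical stand-ins that only let the sketch run outside the data store server.

```python
# Hypothetical stand-ins for the objects the DSS passes to the script.
class _FakeMap(object):
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class _FakeContext(object):
    def __init__(self):
        self._map = _FakeMap()

    def getPersistentMap(self):  # assumed accessor name
        return self._map


class _FakeTransaction(object):
    def __init__(self, context):
        self._context = context

    def getRegistrationContext(self):  # assumed accessor name
        return self._context


outbox = []  # stand-in for an external channel such as email


def send_notification(text):
    # Hypothetical helper; a real script might send an email here.
    outbox.append(text)


MESSAGE_KEY = "notification-message"  # hypothetical key name


def process(transaction):
    # Store a plain string, since values must be serializable.
    context = transaction.getRegistrationContext()
    context.getPersistentMap().put(MESSAGE_KEY, "data set registered")


def post_storage(context):
    # Read back what process() stored instead of using a global variable.
    send_notification(context.getPersistentMap().get(MESSAGE_KEY))


# Simulate the calls the framework would make.
context = _FakeContext()
process(_FakeTransaction(context))
post_storage(context)
```

Keeping the stored values to strings and other simple types is the easiest way to stay within what Java serialization can handle.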
Details
-------
### Dropbox Configuration
A jython dropbox is typically distributed as a [core
plugin](../core-plugins.md) and configured in its
plugin.properties file. The dropbox is configured to run a jython script,
which is kept in the same directory as plugin.properties. The
configuration requires a storage processor and the name of the script (a
full path is not necessary if the script is in the same directory as
plugin.properties). Here is an example configuration for a dropbox that
uses the jython handler.
**plugin.properties**
```
#
# REQUIRED PARAMETERS
#
# The directory to watch for new data sets
incoming-dir = ${root-dir}/incoming-jython
# The handler class. Must be either ch.systemsx.cisd.etlserver.registrator.api.v2.JythonTopLevelDataSetHandlerV2 or a subclass thereof
# Specify jython version. Default is whatever is specified in datastore server service.properties under property "jython-version"
plugin-jython-version=2.5
#
# OPTIONAL PARAMETERS
#
# False if incoming directory is assumed to exist.
# Default - true: Incoming directory will be created on start up if it doesn't exist.
incoming-dir-create = true
# Defines how the dropbox decides whether a folder is ready to be processed: either by a 'marker-file' or by a timeout, which is called 'auto-detection'.
# The timeout is set globally in service.properties and is called 'quiet-period'. Once that number of seconds has passed without any changes
# to the incoming folder, the dropbox starts the registration. The marker file must follow this naming scheme: '.MARKER_is_finished_<incoming_folder_name>'
# Defines whether the dropbox should handle .h5 archives as folders (true) or as files (false). Default is true.
h5-folders = true
# Defines whether the dropbox should handle .h5ar archives as folders (true) or as files (false). Default is true.
h5ar-folders = true
```
#### Development mode
Set the property `development-mode = true` in your dropbox to get quick
feedback while developing it. By default, dropboxes use a complex
auto-recovery mechanism that waits and retries the registration several
times when an error occurs. This is useful for riding out short network
problems or other transient failures, but it also means that a long time
can pass between the dropbox attempting a registration and the error
actually being reported. Enable development mode while you are modifying
your script, and remember to disable it again when you are done.
#### Jython version
Set the property `plugin-jython-version=2.7` in your dropbox
plugin.properties to change the default jython version for that single
dropbox. The available versions are 2.5 and 2.7.
Jython API
----------
When a new file is placed in the dropbox, the framework compiles and
executes the script, checks that the signatures of the `process`
function and any defined event-handling functions are correct, and then
invokes its `process` function.
### IDataSetRegistrationTransaction
See [IDataSetRegistrationTransactionV2](https://openbis.ch/javadoc/20.10.x/javadoc-dropbox-api/ch/systemsx/cisd/etlserver/registrator/api/v2/IDataSetRegistrationTransactionV2.html)
for the calls available in a transaction. Note that you need to use the
file methods of the transaction, such as `moveFile()`, rather than
manipulating the file system directly to get fully transactional
behavior.
#### Database queries
The query object returned
by `getDatabaseQuery(String dataSourceName)` allows you to perform any query
and execute any statement on the given query database in the context
of a database transaction. Here are the methods available from the query
interface:
```java
public interface DynamicQuery {

    /**
     * Performs a SQL query. The returned List is connected to the database and
     * updateable.
     *
     * @param query The SQL query template.
     * @param parameters The parameters to fill into the SQL query template.
     *
     * @return The result set as List; each row is represented as one Map<String,Object>.
     */
    List<Map<String, Object>> select(final String query,
            final Object... parameters);

    /**
     * Performs a SQL query. The returned List is connected and
     * updateable.
     *
     * @param type The Java type to return for each row in the returned
     *            result set.
     * @param query The SQL query template.
     * @param parameters The parameters to fill into the SQL query template.
     *
     * @return The result set as List; each row is represented as one Map<String,Object>.
     */
    // ...
}
```
### Events / Registration Process Hooks
The script can be informed of events that occur during the registration
process. To be informed of an event, define a function in the script
file with the name specified in the table. The script can do anything it
wants within an event function. Typical things to do in event functions
include sending emails or registering data in secondary databases. Some
of the event functions can be used to control the behavior of the
registration.
This table summarizes the supported events.
#### Events Table
|Function Name|Return Value|Description|
|--- |--- |--- |
|pre_metadata_registration(DataSetRegistrationContext context)|void|Called before the openBIS AS is informed of the metadata modifications. Throwing an exception in this method aborts the transaction.|
|post_metadata_registration(DataSetRegistrationContext context)|void|The metadata has been successfully stored in the openBIS AS. This can also be a place to register data in a secondary transaction, with the semantics that any errors are ignored.|
|rollback_pre_registration(DataSetRegistrationContext context, Exception exception)|void|Called if the metadata was not successfully stored in the openBIS AS.|
|post_storage(DataSetRegistrationContext context)|void|Called once the data has been placed in the appropriate sharded directory of the store. This can only happen if the metadata was successfully registered with the AS.|
|should_retry_processing(DataSetRegistrationContext context, Exception problem)|boolean|A problem occurred with the process function, should the operation be retried? A retry happens only if this method returns true.|
Note: the `rollback_pre_registration` function is intended to handle
cases in which the dropbox code finished properly, but the registration of
data in openBIS failed. These kinds of problems are impossible to handle
from inside the `process` function. Exceptions raised during the
call to the `process` function should be caught and handled by the
function itself.
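As a hedged sketch of how these hooks fit together, the functions below use the names and arguments listed in the events table. The retry policy (retry only on `IOError`), the `notify_admin` helper and the simulated calls at the end are illustrative assumptions, not framework behavior. In a real jython script the `problem` argument is a Java exception, so a real policy would more likely test for Java exception types.

```python
notifications = []  # stand-in for an email or logging channel


def notify_admin(text):
    # Hypothetical helper; a real script might send an email here.
    notifications.append(text)


def pre_metadata_registration(context):
    # Raising an exception here aborts the transaction.
    pass


def rollback_pre_registration(context, exception):
    # Called when the metadata could not be stored in the openBIS AS.
    notify_admin("Registration failed: %s" % exception)


def should_retry_processing(context, problem):
    # Assumed policy: retry only problems that look transient.
    return isinstance(problem, IOError)


# Simulate the calls the framework would make (the context is unused here).
retry_io = should_retry_processing(None, IOError("share unavailable"))
retry_value = should_retry_processing(None, ValueError("bad file name"))
rollback_pre_registration(None, RuntimeError("AS unreachable"))
```

Only the functions that are actually defined in the script are called, so a script can implement any subset of the hooks.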
#### Typical Usage Table
|Function Name|Usage|
|--- |--- |
|pre_metadata_registration(DataSetRegistrationContext context)|This event can be used as a place to register information in a secondary database. If the transaction in the secondary database does not commit, an exception can be thrown to prevent the data from entering openBIS.|
|post_metadata_registration(DataSetRegistrationContext context)|This event can be used as a place to register information in a secondary database. Errors encountered are ignored.|
|rollback_pre_registration(DataSetRegistrationContext context, Exception exception)|Undoing a commit to a secondary transaction. Sending an email to the admin that the data set could not be stored.|
|post_storage(DataSetRegistrationContext context)|Sending an email to tell the user that the data has been successfully registered. Notifying an external system that a data set has been registered.|
|should_retry_processing(DataSetRegistrationContext context, Exception problem)|Informing openBIS if it should retry processing a data set.|
Example Scripts
---------------
A simple script that registers the incoming file as a data set
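A minimal sketch of such a script is shown below. It uses `createNewDataSet()`, `getIncoming()`, `moveFile()`, `getSample()` and `setSample()` as they are commonly used with `IDataSetRegistrationTransactionV2`; verify the exact signatures against the API documentation for your openBIS version. The sample identifier `/TEST/SAMPLE`, the incoming path and the `_Fake...` classes are hypothetical and only let the sketch run outside the data store server.

```python
def process(transaction):
    # Create a new data set within the transaction.
    data_set = transaction.createNewDataSet()
    # The file or folder that was placed in the dropbox.
    incoming = transaction.getIncoming()
    # Move the incoming file into the data set through the transaction,
    # not by touching the file system directly.
    transaction.moveFile(incoming.getAbsolutePath(), data_set)
    # Attach the data set to an existing sample (hypothetical identifier).
    data_set.setSample(transaction.getSample("/TEST/SAMPLE"))


# --- Hypothetical stand-ins so the sketch runs outside the DSS ---
class _FakeFile(object):
    def __init__(self, path):
        self._path = path

    def getAbsolutePath(self):
        return self._path


class _FakeDataSet(object):
    def __init__(self):
        self.sample = None
        self.files = []

    def setSample(self, sample):
        self.sample = sample


class _FakeTransaction(object):
    def __init__(self, incoming_path):
        self._incoming = _FakeFile(incoming_path)
        self.data_sets = []

    def createNewDataSet(self):
        data_set = _FakeDataSet()
        self.data_sets.append(data_set)
        return data_set

    def getIncoming(self):
        return self._incoming

    def moveFile(self, path, data_set):
        data_set.files.append(path)

    def getSample(self, identifier):
        return identifier  # the real API returns a sample object


tx = _FakeTransaction("/incoming-jython/measurement.csv")
process(tx)
```

In a deployed dropbox only the `process` function is needed; the framework supplies the transaction object.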
openBIS has a complex mechanism to ensure that data registration via
dropboxes is atomic. When an error occurs during data registration, the
dropbox retries several times before it gives up. The retries can apply
to the initial processing of the data as well as to the registration
with the application server. Even if these fail, there is still a chance
to finish the registration: once the registration reaches a certain
stage, it stores a checkpoint on disk. If the process fails at any
point, or the DSS goes down, it tries to recover from the checkpoint.
There are two types of checkpoint files: state files and marker files.
They are stored in two different directories. The default location for
the state files is `datastore_server/recovery-state`. This can be changed
with the property `dss-recovery-state-dir` in the DSS `service.properties`.
Before version 20.10.6, the default location for the marker files was
`<store location>/<share id>/recovery-marker`, which may lead to problems
if this location is remote. Since version 20.10.6 the default location is
`datastore_server/recovery-marker-dir`. This can be changed with the
property `dss-recovery-marker-dir` in the DSS `service.properties`.
The `process` function is retried if a
`should_retry_processing` function is defined in the dropbox script and
it returns true. Two configuration settings affect this behavior:
`process-max-retry-count` limits the number of times the process
function can be retried, and `process-retry-pause-in-sec` sets the
waiting time between retries. The retry counts and waiting periods for
all stages are defined using the properties shown in the table below.
IMPORTANT NOTE: The registration is considered failed only after the
whole retry and recovery process has failed. This means that it can take
a long time before the `.faulty_paths` file is created, even for a
simple dropbox error.
Therefore, during development of a dropbox we recommend
using **[development mode](./dss-dropboxes.md#development-mode)**, which
basically sets all retry values to 0, thus disabling the auto-recovery
feature.
|Key|Default Value|Meaning|
|--- |--- |--- |
|process-max-retry-count|6|The maximum number of times the process function can be retried.|
|process-retry-pause-in-sec|300|The amount of time to wait between retries of the process function.|
|metadata-registration-max-retry-count|6|The number of times registering metadata with the server can be retried.|
|metadata-registration-retry-pause-in-sec|300|The amount of time to wait between retries of registering metadata with the server.|
|recovery-max-retry-count|50|The number of times the recovery from checkpoint can be retried.|
|recovery-min-retry-period|60|The amount of time to wait between recovery from checkpoint retries.|
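
To get a feeling for why a failure can take long to surface, the following back-of-the-envelope sketch multiplies the default retry counts by the default pauses from the table. It assumes that all pause values are in seconds, that each stage exhausts its retries in sequence, and that the processing time itself is negligible. The real timing depends on the infrastructure, so treat the result as a rough upper bound on pause time only.

```python
# Default values copied from the table above.
defaults = {
    "process-max-retry-count": 6,
    "process-retry-pause-in-sec": 300,
    "metadata-registration-max-retry-count": 6,
    "metadata-registration-retry-pause-in-sec": 300,
    "recovery-max-retry-count": 50,
    "recovery-min-retry-period": 60,  # assumed to be in seconds
}


def waiting_time(settings, count_key, pause_key):
    # Time spent only on pauses if every retry of a stage is used up.
    return settings[count_key] * settings[pause_key]


process_wait = waiting_time(
    defaults, "process-max-retry-count", "process-retry-pause-in-sec")
metadata_wait = waiting_time(
    defaults, "metadata-registration-max-retry-count",
    "metadata-registration-retry-pause-in-sec")
recovery_wait = waiting_time(
    defaults, "recovery-max-retry-count", "recovery-min-retry-period")

total_minutes = (process_wait + metadata_wait + recovery_wait) / 60.0
```

Under these assumptions the pauses alone add up to 110 minutes.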

### Manual Recovery
The registration of data sets with Jython dropboxes has been designed to
be quite robust. Nonetheless, there are situations in which problems may
arise. This can especially be a problem during the development of
dropboxes. Here are the locations and semantics of several important
files and folders that can be useful for debugging a dropbox.
|File or Folder|Meaning|
|--- |--- |
|datastore_server/log-registrations|Keeps logs of registrations. See the registration log documentation for more information.|
|[store]/[share]/pre-staging|Contains hard-link copies of the original data. Dropbox processes operate on these hard-link copies.|
|[store]/[share]/staging|The location used to prepare data sets for registration.|
|[store]/[share]/pre-commit|Where data from data sets are kept while the metadata is registered with the AS. Once metadata registration succeeds, files are moved from this folder into the final store directory.|
|[store]/[share]/recovery-marker (before version 20.10.6)<br>datastore_server/recovery-marker-dir (since version 20.10.6)|Directories, one per dropbox, where marker files are kept that indicate that a recovery should happen on an incoming file if it is reprocessed. Deleting a marker file will force the incoming file to be processed as a new file, not a recovery.|
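
When checking whether a recovery is pending, it can help to list the marker files per dropbox. The sketch below walks a marker directory that is assumed to contain one subdirectory per dropbox, as described in the table. The function name, the demo directory and the demo file name are hypothetical; the real marker file names are not specified here.

```python
import os
import tempfile


def list_recovery_markers(marker_root):
    """Map each dropbox subdirectory to the sorted marker files in it."""
    markers = {}
    for name in sorted(os.listdir(marker_root)):
        path = os.path.join(marker_root, name)
        if os.path.isdir(path):
            markers[name] = sorted(os.listdir(path))
    return markers


# Demo on a temporary directory standing in for the marker directory.
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "jython-dropbox"))
open(os.path.join(demo_root, "jython-dropbox", "example.marker"), "w").close()

found = list_recovery_markers(demo_root)
```

The sketch only reads the directory. Deleting a marker file changes how the incoming file is processed, so that step is best done deliberately and by hand.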
Classpath / Configuration
-------------------------
If you want other jython modules to be available to the code that
implements the drop box, you will need to modify the
datastore\_server.conf file and add something like
`lib-commonbase-<version>.jar` and `cisd-hotdeploy-13.01.0.jar`. The
first two are available in the distribution in the archives
`openBIS-API-commonbase-<version>.zip` and
`openBIS-API-dropbox-<version>.zip`, the third one is available in [the Ivy repo](https://sissource.ethz.ch/openbis/openbis-public/openbis-ivy/-/blob/main/cisd/cisd-hotdeploy/13.01.0/cisd-hotdeploy-13.01.0.jar).
Example path where the created `jar` should reside: