    
    **Environment**: AS
    
    **Relevancy:** Deprecated
    
    **Description**: Re-evaluates the dynamic properties of all samples that
    refer, directly or indirectly via properties of type MATERIAL, to
    materials changed since the last re-evaluation.
    
    **Configuration**:
    
    
    |Property Key|Description|
    |--- |--- |
    |class|`ch.systemsx.cisd.openbis.generic.server.task.DynamicPropertyEvaluationTriggeredByMaterialChangeMaintenanceTask`|
    |timestamp-file|Path to a file that stores the timestamp of the last evaluation. Default value: `../../../data/DynamicPropertyEvaluationTriggeredByMaterialChangeMaintenanceTask-timestamp.txt`.|
    |initial-timestamp|Initial timestamp in the form `YYYY-MM-DD` (e.g. 2013-09-15), used when the timestamp file does not exist (e.g. on the first run) or contains an invalid value. This is a mandatory property.|
    
    
    **Example**:
    
    **plugin.properties**
    
    
    ```
    class = ch.systemsx.cisd.openbis.generic.server.task.DynamicPropertyEvaluationTriggeredByMaterialChangeMaintenanceTask
    interval = 7 days
    initial-timestamp = 2012-12-31
    ```
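
The interplay of `timestamp-file` and `initial-timestamp` can be sketched in Python as follows. This is a hypothetical illustration, not the openBIS implementation: the helper names are made up, and the sketch assumes that the timestamp file holds a single date in the same `YYYY-MM-DD` format as `initial-timestamp`, which may differ from the real file format.

```py
import os
from datetime import datetime

# Assumption: the timestamp file uses the same format as initial-timestamp.
TIMESTAMP_FORMAT = "%Y-%m-%d"


def read_last_evaluation(timestamp_file, initial_timestamp):
    """Return the timestamp stored in timestamp_file, or initial_timestamp
    if the file does not exist or its content cannot be parsed."""
    fallback = datetime.strptime(initial_timestamp, TIMESTAMP_FORMAT)
    if not os.path.exists(timestamp_file):
        return fallback  # e.g. the very first run
    with open(timestamp_file) as f:
        content = f.read().strip()
    try:
        return datetime.strptime(content, TIMESTAMP_FORMAT)
    except ValueError:
        return fallback  # invalid value in the file


def write_last_evaluation(timestamp_file, when):
    """Persist the time of the current evaluation for the next run."""
    with open(timestamp_file, "w") as f:
        f.write(when.strftime(TIMESTAMP_FORMAT))
```

Because `initial-timestamp` is only a fallback, changing it has no effect once a valid timestamp file exists.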
    
    
    
    ### FillUnknownDataSetSizeInOpenbisDBFromPathInfoDBMaintenanceTask 
    
    **Environment**: DSS
    
    **Relevancy:** Rare
    
    **Description**: Queries the openBIS database to find data sets without a
    size filled in, then queries the pathinfo database to see whether the
    size information is available there. If it is, the size is filled in from
    the pathinfo information; if it is not, nothing is done. Data sets are
    fetched from the openBIS database in chunks (see the `data-set-chunk-size`
    property). After each chunk the maintenance task checks whether a time
    limit has been reached (see the `time-limit` property). If so, it stops
    processing. The code of the last processed data set is stored in a file
    (see the `last-seen-data-set-file` property). The next run of the
    maintenance task processes data sets with a code greater than the one
    saved in the `last-seen-data-set-file`. This file is deleted periodically
    (see `delete-last-seen-data-set-file-interval`) to handle the situation
    where codes of new data sets are lexicographically smaller than the codes
    of old data sets. Deleting the file is also needed when pathinfo database
    entries are added after a data set has already been processed by the
    maintenance task.
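
A minimal Python sketch of one run of the chunked processing described above. It assumes hypothetical callbacks `fetch_chunk`, `lookup_size` and `store_size` in place of the real openBIS and pathinfo database queries; it illustrates the control flow only and is not the actual task code.

```py
import time


def fill_unknown_sizes(fetch_chunk, lookup_size, store_size,
                       last_seen, chunk_size, time_limit_s):
    """One run of the task (hypothetical sketch).

    fetch_chunk(after_code, n): up to n codes of data sets without a size,
        sorted by code and greater than after_code (all codes if None).
    lookup_size(code): size known to the pathinfo DB, or None.
    store_size(code, size): writes the size to the openBIS DB.
    Returns the code of the last processed data set.
    """
    start = time.monotonic()
    while True:
        chunk = fetch_chunk(last_seen, chunk_size)
        if not chunk:
            break  # nothing left to process
        for code in chunk:
            size = lookup_size(code)
            if size is not None:
                store_size(code, size)
            # If the pathinfo DB has no size either, the data set is skipped.
            last_seen = code
        # The time limit is checked after each chunk, before fetching the next.
        if time.monotonic() - start >= time_limit_s:
            break
    return last_seen
```

The returned code is what would be written to the `last-seen-data-set-file`; starting with `None` corresponds to a missing file, i.e. processing from the smallest code onwards.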
    
    **Configuration**:
    
    
    
    |Property Key|Description|
    |--- |--- |
    |last-seen-data-set-file|Path to a file that stores the code of the last handled data set. Default value: `fillUnknownDataSetSizeTaskLastSeen`|
    |delete-last-seen-data-set-file-interval|A time interval (in seconds) that defines how often the `last-seen-data-set-file` is deleted. The parameter can be specified with one of the following time units: `ms`, `msec`, `s`, `sec`, `m`, `min`, `h`, `hours`, `d`, `days`. Default time unit is `sec`. Default value: 7 days.|
    |data-set-chunk-size|Number of data sets requested from AS in one chunk. Default: 100|
    |time-limit|Maximum execution time of this task. The task stops before reading the next chunk once this time has been used up. The parameter can be specified with one of the following time units: `ms`, `msec`, `s`, `sec`, `m`, `min`, `h`, `hours`, `d`, `days`. Default time unit is `sec`.|
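
The time units listed above can be illustrated with a small, hypothetical parser that converts values such as `1h` or `7 days` into seconds; a bare number falls back to the default unit, seconds. The actual openBIS parser may accept further spellings, so treat this only as a sketch of the rule.

```py
import re

# Unit suffixes from the table above, mapped to seconds.
_UNIT_SECONDS = {
    "ms": 0.001, "msec": 0.001,
    "s": 1, "sec": 1,
    "m": 60, "min": 60,
    "h": 3600, "hours": 3600,
    "d": 86400, "days": 86400,
}

# A number, optional whitespace, then an optional lowercase unit.
_DURATION = re.compile(r"^\s*(\d+)\s*([a-z]*)\s*$")


def parse_duration_seconds(value):
    """Convert e.g. '1h', '7 days' or '86400' into seconds (hypothetical helper)."""
    match = _DURATION.match(value)
    if match is None:
        raise ValueError("invalid duration: %r" % value)
    amount, unit = match.groups()
    if unit == "":
        unit = "sec"  # default time unit
    if unit not in _UNIT_SECONDS:
        raise ValueError("unknown time unit: %r" % unit)
    return int(amount) * _UNIT_SECONDS[unit]
```

With this rule, `time-limit = 1h` means 3600 seconds and `interval = 86400` means one day.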
    
    
    **Example:**
    
    **plugin.properties**
    
    
    ```
    <task id>.class = ch.systemsx.cisd.etlserver.plugins.FillUnknownDataSetSizeInOpenbisDBFromPathInfoDBMaintenanceTask
    <task id>.interval = 86400
    <task id>.data-set-chunk-size = 1000
    <task id>.time-limit = 1h
    ```
    
    
    
    **NOTE**: Useful in scenarios where the pathinfo-feeding subtask of the
    post-registration task fails.
    
    ### PathInfoDatabaseChecksumCalculationTask
    
    **Environment**: DSS
    
    **Relevancy:** Rare; the CRC32 checksum is often already calculated
    during post-registration.
    
    **Description**: Calculates the CRC32 checksum (and optionally a
    checksum of a specified type) of all files in the pathinfo database with
    an unknown checksum. This task only needs to run once. It assumes a
    data source with the key `path-info-db`.
    
    **Configuration**:
    
    
    |Property Key|Description|
    |--- |--- |
    |checksum-type|Optional checksum type. If specified, two checksums are calculated: the CRC32 checksum and the checksum of the specified type. The type and the checksum are stored in the pathinfo database. The type must be supported by `MessageDigest.getInstance(<checksum type>)`. For more details see [Oracle doc](http://docs.oracle.com/javase/8/docs/api/java/security/MessageDigest.html#getInstance-java.lang.String-).|
    
    
    **Example**:
    
    **plugin.properties**
    
    
    ```
    class = ch.systemsx.cisd.etlserver.path.PathInfoDatabaseChecksumCalculationTask
    execute-only-once = true
    checksum-type = SHA-256
    ```
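
For illustration, both checksums can be computed in a single pass over a file with Python's standard `zlib` and `hashlib` modules. The mapping from Java `MessageDigest` names such as `SHA-256` to `hashlib` names is an assumption of this sketch and covers only a few algorithms; how the task actually reads files and stores the results is not shown.

```py
import hashlib
import zlib

# Assumption: Java MessageDigest names (as used by checksum-type) mapped to
# hashlib names; only a few common algorithms are covered.
_JAVA_TO_HASHLIB = {"MD5": "md5", "SHA-1": "sha1", "SHA-256": "sha256", "SHA-512": "sha512"}


def file_checksums(path, checksum_type=None, block_size=65536):
    """Return (crc32, hex_digest) for a file; hex_digest is None when no
    checksum_type is given. The file is read only once."""
    crc = 0
    digest = hashlib.new(_JAVA_TO_HASHLIB[checksum_type]) if checksum_type else None
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            crc = zlib.crc32(block, crc)  # running CRC32 over all blocks
            if digest is not None:
                digest.update(block)
    return crc & 0xFFFFFFFF, (digest.hexdigest() if digest is not None else None)
```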
    
    
    
    ### PathInfoDatabaseRefreshingTask
    
    **Environment**: DSS
    
    **Relevancy:** Rare
    
    **Description**: Refreshes the pathinfo database with file metadata of
    physical and available data sets in the store. This task assumes a data
    source with the key `path-info-db`.
    
    The data sets are processed in the inverse order of their registration.
    At most `chunk-size` data sets are processed in one run.
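
The processing order can be sketched as below. The tuple representation of data sets, the inclusive comparison with `time-stamp-of-youngest-data-set`, and breaking ties between equal timestamps by data set code are assumptions of this sketch; the `state` pair stands for what the `state-file` stores.

```py
def next_chunk(data_sets, youngest, state, chunk_size, data_set_type=None):
    """Select the next data sets to refresh, youngest first (hypothetical sketch).

    data_sets: list of (registration_timestamp, code, type) tuples.
    youngest: only data sets registered at or before this time are considered.
    state: (timestamp, code) of the last considered data set, or None.
    Returns the chunk and the new state.
    """
    candidates = [d for d in data_sets
                  if d[0] <= youngest
                  and (data_set_type is None or d[2] == data_set_type)
                  and (state is None or (d[0], d[1]) < state)]
    # Inverse registration order; the code breaks ties between equal timestamps.
    candidates.sort(key=lambda d: (d[0], d[1]), reverse=True)
    chunk = candidates[:chunk_size]
    new_state = (chunk[-1][0], chunk[-1][1]) if chunk else state
    return chunk, new_state
```

Each run continues from the stored state, so the whole store is eventually covered even though a single run handles at most `chunk-size` data sets.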
    
    
    ```{warning}
    Under normal circumstances this maintenance task is never needed, because the content of a physical data set is **never** changed by openBIS itself.<br /><br />Only in the rare case that the content of physical data sets has to be changed can this maintenance task be used to refresh the file metadata in the pathinfo database.
    ```
    
    
    **Configuration**:
    
    
    |Property Key|Description|
    |--- |--- |
    |time-stamp-of-youngest-data-set|Time stamp of the youngest data set to be considered. The format has to be `<4 digit year>-<month>-<day> <hour>:<minute>:<second>`.|
    |compute-checksum|If `true`, the CRC32 checksum (and optionally a checksum of the type specified by `checksum-type`) of all files is calculated and stored in the pathinfo database. Default value: true|
    |checksum-type|Optional checksum type. If specified and `compute-checksum = true`, two checksums are calculated: the CRC32 checksum and the checksum of the specified type. The type and the checksum are stored in the pathinfo database. The type must be supported by `MessageDigest.getInstance(<checksum type>)`. For more details see [Oracle doc](http://docs.oracle.com/javase/8/docs/api/java/security/MessageDigest.html#getInstance-java.lang.String-).|
    |chunk-size|Number of data sets requested from AS in one chunk. Default: 1000|
    |data-set-type|Optional data set type. If specified, only data sets of the specified type are considered. Default: All data set types.|
    |state-file|File that stores the registration timestamp and the code of the last considered data set. Default: `<store root>/PathInfoDatabaseRefreshingTask-state.txt`|
    
    
    **Example**:
    
    **plugin.properties**
    
    
    ```
    class = ch.systemsx.cisd.etlserver.path.PathInfoDatabaseRefreshingTask
    interval = 30 min
    time-stamp-of-youngest-data-set = 2014-01-01 00:00:00
    data-set-type = HCS_IMAGE
    ```
    
    
    
    ### RemoveUnusedUnofficialTermsMaintenanceTask
    
    **Environment**: AS
    
    **Relevancy:** Rare
    
    
    **Description**: Removes unused unofficial vocabulary terms. For more details about unofficial vocabulary terms, see [Ad Hoc Vocabulary Terms](../../uncategorized/ad-hoc-vocabulary-terms.md).
    
    
    **Configuration:**
    
    
    |Property Key|Description|
    |--- |--- |
    |older-than-days|Unofficial terms are only deleted if they have been registered more than the specified number of days ago. Default: 7 days.|
    
    
    
    **Example**:
    
    **service.properties of AS**
    
    
    ```
    <task id>.class = ch.systemsx.cisd.openbis.generic.server.task.RemoveUnusedUnofficialTermsMaintenanceTask
    <task id>.interval = 86400
    <task id>.older-than-days = 30
    ```
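
The selection rule of `older-than-days` can be sketched like this; the dictionary layout of a term is hypothetical and only serves the illustration:

```py
from datetime import timedelta


def terms_to_delete(terms, now, older_than_days=7):
    """Codes of terms that are unofficial, unused and registered more than
    older_than_days days before now (hypothetical sketch).

    terms: iterable of dicts with the keys 'code', 'official',
    'usage_count' and 'registration_date'.
    """
    cutoff = now - timedelta(days=older_than_days)
    return [t["code"] for t in terms
            if not t["official"]
            and t["usage_count"] == 0
            and t["registration_date"] < cutoff]
```

The age threshold gives users a grace period to start using a freshly added ad hoc term before it is removed.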
    
    
    
    ### ResetArchivePendingTask
    
    **Environment**: DSS
    
    **Relevancy:** Rare
    
    **Description**: For each data set that is not present in the archive
    and has status `ARCHIVE_PENDING`, the status is set to `AVAILABLE` if no
    command in the DSS data set command queues refers to it.
    
    **Configuration**:
    
    **plugin.properties**
    
    
    ```
    class = ch.systemsx.cisd.etlserver.plugins.ResetArchivePendingTask
    interval = 60 s
    ```
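
The decision rule from the description, written as a hypothetical Python helper (the tuple layout and the set of queued codes are assumptions, not the DSS data structures):

```py
def reset_archive_pending(data_sets, queued_codes):
    """Codes whose status would be reset from ARCHIVE_PENDING to AVAILABLE.

    data_sets: iterable of (code, status, present_in_archive) tuples.
    queued_codes: set of data set codes referenced by commands in the
    DSS data set command queues.
    """
    return [code for code, status, in_archive in data_sets
            if status == "ARCHIVE_PENDING"
            and not in_archive
            and code not in queued_codes]
```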
    
    
    
    ### SessionWorkspaceCleanUpMaintenanceTask
    
    **Environment**: AS
    
    **Relevancy:** Default
    
    **Description**: Cleans up the session workspace folders of sessions that
    are no longer active. This maintenance plugin is added automatically with
    a default interval of 1 hour. If a manually configured instance of the
    plugin is detected, the automatic configuration is skipped.
    
    **Example**:
    
    **plugin.properties**
    
    
    ```
    class = ch.systemsx.cisd.openbis.generic.server.task.SessionWorkspaceCleanUpMaintenanceTask
    interval = 1 day
    ```
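
A minimal sketch of the clean-up, assuming (hypothetically) that each session has its own workspace sub-folder named after its session token; the real folder layout of the AS session workspace may differ:

```py
import os
import shutil


def clean_up_session_workspaces(workspace_root, active_session_tokens):
    """Delete every session workspace folder whose session is no longer active.

    Returns the names of the deleted folders (hypothetical sketch).
    """
    deleted = []
    for name in sorted(os.listdir(workspace_root)):
        path = os.path.join(workspace_root, name)
        if os.path.isdir(path) and name not in active_session_tokens:
            shutil.rmtree(path)  # removes the folder and all uploaded files
            deleted.append(name)
    return deleted
```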
    
    
    
    ### MaterialsMigration
    
    **Environment**: AS
    
    **Relevancy:** Relevant
    
    **Description**: Migrates the Materials entities and types to a
    Sample-based model using Sample Properties. It automatically creates and
    assigns sample types, properties and entities.

    The migration and the deletion of the old Materials model can be
    executed in separate steps.

    Deleting Materials and material types requires the migration to have
    succeeded; a validation check is run before the deletion.
    
    **Example**:
    
    This maintenance task can be configured directly in the
    `service.properties` of the AS:
    
    **service.properties**
    
    
    ```
    maintenance-plugins = materials-migration
    
    materials-migration.class = ch.systemsx.cisd.openbis.generic.server.task.MaterialsMigration
    materials-migration.execute-only-once = true
    materials-migration.doMaterialsMigrationInsertNew = true
    materials-migration.doMaterialsMigrationDeleteOld = true
    ```
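
The relation between the two steps can be sketched as follows. `migrate`, `validate` and `delete_old` are hypothetical placeholders for the real work; only the order of the steps and the validation check before the deletion are taken from the description above.

```py
def materials_migration(do_insert_new, do_delete_old, migrate, validate, delete_old):
    """Run the migration steps selected by the two flags (hypothetical sketch).

    validate() returns True if the migration succeeded.
    Returns the list of executed steps.
    """
    steps = []
    if do_insert_new:  # doMaterialsMigrationInsertNew
        migrate()
        steps.append("insert-new")
    if do_delete_old:  # doMaterialsMigrationDeleteOld
        # The old model is only deleted after a successful validation check.
        if not validate():
            raise RuntimeError("Materials migration not validated; old model kept")
        delete_old()
        steps.append("delete-old")
    return steps
```

Running the steps in separate executions (first only the insertion, later only the deletion) leaves time to inspect the migrated samples before the old model is removed.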
    
    ## Microscopy Maintenance Tasks
    
    ### MicroscopyThumbnailsCreationTask
    
    **Environment**: DSS
    
    **Relevancy:** Relevant
    
    **Description**: Creates thumbnails for already registered microscopy
    data sets.
    
    **Configuration:**
    
    
    |Property Key|Description|
    |--- |--- |
    |maximum-number-of-workers|If specified, the creation is parallelized among several workers. The actual number of workers depends on the number of CPUs; no more than 50% of the CPUs are used.|
    |state-file|Name of the file which stores the registration time stamp of the last successfully handled data set. Default: `MicroscopyThumbnailsCreationTask-state.txt`|
    |script-path|Path to the Jython script that specifies the thumbnails to be generated. The script should define the method `process(transaction, parameters, tablebuilder)` as for `JythonIngestionService` (see Jython-based Reporting and Processing Plugins). Note that `tablebuilder` is ignored. In addition, the global variables `image_config` and `image_data_set_structure` are defined:<br /><ul><li>image_data_set_structure: an object of the class `ImageDataSetStructure`. Information about channels, series numbers, etc. can be requested from it.</li><li>image_config: an object of the class `SimpleImageContainerDataConfig`. It should be used to specify the thumbnails to be created. Currently only `setImageGenerationAlgorithm()` is supported.</li></ul>|
    |main-data-set-type-regex|Regular expression for the type of data sets which have actual images. Default: `MICROSCOPY_IMG`|
    |data-set-thumbnail-type-regex|Regular expression for the type of data sets which have thumbnails. This is used to test whether thumbnails already exist. Default: `MICROSCOPY_IMG_THUMBNAIL`|
    |max-number-of-data-sets|The maximum number of data sets to be handled in one run of this task. If zero or negative, all data sets are handled. Default: 1000|
    |data-set-container-type|Type of the data set container. Default: `MICROSCOPY_IMG_CONTAINER`|
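
How `maximum-number-of-workers` and `max-number-of-data-sets` could be interpreted is sketched below. Rounding 50% of the CPUs down and always allowing at least one worker are assumptions of this sketch, not documented behavior.

```py
import os


def effective_worker_count(maximum_number_of_workers, cpu_count=None):
    """Workers actually used: capped at half the CPUs, but at least one (assumption)."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return max(1, min(maximum_number_of_workers, cpu_count // 2))


def select_data_sets(data_sets, max_number_of_data_sets=1000):
    """Data sets handled in one run; zero or a negative limit means all of them."""
    if max_number_of_data_sets <= 0:
        return list(data_sets)
    return list(data_sets)[:max_number_of_data_sets]
```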
    
    
    **Example**:
    
    **plugin.properties**
    
    ```
    class = ch.systemsx.cisd.openbis.dss.etl.MicroscopyThumbnailsCreationTask
    interval = 1 h
    script-path = specify_thumbnail_generation.py
    ```
    
    with
    
    **specify\_thumbnail\_generation.py**
    
    
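    ```py
    from ch.systemsx.cisd.openbis.dss.etl.dto.api.impl import MaximumIntensityProjectionGenerationAlgorithm


    def _get_series_num():
        # image_data_set_structure is a global variable provided by the task.
        # Collect the distinct series numbers of all images in the data set.
        series_numbers = set()
        for image_info in image_data_set_structure.getImages():
            series_numbers.add(image_info.tryGetSeriesNumber())
        # Return one of them (set.pop() returns an arbitrary element).
        return series_numbers.pop()


    def process(transaction, parameters, tableBuilder):
        seriesNum = _get_series_num()
        # Configure thumbnail generation only for even series numbers.
        if int(seriesNum) % 2 == 0:
            # image_config is a global variable provided by the task.
            image_config.setImageGenerationAlgorithm(
                    MaximumIntensityProjectionGenerationAlgorithm(
                        "MICROSCOPY_IMG_THUMBNAIL", 256, 128, "thumbnail.png"))
    ```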
    
    ### DeleteFromImagingDBMaintenanceTask
    
    **Environment**: DSS
    
    **Relevancy:** Relevant
    
    **Description**: Deletes database entries from the imaging database.
    
    This is a special variant of [DeleteFromExternalDBMaintenanceTask](./maintenance-tasks.md#deletefromexternaldbmaintenancetask) with the same configuration parameters.
    
    **Configuration**: See [DeleteFromExternalDBMaintenanceTask](./maintenance-tasks.md#deletefromexternaldbmaintenancetask)
    
    
    **Example**:
    
    **plugin.properties**
    
    
    ```
    class = ch.systemsx.cisd.openbis.dss.etl.DeleteFromImagingDBMaintenanceTask
    data-source = imaging-db
    ```
    
    
    
    ## Proteomics Maintenance Tasks