    codes of new data sets are lexicographically smaller than the codes of
    the old data sets. Deleting the file is also needed when pathinfo
    database entries are added after a data set has already been processed
    by the maintenance task.
    
    **Configuration**:
    
    | Property Key                            | Description                                                                                                                                                                                                                                                                       |
    |-----------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
    | last-seen-data-set-file                 | Path to a file that will store a code of the last handled data set. Default value: "fillUnknownDataSetSizeTaskLastSeen"                                                                                                                                                           |
    | delete-last-seen-data-set-file-interval | A time interval (in seconds) which defines how often the "last-seen-data-set-file" file should be deleted. The parameter can be specified with one of the following time units:  ms, msec, s, sec, m, min, h, hours, d, days . Default time unit is  sec . Default value: 7 days. |
    | data-set-chunk-size                     | Number of data sets requested from AS in one chunk. Default: 100                                                                                                                                                                                                                  |
    | time-limit                              | Limit of execution time of this task. The task is stopped before reading next chunk if the time has been used up. This parameter can be specified with one of the following time units: ms, msec, s, sec, m, min, h, hours, d, days. Default time unit is sec.                    |
    
    **Example:**
    
    **plugin.properties**
    
        <task id>.class = ch.systemsx.cisd.etlserver.plugins.FillUnknownDataSetSizeInOpenbisDBFromPathInfoDBMaintenanceTask
        <task id>.interval = 86400
        <task id>.data-set-chunk-size = 1000
        <task id>.time-limit = 1h
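
The unit suffixes accepted by `time-limit` and `delete-last-seen-data-set-file-interval` can be illustrated with a small parser. This is only a sketch in Python, not the parser openBIS itself uses: the function name `parse_duration_seconds` is hypothetical, and tolerating whitespace between number and unit is an assumption.

```python
import re

# Seconds per unit, for the unit names listed in the configuration table.
_UNIT_SECONDS = {
    "ms": 0.001, "msec": 0.001,
    "s": 1, "sec": 1,
    "m": 60, "min": 60,
    "h": 3600, "hours": 3600,
    "d": 86400, "days": 86400,
}


def parse_duration_seconds(value, default_unit="sec"):
    """Convert a value such as '1h', '30 min' or '86400' into seconds.

    Hypothetical helper: a missing unit falls back to default_unit.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value)
    if match is None:
        raise ValueError("not a duration: %r" % value)
    number, unit = match.groups()
    unit = unit or default_unit
    if unit not in _UNIT_SECONDS:
        raise ValueError("unknown time unit: %r" % unit)
    return int(number) * _UNIT_SECONDS[unit]


print(parse_duration_seconds("1h"))      # 3600
print(parse_duration_seconds("86400"))   # 86400
print(parse_duration_seconds("7 days"))  # 604800
```

Under this reading, `interval = 86400` and `time-limit = 1h` in the example above mean one day and one hour respectively.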
    
    **NOTE**: Useful in scenarios where the path info feeding sub-task of
    the post-registration task fails.
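
The interplay of `data-set-chunk-size`, `time-limit` and the last-seen file can be pictured roughly as follows. This is a simplified Python sketch under stated assumptions, not the task's actual implementation: `run_task`, `fetch_chunk` and `handle` are hypothetical names, data set codes are assumed to be handled in lexicographic order, and the last-seen code is kept in memory instead of in a file.

```python
import time


def run_task(fetch_chunk, handle, last_seen=None, chunk_size=100,
             time_limit=None, clock=time.monotonic):
    """Process data sets chunk by chunk and return the last handled code.

    fetch_chunk(last_seen, chunk_size) is assumed to return up to
    chunk_size codes greater than last_seen, in ascending order.
    """
    start = clock()
    while True:
        # The time limit is only checked before reading the next chunk,
        # so a chunk that has been started is always finished.
        if time_limit is not None and clock() - start >= time_limit:
            break
        chunk = fetch_chunk(last_seen, chunk_size)
        if not chunk:
            break  # nothing left to do
        for code in chunk:
            handle(code)
            last_seen = code  # the real task would persist this code
    return last_seen


codes = ["DS-001", "DS-002", "DS-003", "DS-004", "DS-005"]


def fetch_chunk(last_seen, chunk_size):
    remaining = [c for c in codes if last_seen is None or c > last_seen]
    return remaining[:chunk_size]


handled = []
print(run_task(fetch_chunk, handled.append, chunk_size=2))  # DS-005
```

In this model a data set whose code sorts before the stored last-seen code is never fetched again, which is why the last-seen file has to be deleted from time to time.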
    
    ### PathInfoDatabaseChecksumCalculationTask
    
    **Environment**: DSS
    
    **Relevancy:** Rare. The CRC32 checksum is often already calculated
    during post registration.
    
    **Description**: Calculates the CRC32 checksum (and optionally a
    checksum of specified type) of all files in the pathinfo database with
    unknown checksum. This task needs to run only once. It assumes a
    data source for key 'path-info-db'.
    
    **Configuration**:
    
    | Property Key  | Description |
    |---------------|-------------|
    | checksum-type | Optional checksum type. If specified two checksums are calculated: CRC32 checksum and the checksum of specified type. The type and the checksum are stored in the pathinfo database. An allowed type has to be supported by MessageDigest.getInstance(<checksum type>). For more details see http://docs.oracle.com/javase/8/docs/api/java/security/MessageDigest.html#getInstance-java.lang.String-. |
    
    **Example**:
    
    **plugin.properties**
    
        class = ch.systemsx.cisd.etlserver.path.PathInfoDatabaseChecksumCalculationTask
        execute-only-once = true
        checksum-type = SHA-256
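
To make the "two checksums" behaviour concrete, the following Python sketch computes a CRC32 checksum and, optionally, a second digest for one file. It is an illustration only: the task itself runs in Java and uses `MessageDigest`, the function `compute_checksums` is hypothetical, and mapping a Java-style name such as `SHA-256` to the `hashlib` name `sha256` by dropping the hyphen is an assumption that holds for common algorithms only.

```python
import hashlib
import os
import tempfile
import zlib


def compute_checksums(path, checksum_type=None, block_size=65536):
    """Return (crc32, digest_hex); digest_hex is None if no type is given."""
    crc = 0
    digest = None
    if checksum_type is not None:
        # Assumption: 'SHA-256' (Java naming) corresponds to 'sha256' in hashlib.
        digest = hashlib.new(checksum_type.replace("-", "").lower())
    with open(path, "rb") as stream:
        # Read the file block by block so that large files fit in memory.
        for block in iter(lambda: stream.read(block_size), b""):
            crc = zlib.crc32(block, crc)
            if digest is not None:
                digest.update(block)
    digest_hex = digest.hexdigest() if digest is not None else None
    return crc & 0xFFFFFFFF, digest_hex


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
crc, sha = compute_checksums(tmp.name, "SHA-256")
os.remove(tmp.name)
print("%08x" % crc)  # 3610a686
```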
    
    ### PathInfoDatabaseRefreshingTask
    
    **Environment**: DSS
    
    **Relevancy:** Rare
    
    **Description**: Refreshes the pathinfo database with file metadata of
    physical and available data sets in the store. This task assumes a data
    source for key 'path-info-db'.
    
    The data sets are processed in the reverse order of their registration.
    At most `chunk-size` data sets are processed in one run.
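
The processing order can be sketched as follows. This Python fragment is a guess at the selection logic, not the task's real code: `next_chunk` is a hypothetical name, data sets are modelled as `(registration time stamp, code)` tuples, and it is assumed that a later run continues with data sets older than the one recorded in the state file.

```python
def next_chunk(data_sets, chunk_size, state=None):
    """Pick the data sets for one run, most recently registered first.

    data_sets: iterable of (registration_timestamp, code) tuples.
    state: (registration_timestamp, code) of the last considered data set
           of the previous run, or None for the first run.
    """
    ordered = sorted(data_sets, reverse=True)
    if state is not None:
        # Assumption: continue with data sets registered before the stored state.
        ordered = [ds for ds in ordered if ds < state]
    return ordered[:chunk_size]


data = [
    ("2014-01-01 00:00:00", "A"),
    ("2014-01-03 00:00:00", "C"),
    ("2014-01-02 00:00:00", "B"),
]
first_run = next_chunk(data, chunk_size=2)
print([code for _, code in first_run])  # ['C', 'B']
```

A second run with `state=first_run[-1]` would then return only data set `A`.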
    
    
    > :warning: 
    > **Under normal circumstances this maintenance task is never needed, because the content of a physical data set is never changed by openBIS itself.<br /><br />Only in the rare cases where the content of physical data sets has to be changed can this maintenance task be used to refresh the file metadata in the pathinfo database.**
    
    
    **Configuration**:
    
    | Property Key                    | Description |
    |---------------------------------|-------------|
    | time-stamp-of-youngest-data-set | Time stamp of the youngest data set to be considered. The format has to be <4 digit year>-<month>-<day> <hour>:<minute>:<second>. |
    | compute-checksum                | If true the CRC32 checksum (and optionally a checksum of the type specified by checksum-type) of all files will be calculated and stored in pathinfo database. Default value: true |
    | checksum-type                   | Optional checksum type. If specified and compute-checksum = true two checksums are calculated: CRC32 checksum and the checksum of specified type. The type and the checksum are stored in the pathinfo database. An allowed type has to be supported by MessageDigest.getInstance(<checksum type>). For more details see http://docs.oracle.com/javase/8/docs/api/java/security/MessageDigest.html#getInstance-java.lang.String-. |
    | chunk-size                      | Number of data sets requested from AS in one chunk. Default: 1000 |
    | data-set-type                   | Optional data set type. If specified, only data sets of the specified type are considered. Default: All data set types. |
    | state-file                      | File to store registration time stamp and code of last considered data set. Default: <store root>/PathInfoDatabaseRefreshingTask-state.txt |
    
    **Example**:
    
    **plugin.properties**
    
        class = ch.systemsx.cisd.etlserver.path.PathInfoDatabaseRefreshingTask
        interval = 30 min
        time-stamp-of-youngest-data-set = 2014-01-01 00:00:00
        data-set-type = HCS_IMAGE
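
A value for `time-stamp-of-youngest-data-set` can be checked against the required format before it is put into the configuration. The following Python sketch does this with `datetime.strptime`; the helper name `parse_time_stamp` is hypothetical and the check is not part of openBIS.

```python
from datetime import datetime

# <4 digit year>-<month>-<day> <hour>:<minute>:<second>
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_time_stamp(value):
    """Parse a value like '2014-01-01 00:00:00'; raises ValueError if malformed."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


print(parse_time_stamp("2014-01-01 00:00:00").isoformat())  # 2014-01-01T00:00:00
```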
    
    ### RemoveUnusedUnofficialTermsMaintenanceTask
    
    **Environment**: AS
    
    **Relevancy:** Rare
    
    **Description**: Removes unofficial unused vocabulary terms. For more
    details about unofficial vocabulary terms see [Ad Hoc Vocabulary
    Terms](/pages/viewpage.action?pageId=80699498).
    
    **Configuration:**
    
    | Property Key    | Description                                                                                                                 |
    |-----------------|-----------------------------------------------------------------------------------------------------------------------------|
    | older-than-days | Unofficial terms are only deleted if they have been registered more than the specified number of days ago. Default: 7 days. |
    
    **Example**:
    
    **service.properties of AS**
    
        <task id>.class = ch.systemsx.cisd.openbis.generic.server.task.RemoveUnusedUnofficialTermsMaintenanceTask
        <task id>.interval = 86400
        <task id>.older-than-days = 30
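
The effect of `older-than-days` can be pictured as a simple filter. The Python sketch below is illustrative only: `terms_to_remove` and the dictionary keys `code`, `official`, `usages` and `registered` are hypothetical and do not reflect the openBIS data model.

```python
from datetime import datetime, timedelta


def terms_to_remove(terms, older_than_days=7, now=None):
    """Return codes of unofficial, unused terms registered before the cutoff.

    terms: iterable of dicts with the hypothetical keys
           'code', 'official', 'usages' and 'registered'.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=older_than_days)
    return [t["code"] for t in terms
            if not t["official"] and t["usages"] == 0 and t["registered"] < cutoff]


now = datetime(2024, 3, 1)
terms = [
    {"code": "OLD_UNUSED", "official": False, "usages": 0, "registered": datetime(2024, 1, 1)},
    {"code": "NEW_UNUSED", "official": False, "usages": 0, "registered": datetime(2024, 2, 20)},
    {"code": "OLD_USED", "official": False, "usages": 3, "registered": datetime(2024, 1, 1)},
    {"code": "OFFICIAL", "official": True, "usages": 0, "registered": datetime(2024, 1, 1)},
]
print(terms_to_remove(terms, older_than_days=30, now=now))  # ['OLD_UNUSED']
```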
    
    ### ResetArchivePendingTask
    
    **Environment**: DSS
    
    **Relevancy:** Rare
    
    **Description**: For each data set which is not present in the archive
    and has the status ARCHIVE\_PENDING, the status is set to AVAILABLE if no
    command in the DSS data set command queues refers to it.
    
    **Configuration**:
    
    **plugin.properties**
    
        class = ch.systemsx.cisd.etlserver.plugins.ResetArchivePendingTask
        interval = 60 s
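
The rule described above can be written down as a small decision function. This Python sketch only mirrors the description; `reset_archive_pending` and the keys `code`, `status` and `in_archive` are hypothetical stand-ins for what the DSS actually queries.

```python
def reset_archive_pending(data_sets, queued_codes):
    """Reset stale ARCHIVE_PENDING data sets and return their codes.

    data_sets: list of dicts with hypothetical keys 'code', 'status', 'in_archive'.
    queued_codes: codes referenced by any command in the DSS command queues.
    """
    reset = []
    for ds in data_sets:
        if (ds["status"] == "ARCHIVE_PENDING" and not ds["in_archive"]
                and ds["code"] not in queued_codes):
            ds["status"] = "AVAILABLE"
            reset.append(ds["code"])
    return reset


data_sets = [
    {"code": "DS-1", "status": "ARCHIVE_PENDING", "in_archive": False},  # stale
    {"code": "DS-2", "status": "ARCHIVE_PENDING", "in_archive": False},  # still queued
    {"code": "DS-3", "status": "ARCHIVE_PENDING", "in_archive": True},   # already archived
    {"code": "DS-4", "status": "AVAILABLE", "in_archive": False},
]
print(reset_archive_pending(data_sets, queued_codes={"DS-2"}))  # ['DS-1']
```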
    
    ### SessionWorkspaceCleanUpMaintenanceTask
    
    **Environment**: AS
    
    **Relevancy:** Default
    
    **Description**: Cleans up session workspace folders of no longer active
    sessions. This maintenance plugin is automatically added by default with
    a default interval of 1 hour. If a manually configured version of the
    plugin is detected then the automatic configuration is skipped.
    
    **Example**:
    
    **plugin.properties**
    
        class = ch.systemsx.cisd.openbis.generic.server.task.SessionWorkspaceCleanUpMaintenanceTask
        interval = 1 day
    
    ### MaterialsMigration
    
    **Environment**: AS
    
    **Relevancy:** Relevant
    
    **Description**: Migrates the Materials entities and types to use a
    Sample based model using Sample Properties. It automatically creates and
    assigns sample types, properties and entities.
    
    The migration and the deletion of the old Materials model can be
    executed in separate steps.
    
    Deleting Materials and material types requires a successful migration;
    a validation check is run before the deletion.
    
    **Example**:
    
    This maintenance task can be configured directly in the AS
    service.properties:
    
    **service.properties**
    
        maintenance-plugins = materials-migration
    
        materials-migration.class = ch.systemsx.cisd.openbis.generic.server.task.MaterialsMigration
        materials-migration.execute-only-once = true
        materials-migration.doMaterialsMigrationInsertNew = true
        materials-migration.doMaterialsMigrationDeleteOld = true
    
      
    
    ## Microscopy Maintenance Tasks
    
    ### MicroscopyThumbnailsCreationTask
    
    **Environment**: DSS
    
    **Relevancy:** Relevant
    
    **Description**: Creates thumbnails for already registered microscopy
    data sets.
    
    **Configuration:**
    
    | Property Key                  | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
    |-------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
    | data-set-container-type       | Type of the data set container. Default: MICROSCOPY_IMG_CONTAINER                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
    | data-set-thumbnail-type-regex | Regular expression for the type of data sets which have thumbnails. This is used to test whether there are already thumbnails or not. Default: MICROSCOPY_IMG_THUMBNAIL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
    | main-data-set-type-regex      | Regular expression for the type of data sets which have actual images. Default: MICROSCOPY_IMG                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
    | max-number-of-data-sets       | The maximum number of data sets to be handled in a run of this task. If zero or less than zero all data sets will be handled. Default: 1000 |
    | state-file                    | Name of the file which stores the registration time stamp of the last successfully handled data set. Default: MicroscopyThumbnailsCreationTask-state.txt |
    | maximum-number-of-workers     | If specified the creation will be parallelized among several workers. The actual number of workers depends on the number of CPUs. Not more than 50% of the CPUs will be used. |
    | script-path                   | Path to the jython script which specifies the thumbnails to be generated. The script should define the method process(transaction, parameters, tablebuilder) as for JythonIngestionService (see Jython-based Reporting and Processing Plugins). Note that tablebuilder will be ignored. In addition, the global variables image_config and image_data_set_structure are defined:<br /><br />image_data_set_structure: An object of the class ImageDataSetStructure. Information about channels, series numbers etc. can be requested from it.<br />image_config: An object of the class SimpleImageContainerDataConfig. It should be used to specify the thumbnails to be created. Currently only setImageGenerationAlgorithm() is supported. |
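
One plausible reading of the `maximum-number-of-workers` rule is sketched below in Python. The exact formula used by the task is not stated in this document, so both the helper `effective_workers` and the choice of always allowing at least one worker are assumptions.

```python
import os


def effective_workers(maximum_number_of_workers, cpu_count=None):
    """Cap the configured worker count at half of the CPUs (at least 1)."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, min(maximum_number_of_workers, cpus // 2))


print(effective_workers(8, cpu_count=4))   # 2
print(effective_workers(2, cpu_count=16))  # 2
```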
    
    **Example**:
    
    **plugin.properties**
    
        class = ch.systemsx.cisd.openbis.dss.etl.MicroscopyThumbnailsCreationTask
        interval = 1 h
        script-path = specify_thumbnail_generation.py
    
    with
    
    **specify\_thumbnail\_generation.py**
    
        from ch.systemsx.cisd.openbis.dss.etl.dto.api.impl import MaximumIntensityProjectionGenerationAlgorithm
    
        # image_data_set_structure and image_config are global variables
        # provided by the maintenance task (see script-path above).
    
        def _get_series_num():
            # Collect the distinct series numbers of all images and return one of them.
            series_numbers = set()
            for image_info in image_data_set_structure.getImages():
                series_numbers.add(image_info.tryGetSeriesNumber())
            return series_numbers.pop()
    
        def process(transaction, parameters, tableBuilder):
            # Only create thumbnails for data sets with an even series number.
            seriesNum = _get_series_num()
            if seriesNum is not None and int(seriesNum) % 2 == 0:
                image_config.setImageGenerationAlgorithm(
                        MaximumIntensityProjectionGenerationAlgorithm(
                            "MICROSCOPY_IMG_THUMBNAIL", 256, 128, "thumbnail.png"))
    
    ### DeleteFromImagingDBMaintenanceTask
    
    **Environment**: DSS
    
    **Relevancy:** Relevant
    
    **Description**: Deletes database entries from the imaging database.
    This is a special variant of
    [DeleteFromExternalDBMaintenanceTask](#MaintenanceTasks-DeleteFromExternalDBMaintenanceTask)
    with the same configuration parameters.
    
    **Configuration**: See
    [DeleteFromExternalDBMaintenanceTask](#MaintenanceTasks-DeleteFromExternalDBMaintenanceTask)
    
    **Example**:
    
    **plugin.properties**
    
        class = ch.systemsx.cisd.openbis.dss.etl.DeleteFromImagingDBMaintenanceTask
        data-source = imaging-db
         
    
    
    ## Proteomics Maintenance Tasks