Best practices for developers#
Definitions#
The B2SAFE service relies on iRODS to implement data policies. The implementation is based on a set of operations, which we call rules. Multiple rules can be combined to form workflows. Each rule or workflow can be triggered manually on the client side, or automatically on the server side by setting a trigger condition in the iRODS configuration files (usually core.re in /etc/irods). iRODS has its own object called a workflow (WSO, Workflow Structured Object), but we are not referring to that here, only to a generic set of rules. By our definition every set of rules, that is every workflow, is itself a rule, but we tend to reserve the term workflow for sets of rules that become quite complex.
Example of rule:
EUDATCreateAVU("EUDAT/FIO", *newPID, *path);
(it adds the key-value pair (“EUDAT/FIO”, *newPID) to the object stored in the input path)
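To illustrate the manual, client-side way of triggering a rule, a call like the one above can be wrapped in a rule file and run with the irule command. This is only a minimal sketch: the file name addFioAvu.r, the wrapper rule name addFioAvu, the PID value and the object path are all hypothetical, and it assumes the B2SAFE rule base is loaded on the server.

```
addFioAvu {
    # EUDATCreateAVU is the B2SAFE rule shown above;
    # the wrapper name and the input values are hypothetical.
    EUDATCreateAVU("EUDAT/FIO", *newPID, *path);
    writeLine("stdout", "Added EUDAT/FIO = *newPID to *path");
}
INPUT *newPID="hypothetical-prefix/1234", *path="/MyZone/home/username/file.txt"
OUTPUT ruleExecOut
```

The file can then be executed from a client with `irule -F addFioAvu.r`.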
Example of workflow:
EUDATCreatePID(*parent_pid, *path, *ror, *fio, *fixed, *newPID);
(see the combined set of rules here)
Example of trigger:
acPostProcForCollCreate {
    ON($collName like "/MyZone/home/username/*")
    {
        *fixed="false";
        EUDATPidsForColl($collName, *fixed);
        writeLine("serverLog","PID Created: for collection $collName");
    }
}
acPostProcForCollCreate is triggered every time a new collection is created. Inside a trigger you can add further filters. The example above uses:
ON($collName like "/MyZone/home/username/*")
to apply the condition only to the collections created inside a specific path.
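As a sketch of what a further filter could look like, the condition can combine the path pattern with another test, here the name of the connected user taken from the iRODS session variable $userNameClient. The user name and the extended log message are hypothetical, and the variant has not been tested against a B2SAFE installation.

```
acPostProcForCollCreate {
    # Hypothetical variant: the path must match AND the client user must be "username".
    ON($collName like "/MyZone/home/username/*" && $userNameClient == "username")
    {
        *fixed = "false";
        EUDATPidsForColl($collName, *fixed);
        writeLine("serverLog", "PID created for collection $collName by $userNameClient");
    }
}
```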
A list of trigger conditions (called static Policy Enforcement Points, PEPs) is available in Appendix A on page 215 and in Appendix B on page 217 of the workbook. Another list is in the iRODS manual.
Examples#
Let’s now consider an important data policy: the uploaded data is immutable. And its opposite: the uploaded data is mutable. If the uploaded data is immutable, what is needed to enforce that policy?
Immutable data policy#
This kind of policy is usually agreed with the data owners, for example a scientific community. The B2SAFE administrator can therefore rely on the scientific community not to break the policy, and simply set the following trigger in the iRODS core.re to create a PID for each uploaded object:
acPostProcForPut
{
    ON($objPath like "/MyZone/home/community/*")
    {
        *fixed = "true";
        EUDATCreatePID("None", $objPath, "None", "None", *fixed, *PID);
        writeLine("serverLog","PID Created: *PID for object $objPath");
    }
}
However, if our B2SAFE administrator does not trust the community’s users, she may want to enforce this policy on the server side.
In order to do so, she has two options:
1. to intercept every attempt to modify the uploaded objects;
2. to define a staging space where the users can upload the data and, after each upload, move the data to another space where the users have only read access.
1) In the first case, she could define a trigger like the following one:
acPreProcForModifyDataObjMeta
{
    ON($objPath like "/MyZone/home/community/*")
    {
        writeLine("serverLog","attempt to modify $objPath");
        msiExit("-1", "user is not allowed to perform the requested action");
    }
}
2) In the second case, she could define a rule like the following one, which copies each object from the staging collection to an archive collection, removes the original, creates a PID for the copy and gives the original owners read access to it:
dataArchiveCopy(*collPath, *archivePath) {
    foreach ( *res in SELECT DATA_NAME WHERE COLL_NAME = '*collPath' ) {
        *objName = *res.DATA_NAME;
        *objPath = *collPath ++ "/" ++ *objName;
        # collect the owners while the original object is still in the staging collection
        *owners = list();
        foreach ( *R in SELECT DATA_OWNER_NAME, DATA_NAME WHERE COLL_NAME = '*collPath' AND DATA_NAME = '*objName' ) {
            *owners = cons(*R.DATA_OWNER_NAME, *owners);
        }
        # *adminUser is not defined in this snippet: it has to be set to the iRODS admin account
        msiSetACL("default", "admin:own", *adminUser, *objPath);
        # copy the object to the archive, then remove the original
        *destination = *archivePath ++ "/" ++ *objName;
        msiDataObjCopy(*objPath, *destination);
        msiDataObjUnlink(*objPath, *out);
        *fixed = "true";
        EUDATCreatePID("None", *destination, "None", "None", *fixed, *PID);
        writeLine("serverLog","PID Created: *PID for object *destination");
        # give the original owners read-only access to the archived copy
        if (size(*owners) > 0) {
            foreach (*user in *owners) {
                msiSetACL("default", "read", *user, *destination);
            }
        }
    }
}
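One way to run the rule above is from the client side, for instance on a schedule, with a small rule file passed to irule. This is a minimal sketch: it assumes that dataArchiveCopy is available in the server rule base, that the caller is an iRODS administrator (the rule uses the admin:own ACL mode), and that the staging and archive collections, whose paths are hypothetical here, already exist.

```
archiveStaging {
    # dataArchiveCopy is the rule defined above;
    # the wrapper name and both paths are hypothetical.
    dataArchiveCopy(*collPath, *archivePath);
    writeLine("stdout", "Archived the content of *collPath into *archivePath");
}
INPUT *collPath="/MyZone/home/community/staging", *archivePath="/MyZone/home/community/archive"
OUTPUT ruleExecOut
```

Saved as, for example, archiveStaging.r, it can be executed with `irule -F archiveStaging.r`.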
The first approach can work for a few collections, but it does not scale to many collections and multiple communities. The second is more general, but it requires more care in its configuration.
Mutable data policy#
What happens, conversely, if the community asks to keep the data set open for future changes?
Assuming the data have PIDs and are replicated across different iRODS zones, updating an object implies propagating the changes both to the attributes of the PID record and to the object’s replicas. Does B2SAFE support such propagation? It does, in the sense that it provides the means to define a workflow to manage those updates.
For example, in the case of the PID record, if you overwrite an object you need to update the checksum. The B2SAFE administrator can then configure the following trigger:
acPostProcForModifyDataObjMeta
{
    ON($objPath like "/MyZone/home/community/*")
    {
        getEpicApiParameters(*credStoreType, *credStorePath, *epicApi, *serverID, *epicDebug);
        EUDATSearchPID($objPath, *existing_pid);
        EUDATeCHECKSUMupdate(*existing_pid, $objPath);
        writeLine("serverLog","checksum updated for object: $objPath");
    }
}
In the case of the replicas, each one has to be removed and created again from the updated object. A fragment like the following, placed in a rule where $objPath is available, does that:
[...]
*replicaList = list();
EUDATgetLastAVU($objPath, "EUDAT/REPLICA", *replicas);
if ((*replicas != "") && (*replicas != "None")) {
    *replicaList = split(*replicas, ",");
    foreach (*replica in *replicaList) {
        # resolve the replica PID to its URL and derive the iRODS path from it
        EUDATeURLsearch(*replica, *URL);
        *hostAndPath = elem(split(*URL, "://"), 1);
        *replicaPath = "/" ++ triml(*hostAndPath, "/");
        # remove the outdated replica and replicate the updated object again
        msiDataObjUnlink(*replicaPath, *out);
        *registered = "true";
        *recursive = "true";
        EUDATReplication($objPath, *replicaPath, *registered, *recursive, *response);
        writeLine("serverLog","Updated replica *replicaPath for object $objPath: *response");
    }
}
[...]
These are examples of possible ways to implement the most common data policies. They are not guaranteed to work as they are and may need to be adapted to the specific B2SAFE environment. They are intended to show that B2SAFE provides the building blocks to support various data workflows and is flexible enough to meet different scenarios.