OneFS HDFS ACL support
The Hadoop Distributed File System (HDFS) permissions model for files and directories has much in common with the ubiquitous POSIX model. HDFS Access Control Lists, or ACLs, are Apache's implementation of POSIX.1e ACLs, and each file and directory is associated with an owner and group. The file or directory has separate permissions for the user who owns it, for other users who are members of the group, and for all other users.
However, in addition to the traditional POSIX permissions model, HDFS also supports POSIX ACLs. ACLs are useful for implementing permission requirements that differ from the natural organizational hierarchy of users and groups. An ACL provides a way to set different permissions for specific named users or groups, not just the file owner and file group. HDFS ACL also supports extended ACL entries that allow multiple users and groups to set different permissions on the same HDFS directory and files.
Compared to regular OneFS ACLs, HDFS ACLs differ in both structure and access-verification algorithm. The main difference is that the OneFS ACL algorithm is cumulative, so a request for Read-Write-Execute access can be satisfied by three different OneFS ACEs (Access Control Entries). In contrast, HDFS ACLs have a strict, predefined order of ACEs for evaluating privileges, and there is no stacking of privilege bits. OneFS therefore uses a translation algorithm to bridge this gap. For example, here are some of the mappings between HDFS ACEs and OneFS ACEs:
HDFS ACE | OneFS ACE |
user::rw- | allow <owner-sid> std-read, std-write; deny <owner-sid> std-execute |
user:sheila:rw- | allow <sheilas-sid> std-read, std-write; deny <sheilas-sid> std-execute |
group::r-- | allow <group-sid> std-read; deny <group-sid> std-write, std-execute |
mask::rw- | posix-mask <everyone-sid> std-read, std-write; deny <everyone-sid> std-execute |
other::--x | allow <everyone-sid> std-execute; deny <everyone-sid> std-read, std-write |
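The cumulative-versus-ordered distinction described above can be sketched in a few lines of Python. This is purely illustrative pseudologic, not OneFS or HDFS code; the ACE representation and function names are invented for the example:

```python
# Illustrative sketch only -- not actual OneFS or HDFS code.
# Each ACE is (trustee, kind, bits), where bits is a set like {"r", "w", "x"}.

def cumulative_check(aces, trustees, requested):
    """OneFS-style: allow bits accumulate across matching ACEs,
    but an earlier deny of a bit blocks it."""
    granted, denied = set(), set()
    for trustee, kind, bits in aces:
        if trustee in trustees:
            if kind == "deny":
                denied |= bits - granted
            else:
                granted |= bits - denied
    return requested <= granted

def first_match_check(aces, trustees, requested):
    """HDFS-style per the article: a fixed evaluation order with no
    stacking -- the first ACE matching the trustee decides all bits."""
    for trustee, kind, bits in aces:
        if trustee in trustees:
            return kind == "allow" and requested <= bits
    return False

aces = [
    ("group:eng", "allow", {"r", "w"}),
    ("group:ops", "allow", {"x"}),
]
me = {"group:eng", "group:ops"}

# Cumulative evaluation grants rwx by stacking two ACEs;
# no-stacking evaluation stops at group:eng and refuses "x".
print(cumulative_check(aces, me, {"r", "w", "x"}))   # True
print(first_match_check(aces, me, {"r", "w", "x"}))  # False
```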
HDFS Mask ACEs (derived from POSIX.1e) are special access control entries that apply to all named users and groups and represent the maximum permissions that a named user or any group can have on the file. They were introduced essentially to extend the traditional Read-Write-Execute (RWX) mode bits to support the ACL model. OneFS translates these mask ACEs into a "posix mask" ACE, and the access verification algorithm in the kernel applies the mask permissions to all appropriate trustees.
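As a toy illustration of mask semantics (not OneFS code; the function name is invented), the effective rights of each masked trustee are simply the intersection of the trustee's entry with the mask:

```python
# Illustrative sketch of POSIX.1e mask semantics: named users, named groups,
# and the owning group are limited by the mask; the file owner and "other"
# entries are not masked.

def effective_rights(entry_bits, mask_bits):
    """Effective permissions for a masked trustee."""
    return entry_bits & mask_bits

mask = {"r", "w"}                 # mask::rw-
sheila = {"r", "w", "x"}          # user:sheila:rwx
print(sorted(effective_rights(sheila, mask)))  # ['r', 'w'] -- 'x' is masked off
```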
When translating an HDFS ACL -> OneFS ACL -> HDFS ACL, OneFS is guaranteed to return the same ACL. However, the translation of a OneFS ACL -> HDFS ACL -> OneFS ACL can be unpredictable. OneFS ACLs are richer than HDFS ACLs and can lose information when translated to HDFS ACLs, specifically for ACLs with multiple named groups where a user is a member of more than one of those groups. For example, if a user has RWX in one group, RW in another, and R in a third group, the result is as expected and RWX access is granted. However, if a user has W in one group, X in another, and R in a third group, in these rare cases the ACL translation algorithm prioritizes security and produces a more restrictive read-only ACL.
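One way to picture the lossy direction is with a sketch like the following. This is a simplified model of the behavior described above, not the real OneFS algorithm; the function name and the read-only fallback rule are assumptions made for illustration:

```python
# Illustrative model of collapsing multiple named-group permissions into a
# single representable set -- not actual OneFS translation code.

def collapse_group_perms(per_group_bits):
    """If one group's bits contain all the others (e.g. rwx / rw / r), the
    widest set represents everyone correctly and is returned. Otherwise the
    exact semantics are not representable, and this model falls back to a
    restrictive read-only result (keeping 'r' only if some group granted it),
    mirroring the "prioritize security" behavior described in the article."""
    widest = max(per_group_bits, key=len)
    if all(bits <= widest for bits in per_group_bits):
        return widest
    union = set().union(*per_group_bits)
    return union & {"r"}

# Nested case: rwx / rw / r -- behaves as expected, full rwx survives.
print(sorted(collapse_group_perms([{"r", "w", "x"}, {"r", "w"}, {"r"}])))  # ['r', 'w', 'x']
# Conflicting case: w / x / r -- collapses to read-only.
print(sorted(collapse_group_perms([{"w"}, {"x"}, {"r"}])))                 # ['r']
```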
Here is the full set of ACE mappings between HDFS and OneFS internals:
HDFS ACE permission | Applies to | OneFS internal ACE permission |
rwx | directory | allow dir_gen_read, dir_gen_write, dir_gen_execute, delete_child |
rwx | file | allow file_gen_read, file_gen_write, file_gen_execute |
rw- | directory | allow dir_gen_read, dir_gen_write, delete_child; deny traverse |
rw- | file | allow file_gen_read, file_gen_write; deny execute |
r-x | directory | allow dir_gen_read, dir_gen_execute; deny add_file, add_subdir, dir_write_ext_attr, delete_child, dir_write_attr |
r-x | file | allow file_gen_read, file_gen_execute; deny file_write, append, file_write_ext_attr, file_write_attr |
r-- | directory | allow dir_gen_read; deny add_file, add_subdir, dir_write_ext_attr, traverse, delete_child, dir_write_attr |
r-- | file | allow file_gen_read; deny file_write, append, file_write_ext_attr, execute, file_write_attr |
-wx | directory | allow dir_gen_write, dir_gen_execute, delete_child, dir_read_attr; deny list, dir_read_ext_attr |
-wx | file | allow file_gen_write, file_gen_execute, file_read_attr; deny file_read, file_read_ext_attr |
-w- | directory | allow dir_gen_write, delete_child, dir_read_attr; deny list, dir_read_ext_attr, traverse |
-w- | file | allow file_gen_write, file_read_attr; deny file_read, file_read_ext_attr, execute |
--x | directory | allow dir_gen_execute, dir_read_attr; deny list, add_file, add_subdir, dir_read_ext_attr, dir_write_ext_attr, delete_child |
--x | file | allow file_gen_execute, file_read_attr; deny file_read, file_write, append, file_read_ext_attr, file_write_ext_attr, file_write_attr |
--- | directory | allow std_read_dac, std_synchronize, dir_read_attr; deny list, add_file, add_subdir, dir_read_ext_attr, dir_write_ext_attr, traverse, delete_child, dir_write_attr |
--- | file | allow std_read_dac, std_synchronize, file_read_attr; deny file_read, file_write, append, file_read_ext_attr, file_write_ext_attr, execute, file_write_attr |
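A translation table like the one above lends itself to a simple data-driven implementation. The sketch below encodes a few of the file rows in Python (illustrative only; the data structure and function name are assumptions, and the directory rows plus the remaining triads are omitted for brevity):

```python
# Partial, illustrative encoding of the file rows of the mapping table above.
# Each rwx triad maps to an (allow, deny) pair of OneFS ACE permission sets.
FILE_ACE_MAP = {
    "rwx": ({"file_gen_read", "file_gen_write", "file_gen_execute"}, set()),
    "rw-": ({"file_gen_read", "file_gen_write"}, {"execute"}),
    "r-x": ({"file_gen_read", "file_gen_execute"},
            {"file_write", "append", "file_write_ext_attr", "file_write_attr"}),
    "r--": ({"file_gen_read"},
            {"file_write", "append", "file_write_ext_attr", "execute",
             "file_write_attr"}),
    # ... the remaining triads (-wx, -w-, --x, ---) follow the same pattern.
}

def to_onefs_aces(trustee, rwx):
    """Expand one HDFS file ACE into OneFS-style allow/deny ACEs."""
    allow, deny = FILE_ACE_MAP[rwx]
    aces = [("allow", trustee, allow)]
    if deny:
        aces.append(("deny", trustee, deny))
    return aces

for kind, who, perms in to_onefs_aces("user:sheila", "rw-"):
    print(kind, who, sorted(perms))
```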
Enabling HDFS ACLs is typically done via the OneFS CLI and can be configured per access zone. For example:
# isi hdfs settings modify --zone=System --hdfs-acl-enabled=true
# isi hdfs settings view --zone=System
                   Service: Yes
        Default Block Size: 128M
     Default Checksum Type: none
       Authentication Mode: simple_only
            Root Directory: /ifs/data
           WebHDFS Enabled: Yes
             Ambari Server:
           Ambari Namenode:
               ODP Version:
  Data Transfer Encryption: none
  Ambari Metrics Collector:
          HDFS ACL Enabled: Yes
 Hadoop Version 3 Or Later: Yes
Note that the --hadoop-version-3-or-later parameter should be set to false if the HDFS clients are running Hadoop 2 or earlier. For example:
# isi hdfs settings modify --zone=System --hadoop-version-3-or-later=false
# isi hdfs settings view --zone=System
                   Service: Yes
        Default Block Size: 128M
     Default Checksum Type: none
       Authentication Mode: simple_only
            Root Directory: /ifs/data
           WebHDFS Enabled: Yes
             Ambari Server:
           Ambari Namenode:
               ODP Version:
  Data Transfer Encryption: none
  Ambari Metrics Collector:
          HDFS ACL Enabled: Yes
 Hadoop Version 3 Or Later: No
Other useful ACL configuration options are:
# isi auth settings acls modify --calcmode-group=group_only
Specifies how the group mode bits are approximated. Options include:
Option | Description |
group_aces | Approximates the group mode bits using all possible group ACEs. This may make the group permissions appear more permissive than the file's actual permissions. |
group_only | Approximates the group mode bits using only the ACE with the group owner ID. This shows the group permissions more accurately, since only the permissions for a specific group are displayed rather than the most permissive set. Note, however, that this setting can cause access-denied problems for NFS clients. |
# isi auth settings acls modify --calcmode-traverse=require
Indicates whether traversal rights are required to search directories with existing ACLs.
# isi auth settings acls modify --group-owner-inheritance=parent
Specifies how group ownership and permission inheritance are handled. If you enable a setting that causes the group owner to be inherited from the creator's primary group, you can override it on a per-folder basis by running the chmod command to set the set-gid bit. This inheritance applies only at file creation time. The following options are available:
Option | Description |
native | Specifies that if an ACL exists on a file, the group owner is inherited from the file creator's primary group. If there is no ACL, the group owner is inherited from the parent folder. |
parent | Specifies that the group owner is inherited from the file's parent folder. |
creator | Specifies that the group owner is inherited from the file creator's primary group. |
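The three inheritance options reduce to a simple decision, sketched below (illustrative Python, not OneFS code; the function and parameter names are invented):

```python
# Sketch of the group-owner inheritance choices described above.

def inherited_group_owner(setting, file_has_acl, parent_dir_gid, creator_primary_gid):
    """Pick the group owner for a newly created file."""
    if setting == "parent":
        return parent_dir_gid
    if setting == "creator":
        return creator_primary_gid
    if setting == "native":
        # With an ACL present, inherit from the creator's primary group;
        # otherwise inherit from the parent directory.
        return creator_primary_gid if file_has_acl else parent_dir_gid
    raise ValueError(f"unknown setting: {setting}")

print(inherited_group_owner("native", True, 1000, 2000))   # 2000
print(inherited_group_owner("native", False, 1000, 2000))  # 1000
```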
Once configured, all of these settings can be verified using the following CLI syntax:
# isi auth settings acls view
Standard Settings
        Create Over SMB: allow
                  Chmod: merge
      Chmod Inheritable: no
                  Chown: owner_group_and_acl
                 Access: windows
Advanced Settings
                    Rwx: retain
Group Owner Inheritance: parent
              Chmod 007: default
         Calcmode Owner: owner_aces
         Calcmode Group: group_only
       Synthetic Denies: remove
                 Utimes: only_owner
               DOS Attr: deny_smb
               Calcmode: approx
      Calcmode Traverse: require
When it comes to troubleshooting HDFS ACLs, the /var/log/hdfs.log file can be invaluable. Setting the hdfs.log file to the debug level generates log entries detailing ACL and ACE configuration and parsing. This can be easily accomplished with the following CLI command:
# isi hdfs log-level modify --set debug
Here is an example of debug level log entries showing ACL creation and output:
Detailed settings for a file's security descriptor can be viewed in OneFS using the ls -led command. Specifically, the "-e" argument to this command prints all associated ACLs and ACEs. For example:
# ls -led /ifs/hdfs/file1
-rwxrwxrwx +  1 yarn  hadoop  0 Jan 26 22:38 /ifs/hdfs/file1
 OWNER: user:yarn
 GROUP: group:hadoop
 0: everyone posix_mask file_gen_read,file_gen_write,file_gen_execute
 1: user:yarn allow file_gen_read,file_gen_write,std_write_dac
 2: group:hadoop allow std_read_dac,std_synchronize,file_read_attr
 3: group:hadoop deny file_read,file_write,append,file_read_ext_attr,file_write_ext_attr,execute,file_write_attr
The access rights to a file for a specific user can also be viewed using the isi auth access CLI command, as follows:
# isi auth access --user=admin /ifs/hdfs/file1
         Username: admin
              UID: 10
              SID: SID:S-1-22-1-10
  File Owner Name: yarn
               ID: UID:5001
  File Group Name: hadoop
               ID: GID:5000
   Effective Path: /ifs/hdfs/file1
 File Permissions: file_gen_read
    Relevant ACEs: group:admin allow file_gen_read
                   group:admin deny file_write,append,file_write_ext_attr,execute,file_write_attr
                   everyone allow file_gen_read,file_gen_write,file_gen_execute
    Snapshot Path: No
     Delete Child: The parent directory allows delete_child for this user, so the user can delete the file.
When using SyncIQ or NDMP with HDFS ACLs, be aware that replicating or restoring data from a OneFS 9.3 cluster with HDFS ACLs enabled to a target cluster running an older version of OneFS could cause ACL information to be lost. Specifically, the new mask ACE type cannot be replicated or restored to target clusters running earlier releases, since the mask ACE was not introduced until OneFS 9.3. Instead, OneFS generates two versions of the ACL, with and without the mask, while maintaining the same level of security.
OneFS NFS Performance Resource Monitor
Another feature of the current OneFS 9.3 release is enhanced performance resource monitoring for the NFS protocol, which adds path tracking for NFS and brings it to parity with SMB protocol resource monitoring.
But first, a quick review. The performance resource monitoring framework allows OneFS to track and report on the use of transient system resources (that is, resources that exist only at a specific time) and provide insight into who is consuming what resources and at what rate. Examples include CPU time, network bandwidth, IOPS, disk accesses, and cache hits (but not disk space or memory usage currently). Originally introduced in OneFS 8.0.1, OneFS performance resource monitoring is an ongoing project that will ultimately provide insight and control. This enables prioritization of work flowing through the system, prioritization and protection of business-critical workflows, and the ability to detect when a cluster is busy.
Because identifying work is highly subjective, OneFS performance resource monitoring offers significant configuration flexibility, allowing cluster administrators to define exactly how they want to track workloads. For example, an administrator might want to partition their work based on criteria such as which user is accessing the cluster, which export or share they are using, and which IP address they are coming from, often in combination.
So why not keep track of everything, you may be wondering? It would just generate too much data (a cluster might be required just to monitor your cluster!).
OneFS has always provided client statistics and logs, but they have generally been limited to the front-end view. Similarly, OneFS offers CPU, cache, and disk statistics, but does not show who consumed them. Partitioned performance bridges these two areas, tracking CPU, disk, and cache utilization while crossing the initiator/participant barrier.
Under the hood, OneFS collects the resources consumed, grouped by workload, and the aggregation of these workloads forms a performance dataset.
Item | Description | Example |
Workload | A set of identifying metrics plus the resources consumed | {username:nick, zone_name:System} consumed {cpu:1.5s, bytes_in:100K, bytes_out:50M, ...} |
Performance dataset | The set of identifying metrics by which workloads are aggregated, plus the list of captured workloads matching that specification | {usernames, zone_names} |
Filter | A method of including only workloads that match specific identifying metrics | Filter{zone_name:System} includes {username:nick, zone_name:System} and {username:jane, zone_name:System}, but not {username:nick, zone_name:Perf} |
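Conceptually, the aggregation works like a group-by over the identifying metrics. A minimal sketch follows (the sample data and function names are invented, purely for illustration):

```python
# Sketch of how workload samples aggregate into a performance dataset:
# samples are grouped by the dataset's identifying metrics, and their
# resource counters are summed.
from collections import defaultdict

def aggregate(samples, id_metrics):
    dataset = defaultdict(lambda: defaultdict(float))
    for ident, resources in samples:
        # The workload key is only the identifying metrics the dataset uses.
        key = tuple(sorted((m, ident[m]) for m in id_metrics if m in ident))
        for res, amount in resources.items():
            dataset[key][res] += amount
    return dataset

samples = [
    ({"username": "nick", "zone_name": "System", "pid": 101}, {"cpu": 1.0, "bytes_in": 50_000}),
    ({"username": "nick", "zone_name": "System", "pid": 205}, {"cpu": 0.5, "bytes_in": 50_000}),
    ({"username": "jane", "zone_name": "System", "pid": 309}, {"cpu": 2.0, "bytes_in": 10_000}),
]

ds = aggregate(samples, id_metrics=("username", "zone_name"))
nick = ds[(("username", "nick"), ("zone_name", "System"))]
print(nick["cpu"], nick["bytes_in"])  # 1.5 100000.0
```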
The following metrics are tracked:
Category | Items |
Identifying metrics | Username / UID / SID; Primary group name / GID / GSID; Secondary group name / GID / GSID; Zone name; Local/remote IP address/range; Path*; Share / Export ID; Protocol; System name*; Job type |
Transient resources | CPU usage; Bytes in/out; IOPs; Disk reads/writes; L2/L3 cache hits |
Performance statistics | Read/Write/Other latency |
Supported protocols | NFS; SMB; S3; Jobs; Background services |
With the exception of the system dataset, performance datasets must be configured before any statistics are collected. This is usually done via the isi performance CLI command set, but can also be done via the platform API:
https://<node_ip>:8080/platform/performance
Once a performance dataset has been configured, it continues to collect resource usage statistics every 30 seconds until it is deleted. These statistics can be viewed via the isi statistics CLI.
This is as simple as creating a dataset that specifies the identifying metrics you want to split the work by:
# isi performance dataset create --name ds_test1 username protocol export_id share_name
Then wait 30 seconds for the data to be collected.
Finally, view the performance dataset statistics:
# isi statistics workload --dataset ds_test1
    CPU  BytesIn  BytesOut   Ops  Reads  Writes   L2   L3  ReadLatency  WriteLatency  OtherLatency  UserName  Protocol  ExportId  ShareName   WorkloadType
----------------------------------------------------------------------------------------------------------------------------------------------------------
 11.0ms     2.8M     887.4   5.5    0.0   393.7  0.3  0.0      503.0us       638.8us         7.4ms      nick      nfs3         1          -              -
  1.2ms    10.0K     20.0M  56.0   40.0     0.0  9.0  0.0        0.0us         0.0us         0.0us      jane      nfs3         3          -              -
  1.0ms    18.3M      17.0   0.0   47.0     0.0  0.0  0.0      100.2us         0.0us         0.0us      jane      smb2         -       home              -
 31.4us     15.1      11.7   0.0    0.0     0.0  0.0  0.0      349.3us         0.0us         0.0us      nick      nfs4         4          -              -
166.3ms      0.0       0.0   0.0    0.0     0.1  0.0  0.0        0.0us         0.0us         0.0us         -         -         -          -       Excluded
 31.6ms      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -          -         System
 70.2us      0.0       0.0   3.3    0.1     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -          -        Unknown
  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -          -     Additional
  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -          -  Overaccounted
----------------------------------------------------------------------------------------------------------------------------------------------------------
Total: 8
Note that OneFS can only collect a limited number of workloads per dataset. It therefore keeps the biggest resource consumers, which it considers the top workloads, and outputs as many as possible, up to a limit of 1024 top workloads or 1.5MB of memory per sample, whichever is reached first.
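The capping behavior can be pictured as a top-N selection, with everything that misses the cut rolled into a catch-all bucket (a sketch with invented names, not the actual mechanism):

```python
# Sketch of "top workloads" capping: keep the N biggest resource consumers
# and roll everything else into an "Additional"-style remainder.
import heapq

def top_workloads(workloads, limit):
    """workloads: dict mapping workload key -> CPU seconds consumed."""
    top = dict(heapq.nlargest(limit, workloads.items(), key=lambda kv: kv[1]))
    additional = sum(v for k, v in workloads.items() if k not in top)
    return top, additional

loads = {"nick/nfs3": 9.0, "jane/nfs3": 5.0, "jim/smb2": 2.0, "sam/s3": 0.5}
top, extra = top_workloads(loads, limit=2)
print(sorted(top))  # ['jane/nfs3', 'nick/nfs3']
print(extra)        # 2.5
```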
When you're done with a performance dataset, it can easily be deleted with the "isi performance dataset delete <dataset_name>" syntax. For example:
# isi performance dataset delete ds_test1
Are you sure you want to delete the performance dataset? (yes/[no]): yes
Deleted performance dataset 'ds_test1' with ID number 1.
Performance resource monitoring includes the five special-case aggregate workloads listed below. Beyond these, OneFS can output up to 1024 additional pinned workloads, and a cluster administrator can configure up to four custom datasets.
Aggregate workload | Description |
Additional | Not one of the top workloads |
Excluded | Does not match the dataset definition (e.g. missing a required metric such as export_id), or does not match any applied filter (see below) |
Overaccounted | Total of work that appeared in multiple workloads within the dataset; can occur with datasets that use path and/or group metrics |
System | Background system/kernel work |
Unknown | OneFS could not determine where this work came from |
The system dataset is created by default and cannot be deleted or renamed. In addition, it is the only dataset that includes resource metrics for OneFS services and the Job Engine:
OneFS feature | Details |
System services | Any process started by isi_mcp / isi_daemon, plus any enabled service |
Jobs | Tracked by job ID, job type, and phase |
Services and jobs include only CPU, cache, and disk statistics; bandwidth usage, operation counts, and latencies are not collected for them. Likewise, protocols that have not yet been fully integrated into the resource monitoring framework (e.g. S3) get only these basic statistics, and only in the system dataset.
If no dataset is specified in an 'isi statistics workload' command, the 'system' dataset statistics are displayed by default:
# isi statistics workload
Performance workloads can also be "pinned" so that a workload can be tracked even if it is not a "top" workload and regardless of how many resources it consumes. This can be configured using the following CLI syntax:
# isi performance workloads pin <dataset_name/id> <metric>:<value>
Note that all metrics for the dataset must be specified. For example:
# isi performance workloads pin ds_test1 username:jane protocol:nfs3 export_id:3
# isi statistics workload --dataset ds_test1
    CPU  BytesIn  BytesOut   Ops  Reads  Writes   L2   L3  ReadLatency  WriteLatency  OtherLatency  UserName  Protocol  ExportId   WorkloadType
-----------------------------------------------------------------------------------------------------------------------------------------------
 11.0ms     2.8M     887.4   5.5    0.0   393.7  0.3  0.0      503.0us       638.8us         7.4ms      nick      nfs3         1              -
  1.2ms    10.0K     20.0M  56.0   40.0     0.0  0.0  0.0        0.0us         0.0us         0.0us      jane      nfs3         3         Pinned   <- Always shown
 31.4us     15.1      11.7   0.1    0.0     0.0  0.0  0.0      349.3us         0.0us         0.0us       jim      nfs4         4              -
166.3ms      0.0       0.0   0.0    0.0     0.1  0.0  0.0        0.0us         0.0us         0.0us         -         -         -       Excluded
 31.6ms      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -         System
 70.2us      0.0       0.0   3.3    0.1     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -        Unknown
  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -     Additional   <- Unpinned workloads that didn't make the cut
  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -  Overaccounted
-----------------------------------------------------------------------------------------------------------------------------------------------
Total: 8
Workload filters can also be configured to limit output to only those workloads that match specific criteria. This allows for more granular tracking, which can be invaluable when dealing with large numbers of workloads. The configuration is greedy, in that a workload is included if it matches any applied filter. Filtering can be implemented using the following CLI syntax:
# isi performance dataset create <all_metrics> --filters <filtered_metrics>
# isi performance filters apply <dataset_name/id> <metric>:<value>
For example:
# isi performance dataset create --name ds_test1 username protocol export_id --filters username,protocol
# isi performance filters apply ds_test1 username:nick protocol:nfs3
# isi statistics workload --dataset ds_test1
    CPU  BytesIn  BytesOut   Ops  Reads  Writes   L2   L3  ReadLatency  WriteLatency  OtherLatency  UserName  Protocol  ExportId   WorkloadType
-----------------------------------------------------------------------------------------------------------------------------------------------
 11.0ms     2.8M     887.4   5.5    0.0   393.7  0.3  0.0      503.0us       638.8us         7.4ms      nick      nfs3         1              -   <- Matches filter
 13.0ms     1.4M     600.4   2.5    0.0   200.7  0.0  0.0      405.0us       638.8us         8.2ms      nick      nfs3         7              -   <- Matches filter
167.5ms    10.0K     20.0M  56.1   40.0     0.1  0.0  0.0      349.3us         0.0us         0.0us         -         -         -       Excluded   <- Sum of workloads not matching the filter
 31.6ms      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -         System
 70.2us      0.0       0.0   0.3    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -        Unknown
  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -     Additional
  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -  Overaccounted
-----------------------------------------------------------------------------------------------------------------------------------------------
Total: 8
Note that the metric to filter must be specified when creating the dataset. A dataset with a filtered metric but no filters applied returns an empty dataset.
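The filter semantics above (include a workload if it matches any applied filter; return nothing when a filtered dataset has no filters applied) can be sketched as follows, with invented names:

```python
# Sketch of greedy filter semantics for a performance dataset.

def matches(workload, flt):
    """A workload matches a filter when every filter metric agrees."""
    return all(workload.get(metric) == value for metric, value in flt.items())

def apply_filters(workloads, filters):
    if not filters:
        return []  # filtered dataset with no filters applied -> empty
    # Greedy: keep a workload if it matches ANY applied filter.
    return [w for w in workloads if any(matches(w, f) for f in filters)]

workloads = [
    {"username": "nick", "protocol": "nfs3", "export_id": 1},
    {"username": "nick", "protocol": "nfs3", "export_id": 7},
    {"username": "jane", "protocol": "smb2", "export_id": None},
]
kept = apply_filters(workloads, [{"username": "nick", "protocol": "nfs3"}])
print(len(kept))                     # 2
print(apply_filters(workloads, []))  # []
```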
As mentioned, NFS path tracking is the principal performance resource monitoring enhancement in OneFS 9.3, and it can easily be enabled (for the NFS and SMB protocols) as follows:
# isi performance dataset create --name ds_test1 username path --filters=username,path
The statistics can then be viewed using the following CLI syntax (in this case, for data set 'ds_test1' with ID 1):
# isi statistics workload list --dataset 1
When it comes to path tracking, there are some caveats and limitations to be aware of. First, as of OneFS 9.3 it is only available for the NFS and SMB protocols. Also, path tracking is expensive, and OneFS cannot track all paths. The paths to be tracked must be specified up front, either by pinning a workload or by applying a path filter.
If the resource cost of path tracking is considered too high, consider an equivalent alternative where applicable, such as tracking by NFS export ID or SMB share name.
Users can belong to thousands of secondary groups, often too many to track. Only the primary group is tracked until specific groups are specified, so the secondary groups to be tracked must be specified first, either by applying a group filter or by pinning a workload.
Note that some metrics apply only to specific protocols; for example, export_id applies only to NFS, and share_name applies only to SMB. Also note that an equivalent metric (and path tracking) has not yet been implemented for the S3 protocol; this will be addressed in a future release.
These metrics can be used individually in a dataset or in combination. For example, a dataset configured with both the export_id and share_name metrics will list both NFS and SMB workloads, each with an export_id or share_name. Similarly, a dataset with only the share_name metric lists only SMB workloads, and a dataset with only the export_id metric lists only NFS workloads, with no SMB workloads. Any workload that is excluded is added to the special "Excluded" workload.
When viewing the collected metrics, the isi statistics workload CLI command displays only the most recent sample period in table format, with statistics normalized to a per-second granularity:
# isi statistics workload [--dataset <dataset_name/id>]
Names are resolved wherever possible, such as UID to username and IP address to hostname, and lookup failures are reported via an additional "Error" column. Alternatively, the '--numeric' flag can be included to avoid any lookups:
# isi statistics workload [--dataset <dataset_name/id>] --numeric
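The lookup-with-fallback behavior resembles the following sketch (illustrative only; it uses Python's standard pwd module, which is Unix-only, and the function name is invented):

```python
# Best-effort UID -> username resolution with a numeric fallback and an
# error annotation, similar in spirit to the Error column described above.
import pwd

def resolve_uid(uid, numeric=False):
    """Return (display_name, error). With numeric=True no lookup is done."""
    if numeric:
        return str(uid), None
    try:
        return pwd.getpwuid(uid).pw_name, None
    except KeyError:
        # Lookup failed: fall back to the numeric form and report the error.
        return str(uid), f"no such uid {uid}"

print(resolve_uid(0))                # ('root', None) on typical systems
print(resolve_uid(0, numeric=True))  # ('0', None)
```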
Aggregated cluster statistics are reported by default, but adding the "--nodes" argument provides per-node statistics, as seen by each initiator. More specifically, the --nodes=0 flag reports only the local node, while --nodes=all reports all nodes.
The isi statistics workload command also includes standard flags for formatting the statistics output, such as "--sort", "--totalby", and so on.
--sort (CPU | (BytesIn|bytes_in) | (BytesOut|bytes_out) | Ops | Reads | Writes | L2 | L3 | (ReadLatency|latency_read) | (WriteLatency|latency_write) | (OtherLatency|latency_other) | Node | (UserId|user_id) | (UserSId|user_sid) | UserName | (Proto|protocol) | (ShareName|share_name) | (JobType|job_type) | (RemoteAddr|remote_address) | (RemoteName|remote_name) | (GroupId|group_id) | (GroupSId|group_sid) | GroupName | (DomainId|domain_id) | Path | (ZoneId|zone_id) | (ZoneName|zone_name) | (ExportId|export_id) | (SystemName|system_name) | (LocalAddr|local_address) | (LocalName|local_name) | (WorkloadType|workload_type) | Error)
    Sort the data by the specified comma-separated fields. Prepend 'asc:' or 'desc:' to a field to change the sort order.
--totalby (Node | (UserId|user_id) | (UserSId|user_sid) | UserName | (Proto|protocol) | (ShareName|share_name) | (JobType|job_type) | (RemoteAddr|remote_address) | (RemoteName|remote_name) | (GroupId|group_id) | (GroupSId|group_sid) | GroupName | (DomainId|domain_id) | Path | (ZoneId|zone_id) | (ZoneName|zone_name) | (ExportId|export_id) | (SystemName|system_name) | (LocalAddr|local_address) | (LocalName|local_name))
    Aggregate by the specified fields.
In addition to the CLI, the same output can be accessed via the platform API:
https://<node_ip>:8080/platform/statistics/summary/workload
Raw metrics can also be viewed using the isi statistics query command:
# isi statistics query <current|history> --format=json --keys=<node|cluster>.performance.dataset.<dataset_id>
The options are "current", which provides the most recent sample, and "history", which shows the samples collected over the last five minutes. Names are not looked up and statistics are not normalized; the sample period is included in the result. The command syntax must also include the --format=json flag, as other output formats are not currently supported. Furthermore, there are two different types of keys:
key type | description |
cluster.performance.dataset.<dataset_id> | Aggregated statistics of the whole cluster |
node.performance.dataset.<dataset_id> | Statistics per node from the initiator's perspective |
Similarly, these raw metrics can also be obtained through the platform API as follows:
https://<node_ip>:8080/platform/statistics/<current|history>?keys=node.performance.dataset.0
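Assembling the key and endpoint strings above is mechanical; here is a small helper sketch (the URL shape follows the endpoints quoted in this article, while the function names are invented for illustration):

```python
# Helpers for building the raw-statistics keys and endpoint URLs shown above.

def stats_key(scope, dataset_id):
    """Build a performance-dataset statistics key; scope is 'node' or 'cluster'."""
    assert scope in ("node", "cluster")
    return f"{scope}.performance.dataset.{dataset_id}"

def stats_url(node_ip, kind, key, port=8080):
    """kind is 'current' (latest sample) or 'history' (last five minutes)."""
    assert kind in ("current", "history")
    return f"https://{node_ip}:{port}/platform/statistics/{kind}?keys={key}"

print(stats_url("10.1.1.10", "current", stats_key("node", 0)))
# https://10.1.1.10:8080/platform/statistics/current?keys=node.performance.dataset.0
```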
OneFS Writable Snapshots: Coexistence and Warnings
In the final article in this series, we look at how writable snapshots coexist in OneFS and how they integrate and interact with the various OneFS data services.
Starting with OneFS itself, support for writable snapshots is introduced in OneFS 9.3, and the functionality is enabled once an upgrade to OneFS 9.3 has been committed. Non-disruptive upgrade to OneFS 9.3 and later is fully supported. However, as we have seen in this series of articles, writable snapshots in 9.3 have various caveats, limitations, and best practices. These include adhering to the default OneFS limit of 30 active writable snapshots per cluster (or at least not attempting to delete more than 30 writable snapshots at a time if the max_active_wsnaps limit has been increased for any reason).
There are also some restrictions on where the mount point of a writable snapshot can live in the file system. Disallowed locations include an existing directory, a snapshot source path, and a SmartLock or SyncIQ domain. Also, while the contents of a writable snapshot retain the permissions they had at the source, ensure that the parent directory structure grants appropriate access to the writable snapshot's users.
The OneFS job engine and restriping jobs also support writable snapshots, and in general most jobs can run within a writable snapshot's path. Note, however, that jobs involving tree walks will not trigger copy-on-read for LINs in writable snapshots.
The PermissionsRepair job cannot repair files in a writable snapshot that have not yet been copied. To avoid this, before starting a PermissionsRepair job, the "find" CLI command (which searches the directory hierarchy for files) can be run on the root of the writable snapshot to fully populate the writable snapshot's namespace.
The TreeDelete job works for subdirectories within a writable snapshot. TreeDelete run on or above a writable snapshot does not delete the writable snapshot's root (head) directory, unless it is run as part of deleting the writable snapshot itself.
The ChangeList, FileSystemAnalyze, and IndexUpdate jobs cannot see files in a writable snapshot. Consequently, the FilePolicy job, which relies on the index update, cannot act on files in writable snapshots.
Writable snapshots also work as expected with OneFS access zones. For example, a writable snapshot can be created in a different access zone than the source snapshot:
# isi zone zones list
Name    Path
------------------------
System  /ifs
zone1   /ifs/data/zone1
zone2   /ifs/data/zone2
------------------------
Total: 3
# isi snapshot snapshots list
ID      Name     Path
---------------------------------
118224  s118224  /ifs/data/zone1
---------------------------------
Total: 1
# isi snapshot writable create s118224 /ifs/data/zone2/wsnap1
# isi snapshot writable list
Path                    Src Path         Src Snapshot
------------------------------------------------------
/ifs/data/zone2/wsnap1  /ifs/data/zone1  s118224
------------------------------------------------------
Total: 1
Writable snapshots are supported on any cluster architecture running OneFS 9.3, including clusters using data encryption with SED drives. Similarly, InsightIQ and DataIQ support writable snapshots and report on them accurately, as expected.
Writable snapshots also work with SmartQuotas and use directory quota capacity reporting to track logical and physical space usage. This can be viewed using the "isi quota list/view" CLI commands in addition to the "isi snapshot writable view" command.
In terms of data tiering, writable snapshots coexist with SmartPools, and running SmartPools over writable snapshots is supported. However, in OneFS 9.3, SmartPools file pool tiering policies do not apply to a writable snapshot path. Instead, writable snapshot data follows the tiering policies that apply to the source of the writable snapshot. Additionally, SmartPools is often used to house snapshots on a lower-performance, capacity-optimized storage tier. In that case, the performance of a writable snapshot whose source snapshot lives in a slower pool is likely to be affected. Also note that CloudPools is not supported on writable snapshots in OneFS 9.3, and CloudPools on a writable snapshot target is currently unsupported.
On the data immutability front, a SmartLock WORM domain cannot be created in or on top of a writable snapshot in OneFS 9.3. Such attempts fail with the following message:
# isi snapshot writable list
Path               Src Path         Src Snapshot
-------------------------------------------------
/ifs/test/rw-head  /ifs/test/head1  s159776
-------------------------------------------------
Total: 1
# isi worm domains create -d forever /ifs/test/rw-head/worm
Are you sure? (yes/[no]): yes
Failed to enable SmartLock: Operation not supported
Creating a writable snapshot inside a directory with a WORM domain is also not permitted:
# isi worm domains list
ID       Path            Type
----------------------------------
2228992  /ifs/test/worm  enterprise
----------------------------------
Total: 1
# isi snapshot writable create s32106 /ifs/test/worm/wsnap
Writable snapshot cannot be nested under WORM domain 22.0300: Operation not supported
Regarding data reduction and storage efficiency, the writable snapshots story in OneFS 9.3 is as follows: OneFS in-line compression works with writable snapshot data, but in-line deduplication is not supported and ignores existing files in writable snapshots. However, in-line deduplication can occur on any new files created within the writable snapshot.
Post-deduplication of writable snapshot data is not supported, and the SmartDedupe job ignores files in writable snapshots.
Similarly, at the file level, attempts to clone data within a writable snapshot (cp -c) are also prohibited and fail with the following error:
# isi snapshot writable list
Path        Src Path   Src Snapshot
------------------------------------
/ifs/wsnap1 /ifs/test1 s32106
------------------------------------
Total: 1
# cp -c /ifs/wsnap1/file1 /ifs/wsnap1/file1.clone
cp: file1.clone: cannot clone from 1:83e1:002b::HEAD to 2:705c:0053: Invalid argument
Additionally, in small file packing workloads, OneFS Small File Storage Efficiency (SFSE) processing ignores files within a writable snapshot, and there is currently no support for inline data inodes within a writable snapshot domain either.
In terms of availability and data protection, OneFS 9.3 also has some caveats related to writable snapshots to be aware of. Regarding SnapshotIQ:
Writable snapshots cannot be created from a source snapshot of the /ifs root directory. They also cannot currently be locked or changed to read-only. However, the read-only source snapshot is locked for the entire life cycle of its writable snapshot.
Writable snapshots cannot be updated from a more recent read-only source snapshot. However, a new writable snapshot can be created from a more recent source snapshot to include subsequent updates to the replicated production data set. Creating a read-only snapshot of a writable snapshot is also not allowed and fails with the following error message:
# isi snapshot snapshots create /ifs/wsnap2
snapshot create failed: Operation not supported
Writable snapshots cannot be nested in the namespace under other writable snapshots, and such operations return ENOTSUP.
Only IFS domain-based snapshots are allowed as the source of a writable snapshot. This means that snapshots created on a cluster prior to OneFS 8.2 cannot be used as the source for a writable snapshot.
Snapshot aliases cannot be used as the source of a writable snapshot, even if the alias target ID is used instead of the alias target name. The full name of the snapshot must be specified.
# isi snapshot snapshots view snapalias1
               ID: 134340
             Name: snapalias1
             Path: /ifs/test/rwsnap2
        Has Locks: Yes
         Schedule: -
  Alias Target ID: 106976
Alias Target Name: s106976
          Created: 2021-08-16T22:18:40
          Expires: -
             Size: 90.00M
     Shadow Bytes: 0.00%
        % Reserve: 0.00%
     % Filesystem: 0.00%
            State: active
# isi snapshot writable create 134340 /ifs/test/wsnap1
Source SnapID(134340) is an alias: Operation not supported
Creating a SnapRevert domain on or on top of a writable snapshot is not allowed. Similarly, creating a writable snapshot on a directory with a SnapRevert domain is not supported. Such operations return ENOTSUP.
Finally, the SnapshotDelete job has no interaction with writable snapshots; deletion of writable snapshots is handled by the TreeDelete job instead.
Regarding NDMP backups, since NDMP uses read-only snapshots for checkpoints, you cannot back up writable snapshot data on OneFS 9.3.
Moving on to replication: SyncIQ is unable to copy or replicate the data within a writable snapshot in OneFS 9.3. More specifically:
Replication condition | Description |
Writable snapshot as the SyncIQ source | Replication fails because snapshot creation is not permitted on a writable snapshot source. |
Writable snapshot as the SyncIQ target | The replication job fails because snapshot creation is not supported on a writable snapshot target. |
Writable snapshot one or more levels below the SyncIQ source | The writable snapshot's data is not replicated to the target cluster. The rest of the source tree is replicated as expected. |
Writable snapshot one or more levels below the SyncIQ target | While a writable snapshot is in the ACTIVE state, its root cannot be removed on the target, causing replication to fail. |
Attempts to replicate files to a writable snapshot fail with the following SyncIQ job error:
"SyncIQ was unable to create a snapshot on the source cluster. Snapshot initialization error: Snapshot creation failed. The operation is not supported."
Because SyncIQ snapshots cannot be locked, OneFS cannot create writable snapshots based on snapshots generated by SyncIQ. This includes all read-only snapshots with the "SIQ-*" name prefix. Any attempt to use an SIQ-prefixed snapshot fails with the following error:
# isi snapshot writable create SIQ-4b9c0e85e99e4bcfbcf2cf30a3381117-latest /ifs/rwsnap
Source SnapID(62356) is a SyncIQ related snapshot: Invalid argument
A common use case for writable snapshots is in disaster recovery testing. For disaster recovery purposes, an organization typically has two PowerScale clusters configured in a source/destination SyncIQ replication relationship. Many organizations need to perform regular DR testing to verify the functionality of their processes and tools in the event of a business continuity interruption or disaster recovery event.
Given the writable snapshot and SyncIQ caveats described above, a writable snapshot of a production dataset replicated to a target DR cluster can be created as follows:
- On the source cluster, create a SyncIQ policy to replicate the source directory (/ifs/test/head) to the destination cluster:
# isi sync policies create --name=ro-head sync --source-root-path=/ifs/prod/head --target-host=10.224.127.5 --target-path=/ifs/test/ro-head
# isi sync policies list
Name    Path           Action Enabled Target
----------------------------------------------
ro-head /ifs/prod/head sync   Yes     10.224.127.5
----------------------------------------------
Total: 1
- Run the SyncIQ policy to replicate the source directory to /ifs/test/ro-head on the destination cluster:
# isi sync jobs start ro-head --source-snapshot s14
# isi sync jobs list
Policy Name ID State   Action Duration
--------------------------------------
ro-head     1  running run    22s
--------------------------------------
Total: 1
# isi sync jobs view ro-head
Policy Name: ro-head
         ID: 1
      State: running
     Action: run
   Duration: 47s
 Start Time: 2021-06-22T20:30:53
     Target:
- Create a read-only snapshot of the replicated dataset on the destination cluster:
# isi snapshot snapshots create /ifs/test/ro-head
# isi snapshot snapshots list
ID Name                                     Path
------------------------------------------------------------
2  SIQ_HAL_ro-head_2021-07-22_20-23-initial /ifs/test/ro-head
3  SIQ_HAL_ro-head                          /ifs/test/ro-head
5  SIQ_HAL_ro-head_2021-07-22_20-25         /ifs/test/ro-head
8  SIQ-Failover-ro-head-2021-07-22_20-26-17 /ifs/test/ro-head
9  s106976                                  /ifs/test/ro-head
------------------------------------------------------------
- Using a snapshot of the replicated dataset (not an SIQ_* snapshot) as the source, create a writable snapshot on the target cluster at /ifs/test/head:
# isi snapshot writable create s106976 /ifs/test/head
- Confirm that the writable snapshot was created on the destination cluster:
# isi snapshot writable list
Path           Src Path          Src Snapshot
---------------------------------------------
/ifs/test/head /ifs/test/ro-head s106976
---------------------------------------------
Total: 1
# du -sh /ifs/test/ro-head
21M /ifs/test/ro-head
- Export and/or share the writable snapshot data under /ifs/test/head on the target cluster using the protocol(s) of choice. Mount the export or share on client systems, and perform DR testing and verification as needed.
- After the DR test is complete, delete the writable snapshot on the target cluster:
# isi snapshot writable delete /ifs/test/head
Note that writable snapshots cannot be updated from a more recent read-only source snapshot. A new writable snapshot would need to be created using the most recent snapshot source to reflect subsequent updates to the production dataset on the destination cluster.
So there you have it: writable snapshots v1 in OneFS 9.3 provide the long-awaited ability to quickly, easily, and efficiently create copies of datasets by presenting a writable view of a regular snapshot in a target directory, where clients can access it through the full range of supported NAS protocols.
OneFS Writable Snapshots: Performance, Monitoring, and Management
When it comes to monitoring and managing OneFS writable snapshots, the "isi snapshot writable" CLI syntax looks and feels similar to the regular read-only snapshot utilities. The writable snapshots currently present on a cluster can easily be viewed from the CLI using the "isi snapshot writable list" command. For example:
# isi snapshot writable list
Path             Src Path        Src Snapshot
---------------------------------------------
/ifs/test/wsnap1 /ifs/test/prod  prod1
/ifs/test/wsnap2 /ifs/test/snap2 s73736
---------------------------------------------
The properties of a specific writable snapshot, including its logical and physical size, can be viewed using the "isi snapshot writable view" CLI command:
# isi snapshot writable view /ifs/test/wsnap1
         Path: /ifs/test/wsnap1
     Src Path: /ifs/test/prod
 Src Snapshot: s73735
      Created: 2021-06-11T19:10:25
 Logical Size: 100.00
Physical Size: 32.00k
        State: active
OneFS SmartQuotas provides the capacity accounting layer for writable snapshots. A writable snapshot's application logical, logical, and physical space usage is obtained from a directory quota at the root of the writable snapshot and is displayed through the CLI as follows:
# isi quota quotas list
Type      AppliesTo Path             Snap Hard Soft Adv Used  Reduction Efficiency
----------------------------------------------------------------------------------
directory DEFAULT   /ifs/test/wsnap1 No   -    -    -   76.00 -         0.00 : 1
----------------------------------------------------------------------------------
Or from the OneFS WebUI by navigating to File System > SmartQuotas > Quotas and Usage:
For more detail, the "isi quota quotas view" CLI command provides a full accounting of a writable snapshot's directory quota domain, including physical, logical, and storage efficiency metrics, plus a file count. For example:
# isi quota quotas view /ifs/test/wsnap1 directory
                        Path: /ifs/test/wsnap1
                        Type: directory
                   Snapshots: No
                    Enforced: Yes
                   Container: No
                      Linked: No
                       Usage
                           Files: 10
         Physical(With Overhead): 32.00k
        FSPhysical(Deduplicated): 32.00k
         FSLogical(W/O Overhead): 76.00
        AppLogical(ApparentSize): 0.00
                   ShadowLogical: -
                    PhysicalData: 0.00
                      Protection: 0.00
        Reduction(Logical/Data): None : 1
    Efficiency(Logical/Physical): 0.00 : 1
                        Over: -
               Thresholds On: fslogicalsize
              ReadyToEnforce: Yes
                  Thresholds
              Hard Threshold: -
               Hard Exceeded: No
          Hard Last Exceeded: -
          Advisory Threshold: -
  Advisory Threshold Percent: -
           Advisory Exceeded: No
      Advisory Last Exceeded: -
              Soft Threshold: -
      Soft Threshold Percent: -
               Soft Exceeded: No
          Soft Last Exceeded: -
                  Soft Grace: -
This information is also available through the OneFS WebUI by navigating to File System > SmartQuotas > Generated Report File > View Report Details:
In addition, the "isi get" CLI command can be used to verify the efficiency of individual writable snapshot files. First, run the following command syntax on a chosen file under the source snapshot path (in this case, /ifs/test/prod).
In the following example, the source file /ifs/test/prod/testfile1.zip is reported with a size of 147 MB, occupying 18019 physical blocks:
# isi get -D /ifs/test/prod/testfile1.zip
POLICY   W  LEVEL  PERFORMANCE  COAL  ENCODING  FILE           IADDRS
default     16+2/2 concurrency  on    UTF-8     testfile1.zip  <1,9,92028928:8192>
*************************************************
*  IFS inode: [ 1,9,1054720:512 ]
*************************************************
*  Inode Version:      8
*  Dir Version:        2
*  Inode Revision:     145
*  Inode Mirror Count: 1
*  Fetched Flag:       0
*  Restripe State:     0
*  Link Count:         1
*  Size:               147451414
*  Mode:               0100700
*  Flags:              0x110000e0
*  SmartLinked:        False
*  Physical Blocks:    18019
However, running the same "isi get" CLI command on that file within the writable snapshot tree (/ifs/test/wsnap1/testfile1.zip) shows that the space-efficient copy now consumes only 5 physical blocks, compared to the 18019 blocks of the original file:
# isi get -D /ifs/test/wsnap1/testfile1.zip
POLICY   W  LEVEL  PERFORMANCE  COAL  ENCODING  FILE           IADDRS
default     16+2/2 concurrency  on    UTF-8     testfile1.zip  <1,9,92028928:8192>
*************************************************
*  IFS inode: [ 1,9,1054720:512 ]
*************************************************
*  Inode Version:      8
*  Dir Version:        2
*  Inode Revision:     145
*  Inode Mirror Count: 1
*  Fetched Flag:       0
*  Restripe State:     0
*  Link Count:         1
*  Size:               147451414
*  Mode:               0100700
*  Flags:              0x110000e0
*  SmartLinked:        False
*  Physical Blocks:    5
Writable snapshots use the OneFS Policy Domain Manager, or PDM, for validation and verification of domain membership. For each writable snapshot, a "WSnap" domain is created on the target directory. The isi_pdm CLI utility can be used to report on the writable snapshot domain for a specific directory:
# isi_pdm -v domains list --patron Wsnap /ifs/test/wsnap1
Domain  Patron  Path
b.0700  WSnap   /ifs/test/wsnap1
Additional details of a writable snapshot's backing domain can be displayed using the following CLI syntax:
# isi_pdm -v domains read b.0700
('b.0700',): { version=1 state=ACTIVE ro store=(type=RO SNAPSHOT, ros_snapid=650, ros_root=5:23ec:0011, ros_lockid=1) }
Domain membership has some implications for writable snapshots in OneFS 9.3, and there are some notable caveats. For example, files within the writable snapshot domain cannot be renamed out of the writable snapshot, which allows the file system to easily track those files:
# mv /ifs/test/wsnap1/file1 /ifs/test
mv: rename file1 to /ifs/test/file1: Operation not permitted
Also, nesting of writable snapshots is not permitted in OneFS 9.3, and an attempt to create a writable snapshot in a subdirectory under an existing writable snapshot fails with the following CLI error:
# isi snapshot writable create prod1 /ifs/test/wsnap1/wsnap1-2
Writable snapshot: /ifs/test/wsnap1 nested below another writable snapshot: Operation not supported
When a writable snapshot is created, all existing hard links and symbolic links (symlinks) to files within the snapshot namespace continue to work as expected. However, existing hard links to files outside the snapshot domain are not preserved in the writable snapshot, and the link count is adjusted accordingly. Note that any attempt to create a new hard link to a file outside the writable snapshot boundary will also fail:
# ln /ifs/test/file1 /ifs/test/wsnap1/file1
ln: /ifs/test/wsnap1/file1: Operation not permitted
Symbolic links, however, work as expected. The behavior of OneFS hard links and symlinks with writable snapshots is summarized as follows:
Link type | Supported | Details |
Existing external hard link | No | Old external hard links fail. |
Existing internal hard link | Yes | Existing hard links within the snapshot domain work as expected. |
New external hard link | No | New external hard links fail. |
New internal hard link | Yes | New hard links within the snapshot domain work as expected. |
External symlink | Yes | External symlinks work as expected. |
Internal symlink | Yes | Internal symlinks work as expected. |
Writable snapshots are not separately licensed; their use is governed by the OneFS SnapshotIQ data service. Therefore, to use writable snapshots on a OneFS 9.3 PowerScale cluster, SnapshotIQ must be licensed across all the cluster's nodes, in addition to a general OneFS license. Additionally, the ISI_PRIV_SNAPSHOT role-based administration privilege is required for any cluster management account that creates and manages writable snapshots. For example:
# isi auth roles view SystemAdmin | grep -i snap
          ID: ISI_PRIV_SNAPSHOT
In general, access to a writable snapshot's files is slightly less performant than access to the source, or head, files, because an extra layer of indirection is required to reach the data blocks. This is especially true with older source snapshots, where a lengthy read chain can require significant "ditto" block resolution. This occurs when parts of a file no longer reside in the source snapshot, so the snapshot's inode block tree does not point to a real data block. Instead, it carries a flag marking it as a ditto block. A ditto block indicates that the data is the same as the next newer version of the file, so OneFS automatically looks ahead to the newer version of the block. If there are a large number (for example, hundreds or thousands) of snapshots of the same unchanged file, reading the oldest snapshot can incur a significant latency impact.
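For illustration, the ditto-block lookup described above can be modeled as a walk from the requested snapshot toward head. This is a conceptual sketch only; the structures and names are invented for the example, not OneFS internals:

```python
# Illustrative model of ditto-block resolution (not actual OneFS code).
# Each snapshot version of a file stores either real block data or a
# DITTO marker, meaning "same as the next-newer version of the file".

DITTO = object()  # sentinel: block unchanged relative to the newer version

def read_block(versions, snap_index, block_num):
    """Read one block of a file as seen from snapshot `snap_index`.

    `versions` is ordered oldest -> newest, ending with HEAD. Ditto
    markers force a walk toward newer versions, which is why reads from
    very old snapshots of frequently snapshotted files can add latency.
    """
    hops = 0
    for version in versions[snap_index:]:
        data = version.get(block_num, DITTO)
        if data is not DITTO:
            return data, hops
        hops += 1
    raise IOError("block not found in any version")

# The oldest snapshot only materialized block 0; everything else is ditto.
snap_old = {0: b"A"}                      # oldest snapshot
snap_new = {1: b"B"}                      # newer snapshot preserved block 1
head     = {0: b"A2", 1: b"B2", 2: b"C"}  # current (HEAD) file

versions = [snap_old, snap_new, head]

data, hops = read_block(versions, 0, 2)  # block 2 from the oldest snapshot
print(data, hops)  # resolves all the way to HEAD: b'C' 2
```

Each "hop" here stands in for one level of indirection OneFS must traverse, which is the source of the extra read latency on old, heavily snapshotted files.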
Performance attribute | Details |
Large directories | Because a writable snapshot performs copy-on-read to populate file metadata on first access, initial access to a large directory (for example, one containing millions of files) that attempts to enumerate its contents will be relatively slow, since the writable snapshot has to iteratively fill in the metadata. This applies to namespace traversal operations such as find and ls, unlink and rename, and other operations that work across large directories. However, any subsequent access to the directory or its contents is fast, because the file metadata is already present and there is no copy-on-read overhead. The unlink_copy_batch and readdir_copy_batch parameters under the efs.wsnap sysctl control the size of metadata batch copies. These parameters can be useful for tuning the number of iterative metadata copy reads for datasets with large directories. However, these sysctls should only be changed under the direct supervision of Dell Technical Support. |
Writable snapshot metadata read/write | Initial read and write operations perform a copy-on-read and are therefore somewhat slower than at head. However, once the copy-on-read has been performed for the LINs, read/write performance is near-equivalent to head. |
Writable snapshot data read/write | In general, read and write operations on writable snapshot data are slightly slower than at head. |
Multiple writable snapshots from a single source | The performance of each subsequent writable snapshot created from the same read-only source snapshot is the same as the first, up to the recommended OneFS 9.3 default limit of 30 total writable snapshots. This is governed by the "max_active_wsnaps" sysctl: # sysctl efs.wsnap.max_active_wsnaps efs.wsnap.max_active_wsnaps: 30 While the max_active_wsnaps sysctl can be set to a maximum of 2048 writable snapshots per cluster, in OneFS 9.3 changing this sysctl from its default value of 30 is strongly discouraged. |
Writable snapshots and SmartPools tiering | Because unchanged file data in a writable snapshot is read directly from the source snapshot, writable snapshot latency is negatively affected if the source snapshot is stored on a lower-performance tier than the writable snapshot's directory structure. |
Storage impact | A writable snapshot's disk space consumption is proportional to the number of writes, truncates, and similar operations it receives, since only changed blocks are stored relative to the source snapshot. Metadata overhead grows linearly as copy-on-read occurs for each new writable snapshot that is created and accessed. |
Snapshot deletes | Writable snapshot delete operations are offloaded to the TreeDelete job and performed out of band, so their performance impact should be minimal, although the actual removal of the data is not instantaneous. Also, the TreeDelete job has a path to avoid copy-on-write of files within a writable snapshot that have not yet been enumerated. |
Note that since writable snapshots consume very little space, the savings apply to file data only, which means the metadata of every file and directory in a writable snapshot is fully consumed. Therefore, with large sizes and numbers of writable snapshots, inode consumption should be kept in mind, particularly with metadata-read and metadata-write SSD strategies.
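As a rough illustration of that inode consideration, the metadata footprint can be estimated with simple arithmetic. The inode size and mirror count below are assumptions chosen for the sake of the example, not measured OneFS values:

```python
# Back-of-the-envelope estimate of writable snapshot metadata overhead.
# Assumptions (illustrative only, not OneFS internals): each enumerated
# file consumes one inode of `inode_size` bytes, mirrored `mirrors`
# times on the metadata SSD tier.

def wsnap_metadata_bytes(num_files, num_wsnaps, inode_size=512, mirrors=3):
    """Approximate metadata consumed once every file in every writable
    snapshot has been enumerated (i.e., copy-on-read has completed)."""
    return num_files * num_wsnaps * inode_size * mirrors

# Example: 10 million files, 30 writable snapshots of the same dataset.
total = wsnap_metadata_bytes(10_000_000, 30)
print(f"{total / 2**30:.1f} GiB")  # estimated metadata footprint
```

Even under these simple assumptions, the estimate lands in the hundreds of GiB, which is why inode/SSD sizing deserves attention before deploying many large writable snapshots.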
In the next and final article in this series, we look at writable snapshots in the context of the other OneFS data services.
OneFS writable snapshots
OneFS 9.3 introduces writable snapshots to the PowerScale data services portfolio, enabling the creation and management of space- and time-efficient, writable copies of regular OneFS snapshots, presented at a target directory path within the /ifs namespace, where they can be accessed and manipulated through any of the cluster's file and object protocols, including NFS, SMB, and S3.
The primary focus of writable snapshots in 9.3 is disaster recovery testing, where they can be used to quickly clone production datasets, allowing disaster recovery procedures to be routinely tested on identical, space-efficient copies of production data. This is a significant benefit for the growing number of organizations using isolated test, or "bubble", networks, where DR testing is performed in a replicated environment that closely mimics production.
Other writable snapshot use cases may include parallel processing workloads spanning a fleet of servers that can be configured to use multiple writable snapshots of a single production data set to reduce time to results and deliverables. And writable snapshots can also be used to create and deploy templates for nearly identical environments, enabling highly predictable and scalable development and test pipelines.
OneFS writable snapshots provide an overlay on a read-only source snapshot, allowing a cluster administrator to create a lightweight copy of a production dataset with a simple CLI command and present and use it as a standalone writable namespace.
In this scenario, a SnapshotIQ snapshot (snap_prod_1) is created from the /ifs/prod directory. The read-only snapshot snap_prod_1 is used as the base for a writable snapshot created in /ifs/wsnap. This writable snapshot contains the same subdirectory and file structure as the original "prod" directory, just without the additional data capacity footprint.
Internally, OneFS 9.3 introduces a new protection group data structure, PG_WSNAP, which provides an overlay that allows unchanged file data to be read directly from the source snapshot, while only the writable snapshot's changes are stored.
In this example, a file (head) contains four blocks of data, A through D. A read-only snapshot is taken of the directory containing the head file. The file is then modified via a copy-on-write operation: the new head data, B1, is written to block 102, and the original data block, B, is copied to a new physical block (110). The snapshot's block tree now points to block 110, the new location of the original B data, so the snapshot has its own copy of that block.
A writable snapshot is then created using the read-only snapshot as its source. This writable snapshot is modified so that its updated version of block C is stored in its own protection group (PG_WSNAP). A client then issues a read request for the writable snapshot's version of the file. This read is satisfied by the head versions of blocks A and D, the read-only snapshot's version of block B, and the writable snapshot file's own version of block C (C1, in block 120).
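The block resolution order in this example can be sketched as a simple lookup chain. This is an illustrative model only (the names and structures are invented), not OneFS code:

```python
# Illustrative sketch of the writable snapshot read path described above.
# Lookup order for each block: the writable snapshot's own changes
# (standing in for PG_WSNAP), then the read-only source snapshot's
# preserved blocks, then the current HEAD file.

head          = {"A": b"A", "B": b"B1", "C": b"C", "D": b"D"}  # current file
ro_snapshot   = {"B": b"B"}    # original block preserved by copy-on-write
wsnap_changes = {"C": b"C1"}   # block modified in the writable snapshot

def wsnap_read(block):
    """Resolve one block of the writable snapshot's view of the file."""
    if block in wsnap_changes:   # modified within the writable snapshot
        return wsnap_changes[block]
    if block in ro_snapshot:     # preserved in the read-only source snapshot
        return ro_snapshot[block]
    return head[block]           # unchanged: read directly from HEAD

view = [wsnap_read(b) for b in "ABCD"]
print(view)  # [b'A', b'B', b'C1', b'D']
```

The resulting view matches the figure's description: A and D come from head, B from the read-only snapshot, and C1 from the writable snapshot itself.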
OneFS directory quotas provide the accounting and reporting infrastructure for writable snapshots, allowing users to easily view the disk space usage of a writable snapshot. In addition, IFS domains are also used to link and manage writable snapshot memberships. In OneFS, a domain defines a set of behaviors for a collection of files in a specific directory tree. When a protection domain is applied to a directory, that domain also affects all files and subdirectories in that top-level directory.
When files are first accessed in a newly created writable snapshot, the data is read from the source snapshot and the files' metadata is populated in a process known as copy-on-read (CoR). Unchanged data is read from the source snapshot and the changes are saved to the writable snapshot namespace data structure (PG_WSNAP).
Since a new writable snapshot is not copied in advance, its creation is extremely fast. When files are subsequently accessed, they are enumerated and start consuming metadata space.
The first time a writable snapshot file is accessed, a read of the source snapshot is initiated and the file data is accessed directly from the read-only snapshot. At this point, the MD5 checksums for the source file and the writable snapshot file are identical. For example, if the first block of a file is overwritten, only that single block is written to the writable snapshot, and the remaining unchanged blocks continue to be read from the source snapshot. At this point, the source and writable snapshot files are now different, so their MD5 checksums are also different.
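The checksum behavior described above can be demonstrated in miniature with ordinary in-memory buffers standing in for the source snapshot file and its writable copy. This is purely illustrative; the block size is an assumption for the example:

```python
# Sketch of the MD5 behavior described above: before any writes, the
# writable snapshot view is byte-identical to the source file, so the
# digests match; overwriting a single block diverges them.
import hashlib

BLOCK = 8192  # assumed block size for the example

source = bytearray(b"x" * BLOCK * 4)  # 4-block "source snapshot" file
wsnap  = bytearray(source)            # writable snapshot view (identical)

# Before any modification, the two MD5 checksums are identical.
assert hashlib.md5(source).hexdigest() == hashlib.md5(wsnap).hexdigest()

# Overwrite only the first block of the writable copy; the remaining
# blocks are conceptually still "read from the source snapshot".
wsnap[0:BLOCK] = b"y" * BLOCK

print(hashlib.md5(source).hexdigest() == hashlib.md5(wsnap).hexdigest())  # False
```

Only the changed block differs, yet the whole-file digests diverge, mirroring how a single-block overwrite makes the source and writable snapshot files distinct.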
Before writable snapshots can be created and managed on a cluster, the following requirements must be met:
- The cluster is running OneFS 9.3 or later, with the upgrade committed.
- SnapshotIQ is licensed for the entire cluster.
Note that for replication environments using writable snapshots and SyncIQ, all target clusters must be running OneFS 9.3 or later, have a SnapshotIQ license, and provide sufficient capacity for the fully replicated data set.
By default, up to thirty active writable snapshots can be created and managed in a cluster using the OneFS command line interface (CLI) or the RESTful Platform API.
When you create a new writable snapshot, all files contained in the snapshot source or HEAD directory tree are immediately available for reading and writing in the destination namespace.
Once no longer needed, a writable snapshot can easily be deleted via the CLI. Note that WebUI configuration of writable snapshots is not available in OneFS 9.3. However, a writable snapshot can easily be created via the CLI as follows:
The source snapshot (src-snap) is an existing read-only snapshot (prod1), and the destination path (dst-path) is a new directory within the /ifs namespace (/ifs/test/wsnap1). A read-only source snapshot can be generated as follows:
# isi snapshot snapshots create prod1 /ifs/test/prod
# isi snapshot snapshots list
ID   Name  Path
----------------------------
7142 prod1 /ifs/test/prod
----------------------------
The following command then creates a writable snapshot in the "active" state. Note that the source snapshot is now locked and cannot be deleted:
# isi snapshot writable create prod1 /ifs/test/wsnap1
# isi snapshot snapshots delete -f prod1
Snapshot "prod1" can't be deleted because it is locked
While the OneFS CLI does not explicitly prevent removing a writable snapshot's lock on its backing source snapshot, it does issue a clear warning:
# isi snap lock view prod1 1
     ID: 1
Comment: Locked/Unlocked by writable snapshot(s), do not force delete lock.
Expires: 2106-02-07T06:28:15
  Count: 1
# isi snap lock delete prod1 1
Are you sure you want to delete snap lock 1 from s13590? (yes/[no]):
Note that a writable snapshot cannot be created at an existing directory path. A new directory path must be specified in the CLI syntax; otherwise, the command fails with the following error:
# isi snapshot writable create prod1 /ifs/test/wsnap1
mkdir /ifs/test/wsnap1 failed: File exists
If an unsupported path is specified, the following error is returned:
# isi snapshot writable create prod1 /isf/test/wsnap2
Error in field(s): dst_path
Field: dst_path has error: The value: /isf/test/wsnap2 does not match the regular expression: ^/ifs$|^/ifs/
Input validation failed.
A writable snapshot also cannot be created from a source snapshot of the /ifs root directory and fails with the following error:
# isi snapshot writable create s1476 /ifs/test/ifs-wsnap
Cannot create writable snapshot from /ifs snapshot: Operation not supported
Note that OneFS 9.3 currently does not support the creation of scheduled or automated writable snapshots.
When it is time to delete a writable snapshot, OneFS uses the job engine's TreeDelete job to unlink all of its contents. Running the "isi snapshot writable delete" CLI command automatically queues a TreeDelete instance, which runs asynchronously to delete and clean up a writable snapshot's namespace and contents. Note, however, that the TreeDelete job execution, and therefore the removal of the data, does not happen immediately. Instead, the writable snapshot's files and directories are moved to a temporary "*.deleted" directory. For example:
# isi snap writable create prod1 /ifs/test/wsnap2
# isi snap writable delete /ifs/test/wsnap2
Are you sure? (yes/[no]): yes
# ls /ifs/test
prod    wsnap1    wsnap2.51dc245eb.deleted
This temporary directory is then removed in an asynchronous operation. If the TreeDelete job should fail for any reason, the writable snapshot can be deleted using its renamed path. For example:
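This rename-then-delete behavior resembles a generic "tombstone" deletion pattern, which can be sketched as follows. This is a generic model, with a background thread standing in for the TreeDelete job and an invented naming scheme, not OneFS code:

```python
# Sketch of rename-then-asynchronously-delete (a generic model of the
# pattern described above, not OneFS internals): the tree is renamed out
# of the way immediately, so it vanishes from the namespace at once,
# while the bulk unlink happens later in the background.
import os
import shutil
import tempfile
import threading
import uuid

def delete_wsnap(path):
    """Rename the tree to a temporary '*.deleted' name, then unlink it
    in a background thread (standing in for the TreeDelete job)."""
    tomb = f"{path}.{uuid.uuid4().hex[:9]}.deleted"
    os.rename(path, tomb)                 # instant namespace removal
    worker = threading.Thread(target=shutil.rmtree, args=(tomb,))
    worker.start()
    return tomb, worker

root = tempfile.mkdtemp()
wsnap_dir = os.path.join(root, "wsnap2")
os.makedirs(os.path.join(wsnap_dir, "subdir"))

tomb, worker = delete_wsnap(wsnap_dir)
print(os.path.exists(wsnap_dir))  # False: gone from the namespace immediately
worker.join()                     # background "TreeDelete" finishes
print(os.path.exists(tomb))       # False: data actually removed
```

The key property mirrored here is that the visible path disappears immediately, while the actual space reclamation completes out of band.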
# isi snap writable delete /ifs/test/wsnap2.51dc245eb.deleted
Deleting a writable snapshot unlocks the underlying read-only snapshot, allowing it to be deleted as well if desired, as long as there are no other active writable snapshots based on that read-only snapshot.
Also, deleting writable snapshots in OneFS 9.3 is a purely manual process. There are currently no provisions for automated, policy-based management, such as the ability to set an expiration date on a writable snapshot or a mechanism for bulk snapshot deletion.
It is recommended that all client sessions to a writable snapshot be quiesced before it is deleted. Because the backing snapshot can no longer be trusted once its lock is released during the delete operation, in-flight I/O may fail when a writable snapshot is deleted.
In the next article in this series, we'll discuss writable snapshot performance and monitoring.