|
| 1 | +<!--- |
| 2 | + Licensed under the Apache License, Version 2.0 (the "License"); |
| 3 | + you may not use this file except in compliance with the License. |
| 4 | + You may obtain a copy of the License at |
| 5 | + |
| 6 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 7 | + |
| 8 | + Unless required by applicable law or agreed to in writing, software |
| 9 | + distributed under the License is distributed on an "AS IS" BASIS, |
| 10 | + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 11 | + See the License for the specific language governing permissions and |
| 12 | + limitations under the License. See accompanying LICENSE file. |
| 13 | +--> |
| 14 | + |
| 15 | +# Compatibility Benchmark over HCFS Implementations |
| 16 | + |
| 17 | +## <a name="Background"></a> Background |
| 18 | + |
| 19 | +Hadoop-Compatible File System (HCFS) is a core concept in the big data storage ecosystem, |
| 20 | +providing unified interfaces and generally clear semantics, |
| 21 | +and has become the de facto standard for industry storage systems to follow and conform to. |
| 22 | +Hadoop includes a series of HCFS implementations, |
| 23 | +such as S3AFileSystem for Amazon's S3 Object Store, |
| 24 | +WASB for Microsoft's Azure Blob Storage, and the OSS connector for Alibaba Cloud Object Storage, |
| 25 | +and storage service providers maintain more on their own. |
| 26 | + |
| 27 | +## <a name="Problems"></a> Problems |
| 28 | + |
| 29 | +However, as indicated by [`HCFS Introduction`](hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md), |
| 30 | +there is no formal suite for assessing the compatibility of a file system across all such HCFS implementations. |
| 31 | +Thus, whether the functionality is properly implemented and meets the core compatibility expectations |
| 32 | +depends mainly on the service provider's own report. |
| 33 | +Meanwhile, Hadoop itself keeps evolving, and new features are continuously added to the HCFS interfaces |
| 34 | +for existing implementations to follow and adopt. In that case, |
| 35 | +Hadoop also needs a tool to quickly assess whether a specific HCFS implementation supports these features. |
| 36 | +Besides, the well-known hadoop command line tool, or hdfs shell, is used to interact directly with an HCFS storage system, |
| 37 | +where most commands correspond to specific HCFS interfaces and work well. |
| 38 | +Still, some commands are more complicated and may not work, such as the expunge command. |
| 39 | +To check such commands against an HCFS, we also need a way to identify them. |
| 40 | + |
| 41 | +## <a name="Proposal"></a> Proposal |
| 42 | + |
| 43 | +Accordingly, we propose to define a formal HCFS compatibility benchmark and provide a corresponding tool |
| 44 | +to assess the compatibility of an HCFS storage system. |
| 45 | +The benchmark and tool should cover both HCFS interfaces and hdfs shell commands. |
| 46 | +Different scenarios require different kinds of compatibility. |
| 47 | +To account for this, the benchmark can define different suites. |
| 48 | + |
| 49 | +## <a name="Benefits"></a> Benefits |
| 50 | + |
| 51 | +We intend the benchmark and tool to be useful for both storage providers and storage users. |
| 52 | +For end users, it can be used to evaluate the compatibility level and |
| 53 | +determine if the storage system in question is suitable for the required scenarios. |
| 54 | +For storage providers, it helps to quickly generate an objective and reliable report |
| 55 | +about the core functions of the storage service. |
| 56 | +For instance, if an HCFS scores 100% on a suite named 'tpcds', |
| 57 | +this demonstrates that all functions needed by a tpcds program are properly implemented. |
| 58 | +It also serves as a guide to how storage service capabilities map to HCFS interfaces, such as storage class on S3. |
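
The percentage mentioned for a suite could be read as the share of its cases that pass. The following is a rough sketch of that reading, again using only the JDK; `SuiteScoreSketch`, `passRate`, and the sample results are hypothetical, and the actual tool may compute or report its score differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SuiteScoreSketch {

    /**
     * Computes the percentage of passed cases in one suite.
     * Keys are case names, values are true when the case passed.
     */
    static double passRate(Map<String, Boolean> results) {
        if (results.isEmpty()) {
            throw new IllegalArgumentException("suite has no cases");
        }
        long passed = results.values().stream().filter(Boolean::booleanValue).count();
        return 100.0 * passed / results.size();
    }

    public static void main(String[] args) {
        // Hypothetical results for a suite with four cases, one of which fails.
        Map<String, Boolean> results = new LinkedHashMap<>();
        results.put("FileSystem#mkdirs", true);
        results.put("FileSystem#rename", true);
        results.put("FileSystem#listStatus", true);
        results.put("shell: expunge", false);
        System.out.printf("suite pass rate: %.1f%%%n", passRate(results));
    }
}
```

Under this reading, a score of 100% means every case in the suite passed, and any lower value points to specific failed cases that a report could list by name.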