Value Reconstruction of the Storage Center: Practicing "Integrated Storage and Use" under a Storage-Compute Separation Architecture
Community Assistant  2026-03-17 11:44  Posted from China

Chen Zhiyuan | Senior Researcher, Dataibao Data Research Institute

This article is included in "Words and Numbers", issue 3.

[Abstract] Data center architecture has evolved along two major technical routes: storage-compute integration and storage-compute separation. By comparison, the storage-compute separation architecture is better suited to new cloud and AI workloads and, thanks to its resource scheduling efficiency, reliability, and O&M maturity, has become the mainstream solution. However, long-standing problems in the data industry, such as weak data governance and difficulty meeting regulatory requirements, make it hard for the storage-compute separation architecture to get the most out of its software and hardware infrastructure and to fully release the value of data.

To meet this challenge, this article proposes a data-element transformation path for the storage center under the storage-compute separation architecture: the "trusted data circulation center solution". The solution combines Huawei's full-stack technology R&D and ecosystem collaboration in storage centers with Dataibao's capabilities and resources in data governance and circulation. It aims to maximize the storage center's ability to "aggregate, govern, use, and circulate" data, turning it into a data operations hub that "integrates storage and use". This not only supplies the computing power center with high-quality, secure, and compliant data, but also releases data value in general and vertical scenarios, improving on a situation in which the data industry "can store data but cannot use it well". The research indicates that the solution can improve data processing efficiency by 70% and shorten the data product development cycle by 50%, raising the quality and efficiency of data development, utilization, and compliant circulation.

I. The Paradigm Shift in Data Value Circulation at the Storage Center

At present, data infrastructure construction faces a structural contradiction between "scale expansion" and "inefficient value extraction". Data show that national data output reached 41.06 ZB in 2024 while total storage was only 2.09 ZB, and the data retention rate fell further, from 2.89% in 2023 to 2.80% [1]. Other data show that by the end of June 2025 China's storage capacity had reached 1,680 EB, yet the process of turning data into value still has obvious shortcomings [2]. Further data show that, entering the AI era, domestic reserves of high-quality data are low and Chinese-language data are small in scale: the mainstream international large-model datasets are mainly in English, and Chinese data account for only 4.8% of the popular Common Crawl dataset project [3]. Three deep-seated problems lie behind these phenomena. First, a large amount of data is "naturally lost" because it cannot be effectively retained, so data resources still face insufficient storage capacity. Second, data resources face governance difficulties, and the absence of governance leaves raw data of low quality, which restricts subsequent applications. Third, data circulation is not open enough and channels for obtaining public data are not smooth, so the industry still carries risk in the compliant circulation of data.

The trend shows that storage centers are being upgraded from "storage containers" to operations centers that "integrate storage and use", in order to resolve the contradictions and problems in data infrastructure construction described above and thereby improve the state of the data industry. Data show that advanced storage centers use key technologies such as an all-flash storage base, cross-domain data fabric, storage-native security, and AI data lake storage to achieve "aggregation at scale, efficient governance, secure supply, and industrial use", addressing insufficient capacity and continuously supplying high-quality data resources to the computing power center [8]. Meanwhile, regions across the country are accelerating the shift from "storing" to "using". The Cyberspace Administration of China has guided the creation of the Chinese Internet Corpus Resource Platform, which integrates 27 datasets totaling 2.7 TB and provides basic resource support for large-scale AI training. Corpus trading zones set up by data exchanges such as those in Shanghai and Shenzhen promote the compliant circulation and value release of corpus resources through a market mechanism of compliant transactions. An operational closed loop of data aggregation, governance, circulation, and application has initially taken shape [4][5].

Driven by industrial demand, technological change, and local practice, the evolution path of the storage center is becoming clear: the core is to build full-link data-element capabilities that directly support data value circulation.

II. Integrating the Storage Center with Compliant Data Circulation under the Storage-Compute Separation Architecture

At present, national policy is actively guiding the coordinated construction of data infrastructure and a compliant circulation system. The Guiding Opinions on Promoting the High-Quality Development of the Data Industry clearly call to "strengthen infrastructure support" and to "support enterprises in building secure and reliable data infrastructure in combination with application scenarios". The National Data Infrastructure Construction Guidelines likewise point to "exploring the construction of new intelligent computing centers using a storage-compute separation architecture" and to strengthening the "green development and organic coordination of diverse heterogeneous computing power".

In the traditional storage-compute separation architecture, the core development path of the storage center is to decouple storage resources from computing resources, achieving independent elastic expansion of storage capacity and efficient data sharing. Decoupling alone, however, does not fully solve the problem that data are "difficult to circulate and apply". Advanced storage centers therefore further integrate data governance, compliance protection, and secure circulation capabilities, evolving from "able to store" toward "well managed and useful". Putting this idea into practice hinges on building three capabilities in the storage center: "data preprocessing, compliant encapsulation, and trusted delivery".

The core of the "trusted data circulation center solution" proposed in this article is to build "data governance and circulation middleware" on top of the storage layer. Specifically, the storage center needs to integrate modules for data classification, privacy-preserving computation, blockchain evidence storage, and the like, so that raw data are "governed immediately" and packaged into standardized data products or datasets with clearly defined rights. These can be registered and listed directly on a data exchange or circulation platform, or connected directly to the computing power center for model training. At present there are three main difficulties. At the resource level, the "data silos" between different storage systems must be broken down. At the technical level, the integration of multi-source heterogeneous data and the unification of cross-domain circulation standards must be solved. At the level of the underlying software and hardware ecosystem, cutting-edge AI data lakes and toolchains are needed to supply high storage capacity, computing capacity, and throughput, so as to achieve trusted collaboration across regions, industries, and institutions.
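The packaging step described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the solution's actual middleware: the field-to-level rules in SENSITIVE_FIELDS, the three level names, and the DataProduct and package names are all hypothetical, and a SHA-256 content digest stands in for whatever the real system would anchor in blockchain evidence storage.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical rules mapping a field name to a sensitivity level.
SENSITIVE_FIELDS = {"phone": "restricted", "id_number": "restricted", "email": "internal"}
LEVEL_ORDER = ["public", "internal", "restricted"]


def classify(record: dict) -> str:
    """Return the highest sensitivity level among the record's fields."""
    levels = [SENSITIVE_FIELDS.get(name, "public") for name in record]
    return max(levels, key=LEVEL_ORDER.index, default="public")


@dataclass
class DataProduct:
    """A batch of records wrapped with ownership and classification metadata."""
    name: str
    owner: str
    level: str
    records: list
    digest: str = field(init=False)

    def __post_init__(self):
        # Canonical JSON so identical content always yields the same digest.
        payload = json.dumps(self.records, sort_keys=True, ensure_ascii=False)
        self.digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()


def package(name: str, owner: str, records: list) -> DataProduct:
    """Classify every record and wrap the batch as one data product."""
    levels = [classify(r) for r in records]
    level = max(levels, key=LEVEL_ORDER.index, default="public")
    return DataProduct(name=name, owner=owner, level=level, records=records)


product = package(
    "visitor-flow",
    "scenic-area-operator",
    [{"date": "2025-01-01", "visitors": 120}, {"phone": "000", "visitors": 3}],
)
print(product.level, product.digest[:16])
```

The batch inherits the strictest level of any record it contains, which is a conservative choice for a sketch; a real deployment would follow the classification standard that applies to the data in question.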

III. Technical Architecture and Practical Breakthroughs of the Trusted Data Circulation Center

First, the solution integrates a three-level governance engine (standardization, quality improvement, commercialization) with the ModelEngine AI toolchain to build an intelligent governance platform for multi-source heterogeneous data. In actual operation, the joint solution improves overall data processing efficiency by 70% and shortens the data product development cycle from months to weeks, significantly accelerating the supply of high-quality data elements.

1) First-level standardization governance: the platform uses an intelligent governance engine inside a trusted space to automatically label multi-modal data and align it across sources, eliminating redundancy, integrating multi-source heterogeneous data, and enriching metadata tags to improve traceability, so that data end up structured, ordered, and standardized;

2) Second-level quality-improvement governance: the platform automatically fills in missing fields using ML models such as random forests and neural networks, then uses a rule engine together with algorithms such as isolation forests and cluster analysis to identify abnormal data and trigger a repair mechanism. Through fully automated error correction, gap filling, redundancy removal, and compliance embedding, the system converts raw data into data resources that are high-quality, secure and reliable, and compliant and controllable;

3) Third-level commercialization governance: the platform uses privacy-preserving computation to "desensitize and encrypt" the data and applies scenario-based encapsulation to create flexible "building block" data products. Once clear data ownership, complete labeling, compliance, and automated transactions are ensured, the result is ready-to-use data products that can be traded, applied commercially, and capitalized.
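The three governance levels above can be sketched as a small pipeline. This is a simplified illustration that deliberately swaps in lightweight stand-ins so it runs on the Python standard library alone: median imputation replaces the random forest and neural network models, a modified z-score based on the median absolute deviation replaces isolation forests and clustering, and salted hashing of an identifier replaces privacy-preserving computation. All function names, field names, the salt, and the sample records are hypothetical.

```python
import hashlib
import statistics


def standardize(records):
    """Level 1: unify field names and drop exact duplicates."""
    seen, out = set(), []
    for rec in records:
        norm = {k.strip().lower(): v for k, v in rec.items()}
        key = tuple(sorted(norm.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            out.append(norm)
    return out


def improve_quality(records, field, threshold=3.5):
    """Level 2: impute missing values with the median, then flag outliers."""
    known = [r[field] for r in records if r.get(field) is not None]
    median = statistics.median(known)
    mad = statistics.median(abs(v - median) for v in known)
    out = []
    for r in records:
        value = r.get(field)
        imputed = value is None
        if imputed:
            value = median
        # Modified z-score; a zero MAD means no spread, so nothing is flagged.
        score = 0.6745 * abs(value - median) / mad if mad else 0.0
        out.append({**r, field: value, "imputed": imputed, "anomaly": score > threshold})
    return out


def commercialize(records, id_field, salt="demo-salt"):
    """Level 3: pseudonymize the identifier and hold back flagged rows."""
    out = []
    for r in records:
        if r["anomaly"]:
            continue  # left out of the product until it has been repaired
        token = hashlib.sha256((salt + str(r[id_field])).encode("utf-8")).hexdigest()[:12]
        out.append({**r, id_field: token})
    return out


raw = [
    {"Visitor_ID": "u1", "Spend": 100},
    {"Visitor_ID": "u1", "Spend": 100},   # exact duplicate
    {"Visitor_ID": "u2", "Spend": None},  # missing value
    {"Visitor_ID": "u3", "Spend": 110},
    {"Visitor_ID": "u4", "Spend": 90},
    {"Visitor_ID": "u5", "Spend": 5000},  # outlier
]
products = commercialize(improve_quality(standardize(raw), "spend"), "visitor_id")
for row in products:
    print(row)
```

Each stage takes and returns plain lists of dictionaries, so a stage can be replaced by a model-based implementation without touching the others. Salted hashing is only pseudonymization and is much weaker than the privacy-preserving computation the solution describes.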

Second, the solution adopts the advanced OceanStor Pacific all-flash data lake architecture, providing a high-performance, high-density, green base for storing and processing massive data elements. This storage series uses an industry-leading high-density design and can deliver a capacity density of more than 4 PB per 2U. In terms of energy saving, it lowers device power consumption to 0.25 W/TB [6], significantly reducing the storage center's operating costs. The architecture supports EB-scale cluster expansion, intelligent data tiering, and lossless multi-protocol interworking, so it can serve large-scale mixed workloads simultaneously and ensure automatic tiered storage of hot and warm data.

Third, in the data circulation stage, unified data space and trusted data space (EDS) technologies provide unified scheduling of cross-domain data lake resources, making global data assets visible, manageable, and usable, while blockchain evidence storage and digital watermarking keep the entire data exchange process trusted, controllable, and verifiable. On this basis, Dataibao holds ISO 27001 (information security management system), ITSS Level 4 (Information Technology Service Standards), DCMM Level 5 (Data Management Capability Maturity Assessment Model), DSMM Level 5 (Data Security Capability Maturity Model), and CMMI Level 5 (Capability Maturity Model Integration) certifications, which help ensure the secure circulation of managed data among institutions. Dataibao has also participated in building, or operates on their behalf, nine provincial data exchanges, and offers mature services for data asset registration, compliance assessment, valuation, and listing for trading, so it can open up the "last kilometer" of data circulation with professional capability. This model combines the storage center's underlying circulation control capability with Dataibao's qualifications for operating compliant transactions, forming a complete closed loop from data governance to capitalization and then to multi-level market circulation.
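The idea of making an exchange "verifiable" through evidence storage can be illustrated with a hash-chained log, in which every entry commits to the one before it. The sketch below is an assumption-laden stand-in, not a blockchain and not the solution's implementation: there is no consensus, distribution, or digital watermarking, and the EvidenceLog class and the event fields are hypothetical.

```python
import hashlib
import json


class EvidenceLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _hash(prev_hash, event):
        payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, event: dict) -> str:
        """Record a circulation event and return its hash."""
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry_hash = self._hash(prev_hash, event)
        self.entries.append({"prev": prev_hash, "event": event, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            expected = self._hash(prev_hash, entry["event"])
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


log = EvidenceLog()
log.append({"action": "register", "product": "visitor-flow", "by": "provider-a"})
log.append({"action": "deliver", "product": "visitor-flow", "to": "consumer-b"})
print(log.verify())
```

Changing any recorded event afterwards makes verify() return False, which is the property an auditor relies on. A production system would additionally anchor the latest hash somewhere that the log's operator cannot rewrite.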

IV. Solution Value and Core Technology

This joint solution offers distinctive value:

On the one hand, Dataibao is one of the very few enterprises in the country that hold a "big data asset transaction" license. The license indicates that Dataibao has passed national compliance review in core links such as data element rights confirmation, circulation, and pricing, and it qualifies Dataibao to establish a "compliance center" between government public data and industry demand. Dataibao has delivered 100 data capitalization cases across many industries and, through the 9 provincial exchanges it has built or operates as an agent, has helped enterprises list more than 1,000 data intellectual property items.

In the culture and tourism industry, Dataibao has created more than 50 data intellectual property items for the Guizhou Wanfeng Forest Scenic Spot. These have been registered and had their rights confirmed at the Guiyang Big Data Exchange, and the core assets have successfully secured 50 million yuan in pledge financing. In addition, based on the scenic spot's operating data, it built a "consumer behavior analysis model", a "passenger flow prediction model", and a "joint ticket linkage model" to accurately identify off-season customer demand and ease the resource mismatch of "congestion in peak season and idleness in off season". Off-season visitor numbers at the scenic spot rose 37% year on year, and merchant revenue grew 28%.

In manufacturing, Dataibao carried out a data asset project for Xinjiang Tianye Group, a provincial- and ministerial-level unit under the Xinjiang Production and Construction Corps. The project completed data ownership confirmation and registration for 6 subsidiaries and 27 second-tier subsidiaries of Xinjiang Tianye Group, covering Internet data, hazardous safety production platform data, and industrial intelligent control data. It not only optimizes the enterprise's internal control and enables predictive maintenance, but also empowers local accounting firms and law firms, effectively cultivating a local ecosystem in Xinjiang for bringing data assets onto the balance sheet.

 

In cutting-edge AI applications, Dataibao links the data assets of many fragmented enterprises into an industry data asset network and jointly builds a high-quality dataset alliance, supplying high-quality corpora to boost AI development. In intelligent manufacturing, it has worked in depth with enterprises such as Xinjiang Tianye and Atlantic Welding, built dedicated corpora for intelligent workshops, and provided core corpus support for model training of smart and humanoid robots.

On the other hand, the storage center is built around an all-flash architecture. It uses Flash-Native design, software-hardware collaboration, DTOE protocol offload, server NUMA acceleration, and other technologies to raise cluster performance per TB of data from 108 GB/s to 216 GB/s, fully releasing the potential of all-flash storage. In terms of media, it is equipped with large-capacity 15.36 TB / 30.72 TB / 61.44 TB SSDs; with a 2:1 compression ratio and more than 10 compression operators, it reduces space usage per TB of data by 88%, breaking through capacity limits. It uses the DME (Omni-Dataverse) unified data space to build advanced AI data infrastructure and a multi-center data platform, enabling cross-region and cross-device data scheduling, efficient retrieval, and trusted flow, and forming technical barriers in both hardware performance and software ecosystem. In the digital transformation project for the data center of China Unicom Group's information department, the operation support platform based on storage-compute separation carries 15 PB of data for Internet analysis, IoT, log, and other systems, supporting data access and analysis for the provincial companies' 2/3/4G xDR, network signaling, Internet log, and IoT data. More than 70 TB of data are imported per day, procurement and O&M costs are significantly reduced, and total TCO is cut by 30% [7], saving more than 10 million in investment.

The trusted data circulation center solution is an ecosystem collaboration spanning hardware strength, software ecosystem, service capabilities, data resources, compliance qualifications, and business models. Compared with other solutions, this joint solution stands out in multiple dimensions.

V. Conclusion

This article focuses on the data industry's long-standing problem of "disconnection between storage and use" and proposes a value upgrade path for the storage center with "integrated storage and use" at its core. The research indicates that upgrading the storage center into a data-element governance and circulation hub that "integrates storage and use", combining high-performance storage, intelligent governance engines, and built-in compliant circulation capabilities, can effectively address weak data governance, blocked circulation, and the computing power center's lack of high-quality data resources. Joint practice has verified the feasibility of this path: the "trusted data circulation center" joint solution not only improves data processing efficiency by 70% and shortens the data product development cycle by 50%, but also supplies the computing power center with secure and compliant data, demonstrating the feasibility of the "integrated storage and use" model.

Looking ahead, the development of storage centers under the "integrated storage and use" architecture is bound to move from integrating technologies and functions to reshaping their ecological niche at a deeper level. The core trend is to build the storage center into a value center centered on data asset operations. This means the storage center will no longer merely support computing power, but will become an independent value node that participates directly in AI model training and the deployment of intelligent applications. To achieve this, optimization of the practical solution needs to focus on creating a measurable closed loop of economic value and on moving data from cost center to profit center. With the goal of dynamically increasing the value of AI corpora, this forms a closed loop from "high-quality data supply" to "release of scenario value" and then to "revenue feeding back into system development". Once the storage center's capabilities to "aggregate, govern, use, and circulate" data are brought fully into play, the data industry's problem of being "able to store but unable to use well" can truly be solved, and high-quality data can drive AI development.

 

* This article is included in issue 3 of the user special issue "Words and Numbers".

References:

[1] China Academy of Information and Communications Technology. Advanced Storage Center Research Report (2025) [R]. 2025.

[2] 2025 China Computing Power Conference. 2025 Capacity Development Report [R]. 2025-08-23.

[3] China Cyberspace Security Association. Chinese Internet Corpus Resource Platform Released [EB/OL]. 2025-01-10.

[4] Shenzhen Municipal People's Government. Shenzhen Action Plan to Accelerate Building an Artificial Intelligence Pioneer City (2025-2026) [Z]. 2025.

[5] Shanghai Data Exchange. Shanghai Data Exchange Corpus Trading Zone Operation Rules [Z]. 2024.

[6] Huawei. (2025). OceanStor Pacific 9146 Distributed Storage [product page]. Huawei Enterprise Business. [Online] Available from: https://e.huawei.com/cn/products/storage/scale-out-storage/oceanstor-pacific-series/oceanstor-pacific-9146

[7] Huawei. (2025). Big Data Storage and Computing Separation Solution [solution page]. Huawei Enterprise Business. [Online] Available from: https://e.huawei.com/cn/solutions/storage/scale-out-storage/decoupling-storage-and-compute

[8] Zhang Li. Clearing the Blocking Points in Data Aggregation, Supply, and Use and Pooling Efforts to Promote the High-Quality Construction of Datasets [EB/OL]. (2025-03-06). https://www.nda.gov.cn/sjj/zwgk/zjjd/0306/20250306143724097100325_pc.html

 
