Toward the 15th Five-Year Plan: Building Advanced Data Storage Capacity to Empower the Future of the Digital Economy
束文琦  2026-03-05 10:42   published in China

[Abstract] With the digital economy now the core growth engine, the strategic position of data as a new factor of production is beyond doubt. As the cornerstone for releasing the value of data elements, the data storage industry faces unprecedented development opportunities and transformation challenges. The 15th Five-Year Plan proposal emphasizes "building an open, shared, and secure national integrated data market and deepening the development and utilization of data resources," "accelerating innovation in digital and intelligent technologies such as artificial intelligence and strengthening the efficient supply of computing power, algorithms, and data," and "advancing the construction of national strategic hinterlands and backup capacity for key industries, and strengthening national security capacity building in key directions such as networks, data, artificial intelligence, biology, ecology, nuclear, space, the deep sea, and the polar regions, as well as in emerging fields such as the low-altitude economy." All of these place higher demands on the performance, scalability, security, and intelligence of data storage. Looking ahead, building advanced data storage capacity will become a key embodiment of national competitiveness and an important battleground in a new round of international competition over technology supply chains.

 

I. Outlook on the development trends of advanced data storage capacity during the 15th Five-Year Plan period

In recent years, "data storage" has been upgraded from a single infrastructure component to a core digital capability on a par with "computing power" and "network transport capacity." Its development will exhibit the following four trends:

 

  1. From centralization to global integration: storage-compute separation and the collaborative evolution of edge storage
  • Deepening storage-compute separation in cloud data centers: to cope with the explosive growth of massive unstructured data (such as videos, logs, and IoT data), the traditional tightly coupled storage-compute architecture hits bottlenecks in scalability and cost. During the 15th Five-Year Plan period, storage-compute-separated architectures, represented by object storage with flexible multi-protocol interoperability, will become standard in cloud data centers, enabling independent, elastic scaling of storage resources, significantly reducing total cost of ownership, continuously producing high-quality data, and preparing for the arrival of the "AI+" era.
  • Rapid development of edge storage: in scenarios such as the industrial internet, autonomous driving, and smart cities, data is generated at the edge and must be processed in real time. As the "first stop" for data, edge storage will take on the important duties of data caching, preprocessing, low-latency access, and security compliance, collaborating with cloud storage to build a unified cloud-edge-device data view that meets business requirements for real-time performance, bandwidth, and privacy.

 

  2. From "explicit hot-cold tiering" to "intelligent tiering": automated data lifecycle management

The value of data varies with time, access frequency, and scenario. During the 15th Five-Year Plan period, AI-based intelligent data management will become the core of advanced storage capacity. Storage systems will automatically learn data access patterns to achieve seamless, automatic migration between extreme-performance storage (high-performance all-flash), performance storage (such as cost-effective SSD-based all-flash), and capacity storage (such as HDD-based arrays and tape libraries). This shifts tiered management of "hot, warm, cold, and frozen" data from static policy to dynamic intelligence, guaranteeing performance while optimizing storage costs.
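The access-pattern-driven tiering described above can be sketched as a simple placement policy. This is a minimal illustration with made-up thresholds, tier names, and data structures, not any vendor's actual algorithm:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tiers mirroring the text: hot (extreme-performance all-flash),
# warm (cost-effective SSD), cold (HDD array), frozen (tape library).

@dataclass
class DataObject:
    name: str
    last_access: datetime
    accesses_30d: int
    tier: str = "hot"

def choose_tier(obj: DataObject, now: datetime) -> str:
    """Pick a tier from recency and access frequency (illustrative thresholds)."""
    idle = now - obj.last_access
    if idle < timedelta(days=7) or obj.accesses_30d > 100:
        return "hot"
    if idle < timedelta(days=30):
        return "warm"
    if idle < timedelta(days=365):
        return "cold"
    return "frozen"

def migrate(objects: list, now: datetime) -> list:
    """Return planned migrations as (name, from_tier, to_tier) tuples."""
    moves = []
    for obj in objects:
        target = choose_tier(obj, now)
        if target != obj.tier:
            moves.append((obj.name, obj.tier, target))
            obj.tier = target
    return moves
```

A real system would learn these thresholds from observed access patterns rather than hard-coding them, and would migrate data asynchronously in the background.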

 

  3. From data security to data resilience: building a trusted, highly resilient storage system

For data storage, security has expanded from preventing attacks and leaks to ensuring business continuity and data durability. This is reflected in:

1) Ransomware protection becomes a baseline requirement: write-once-read-many (WORM) immutable snapshots and air-gap isolation ensure that core data copies cannot be tampered with, which is the key to effective data recovery. Meanwhile, through cross-device coordinated response, the storage system proactively detects threats, coordinates with the backup system, and triggers protection, upgrading from passive recovery to active defense.

2) Full-stack trust and privacy computing: trusted computing spanning hardware firmware, the operating system, and the software stack, together with privacy-computing technologies tightly integrated with storage (such as secure data access in federated learning), will ensure the confidentiality and integrity of data both at rest and in use.

3) Cross-region active-active and disaster recovery: to support the "national integrated data market" and "zero downtime" for critical businesses, cross-data-center and cross-region storage multi-active and asynchronous disaster recovery technologies will be deployed at scale, extending from traditional core business data to emerging business data represented by object storage.
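As a toy illustration of the WORM retention idea in point 1), the sketch below models a snapshot that refuses deletion while its retention lock is active; the class and method names are invented for illustration, not a real storage API:

```python
from datetime import datetime, timedelta

class ImmutableSnapshot:
    """WORM-style snapshot: readable at any time, not deletable before expiry."""

    def __init__(self, name: str, created: datetime, retention_days: int):
        self.name = name
        self.expires = created + timedelta(days=retention_days)

    def delete(self, now: datetime) -> bool:
        """Refuse deletion while the retention lock is active."""
        if now < self.expires:
            return False  # tamper/ransomware deletion attempt blocked
        return True
```

Because the copy cannot be altered or deleted inside the retention window, an attacker who encrypts the primary data still leaves a clean recovery point behind.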

 

  4. From "high energy consumption" to "green and low-carbon": sustainability becomes a core indicator

As data centers come under dual controls on energy consumption, green, low-carbon data storage is the only viable path during the 15th Five-Year Plan period. Trends include:

1) Going all-flash: replacing traditional HDD arrays with all-flash arrays offering lower power consumption, higher performance, and higher capacity density has become a direct means of reducing PUE (Power Usage Effectiveness, the ratio of total facility power to IT equipment power).

2) Software-defined energy saving: through intelligent power management, the storage system automatically enters energy-saving mode during off-peak business hours and dynamically balances performance against power consumption.
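A minimal sketch of such a software-defined energy-saving policy, with illustrative thresholds and mode names (plus the PUE ratio defined above), might look like this:

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """PUE = total facility power / IT equipment power; 1.0 would be ideal."""
    return total_facility_kw / it_equipment_kw

def power_mode(utilization: float, hour: int) -> str:
    """Choose a storage power mode from I/O load and time of day.

    Illustrative policy: off-peak hours with a nearly idle array allow a
    deep energy-saving mode; heavy load always gets full performance.
    """
    off_peak = hour < 7 or hour >= 23
    if utilization >= 0.6:
        return "performance"
    if off_peak and utilization < 0.1:
        return "deep-sleep"   # e.g. park idle media, throttle controllers
    if utilization < 0.3:
        return "balanced"
    return "performance"
```

A production implementation would also hysteresis-dampen mode switches so the array does not oscillate between states around a threshold.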

 

II. Analysis of the development directions of the data storage industry during the 15th Five-Year Plan period

Facing the above trends, the data storage industry itself will undergo profound change and move toward a higher-quality development path.

  1. Technological breakthroughs: software-hardware co-innovation to overcome "chokepoint" technologies

Hardware layer: continue to tackle core media technologies such as enterprise-grade SSD controller chips and 3D NAND flash, coordinate high-density, large-capacity innovation to carry more data per unit of space and per unit of energy, and break foreign monopolies. At the same time, actively explore the R&D and application of next-generation media such as SCM (storage-class memory), DNA storage, and optical storage.

Software layer: vigorously develop core software such as distributed storage software (global file systems, unified data space management), storage operating systems, and intelligent data management engines, realizing "hardware resource pooling, software-defined functionality."

Architecture layer: promote wide adoption of DPUs (Data Processing Units) and IPUs (Infrastructure Processing Units) in storage systems, offloading storage control-plane tasks to dedicated processors to free up CPU cycles and improve overall system efficiency.

 

  2. Industrial ecology: from "product delivery" to "service operation"

Storage as a Service gains popularity: enterprises will increasingly prefer to obtain storage capabilities through subscription models. Domestic storage vendors need to accelerate their transformation into service providers, offering Storage as a Service that spans on-premises deployment to the public cloud.

Deep integration with computing power and applications: the storage industry cannot develop in isolation; it must integrate deeply with computing platforms and application ecosystems such as AI, big data, and cloud native.

 

  3. Market pattern: domestic substitution and global competition coexist

1) Accelerated development of the storage industry: the 15th Five-Year Plan proposal firmly calls for "strengthening government procurement of independently innovated products." Guided by the national strategy of "independence and controllability," the 15th Five-Year Plan period will be a crucial window in which domestic storage brands in key industries such as finance, telecommunications, and energy advance from "usable" to "easy to use." The storage industry chain will further focus on directions such as new storage media, chip manufacturing, and data storage technologies, accelerating the formation of a complete ecosystem covering design, manufacturing, and application. The industry-wide digitalization drive led by new quality productive forces will bring an explosive golden period of growth in demand for storage efficiency and security.

2) Enhanced global competitiveness: Chinese storage enterprises with technical strength will participate more deeply in global market competition, leveraging their advantages in ultra-large-scale data centers, cost control, and agile innovation to move from "following" to "running alongside" and even "leading."

 

III. Embracing the "AI+" era: a new paradigm and new journey for data storage

The 15th Five-Year Plan proposal explicitly calls for fully implementing the "AI+" action, seizing the commanding heights of industrial AI applications, and empowering thousands of industries. AI will move from the "show home" into "commodity housing," becoming an inclusive, everyday tool of production. Under this strategic guidance, data, as the "fuel" of AI, will undergo fundamental changes in how it is stored, managed, and applied, placing new systemic requirements on data storage that amount to a shift from "quantitative change" to "qualitative change."

1. The core new requirements of "AI+" for data storage

1) From "data lake" to "data factory": extreme demands on data preprocessing and throughput

Challenges: AI model training depends on high-quality, well-annotated data, yet raw industry data is mostly unstructured, multimodal (text, images, video, sensor data), and full of noise. A traditional "data lake" merely accumulates data and cannot meet AI's "out-of-the-box" requirements.

New requirements: the data storage system must integrate more closely with the computing framework to become an efficient data preprocessing factory. It must support high-speed, highly concurrent access to massive numbers of small files (such as images and PDFs) and provide extremely high data throughput bandwidth so that ETL processes (Extraction, Transformation, Loading: extracting data from the source, transforming it, and loading it to the destination) such as data cleansing, labeling, and conversion do not become bottlenecks. This means the storage system needs strong metadata management and near-data processing capabilities.
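The ETL flow described here can be sketched as a streaming pipeline, so no stage materializes the whole dataset at once; the record fields and noise filter below are invented for illustration:

```python
from typing import Iterable, Iterator

def extract(records: Iterable[dict]) -> Iterator[dict]:
    """Extraction: stream raw records (images, PDFs, logs) from source storage."""
    yield from records

def transform(records: Iterator[dict]) -> Iterator[dict]:
    """Transformation: drop noisy records and attach a training label."""
    for r in records:
        if not r.get("payload"):          # noise filtering: skip empty records
            continue
        r["label"] = r.get("label", "unlabeled")
        yield r

def load(records: Iterator[dict]) -> list:
    """Loading: materialize the cleaned batch for the training framework."""
    return list(records)

raw = [{"payload": "img-001"},
       {"payload": ""},                   # noise: filtered out
       {"payload": "img-002", "label": "cat"}]
dataset = load(transform(extract(raw)))
```

Because each stage is a generator, throughput is bounded by the storage system's bandwidth rather than by memory, which is exactly where the text says the bottleneck must not appear.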

2) Driven by the "AI workflow": multi-dimensional adaptation of performance and protocols

Challenges: a complete "AI+" project spans multiple stages, such as data preparation, model training, model inference, and A/B testing, and each stage has different storage performance and protocol requirements.

Training phase: high sequential read/write bandwidth and high IOPS (I/O operations per second) are needed to accelerate reading massive datasets; high-performance parallel file systems or object storage are typically required.

Inference phase: high bandwidth and stable low latency are required to handle massive random read requests from online services.

New requirements: data storage capacity must become scenario-aware, providing the most appropriate performance and access protocols (file, object, and HDFS) for different links of the AI workflow within the same storage resource pool, achieving seamless data flow and avoiding the efficiency and cost problems caused by data migration.
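Matching each workflow stage to a performance profile and access protocol within one shared pool might be expressed as a simple lookup; the stage names and profile fields below are assumptions for illustration, not a standard interface:

```python
# Illustrative mapping of AI workflow stages to storage profiles, as
# described in the text; the values are invented, not a vendor API.
STORAGE_PROFILES = {
    "data_prep": {"protocol": "object",        "optimize_for": "throughput"},
    "training":  {"protocol": "parallel-file", "optimize_for": "sequential-bandwidth"},
    "inference": {"protocol": "file",          "optimize_for": "low-latency"},
    "analytics": {"protocol": "hdfs",          "optimize_for": "scan-throughput"},
}

def profile_for(stage: str) -> dict:
    """Look up the storage profile a workflow stage should get from the pool."""
    try:
        return STORAGE_PROFILES[stage]
    except KeyError:
        raise ValueError(f"unknown stage: {stage}") from None
```

The point of keeping all profiles over one pool is that moving from training to inference changes only the access path, not the data's physical location.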

3) "A thousand models in a thousand forms" and agile innovation: a test of storage elasticity and sharing

Challenges: "AI+" means massive, fragmented AI application scenarios will emerge across all industries, creating a "thousand models in a thousand forms" pattern of general-purpose large models plus industry-specific small models. R&D teams need frequent data access, model testing, and iteration.

New requirements: the storage infrastructure must offer extreme scalability and multi-tenant data sharing. Researchers should be able to request storage space and performance on demand, anytime and anywhere, just like cloud computing resources, and securely and efficiently share datasets and model files within a team, greatly accelerating the innovation cycle of AI applications.

4) AI-native storage: intelligent management and data lifecycle reconstruction

Challenges: AI workflows generate large numbers of intermediate artifacts (such as model checkpoints and training logs) and multiple versions of models and datasets. These data have varying value densities but are complex to manage and consume large amounts of storage space.

New requirements: the storage system needs "AI-native" intelligent data management and must provide a new AI data platform capability. For example:

  • High-quality data collection and processing: the system needs unified visibility and mobility for cross-domain global data, providing data catalogs, data lineage, data tags, and related capabilities, along with efficient vector and scalar retrieval and a complete end-to-end toolchain for producing high-quality data for AI models.
  • Automated data value chain management: the system can automatically identify and classify training data, model files, and inference results, and apply the corresponding lifecycle policies.
  • Deep integration with MLOps platforms: by connecting with the MLOps (machine learning operations) platform, the data lineage of each experiment is automatically stored and managed to ensure model reproducibility.
  • CheckPoint optimization: Provides quick snapshot and recovery capabilities for CheckPoint operations during model training to reduce time loss caused by training interruptions.
  • Data-to-knowledge processing: converting traditional "data" into "knowledge" that AI can consume, providing highly accurate knowledge generation and retrieval.
  • Agent memory: relying on a memory store, the system records an agent's historical working data, accumulates situational memory and procedural experience from the agent's interactions, and supports memory extraction and recall.
  • Inference acceleration based on KV Cache: by persisting KV Cache data, the system avoids repeated computation, handles long sequences, supports intelligent suffix association, and enables resumable training, helping enterprises make large-model inference faster and more economical. At the same time, it must balance storage and compute so that external storage delivers high throughput and low latency without becoming a performance bottleneck, and it must schedule dynamically, intelligently managing cache tiers to avoid mixing hot and cold data.
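The KV Cache persistence idea in the last bullet can be sketched as a toy prefix-keyed cache that survives restarts; the class and on-disk layout are invented for illustration and are far simpler than a real inference engine's cache:

```python
import hashlib
import json
import pathlib
import tempfile

class PersistentKVCache:
    """Toy persisted KV cache: state keyed by prompt prefix survives restarts,
    so a repeated prefix skips the expensive prefill computation."""

    def __init__(self, directory: str):
        self.dir = pathlib.Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, prefix: str) -> pathlib.Path:
        digest = hashlib.sha256(prefix.encode()).hexdigest()
        return self.dir / (digest + ".json")

    def get(self, prefix: str):
        p = self._path(prefix)
        return json.loads(p.read_text()) if p.exists() else None

    def put(self, prefix: str, kv_state) -> None:
        self._path(prefix).write_text(json.dumps(kv_state))

def run_inference(cache: PersistentKVCache, prefix: str, compute):
    """Reuse a cached KV state when the prefix has been seen before."""
    state = cache.get(prefix)
    if state is None:
        state = compute(prefix)   # expensive prefill happens only once
        cache.put(prefix, state)
    return state
```

The storage-side trade-off the bullet mentions shows up even here: the cache is only a win if reading the persisted state is faster than recomputing it, which is why external-storage throughput and latency must not become the bottleneck.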

 

2. New opportunities and directions for industrial development

Facing the new requirements of "AI+", the data storage industry must accelerate its evolution in technology, products, and business models.

1) Vigorously develop AI storage solutions: the industry will shift from supplying general-purpose storage devices to launching vertical, integrated solutions for AI scenarios, such as "AI training all-in-one machines," "autonomous driving data platforms," and "bioinformatics gene storage solutions," deeply optimizing and integrating storage, compute, networking, and management software so they work out of the box and lower the technical threshold for enterprises to deploy "AI+".

2) Build an integrated "storage power-computing power" scheduling platform: future competition is no longer simply about storage or computing power alone but about overall efficiency. Storage vendors need to work with computing platforms and cloud service providers to promote unified orchestration and collaborative scheduling of storage and compute resources, so that users can request matching GPU compute and data storage through a single entry point and achieve optimal resource allocation.

3) Strengthen data security and compliance capabilities: as "AI+" goes deep into industries, it inevitably involves large amounts of sensitive data (such as medical, financial, and industrial data). The storage industry must make data security, privacy protection (such as storage support for federated learning), and industry compliance core competitive strengths, providing protection across the full data lifecycle.

4) Promote standards and ecosystem building: actively participate in and lead the formulation of standards for AI data storage, and promote pre-integration and certification of storage systems with mainstream AI frameworks (such as TensorFlow and PyTorch) and MLOps toolchains, building a prosperous "AI+ storage" application ecosystem and a rich, diverse, converging industrial ecosystem.

 

 

IV. Conclusion

The 15th Five-Year Plan period is the key five years for China to move from a "major data country" to a "data power." As the solid foundation of the digital economy, the development level of advanced data storage bears directly on the cultivation of new quality productive forces and the building of Digital China.

The full implementation of the "AI+" action has pushed data storage from backstage to center stage, transforming it from a passive basic resource into a core engine that actively empowers AI innovation. Meeting this change requires reconstructing the data storage system with "AI-native" thinking, making it smarter, more elastic, more integrated, and more secure. During the 15th Five-Year Plan period, whoever first builds the advanced data storage capacity that supports "AI+" will occupy the commanding heights of the great digital wave.

We should firmly grasp technology trends, push the industry to evolve toward integration, intelligence, reliability, and green development, and, through continuous technological innovation and ecosystem co-construction, lay a solid foundation for the national digital economy and provide inexhaustible impetus for building a secure, prosperous, and inclusive digital future.

* This article is included in the 3rd issue of the user special issue of "words and numbers"
