22 April 2024 | 7 minute read

China AI: Trailblazers in GenAI Standards in Asia

Taking another step forward in China’s AI journey, Chinese regulators have recently announced standards for generative artificial intelligence (GenAI) services. In the past few months, in addition to releasing official standards, the regulators have published a number of draft standards for public consultation. Below, we walk through the key standards released and the high-level takeaways businesses need to bear in mind.

 

Basic Security Requirements for Generative Artificial Intelligence Service (GenAI Standards)

Issued in March this year, the GenAI Standards provide much-needed clarity and guidance to complement the Interim Measures for the Management of GenAI Services (GenAI Measures) released last year. Note that the GenAI Standards currently apply only to GenAI service providers that are subject to the GenAI Measures, i.e. companies providing GenAI services to the public.

The key takeaways that businesses need to take into account for the GenAI Standards are as follows:

  • Security Assessment: Companies can conduct their own security assessment, or they can appoint a third-party assessment agency to do so, in accordance with the requirements of the GenAI Standards. The completed assessment report must be submitted to the authorities during the relevant record filing procedures for the GenAI service.
  • Content Regulation: The GenAI Standards prescribe 5 categories consisting of 31 types of content security risks deemed illegal or unhealthy. These security risks form one of the cornerstones in the series of China’s GenAI security standards as they are repeated in each of the subsequent security standards.

Broadly, the 5 categories cover content that: (1) is against socialist core values (e.g. endangers national security, promotes terrorism etc.); (2) is discriminatory (e.g. based on age, belief, ethnicity, gender etc.); (3) violates commercial rights of others (e.g. intellectual property (IP) rights, business ethics, unfair competition etc.); (4) violates legitimate rights and interests of others (e.g. physical/mental health, portrait rights, privacy rights etc.); or (5) is unable to meet security demands for specific service types (e.g. autonomous control, medical information services, critical information infrastructure etc.).

  • Data Security Requirements: The following data security requirements must also be considered:
    • Data sources: Training data sources must be diversified (i.e. from multiple sources, in different formats and a reasonable mix of domestic and foreign sources). Data sources that contain more than 5% of illegal and unhealthy information must not be collected.
    • Data content: There must be proper measures in place to respect the IP and data privacy rights of others. Service providers must employ content filtering methods (e.g. keywords, classification models, manual sampling etc.) to filter out illegal and unhealthy information.
    • Data annotation: There must be proper data annotation mechanisms and rules to ensure the authenticity, reliability, legality and accuracy of content generated. More on this below.
  • Third Party Models: For companies looking to leverage third party foundation models to provide GenAI services to the public in China, only foundation models that are registered with the authorities can be used.

While using registered foundation models that have undergone security assessment would reduce the risks associated with data security and content regulation to an extent, such companies will remain primarily liable for the generated content and should therefore have proper vendor due diligence processes and contractual safeguards in place. The GenAI Standards also require such companies to continue employing technical measures to ensure the accuracy and reliability of the generated content.
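For technical teams, the data source threshold and keyword-based filtering described in the data security requirements above can be pictured with a minimal sketch. This is an illustration only and not a method prescribed by the GenAI Standards: the keyword list, the function names and the idea of measuring the 5% threshold as a simple ratio over sampled items are all assumptions made for the example, and real compliance would also involve classification models and manual sampling.

```python
# Hypothetical placeholder terms; a real deployment would rely on vetted
# keyword lists and classification models, which this sketch does not model.
BLOCKED_KEYWORDS = {"blocked-term-a", "blocked-term-b"}

# Threshold from the article: a source containing more than 5% illegal and
# unhealthy information must not be collected. Measuring it as a ratio over
# sampled items is an assumption of this sketch.
MAX_FLAGGED_RATIO = 0.05


def is_flagged(text: str) -> bool:
    """Return True if the text contains any blocked keyword (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in BLOCKED_KEYWORDS)


def flagged_ratio(samples: list[str]) -> float:
    """Fraction of sampled items that the keyword check flags."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(is_flagged(s) for s in samples) / len(samples)


def source_is_acceptable(samples: list[str]) -> bool:
    """A source passes only if its flagged ratio does not exceed the threshold."""
    return flagged_ratio(samples) <= MAX_FLAGGED_RATIO


def filter_training_data(samples: list[str]) -> list[str]:
    """Drop flagged items from a source that has already been accepted."""
    return [s for s in samples if not is_flagged(s)]
```

In this sketch, a source is first screened with `source_is_acceptable` on a sample of its content, and only then are the remaining flagged items removed with `filter_training_data`. The two steps are kept separate because the article describes them as distinct requirements: one on whether a source may be collected at all, and one on filtering the content that is collected.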

 

Draft Standards for Security Specifications on GenAI Data Annotation (GenAI Annotation Draft Standards)

With the advent of GenAI, there has been a global increase in companies offering data labelling and annotation services to meet demands. Data annotation here refers to the process of labelling or annotating data (including training data) for proper data classification and to facilitate machine learning. This is not to be confused with the requirement to “label” AI generated content by affixing labels to the output.

In early April this year, the National Data Administration of China reportedly announced that it is looking into setting up national-level data annotation bases to promote the development of the AI industry. Shortly after, the GenAI Annotation Draft Standards were published for public consultation as one of the world’s first sets of standardised regulatory data annotation rules for GenAI.

Briefly, the GenAI Annotation Draft Standards set out provisions on:

  • the basic security requirements for data annotation (e.g. data transmission security, access controls and backups, assessment of annotation tools and platforms etc.);
  • requirements on setting data annotation rules (e.g. functional data annotation, security data annotation, fine-tuning data annotation and comparison data annotation (more commonly known in the industry as Reinforcement Learning from Human Feedback));
  • annotation personnel (e.g. training and assessments, selection and assignment of roles etc.); and
  • data annotation verification and testing.

The GenAI Annotation Draft Standards reproduce the 31 content security risks outlined in the GenAI Standards, and data annotation must be performed to ensure generated content does not give rise to any of these security risks. The GenAI Annotation Draft Standards also provide examples of how annotation should be done, as well as examples of what needs to be annotated for each type of data format (e.g. text, image, video, audio etc.).

 

Draft Standards for Security Specifications on GenAI Pre-training and Fine-tuning Data Processing Activities (GenAI Training Data Draft Standards)

Since the release of the GenAI Measures, a lot of attention has centred on Article 7, which obliges service providers to carry out pre-training, fine-tuning and other training data processing activities in a way that meets the prescribed requirements. In an effort to provide clarity and guidance on this aspect, the GenAI Training Data Draft Standards were published for public consultation at the same time as the GenAI Annotation Draft Standards.

In summary, the GenAI Training Data Draft Standards set out requirements for each of pre-training and fine-tuning activities (e.g. requirements on data collection, data pre-processing and data usage), as well as evaluation methodologies for each stage. Once again, the GenAI Training Data Draft Standards also reproduce the 31 content security risks outlined in the GenAI Standards. More particularly, service providers are required to have in place content filtering methods to filter out illegal or unhealthy information from the training data.

Both Draft Standards above are open for public consultation, which will end on 2 June 2024.

 

Conclusion

The pace and detail with which the GenAI standards above have been developed and released demonstrate China’s intention and capability to become a trailblazer in paving the path for AI standards and regulations in Asia, aligning with China’s goal to become a global leader and powerhouse in AI.

The publication of the standards above is a welcome move, as they provide much-needed clarity and guidance for the industry given how quickly AI regulations have been published in the past year or so. We expect that more standards are in the works and will be published gradually as the regulators continue to work with industry leaders to provide a standardised and uniform approach to AI subject matters. These standards also lay the supporting foundations for the eventual omnibus AI legislation which China is expected to release some time this year.

To find out more on AI and AI laws and regulations, visit DLA Piper’s Focus on Artificial Intelligence page and Technology’s Legal Edge blog. If your organisation is deploying AI solutions, you can undertake a free maturity risk assessment using our AI Scorebox tool.

For more information on how DLA Piper can support your AI Transformation strategy, please reach out to Lauren Hurcombe, Hwee Yong Neo or your usual DLA Piper contact.