U.S. Government Accountability Office (GAO)


AI is a transformative technology with applications in medicine, agriculture, manufacturing, transportation, defense, and many other areas. It also holds substantial promise for improving government operations. Federal guidance has focused on ensuring AI is responsible, equitable, traceable, reliable, and governable. Third-party assessments and audits are important to achieving these goals. However, AI systems pose unique challenges to such oversight because their inputs and operations are not always visible.

DATA: bias can be amplified

Bias is not specific to AI, but the use of AI has the potential to amplify existing biases and concerns related to civil liberties, ethics, and social disparities. Biases arise from the fact that AI systems are created using data that may reflect preexisting biases or social inequities. The U.S. government, industry leaders, professional associations, and others have begun to develop principles and frameworks to address these concerns, but there is limited information on how these will be implemented to allow for third-party assessments and audits of AI systems.

Unintended Consequences of Predictive Policing Technology

Local law enforcement agencies are using predictive policing software to identify likely targets for police intervention. The intended benefits are to prevent crime in specific areas and to improve the allocation of law enforcement resources. In one study, researchers demonstrated that the tool disproportionately identifies low-income or minority communities as targets for police intervention regardless of the true crime rates. Applying a predictive policing algorithm to a police database, the researchers found that the algorithm behaves as intended. However, if the machine learning algorithm is trained on crime data that are not representative of all crimes that occur, it learns and reproduces patterns of systemic bias. According to the study, these systemic biases can be perpetuated and amplified as police departments use biased predictions to make tactical policing decisions.
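The feedback loop the study describes can be illustrated with a toy simulation (all numbers and the allocation rule below are hypothetical, chosen only to show the mechanism): two districts have the same underlying crime rate, but the recorded history fed to the allocation rule is skewed, so patrols, and therefore new records, stay skewed.

```python
import random

random.seed(0)

# Hypothetical toy setup: two districts with the SAME true crime rate,
# but district 0 starts with more recorded incidents (historical bias).
TRUE_RATE = 0.10          # identical underlying crime probability per patrol
recorded = [30, 10]       # biased historical counts fed to the "algorithm"

def allocate_patrols(counts, total=20):
    """Predictive step: send patrols proportional to recorded incidents."""
    s = sum(counts)
    return [round(total * c / s) for c in counts]

for year in range(5):
    patrols = allocate_patrols(recorded)
    # Crime is only *recorded* where police are present, so the extra
    # patrols in district 0 generate extra records there, reinforcing
    # the original skew even though the true rates are identical.
    for d in (0, 1):
        recorded[d] += sum(random.random() < TRUE_RATE
                           for _ in range(patrols[d]))
    print(year, patrols, recorded)
```

Despite identical true rates, the recorded counts never converge, which is the amplification effect the study attributes to biased training data.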


Human-out-of-the-loop refers to the absence of human supervision over the execution of decisions, meaning the AI system “has full control without the option of human override.” The GAO report Artificial Intelligence: Emerging Opportunities, Challenges, and Implications provided an example of an AI-enabled cybersecurity system that can find and patch system vulnerabilities without human intervention. Mayhem, the winning system in the Defense Advanced Research Projects Agency (DARPA) 2016 Cyber Grand Challenge, is designed to protect apps (software) from new attacks by hackers. Mayhem works by hardening applications while simultaneously and continuously searching for new bugs that hackers may exploit. When the system finds a new bug, it autonomously produces code to patch the vulnerability. Mayhem is an expert system that performs prescriptive analytics, in which machines detect and respond without human intervention. This contrasts with traditional signature-based intrusion detection systems, which rely on human intervention to anticipate cybersecurity attacks.

Managers should establish and maintain an environment throughout the entity that sets a positive attitude toward internal controls. The Federal Internal Control Standards note that “the oversight body and management set the tone at the top and throughout the organization by their example.”25 Similarly, forum participants highlighted the need for entities to establish governance structures for AI systems that incorporate organizational values, consider risks, assign clear roles and responsibilities, and involve multidisciplinary stakeholders. To help entities establish governance structures and processes, as well as mitigate risks of implementing AI systems, GAO identified six key practices.

MONITORING of changing data and models

According to the 2021 Final Report of the National Security Commission on Artificial Intelligence, “agencies should institute specific oversight and enforcement practices, including…a mechanism that would allow thorough review of the most sensitive/high-risk AI systems to ensure auditability and compliance with responsible use and fielding requirements…” National Security Commission on Artificial Intelligence, Final Report (Washington, D.C.: Mar. 1, 2021). In addition, according to one forum participant, entities should consider mitigating risks by limiting the scope of the AI system when there is not sufficient confidence that its stated goals and objectives can be achieved.

Entities should regularly reassess the utility of the AI system to ensure it remains useful. For example, as one forum participant noted, an AI system trained on traffic patterns in 2019 might not be useful in 2020 because of reduced traffic during the COVID-19 pandemic. In assessing utility, entities should also consider the extent to which the AI system is still needed to address the goals and objectives. In addition, changing laws, operational environments, resource levels, or risks could affect the utility of the AI system compared to other alternatives. Therefore, entities should also define metrics for determining when to retire the system and the process for doing so.
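One way such monitoring could be operationalized is a simple distribution-shift statistic comparing training-time data against current data. The sketch below uses the Population Stability Index, a common industry heuristic rather than a GAO-prescribed metric, with hypothetical traffic distributions; the 0.25 threshold is a conventional rule of thumb, not a requirement.

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions
    (each a list of fractions summing to 1). Common rule of thumb:
    < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant shift."""
    score = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)   # avoid log(0) on empty bins
        score += (a - e) * math.log(a / e)
    return score

# Hypothetical hourly traffic-volume distributions (fraction per time bin)
traffic_2019 = [0.10, 0.35, 0.20, 0.35]   # training-time distribution
traffic_2020 = [0.30, 0.20, 0.30, 0.20]   # pandemic-era distribution

drift = psi(traffic_2019, traffic_2020)
needs_review = drift > 0.25   # flag the model for reassessment or retirement
```

A flagged result would prompt the kind of reassessment described above: retrain, limit the system's scope, or retire it.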


These disclosures should also take into account privacy issues, whether sensitive law enforcement and personally identifiable information is involved, national security issues, and concerns related to other kinds of protected information. In addition, forum participants noted the importance of promoting transparency and explainability, while also protecting individual privacy and the developer’s intellectual property rights.

According to the nonprofit organization Partnership on AI, “transparency requires that the goals, origins, and form of a system be made clear and explicit to users, practitioners, and other impacted stakeholders seeking to understand the scope and limits of its appropriate use. One simple and accessible approach to increasing transparency in [machine learning] lifecycles is through an improvement in both internal and external documentation. This documentation process begins in the machine learning system design and set up stage, including system framing and high-level objective design. This involves contextualizing the motivation for system development and articulating the goals of the system in which this system is deployed, as well as providing a clear statement of team priorities and objectives throughout the system design process.”

Overly generic standards and frameworks

(…) there are well-established frameworks and standards—such as the National Institute of Standards and Technology’s (NIST) cybersecurity framework and the International Organization for Standardization (ISO) data privacy management standards—which could be applied to audits until AI-specific standards are developed and adopted. Forum participants noted that these frameworks and standards address overlapping concepts in managing data, governance, and security, which will be relevant in assessing AI systems.

However, according to participants, existing frameworks and standards may not provide sufficient detail on assessing the social and ethical issues that may arise from the use of AI systems. Similarly, according to NIST, “while there is broad agreement that societal and ethical issues, governance, and privacy must factor into AI standards, it is not clear how that should be done and whether there is yet sufficient scientific and technical basis to develop those standards provisions.”7 Based on our review of the literature, while several entities (e.g., the Organisation for Economic Co-operation and Development, the European Commission, and the U.S. Department of Defense) have adopted or drafted high-level principles for implementing trustworthy and equitable AI, many entities, particularly in industry and government, are still developing standards for these areas.

Building expertise

Participants discussed several strategies to mitigate challenges in using or adopting AI systems in the public sector. For example, participants noted that one way to mitigate a lack of expertise is to develop in-house expertise or partnerships with experts: not just technical experts, but people who combine subject matter knowledge with competence in AI systems. According to forum participants, entities within the public sector should focus on building expertise, including the capacity of program managers and of the auditing community. One participant noted that a number of federal agencies have used mechanisms, such as the Intergovernmental Personnel Act’s Mobility Program and collaboration agreements with academic institutions, to provide additional topic-specific expertise.


GAO’s objective was to identify key practices to help ensure accountability and responsible AI use by federal agencies and other entities involved in the design, development, deployment, and continuous monitoring of AI systems. To develop this framework, GAO convened a Comptroller General Forum with AI experts from across the federal government, industry, and nonprofit sectors. It also conducted an extensive literature review and obtained independent validation of key practices from program officials and subject matter experts. In addition, GAO interviewed AI subject matter experts representing industry, state audit associations, nonprofit entities, and other organizations, as well as officials from federal agencies and Offices of Inspector General.

The items above were selected and named by the e-Government Subgroup of the EUROSAI IT Working Group on the basis of publicly available reports of the authoring Supreme Audit Institutions (SAIs). The Subgroup likewise prepared the analytical assumptions and headings. All readers are encouraged to consult the original texts by the authoring SAIs (linked).