Episódios

  • Plus, Detecting Misbehavior in Reasoning Models.

    In this newsletter, we cover AI companies’ responses to the federal government's request for information on the development of an AI Action Plan. We also discuss an OpenAI paper on detecting misbehavior in reasoning models by monitoring their chains of thought.

    Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

    On January 23, President Trump signed an executive order giving his administration 180 days to develop an “AI Action Plan” to “enhance America's global AI dominance in order to promote human flourishing, economic competitiveness, and national security.”

    Despite the rhetoric of the order, the Trump administration has yet to articulate many policy positions with respect to AI development and safety. In a recent interview, Ben Buchanan—Biden's AI advisor—interpreted the executive order as giving the administration time to develop its AI policies. The AI Action Plan will therefore likely [...]

    ---

    First published:
    March 31st, 2025

    Source:
    https://newsletter.safe.ai/p/ai-safety-newsletter-50-ai-action

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

    ---

    Images from the article:

    Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

  • Plus, Detecting Misbehavior in Reasoning Models.

    In this newsletter, we cover AI companies’ responses to the federal government's request for information on the development of an AI Action Plan. We also discuss an OpenAI paper on detecting misbehavior in reasoning models by monitoring their chains of thought.

    Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

    On January 23, President Trump signed an executive order giving his administration 180 days to develop an “AI Action Plan” to “enhance America's global AI dominance in order to promote human flourishing, economic competitiveness, and national security.”

    Despite the rhetoric of the order, the Trump administration has yet to articulate many policy positions with respect to AI development and safety. In a recent interview, Ben Buchanan—Biden's AI advisor—interpreted the executive order as giving the administration time to develop its AI policies. The AI Action Plan will therefore likely [...]

    ---

    First published:
    March 31st, 2025

    Source:
    https://newsletter.safe.ai/p/ai-safety-newsletter-49-ai-action

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

    ---

    Images from the article:

    Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

  • Estão a faltar episódios?

    Clique aqui para atualizar o feed.

  • Plus, Measuring AI Honesty.

    Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required. In this newsletter, we discuss two recent papers: a policy paper on national security strategy, and a technical paper on measuring honesty in AI systems.

    Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

    Superintelligence Strategy

    CAIS director Dan Hendrycks, former Google CEO Eric Schmidt, and Scale AI CEO Alexandr Wang have authored a new paper, Superintelligence Strategy. The paper (and an in-depth expert version) argues that the development of superintelligence—AI systems that surpass humans in nearly every domain—is inescapably a matter of national security.

    In this story, we introduce the paper's three-pronged strategy for national security in the age of advanced AI: deterrence, nonproliferation, and competitiveness.

    Deterrence

    The simultaneous power and danger of superintelligence presents [...]

    ---

    Outline:

    (00:20) Superintelligence Strategy

    (01:09) Deterrence

    (02:41) Nonproliferation

    (04:04) Competitiveness

    (05:33) Measuring AI Honesty

    (09:24) Links

    ---

    First published:
    March 6th, 2025

    Source:
    https://newsletter.safe.ai/p/ai-safety-newsletter-49-superintelligence

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

    ---

    Images from the article:

    Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

  • Superintelligence is destabilizing since it threatens other states’ survival—it could be weaponized, or states may lose control of it. Attempts to build superintelligence may face threats by rival states—creating a deterrence regime called Mutual Assured AI Malfunction (MAIM). In this paper, Dan Hendrycks, Eric Schmidt, and Alexandr Wang detail a strategy—focused on deterrence, nonproliferation, and competitiveness—for nations to navigate the risks of superintelligence.

    ---

  • Superintelligence is destabilizing since it threatens other states’ survival—it could be weaponized, or states may lose control of it. Attempts to build superintelligence may face threats by rival states—creating a deterrence regime called Mutual Assured AI Malfunction (MAIM). In this paper, Dan Hendrycks, Eric Schmidt, and Alexandr Wang detail a strategy—focused on deterrence, nonproliferation, and competitiveness—for nations to navigate the risks of superintelligence.

    ---

  • Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required. Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

    In this newsletter, we explore two recent papers from CAIS. We’d also like to highlight that CAIS is hiring for editorial and writing roles, including for a new online platform for journalism and analysis regarding AI's impacts on national security, politics, and economics.

    Utility Engineering

    A common view is that large language models (LLMs) are highly capable but fundamentally passive tools, shaping their responses based on training data without intrinsic goals or values. However, a new paper from the Center for AI Safety challenges this assumption, showing that LLMs exhibit coherent and structured value systems.

    Structured preferences emerge with scale. The paper introduces Utility Engineering, a framework for analyzing and controlling AI [...]

    ---

    Outline:

    (00:26) Utility Engineering

    (04:48) EnigmaEval

    ---

    First published:
    February 18th, 2025

    Source:
    https://newsletter.safe.ai/p/ai-safety-newsletter-48-utility-engineering

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

    ---

    Images from the article:

    Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

  • Plus, State-Sponsored AI Cyberattacks.

    Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

    Reasoning Models

    DeepSeek-R1 has been one of the most significant model releases since ChatGPT. After its release, the DeepSeek's app quickly rose to the top of Apple's most downloaded chart and NVIDIA saw a 17% stock decline. In this story, we cover DeepSeek-R1, OpenAI's o3-mini and Deep Research, and the policy implications of reasoning models.

    DeepSeek-R1 is a frontier reasoning model. DeepSeek-R1 builds on the company's previous model, DeepSeek-V3, by adding reasoning capabilities through reinforcement learning training. R1 exhibits frontier-level capabilities in mathematics, coding, and scientific reasoning—comparable to OpenAI's o1. DeepSeek-R1 also scored 9.4% on Humanity's Last Exam—at the time of its release, the highest of any publicly available system.

    DeepSeek reports spending only about $6 million on the computing power needed to train V3—however, that number doesn’t include the full [...]

    ---

    Outline:

    (00:13) Reasoning Models

    (04:58) State-Sponsored AI Cyberattacks

    (06:51) Links

    ---

    First published:
    February 6th, 2025

    Source:
    https://newsletter.safe.ai/p/ai-safety-newsletter-47-reasoning

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

    ---

    Images from the article:

    Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

  • Plus, Humanity's Last Exam, and the AI Safety, Ethics, and Society Course.

    Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

    The Transition

    The transition from the Biden to Trump administrations saw a flurry of executive activity on AI policy, with Biden signing several last-minute executive orders and Trump revoking Biden's 2023 executive order on AI risk. In this story, we review the state of play.

    Trump signing first-day executive orders. Source.

    The AI Diffusion Framework. The final weeks of the Biden Administration saw three major actions related to AI policy. First, the Bureau of Industry and Security released its Framework for Artificial Intelligence Diffusion, which updates the US’ AI-related export controls. The rule establishes three tiers of countries 1) US allies, 2) most other countries, and 3) arms-embargoed countries.

    Companies headquartered in tier-1 countries can freely deploy AI chips in other [...]

    ---

    Outline:

    (00:16) The Transition

    (04:38) CAIS and Scale AI Introduce Humanitys Last Exam

    (08:03) AI Safety, Ethics, and Society Course

    (09:21) Links

    ---

    First published:
    January 23rd, 2025

    Source:
    https://newsletter.safe.ai/p/ai-safety-newsletter-46-the-transition

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

    ---

    Images from the article:

    Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

  • As 2024 draws to a close, we want to thank you for your continued support for AI safety and review what we’ve been able to accomplish. In this special-edition newsletter, we highlight some of our most important projects from the year.

    The mission of the Center for AI Safety is to reduce societal-scale risks from AI. We focus on three pillars of work: research, field-building, and advocacy.

    Research

    CAIS conducts both technical and conceptual research on AI safety. Here are some highlights from our research in 2024:

    Circuit Breakers. We published breakthrough research showing how circuit breakers can prevent AI models from behaving dangerously by interrupting crime-enabling outputs. In a jailbreaking competition with a prize pool of tens of thousands of dollars, it took twenty thousand attempts to jailbreak a model trained with circuit breakers. The paper was accepted to NeurIPS 2024.

    The WMDP Benchmark. We developed the Weapons [...]

    ---

    Outline:

    (00:34) Research

    (04:25) Advocacy

    (06:44) Field-Building

    (10:38) Looking Ahead

    ---

    First published:
    December 19th, 2024

    Source:
    https://newsletter.safe.ai/p/aisn-45-center-for-ai-safety-2024

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

    ---

    Images from the article:

    Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

  • Plus, Chinese researchers used Llama to create a military tool for the PLA, a Google AI system discovered a zero-day cybersecurity vulnerability, and Complex Systems.

    Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

    The Trump Circle on AI Safety

    The incoming Trump administration is likely to significantly alter the US government's approach to AI safety. For example, Trump is likely to immediately repeal Biden's Executive Order on AI.

    However, some of Trump's circle appear to take AI safety seriously. The most prominent AI safety advocate close to Trump is Elon Musk, who earlier this year supported SB 1047. However, he is not alone. Below, we’ve gathered some promising perspectives from other members of Trump's circle and incoming administration.

    Trump and Musk at UFC 309. Photo Source.

    Robert F. Kennedy Jr, Trump's pick for Secretary of Health and Human Services, said in [...]

    ---

    Outline:

    (00:24) The Trump Circle on AI Safety

    (02:41) Chinese Researchers Used Llama to Create a Military Tool for the PLA

    (04:14) A Google AI System Discovered a Zero-Day Cybersecurity Vulnerability

    (05:27) Complex Systems

    (08:54) Links

    ---

    First published:
    November 19th, 2024

    Source:
    https://newsletter.safe.ai/p/ai-safety-newsletter-44-the-trump

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

    ---

    Images from the article:

    Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

  • Plus, AI and Job Displacement, and AI Takes Over the Nobels.

    Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

    White House Issues First National Security Memo on AI

    On October 24, 2024, the White House issued the first National Security Memorandum (NSM) on Artificial Intelligence, accompanied by a Framework to Advance AI Governance and Risk Management in National Security.

    The NSM identifies AI leadership as a national security priority. The memorandum states that competitors have employed economic and technological espionage to steal U.S. AI technology. To maintain a U.S. advantage in AI, the memorandum directs the National Economic Council to assess the U.S.'s competitive position in:

    Semiconductor design and manufacturing

    Availability of computational resources

    Access to workers highly skilled in AI

    Capital availability for AI development

    The Intelligence Community must make gathering intelligence on competitors' operations against the [...]

    ---

    Outline:

    (00:18) White House Issues First National Security Memo on AI

    (03:22) AI and Job Displacement

    (09:13) AI Takes Over the Nobels

    ---

    First published:
    October 28th, 2024

    Source:
    https://newsletter.safe.ai/p/ai-safety-newsletter-43-white-house

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

    ---

    Images from the article:

    Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

  • Plus, “Circuit Breakers” for AI systems, and updates on China's AI industry.

    Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

    Supreme Court Decision Could Limit Federal Ability to Regulate AI

    In a recent decision, the Supreme Court overruled the 1984 precedent Chevron v. Natural Resources Defence Council. In this story, we discuss the decision's implications for regulating AI.

    Chevron allowed agencies to flexibly apply expertise when regulating. The “Chevron doctrine” had required courts to defer to a federal agency's interpretation of a statute in the case that that statute was ambiguous and the agency's interpretation was reasonable. Its elimination curtails federal agencies’ ability to regulate—including, as this article from LawAI explains, their ability to regulate AI.

    The Chevron doctrine expanded federal agencies’ ability to regulate in at least two ways. First, agencies could draw on their technical expertise to interpret ambiguous statutes [...]

    ---

    Outline:

    (00:16) Supreme Court Decision Could Limit Federal Ability to Regulate AI

    (02:18) “Circuit Breakers” for AI Systems

    (04:45) Updates on China's AI Industry

    (07:32) Links

    The original text contained 1 image which was described by AI.

    ---

    First published:
    July 9th, 2024

    Source:
    https://newsletter.safe.ai/p/ai-safety-newsletter-38-supreme-court

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

    ---

    Images from the article:

    Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

  • US Launches Antitrust Investigations

    The U.S. Government has launched antitrust investigations into Nvidia, OpenAI, and Microsoft. The U.S. Department of Justice (DOJ) and Federal Trade Commission (FTC) have agreed to investigate potential antitrust violations by the three companies, the New York Times reported. The DOJ will lead the investigation into Nvidia while the FTC will focus on OpenAI and Microsoft.

    Antitrust investigations are conducted by government agencies to determine whether companies are engaging in anticompetitive practices that may harm consumers and stifle competition.

    Nvidia investigated for GPU dominance. The New York Times reports that concerns have been raised about Nvidia's dominance in the GPU market, “including how the company's software locks [...]

    ---

    Outline:

    (00:10) US Launches Antitrust Investigations

    (02:58) Recent Criticisms of OpenAI and Anthropic

    (05:40) Situational Awareness

    (09:14) Links

    ---

    First published:
    June 18th, 2024

    Source:
    https://newsletter.safe.ai/p/ai-safety-newsletter-37-us-launches

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

  • Voluntary Commitments are Insufficient

    AI companies agree to RSPs in Seoul. Following the second AI Global Summit held in Seoul, the UK and Republic of Korea governments announced that 16 major technology organizations, including Amazon, Google, Meta, Microsoft, OpenAI, and xAI have agreed to a new set of Frontier AI Safety Commitments.

    Some commitments from the agreement include:

    Assessing risks posed by AI models and systems throughout the AI lifecycle.

    Setting thresholds for severe risks, defining when a model or system would pose intolerable risk if not adequately mitigated.

    Keeping risks within defined thresholds, such as by modifying system behaviors and implementing robust security controls.

    Potentially halting development or deployment if risks cannot be sufficiently mitigated.

    These commitments [...]

    ---

    Outline:

    (00:03) Voluntary Commitments are Insufficient

    (02:45) Senate AI Policy Roadmap

    (05:18) Chapter 1: Overview of Catastrophic Risks

    (07:56) Links

    ---

    First published:
    May 30th, 2024

    Source:
    https://newsletter.safe.ai/p/ai-safety-newsletter-35-voluntary

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

  • OpenAI and Google Announce New Multimodal Models

    In the current paradigm of AI development, there are long delays between the release of successive models. Progress is largely driven by increases in computing power, and training models with more computing power requires building large new data centers.

    More than a year after the release of GPT-4, OpenAI has yet to release GPT-4.5 or GPT-5, which would presumably be trained on 10x or 100x more compute than GPT-4, respectively. These models might be released over the next year or two, and could represent large spikes in AI capabilities.

    But OpenAI did announce a new model last week, called GPT-4o. The “o” stands for “omni,” referring to the fact that the model can use text, images, videos [...]

    ---

    Outline:

    (00:03) OpenAI and Google Announce New Multimodal Models

    (02:36) The Surge in AI Lobbying

    (05:29) How Should Copyright Law Apply to AI Training Data?

    (10:10) Links

    ---

    First published:
    May 16th, 2024

    Source:
    https://newsletter.safe.ai/p/ai-safety-newsletter-35-lobbying

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

  • Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

    AI Labs Fail to Uphold Safety Commitments to UK AI Safety Institute

    In November, leading AI labs committed to sharing their models before deployment to be tested by the UK AI Safety Institute. But reporting from Politico shows that these commitments have fallen through.

    OpenAI, Anthropic, and Meta have all failed to share their models with the UK AISI before deployment. Only Google DeepMind, headquartered in London, has given pre-deployment access to UK AISI.

    Anthropic released the most powerful publicly available language model, Claude 3, without any window for pre-release testing by the UK AISI. When asked for comment, Anthropic co-founder Jack Clark said, “Pre-deployment testing is a nice idea but very difficult to implement.”

    When asked about their concerns with pre-deployment testing [...]

    ---

    Outline:

    (00:03) AI Labs Fail to Uphold Safety Commitments to UK AI Safety Institute

    (02:17) New Bipartisan AI Policy Proposals in the US Senate

    (06:35) Military AI in Israel and the US

    (11:44) New Online Course on AI Safety from CAIS

    (12:38) Links

    ---

    First published:
    May 1st, 2024

    Source:
    https://newsletter.safe.ai/p/ai-safety-newsletter-34-new-military

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

  • Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

    This week, we cover:

    Consolidation in the corporate AI landscape, as smaller startups join forces with larger funders.

    Several countries have announced new investments in AI, including Singapore, Canada, and Saudi Arabia.

    Congress's budget for 2024 provides some but not all of the requested funding for AI policy. The White House's 2025 proposal makes more ambitious requests for AI funding.

    How will AI affect biological weapons risk? We reexamine this question in light of new experiments from RAND, OpenAI, and others.

    AI Startups Seek Support From Large Financial Backers

    As AI development demands ever-increasing compute resources, only well-resourced developers can compete at the frontier. In practice, this means that AI startups must either partner with the world's [...]

    ---

    Outline:

    (00:45) AI Startups Seek Support From Large Financial Backers

    (03:47) National AI Investments

    (05:16) Federal Spending on AI

    (08:35) An Updated Assessment of AI and Biorisk

    (15:35) $250K in Prizes: SafeBench Competition Announcement

    (16:08) Links

    ---

    First published:
    April 11th, 2024

    Source:
    https://newsletter.safe.ai/p/ai-safety-newsletter-33-reassessing

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

  • Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

    Measuring and Reducing Hazardous Knowledge

    The recent White House Executive Order on Artificial Intelligence highlights risks of LLMs in facilitating the development of bioweapons, chemical weapons, and cyberweapons.

    To help measure these dangerous capabilities, CAIS has partnered with Scale AI to create WMDP: the Weapons of Mass Destruction Proxy, an open source benchmark with more than 4,000 multiple choice questions that serve as proxies for hazardous knowledge across biology, chemistry, and cyber.

    This benchmark not only helps the world understand the relative dual-use capabilities of different LLMs, but it also creates a path forward for model builders to remove harmful information from their models through machine unlearning techniques.

    Measuring hazardous knowledge in bio, chem, and cyber. Current evaluations of dangerous AI capabilities have [...]

    ---

    Outline:

    (00:03) Measuring and Reducing Hazardous Knowledge

    (04:35) Language models are getting better at forecasting

    (07:51) Proposals for Private Regulatory Markets

    (14:25) Links

    ---

    First published:
    March 7th, 2024

    Source:
    https://newsletter.safe.ai/p/ai-safety-newsletter-32-measuring

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

  • Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

    This week, we’ll discuss:

    A new proposed AI bill in California which requires frontier AI developers to adopt safety and security protocols, and clarifies that developers bear legal liability if their AI systems cause unreasonable risks or critical harms to public safety.

    Precedents for AI governance from healthcare and biosecurity.

    The EU AI Act and job opportunities at their enforcement agency, the AI Office.

    A New Bill on AI Policy in California

    Several leading AI companies have public plans for how they’ll invest in safety and security as they develop more dangerous AI systems. A new bill in California's state legislature would codify this practice as a legal requirement, and clarify the legal liability faced by developers [...]

    ---

    Outline:

    (00:33) A New Bill on AI Policy in California

    (04:38) Precedents for AI Policy: Healthcare and Biosecurity

    (07:56) Enforcing the EU AI Act

    (08:55) Links

    ---

    First published:
    February 21st, 2024

    Source:
    https://newsletter.safe.ai/p/aisn-31-a-new-ai-policy-bill-in-california

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.

  • Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

    Compute Investments Continue To Grow

    Pausing AI development has been proposed as a policy for ensuring safety. For example, an open letter last year from the Future of Life Institute called for a six-month pause on training AI systems more powerful than GPT-4.

    But one interesting fact about frontier AI development is that it comes with natural pauses that can last many months or years. After releasing a frontier model, it takes time for AI developers to construct new compute clusters with larger numbers of more advanced computer chips. The supply of compute is currently unable to keep up with demand, meaning some AI developers cannot buy enough chips for their needs.

    This explains why OpenAI was reportedly limited by GPUs last year. [...]

    ---

    Outline:

    (00:06) Compute Investments Continue To Grow

    (03:48) Developments in Military AI

    (07:19) Japan and Singapore Support AI Safety

    (08:57) Links

    ---

    First published:
    January 24th, 2024

    Source:
    https://newsletter.safe.ai/p/aisn-30-investments-in-compute-and

    ---

    Want more? Check out our ML Safety Newsletter for technical safety research.

    Narrated by TYPE III AUDIO.