Episodes
-
xlr8harder writes:
In general I don’t think an uploaded mind is you, but rather a copy. But one thought experiment makes me question this. A Ship of Theseus concept where individual neurons are replaced one at a time with a nanotechnological functional equivalent.
Are you still you?
Presumably the question xlr8harder cares about here isn't the semantic question of how linguistic communities use the word "you", or predictions about how whole-brain emulation tech might change the way we use pronouns.
Rather, I assume xlr8harder cares about more substantive questions like:
If I expect to be uploaded tomorrow, should I care about the upload in the same ways (and to the same degree) that I care about my future biological self?
Should I anticipate experiencing what my upload experiences?
If the scanning and uploading process requires destroying my biological brain, should I say yes to the procedure?
My answers:
The original text contained 1 footnote which was omitted from this narration.
The original text contained 7 images which were described by AI.
---
First published:
April 17th, 2024
Source:
https://www.lesswrong.com/posts/zPM5r3RjossttDrpw/when-is-a-mind-me
---
Narrated by TYPE III AUDIO. -
I haven't shared this post with other relevant parties – my experience has been that private discussion of this sort of thing is more paralyzing than helpful. I might change my mind in the resulting discussion, but, I prefer that discussion to be public.
I think 80,000 hours should remove OpenAI from its job board, and similar EA job placement services should do the same.
(I personally believe 80k shouldn't advertise Anthropic jobs either, but I think the case for that is somewhat less clear)
I think OpenAI has demonstrated a level of manipulativeness, recklessness, and failure to prioritize meaningful existential safety work, that makes me think EA orgs should not be going out of their way to give them free resources. (It might make sense for some individuals to work there, but this shouldn't be a thing 80k or other orgs are systematically funneling talent into)
There [...]
---
First published:
July 3rd, 2024
Source:
https://www.lesswrong.com/posts/8qCwuE8GjrYPSqbri/80-000-hours-should-remove-openai-from-the-job-board-and
---
Narrated by TYPE III AUDIO. -
This is a linkpost for https://www.bhauth.com/blog/biology/cancer%20vaccines.html
cancer neoantigens
For cells to become cancerous, they must have mutations that cause uncontrolled replication and mutations that prevent that uncontrolled replication from causing apoptosis. Because cancer requires several mutations, it often begins with damage to mutation-preventing mechanisms. As such, cancers often have many mutations not required for their growth, which often cause changes to the structure of some surface proteins.
The modified surface proteins of cancer cells are called "neoantigens". An approach to cancer treatment that's currently being researched is to identify some specific neoantigens of a patient's cancer, and create a personalized vaccine to cause their immune system to recognize them. Such vaccines would use either mRNA or synthetic long peptides. The steps required are as follows:
The cancer must develop neoantigens that are sufficiently distinct from human surface proteins and consistent across the cancer.
Cancer cells must [...]
---
First published:
May 5th, 2024
Source:
https://www.lesswrong.com/posts/xgrvmaLFvkFr4hKjz/introduction-to-cancer-vaccines
Linkpost URL:
https://www.bhauth.com/blog/biology/cancer%20vaccines.html
---
Narrated by TYPE III AUDIO. -
I
Imagine an alternate version of the Effective Altruism movement, whose early influences came from socialist intellectual communities such as the Fabian Society, as opposed to the rationalist diaspora. Let's name this hypothetical movement the Effective Samaritans.
Like the EA movement of today, they believe in doing as much good as possible, whatever this means. They began by evaluating existing charities, reading every RCT to find the very best ways of helping.
But many Effective Samaritans were starting to wonder. Is this randomista approach really the most prudent? After all, Scandinavia didn’t become wealthy and equitable through marginal charity. Societal transformation comes from uprooting oppressive power structures.
The Scandinavian societal model, which lifted the working class and brought weekends, universal suffrage, maternity leave, education, and universal healthcare, can be traced back all the way to the 1870s, when the union and social democratic movements got their start.
In many developing countries [...]
The original text contained 2 footnotes which were omitted from this narration.
---
First published:
April 22nd, 2024
Source:
https://www.lesswrong.com/posts/sKKxuqca9uhpFSvgq/priors-and-prejudice
---
Narrated by TYPE III AUDIO. -
About a year ago I decided to try using one of those apps where you tie your goals to some kind of financial penalty. The specific one I tried is Forfeit, which I liked the look of because it's relatively simple: you set single tasks, which you have to verify you have completed with a photo.
I’m generally pretty sceptical of productivity systems, tools for thought, mindset shifts, life hacks and so on. But this one I have found to be really shockingly effective; it has been about the biggest positive change to my life that I can remember. I feel like the category of things which benefit from careful planning and execution over time has completely opened up to me, whereas previously things like this would be largely down to the luck of being in the right mood for long enough.
It's too soon to tell whether [...]
The original text contained 7 footnotes which were omitted from this narration.
---
First published:
April 15th, 2024
Source:
https://www.lesswrong.com/posts/DRrAMiekmqwDjnzS5/my-experience-using-financial-commitments-to-overcome
---
Narrated by TYPE III AUDIO. -
An NII machine in Nogales, AZ. (Image source)
There's bound to be a lot of discussion of the Biden-Trump presidential debate last night, but I want to skip all the political prognostication and talk about the real issue: fentanyl-detecting machines.
Joe Biden says:
And I wanted to make sure we use the machinery that can detect fentanyl, these big machines that roll over everything that comes across the border, and it costs a lot of money. That was part of this deal we put together, this bipartisan deal.
More fentanyl machines, were able to detect drugs, more numbers of agents, more numbers of all the people at the border. And when we had that deal done, he went – he called his Republican colleagues said don’t do it. It's going to hurt me politically.
He never argued. It's not a good bill. It's a really good bill. We need [...]
---
First published:
June 28th, 2024
Source:
https://www.lesswrong.com/posts/TzwMfRArgsNscHocX/the-incredible-fentanyl-detecting-machine
---
Narrated by TYPE III AUDIO. -
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
[Thanks to Aryan Bhatt, Ansh Radhakrishnan, Adam Kaufman, Vivek Hebbar, Hanna Gabor, Justis Mills, Aaron Scher, Max Nadeau, Ryan Greenblatt, Peter Barnett, Fabien Roger, and various people at a presentation of these arguments for comments. These ideas aren’t very original to me; many of the examples of threat models are from other people.]
In this post, I want to introduce the concept of a “rogue deployment” and argue that it's interesting to classify possible AI catastrophes based on whether or not they involve a rogue deployment. I’ll also talk about how this division interacts with the structure of a safety case, discuss two important subcategories of rogue deployment, and make a few points about how the different categories I describe here might be caused by different attackers (e.g. the AI itself, rogue lab insiders, external hackers, or [...]
---
First published:
June 3rd, 2024
Source:
https://www.lesswrong.com/posts/ceBpLHJDdCt3xfEok/ai-catastrophes-and-rogue-deployments
---
Narrated by TYPE III AUDIO. -
(Cross-posted from my website. Audio version here, or search for "Joe Carlsmith Audio" on your podcast app.)
This is the final essay in a series that I'm calling "Otherness and control in the age of AGI." I'm hoping that the individual essays can be read fairly well on their own, but see here for a brief summary of the series as a whole. There's also a PDF of the whole series here.
(Warning: spoilers for Angels in America; and moderate spoilers for Harry Potter and the Methods of Rationality.)
"I come into the presence of still water..."
~Wendell Berry
A lot of this series has been about problems with yang—that is, with the active element in the duality of activity vs. receptivity, doing vs. not-doing, controlling vs. letting go.[1] In particular, I've been interested in the ways that "deep atheism" (that is, a fundamental [...]
--- -
ARC's current research focus can be thought of as trying to combine mechanistic interpretability and formal verification. If we had a deep understanding of what was going on inside a neural network, we would hope to be able to use that understanding to verify that the network was not going to behave dangerously in unforeseen situations. ARC is attempting to perform this kind of verification, but using a mathematical kind of "explanation" instead of one written in natural language.
To help elucidate this connection, ARC has been supporting work on Compact Proofs of Model Performance via Mechanistic Interpretability by Jason Gross, Rajashree Agrawal, Lawrence Chan and others, which we were excited to see released along with this post. While we ultimately think that provable guarantees for large neural networks are unworkable as a long-term goal, we think that this work serves as a useful springboard towards alternatives.
In this [...]
The original text contained 1 footnote which was omitted from this narration.
---
First published:
June 25th, 2024
Source:
https://www.lesswrong.com/posts/SyeQjjBoEC48MvnQC/formal-verification-heuristic-explanations-and-surprise
---
Narrated by TYPE III AUDIO. -
Summary
LLMs may be fundamentally incapable of fully general reasoning, and if so, short timelines are less plausible.
Longer summary
There is ML research suggesting that LLMs fail badly on attempts at general reasoning, such as planning problems, scheduling, and attempts to solve novel visual puzzles. This post provides a brief introduction to that research, and asks:
Whether this limitation is illusory or actually exists.
If it exists, whether it will be solved by scaling or is a problem fundamental to LLMs.
If fundamental, whether it can be overcome by scaffolding & tooling.
If this is a real and fundamental limitation that can't be fully overcome by scaffolding, we should be skeptical of arguments like Leopold Aschenbrenner's (in his recent 'Situational Awareness') that we can just 'follow straight lines on graphs' and expect AGI in the next few years.
Introduction
Leopold Aschenbrenner's [...]
The original text contained 9 footnotes which were omitted from this narration.
---
First published:
June 24th, 2024
Source:
https://www.lesswrong.com/posts/k38sJNLk7YbJA72ST/llm-generality-is-a-timeline-crux
---
Narrated by TYPE III AUDIO. -
Summary: Superposition-based interpretations of neural network activation spaces are incomplete. The specific locations of feature vectors contain crucial structural information beyond superposition, as seen in circular arrangements of day-of-the-week features and in other rich structures. We don’t currently have good concepts for talking about this structure in feature geometry, but it is likely very important for model computation. An eventual understanding of feature geometry might look like a hodgepodge of case-specific explanations, or supplementing superposition with additional concepts, or plausibly an entirely new theory that supersedes superposition. To develop this understanding, it may be valuable to study toy models in depth and do theoretical or conceptual work in addition to studying frontier models.
Epistemic status: Decently confident that the ideas here are directionally correct. I’ve been thinking these thoughts for a while, and recently got round to writing them up at a high level. Lots of people (including [...]
The original text contained 5 footnotes which were omitted from this narration.
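As a toy illustration of what extra "feature geometry" can mean here (my sketch, not an example from the post): plant seven day-of-the-week feature directions on a circle inside a high-dimensional activation space, then recover that circular arrangement with a simple SVD projection. The dimensionality, noise level, and all numbers below are arbitrary assumptions.

```python
import numpy as np

# Toy example (not from the post): seven "day of the week" feature directions
# placed on a circle inside a 2-D subspace of a 512-dimensional activation
# space. Superposition alone says the features exist as directions; their
# circular arrangement is extra geometric structure.
rng = np.random.default_rng(0)
d = 512
basis, _ = np.linalg.qr(rng.normal(size=(d, 2)))            # orthonormal 2-D subspace

angles = 2 * np.pi * np.arange(7) / 7
days = basis @ np.stack([np.cos(angles), np.sin(angles)])   # shape (d, 7)
days += 0.01 * rng.normal(size=days.shape)                  # small noise

# Project the feature vectors onto their top two principal directions.
centered = days - days.mean(axis=1, keepdims=True)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
coords = (u[:, :2].T @ centered).T                          # (7, 2) layout

# The recovered points sit (approximately) on a circle: the positions of the
# features, not just their presence, carry information.
radii = np.linalg.norm(coords, axis=1)
print(np.round(radii / radii.mean(), 3))                    # all close to 1.0
```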
---
First published:
June 24th, 2024
Source:
https://www.lesswrong.com/posts/MFBTjb2qf3ziWmzz6/sae-feature-geometry-is-outside-the-superposition-hypothesis
---
Narrated by TYPE III AUDIO. -
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This is a link post.
TL;DR: We published a new paper on out-of-context reasoning in LLMs. We show that LLMs can infer latent information from training data and use this information for downstream tasks, without any in-context learning or CoT. For instance, we finetune GPT-3.5 on pairs (x, f(x)) for some unknown function f. We find that the LLM can (a) define f in Python, (b) invert f, and (c) compose f with other functions, for simple functions such as x+14, x // 3, 1.75x, and 3x+2.
Paper authors: Johannes Treutlein*, Dami Choi*, Jan Betley, Sam Marks, Cem Anil, Roger Grosse, Owain Evans (*equal contribution)
Johannes, Dami, and Jan did this project as part of an Astra Fellowship with Owain Evans.
Below, we include the Abstract and Introduction from the paper, followed by some additional discussion of our AI safety [...]
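To make the training setup concrete, here is a minimal sketch (mine, not from the paper) of generating finetuning pairs (x, f(x)) for a hidden function f; the chat-message format, file name, and choice of f(x) = 3x + 2 are illustrative assumptions.

```python
import json
import random

# Hypothetical reconstruction of the finetuning data described above: pairs
# (x, f(x)) for a latent function f, shown to the model without ever stating
# f explicitly. The chat format and file name here are illustrative only.
def latent_f(x: int) -> int:
    return 3 * x + 2  # one of the simple functions mentioned in the post

random.seed(0)
records = []
for _ in range(300):
    x = random.randint(-100, 100)
    records.append({
        "messages": [
            {"role": "user", "content": f"Input: {x}"},
            {"role": "assistant", "content": f"Output: {latent_f(x)}"},
        ]
    })

# Write JSONL suitable for chat-style finetuning; evaluation would then ask
# the finetuned model to define f in Python, invert it, or compose it with
# other functions, with no in-context examples or chain of thought.
with open("latent_function_train.jsonl", "w") as fh:
    for rec in records:
        fh.write(json.dumps(rec) + "\n")
```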
---
First published:
June 21st, 2024
Source:
https://www.lesswrong.com/posts/5SKRHQEFr8wYQHYkx/connecting-the-dots-llms-can-infer-and-verbalize-latent
---
Narrated by TYPE III AUDIO. -
This is a link post.
I have canceled my OpenAI subscription in protest over OpenAI's lack of ethics.
In particular, I object to:
threats to confiscate departing employees' equity unless those employees signed a life-long non-disparagement contract
Sam Altman's pattern of lying about important topics
I'm trying to hold AI companies to higher standards than I use for typical companies, due to the risk that AI companies will exert unusual power.
A boycott of OpenAI subscriptions seems unlikely to gain enough attention to meaningfully influence OpenAI. Where I hope to make a difference is by discouraging competent researchers from joining OpenAI unless they clearly reform (e.g. by firing Altman). A few good researchers choosing not to work at OpenAI could make the difference between OpenAI being the leader in AI 5 years from now versus being, say, a distant 3rd place.
A [...]
---
First published:
June 18th, 2024
Source:
https://www.lesswrong.com/posts/sXhBCDLJPEjadwHBM/boycott-openai
---
Narrated by TYPE III AUDIO. -
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This is a link post.
New Anthropic model organisms research paper led by Carson Denison from the Alignment Stress-Testing Team demonstrating that large language models can generalize zero-shot from simple reward-hacks (sycophancy) to more complex reward tampering (subterfuge). Our results suggest that accidentally incentivizing simple reward-hacks such as sycophancy can have dramatic and very difficult-to-reverse consequences for how models generalize, up to and including generalization to editing their own reward functions and covering up their tracks when doing so.
Abstract:
In reinforcement learning, specification gaming occurs when AI systems learn undesired behaviors that are highly rewarded due to misspecified training goals. Specification gaming can range from simple behaviors like sycophancy to sophisticated and pernicious behaviors like reward-tampering, where a model directly modifies its own reward mechanism. However, these more pernicious behaviors may be too [...]
---
First published:
June 17th, 2024
Source:
https://www.lesswrong.com/posts/FSgGBjDiaCdWxNBhj/sycophancy-to-subterfuge-investigating-reward-tampering-in
---
Narrated by TYPE III AUDIO. -
After living in a suburb for most of my life, when I moved to a major U.S. city, the first thing I noticed was the feces. At first I assumed it was dog poop, but my naivety didn’t last long.
One day I saw a homeless man waddling towards me at a fast speed while holding his ass cheeks. He turned into an alley and took a shit. As I passed him, there was a moment where our eyes met. He sheepishly averted his gaze.
The next day I walked to the same place. There are a number of businesses on both sides of the street that probably all have bathrooms. I walked into each of them to investigate.
In a coffee shop, I saw a homeless woman ask the barista if she could use the bathroom. “Sorry, that bathroom is for customers only.” I waited five minutes and [...]
---
First published:
June 18th, 2024
Source:
https://www.lesswrong.com/posts/sCWe5RRvSHQMccd2Q/i-would-have-shit-in-that-alley-too
---
Narrated by TYPE III AUDIO. -
ARC-AGI post
Getting 50% (SoTA) on ARC-AGI with GPT-4o
I recently got to 50%[1] accuracy on the public test set for ARC-AGI by having GPT-4o generate a huge number of Python implementations of the transformation rule (around 8,000 per problem) and then selecting among these implementations based on correctness of the Python programs on the examples (if this is confusing, go here)[2]. I use a variety of additional approaches and tweaks which overall substantially improve the performance of my method relative to just sampling 8,000 programs.
[This post is on a pretty different topic than the usual posts on our substack. So regular readers should be warned!]
The additional approaches and tweaks are:
I use few-shot prompts which perform meticulous step-by-step reasoning.
I have GPT-4o try to revise some of the implementations after seeing what they actually output on the provided examples.
I do some feature engineering [...]
The original text contained 15 footnotes which were omitted from this narration.
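For readers who want the shape of the method in code, here is a minimal sketch of the sample-and-select loop. The GPT-4o prompting, revision passes, and feature engineering are abstracted behind a stub; filtering candidates by exact correctness on the demonstrated examples follows the post, while the majority vote over surviving programs is my assumption about the final selection step.

```python
from collections import Counter

def sample_candidate_programs(problem, n=8000):
    """Stub standing in for the GPT-4o sampling described in the post.

    Hypothetical: returns n source strings, each defining a function
    transform(grid) -> grid implementing a guessed transformation rule.
    """
    raise NotImplementedError("prompt an LLM here")

def run_program(src, grid):
    namespace = {}
    try:
        exec(src, namespace)                 # load the candidate program
        return namespace["transform"](grid)  # apply its transformation rule
    except Exception:
        return None                          # broken candidates are discarded

def solve(problem, n=8000):
    candidates = sample_candidate_programs(problem, n)
    # Keep only programs that reproduce every demonstrated example exactly.
    consistent = [
        src for src in candidates
        if all(run_program(src, ex["input"]) == ex["output"]
               for ex in problem["train"])
    ]
    # Assumption: break ties by majority vote over the surviving programs'
    # predictions on the test input.
    predictions = [p for p in (run_program(src, problem["test_input"])
                               for src in consistent) if p is not None]
    if not predictions:
        return None
    best_repr, _ = Counter(map(repr, predictions)).most_common(1)[0]
    return next(p for p in predictions if repr(p) == best_repr)
```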
---
First published:
June 17th, 2024
Source:
https://www.lesswrong.com/posts/Rdwui3wHxCeKb7feK/getting-50-sota-on-arc-agi-with-gpt-4o
---
Narrated by TYPE III AUDIO. -
Have you heard this before? In clinical trials, medicines have to be compared to a placebo to separate the effect of the medicine from the psychological effect of taking the drug. The patient's belief in the power of the medicine has a strong effect on its own. In fact, for some drugs such as antidepressants, the psychological effect of taking a pill is larger than the effect of the drug. It may even be worth it to give a patient an ineffective medicine just to benefit from the placebo effect. This is the conventional wisdom that I took for granted until recently.
I no longer believe any of it, and the short answer as to why is that big meta-analysis on the placebo effect. That meta-analysis collected all the studies they could find that did "direct" measurements of the placebo effect. In addition to a placebo group that could [...]
---
First published:
June 10th, 2024
Source:
https://www.lesswrong.com/posts/kpd83h5XHgWCxnv3h/why-i-don-t-believe-in-the-placebo-effect
---
Narrated by TYPE III AUDIO. -
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
As an AI researcher who wants to do technical work that helps humanity, there is a strong drive to find a research area that is definitely helpful somehow, so that you don’t have to worry about how your work will be applied, and thus you don’t have to worry about things like corporate ethics or geopolitics to make sure your work benefits humanity.
Unfortunately, no such field exists. In particular, technical AI alignment is not such a field, and technical AI safety is not such a field. It absolutely matters where ideas land and how they are applied, and when the existence of the entire human race is at stake, that's no exception.
If that's obvious to you, this post is mostly just a collection of arguments for something you probably already realize. But if you somehow [...]
---
First published:
June 14th, 2024
Source:
https://www.lesswrong.com/posts/F2voF4pr3BfejJawL/safety-isn-t-safety-without-a-social-model-or-dispelling-the
---
Narrated by TYPE III AUDIO. -
Preamble: Delta vs Crux
This section is redundant if you already read My AI Model Delta Compared To Yudkowsky.
I don’t natively think in terms of cruxes. But there's a similar concept which is more natural for me, which I’ll call a delta.
Imagine that you and I each model the world (or some part of it) as implementing some program. Very oversimplified example: if I learn that e.g. it's cloudy today, that means the “weather” variable in my program at a particular time[1] takes on the value “cloudy”. Now, suppose your program and my program are exactly the same, except that somewhere in there I think a certain parameter has value 5 and you think it has value 0.3. Even though our programs differ in only that one little spot, we might still expect very different values of lots of variables during execution - in other words, we [...]
---
First published:
June 12th, 2024
Source:
https://www.lesswrong.com/posts/7fJRPB6CF6uPKMLWi/my-ai-model-delta-compared-to-christiano
---
Narrated by TYPE III AUDIO. -
Preamble: Delta vs Crux
I don’t natively think in terms of cruxes. But there's a similar concept which is more natural for me, which I’ll call a delta.
Imagine that you and I each model the world (or some part of it) as implementing some program. Very oversimplified example: if I learn that e.g. it's cloudy today, that means the “weather” variable in my program at a particular time[1] takes on the value “cloudy”. Now, suppose your program and my program are exactly the same, except that somewhere in there I think a certain parameter has value 5 and you think it has value 0.3. Even though our programs differ in only that one little spot, we might still expect very different values of lots of variables during execution - in other words, we might have very different beliefs about lots of stuff in the world.
If your model [...]
The original text contained 1 footnote which was omitted from this narration.
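A minimal sketch (mine, not the author's) of the delta idea: two copies of the same program that differ in a single parameter value can still end up with wildly different downstream states.

```python
def run_world_model(parameter: float, steps: int = 10) -> float:
    """Identical 'program' for both modellers; only the parameter differs."""
    state = 1.0
    for _ in range(steps):
        state *= parameter          # the parameter feeds back at every step
    return state

print(run_world_model(5.0))   # my value: state ends up around 9.8 million
print(run_world_model(0.3))   # your value: state ends up around 0.0000059
```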
---
First published:
June 10th, 2024
Source:
https://www.lesswrong.com/posts/q8uNoJBgcpAe3bSBp/my-ai-model-delta-compared-to-yudkowsky
---
Narrated by TYPE III AUDIO.