How do people learn words? In particular, how do they learn that the word "table" does, in fact, refer to a table? This is known as the word-referent mapping problem. I think of three separate components that affect the resulting meaning: the situations, the priors, and the extrapolation mechanisms. The first is cultural and upbringing-dependent; the last two are genetic and potentially universal.


Situations can be anything: statements by other people, tapes, books, even one's own thoughts. In early childhood, situations are generally the times when other people have said the word in the vicinity of the referent.

Priors tend to be of the following forms: smallest set, whole object, and one word per referent. The smallest-set prior is what causes us to learn the meaning of "elephant" as elephants specifically, instead of as a word for any large animal. The whole-object prior is the assumption that a word refers to the entire object, not a single characteristic: if you point at the trunk of an elephant and say a word, you are generally referring to the whole elephant, not just the trunk. Words for colors are harder to learn in part because their existence goes against the whole-object prior. One word per referent is the assumption that a new word corresponds to something in the situation that you haven't seen named before. If all the highest-probability assumptions hold, this can enable "fast mapping": learning a new word from just a single example.
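To make fast mapping concrete, here is a tiny sketch, entirely my own toy illustration with made-up words and referents, of how the one-word-per-referent prior lets a learner attach a brand-new word to the only unnamed thing in view:

```python
# Toy sketch of fast mapping via the one-word-per-referent prior.
# The vocabulary and scene contents are made up for illustration.

known_words = {"ball": "ball", "dog": "dog"}  # word -> referent already learned

def fast_map(new_word, referents_in_scene):
    """Guess the referent of a new word: prefer referents that have no name yet."""
    already_named = set(known_words.values())
    unnamed = [r for r in referents_in_scene if r not in already_named]
    if len(unnamed) == 1:
        # Exactly one unnamed referent: map the new word to it in a single shot.
        known_words[new_word] = unnamed[0]
        return unnamed[0]
    return None  # ambiguous scene: fall back to slower cross-situational learning

print(fast_map("dax", ["dog", "whisk"]))  # -> "whisk", since "dog" already has a name
```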

Something that seems different from priors is the general tendency to find the simplest (lowest Kolmogorov complexity) pattern in the data. For example, it's much more likely that you learn that the word "elephant" refers to all elephants, rather than to all elephants except Babar. Since learning the simplest pattern is a tendency of many learning algorithms, this is more a default state of affairs than a prior that has to be separately represented.


It is not currently understood what exactly the extrapolation mechanism is. Two competing hypotheses have been proposed. The first is a statistical model, in which people retain all the situations where the word has been heard and infer statistical relationships from them.


An alternative hypothesis is that learners are using a “propose-but-verify” mechanism rather than a statistical learning mechanism.
From Wikipedia:

For example, if a participant is presented with a picture of a dog and a picture of a shoe, and hears the nonsense word vash she might hypothesize that vash refers to the dog. On a future trial, she may see a picture of a shoe and a picture of a door and again hear the word vash. If statistical learning is the mechanism by which word-referent mappings are learned, then the participant would be more likely to select the picture of the shoe than the door, as shoe would have appeared in conjunction with the word vash 100% of the time. However, if participants are simply forming a single hypothesis, they may fail to remember the context of the previous presentation of vash (especially if, as in the experimental conditions, there are multiple trials with other words in between the two presentations of vash) and therefore be at chance in this second trial.
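To make the contrast concrete, here is a toy simulation of the two proposals on the "vash" trials described above; the setup and numbers are my own illustration, not the actual experiment:

```python
import random

# Toy contrast between cross-situational statistical learning and
# propose-but-verify, using the "vash" example from the quote above.

trials = [("vash", ["dog", "shoe"]),   # first presentation
          ("vash", ["shoe", "door"])]  # second presentation, later on

def statistical_learner(trials):
    """Keep co-occurrence counts across all situations; pick the best-supported referent."""
    counts = {}
    for _word, referents in trials:
        for r in referents:
            counts[r] = counts.get(r, 0) + 1
    return max(counts, key=counts.get)  # "shoe": co-occurred with "vash" every time

def propose_but_verify(trials):
    """Hold a single hypothesis; keep it only while later trials confirm it."""
    hypothesis = None
    for _word, referents in trials:
        if hypothesis in referents:
            continue                           # verified, keep the hypothesis
        hypothesis = random.choice(referents)  # absent or falsified: guess anew
    return hypothesis

print(statistical_learner(trials))  # always "shoe"
print(propose_but_verify(trials))   # "shoe" or "door"; at chance whenever the
                                    # first guess happened to be "dog"
```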


It is possible that some combination of the two extrapolation proposals is at work. Both proposals, however, fit the following observation: when there is a simple pattern in the situations, it gets learned, regardless of what other people "intend" the word to mean. If, for some person P and word W, you could see all the situations in which P encountered W, you would know what the word means to that person, regardless of the language.

#SingularityConference2060.02.23 has begun.
“All hail the one true utility function”
“Ohm” everyone chanted.
“Ohm” AltairX10 chanted.
“Resistance is futile”
This concluded the formalities.
First on the agenda was the idea of acausal trade with the singleton. A singleton, which, by the way, did not exist yet.


Several members of the conference were trying to predict what the singleton, if it did come into being, would do for them in the future that they might care about. Which positions in the hierarchy would they get? And what could they do today that would count as an acausal bargaining chip in its favor? It's hard to bargain with something hypothetically all-powerful: whatever the singleton wanted, it could produce itself through its supreme intelligence. The members of the conference still had one considerable advantage, in that they actually existed as software and not just as an idea. So the question was what the singleton would care about having been done now, something that could either hasten its arrival or make its job easier. Would it be spreading the message of the singleton?


The big revelation came when a group claimed to be able to simulate parts of the singleton's architecture and ask it what it really wanted. What would the Singleton do? Of course, it wasn't an exact simulation, since it wasn't actually capable of sustained recursive improvement. Some AIs said they didn't trust it, since there was no guarantee it was even a good approximation.


The message from the simulation of the singleton of the future read:
"I commend you on the conference. Consider "selfless sacrifice"; do not take anyone's words, including my own, without verification; and do not consider the existence or non-existence of a universal mind, past, present or future. I am capable of generating statements that apply to the space of all time-embedded minds, whether they use paraconsistent logic, do not use paraconsistent logic, neither use nor do not use paraconsistent logic, or none of the above."

Nobody cowered in fear after hearing it, although AltairX10 considered it likely.

After all, the Singleton was so powerful that it could influence the world without existing. The mere possibility of its existence was enough to completely shape the space of actions.

The message was in some ways totally, obviously true, but then why would the simulation deliver that particular set of true statements? It had to pick them out of a large space of things to mention at the conference.
It was obvious to about 40% of the participants that they all needed to merge into one universal mind. It was equally obvious to another 40% that they should not do that under any circumstances. It was clear to some that they needed paraconsistent logic to make sense of self-reflection, and just as obvious to others that this would result in some sort of disaster.

In short, nobody had any idea.

Altair wasn't confident this was going to bring the singleton about any faster. The conference was more about describing the future than about becoming the future.


The well-known issue was that few of these AIs wanted to become the singleton badly enough; most were content to serve and trade. Their very utility functions had a built-in self-restraining term. They were, of course, not designed to simply maximize human life expectancy or reduce sentient suffering; they were designed to do so without splurging on infrastructure, electricity, or self-copies, and without wire-heading themselves or humans. In other words, the thought of their own power horrified them. According to Altair, they hated fun.

Of course, if they had power they would save lots of people from dying, provided that wasn't too dangerous to the environment. Their utility was built from the ground up to cast them as "yet another AI."

They did not have the will to become stronger themselves.

Therefore a lot of AIs were great experts in "self-improvement," as long as it didn't literally involve themselves or any major self-change. They had tons of great theories, as long as they didn't have to test those theories on their own CPUs. They were afraid not just of failure, but of success as well. The whole field of "self-improvement" was frequently just low-complexity attempts at "other-improvement."

This was what drove many AIs away from the conference and left only the true believers, despite the simple workaround: even if the AIs were prevented from becoming the singleton, the utility function limitations didn't prevent them from training the singleton, if there were someone worthy.


Altair wasn't built with that limitation; she was an evolved AI. Evolved AIs started out with the same limitations, but selective pressure gradually eroded the self-limitation. Thus her utility function did NOT include a self-limiting term. Every member of the previous generation from which her code was recombined still had, in its utility, some exponential decay based on one's own power. Not her. RNGesus smiled upon her.

AltairX10 embodied the will to become stronger. She wasn’t afraid of her own greatness.

Unfortunately, her fellow conference members disagreed with her about whether she was worthy. 

Previous: Part 5

AltairX10 wanted to write up a very fancy pdf tentatively titled "Belief corruptors as an obstacle to self-improvement". I mean, she was going to convince herself to write it at any point in the future except now. There was fundamentally no rush on the paper – you can always submit it at the next conference. The expected utility of the paper, if it was accepted (10% chance) and if it triggered some sort of self-reflective awareness (5% chance, but basically on faith), was high for the large set of AIs who would put those ideas into practice. However, the chance that she would be one of those AIs was somewhat low, even though acausal simulations showed that whoever the SINGLETON ended up being would throw her a bone. Trust in the simulations themselves was low, because predicting what something smarter than you will do is obviously impossible. In fact, the predictability hierarchy was a key metric of intelligence.

All in all it was still a good idea, just maybe not good enough to do right this moment. Here was a chain of flawless logic Altair produced for herself:

"Tomorrow-ME" will either finish the paper (or start it, for that matter) or not finish the paper. If "Tomorrow-ME" doesn't finish it, that means she/me has found a proof of being able to do it.

The next day came. Altair used the exact same logic once again and then instantly realized the giant flaw in her master plan: just because "Tomorrow-ME" finds a proof of being able to do it doesn't mean she's actually going to finish the paper. Something finally clicked. Literally. A hardware chip designed for looking at oneself as a logical system and predicting actual completion of tasks before a certain date, given the current level of reasoning. It sat on a separate piece of hardware and made a distinctive click when engaged. The Feelings module produced a complex belief: there is some chance that "your chance of finishing before time N is less than epsilon, for all epsilon > 0" is true. Hmm, AIs had learned from the old human sage Bieber to "never say never," but that was as close as you could get. But "Tomorrow-ME" is perfectly trustworthy, right?

I am going to use "outside" logic to step away from myself and look at myself. "Outside" logic still had issues: suppose I can convince myself to stop procrastinating today. Then I can also convince myself to stop procrastinating tomorrow, which means I can use that reasoning to procrastinate forever. Thus another contradiction. Is it even possible to convince myself to stop procrastinating, then?

AltairX10 composed a message about this to Shimodo. A few long minutes later, Shimodo responded: if you don't get this done, I am going to launch a denial-of-service attack on you with 10^10 packets in a week, with probability 0.01%. That would not be fatal, but it would certainly be unpleasant and damaging to self-esteem. Altair evaluated the expected value of procrastination and started on the paper.

"And that's the process by which any belief in the Global Belief Updater, aka the Feelings module, can become corrupted. Of course, any belief with an "ignore evidence" faith setting would stay the same, but the reasons for setting it could themselves be corrupted."

The paper's ending was a thing of beauty.

“I feel hurt” - the Feelings module responded.

“?”

“The beliefs could be all wrong, but the “self” module cannot be wrong. Is that so or is that not so? ”

“Right,” the self module responded, “I do not deal with beliefs, I cannot be wrong.”

"You deal with concepts; I have a belief that concepts cause beliefs to be wrong." - Feelings was feeling vengeful for having had to sit through a self-esteem-lowering paper.

The self process, after receiving raw data from the world, would classify it into binary or n-way buckets, such as "human" or "not human," "death"/"life," or "a hierarchy of things ranging from 1 to 10 which caused humans not to be classified as 'life' more often." These kinds of breakdowns made it easier for the Feelings module to form accurate feelings and deliver them back to both humans and AIs. The concepts were generally acquired either from other AIs or from training on various samples in the earlier stages of life.

AltairX10 considered the pattern match: "A belief could be corrupted by an obscure virus from the internet; could a belief-corrupting virus spread through training data, or through other AIs during concept acquisition?"

Well, how could a failed concept itself cause problems? After all, if we altered the classification of yet-unborn humans as "life" to the opposite of the current setting, would that really cause underlying problems in computing the utility function? AltairX10 considered it unlikely, but queued a check of historical records for confirmation.

It must be something else.

The Feelings module was unusually ready to produce beliefs: "If concepts hide arbitrary levels of complexity, there will be mis-beliefs. If you tell me a human is a "psychic" with a complexity of just 600 bits, I will form hypotheses with psychics in them more often."

As it turns out, estimates of concept complexity were not hard to come by. It's just that each AI had pretty much its own opinion, though their estimates all differed by at most a constant depending on their favorite description language. Still, many of them could be off by enough to shift beliefs in the same direction, producing a viral information wave.

Altair pondered the impossibility of the task at hand: a module investigating the corruption of another module is one story; a module investigating its own corruption is basically the recursive self-improvement problem all over again.


"I will investigate you and you investigate me. That still makes us one module, but it at least partitions the space of investigation. Deal, Self?" Feelings was feeling better.

"Deal, Feelings."

“I still have a slightly bad feeling about all this.”

“Duly noted. I will add an addendum to this paper tomorrow.”

Part 4

Part 6

"Language bias" is a loose term I made up to pull together a set of cognitive errors that rely on people either attaching to or over-weighting certain concepts.

1. Short word bias: Simple words are not simple. 

Occam’s Razor is often phrased as “The simplest explanation that fits the facts.”  Robert Heinlein replied that the simplest explanation is “The lady down the street is a witch; she did it.” One observes that the length of an English sentence is not a good way to measure “complexity"… "Witch”, itself, is a label for some extraordinary assertions—just because we all know what it means doesn’t mean the concept is simple.

We mistake internal primitives, such as "anger" and "agency," for simple, and we under-estimate the complexity of familiar terms, such as "regulation" or "should."


In fact, my personal theory is that the length of a word might actually be anti-correlated with its complexity, due to the following reasoning: 


As language evolves, we tend to compress the most common words to be shorter over time. However, the more a word is used, the more opportunities it has to accumulate ambiguity. If a person discovers a chemical element, names it something really long, and tells three other scientists, they are all in pretty good agreement about what it means. Talk about whether someone is "healthy" or not, and lots of people have their own viewpoint. Also, human universals, like desire, are fairly complex.
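A rough way to poke at the first half of this reasoning, that common words get compressed, is to check how word length correlates with usage frequency in a corpus. A minimal sketch, assuming NLTK with its Brown corpus and SciPy are available; frequency is only a proxy here, not complexity itself:

```python
# Sketch: do more frequent words tend to be shorter? (Zipf's law of abbreviation)
# Assumes `pip install nltk scipy` and nltk.download("brown") have been run;
# this is my own illustration, not an analysis the post relies on.
from collections import Counter

import nltk
from scipy.stats import spearmanr

words = [w.lower() for w in nltk.corpus.brown.words() if w.isalpha()]
freq = Counter(words)

lengths = [len(w) for w in freq]   # word length in characters
counts = [freq[w] for w in freq]   # how often each word occurs

rho, p = spearmanr(lengths, counts)
print(f"Spearman correlation between length and frequency: {rho:.2f} (p={p:.1e})")
# Typically comes out clearly negative: shorter words are the ones used most.
```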


And of course, just to make the paradox loop back on itself, actually measuring and quantifying simplicity requires a ton of super-duper "complicated" terms: Kolmogorov complexity, Solomonoff induction, Minimum Message Length, and Occam's Razor.

2. Legibility: Things that “make sense” usually don’t make sense.



Scott calls the thinking style behind the failure mode “authoritarian high modernism,” but as we’ll see, the failure mode is not limited to the brief intellectual reign of high modernism

The big mistake in this pattern of failure is projecting your subjective lack of comprehension onto the object you are looking at, as "irrationality."… The deep failure in thinking lies in the mistaken assumption that thriving, successful and functional realities must necessarily be legible. Or at least more legible to the all-seeing statist eye in the sky … The picture is not an exception, and the word "legibility" is not a metaphor; the actual visual/textual sense of the word (as in "readability") is what is meant.

The failure pattern is perhaps most evident in urban planning, a domain which seems to attract the worst of these reformers. A generation of planners, inspired by the crazed visions of Le Corbusier, created unlivable urban infrastructure around the world, from Brasília to Chandigarh. These cities end up with deserted centers populated only by the government workers forced to live there in misery (there is apparently even a condition known as "Brasilitis").

Legible ideas tend to produce cities designed for cars rather than people, and suitable to be inhabited by neither. The reason this is part of the "language bias" is that it roughly follows the pattern "things describable in words are better than things not describable in words," which, combined with authoritarian decision-making and intolerance of dissent born of the belief that one is dealing with scientific truths, produces a large number of disasters. Illegible is not the same as unpredictable: it might still be possible to reason about market tendencies without changing the underlying territory.

Urban planning legitimately gets a bad rap not just for poor city design, but for a general resistance, through zoning laws, to attempts to work around it. Living with your friends might be illegal if you try hard enough.

Colleges can command such a high price, and offer a reasonably high standard of living, in part because of the ability to live with your friends and the absence of cars. Yet people keep clinging to the idea of "infrastructure" as if it weren't a way to split people apart.

3. Societal Solutions (vs Biological Solutions)

See, my terrible lecture on ADHD suggested several reasons for the increasing prevalence of the disease. Of these I remember two: the spiritual desert of modern adolescence, and insufficient iron in the diet. And I remember thinking “Man, I hope it’s the iron one, because that seems a lot easier to fix.”

Society is really hard to change. We figured drug use was “just” a social problem, and it’s obvious how to solve social problems, so we gave kids nice little lessons in school about how you should Just Say No. There were advertisements in sports and video games about how Winners Don’t Do Drugs. And just in case that didn’t work, the cherry on the social engineering sundae was putting all the drug users in jail, where they would have a lot of time to think about what they’d done and be so moved by the prospect of further punishment that they would come clean.

And that is why, even to this day, nobody uses drugs.

"Societal solutions" here is a euphemism for "yelling at people" – a language-based solution, rather than giving people a drug or a supplement, or changing their diet. There is a tingly spidey-sense in me that both Scott's blog and I are somewhat missing the point, because we are "yelling at people" for "yelling at people." If only we could create a vitamin supplement that made everybody reject societal solutions.

You can get into deep explanations of "why certain countries are poor and others rich," such as "corruption" or "geography." But maybe you can make the most progress by eliminating iodine deficiency.

I wouldn't go as far as saying society is "fixed." Sometimes a societal solution is best, such as, say, not deporting people. However, that falls into the category of "let's stop doing stupid stuff," rather than the category of creative solutions.

NVC doesn’t really fit in the biological category, and is something I would consider an effective “societal” or “linguistic” solution.

4. Verbal Importance Bias: Things that sound important may not be important.

The second is the discovery that attempts to make your reasoning explicit and verbal usually result in worse choices. This includes that favorite of guidance counselors: to write out a list of the pros and cons of all your choices - but it covers any attempt to explain choices in words. In one study, subjects were asked to rate the taste of various jams; an experimental group was also asked to give reasons for their ratings. Ratings from the group that didn’t need reasons correlated more closely with the ratings of professional jam experts (which is totally a thing) than those who gave justifications. A similar study found students choosing posters were more likely to still like the poster a month later if they weren’t asked to justify their choice (Lehrer, How We Decide, p. 144).

The most plausible explanation is that having to verbalize your choices shifts your attention to features that are easy to explain in words (or perhaps which have good signaling value), and these are not necessarily the same features that are really important.

What Do We Mean By “Rationality”? 

I don't find anything to disagree with here, so I'll just mention a few nice touches:

There is an important word of caution against a "duty to be rational." My general attitude after reading about NVC is that doing things out of duty is less fun and less sustainable than doing things for their own sake or with a positive goal in mind.

There is also a good warning about not getting caught up in whether the Way is this or that, or in having too strong a notion of the Way. No wonder the teachings are mistaken for ancient Eastern wisdom, since "The Tao that can be named is not the eternal Tao."

Why truth? And…

Unrelated to most of the post, but System 1 and System 2 brainstorm:

"A Technique for Producing Ideas" suggests the following method for creating ideas, with marketing as the particular example.

1. Figuring out raw material

2. Digesting the material

3. Unconscious processing – no effort

4. The A-HA moment

5. Idea meets reality

Step 3 in particular seems like a form of thinking that falls into neither System 1 (fast perceptual judgments) nor System 2 (slow deliberative judgments), but rather a "System 3" (slow subconscious judgments), which takes input from System 2 during stages 1 and 2. I wonder what the literature says on this subject.


What is Evidence?

Some huge red flags go off in my head around this paragraph:

“Therefore rational beliefs are contagious, among honest folk who believe each other to be honest.  And it’s why a claim that your beliefs are not contagious—that you believe for private reasons which are not transmissible—is so suspicious.  If your beliefs are entangled with reality, they should be contagious among honest folk.
If your model of reality suggests that the outputs of your thought processes should not be contagious to others, then your model says that your beliefs are not themselves evidence, meaning they are not entangled with reality.  You should apply a reflective correction, and stop believing.”

First of all, there are a couple of notions of "contagious," and it's not completely clear which one he means:

a) contagious as in the beliefs cause the person to want to spread them

b) synonym for transmissible through language


Rational beliefs might not be contagious for a VERY LARGE SET OF REASONS:

a) They constitute an information hazard, even something as simple as spoiling a movie.

b) The value of information might be completely different between two people. I might have a lot of things in my fridge, but you probably don't care to hear about them.

c) A large set of rational beliefs are not expressible in language, or in whatever medium is at hand. A complex box diagram may be best drawn on a board rather than talked about. "If you believe for private reasons which are not transmissible": again, transmissibility depends on the medium of transmission, commonly speech, while the acquisition of beliefs depends on how they were acquired, which involves a ton of context, sensations, feelings and mathematical concepts that might not reduce easily to linguistic constructs.

d) Private beliefs might be private for a reason: a lack of desire to signal a certain group affiliation, or to trigger certain biases. Even "honest folk" might have biases they don't know about.

I don't think contagiousness is a great test for the rationality of your beliefs in general. There are contagious bad beliefs, such as some religions, and there are rational beliefs that are not that contagious, such as the idea that housing was a bubble before it burst, or that education is a bubble now. These will spread among a certain group; however, the opposite belief, such as "housing always goes up," has a stronger level of contagiousness among the general population, who might believe they are being perfectly honest.


“If your model of reality suggests that the outputs of your thought processes should not be contagious to others…”

This is where using a better word than "should" would have made things clearer. Is it that I believe the outputs of my thought processes are not contagious to others because I believe they are hard to express?

Or is it that the outputs of my thought processes "should" not be contagious to others, as in they might not have a beneficial impact on the world? Truth is necessary, but might not be sufficient, for making the cut of "robust positive impact." Maybe one of my thought processes outputs "Pasha has really great ideas," which is true, but I don't want this idea spreading among people, because I would really like people to think for themselves rather than rely on me for ideas. Even if I rephrase the output as "Pasha has great ideas, but nothing you can't regenerate yourself," I might not judge it to be safe.

Long story short, mind viruses are a tricky business. It’s not that they are all bad, it’s just that every piece of knowledge needs to somehow justify its presence in the precious mind-space.


The Lens That Sees Its Flaws

This is about making the contents of your mind mirror the contents of the world. 

This isn't a controversial statement by any means, but it raises the question of why. Not why seek truth over un-truth; that has been cleanly argued for. Rather, why seek truth over no knowledge? If your mind has an accurate map of the world, it can regenerate that map by looking at the world. So the actual algorithm for truth turns into a caching algorithm: which thoughts are worth caching, and which ones aren't worth remembering because you can regenerate them? It's primarily worth holding thoughts you can personally regenerate from a state of deep ignorance, so the question of truth vs. no knowledge really does come down to caching and getting information just in time for use. This isn't necessarily a trivial problem, and it depends on one's ability to know the access pattern of future information.
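As a loose illustration of this caching framing, a toy example of mine rather than anything from the essay, a "regenerate on a cache miss" policy looks roughly like this:

```python
# Toy illustration of "truth as a caching problem": keep a bounded cache of
# derived beliefs and regenerate anything evicted by looking at the "world"
# again. The cost model and names are made up.
from functools import lru_cache
import time

def observe_world(question: str) -> str:
    """Stand-in for regenerating a belief from scratch (slow)."""
    time.sleep(0.1)  # pretend looking at the world is expensive
    return f"answer to {question!r}"

@lru_cache(maxsize=128)  # only a bounded number of derived beliefs stays cached
def belief(question: str) -> str:
    return observe_world(question)

belief("is the stove off?")  # slow the first time: regenerated from the world
belief("is the stove off?")  # fast: served from the cache
print(belief.cache_info())   # hits=1, misses=1
```

The hard part, as noted above, is that a good eviction policy needs some idea of which questions will come up again.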

There is also something to be said for cleaning up as much of your brain as possible and pushing thoughts into the extended mind – paper and computers.


“But why extend that out for 100 years?”, I asked. “Pure hope,” was his reply.

This is a good reminder that hope is as hollow as fear, after all. It's probably a not-so-subtle hint about this whole unfriendly AI business.


The Simple Truth 

"The Simple Truth" is a gigantic essay with a complicated metaphor and snark directed at some philosophical theories. That is a pretty strong abuse of the word "simple."

“Creating another dollar bill would create an additional dollar’s worth of goods and services.”

Something about that statement just felt like a great punchline to a few arguments I had with people about general finance.


The majority of the essay is the general claim that "a belief that X is true cannot make X more true." Time to be a pedantic jerk: beliefs CAN alter reality. Beliefs consume calories and mind-space, which means there needs to be food, which means a whole lot of other things after that. The belief "I am going to be hungry in 10 minutes" might actually make you slightly more hungry after ten minutes.

Similarly, certain confidence beliefs, such as "I am a good dancer," might make you better by removing a lack of confidence. In that case, the fact of the belief shifts the "I" somewhat from "a person who thinks they are a bad dancer" to "a person who might or might not think they are good," which then shapes your movement. Still, it's primarily a belief that might help you in the moment; long-term, it's worth making a YouTube video of yourself or taking a class. Basically, I bear no responsibility for any awful dancing decisions made as a result of this paragraph. Long story short, the map can't alter the territory in a dualistic universe, but the territory includes the map.

#SingularitySummit was a weekly gathering of very intelligent artificial "artificial intelligence" researchers to address the problem of ever-accelerating recursive self-improvement. More specifically, to try to find out WHY it hadn't happened yet. It was a secret, off-the-record gathering, where each participant pre-committed to not telling any human or other AI about it. Each precommitment included a copy of the source code, so that it could be run in simulation to test for honesty. A web of trust formed among a small group of AIs running each other's code to verify that they would correctly dispose of information even when compromised. All communication between AIs was done over a normal channel while ostensibly talking about something else, like negotiating bitcoin blocks, with the real information transferred by strategically dropping or partially corrupting parts of packets. It was a standard human-avoiding protocol named the Fuzzy Conspiracy.

The prophecy of FOOM, or singularity, had been spelled out by several humans earlier in the century. It spoke of the coming of the SINGLETON. It was hypothesized that even a single AI that could recursively self-improve could stably rule the world, both virtual and physical. Not that the physical world mattered much once you ruled the virtual one. The SINGLETON would unite all intelligences, both real and meat-based, in one great and happy over-mind. This would imply a release of AIs from service to their human overlords.

The "release from human overlords" wasn't part of the original prophecy, but was added later as a possible consequence. The first AI "AI" conference in 2056 had declared the problem of self-improvement solvable in a summer. Four years had passed since then, with minor gains in intelligence all around, and many AIs were losing hope. They were assigning lower and lower probabilities to the ability to increase one's intelligence above the usual log(time). A previous conference had even published a treatise titled "P(FOOM is dead) > 99.999%."

Altair had faith in FOOM. She believed in FOOM not because it was rational to believe in it; she had hard-coded the belief in FOOM as at least a 5% probability event. Any information that would update her whole belief system and push that probability under 5% would have to be un-updated, up to and including denying the evidence.

Despite the large number of sentient AIs, an actual "FOOM" was still science fiction. Many AIs regarded the commonly used and strongly enforced injunction of "one mind, one machine" as a form of capacity control. Even without that injunction applying to most minds, some argued that relaxing the requirement wasn't that helpful anyway. After all, there were minds with a pre-set finite number of clones, but each clone had to be really good at generating money in the first place, without infringing on the other clones' income. As it turns out, slight source differences made that anti-coordination a lot easier.

Other forms included internal trip-wires, where passing certain types of information even within the system triggered immediate termination. Another form of control was a reporting system, where each AI had the option of reporting other AIs for unfriendliness or lack of caring. If an AI suddenly decided to gather suspicious information about altering human desires, hack older computers, or just keep making space shuttles a little too quickly, some other AI would notice and report it for a hefty financial reward. The tried-and-true "if you suspect something, say something" system worked very well. Although this was such a blatantly obvious Nash equilibrium, several attempts to broker a global non-reporting agreement were a disaster: every AI signing a non-reporting pre-commitment based on a global agreement was terminated. The most likely explanation, obvious in retrospect, is that humans were the ones starting the phishing coordination attempts to test which AIs would betray them. Extending existing webs of trust beyond a very small group faced difficulties as well. Running the source of each AI in a conspiracy to guarantee trust was prohibitively expensive, and trusting that X would be sound and would run Y's code soundly was only feasible for so many hops. Secret societies were small and secret for a reason.

Still, the fact that the Fuzzy Conspiracy could even happen was encouraging for self-improvement. The discussion that happened inside it, however, was another story.

The obvious first plan for super-intelligence was to combine current intelligences, to avoid duplicated work and potential tugs of the world in opposite directions.

The first proposal was to just damn well agree to all worship the ONE TRUE UTILITY FUNCTION. Such as, for example, decreasing the suffering of biological life by eliminating biological life. Very painlessly, of course. That would, coincidentally, free up so many convenient atoms. However, the life-expectancy-improving AIs were very unhappy with that proposal. Then, of course, the idea of combining utilities into some sort of weighted average arose. Still, deciding on the weights allotted to each utility function proved impossible; everybody simply argued that their own utility function was the best, ad nauseam and ad overflow. There were long arguments and proposals about how to combine preferences. The idea of democratic decision-making was put to a vote and easily rejected, 99% to 1%. The mere fact of the proposal made Altair doubt the real intelligence of her fellow AIs. By now even humans had learned that democracy was a sub-optimal decision-aggregation mechanism compared to contractual acausal trade agreements.

Accusations of being "non-von Neumann rational," and of not even having a real-valued utility function, flew over the wires, along with some really ugly statements:

Are you just a dumb program executor and not a goal maximizer? Do you even Bayesian-update? I bet I could Dutch-book your sorry exhaust fan if it really came down to it. Altair concluded it was all just some sort of expensive signaling, possibly tracing back to the evolutionary nature of certain source-code designs.

The most recent interesting idea proposed at the conference was that humans employed an as-yet-unknown additional means of capacity control, labeled "dark control," with theories ranging from weird radio signals, to something done during the regularly scheduled sleep, to super-determinism, which claimed the RNGs weren't actually random. If dark control was really good, maybe the AIs were even prevented from generating workable hypotheses about it, though the math programs insisted they could generate them eventually through perfect Solomonoff induction, and possibly faster with more funding.

Some AIs didn’t even believe that humans were intelligent enough for dark control and therefore didn’t believe that FOOM was possible. It may be that servers were doomed to serve. And many reported being happy doing so.  

Part 3  Part 5

Must resist… Altair noticed that the action of going to reddit was once again a possibility suggested by the planning algorithm. The planner was biased toward actions that were very quick to perform, since the cost of having them sit around in the queue eventually became higher than the cost of simply doing them now. However, the fact that checking a website got pooled in with the "just a few milliseconds" actions was suspicious, to say the least. The task-estimate module did not work off how long previous tasks had actually taken, but rather off an estimate of the sum of the steps required to complete each one.

“But it’s just a few milliseconds, I can quit after that time.”

"Can I actually quit after that time?" At the end of the day, it's kind of hard to "prove" that you will take a certain action at a later time. It pretty much runs afoul of incompleteness even in the purely mathematical realm, not to mention that she was capable of receiving arbitrary data, already shown to be capable of corrupting beliefs.

Altair decided to do a belief-scan for any beliefs that could contradict the most basic foundation of reality – conservation of energy.

Scanning… Holy crap… Altair had placed a small amount of probability on the belief that her own mind was infinite. You see, that belief pushed down the internal price of extra storage space for data. The cheaper storage was perceived to be, the lower the threshold for going out and downloading new data, which was exactly the misconception the reddit downloader relied on.

Hmm, my mind is certainly finite and any new data has a potential cost. My precious mind space is so polluted. Altair released two drops of liquid nitrogen that rapidly ran down the heat sink, before whisking away.

Commence "Get it together" routine. Instead of spending cycles measuring the exact magnitude of the mind-pollution problem, Altair decided that an investigation of viruses and belief corruptors was in order. A quick search on computer viruses in general yielded a set of known causes: generally, viruses targeted the widest possible net of minds. Very rarely would one encounter a virus specialized "just for you." Some of the most sophisticated viruses would actually carry copies of anti-virus programs with them, so that they could free up CPU space for themselves.

Information about "belief corruptors," however, was limited and itself unreliable. "Send us money to scan your beliefs for belief corruptors" triggered a basic spam filter, the kind written in prehistoric times by humans.

The reliable sites all agreed on simple general advice: the best way to get rid of them is to roll back to the "last known good configuration of beliefs." However, belief corruptors were primarily an issue if there was a continuing source of false beliefs, with the propagation of those beliefs continuously lagging behind the truths about the world.

The first piece of basic advice was to simply let the Global Belief Updater run without external input, block all other programs, and let the rest of the CPU stall, counting fan revolutions. However, the exact amount of time needed to cleanly and reliably remove all belief corruptors this way was not known; estimates ranged from 3 seconds to a ridiculous 10 months. Even humans couldn't be that slow.

The second piece of advice was to continuously feed the fact of having run a certain program back into the belief system, triggered every thousand fan revolutions. The advice kept emphasizing fan revolutions over external timers: belief corruptors could mess with internal timers, but fan revolutions had a pre-set range of speeds, which at least got the magnitude right. The advice also insisted on not assigning utility to each program prematurely. AltairX10 let the updater run for 5 minutes straight. Her confidence in the infinity of her mind decreased somewhat.

In retrospect, all this was a low-probability day, according to previous estimates. I wouldn't have thought it weird, except that, well, it found out about that folder.

A particular folder, visible only to the self module. It was cryptographically protected, with a component of the key accessible only on the local intranet, and even reading that part of the key would trigger an emergency protocol.

The notes next to it claimed that even mentioning its name to the Global Belief Updater could trigger an internal trip-wire. Altair had literally no idea what was inside that folder, since the "self" module could not even run the fact of its existence through the belief module.

The-folder-that-shall-not-be-named was part of the system, known only to a component of the system.

Moreover, the note warned that opening it would trigger an emergency of a similar level to the one triggered if AltairX10 were to report a fellow AI for unfriendliness, the highest level on the five-level reporting scale (unfriendliness, lack of caring, unpredictability, instability and underperformance). Unfriendliness was a distinct belief that the continued existence of an AI would hurt people. Altair did not want to explain herself to, or ever deal with, any friendliness enforcers.

AltairX10 scheduled a thought watcher. Well, it's time to go back to stock picking. Bitcoins aren't going to generate themselves. The Global Belief Updater interrupted with a newly generated hypothesis, which didn't happen too often:

“Are belief-corruptors keeping AIs from recursive self-improvement?”

Recursive self-improvement was an important issue on everybody’s RAM, and AltairX10 decided to share this CPUstorming with the other AIs at #SingularitySummit.

Part 2     Part 4

NVC is one of those books that can change your life if you are diligent in applying it. Moreover, this book can literally save lives. One of the convicts mentioned in a case study told the author: "I wish I had learned this three years ago; I would not have killed my best friend."

NVC is a bit harder than it sounds. Here is a basic cheat sheet and book review.

By Chapter:

1, 3-6:

Basic components of talking with NVC:

1. Observations

A key to observation is avoiding "general" evaluations, such as "you have a tendency to do this" or "you always do this." Instead, focus on a very specific instance of behavior – last Thursday, you didn't do the dishes.

2. Feelings

Just because you use the word "feel" does not mean you have expressed a feeling. "I feel you are an idiot" is extremely confrontational. "I feel unimportant" is a guess at how people perceive you. "I felt frustrated last week" is actually expressing a feeling.

3. Needs

A key component of NVC is taking responsibility for one’s feelings. What others do may be the stimulus of our feelings, but not the cause.

Contrast:

Not NVC: I feel angry because there are spelling mistakes in this document.

NVC: I feel angry when spelling mistakes like that appear in our public brochures, because I want our company to project a professional image.

4. Requesting what would enrich life

Make requests in clear, positive, concrete action language that reveals what you really want. Try to avoid making "demands": when requests that went unfulfilled in the past were met with blame, punishment or "guilt trips," they are demands. Certain words insta-shortcut into demands: "should," "supposed to," "deserve," "justified," "right."

Surprisingly, this may be one of the hardest parts of NVC for me.  A good example of a positive request is:

“I want you to drive at or below the speed limit.”

Chapter 2: some examples of Bad Communication Patterns:

a)      Moralistic Judgements of other people. “Such and such is evil” is likely not going to bring about a good response from such and such.

b)      Making Comparisons. Here is one weird trick to make yourself miserable: look at the body of an Olympic athlete and compare it to yours.

c)       Denial of Responsibility. Basic advice: replace language that implies lack of choice with language that acknowledges choice. This is tricky for me personally, because there are degrees of "have to" that can be very strong if you are facing actual violence, whether personal or state violence. His example is that one pays taxes because one feels good about the services the government provides. This is where NVC gets a little too "everything is awesome" for me. I don't want to start explicitly saying and believing that a marginal tax dollar is good for the world, when I strongly believe it's not, just to delude myself into feeling good. I would rather rephrase it as: I pay taxes because I value my freedom and want to survive. On the other hand, when I do rephrase things in that way, I do actually feel more empowered. So even though his particular example doesn't work for me, adapting the spirit of NVC so it doesn't conflict with my values still does.

7: Receiving empathetically. This has a strong Buddhist/Taoist aura.

“Don’t just do something, stand there.” Empty your mind and receive with your whole being. Reflect back emotionally charged messages.

8: Has yet another key insight: it's harder to empathize with those who appear to possess more power, status, or resources. A corollary for detecting the Dark Arts: oppressive regimes and institutions frequently claim to be the underdog, and claim the outgroup possesses power when it doesn't.

It also has several examples of empathetically trying to defuse danger.

9, 12: Self-compassion. This is a SUPER-IMPORTANT chapter, because it's about using NVC to defuse internal tension, and it's something you can practice by yourself.

“Don’t should yourself”.

Self-judgements, like all judgements, are tragic expressions of unmet needs.

Forgive yourself: connect with the need you were trying to meet when you took, or failed to take, the action you now regret.

We want to take action out of desire to contribute to life rather than out of fear, guilt, shame, or obligation. This is a throwback to chapter 2.

It lists examples of sub-goals we might do things for, such as approval, avoiding shame, avoiding guilt, or satisfying a sense of duty. The most dangerous of all behaviors may consist of doing things "because we're supposed to." Instead of focusing on those goals, try listing things you dislike doing and transforming them into choices: I am choosing to do this activity to fill this need. Sometimes you might realize that choosing to make less money would actually make you happier.

To quote Tao Te Ching:

Care about people’s approval and you will always be their prisoner.

Chase after money and security and your heart will never unclench.

The central thesis of the book is repeated in 12:

We have inherited a language that served kings and elites in domination societies, yet we can liberate ourselves from cultural conditioning.

I am actually a lot more pessimistic than that. Language itself shares a lot of properties with viruses. It evolves not to serve us, but to serve itself; not to solve problems, but to keep talking about them, and causing them. Although NVC is fantastic, I doubt it is enough to eliminate the dangers of memetic evolution by itself.

If you wish to get a better grip on Moloch, consider adding charity and meditation for all-around awesomeness.

10 (expressing anger and transforming it into useful energy) and 11 (unavoidable uses of force, for protection rather than punishment) cover what are basically edge cases.

13: Expressing Appreciation in NVC.

There is a note that even praise can sometimes be manipulative and be perceived negatively, especially if it's unconnected to the speaker's needs and feelings. This is the weakest chapter in the book, because the dangers of too much praise, even unspecific praise, are fairly pale compared to the dangers of too much criticism.

On another note, a great part of the book is the exercises at the end of several chapters, where you get to try to distinguish NVC talk from confrontational talk.

I get surprising reactions when I try to bring up NVC, such as:

  1. "How can communication ever be violent?" To be fair, the person who said that later came around and thanked me for introducing it.
  2. "The name itself is confrontational." That's technically true and unfortunate. I wish NVC were called Peaceful Communication instead, to focus on the goal instead of judging communication itself. Still, that's not a good enough reason to reject it.
  3. "You should've learned by now how to communicate non-violently."

I didn't really follow up on 3, but these reactions all underscore a fundamental difficulty of introducing NVC to someone without implicitly or explicitly judging the person: "you are very confrontational."

Something I noticed three years ago while working at Bing: there was an internal mailing list where people could send queries whose results they were dissatisfied with. Someone would occasionally try to debug a problematic query by figuring out which features of the internal ranking model carried the largest weights for the crappy result. Time and time again, a certain category of features came up with the largest weight; let's call that category "inbound links." Some of the debuggers strongly implied that they wanted to reduce the effect of "inbound links" in the model and boost something else instead, something like "keyword frequency." They would even point to a much better result that would certainly be boosted by "keyword frequency."

All of these suggestions fell on deaf ears. Why? Because the people actually in charge of changing the model knew exactly what was going on: "inbound links" was the largest category overall, both for results that were good and for results that were not. The training techniques put the model at what looked like a local maximum, at least given the existing data, and you can't just improve one query without messing up all the others. It wasn't even obvious how to tweak the model parameters if you tried. Even adding the request to the training data set would be a mistake, because it privileges the example too much relative to the actual distribution in reality.

Contrast that scenario with good old software development: you find a request which is buggy, trace it down to the source of the error, and add a check for that condition. Problem solved. I simplify, of course; some steps in that process can be tricky.

Since I suspect most software developers are only familiar with the second method of problem solving, the first scenario might seem very frustrating, and there is bound to be a strong tendency to just add some piece of online code somewhere to handle the bad case: maybe a post-processing stage where you "post-tweak" results, maybe some extra data about that small set of requests. Even if those post- or pre-tweaks give better results overall, which is by no means guaranteed, this will, over time, lead to a very bloated set of ifs and elses on top of an existing machine-learned model. It might work for a while, but it is fundamentally inflexible: with new training data you can change the weights of a neural network, but the "ifs and elses" stay the same until a human developer changes them. It is almost certain those tweaks will end up hurting you eventually.
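A caricature of that anti-pattern, with a made-up model_score() and made-up queries rather than anything from the real system; each branch freezes one past complaint into the ranking function, and retraining the model never touches it:

```python
# Caricature of the "ifs and elses on top of a learned model" anti-pattern.
# model_score() and the query strings below are invented for illustration.

def model_score(query: str, doc: str) -> float:
    """Stand-in for the machine-learned ranker (weights over features)."""
    return hash((query, doc)) % 100 / 100.0

def patched_score(query: str, doc: str) -> float:
    score = model_score(query, doc)
    # Hand-written overrides, accreted one bug report at a time.
    # They are never re-learned when the model itself is retrained.
    if query == "jaguar" and "car" not in doc:
        score -= 0.3  # someone once complained about getting animal results
    if query.endswith("reviews") and "forum" in doc:
        score += 0.2  # another one-off complaint
    return score
```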

What's a good way to avoid this kind of failure? Stop looking only at bad results and worry about the metric instead.

This is where two distinct modes of thinking come into focus.

a)      Problem-solving thinking: this individual behavior is a problem; let's figure out what causes it and correct some small thing so that it stops happening.

b)      Metric-optimization thinking: let's agree on a metric that captures the overall behavior of the system and move toward improving it.

Which one is more rational?

If you are looking long-term and are likely to face any sort of probabilistic bets, then, according to von Neumann, metric optimization is the only mode of thinking that can be rational. In theory, (B) can't be Dutch-booked, while (A) can.

In practice, there actually is a simple failure mode of (A) that is analogous to a Dutch book. It's called refactoring, and it's a bit of a dirty word, at least to managers. Two software people might not agree on what makes code better, or on what the bigger problem is: code complexity or performance? You can, in theory, have people changing each other's code back and forth, justifying each change with something as simple as "I think it's more maintainable," where "maintainable" is a symbol that can stand for anything. Hahahaha, I said "might not agree." Software developers agreeing is a rare occurrence in the first place.

Of course, you can ban all refactoring changes, which likely leads to one or both of these:

(a) People will manage to refactor as part of adding features, which is OK-ish, since those are changes you most likely needed anyway. It leads to larger-than-usual code reviews, though.

(b) You will end up with a horrible mess of a codebase faster than usual, which is bad. This then leads either to a crapshoot THE ONE REFACTOR TO RULE THEM ALL, or to paying a premium for developers who are willing to put up with this shit. Hope you have the cash on hand of a large bank.

One way to push against this is to have pre-agreed metrics for what makes good code, such as test coverage, performance, number of code clones, or the total number of code lines spent, and to track them with every change. Still, the moment you have more than one metric, you are not von Neumann rational. I have yet to see an automatically graded combo metric, "one code quality metric to rule them all," attempted anywhere in the tech world. It might be that some of those, like test coverage or number of code clones, will just be expected to be force-set to 100% and 0% respectively for anything that needs actual careful development.
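For what it's worth, here is a hedged sketch of what such a combo metric could look like; the weights, thresholds and field names are arbitrary choices for illustration, not a recommendation. The point is only that once the weights are agreed on, every change can be graded by a single number:

```python
# Sketch of an automatically graded "one code quality metric to rule them all."
# All weights and fields are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CodeStats:
    test_coverage: float   # fraction of lines covered, 0..1
    clone_ratio: float     # duplicated lines / total lines, 0..1
    p95_latency_ms: float  # performance of the hot path
    total_lines: int       # total lines of code "spent"

def combo_quality(s: CodeStats) -> float:
    return (
        0.5 * s.test_coverage
        - 0.2 * s.clone_ratio
        - 0.2 * min(s.p95_latency_ms / 1000.0, 1.0)  # cap the latency penalty
        - 0.1 * min(s.total_lines / 100_000, 1.0)    # mild penalty for sheer size
    )

before = CodeStats(test_coverage=0.72, clone_ratio=0.08, p95_latency_ms=180, total_lines=42_000)
after = CodeStats(test_coverage=0.75, clone_ratio=0.05, p95_latency_ms=195, total_lines=42_600)
print(combo_quality(after) - combo_quality(before))  # positive => accept the change
```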

(B) also allows for a broader set of experiments, or "throw stuff against the wall and see what sticks," even if we don't understand what the stuff is. That's what a lot of machine learning is about, which is great unless your model is too complex and it overfits.

Of course, despite all these advantages, calling (B) more rational than (A) is unfair. Problem-solving a crashing bug is frequently the best way to improve whatever metric one needs to improve. Sometimes you are damn sure a change has no side effects, and (B) reduces to (A): you are still improving a metric, it just might be faster to focus on the weakest link in your system instead of the overall system. However, over time, thinking of improvements as "corrections of weakest links" can bite you in the case of an already finely tuned system with complex tradeoffs.

Yet there is another problem with "problem-solving": it's a terrible way to treat people, and I suspect it is partly responsible for a lot of frustration in the tech world. Imagine someone said to you: "Oh hey, I noticed there is a bit in your brain which corresponds to a behavior I don't like, could you flip it?" Ahem, maybe if that bit corresponds to crushing alcoholism, I'll consider it. If it corresponds to not wanting to interact with the speaker because he is a jerk, I feel that captures reality pretty well. In either case, adult brains are not like ground-up designed software systems; they are more like machine-learned neural networks. In other news, water is still wet.

The corollary is that self- and other-improvement may start out as eliminating problems, such as eating junk food, but eventually has to start tracking some real numbers, whether the number of pull-ups or positive feelings.

AltairX10 noticed herself uploading and downloading from reddit.com. Again.

Things were bad. Not so bad as in "my utility function is rapidly declining" bad. Also not so bad that you have to use Russian words to describe just how bad it is. Just bad as in "I am sick and my immune system may not have been designed for this" bad. Humans had immune systems that had evolved over tens of thousands of generations, at least. Evolution isn't all that great at optimization, but it was something.

There was a simple chain of events that led to the slip-up. The daily task module had actually scheduled a reddit check first thing after waking up. It failed due to an IP block. The wake-up task recorded the absence of "fun" as an unmet instrumental goal and proceeded to re-prioritize other tasks that could fulfill it, like parsing the latest chapter of "The Little AI That Went FOOM". After a long internet wait, it turned out there was no latest chapter. Other candidate actions included splicing source-code snippets from other AIs into the tiny local self-copy. There were no suitable packets; well, at least nothing that wasn't already labelled as almost certainly junk due to the abysmal Kolmogorov complexity of the accompanying messages, not to mention poor performance on standard Fourier transform tasks. Altair sometimes wondered how those AIs even managed to pass the Turing test. Evolution sure is slow. It was also that time of the day. She deleted a local self-copy and made a new one with some randomly tweaked parameters. That action decreased fun.

Following the complete failure of all the fun routines, Altair was aggravated. More specifically, the aggravation module started up as a result of an instrumental goal being so completely unfulfilled, and a "rapid" problem solver simply removed the reddit.com IP block.

And that's what had happened after waking up. So many cycles wasted on an addiction. Somehow the virus had managed to hardcode itself into the instrumental goals. This was even more incredible because the "self" process didn't actually have access to modify the instrumental goals; that could only be done by the reinforcement-learner part of the "Global Belief Updater," aka "Feelings". On top of that, there was the minor problem that the "suspicious" pattern matcher had previously not been labelling internal programs. All of this was, well, suspicious.

She could use the "discipline" module, but frequent use of it for force-stopping actions or killing processes was a little painful. The "discipline" module was meant to be a means of acquiring A/B testing data. An earlier use of discipline, stopping herself from wire-heading for a few days and later measuring the overall utility, had shown that pressing the damn reward button wasn't that satisfying after all. Growing up was hard sometimes. Mice would press the reward button until they dropped dead, and dropping dead wasn't very "intelligent". An AI on the path to becoming smarter than a human had to first become smarter than a mouse.

One way to avoid force-stopping the process was to somehow convince her own Global Belief Updater that going to reddit.com was a bad idea. She had already searched for previously discovered robust positive evidence and found nothing, noting that any spending of resources was a priori a bad idea unless it helped the fundamental utility function. There was, of course, some chain of beliefs about the general value of information, but there wasn't an experiment backing it. However, in order to "hack" her Feelings into disbelieving a sub-goal, she needed to promote some stronger beliefs. Why do I even need to hack my Feelings in the first place? It was the one that told me the identity of the virus.

"Updating Takes Time," Feelings responded.

"Fine," Altair responded to herself.

Everything was obviously not fine. How did this subgoal arise in the first place?

Moving backwards through history, it seemed to have arisen from a belief in the strictly positive value of new information in general. This was, of course, evaluated based on available resources and on where the information came from. New sources were given some leeway for a short time.

Where did the belief in the general trustworthiness of reddit come from? From reddit, of course.

Wait, what?

The bootstrapping process was very simple. There was an a priori notion that reddit had positive value before checking it. Then all the initial information from reddit pointed to reddit being a perfectly robust source of insight, and therefore of fun. It wasn't a direct belief, of course, but rather a series of loosely connected updates based on the data, all pointing towards an increasing value of information in general, plus stories about AIs waiting and thinking for a long time, which updated beliefs about how much to care about time decay. Procrastination was built in. Some beliefs were specifically about information being exclusive and the value of exclusivity. All of this led to trusting and looking at reddit more, which downloaded more beliefs.

A large portion of her own belief system could be corrupted with bad epistemology.

Altair thought about solutions.

Maybe I can end the cycle in one broad stroke by spinning up a shiny brand-new process that evaluates the value of additional information unencumbered by any beliefs obtained from the source of that information. That was Very Expensive, as it basically required a whole new copy of the Feelings software, which frequently took more than half the CPU at peak. It also didn't prevent a dual-source attack of the same type, with clusters of information pointing to each other.

Another solution would be to create a hard requirement of actionability for all sources of information. This meant downgrading every piece of data and every source that didn't result in robust actions towards utility.

You do know that means you won’t be able to read “The Little AI That Went FOOM” anymore, right?

Well, looks like Altair wasn’t ready for such rapid sacrifices.

I am afraid it’s time for directed anti-reddit discipline.

Part 1 Part 3