[idea] Al Conan
Conan O'Brien, hate him or not, is
- Very rando quite often.
- Is some percentage "just a guy who makes silly faces".
- Has as his goal "making you laugh", which AI has been able to do very well also, literally from the very beginning.
- Would pick up on, somehow, the great appeal of the idea of being the first person in history to have an Evil AI Doppelgänger, certainly as a Harvard man.
and also has
- Thousands and thousands and thousands of hours of training data, along with possibly helpful contextual "metadata" ("Is he speaking to a crowd now or to an individual?", "Is he the interviewer now or the interviewee?"...) and possibly/probably transcripts of all of this. It may be possible, for example, to match a certain facial expression with a point in a sentence, or to treat it as a reaction to a word.
Alll Kiinds of stuff for an AI to chew on.
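To make the expression-to-word matching a bit more concrete, here is a minimal sketch. It assumes two things the notes don't actually have yet: a transcript with per-word timestamps, and some detector that emits timestamped expression labels. The names `Word`, `Expression`, `word_at` and `align` are made up for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the clip
    end: float


@dataclass
class Expression:
    label: str    # e.g. "eyebrow_raise"; labels are hypothetical
    time: float   # seconds from the start of the clip


def word_at(words: List[Word], t: float) -> Optional[Word]:
    """Return the word being spoken at time t, or None if t falls in a pause.

    Assumes `words` is sorted by start time and the words don't overlap.
    """
    starts = [w.start for w in words]
    i = bisect_right(starts, t) - 1  # last word that started at or before t
    if i >= 0 and t <= words[i].end:
        return words[i]
    return None


def align(words: List[Word], expressions: List[Expression]) -> List[Tuple[str, Optional[Word]]]:
    """Pair each expression with the word (if any) spoken at that moment."""
    return [(e.label, word_at(words, e.time)) for e in expressions]
```

An expression that lands in a pause comes back paired with `None`, which might itself be useful: it could be a reaction to the word just before it rather than part of the sentence.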
The pieces:
- Video: We don't expect to create a physical Conan bot, but that's actually in our favor here. It would be unnerving for Joe Average to meet Conan in person, but a "scene" could be scripted where a known fan is "Facetimed" out of the blue, as part of a bit. "Yes, this is really Conan." And this is probably the easiest part.
- Audio: We probably should consider "A" and "V" to really be one thing: "AV" would be a good name. Trying to artificially synchronize generated audio and generated video that are output by different systems seems impossible. Let's just focus on: train Al on a bunch of AV, try and see what you can generate. If it's wacky and off-kilter, then so much the Conan.
- Dialog: It's _Di_alog, remember. We're creating a Turing bot here. Did I not mention? This is hard, but think of this strategy: Conan asks you a question (GPT3), you respond while he nods, and then (easier to detect) the system makes Conan go off on a diatribe that at least starts out as a reaction to your response (GPT3). He may butt in while you're responding, but the system will have to take some risks there: it would have to be a plausible interruption, not a random one.
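The dialog strategy above can be sketched as a single exchange: ask, listen, maybe butt in, then riff. This is only a sketch under loud assumptions. `generate` stands in for whatever text model is used (GPT3 in these notes) and is just a callable from prompt to text, `plausibility` is a hypothetical scorer for "would an interruption make sense right now?", and the prompts and the 0.9 threshold are placeholders.

```python
from typing import Callable, Iterable, List, Tuple

Turn = Tuple[str, str]  # (speaker, text)


def run_exchange(
    generate: Callable[[str], str],        # hypothetical text model: prompt -> text
    guest_chunks: Iterable[str],           # the guest's answer, arriving piece by piece
    plausibility: Callable[[str], float],  # hypothetical: how plausible is butting in now?
    threshold: float = 0.9,                # placeholder value, not tuned
) -> List[Turn]:
    turns: List[Turn] = []

    # Step 1: Conan asks a question.
    question = generate("Ask the guest one question.")
    turns.append(("conan", question))

    # Step 2: the guest responds while Conan nods.
    heard: List[str] = []
    for chunk in guest_chunks:
        heard.append(chunk)
        # Only butt in when the interruption looks plausible, never at random.
        if plausibility(" ".join(heard)) >= threshold:
            break
    answer = " ".join(heard)
    turns.append(("guest", answer))

    # Step 3: a diatribe that at least starts out as a reaction to the answer.
    rant = generate(f"Question: {question}\nAnswer: {answer}\nReact, then riff.")
    turns.append(("conan", rant))
    return turns
```

If the guest finishes without the score ever crossing the threshold, the loop simply runs out and Conan reacts to the full answer, which is the easy-to-detect case.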
Open problems:
- How do you teach Al the difference between "a mannerism" and "crow's feet"? Specifically, the training data is all of the same person, but inevitably that person has aged over the decades. So the AI needs to know which differences to attribute to age and which to expressions or mannerisms. It somehow needs to know "this is the same person, but at a different age, depending on sample datetime. There's this thing called aging, see..." How can you capture that idea in an algorithm?
- He's probably scared.
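On the aging question above, one crude starting point (not an answer) is to treat age as a slow baseline and expressions as fast wiggles around it: group samples by year, take the per-year average of some facial measurement as "what the face looks like at that age", and keep the leftover as "expression". This sketch assumes a single numeric feature per frame (say, a hypothetical wrinkle-depth score) and assumes the mix of expressions is roughly the same every year, which is a big assumption.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def split_age_and_expression(
    samples: List[Tuple[int, float]],
) -> Tuple[Dict[int, float], List[Tuple[int, float]]]:
    """Split (year, value) measurements into a per-year baseline and residuals.

    baseline[year] is the mean value for that year (attributed to age);
    each residual is value minus that year's baseline (attributed to expression).
    """
    by_year: Dict[int, List[float]] = defaultdict(list)
    for year, value in samples:
        by_year[year].append(value)

    baseline = {year: mean(values) for year, values in by_year.items()}
    residuals = [(year, value - baseline[year]) for year, value in samples]
    return baseline, residuals
```

With this split, the same grin in 1995 and in 2020 would ideally produce a similar residual even though the raw measurements differ. A learned model conditioned on the sample datetime would presumably do better; this only shows the shape of the idea.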
Conan has an amazing schtick, but it's formulaic enough for Al to throw down a pretty good gauntlet.