Mike Burr - log

[comp] GPT-like language models using ancient texts (and more!)

I believe you'd want to clean up and try to normalize the input. Trying to do OCR on ancient languages is a fuzzy overlapping concept that we could prune off from the start. Yes, the idea of trying to train a system with a bunch of clay tablets is delicious.
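For what it's worth, here is a rough sketch of what "clean up and normalize" might look like for text that has already been transcribed. It only uses Python's standard library. The function name `normalize_line` and the particular choices (NFC composition, dropping invisible control/format characters, collapsing whitespace) are my own assumptions for illustration, not an established pipeline for any specific ancient language.

```python
import re
import unicodedata


def normalize_line(text: str) -> str:
    """Hypothetical helper: tidy one line of transcribed text."""
    # Compose base letters with their combining marks (NFC), so the same
    # glyph is always represented by the same code points.
    text = unicodedata.normalize("NFC", text)
    # Drop invisible control/format characters (Unicode categories C*),
    # but keep tabs and newlines so they can be collapsed as whitespace.
    text = "".join(
        ch for ch in text
        if ch in "\t\n" or not unicodedata.category(ch).startswith("C")
    )
    # Collapse any run of whitespace into a single space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()
```

Real corpora would surely need script-specific rules on top of this (editorial brackets, damaged-sign markers, variant spellings), which is exactly the fuzzy part.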

Actually the OCR system might be wanted too! And would not be a "big deal". And would (of course I'm sure already is) help expand the corpus. One hears endless anecdotes about museums with 90% of their specimens being locked away for decades. It appears to be a big source of pride.

Train it. Slap it. Bonk it. Do whatever but apply all this deep AI business to the existing material!

I think, anyways...

Yes, an elite sequestered society of real geniuses has mastered all of these languages. Yes, very impressive. But all y'all don't know nothing and that's always done been you guys' problem. Really! Yes, it's great that we have experts, but a high school senior from Oklahoma with three semesters of Spanish would absolutely clown all the experts if Spanish became an extinct ancient language.

And their languages were just as ridonckulously rich as our own and subtle up the ying-yang. Yet in a lot of cases the translation, at least in English, seems flat-but-gussied-up. Like if you re-wrote "Dick and Jane" using all the big Bible words and locutions. Charlton Heston AS... Dick.

As a matter of fact, a computer is the closest we can get to newborn acquisition. Anything else is polluted a.f. Endlessly.

The more I think about it, the more I realize what we have is pretty much utter garbage. It's a crude attempt by which people are rightly impressed, but it's still crude.

Effort: A+ ✅ Accuracy: C- 😔

On the one hand, it is impressive for a talented person to go to school for a total of 25 years and become a world expert on an ancient language. However, while that is to be celebrated, having so few "experts" is not. After all, language is a lot more than the interpretation that you find in just one person's head. A few dozen or a few hundred heads might sound like enough, but why do I suspect that these experts do not sit down and have conversations, fluently, in their language of expertise?

Oh, but professor McMasters ...ooh, now he was a real master at Late Coptic. Ooooh, and no lyin, neither. There was just him. He was the oooooonly person in the whole world who could really speekit!

I sure hope we froze his head.

What these very usual, representative humans can do is, I'm sure, by definition something that very, very few people can do. But I'm not so sure that their opinions of their own capabilities are going to be entirely uncolored. After all, who is going to confirm?

AIs, or maybe more specifically human-language models like GPT, can have what to me sound like vaguely plausible, if boring and erratic, conversations. Big deal, you say? (Pretty sure this alone is a big deal.) It may or may not be a big deal, but the ability for two AIs to do the same in an ancient language would be crazy. Guaranteed: They would make useful discoveries that would send elbow patches a-poppin'.

The corpus is teeny tiny in most cases, but maybe not all. I have heard... somewhere... that we are only a tiny fraction of the way through the cuneiform tablets. Although cuneiform is apparently pretty "advanced" and maybe not a good place to start.

Even doing this with newspapers from 100 years ago would be something: creating an oldetyme AI that we could converse with. We could get cutting edge intel on steampunk culture.

Yeesh, what is the harm? I envision jealous hoarding.