Well written, thanks!
The past tense of "teach" is "taught", not "teached". I stopped reading at that barbarism on the fourth word of the third line.
This is the kind of punishingly stupid comment that could only have been written by someone labouring under some form of neurodivergent impairment. To anyone thinking of criticising Reader for this embarrassing post therefore - forgive him. He doesn't understand how embarrassing it is to publicly announce that you are going to write off an entire article (from a writer whose first language almost certainly isn't even English) because of a single typo. He doesn't have the capability.
Happy easter to both of you :-*
Can the internet archives be canonized, in a sense, by training an LLM on them?
Good question, but I'd argue no. Latent spaces are interpolatable spaces of possibilities, and that goes directly against the purpose of a canon: scripture that is fixed and sacred.
However, in practice all canonical scripture emerged from a highly fluid body of text, mostly from the oral tradition. Take Homer: we don't even know whether a guy like 'Homer' ever existed in ancient Greece, and it's very possible that the mythologies of the Iliad and the Odyssey stem from a thousand folk songs and fairy tales that went around in preliterate Greece, which then crystallized into those myths and at some point were written down, rewritten, copied, copied, copied, and then emerged as "THE ILIAD" and "THE ODYSSEY", the historical canonical texts we read and comment on to this day. The same goes for the Bible and all other scripture.
To me, it seems that the Internet is perfect for providing the communicative soil that is necessary for myth production, the pre-stage of canonization. Training an LLM on the web is like training an LLM on pre-Homeric Greece: you'd get all the myths and all the songs, you could interpolate Medusa and Zeus and get a lightning-snakehead hybrid, and you could also, with the right prompt, get THE ILIAD and THE ODYSSEY, likely in their canonized form.
The question is whether that's the same thing.
As I said, one of the crucial elements of a canon is that it is not changeable, and a canon is more than text: a canon is also all the rituals, the folk dances, the canonical songs. This unchangeableness is there to make the knowledge and wisdom of the past accessible to the present generation, but also to perform a commitment to the values and traditions of a given society. Usually we do this through religious rituals, but also through more mundane or folksy traditions. From that perspective, an LLM can't provide that framework of collective societal commitment to a whole set of stories, narratives, rituals and laws.
But the basic job of a canon is not only commitment, but also communication between that cultural memory and the present generation. Any political struggle which leads to protest on the streets is a struggle about the canon: the canon says we always did this or that in this or that way, it is written and thou shalt obey. Nope, says the present generation, your canon sucks and it did not account for this or that new thing, circumstance, group of people, affordance of the environment, or whatever. So then we change one of the canons: The Law. (Did you notice that some of our laws are inscribed on monuments? That's cultural memory, you can't change the engraved scripture on a monument, it's literally made to last a thousand years.)
In that communicative sense of a canon, where the past speaks to the present, an LLM could be a "canon" of sorts, because if you train an LLM on the times of Homer, you'd get a latent space that provides "what people were thinking and doing back then", and then you could "talk" to that space.
But you'd still not get the "fixed scripture" thing out of it, so it would be not so much a canonization as a space that *contains* that canon, almost by accident, along with any other fantasy canon you could come up with.
So, no, I guess not. I guess the inherently changeable nature of anything digital goes very much against the grain of canonization.
But thanks for that comment, this is worth thinking about.
The topic of building cultural memory in a time of culture wars intrigues me. I agree that internet-mediated relations don’t really help to build that shared memory: too much stuff flows by our eyes and fingers too quickly.
Does it follow, then, that any significant cultural movement must offer people some form of non-digital sociability as a condition of mobilisation (i.e. taking to the streets in demonstrations, getting together in person in small groups, yearly events/conferences, …)?