Lvxferre [he/him]

I have two chimps within, Laziness and Hyperactivity. They smoke cigs, drink yerba, fling shit at each other, and devour the face of anyone who gets close to either.

They also devour my dreams.

  • 4 Posts
  • 528 Comments
Joined 2 years ago
Cake day: January 12th, 2024


  • Yeah, got to borrow some word from discourse analysis :-P

    It’s a good fit for what I wanted to say, and it makes the comment itself another example of the phenomenon: using “utterance” as jargon makes the text shorter and more precise, but harder to approach = it optimises for #2 and #3 at the expense of #1. (I had room to do it in this case because you mentioned your Linguistics major.)

    Although the word is from DA, I believe this to be related to Pragmatics; my four points are basically a different “mapping” of the Gricean maxims (#1 falls into the maxim of manner, #2 of manner and relation, #3 of quality, #4 of quantity) to highlight trade-offs.


  • To be clear, by “communication” I’m talking about the information conveyed by a certain utterance, while you’re likely referring to the utterance itself.

    Once you take that into account, your example is optimising for #2 at the expense of #1: yes, you can get away with conveying info in more succinct ways, but only at the cost of requiring a shared context, and that shared context is also info the receiver needs to know beforehand. It works fine in this case because spouses accumulate that shared context over the years (so it’s a good trade-off), but if you replace the spouse with some random person, it becomes a “how the fuck am I supposed to know what you mean?” matter.


  • I believe that good communication has four attributes.

    1. It’s approachable: it demands from the reader (or hearer, or viewer) the least amount of reasoning and previous knowledge, in order to receive the message.
    2. It’s succinct: it demands from the reader the least amount of time.
    3. It’s accurate: it neither states nor implies (for a reasonable = non-assumptive receiver) anything false.
    4. It’s complete: it provides all relevant information concerning what’s being communicated.

    However, no communication is perfect, and those four attributes are at odds with each other: if you try to optimise your message for one or more of them, the others are bound to suffer.

    Why this matters here: it shows the problem of ablation is unsolvable. Even if generative models were perfectly competent at rephrasing text (they aren’t), simply by asking them to make the text more approachable you’re bound to lose info or accuracy. Especially on the current internet, where you’ve got a bunch of skibidi readers who’ll screech “WAAAAH!!! TL;DR!!!” at anything with more than two sentences.

    I’d also argue “semantic ablation” is actually way, way better as a concept than “hallucination”. The latter is not quite “additive error”; it’s a misleading metaphor for output that is generated by the model the same way as the rest, but happens to be incorrect when interpreted by human beings.


  • Link to the archived version of the article in question.

    I actually like the editor’s note. Instead of naming-and-shaming the author (Benj Edwards), it’s blaming “Ars Technica”. It also claims they looked for further issues. It sounds surprisingly sincere for a corporate apology.

    Blaming AT as a whole is important because it acknowledges Edwards wasn’t the only one fucking it up. Whatever a journalist submits needs to be reviewed by at least a second person, exactly for this reason: to catch dumb mistakes. Either this system is not in place or it’s not working properly.

    I do think Edwards is to blame, but I wouldn’t go so far as to say he should be fired, unless he has a history of doing this sort of dumb shit. (AFAIK he doesn’t.) “People should be responsible for their tool usage” is not the same as “every infraction deserves capital punishment”; sometimes a scolding is enough. I think @[email protected]’s comment was spot on in this regard: he should’ve taken sick time off, but this would have cost him vacation time, and even being forced to make that choice is a systemic problem. So ultimately it falls on his employer (AT) again.


  • In addition to all of that, since your comment is spot on:

    When people claim some variety is more conservative than another, they tend to cherry-pick a lot. It’s easy, for example, to look at rhoticity and claim “American English” is more conservative, or to look at the cot/caught merger and claim British English is more conservative. But neither claim is accurate or meaningful; when you try to look at the big picture, you notice changes everywhere.

    To complicate it further, neither “British English” nor “American English” refers to any actual variety. Those are only umbrella terms; they boil down to “English, arbitrarily restricted to people who live in the territory controlled by that specific government”. And the actual varieties those people speak might keep or change completely different features.


  • Backstory of the spelling of that word:

    Latin colōrem (accusative of color) gets inherited by Old French as color /koˈlor/.

    Somewhere down the line Old French shifted /o/ to /u/. I believe this shift affected stressed vowels first, or that the distinction between unstressed /o/ and /u/ was already not a big deal; either way, there was more pressure to respell the last (stressed) vowel than the first (unstressed) one. So the word gets spelled color, colour, colur.

    Anglo-Norman inherited this mess, spelling it mostly as colur. Then Middle English borrows the word, as /kuˈluːr/~/ˈku.lur/. It’s oxytone in AN, but English has a tendency to shift the stress to the first vowel, creating the second pronunciation. Spelling, as usual for those times, is a mess:

    • colur - spelled like in Anglo-Norman.
    • color - swap the ⟨u⟩ with cosmetic ⟨o⟩. Scribes hated spelling ⟨u⟩ in certain situations, where it would lead to too many vertical lines in a row; that’s why you also got come, love, people instead of cume, luve, peuple.
    • colour - mirroring an Old French spelling that was more common further south, around Paris.
    • coloure - that ⟨e⟩ was likely never pronounced; I think it was there to force reading the previous vowel as long.
    • coler - probably from some /ˈku.lur/ pronunciation already reducing the vowel to */ˈku.lər/.
    • kolour - ⟨c⟩~⟨k⟩ mixing was somewhat common then. And no, KDE did not exist back then; they did not lobby to spell the word with a K for the sake of a program that would only appear centuries later (KolourPaint).

    Eventually as English spelling gets standardised, the word settles down as colour.

    Then, around 1800, Noah Webster treats this word as if it had been directly borrowed from Latin. Since in Latin it’s color, he clipped the ⟨u⟩. And his dictionary was popular in the USA, recreating the mess even after it had already settled.


  • You know what, I got a brilliant idea:

    See, the chimp in my avatar is called Ai Ai. Was? I don’t know if she’s still alive; the last news I could find about her is from 2005, when she stopped smoking. Anyway, what if I had artificial intelligence create a bunch of pictures of her, and sold them as NFTs? The “AI Ai Ai collection”, or Ai³ for short. I wouldn’t do this to scam a bunch of suckers, noooooo; I’d do it because you can get rich if you “invest” in my collection: buy an Ai³ NFT now, for just 100 euros. Then resell it for a thousand euros, for mad profitz!!!

    […I’m obviously joking. C’mon, this summer is easily getting past 30°C, in a city where it used to snow once in a blue moon. I definitely don’t want to feed global warming further with dumb crap like this.]


    I’m still reading the machine-generated transcript of the video. But to keep it short:

    The author was messing with ISBNs (international standard book numbers) and noticed that invalid ones fell into three categories (for what “invalid” can mean here, see the check-digit sketch after the list):

    • Typos and similar.
    • Publishers assigning an invalid ISBN to the book, because they didn’t get how ISBNs work.
    • References "hallucinated"¹ by ChatGPT, that do not match any actual ISBN.

    He then uses this to highlight that Wikipedia is already infested with bullshit from large “language” models², and that this creates a bunch of vicious cycles going against Wikipedia’s spirit of reliability, factuality, etc.

    Then, if I got this right, he lays out four hypotheses (“theories”) on why people do this³:

    • People who ignore the limitations of those models
    • People seeking external help to contribute to Wikipedia
    • People using chatbots to circumvent frustrating parts of doing something
    • People with an agenda.

    Notes (all on my/Lvxferre’s part; none of them is said by the author himself):

    1. “Hallucination”: a misleading label used to refer to output that has been generated the exact same way as the rest of the output, but that leads to bullshit when interpreted by humans.
    2. I have a rant about calling those models “language” models, but to keep it short: I think “large token models” would be more accurate.
    3. In my opinion, the author is going the wrong way here. Disregard intentions, focus on effect: don’t assume good faith, don’t assume any faith at all. Instead, focus on the user’s behaviour; if they violate Wikipedia policies once, warn them; if they keep doing it, remove them as dead weight fighting against the spirit of the project.