• 0 Posts
  • 21 Comments
Joined 1 year ago
Cake day: June 27th, 2023

  • mm_maybe@sh.itjust.works to 196@lemmy.blahaj.zone: Abuse is abuse rule
    39 upvotes · 1 downvote · 11 days ago

    My wife once hit me in front of my kids because she didn’t like my pointing out a double standard in how she was treating them. The child she was favoring recently started hitting the other one in a similar way, basically just to silence her when she said something he didn’t like. When I pointed out the similarity to my wife’s behavior and suggested he had learned it from her, she got mad and claimed that rather than hitting me, she had “hit my hand away”, which is a lie, and she knows it. It is 100% classic spousal abuse and gaslighting. And yet, because of the sheer size difference between us (I’m a foot taller), I feel ridiculous calling it that. I also don’t want to find out what else my son learns is OK from his mom if I’m not around, so here I am, still married to her, mostly trying to forget the abuse when it’s not actively happening. She’s been abusive, but I’m not really in any physical danger, so staying seems like the rational option in my situation… I imagine that’s relatively common among men.







  • As with any career, it’s a long story, and I’m happy to share more details over DM. But basically, due to indecision over my major, I took an unusual amount of math, stats, and environmental science coursework even though my major was in social science, and I just kept leaning further into that quirk as I transitioned into the workforce. Bear in mind that data science as a field of study didn’t really exist yet when I graduated; these days I’m not sure such an unconventional path is necessary. However, I still hear from a lot of junior data scientists in industry who are miserable because they haven’t yet figured out that, in addition to their technical skills, they need a “vertical” niche or topic area of interest. (And by the way, a public-service dimension also does a lot to make a job feel meaningful and worthwhile, even on the inevitable rough day here and there.)


  • My “day job” is doing spatial data science work for local and regional governments that have a mandate to address climate change in how they allocate resources. We totally use AI, just not the kind that has received all the hype… Machine learning helps us recognize patterns in human behavior and system dynamics that we can use to predict how much different courses of action will affect CO2 emissions. I’m even looking at small GPT models as a way to work with some of the relevant data that is sequence-like. But I will never, I repeat never, buy into the idea of spending insane amounts of energy attempting to build an AI god or oracle that we can simply ask for the “solution to climate change”… I feel like people like me need to do a better job of making the world aware of our work, because the fact that this excuse for profligate energy waste has any traction at all seems related to the general lack of awareness that we exist.






  • Y’all should really stop expecting people to buy into the analogy between human learning and machine learning, i.e. “humans do it, so it’s okay if a computer does it too”. First of all, there are vast differences between how humans learn and how machines “learn”. Second, it doesn’t matter anyway, because there is plenty of legal and moral precedent for not assigning to machines the rights normally assigned to humans (for example, no intellectual property right has yet been granted to any synthetic media that I’m aware of).

    That said, I agree that “the model contains a copy of the training data” is not a very good critique. A much stronger one would be to simply point out all of the works in the training data that carry a Creative Commons “No Derivatives” license, since it is hard to argue that the model checkpoint isn’t derived from the training data.
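    To make that kind of audit concrete, here is a minimal Python sketch. It assumes a hypothetical dataset in which each record carries a source URL and a free-text license string. The `Record` type, `has_no_derivatives`, `flag_nd_works`, and the example URLs are all made up for illustration. The sketch only recognizes abbreviated Creative Commons names like “CC BY-NC-ND 4.0”, and a real audit would need a dataset that records licenses in the first place.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Record:
    """Hypothetical training-data record: where it came from and its stated license."""
    source_url: str
    license: str  # e.g. "CC BY-ND 4.0"


def has_no_derivatives(license_name: str) -> bool:
    """Return True if an abbreviated Creative Commons license name includes the ND term."""
    # Normalize "CC BY-NC-ND 4.0" / "cc_by_nc_nd_3.0" into tokens like ["CC", "BY", "NC", "ND", "4.0"].
    tokens = license_name.upper().replace("_", "-").replace(" ", "-").split("-")
    return "CC" in tokens and "ND" in tokens


def flag_nd_works(records: Iterable[Record]) -> List[Record]:
    """Collect every record whose (abbreviated CC) license forbids derivative works."""
    return [r for r in records if has_no_derivatives(r.license)]


# Hypothetical example data.
records = [
    Record("https://example.com/a", "CC BY-ND 4.0"),
    Record("https://example.com/b", "CC BY-SA 4.0"),
    Record("https://example.com/c", "cc_by_nc_nd_3.0"),
    Record("https://example.com/d", "unknown"),
]
print([r.source_url for r in flag_nd_works(records)])
# → ['https://example.com/a', 'https://example.com/c']
```

    Records with missing or unrecognized license strings are simply not flagged. A stricter audit might treat “unknown” as its own category rather than as permission.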


  • The problem with your argument is that it is 100% possible to get ChatGPT to produce verbatim extracts of copyrighted works. OpenAI has suppressed this in a rather brute-force way, by prohibiting the prompts that have been found so far to trigger it (e.g. the infamous “poem poem poem…” ad infinitum hack), but the possibility is still there, no matter how much they try to plaster over it. In fact, some people much smarter than me see technical similarities between compression and the process of training an LLM, calling it a “blurry JPEG of the Web”… The point being, you wouldn’t allow distribution of a copyrighted book just because you compressed it into a ZIP file first.
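    The ZIP half of that analogy is easy to demonstrate. Here is a minimal Python sketch using the standard-library `zlib` module, which implements DEFLATE, the same algorithm most ZIP files use. The repeated-word string is a hypothetical stand-in for a copyrighted text, and `compress_work`/`restore_work` are illustrative names. Lossless compression shrinks the bytes but still returns the exact original on decompression. The “blurry JPEG” framing is the lossy counterpart, where what comes back is only approximately the original.

```python
import zlib


def compress_work(text: str) -> bytes:
    """Losslessly compress a text with zlib (DEFLATE) at maximum compression."""
    return zlib.compress(text.encode("utf-8"), level=9)


def restore_work(blob: bytes) -> str:
    """Decompress and decode back to the original text."""
    return zlib.decompress(blob).decode("utf-8")


# Hypothetical stand-in for a copyrighted text.
work = "poem " * 1000
blob = compress_work(work)

print(f"{len(work.encode('utf-8'))} bytes -> {len(blob)} bytes")
# The compressed copy is far smaller, yet every character survives the round trip.
assert restore_work(blob) == work
```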


  • I’m not proposing anything new, and I’m not here to “pitch” anything to you. If you’re interested in a sales pitch for why data dignity is a problem worth addressing, read Jaron Lanier’s writings, e.g. “Who Owns the Future?”, or watch a talk or interview he has given. I admire him greatly and agree with many of his observations, but I’m not sure about his proposed solution (mainly a system of micropayments to the creators of the data used by tech companies). I’m just here to point out that copyright infringement is neither the main nor the only thing bothering so many people about generative AI, so settling copyright disputes isn’t going to stop all those people from being upset about it.

    As to your comments about “feelings”, I would turn it around and ask you: why is it important to society that we prioritize the feelings (mainly greed) of the few tech executives and engineers who think they will profit from such practices over those of the many, many people who object to them?



  • What irks me most about this claim from OpenAI and others in the AI industry is that it’s not based on any real evidence. Nobody has tested the counterfactual approach he claims wouldn’t work, yet the experiments that came closest (the first StarCoder LLM and the CommonCanvas text-to-image model) suggest that it would in fact have been possible to produce something very nearly as useful, and in some ways better, with a more restrained approach to training data curation than scraping outbound Reddit links.

    All that aside, copyright clearly isn’t the right framework for understanding why what OpenAI does bothers people so much. It’s really about “data dignity”, a relatively new moral principle not yet protected by any single law. Most people feel that they should have control over what data is gathered about their online activities, as well as what is done with those data after they’ve been collected. Even if they publish or post something under a Creative Commons license that permits derivative uses of their work, they’ll still get upset if it’s used as input to machine learning. This is true even if the resulting generative models are built not for commercial reasons but only for personal or educational purposes that clearly constitute fair use. I’m not saying that OpenAI’s use of copyrighted work is fair; I’m just saying that even in cases where the use is clearly fair, there’s still a perceived moral injury, so I don’t think it’s wise to lean too heavily on copyright law if we want to find a path forward that feels just.



  • Capitalism is precisely the problem, because if the end product were never sold or used in any commercial capacity, the case for “fair use” would be almost impossible to challenge. They’re betting on judges siding with them by extending a very specific interpretation of fair use that has been successfully applied to digital copying of content for archival and distribution, as in Google Books or the Internet Archive, and that interpretation is itself not airtight, just precedent.

    Even fair uses of media may not respect the dignity of the creators of the works used to build “media synthesizers”. In other words, even if a computer science grad student does a bunch of scraping for their machine learning dissertation, unless they ask for and get permission from the creators, their research isn’t upholding the principle of data dignity. Current law doesn’t address that principle at all, but it is obviously the real issue upsetting people about “Generative AI”.