emacs-tangents

Re: Help building Pen.el (GPT for emacs)


From: Shane Mulligan
Subject: Re: Help building Pen.el (GPT for emacs)
Date: Sat, 24 Jul 2021 14:10:43 +1200

It's a bit like whitewashing because it's
reconstructing generatively by finding
artificial/contrived associations between
different works that the author had not
intended but that may have been part of their
inspiration, and it compresses the
information based on these associations.

It's a bit like running a lossy 'zip' on the
internet and then decompressing
probabilistically.

When run deterministically (set the temperature of GPT to 0), you may actually
see 'snippets' from various places, every time, with the same input generating
the same snippets.
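
To illustrate the deterministic case, here is a minimal Python sketch. It
assumes a toy next-token distribution rather than GPT itself: the function
name `sample_token` and the scores in `toy_logits` are made up for
illustration and are not part of any real API. At temperature 0 the
highest-scoring token is always chosen, so the same input gives the same
output every time; at higher temperatures the choice is random.

```python
import math
import random

def sample_token(logits, temperature, rng=random):
    """Pick a token index from raw scores (logits) at a given temperature."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale the scores, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    # random.choices normalises the weights itself.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical scores for three candidate tokens (an assumption, not real data).
toy_logits = [2.0, 1.0, 0.5]

# Temperature 0: every draw returns the same index.
greedy = [sample_token(toy_logits, 0) for _ in range(5)]

# Temperature 1: draws can differ from run to run.
sampled = [sample_token(toy_logits, 1.0) for _ in range(5)]
```

Real services may still show small run-to-run differences at temperature 0
for reasons outside the sampling step, so this is only a sketch of the idea.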

So the source material is important.

What GitHub did was very, very bad but they
did it anyway.

That doesn't mean GPT is bad; it just means
they zipped up content they should not have
and created this language 'index' (or 'codex',
as they call it).

What they really should do, if they are honest
people, is train separate models on subsets of
GitHub code grouped by license, and release
each model under that same license.

Shane Mulligan

How to contact me:
🇦🇺00 61 421 641 250
🇳🇿00 64 21 1462 759
mullikine@gmail.com



On Sat, Jul 24, 2021 at 1:14 PM Richard Stallman <rms@gnu.org> wrote:
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > That's not what happens with these services: they don't _copy_ code
  > > from other software (that won't work, because the probability of the
  > > variables being called by other names is 100%, and thus such code, if
  > > pasted into your program, will not compile).  What they do, they
  > > extract ideas and algorithms from those other places, and express them
  > > in terms of your variables and your data types.  So licenses are not
  > > relevant here.

  > According to online reviews, chunks of code are copied even verbatim,
  > and people find where they came from. Even if modified, it still
  > requires licensing compliance.

From what I have read, it seems that the behavior of copilot runs on a
spectrum from the first description to the second description.  I
expect that in many cases, nothing copyrightable has been copied, but
in some cases copilot does copy a substantial amount from a
copyrighted work.

--
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
