From: Csepp
Subject: Re: Guidelines for pre-trained ML model weight binaries (Was re: Where should we put machine learning model parameters?)
Date: Wed, 12 Apr 2023 11:32:34 +0200

Nathan Dehnel <ncdehnel@gmail.com> writes:
> a) Bit-identical re-train of ML models is similar to #2; others said
> that bit-identical re-training of ML model weights does not protect
> much against biased training. The only protection against biased
> training is by human expertise.
>
> Yeah, I didn't mean to give the impression that I thought
> bit-reproducibility was the silver bullet for AI backdoors with that
> analogy. I guess my argument is this: if they release the training
> info, either 1) it does not produce the bias/backdoor of the trained
> model, so there's no problem, or 2) it does, in which case an expert
> will be able to look at it and go "wait, that's not right", and will
> raise an alarm, and it will go public. The expert does not need to be
> affiliated with guix, but guix will eventually hear about it. Similar
> to how a normal security vulnerability works.
>
> b) The resources (human, financial, hardware, etc.) for re-training are,
> in most cases, not affordable. Not because it would be
> difficult or because the task is complex (that is covered by
> point a)), but because the requirements in terms of resources are
> just too high.
>
> Maybe distributed substitutes could change that equation?
Probably not; that would require distributed *builds*. Right now Guix
can't even use distcc, so it definitely can't use remote GPUs.