For those wondering why this is interesting: This technique is being used to reproduce[0] the Alpaca results from Stanford[1] with a few hours of training on consumer-grade hardware.
I believe there will soon be a cottage industry of providing application-specific fine-tuned models like this that can run very inexpensively in, e.g., AWS. The barrier today seems to be that the base model (here, Meta's LLaMA) is encumbered and can't be used commercially. I'm confident someone will soon release an equivalent under, e.g., the MIT license, and we'll all be off to the races.
Today Databricks announced[0] a 6B-parameter model from EleutherAI fine-tuned on the Alpaca dataset. According to their CEO[1], training took 3 hours and cost $30. They didn't release any details on how it was trained, but it was likely done with LoRA.
I haven't done any formal tests on this yet, but with llama-13b, the overall structure of its responses definitely becomes much more ChatGPT-like. It would be very interesting to see how the 65B model performs.
The 1/2 month seems to match LyCORIS/LoCon, which, as I understand it (I haven't dug into the details), is a newer refinement of LoRA. LoRA has been used for longer, correct.
The LyCORIS/LoCon repo got its first commits about a month ago, and almost no one is using it except for a few experiments (not even the automatic webui supports it without a plugin).
Judging from activity on Civitai, I think “almost no one is using it except for a few experiments” is very wrong. Sure, A1111 needs a plugin for it; it needs a plugin for ControlNET, too, but that is also quite popular.
I'm also judging from the activity on CivitAI: the most downloaded ones (>1000 downloads, and there aren't many) are actually plain LoRAs, with the LoCon version in a separate (experimental) branch of the CivitAI page. So definitely not "very wrong", ahah.
>it needs a plugin for ControlNET
The big difference is that ControlNet actually required a pretty complex interface to be used effectively, whereas using LoCon/LyCORIS should be completely transparent and work just like a LoRA.
ControlNet is built in as of maybe two weeks ago and no longer requires an extension. I started using it when the built-in support arrived and have had a lot of fun with it since.
4 months? I don't think so. People really started using LoRA when it was added to the diffusers library less than 2 months ago; that library is used by the training plugin of the automatic webui. I guess time seems to flow more slowly when so many things happen.
> This technique is being used to reproduce[0] the Alpaca results from Stanford[1]
"Reproduced" is a strong statement, and there is no rigorous justification for it beyond a few cherry-picked examples. Alpaca-LoRA is simply LLaMA with LoRA-tuning on the Alpaca data. There are no metrics, no measurements, no evaluations showing that Alpaca-LoRA performs similarly to Alpaca, when it is well-known in the field that parameter-efficient fine-tuning always pays a cost in terms of performance relative to full fine-tuning (which is what Alpaca does).
(This has been a huge nit for me because of the recent flood of Alpaca replications, and even claims that Alpaca is comparable to ChatGPT, all rushing to market themselves with nothing to justify their claims.)
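For anyone wondering what "LoRA-tuning" means mechanically: the base weight matrix stays frozen and only a low-rank correction is trained. Below is a minimal pure-Python sketch of a single LoRA-wrapped linear layer, assuming the formulation from the LoRA paper (output = W x + (alpha / r) * B A x, with A randomly initialized and B initialized to zero). The class and parameter names are my own for illustration, not the alpaca-lora or PEFT API, and there is no training loop; it only shows the forward pass.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical illustration: frozen weight W plus a trainable low-rank update B @ A."""

    def __init__(self, weight, r=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen base weight (d_out x d_in), never updated
        self.scale = alpha / r
        # A gets a small random init and B starts at zero, so B @ A == 0 before training
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
        self.B = [[0.0] * r for _ in range(d_out)]  # d_out x r

    def forward(self, x):
        base = matvec(self.weight, x)               # W x
        update = matvec(self.B, matvec(self.A, x))  # B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
layer = LoRALinear(W, r=2)
x = [1.0, 2.0, 3.0]
print(layer.forward(x))  # [7.0, -1.0]: identical to W x, because B starts at zero
```

Fine-tuning would only ever update A and B; the pretrained W is untouched, which is why the result is "LLaMA plus a small adapter" rather than a new model.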
>when it is well-known in the field that parameter-efficient fine-tuning always pays a cost in terms of performance relative to full fine-tuning
The LoRA paper clearly states the method's performance: "LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency." https://arxiv.org/abs/2106.09685
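The "no additional inference latency" part follows from the fact that the trained low-rank update can be folded back into the base weight once training is done, so inference is a single matmul of the original shape. A small pure-Python sketch with toy matrices I made up (not taken from the paper), checking that the merged layer gives the same output as the unmerged one:

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(weight, A, B, scale):
    """Fold the low-rank update into the base weight: W' = W + scale * (B @ A)."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(weight, delta)]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen base weight, 2 x 3
A = [[1.0, 1.0, 0.0]]                    # rank r = 1, so A is 1 x 3
B = [[0.5], [2.0]]                       # B is 2 x 1
scale = 2.0                              # alpha / r with alpha = 2, r = 1
x = [1.0, 2.0, 3.0]

# Unmerged: base matmul plus two extra small matmuls per forward pass
unmerged = [w + scale * u for w, u in zip(matvec(W, x), matvec(B, matvec(A, x)))]
# Merged: a single matmul with the same shape as the original layer
merged = matvec(merge_lora(W, A, B, scale), x)
print(unmerged, merged)  # [10.0, 11.0] [10.0, 11.0]
```

Note that this only speaks to the latency and parameter-count claims; whether the merged model matches full fine-tuning in quality is exactly what is being argued about here.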
I don't want to get into the weeds of the subtleties of evaluation, hyperparameter-tuning and model comparisons, but let's just say that subsequent studies have shown that LoRA (like most parameter-efficient tuning methods) underperforms full fine-tuning: https://arxiv.org/abs/2203.06904
A simple way to think about it is this: if LoRA really gives full fine-tuning performance, why would anyone ever fully fine-tune a model?
To balance my view a little, it is definitely a valid question to ask "how far can we get with parameter-efficient tuning", and I firmly believe that as models get larger, the answer is "very, very far".
That said, I also dislike it when it is carelessly claimed that parameter-efficient tuning is as good as full fine-tuning, without qualifications or nuance.
It is not apparent to me that full fine-tuning should be better, especially since the LoRA method seems like it could be robust against catastrophic forgetting.
I agree - my comment originally had a parenthetical about this fact, but I thought it was probably confusing to people who just wanted to understand what this was about. Perhaps I shouldn't have edited it out.
It also bothers me that a lot of LoRA claims read like "You won't believe how little it costs to train these models!", when of course 99%+ of the complexity and cost is in the LLaMA (or whatever) model that underpins it. Folks are talking about it in a loose way that implies some kind of miraculous overall training cost breakthrough.
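To put rough numbers on the "99%+" point, here is a back-of-the-envelope count of what LoRA actually trains. The model dimensions (hidden size 4096, 32 layers, roughly 7B parameters total) are my assumptions for a 7B LLaMA-style model, and adapting only the query/value projections at rank 8 is a hypothetical configuration, not necessarily what alpaca-lora uses:

```python
def lora_trainable_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight: A (r x d_in) + B (d_out x r)."""
    return r * (d_in + d_out)


hidden = 4096          # assumed hidden size of a 7B-class LLaMA model
n_layers = 32          # assumed number of transformer layers
adapted_per_layer = 2  # hypothetical: adapt only the query and value projections
r = 8                  # hypothetical LoRA rank

full_matrix = hidden * hidden  # parameters in one frozen projection matrix
per_matrix = lora_trainable_params(hidden, hidden, r)
total_lora = n_layers * adapted_per_layer * per_matrix

print(full_matrix, per_matrix)  # 16777216 65536
print(total_lora)               # 4194304
print(f"{total_lora / 7e9:.2%} of a ~7B-parameter model")  # 0.06% of a ~7B-parameter model
```

Under these assumptions the adapter is a few million trainable parameters sitting on top of billions that someone else already paid to pre-train, which is why "$30 of training" says little about the total cost.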
[0] https://github.com/tloen/alpaca-lora
[1] https://crfm.stanford.edu/2023/03/13/alpaca.html