For those wondering why this is interesting: This technique is being used to rep...

romanzubenko · on March 24, 2023

Today Databricks announced [0] a 6B-parameter model from EleutherAI fine-tuned on the Alpaca dataset. According to their CEO [1], training took 3 hours and cost $30. They didn't release any details on how it was trained, but it was likely with LoRA.

[0] https://www.databricks.com/blog/2023/03/24/hello-dolly-democ...
[1] https://twitter.com/alighodsi/status/1639251347777388544

numlocked · on March 24, 2023

Interesting. I wonder what the training cost was for:

https://huggingface.co/EleutherAI/gpt-neox-20b

Perhaps it’s in the paper…

michaelhartm · on March 24, 2023

They used the 6b GPT4-J, not 20B. That's what's interesting, it's a smallish large language model :).

dragonwriter · on March 24, 2023

GPT-J, not GPT4-J.

int_19h · on March 24, 2023

There are also some LLaMA LoRAs that are trained on the Anthropic dataset specifically for chat:

https://huggingface.co/serpdotai

I haven't done any formal tests on this yet, but with llama-13b, the overall structure of its responses definitely becomes much more ChatGPT-like. It would be very interesting to see how the 65B model performs.

m3affan · on March 24, 2023

Let the revolution begin

smaddox · on March 24, 2023

There's already RWKV, if you want a decent performing pre-trained model that's Apache 2.0 licensed: https://twitter.com/BlinkDL_AI/status/1638555109373378560?s=...

pffft8888 · on March 24, 2023

https://news.ycombinator.com/item?id=35281026

GaggiX · on March 24, 2023

In addition, for the past 1/2 month this technique has been used to fine-tune Stable Diffusion models.

terafo · on March 24, 2023

Closer to 4 months. It is much better than having a bunch of 2-4 GB models lying around.
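A rough sketch of why LoRA files come out so much smaller than full checkpoints: instead of storing a whole `d_out x d_in` weight matrix per adapted layer, a LoRA stores two low-rank factors of shapes `d_out x r` and `r x d_in`. The dimensions, rank, and layer count below are made-up illustrative values, not those of any actual Stable Diffusion model, and the helper names are hypothetical.

```python
def full_params(d_in, d_out):
    """Parameters in one dense weight matrix."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_in + d_out)


def megabytes(n_params, bytes_per_param=2):
    """Storage size, assuming fp16 (2 bytes per parameter) by default."""
    return n_params * bytes_per_param / 1e6


# Hypothetical values chosen only for illustration.
d_in = d_out = 768
rank = 4
n_layers = 100  # assumed number of adapted weight matrices

full = n_layers * full_params(d_in, d_out)
lora = n_layers * lora_params(d_in, d_out, rank)

print(f"full matrices: {full:,} params, {megabytes(full):.1f} MB")
print(f"LoRA factors:  {lora:,} params, {megabytes(lora):.1f} MB")
print(f"reduction:     {full // lora}x")
```

The saving grows with the layer width and shrinks as the rank increases, so real file sizes depend on which layers are adapted and at what rank.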

dragonwriter · on March 24, 2023

The 1/2 month seems to match LyCORIS/LoCon, which as I understand (haven’t dug into the details on this) is a newer refinement of LoRA. LoRA has been used for longer, correct.

GaggiX · on March 24, 2023

The LyCORIS/LoCon repo started committing 1 month ago and almost no one is using it except for a few experiments (not even the automatic webui supports it without a plugin).

dragonwriter · on March 24, 2023

Judging from activity on Civitai, I think “almost no one is using it except for a few experiments” is very wrong. Sure, A1111 needs a plugin for it; it needs a plugin for ControlNET, too, but that is also quite popular.

GaggiX · on March 24, 2023

I'm also judging from the activity on CivitAI: the most downloaded ones (>1000 downloads, not many) are actually just LoRA, with LoCon in another (experimental) branch of the CivitAI page. Definitely not "very wrong", ahah.

>it needs a plugin for ControlNET

The big difference is that ControlNet actually required a pretty complex interface to be used effectively, whereas the use of LoCon/LyCORIS should be completely transparent and work like a LoRA.

Agentlien · on March 24, 2023

ControlNet is built in as of maybe two weeks ago and no longer requires an extension. I started using it when the built-in support arrived and have had a lot of fun with it since.

GaggiX · on March 24, 2023

4 months? I don't think so. People really started using LoRA when it was added to the diffusers library less than 2 months ago; this library is used by the training plugin of the automatic webui. I guess time seems to flow more slowly when many things happen.

outside1234 · on March 24, 2023

Or, more importantly than in AWS, locally in disconnected or poorly connected scenarios like in-vehicle or in-home.

arugulum · on March 24, 2023

> This technique is being used to reproduce[0] the Alpaca results from Stanford[1]

Reproduced is a strong statement, without any rigorous justification other than a few cherry-picked examples. Alpaca-LoRA is simply LLaMA with LoRA-tuning on the Alpaca data. There are no metrics, no measurements, no evaluations to show that the Alpaca-LoRA performs similarly to Alpaca, when it is well-known in the field that parameter-efficient fine-tuning always pays a cost in terms of performance relative to full fine-tuning (which is what Alpaca does).

(This has been a huge nit for me because of the recent flood of Alpaca replications, or even claims that Alpaca is comparable to ChatGPT, rushing to market themselves, but with nothing to justify their claims.)

GaggiX · on March 24, 2023

>when it is well-known in the field that parameter-efficient fine-tuning always pays a cost in terms of performance relative to full fine-tuning

The LoRA paper clearly states the performance of the method: "LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency." https://arxiv.org/abs/2106.09685
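For readers unfamiliar with the method, here is a minimal NumPy sketch of the "no additional inference latency" point: the frozen weight `W` is augmented with a low-rank update `B @ A`, which can be folded back into a single matrix after training. All sizes and the `alpha` value are hypothetical, and the random `B` merely stands in for a trained one; this illustrates the idea and is not the paper's reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
d_in, d_out, rank, alpha = 64, 48, 4, 8

W = rng.normal(size=(d_out, d_in))         # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01   # trainable, random init
B = np.zeros((d_out, rank))                # trainable, zero init


def lora_forward(x, W, A, B, alpha, rank):
    """Base output plus the scaled low-rank update, computed separately."""
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T


def merge(W, A, B, alpha, rank):
    """Fold the low-rank update into a single dense weight matrix."""
    return W + (alpha / rank) * (B @ A)


x = rng.normal(size=(5, d_in))

# With B initialised to zero, the adapted layer matches the base layer exactly.
assert np.allclose(lora_forward(x, W, A, B, alpha, rank), x @ W.T)

# Stand-in for training: pretend the optimiser has updated B.
B = rng.normal(size=(d_out, rank))

# After merging, one matmul reproduces the adapted output.
W_merged = merge(W, A, B, alpha, rank)
assert np.allclose(lora_forward(x, W, A, B, alpha, rank), x @ W_merged.T)
```

Because `W_merged` has the same shape as `W`, the deployed layer costs the same as the original one. This says nothing about model quality, which is the point being debated here.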

arugulum · on March 24, 2023

I don't want to get into the weeds of the subtleties of evaluation, hyperparameter-tuning and model comparisons, but let's just say that subsequent studies have shown that LoRA (consistent with most parameter-efficient tuning methods) underperforms full fine-tuning: https://arxiv.org/abs/2203.06904

A simple way to think about it is this: if LoRA really gives full fine-tuning performance, why would anyone ever fully fine-tune a model?

GaggiX · on March 24, 2023

>why would anyone ever fully fine-tune a model?

You're asking it as if it were a rhetorical question, but I think it carries more weight than many people seem to believe.

arugulum · on March 24, 2023

To balance my view a little, it is definitely a valid question to ask "how far can we get with parameter-efficient tuning", and I firmly believe that as models get larger, the answer is "very, very far".

That said, I also dislike it when it is carelessly claimed that parameter-efficient tuning is as good as full fine-tuning, without qualifications or nuance.

deepsquirrelnet · on March 25, 2023

I agree that it does carry weight.

It is not apparent to me that fine-tuning should be better, especially since the LoRA method seems like it could be robust against catastrophic forgetting.

GaggiX · on March 25, 2023

I 100% agree with you, but I (we) need more evidence.

numlocked · on March 24, 2023

I agree - my comment originally had a parenthetical about this fact, but I thought it was probably confusing to people who just wanted to understand what this was about. Perhaps I shouldn't have edited it out.

It also bothers me that a lot of LoRA claims read like "You won't believe how little it costs to train these models!", when of course 99%+ of the complexity and cost is in the LLaMA (or whatever) model that underpins it. Folks are talking about it in a loose way that implies some kind of miraculous overall training cost breakthrough.

polyterative · on March 24, 2023

Thanks! Hard to follow this stuff sometimes with all the news