If nine experts in privacy can't understand what Microsoft does with your data, ...

Cthulhu_ · on Sept 1, 2023

> then in my opinion a court should step in and declare it void so that Microsoft isn't allowed to use any private data until they get their act together.

I hear what you're driving at, but "a court" cannot be both prosecutor and judge at the same time. This page is about that, possibly starting a civil suit to have a judge look at this and act accordingly.

j245 · on Sept 1, 2023

The true failure is government. Mozilla shouldn’t have to lead this. The prosecutor should be the regulator.

tmpX7dMeXU · on Sept 1, 2023

America reaps what America sows.

FridgeSeal · on Sept 1, 2023

Wouldn’t have too much of an issue with this if it was confined to America, but unfortunately it’s not.

So basically “the rest of us suffer what America sows”

boredumb · on Sept 1, 2023

What is a state prosecutor going to prosecute microsoft on? Vague T&C? As great as it may sound having the state going around proactively enforcing T&C of every product is not going to be effective or fair.

grabeh · on Sept 1, 2023

To an extent, think about vested interests here. Mozilla has little to gain by showcasing how clear a rival's new service agreement is!

The AI services section seems pretty clear in terms of limiting the use cases of user content:

"iv. Use of Your Content. As part of providing the AI services, Microsoft will process and store your inputs to the service as well as output from the service, for purposes of monitoring for and preventing abusive or harmful uses or outputs of the service."

Admittedly, I haven't read other parts to understand the full picture though.

flakeoil · on Sept 1, 2023

If I understand the below correctly then it seems they can use your data for whatever purpose they want. Also training AI even though it does not explicitly say so.

"2b. To the extent necessary to provide the Services to you and others, to protect you and the Services, and to improve Microsoft products and services, you grant to Microsoft a worldwide and royalty-free intellectual property license to use Your Content, for example, to make copies of, retain, transmit, reformat, display, and distribute via communication tools Your Content on the Services."

[1] https://www.microsoft.com/en-us/servicesagreement/upcoming.a...

lnxg33k1 · on Sept 1, 2023

Vested interests, yes. History, also.

For first, Mozilla doesn't do this every week. And Mozilla has a history to keep in mind general population interests for privacy and security. On the other hand, we have a corporation with a history of cheating, lying, stealing, scamming people, from fighting standards, abusing positions of power, overwriting choices going against their shareholders interests. So yeah, vested interests, but also we need to keep in mind the history of both entities

Also Mozilla didn't say "Oh we have the MS new ToS and we keep them private", they're there, get a lawyer and see if they're obvious to understand?

nvm0n2 · on Sept 1, 2023

That's the only mention of AI using content. So it can be read in a few ways:

1. They will sometimes use the data for training their RLHF stuff, to "prevent harmful use" of the services.

2. The clause is exhaustive and therefore they won't use it for training, as otherwise that'd be mentioned, and are just going to log stuff for the usual monitoring purposes.

This is a storm in a teacup. I don't even know why I should care. If MS crawl some web pages I've written and AI gets slightly smarter by reading them, or if I have a chat with the AI and some engineers use it to make the AI work better, great. It's very hard to imagine concrete, real harm from them being able to do this, though I can understand why companies might worry about it spitting out their source code verbatim in some cases.

dspillett · on Sept 1, 2023

> I don't even know why I should care. If MS crawl some web pages I've written and AI gets slightly smarter by reading them

Crawling public web pages is a separate issue⁰ – by putting something online you aren't explicitly agreeing to any of MS's policies, at least in the eyes of the law. This is the same for anyone crawling public content not just MS.

This privacy policy covers all the content you might use MS apps and services for, i.e. where you are¹ automatically agreeing to MS's policies: OneDrive, potentially any local-only documents in Office, code in VS and other tools, perhaps anything stored on your PC running Windows.

> I don't even know why I should care.

If you don't use any MS products or services, and no products/services you do use are backed by MS's services, then you don't need to care personally. Or indeed if you do but consider everything you output or otherwise work on to be public domain. Otherwise, maybe it is something you should form an opinion on?

----

[0] time to switch my robots.txt files to “User-agent: * Disallow: /” – though it is very likely already too late for any existing content

[1] except where limited by law that you can afford to argue with MS's legal team over

nvm0n2 · on Sept 1, 2023

I do use MS services. I still don't understand why I should care unless the AI starts simply repeating my private data in response to questions.

Now you could argue, what if I have documents with secret ideas or valuable IP that I don't want the AI to helpfully explain to others? That's definitely a valid concern! But for consumer uses, if it learns to draw better hands by looking at my holiday photos or whatever, then I don't see the problem.

dspillett · on Sept 1, 2023

> unless the AI starts simply repeating my private data in response to questions

That is a concern some have, particularly around CoPilot and the fact it has been trained with much copy-left covered code in public repositories.

They assure us that it is not possible for blocks of code to be regurgitated that would break things like *GPL, but they have yet to explain why, if that assurance is 100% definitely true, they have not included any of their private code in the training set. Surely they consider that their code is of good quality and would be valuable to include in the model.

> if it learns to draw better hands by looking at my holiday photos or whatever, then I don't see the problem

And if it gives an advertising firm working for a product you'd rather not be associated with an image of a family that look _very_ like yours? Again, the same assurance is given as per CoPilot, but again not everyone is assured by the assurance.

And of course it could happen anyway by chance even if your family is not in the training set. I don't not bother to lock my doors because someone with a good lock-pick could get in anyway.

And they are not doing it because of a great communal benefit (well, their individual coders may be, but the company certainly isn't), they are doing it for commercial benefit. I'd prefer they didn't with my data, or if they do I'd like my slice however small thankyouverymuch.*

mowthie · on Sept 1, 2023

> If you don't use any MS products or services, and no products/services you do use are backed by MS's services, then you don't need to care personally.

I beg to differ, wouldn't they be more inclined to care in case their data was being used in a product they do not interact with, rather than the one they do use - and in some way benefit from it?

dspillett · on Sept 1, 2023

That is a huge grey area of indirect use/agreement. If they don't interact with those services than someone else has given MS the data so from MS's PoV someone else has agreed to the policy and from the users PoV someone else has perhaps given their data to MS without permission. So yes, a concern, but not necessarily one relating to this policy except any clauses it has about removing data and its use when they are informed they shouldn't have it.

grabeh · on Sept 1, 2023

No, that is true. There are multiple interpretations here. I gave the most optimistic one!

dspillett · on Sept 1, 2023

That paragraph says some things that they can do. It in no way says they won't use your content for AI training and any number of other things.

Mozilla's point is that the whole document is sufficiently vague that they could use it to defend pretty much whatever use of your content that conceive of now or in the near future.

grabeh · on Sept 1, 2023

Why would they single out those specific uses then, if you consider express prohibitions are necessary?

dspillett · on Sept 1, 2023

To make it look, on cursory reading, like the policy is something you are comfortable to agree to. Legal theatre.

Also because those specific uses are mentioned in existing law and/or have been otherwise successfully defended. It gives their lawyers as many explicit tools as possible, before they need to argue around the implicit ones enabled by their policies & agreements being deliberately more vague elsewhere.

The point is that if they don't say that they won't, then they pretty much can if they choose to.

grabeh · on Sept 1, 2023

Interesting! A rather cynical approach. Although preferable to naivety on my part - I'd expect a court to hold the list exhaustive if challenged.

hkt · on Sept 1, 2023

With a sign on the door saying "beware of the leopard"

Xelbair · on Sept 1, 2023

yeah, that's basically one of core tenants of GDPR.

>Consent must be a specific, freely given, plainly worded, and unambiguous affirmation given by the data subject;

grabeh · on Sept 1, 2023

GDPR and indeed any data protection laws may well be completely irrelevant in the context of Microsoft's services. Even if relevant, consent is unlikely to be a relevant as a processing basis under GDPR in the context of usage of MS services. Performance of contract or legitimate interests much more likely to be relevant...

cientifico · on Sept 1, 2023

I'm my experience GDPR is relevant.

I need to inform my customers what I do with their personal data. That includes to which companies I share that data with.

Having an excel with customer data is providing that data to Microsoft. So I need, as responsible of the data, to know how they will use it. Any use case that isn't obvious have to be cleared stated in the data privacy agreement. Including moving data outside EU into other countries like America (where US government can request that data without even informing us) or using their data to train AI.

Come'on. If we need to inform that we used chatgpt (just in case they provide PI), why we will not need to inform about Microsoft.

grabeh · on Sept 1, 2023

Key word is "may" be completely irrelevant! Of course, if you're providing an Excel of customer data, it will be relevant if the user is in the EU. But still, consent won't be relevant in that context.

User content may include personal data but may also not...so in some senses, better to include totality of use cases in a non-data protection related document.

pbhjpbhj · on Sept 1, 2023

It sounds like I might be OOTL, has there been recent legal action relevant to this?

j45 · on Sept 1, 2023

Thanks.

Synthetic data might be one perspective.