Ask Microsoft: Are you using our personal data to train AI? (foundation.mozilla.org)
478 points by alabhyajindal on Sept 1, 2023 | 151 comments


The regulatory architecture of "big tech" is an unmitigated disaster.

You can't have two entities effectively control every touchpoint with the digital domain for a major fraction of the planet.

There is absolutely no reason to trust that they will not abuse this position in opaque and impossible-to-trace ways. These are trillion-dollar powered for-profit entities, with armies of lawyers and lobbyists that can intimidate medium-sized countries. They will exploit every weakness of incompetent, confused and captured regulatory/political systems. Because that's what they are legally obliged to do for their shareholders. And these shareholders care zilch if this duopolistic, fingers-in-all-honeypots design undermines our entire digital future. They just want some tech "winners" in their portfolios.

The longer nothing serious is done, the harder it becomes to do anything.


> they are legally obliged to do for their shareholders

Isn't that an "old wives tale" that doesn't have any basis in reality?


For US companies registered in Delaware (which is most of them), the doctrine of shareholder primacy holds. Shareholders can and have sued corporate boards/execs for not acting in their best interest. It goes back to the 1910s, when Henry Ford very loudly and explicitly announced he was cutting shareholder dividends to pay workers the famous $5/day wage (double what other factory workers were paid), and minority shareholders sued and won.

Of course it is debated and different in different jurisdictions, as it is common law and not a specific statute. But ambiguity in law means execs lean towards the safe decision. Shareholders can go a long way with a legal threat that never sees trial.

https://en.m.wikipedia.org/wiki/Dodge_v._Ford_Motor_Co.


>Shareholders can and have sued corporate boards/execs for not acting in their best interest.

Translating "best interest" to "maximize profits" is the stretch though. The shareholders define what is their best interest. Shareholders aren't faceless demons. They are you and me.


Come on. You know who the "shareholders" are that the parent commenter talks about. They are board members and people with a huge amount of shares, not "normal" people who hold 5 shares in their retirement portfolio.

The main interest of a shareholder is return on investment. And that's fine. The problem is that if you don't have enough competing entities in a market, these shareholders of the few companies that still exist hold disproportionate power that can and will be misused to make any attempt at new, reasonable competition infeasible.

It's really not that hard to understand.


> The main interest of a shareholder is return on investment. And that's fine.

No, it's not fine. Legally, maybe, but not morally fine. I don't understand how everyone is ok with this system which seems to be selecting for the worst humans.


> No, it's not fine. Legally, maybe, but not morally fine. I don't understand how everyone is ok with this system which seems to be selecting for the worst humans.

Wait. If you invest your money into something, do you not expect a return?


because the previous system involved swords and later cannons


no this is insufficiently informed.. company bylaws can vary quite a lot about voting rights. Your typical SiliValley cut-n-paste corp less so..


In most cases, the ones who do sue are neither - they’re companies themselves, with shareholders of their own, or funds with specific regulations (which amount to much the same, if not more specifically profit-oriented).

These are not people; they don't think morally the way people do. These are not demons, they are not evil themselves - these are more like machines, algorithms tuned to maximize profit. This structure is our paper-clip optimiser, already in place well before the AI revolution. If there is a devil, they are the ones behind setting up this entire system, not a player in it.


have you ever seen a shareholder that doesn't like a monopoly, winner-takes-all investment? this is the wet dream of investors.

ensuring the system does not degenerate is the job of regulators.


I think the point is that shareholders' preference isn't the legal standard. The board can make moves the shareholders absolutely hate and which lose money in the short term, but as long as the board reasonably thinks it's in the shareholders' best interest, they can't do anything about it legally. (Other than vote out the board over time.)

Of course, regulation is needed to stop a bad board.


You can set up a company where the board is static and very dysfunctional. Lived it. Company grows and grows and you can't get rid of the founder who has clearly reached the "Peter Principle" stage. Very painful to be an investor or employee at a place like that.


lol. Shareholders are absolutely not “you and me”.

Your couple of Tesla shares don't make you a "shareholder" as being discussed here. The shareholders that sue boards are shareholders who, if they sold, would disrupt the share price notably.


They're not faceless but they are demons.


> For US companies registered in Delaware (which is most of them), the doctrine of shareholder primacy holds. Shareholders can and have sued corporate boards/execs for not acting in their best interest. It goes back to the 1910s, when Henry Ford very loudly and explicitly announced he was cutting shareholder dividends to pay workers the famous $5/day wage (double what other factory workers were paid), and minority shareholders sued and won.

That's because he had the explicit goal of starving out the Dodge brothers who owned a competing automotive manufacturer. If it was just about the pay of employees the suit likely would have failed. You cannot prejudice minority shareholders even if they own competing companies.


> Isn't that an "old wives tale" that doesn't have any basis in reality?

You are correct. Nobody has ever been able to point to a law requiring companies to maximize profits for shareholders at all costs.

In fact, there are thousands and thousands of companies that exist with a primary purpose of doing things other than making money for their shareholders.

Any "legal" obligation to "maximize shareholder value" started out as a way to excuse the greed of other people, but falls flat upon even cursory examination.


"An activist shareholder is a shareholder who uses an equity stake in a corporation to put pressure on its management"

https://en.wikipedia.org/wiki/Activist_shareholder

I am not excusing greed. I am just saying that if shareholders are focused on financial gain the corporate management is obliged to deliver. Not all shareholders are greedy. But overwhelmingly they are.


>An activist shareholder is a shareholder who uses an equity stake in a corporation to put pressure on its management

Would you be surprised to learn that there are shareholder groups that use their equity claims to pursue environmental, DEI or other goals that benefit society?

>Not all shareholders are greedy. But overwhelmingly they are.

Not even close. The vast majority of shareholders own stock through ETFs and pay no attention to the finances (or politics) of the companies they own.

The fiduciary rule is to prevent executives from running the company in their interest, as opposed to the interests of the silent shareholders who actually own the company.


I am quite aware of impact investors, ESG etc. The space is full of convulsions and strife [1]. But it's a different debate. The point I was trying to make is that corporate managers are legally controlled by shareholders; it's not a myth.

Actually, the difficulty with sustainable investments shows how difficult it is to rely on shareholders' values to goad corporations in the "right" direction: unless the risk is clear, present and impossible to disguise, there is just enormous resistance to interfering with profitable opportunities.

Now, in the context of digital information oligopolies, good luck convincing shareholders that their big tech darlings are detrimental to the future of digital society.

[1] https://www.pionline.com/esg/blackrock-voted-against-record-...


>Not even close. The vast majority of shareholders own stock through ETFs and pay no attention to the finances (or politics) of the companies they own.

Right, but many would sell their ETFs and buy others if they suddenly started to deliver below market returns.


Exactly.

Having a fiduciary duty to act in shareholder's interests (a very common sense rule) is a long way from "maximizing shareholder profits at all costs", even if those sometimes align.


Alas it does. Think, e.g. about the backlash against asset managers for pursuing sustainability agendas.

Closer to the topic, think also about the "furore" and supposed panic at Google that they are not capitalizing (literally) on their AI leadership.


> pursuing sustainability agendas

My impression is that the "sustainability agendas" these asset managers were pursuing turned out to be er... mostly bullshit.

Along the lines of companies "green washing" stuff that was not in fact green. Also stuff marketed as "socially progressive" as a sales tactic.


It just goes to show that if you want real change you need to get really serious about it. As in: change the regulatory and legal framework. Self-regulation doesn't work. You just get an army of consultants, rating agencies and other intermediaries providing "assurance" for-profit, never biting the hand that feeds them.

PS. For the record, and while tangential to the digital monopoly discussion, it is still a good thing that we got the ESG and greenwashing debacle. Just a few years ago nobody but the perennial "tree huggers" was even talking about sustainability. Now it is a much broader-based discussion, although obviously it has not landed anywhere yet. I wish people would also take regulation of the digital sphere more seriously, as it is the backbone on which we will solve all these other problems.


> These are trillion-dollar powered for-profit entities, with armies of lawyers and lobbyists that can intimidate medium sized countries. They will exploit every weakness of incompetent, confused and captured regulatory/political systems.

I don't think the need to exploit opportunities is exclusive to major corporations. We see this behavior and these power grabs even in petty quarrels in small organizations such as small businesses and homeowner associations. The tools change, but the human traits that drive this behavior are always there. There is no toggle that turns on this behavior when assets surpass a certain level.

> Because that's what they are legally obliged to do for their shareholders.

I don't think they are "legally obliged" to anyone, or even "obliged" at all. Their leadership is driven to exploit the tools at their reach to meet their goals, and that's precisely what they do.


> I don't think the need to exploit opportunities is exclusive to major corporations

we are not talking about small, medium or large corporations. we are talking about two entities that effectively control the entire fabric of digital society and the economy. and from which there is increasingly no opt-out. there was never so much concentration. never.

> Their leadership is driven to exploit the tools at their reach to meet their goals

Corporate management's remuneration is approved by shareholders. If they don't deliver, they get the boot.


> we are not talking about small, medium or large corporations.

Doesn't matter. The point is that there is no threshold in a company's revenue where they start to exploit all tools and opportunities at their reach. Large corporations aren't any more inherently evil than your local gas pump or convenience store.

> Corporate management gets approved their remuneration from shareholders. If they don't deliver they get the boot.

That's not what happens in real life. There is far more to the CEO-board-shareholders relationship than the simplistic "they can give him the boot" take suggests.


I blame the tech companies but I also blame the people who created all this stuff -- programmers, tech startups, and computer science. Those people created a system that is too easy to use as an ultra-efficient wealth concentrator, and now tech companies are the result.


we are all cogs in the machine. maybe decades ago the stance of individuals could have made a difference, but nowadays it's clear that it no longer does. there is a collectively created monster that can only be tamed by something of equivalent or greater collective power: radical government regulation. The EU is doing its part. The US isn't.

imho nothing was learned from the previous Microsoft monopoly era. Information technology is not just any business sector. It is the canvas on which everything else plays out. It needs to be regulated in a way that lets people go about online life, politics, business etc. without this sword of Damocles permanently hanging above us.


I think personal action can still make a difference. What if we were to develop a coalition of people that believed in that, sort of like a union for a healthy society?

The coalition would have a common set of standards and ethics, which would include the development of more community-based standards for technology. With such a coalition, we could create our own technology not in the hands of technology companies, and we could be more responsible in the development and use of technology. Then the entire coalition could stop using Google for example...and crush the life out of them.

If such a coalition had ten million people in it, it might work if many of them are technologically capable. Ten million seems like a lot but it's only 1 person getting 2 new people to join, who in turn each get 2 new people to join, etc... 24 times.


I would definitely not be against such a coalition :-).

I think we already effectively have (more loosely defined than your idea) a global community based on various open source efforts. Open source is a vitally important (and not much appreciated in the usual ossified power centers) phenomenon, as it proves that in many critical nodes of information technology - desktop and mobile OS, desktop apps, browsers, social media, and all sorts of libraries etc. - you can perfectly well have alternatives that are more in line with societal objectives.

But we now have enough of a track record to know the open source model cannot scale to provide the backbone that everyone deserves. What is missing is the regulatory framework that will induce (and, if needed, force) commercial initiatives to be interoperable, ring-fenced, small enough to fear and respect clients and users - the opposite of the too-big-to-fail conglomerates. In short: we need companies that are good digital citizens.


Yes, you are right. I agree. Open source is a HUGE benefit and an excellent way to counteract corporations from holding all the resources. In my own professional workflow I use a lot of it, and I also try to give back by contributing tutorials to using it and recommending it as well. I think the world would be a lot worse off without the open-source world.


Isn’t the EFF that (or its intention, at least)?


Go on, attempt to form that coalition....

And the next thing you know you'll be on every major news network as a "Communist anti-technology union group looking to hold the US back"

The US has a brutal history of stamping out unions, do you think you're just going to form this group unopposed without massive amounts of money and resistance looking to stamp out your very existence?


> The US has a brutal history of stamping out unions, do you think you're just going to form this group unopposed without massive amounts of money and resistance looking to stamp out your very existence?

I wonder how they are going to do that? I have to admit, your statement scares me less than anything I have ever heard...


the claim 'I was only following orders' has been used to justify too many tragedies in our history.


The regulatory environment also played a big role in this. If you read the original Telecommunications Act (Title I), it is wildly heavy-handed on privacy. The TL;DR: you can pipe this voice data around, but you may not use it, and if a gov official wants it, they need a warrant (even if it is a rubber-stamp one). Title II broke most of that to make the internet grow. Well, it succeeded wildly. But now the companies are doing what is basically 'natural' for them to do: consume what they can see. This sort of thing has happened before with telegrams, mail, and plain old telephone. Companies have seized upon Title II to make this happen. For example, your cell phone is not a phone. It is an internet device and covered under Title II, not I.

Also we as users are somewhat to blame as we wanted 'free stuff'. We traded away our freedoms for tweets and facebook posts.

There is no 'one entity' to blame here. We all share it quite equally.

I have been saying this for many years now. But most of these companies are heading right into a Title III being written just for exactly what they are doing. They better get their act together or no one is going to like the outcome.


I came to say something similar. I think it's important to add that there is a slew of non-technical staff, sales and product managers, that foam at the mouth at opportunities to exploit these monopolist ideas. They then hire technical staff to execute. Their entire compensation package depends on this virus-like growth.


This is the most Kafkaesque thread i've ever seen on Hacker News.

Microsoft definitely uses analytics, if that counts as personal data - https://www.microsoft.com/insidetrack/blog/microsoft-uses-an...

Is Microsoft reading your Gmail account, Word documents or Porn activities and feeding them to OpenAI? Not according to terms and conditions https://learn.microsoft.com/en-us/legal/cognitive-services/o...

Is Microsoft generally doing unknown unknowns? Yes.


This has been my objection to Microsoft's maze-like privacy policies for a long time.

I once asked - on another forum and before the recent "AI" coding assistants were widely available - whether Microsoft's privacy policy allowed them to upload and do things with your own code if you used VS Code with telemetry enabled.

At the time I was downvoted to invisibility and told I was being silly. But not one person showed me anywhere in Microsoft's terms or privacy policy wording that limited the scope of the data processing clearly and transparently to exclude that kind of thing.

Today there's a bit of an obsession with training ML models on any large data set available, and perhaps my caution from yesteryear wouldn't look so silly to the critics now.


> But not one person showed me anywhere in Microsoft's terms or privacy policy wording that limited

These policies don't exist to tell you what they are limiting themselves to. These documents exist as a defence to use against you (because you agreed to the policy by continuing to use their services) if you try to stop them doing something.

Unless for some reason there is commercial or legal advantage in saying "we will not do X", a large company will never knowingly impose such limits on itself.


Fortunately since I'm in the UK the legal position is rather more enlightened. The default is that any processing of personal data is not allowed and they have to justify it and explain their justification.

Of course even if they flagrantly ignore the rules and carry on while treating any fines as a cost of doing business - which is hardly unusual in GDPR world for certain big tech companies - there is only so much you can do as one individual and you still need the regulators to step in and enforce the rules with meaningful sanctions.


For reference, here is the text of Microsoft's upcoming Services Agreement:

https://www.microsoft.com/en-us/servicesagreement/upcoming.a...

And here is their summary of what has changed:

https://www.microsoft.com/en-us/servicesagreement/upcoming-u...


It still contains the sentence “to improve Microsoft products and services, you grant to Microsoft a worldwide and royalty-free intellectual property license to use Your Content.” The only explicit limitation is asserted with respect to targeted advertising: “We do not use what you say in email, chat, video calls or voice mail, or your documents, photos or other personal files, to target advertising to you.”

The Summary of Changes doesn’t mention any changes to the Privacy Statement, which in turn doesn’t seem to exclude training AI models on user data.


Training a Microsoft AI could quite literally be considered ‘improving Microsoft products and services’.


Precisely - and there are at least two problems (for the end user) here:

Firstly, the subordinate clause ("to improve Microsoft products and services...") does not impose any constraint on what is actually being granted here ("you grant to Microsoft a worldwide and royalty-free intellectual property license to use Your Content") - the end user is granting nothing less than they would be if the subordinate clause were not there.

Secondly, there is no suggestion that the "improvements" will benefit the end user. It is, of course, entirely possible that the user's content could be used to benefit other users of Microsoft's products and services in a way that is detrimental to the owner/provider of that content.

If the subordinate clause has any purpose other than to suggest to the end user that Microsoft is imposing a benign and reasonable demand here, I do not see it.


Exactly.


Honest question I have. I've worked in advertising, and the laws around PII are interesting. In California there are the CCPA laws that require companies to provide a form you can submit to have your PII removed within 90(?) days.

What happens if these people train a model on a California resident's PII, a removal request comes in, and 3 months later someone asks the model about you and it spits out the PII that was "removed"? I'm assuming it's a matter of this going to court to be decided, but I'd be curious if any Californian legal nerds have some reasons why no one has started trying to target these things for settlements, if nothing else?


> What happens if these people train a model on a cali residents PII..

They would just deny it. Or claim it's impossible to recover from the trained model.

I don't know CA law, but lots of similar privacy laws, when you ask to be "removed", allow an exception for aggregated data.


I don’t know the CCPA that well, but I do know the EU laws. As it stands, "the right to erasure / to be forgotten" is extremely vague on this, and there doesn't seem to be wide precedent. In general the law is applicable to raw data records and not to aggregate data/metrics, nor to models. However, "models" in this context refers to one particular ruling w.r.t. the insurance or credit scoring industry (I don't remember exactly which one).

I want to point out that the model doesn’t need to “spit out” removed data. It can be a classifier, or regression model, and not a generative model, and ideally, it would not be trained on your data.

Worth noting that from a technical standpoint, it's difficult too. Say a model costs an X-large amount of dollars. Normally, I would retrain it e.g. every 6 months. But now I have erasure requests coming in on a regular basis; retraining often enough to comply with those is too expensive. There's a line of research on "machine unlearning" and how to do it efficiently, but it's quite underwhelming so far.
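To make the distinction concrete, here is a minimal sketch (all names hypothetical, not any real compliance API) of the cheap part - excluding erased users' records from the next scheduled retrain. The hard part, removing a user's influence from an already-trained model, is what "machine unlearning" research tries to address:

```python
# Hypothetical sketch: honoring erasure requests at retraining time.
# Record and field names (user_id, text) are illustrative only.

def filter_erased(records, erased_user_ids):
    """Drop training records belonging to users who requested erasure."""
    erased = set(erased_user_ids)
    return [r for r in records if r["user_id"] not in erased]

records = [
    {"user_id": "u1", "text": "hello"},
    {"user_id": "u2", "text": "private stuff"},
    {"user_id": "u3", "text": "world"},
]

# The next retrain never sees u2's records...
clean = filter_erased(records, ["u2"])

# ...but the model trained *before* the request may still encode them,
# and retraining on every request is the expense the comment describes.
```

This filtering step is trivial; the open question in the thread is entirely about the model that was already trained.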


Why are you assuming they have your PII in it in the first place and that the identifying part would not be stripped?

If the personal part is gone, I'm not sure you have any claim.


How does M$'s legal team accomplish such a feat? Are there layers of linguistic abstraction built up such that only a sufficiently large team (Microsoft, grand jury) has the bandwidth to extract any meaning? Red herrings with gotchas hidden in seemingly innocuous places? Do they just talk in circles and never give an exact answer?


Past a certain number of millions of dollars, a team of lawyers is effectively a legal red team tasked with finding bypasses (and they will find them) around the restrictions in place.


Maybe governments should hire actual red teams to look for those loopholes.


Or pass rules stating that if there is no expectation that a layman could understand a contract, you can't enforce it against them.


I think the 10th Amendment would be a better template to start with.


I'm not sure what you mean.

I wasn't stating that it had to be a federal law.


> How does M$'s legal team accomplish such a feat?

Who's going to stop them?


What is M$?



Microsoft.


If it does not specifically exclude that use, then you can be assured that eventually that data will be used in training if it’s useful. If not now, then eventually. That goes for every company warehousing data on the planet.

This should be everyone’s base assumption, and should be basically accurate unless laws are put in place, but even then jurisdictional bypass may make them irrelevant for historical data and will only weakly protect new data.

Welcome to the new oil.


There is a difference between personal data and personally identifiable data. In many ways it is unavoidable to use personal data. Predictive text uses personal data (they 'learn' from everyone) - are the sentence and paragraph personal? Search engines record what you have typed in - do you classify your search query as 'personal'? I can copy-paste the first comment from this thread - is it personal data? I may be flat out wrong, and shouldn't type anything into a web page ever.

Those four lawyers and three privacy experts didn't seem to come to a conclusion on what personal data is. Does big tech feed data created by people into 'AI' tools? Yes. Does little tech? Yes.

I'm okay to join Mozilla with my pitchfork if I know what it is about. I would like to have people who are clear about how they are looking after my interests, rather than just getting the mob riled up. Use of data, any data, is subject to an agreement. Go on and read the 'legal' that you have agreed to by using Hacker News - they have a whole section on how they use personal information. Do we get our pitchforks out for HN, or are they cool?


It’s Microsoft that for years tried to kill Linux and still sharpens the knife.


So it's just a grudge against Microsoft then? That's not something I'll sign up for, thanks. You either believe in the concepts of not using personal data - for everyone - or you're selling something.


Which of these things is "training AI"?

  * An abuse detection system is trained on user behavior and data to flag bad actors for human review
  * When I perform a web search, the ranking function is trained on all user behavior
  * The email composer will suggest completions based on my own typing history
  * A public chat interface to an LLM is trained on my private emails
"training AI" is an implementation detail that probably doesn't belong in a privacy policy most of the time. Would be nice if Mozilla was more explicit about the behavior they are concerned with.


Yes, yes they are, and will. If we think some giant corpo has our best interest in mind and has corporate governance all buttoned up, HA capital H A.


"130 products"

Does that include LinkedIn? What stops LinkedIn from sharing data with Microsoft?


Nothing at all; it's an MS product and straight up integrated with their CRM.


LinkedIn is owned by M$, you draw the conclusion.



The easy solution would be that everything created by AI (code, books, letters, art) should be licensed Creative Commons (or something more akin to the AGPL). Idc if you use my code to write your own, but you have to share it too.


> We had four lawyers, three privacy experts, and two campaigners look at Microsoft's new Service Agreement, which will go into effect on 30 September, and none of our experts could tell if Microsoft [will use your data] to train its AI models.

* in the USA, I assume?

With GDPR, if it's not a defined goal, then the answer is no. In the USA, I hear some states have a similar law now, but as a blanket statement without a defined region (not even a country), I'm not surprised they can't give a definitive "no".


I think what these types of contracts show is that companies at Microsoft's level don't give a sh*t about national or over-regional regulation. They will use your data, and while you're busy reading 100s of pages of mumbo-jumbo they already have their models about you in place and sell access to it on a PPC basis.

And this is especially true with GDPR. Google's revenues are still growing, the advertising companies are fine, we all just have to click a few more cookie banners nowadays.


Defeatism and cynicism are simply excuses for complacency. If you think companies like Microsoft are exploiting your data, don't just whine about it; take action. Your passivity won't stop them; stricter regulations and public pressure will. Just the other day, Microsoft decided to decouple Teams from the Office Suite in an attempt to preempt a potential antitrust penalty from the European Union. It's just one example of how a regulatory body can influence how companies conduct their business.


On the one hand, yes, in that one example it worked. After quite a while, during which the competition already suffered massively.

On the other hand, triple nope. E.g. the council of German GDPR enforcers declared most of Windows and Microsoft Office unfit for use in public offices and companies. https://www.wbs.legal/it-und-internet-recht/datenschutzrecht... or https://datenschutzbeauftragter-dsgvo.com/dsgvo-teil-2-micro...

But nobody cares; everyone just ignores those kinds of decisions and continues to not get fined because, as especially the second link shows, "it's complicated". Nobody even knows which agreements with Microsoft are actually valid for which license, software and situation. At the same time, for all the office drones, Microsoft software is just "the standard" and nobody even considers switching. It is just a complete refusal by all the public offices, let alone private companies, to get compliant.


The only reason Europe has successfully enacted such laws and pushed back against big tech is because it isn’t dominated by European companies. The US has no economic incentive to do the same.


> Ask Microsoft: Are you using our personal data to train AI?

According to Microsoft: "Your privacy is very important for us". So the answer is yes.


Could this also affect the self hosted ChatGPT in Azure? Trying to convince everyone to host a model themselves so they can use that data...


Of course it is going to use it, or else you won't be able to use any of their products... the sad world we are living in.


The fact that they turn on photo scanning on the photos uploaded on OneDrive tells me everything.


Of course they are. Why do you think MS Office practically begs you to save your documents on OneDrive?? Saving to OneDrive is the default!!


>and none of our experts could tell if Microsoft plans on using your personal data to train its AI models

This means nothing. You don't know if someone is going to do something unless they say they are going to do it. No one knows if Bethesda is going to take down every video of Starfield on Youtube tomorrow that is monetized with ads. Sure you can speculate what someone will do, but you will never know for sure.


I bet Adobe is doing the same as well.


If a corporation has your data, you can be damn sure they are mining it for all it is worth, which these days includes feeding it into AI training. They work for the governments, helping to surveil us. We "voluntarily" tell Google, Microsoft, Apple, and Facebook everything that we would object to telling big brother. Boy, do I have news for you!


Ah yes, Mozilla, the most respectful company when it comes to personal data.


Has nobody put Microsoft's Services Agreement into ChatGPT and asked for an answer?


Would Microsoft really risk training on data from government administrations? It generates billions in revenue from government accounts and the vast industry supporting government that is compelled to use Microsoft products.


Based on previous legal battles at Microsoft, yes they would.

Microsoft chance their arm on anything they think they can get away with.

They think they are too big to fail


The market thinks they are too big to fail as well. For a while they had a better credit rating than the US government.


I’m sure they have different ToS for government contracts, like they do for enterprise customers.


Where are all the assholes who say things like: "I'm loving the new Microsoft", "Microsoft is not the evil company it was back in the 90's", etc.

I'd love to hear how you justify this.


They're a little busy trying to shred their membership card of the Leopards Eating People's Faces Party.


Lately, they tend to gather in threads about Github.


Official answer: no. Right answer: yes, of course!


Thank you Mozilla.


So they will add a clause in their contract.


Can’t we just ask ChatGPT-4 for a summary, since it’s passed the Bar exam?


So much of modern technology is a trojan horse these days. This is basically the enshittification of cloud services. Just a few years ago people would say you were being a little paranoid if you were worried about your data passing through companies' servers unencrypted, but here we are now.

If companies do start training models on what people consider to be private documents, then the issues we already have with AI taking the jobs and purposes of humans is going to become significantly worse.

Scientists working on papers will essentially not be able to trust that their work won't get out before they have published it. A competing research team, asking prompts in the right way, could chance upon a reply that gives them a clue as to what the other team have found or are doing. The same goes for competing companies and engineering teams, or authors writing the next book in a series. Other people using that trained data could produce a cheap rip-off of that next paper, patent, or book using AI.

And that will completely demotivate humans from actually doing stuff. Because what's the point? No one will pay you for it, and a poor quality second-rate product is obtainable much cheaper.

At that point I think we'll discover what the real limitations of AI are, as we, as a society, have to get used to using it over humans. And I somehow doubt we will be better off.


> Scientists working on papers will essentially not be able to trust that their work won't get out before they have published it.

For a long time already there's been a possibility for your work to "get out" (patented etc) before you've even finished working on it.

How did Google (and other big-tech entities) get to be the most powerful bodies on Earth?

When Google positioned themselves as "the search engine" they obtained more than a little digital privilege.

The ability to observe everything someone is searching, over a long time period, is also the ability to anticipate their moves and intentions. That's put competitors and researchers at a huge disadvantage. Researchers and competitors signal their intentions, perhaps quite unknowingly, long before they even have a clear idea of what they're doing themselves.

Of course AI only exacerbates this a million-fold. And surely what I'm saying is a decade behind the curve for anyone who is paying attention to the world.

If you have a business in tech, and are therefore a direct competitor of at least one major big-tech entity, you should not be using their services. Instead think about on-prem and local compute solutions for your most sensitive work, and relay all your search out via Tor hidden services or other mixnets for maximum diffusion.


If nine experts in privacy can't understand what Microsoft does with your data,

then in my opinion a court should step in and declare it void so that Microsoft isn't allowed to use any private data until they get their act together.

If it's so vague that it becomes meaningless, that should default to granting no rights. Otherwise, why not publish your all-rights-granting privacy policy in Klingon in a locked drawer in a toilet basement? ;)


> then in my opinion a court should step in and declare it void so that Microsoft isn't allowed to use any private data until they get their act together.

I hear what you're driving at, but "a court" cannot be both prosecutor and judge at the same time. This page is about that, possibly starting a civil suit to have a judge look at this and act accordingly.


The true failure is government. Mozilla shouldn’t have to lead this. The prosecutor should be the regulator.


America reaps what America sows.


Wouldn’t have too much of an issue with this if it was confined to America, but unfortunately it’s not.

So basically “the rest of us suffer what America sows”


What is a state prosecutor going to prosecute Microsoft for? Vague T&C? As great as it may sound, having the state go around proactively enforcing the T&C of every product is not going to be effective or fair.


To an extent, think about vested interests here. Mozilla has little to gain by showcasing how clear a rival's new service agreement is!

The AI services section seems pretty clear in terms of limiting the use cases of user content:

"iv. Use of Your Content. As part of providing the AI services, Microsoft will process and store your inputs to the service as well as output from the service, for purposes of monitoring for and preventing abusive or harmful uses or outputs of the service."

Admittedly, I haven't read other parts to understand the full picture though.


If I understand the below correctly, it seems they can use your data for whatever purpose they want, including training AI, even though it does not explicitly say so.

"2b. To the extent necessary to provide the Services to you and others, to protect you and the Services, and to improve Microsoft products and services, you grant to Microsoft a worldwide and royalty-free intellectual property license to use Your Content, for example, to make copies of, retain, transmit, reformat, display, and distribute via communication tools Your Content on the Services."

[1] https://www.microsoft.com/en-us/servicesagreement/upcoming.a...


Vested interests, yes. History, also.

First, Mozilla doesn't do this every week. And Mozilla has a history of keeping the general population's interests in privacy and security in mind. On the other hand, we have a corporation with a history of cheating, lying, stealing, and scamming people, of fighting standards, abusing positions of power, and overriding choices that go against their shareholders' interests. So yeah, vested interests, but we also need to keep in mind the history of both entities.

Also, Mozilla didn't say "Oh, we have the new MS ToS and we're keeping them private". The terms are public: get a lawyer and see if they're easy to understand.


That's the only mention of AI using content. So it can be read in a few ways:

1. They will sometimes use the data for training their RLHF stuff, to "prevent harmful use" of the services.

2. The clause is exhaustive and therefore they won't use it for training, as otherwise that'd be mentioned, and are just going to log stuff for the usual monitoring purposes.

This is a storm in a teacup. I don't even know why I should care. If MS crawl some web pages I've written and AI gets slightly smarter by reading them, or if I have a chat with the AI and some engineers use it to make the AI work better, great. It's very hard to imagine concrete, real harm from them being able to do this, though I can understand why companies might worry about it spitting out their source code verbatim in some cases.


> I don't even know why I should care. If MS crawl some web pages I've written and AI gets slightly smarter by reading them

Crawling public web pages is a separate issue⁰ – by putting something online you aren't explicitly agreeing to any of MS's policies, at least in the eyes of the law. This is the same for anyone crawling public content not just MS.

This privacy policy covers all the content you might use MS apps and services for, i.e. where you are¹ automatically agreeing to MS's policies: OneDrive, potentially any local-only documents in Office, code in VS and other tools, perhaps anything stored on your PC running Windows.

> I don't even know why I should care.

If you don't use any MS products or services, and no products/services you do use are backed by MS's services, then you don't need to care personally. Or indeed if you do but consider everything you output or otherwise work on to be public domain. Otherwise, maybe it is something you should form an opinion on?

----

[0] time to switch my robots.txt files to “User-agent: * Disallow: /” – though it is very likely already too late for any existing content

[1] except where limited by law that you can afford to argue with MS's legal team over
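
For anyone wanting to act on footnote [0], a minimal robots.txt along those lines might look like the sketch below. The blanket rule is exactly what the comment quotes; the commented-out GPTBot group is an assumption on my part, useful only if you want to single out OpenAI's crawler while still allowing search engines:

```
# Blanket rule from the comment above: disallow all well-behaved crawlers
# from the entire site.
User-agent: *
Disallow: /

# Alternative (assumption, not from the original comment): block only
# OpenAI's published crawler token, leaving other bots unaffected.
# User-agent: GPTBot
# Disallow: /
```

Note this only stops crawlers that choose to honour the Robots Exclusion Protocol; it is a request, not an enforcement mechanism.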


I do use MS services. I still don't understand why I should care unless the AI starts simply repeating my private data in response to questions.

Now you could argue, what if I have documents with secret ideas or valuable IP that I don't want the AI to helpfully explain to others? That's definitely a valid concern! But for consumer uses, if it learns to draw better hands by looking at my holiday photos or whatever, then I don't see the problem.


> unless the AI starts simply repeating my private data in response to questions

That is a concern some have, particularly around CoPilot and the fact it has been trained with much copy-left covered code in public repositories.

They assure us that it is not possible for blocks of code to be regurgitated that would break things like *GPL, but they have yet to explain why, if that assurance is 100% definitely true, they have not included any of their private code in the training set. Surely they consider that their code is of good quality and would be valuable to include in the model.

> if it learns to draw better hands by looking at my holiday photos or whatever, then I don't see the problem

And if it gives an advertising firm working for a product you'd rather not be associated with an image of a family that look _very_ like yours? Again, the same assurance is given as per CoPilot, but again not everyone is assured by the assurance.

And of course it could happen anyway by chance, even if your family is not in the training set. But I don't skip locking my doors just because someone with a good lock-pick could get in anyway.

And they are not doing it for some great communal benefit (well, their individual coders may be, but the company certainly isn't); they are doing it for commercial benefit. I'd prefer they didn't with my data, or if they do I'd like my slice, however small, thankyouverymuch.


> If you don't use any MS products or services, and no products/services you do use are backed by MS's services, then you don't need to care personally.

I beg to differ. Wouldn't they be more inclined to care if their data was being used in a product they do not interact with, rather than one they do use and in some way benefit from?


That is a huge grey area of indirect use/agreement. If they don't interact with those services, then someone else has given MS the data, so from MS's PoV someone else has agreed to the policy, and from the user's PoV someone else has perhaps given their data to MS without permission. So yes, a concern, but not necessarily one relating to this policy, except for any clauses it has about removing data and its use when they are informed they shouldn't have it.


No, that is true. There are multiple interpretations here. I gave the most optimistic one!


That paragraph says some things that they can do. It in no way says they won't use your content for AI training and any number of other things.

Mozilla's point is that the whole document is sufficiently vague that they could use it to defend pretty much whatever use of your content they conceive of now or in the near future.


Why would they single out those specific uses then, if you consider express prohibitions necessary?


To make it look, on cursory reading, like the policy is something you are comfortable to agree to. Legal theatre.

Also because those specific uses are mentioned in existing law and/or have been otherwise successfully defended. It gives their lawyers as many explicit tools as possible, before they need to argue around the implicit ones enabled by their policies & agreements being deliberately more vague elsewhere.

The point is that if they don't say that they won't, then they pretty much can if they choose to.


Interesting! A rather cynical approach. Although preferable to naivety on my part - I'd expect a court to hold the list exhaustive if challenged.


With a sign on the door saying "beware of the leopard"


yeah, that's basically one of the core tenets of GDPR.

>Consent must be a specific, freely given, plainly worded, and unambiguous affirmation given by the data subject;


GDPR, and indeed any data protection laws, may well be completely irrelevant in the context of Microsoft's services. Even if relevant, consent is unlikely to be relevant as a processing basis under GDPR in the context of usage of MS services. Performance of contract or legitimate interests are much more likely to be relevant...


In my experience, GDPR is relevant.

I need to inform my customers what I do with their personal data. That includes which companies I share that data with.

Having an Excel file with customer data means providing that data to Microsoft. So as the party responsible for the data, I need to know how they will use it. Any use case that isn't obvious has to be clearly stated in the data privacy agreement, including moving data outside the EU into other countries like America (where the US government can request that data without even informing us) or using the data to train AI.

Come on. If we need to disclose that we used ChatGPT (in case customers provide personal information), why would we not need to disclose the same about Microsoft?


Key word is "may" be completely irrelevant! Of course, if you're providing an Excel of customer data, it will be relevant if the user is in the EU. But still, consent won't be relevant in that context.

User content may include personal data, but may also not... so in some senses it is better to include the totality of use cases in a non-data-protection-related document.


It sounds like I might be OOTL, has there been recent legal action relevant to this?


Thanks.

Synthetic data might be one perspective.


Samsung's ToS,

> The Sites may allow you to share things like comments, photos, messages, or documents with us or with other users. When you share content, you continue to own the intellectual property rights to your content and you are free to share the content with anyone else wherever you want. However, to use your content on our Sites, you need to grant us a license for any content that you create or upload using our Sites. When you upload, transmit, create, post, display or otherwise provide any information, materials, documents, media files or other content on or through our Sites (“User Content”) you grant us an irrevocable, unlimited, worldwide, royalty-free, and non-exclusive license to copy, reproduce, adapt, modify, edit, distribute, translate, publish, publicly perform and publicly display the User Content (“User Content License”), to the full extent allowed by Applicable Law.

It's my understanding that "Sites" is all Samsung products as it is vaguely referenced in the ToS itself.

https://www.samsung.com/us/common/legal/


> These Terms of Use (“Terms”) apply to your use of this website, any associated mobile sites, services, applications, or platforms (“Sites”).

Correct

But this is only for the US; for example, the German ToS is very short, does not state that, and covers only the website and remote services. I think that each app (or at least some subgroups of products) has its own ToS.

EDIT: Never mind, I found the "real" ToS ("AGB") for Germany https://terms.account.samsung.com/contents/legal/deu/deu/gen...

It also states the same

And these are the services that the TOS ("AGB") cover: https://terms.account.samsung.com/contents/legal/deu/deu/ass...


The problem is that in order to do anything even slightly meaningful, like displaying profile pictures, you need to grant the right to store and process. Since it is hard to keep track of distributed data at scale, and some data is quoted by other users, you need these rights indefinitely. The problem is, of course, that the generalization of data in conjunction with generalized rights gives away de facto everything. On the other hand, narrower general terms risk creating liabilities for companies.

I don't think building rules based on low-level technical functions for a portfolio of high-level applications is feasible. Honest actors, in their natural interest to protect themselves, have created a legal framework which allows bad behavior by bad actors (or their future bad selves).

Not sure what a future framework could look like, but the evolution of RBAC may give some clues. Data probably needs to be labeled. Processing needs to be tied to intent. And we need to acknowledge that rules for larger datasets must exist and need to differ by size.


While we are talking about it... can we make ToS illegal?

Why do we have to abide by rules while browsing the web? Why do businesses fear litigation so much that they hire lawyers to write and maintain a huge document nobody can ever read or understand?

This is failure from governments that can't set basic rules for human interaction.

All this


> can we make ToS illegal? Why do we have to abide by rules while browsing the web?

Because "the web" (i.e. the internet) isn't a natural object. It's composed of stuff owned by companies and people, and they have the right to say how you use their stuff when you interact with it.


> Why do we have to abide by rules while browsing the web?

Are you serious?


Because different companies will reasonably need different terms, and it's important that these are set out somewhere and accessible.

I wouldn't expect the same terms to apply for my public twitter posts as my paid fastmail account.


Why do we have to abide by rules while going around in the world?

I'm not in favour of abusive ToS, but your argument isn't the best one that can be made against it.


> While we are talking about it... can we make ToS ilegal?

What does that even mean? Laws trump Terms of service/agreements and contracts of any kind. Do they not?


I think he meant that instead of each company creating their own ToS, the government should set the standard or limitations on what a company can do.

> Laws trump Terms of service/agreements and contracts of any kind.

The web is not regulated by the government.


"The government"? Which one? Some governments do regulate certain aspects of the web directly.

All companies have to do is abide by the local rules set by regulations, if such exist. And some very much do. Maybe not in your jurisdiction?

I think you two have things backwards


> All companies have to do is abide by the local rules set by regulations

The point being that there are not enough regulations.

One feature that I wish existed is the option to completely erase your account on a particular website, along with the data they have collected from you (there can be a waiting time of a few months before they erase it from their servers). This feature can only be brought about by government regulation, because most websites don't have the incentive.


If they don't explicitly say in the ToS that they aren't going to use it in any particular way, then you can be sure that they are if there is potential for commercial gain.

Even if they do explicitly say that they aren't going to use it, I'm going to be sceptical. There will be a nice pile of caveats and exclusions within the legalese, and if not they might just use it anyway and hope they can afford to ride out any resulting legal action if people notice.


The fact that phones eavesdrop on conversations and turn those into relevant ads within minutes is reason enough for me to believe that they will use the data regardless of what their privacy policy says.


If we need to ask, don't we know the answer?


We know the answer. They gobble everything, and it's been going on for a while. It's not just MS. Anti-terrorism, online safety, AI training... At this point, what's the difference?


"If you need to ask, you're not going to understand it anyway."


Time to cancel my subscription to Microsoft’s online products. Never buying another Windows machine again. Apple or Linux only from here on out. Apple’s software and/or LibreOffice are more than enough.


All this privacy noise feels a lot like Whataboutism.

Firefox has telemetry and studies enabled by default. All while Mozilla calls out x, y, z for the same thing.


No one from the US is going to see this.


I'm in Utah, I saw it.


People work for MS in Europe.

There might be separate questions for different regions too?


How can we define "personal" data?

Is data from public LinkedIn accounts considered personal?

However, I believe that our Office 365 personal data should be prohibited from being used to train AI, as it is sensitive information.



