Hi HN, I have been working on something directly related to AI and copyright. Would it be ok to point it out here?
Recently The Pile was taken offline from The Eye by DMCA. One solution is to host it offshore, which we're calling The Nose: https://thenose.cc
The technical security measures may be of interest to the audience here, so I'll be as detailed as possible. The following formula should be safe if you follow it to the letter.
The basic setup is to install Whonix on a VeraCrypt drive, acquire Monero through any method, use a service like ChangeNOW to convert it to Bitcoin in a wallet stored only on the Whonix installation, sign up for a ProtonMail account (when they ask for email verification, use a no-signup inbox service like YOPmail), rent a dedicated server at Shinjiru using Bitcoin, and register the domain at the same place. They're both a registrar and a server host, which simplifies matters. Use N/A for all contact info. Use Cloudflare to manage your site's DNS records.
Wallet security: never move Bitcoin to any wallet linked with your personal identity. This is easier said than done. First there is the question of how to store passwords. These are the keys to the kingdom, and are the most sensitive aspect by far, because they're intimately linked with you. Additionally, if you store them only on the Whonix drive, a hardware failure means you lose everything. My setup is to use KeePass to store the passwords on a laptop I use to VNC into the computer with the Whonix drive, and then save the database to a folder that gets synced to the cloud. The only flaw in this model is that if your laptop is compromised while your KeePass is open, you're done. But (as Ulbricht discovered) this is always true. The threat model assumes lawyers coming after you with DMCA notices, with additional safeguards against the FBI narrowing down who you are in real life. If your physical location is compromised through any method, you're done.
All it takes is one mistake to end you. SSH into your box from your real computer? Done. Sign up using your real name with Mailgun? Done. Accidentally say "Thanks, <your real name>" to the support staff at Shinjiru in an email? Done. Abandon ship and close everything down.
The security of this technique comes down to simplicity. There are very few moving parts. I opted for nginx + MediaWiki with Discourse forums at https://forums.thenose.cc (though I don't know if anyone will care enough to join). Logging is turned off to protect users downloading the data, though you only have my word on this. But reputation is the only thing a hacker has ever truly had anyway.
If you're serious about following the above recipe, I urge you to read through the Whonix docs on online anonymity (https://www.whonix.org/wiki/Documentation). Remember, your threat model is your saving grace. You probably aren't starting a darknet, so you can relax your threat model in terms of physical safety. But you won't get away with any mistakes made in cyberspace.
As for the site itself, I've avoided asking for donations for now (hosting is $130/mo though, which will get expensive) or describing anything beyond this HN comment. I'll say it's for simplicity, but in fact I only started it a few days ago and haven't had time to provide anything but the essence of our service: hosting AI datasets in stable, copyright-resistant ways.
If additional datasets beyond The Pile need protection or distribution, you can contact me at nostril@thenose.cc or at https://forums.thenose.cc. I have a 4TB drive, of which 800GB is being used by The Pile so far.
You can try joining the EleutherAI Discord. They are the people who created The Pile, iirc. It's very active and I think you would be able to get in touch there.