Upload to the Community Archive!

Towards data sovereignty, and a backup of the twitter canon.

Sep 05, 2024

The Community Archive is a public database and API of the full twitter history of everyone who uploads their archive. It is also open source.

We believe there is immense value in preserving the history of subcultures on twitter. While Twitter charges exorbitantly for convenient access to our own data, we are getting people to request and download their data, and upload it to a common archive!

Benefits:

- Preserve the canon of important threads and discussions
- Enable cool applications like:
  - Semantic search over tweets
  - Producing books from tweet collections
  - Dashboards of your interests and takes
  - LLM-based digital anthropology
  - Tracing the diffusion of ideas over time
  - Migrating the canon to alternative platforms like Bluesky
  - And much more!

We've got the archive and API up and running, but that's just step one. Now comes the legwork: we're personally reaching out to old-school and central accounts and walking them through requesting and uploading their archives.

If you have any questions or need support, please reach out to me on twitter, and go request your archive!

FAQ

Uploading

How do I upload my archive?
1. Request your Twitter archive (you'll need to log in and verify via email)
2. Wait 1-2 days for the download link
3. Upload to our archive
Thank you for your persistence!
What can I do with it now? For now, you can only search across all the archives. This website is meant to let you upload, and to serve as a dashboard for the effort of gathering community archives. We're hoping other neat and complex applications will come from others building on this data, or from our own side projects.
What about my private data? Your private data never leaves your computer. We only access profile information, tweets, likes, and follows. Our code is open source and can be verified by anyone. For extra peace of mind, you can unzip your archive, delete the data/direct-messages.js file, and re-zip before uploading.
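If you'd rather not unzip and re-zip by hand, the step above can be scripted. This is a minimal sketch, not part of the archive's own tooling: the function name and the file names in the usage note are made up for illustration, and the only path it removes is the data/direct-messages.js file mentioned above.

```python
import zipfile

# Path named in the FAQ answer above. If your archive stores direct messages
# in other files as well, this sketch will not catch them.
DM_FILE = "data/direct-messages.js"


def strip_direct_messages(src_zip: str, dst_zip: str, skip: str = DM_FILE) -> list[str]:
    """Copy src_zip to dst_zip, leaving out `skip`. Returns the names dropped."""
    dropped = []
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == skip:
                dropped.append(item.filename)
                continue
            # Each member is read fully into memory, which is simple but can
            # be slow for archives with large media files.
            dst.writestr(item, src.read(item.filename))
    return dropped
```

For example, `strip_direct_messages("twitter-archive.zip", "twitter-archive-no-dms.zip")` (both file names hypothetical) writes a new zip and leaves the original untouched. If the returned list is empty, nothing matched and the copy is identical in content to the original.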
Can I filter which of my tweets are added to the archive? Not currently, but we're planning a UI to filter by date or keyword.
How can I support the archive? We have a monthly donation you can subscribe to, and a one-time donation link!

Thanks for reading and go request your archive today!

Oct 4, 2024

Technical part of the community archive post:

# Building

How does the API work?

- Upload through our website

- The API is read-only

- Access via curl requests or Supabase client libraries

- See our README for more details and examples
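To give a feel for what a read-only query could look like without a client library, here is a rough sketch that builds a request against the REST endpoint Supabase generates for a project. Everything specific in it is an assumption: the base URL and key are placeholders, and the table and column names are hypothetical. The README has the real values.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholders, not real values: take the project URL and public key from the README.
BASE_URL = "https://YOUR-PROJECT.supabase.co"
ANON_KEY = "YOUR-ANON-KEY"


def build_query(table: str, filters: dict[str, str], limit: int = 10) -> Request:
    """Build a GET request for Supabase's auto-generated REST endpoint."""
    # Filters use the "column=operator.value" form, e.g. {"account_id": "eq.123"}.
    params = {"select": "*", **filters, "limit": str(limit)}
    url = f"{BASE_URL}/rest/v1/{table}?{urlencode(params)}"
    headers = {"apikey": ANON_KEY, "Authorization": f"Bearer {ANON_KEY}"}
    return Request(url, headers=headers, method="GET")


# Hypothetical table and column names, for illustration only.
req = build_query("tweets", {"account_id": "eq.12345"}, limit=5)
print(req.full_url)
```

Once the placeholders are filled in, passing `req` to `urllib.request.urlopen` and decoding the body with `json.load` should return a list of rows. Since the API is read-only, only GET requests are sketched here.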

Will you offer full dumps of the data? Yes.

Will the API always be free? It'll remain free as long as we can sustainably run it. While we reserve the right to discontinue the project, we'll make every effort to distribute full data dumps to interested parties.

Are you storing pictures too? Not yet due to storage costs, but we'd love to in the future.

What are some ideas for projects using this data?

- Semantic search over tweets.

- Self-knowledge applications, like analyzing your trends over time or summarizing your thoughts on the topics you talk about most.

- Digital anthropology and sociology research ("what are the origins of this idea / project / movement?", "how did these people start interacting?").

- Cross-archive search, like "which of my friends are talking about this topic?"

- Producing books and other artifacts from your best tweets.

- Migrating to alternative networks like Bluesky, which gets easier if the canon of important tweets is easy to move.

- If someone is active enough, you could produce a "digital twin" from their archive, e.g. by fine-tuning an LLM on their tweets, or just by selecting the best ones and putting them in context.
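As a starting point for the search ideas above, here is a toy sketch of cross-archive search. It ranks tweets by cosine similarity over plain word counts, which is a stand-in for real semantic search (that would use embeddings instead). The tweet shape, a dict with "author" and "text" keys, is an assumption for illustration and not the archive's actual schema.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) \
        * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, tweets: list[dict], top_k: int = 3) -> list[tuple[float, dict]]:
    """Return up to top_k (score, tweet) pairs sharing words with the query."""
    q = tokenize(query)
    scored = [(cosine(q, tokenize(t["text"])), t) for t in tweets]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Running `search("data sovereignty", tweets)` over tweets pulled from several archives and reading off the "author" of each hit gives a crude answer to "which of my friends are talking about this topic?". Swapping `tokenize` and `cosine` for an embedding model would turn this into semantic search proper.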

# Logistics

Will people have to keep re-uploading their archives to stay up to date? Yes, for now. We're focused on preserving historical data. Future ideas:

- Organize yearly "upload parties"

- Develop a browser extension for automated updates

How hard is it to get tweets otherwise? Twitter severely limits access. Current rates: $5000 for 1M tweets (far fewer than we aim to preserve, at a much higher cost).
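To make that rate concrete, here is the arithmetic as a tiny sketch. It assumes the price scales linearly with volume, which is a simplification. Only the $5000 per 1M tweets figure comes from the text above.

```python
# Rate quoted above: $5000 for 1M tweets.
RATE_USD = 5000
RATE_TWEETS = 1_000_000


def cost_usd(n_tweets: int) -> float:
    """Estimated cost of n_tweets at the quoted rate, assuming linear pricing."""
    return n_tweets * RATE_USD / RATE_TWEETS
```

At this rate a single tweet works out to half a cent, so `cost_usd(200)` is one dollar.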


Tasshin Fogleman

Sep 8, 2024

doing the lord's work, bless ❤️


Fractal Sidequests
