How to build a copy of Vine using IPFS

In September 2017, my team at TheoremOne embarked on a research project into distributed applications. We set our goal to build a distributed clone of Vine, the short-form video sharing app:

Vine (/ˈvaɪn/) was a short-form video hosting service where users could share six-second-long looping video clips.
Vine @ Wikipedia

What is the Distributed Web?

This presentation by Brewster Kahle does a great job explaining what the Distributed Web is; it’s only 8 minutes long, so go ahead and watch it first, and then come back to the article:

To recap, the Distributed Web

is a web that is solid, reliable, private and fun
will be an evolution of the current web that will fix many of the shortfalls of our current cloud computing paradigm
will flip data ownership and the associated power to monetize that data from large companies to users

The distributed web will rely on two players: peers/nodes and distributed apps.

Peer-to-peer/node-to-node (P2P)

The Distributed Web relies on devices connecting directly to other devices, which can be referred to as peers or nodes. Anyone who has used BitTorrent has already experienced a distributed network. Unlike downloading a single large video from YouTube, small pieces of the target file can be downloaded from multiple peers at once.

Distributed Apps (Dapps)

Distributed Apps are apps built on this new distributed protocol stack. Such apps communicate directly with other peers/clients in the network, and may store most data they generate locally.

This means that in the Distributed Web, data lives separately from apps, and so the network effects aren’t owned by a single entity. We own our own data, and if one app starts doing something we don’t like, someone can make a new app that uses the same data to get around the problem. This is similar to how email clients work: same data, different interface. The data in the Distributed Web can also be private: Dapps can choose to never store data on third-party servers.

What about Blockchain?

If you’re wondering how Blockchain fits into all of this, the answer is that it’s optional.

Blockchain gives us distributed consensus via an auditable, distributed ledger. Anywhere a client↔server app needs server-side processing—data validation, financial transactions, any kind of cheating-prevention—a Dapp needs distributed consensus. Blockchain is a tool we can use to mediate trustworthy interactions amongst a network of untrusted peers.

Some simple kinds of Dapps, however, will not require any cheating-prevention. Such Dapps can be built without Blockchain.

Planning our Dapp

To recap, we knew we wanted to build a Dapp. This meant (at a high level) that:

each node should have its own local stream of data
users should be able to create new files (GIFs in our case) in their personal stream
other nodes should be able to consume the data from the streams and, as with BitTorrent, help redistribute the content to other nodes

In addition, we had some other design goals for our Dapp:

No downloaded apps: our app must work entirely in the browser, rather than requiring a separate install
No centralization: our app must work in a fully-decentralized, peer-to-peer paradigm, with no reliance on centralized servers for any functionality

This constrained us a bit. Vine stored its clips as video files, but our app wouldn’t be able to use video files at all: different browsers create videos in different formats, and we did not want to use a centralized server to transcode the files. The GIF format, though very inefficient, is universally supported, so we would have to shoot and share GIFs.

The research

We knew what to look for:

Share data in a P2P fashion between peers.
Maintain a shared data structure between peers.
Receive real-time updates from peers posting new content.
Offline-first capabilities: work while offline (or in an intermittent connectivity state).
Be able to publish local updates and sync remote updates when the device comes back online.

We knew we were not going to need to implement the lowest level details; many teams have already been hard at work laying the foundations of the Distributed Web for some time now. They’ve poured a massive amount of resources, time, and emotion into their work. We appreciate all they’ve done and want to build off of it, not reinvent it.

Below are the projects we found building protocols and tools for the Distributed Web.

IPFS

A peer-to-peer hypermedia protocol to make the web faster, safer, and more open.

It makes it possible to distribute high volumes of data with high efficiency.
It provides mechanisms for zero duplication of data that translates into savings in storage costs.
It provides historic versioning (like git) and makes it simple to set up resilient networks for mirroring data.
It remains true to the original vision of the open and flat web and delivers technology that makes it possible to build apps that make that vision a reality.
It powers the creation of diversely resilient networks which enable persistent availability with or without Internet backbone connectivity.
It aims to replace HTTP and build a better web for all of us.
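
The deduplication claim above follows from content addressing: a file’s identifier is derived from its bytes, so identical content maps to the same key and only needs to be stored once. Here is a minimal sketch of the idea, assuming a plain SHA-256 hex digest as the key and a hypothetical in-memory ContentStore class. Real IPFS identifiers use a different encoding and large files are split into blocks, so treat this as an illustration rather than how IPFS is implemented:

```javascript
const crypto = require('crypto') // Node.js built-in module

// Hypothetical in-memory store, for illustration only.
class ContentStore {
  constructor () {
    this.blocks = new Map()
  }

  // Store the bytes under a key derived from the bytes themselves.
  add (data) {
    const bytes = Buffer.from(data)
    const key = crypto.createHash('sha256').update(bytes).digest('hex')
    // Identical content produces an identical key, so it is kept only once.
    if (!this.blocks.has(key)) this.blocks.set(key, bytes)
    return key
  }

  get (key) {
    return this.blocks.get(key)
  }
}

const store = new ContentStore()
const first = store.add('same GIF bytes')
const second = store.add('same GIF bytes')
const other = store.add('different GIF bytes')
```

Here first and second are the same key, and the store holds two blocks rather than three.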

Dat Project

Dat is the distributed data sharing tool.

Distributed Sync: Dat syncs and streams data directly between devices, putting you in control of where your data goes.
Efficient Storage: Data is deduplicated between versions, reducing bandwidth costs and improving speed.
Data Preservation: Dat uses Secure Registers with state of the art cryptography to ensure data is trusted, archived, and preserved.

Secure Scuttlebutt

Secure Scuttlebutt is a database of unforgeable append-only feeds, optimized for efficient replication for peer to peer protocols.

It provides tools for creating a feed, posting messages to that feed, verifying a feed was created by someone else, streaming messages to and from feeds. Unforgeable means that only the owner of a feed can modify that feed, as enforced by digital signing.
It is useful for peer-to-peer applications.
It makes it easy to encrypt messages.
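
To make "unforgeable append-only feed" a bit more concrete, here is a rough sketch of the general technique: each entry is signed by the feed owner and links back to the previous entry. This is not Secure Scuttlebutt’s actual message format or API. The appendMessage and verifyFeed helpers and the entry shape are assumptions made up for this illustration, built on Node.js’s crypto module with Ed25519 keys:

```javascript
const crypto = require('crypto') // Node.js built-in module

const sha256 = text => crypto.createHash('sha256').update(text).digest('hex')

// Append a signed entry that links back to the previous entry's id.
function appendMessage (feed, content, privateKey) {
  const previous = feed.length > 0 ? feed[feed.length - 1].id : null
  const body = JSON.stringify({ previous, content })
  const signature = crypto
    .sign(null, Buffer.from(body), privateKey)
    .toString('base64')
  return [...feed, { id: sha256(body + signature), body, signature }]
}

// Check every signature and every link in the chain.
function verifyFeed (feed, publicKey) {
  let expectedPrevious = null
  for (const entry of feed) {
    const signed = crypto.verify(
      null,
      Buffer.from(entry.body),
      publicKey,
      Buffer.from(entry.signature, 'base64')
    )
    const linked = JSON.parse(entry.body).previous === expectedPrevious
    const idMatches = entry.id === sha256(entry.body + entry.signature)
    if (!signed || !linked || !idMatches) return false
    expectedPrevious = entry.id
  }
  return true
}

const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519')
let feed = appendMessage([], 'first post', privateKey)
feed = appendMessage(feed, 'second post', privateKey)
```

Anyone holding the public key can call verifyFeed(feed, publicKey) to check the feed, but only the holder of the private key can produce entries that pass the check.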

Blockstack

Blockstack is a new decentralized internet where users own their data and apps run locally.

Own Your Data. It’s kept on your device and encrypted before being backed up in the cloud. This removes the need for blind trust in 3rd parties and makes it easier to keep your data safe.
Own Your Apps. Apps are loaded via a secure domain name system and live on your devices. Independence from 3rd parties makes you more safe.
Own Your Identity. Your digital keys are seamlessly generated and kept on your device. This lets you move freely between apps and control your online experience.

ZeroNet

Open, free and uncensorable websites, using Bitcoin cryptography and BitTorrent network.

Real-time updated sites.
Password-less BIP32 based authorization: Your account is protected by the same cryptography as your Bitcoin wallet.
Built-in SQL server with P2P data synchronization: Allows easier site development and faster page load times.

Ultimately, we chose to work with IPFS.

Thicket

For anyone interested, this is our original research document: Thicket Research.

My colleague Chad invited me to try out the IPFS PubSub Room demo. We were able to chat with each other using the Distributed Web as the backbone. I was shocked: it was working!

Then we tried some hacking: we cloned the js-ipfs repo and tweaked their examples to see if we could post and share our GIFs via their network. It worked!

We had found a network that provided tools that we could use for our experiment.

Laying the groundwork

The basic steps we took to get our Dapp started:

Set up an IPFS node instance

import IPFS from 'ipfs'

// A random repo path gives each instance its own fresh repository
const node = new IPFS({ repo: String(Math.random() + Date.now()) })

node.once('ready', () => { console.log('IPFS node is ready') })

Add files to the network

const gifSrc = 'GIF’s data url in a variable'

const res = await node.files.add(Buffer.from(gifSrc))

Here, res[0].hash holds the IPFS unique identifier for the file you submitted. To test the file you just added: https://ipfs.io/ipfs/${res[0].hash}
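
Note that Buffer.from(gifSrc) stores the text of the data URL, so the gateway link returns that text rather than an image. If you would rather store the raw GIF bytes, one option is to decode the base64 payload before adding it. Here is a sketch, assuming the data URL is base64-encoded; dataUrlToBuffer and bufferToDataUrl are hypothetical helpers of ours, not part of js-ipfs:

```javascript
// Hypothetical helpers; they assume a base64 data URL such as
// 'data:image/gif;base64,R0lGODlh...'
function dataUrlToBuffer (dataUrl) {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl)
  if (!match) throw new Error('Expected a base64-encoded data URL')
  return { mimeType: match[1], bytes: Buffer.from(match[2], 'base64') }
}

function bufferToDataUrl (bytes, mimeType) {
  return `data:${mimeType};base64,${bytes.toString('base64')}`
}

// 'R0lGODlh' is the base64 encoding of the ASCII bytes 'GIF89a'
const decoded = dataUrlToBuffer('data:image/gif;base64,R0lGODlh')
const roundTrip = bufferToDataUrl(decoded.bytes, decoded.mimeType)
```

With this in place, decoded.bytes could be passed to node.files.add instead of Buffer.from(gifSrc), and a consuming peer could rebuild a data URL with bufferToDataUrl for display.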

Fetch files from the network

const stream = await node.files.cat(res[0].hash)

Here, stream holds the value originally added, like the GIF’s data url. How res[0].hash ends up on the consuming node and not just the producing node is an exercise for the reader.

Find peers in the network

An experimental feature is needed to achieve this:

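import Room from 'ipfs-pubsub-room'

const node = new IPFS({
  repo: String(Math.random() + Date.now()),
  EXPERIMENTAL: { pubsub: true } // pubsub must be enabled for rooms to work
})

node.once('ready', () => {
  // pass the node created above; peers using the same room name join the same room
  const room = Room(node, 'ipfs-pubsub-demo')

  room.on('peer joined', peer => console.log(`We found a peer: ${peer}`))
  room.on('peer left', peer => console.log(`A peer left: ${peer}`))
})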

Any two nodes joining the same room can exchange messages.

// receive
room.on('message', msg =>
  console.log(`Got a message from ${msg.from}: ${msg.data.toString()}`))

// broadcast
room.broadcast('Every peer in the room will receive this message')
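
One way to get a file’s hash from the producing node to the consuming nodes is to broadcast it in the room. Messages arrive as bytes, so a small envelope helps peers tell announcements apart from other traffic. Here is a sketch, assuming a JSON envelope; the type field, its 'gif-added' value, and both helper names are made up for this illustration:

```javascript
// Wrap a hash in a small JSON envelope before broadcasting it.
function encodeAnnouncement (hash) {
  return JSON.stringify({ type: 'gif-added', hash })
}

// Return the announced hash, or null for anything we don't recognize.
function decodeAnnouncement (data) {
  try {
    const parsed = JSON.parse(data.toString())
    const valid = parsed !== null &&
      parsed.type === 'gif-added' &&
      typeof parsed.hash === 'string'
    return valid ? parsed.hash : null
  } catch (err) {
    // Not JSON at all, for example a plain chat message
    return null
  }
}
```

A producer could then call room.broadcast(encodeAnnouncement(res[0].hash)), and a consumer could pass msg.data to decodeAnnouncement inside its 'message' handler and fetch the file whenever the result is not null.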

Sync a shared data structure

Keeping a shared data structure in sync between apps (client apps can be frontend, backend, or a mixture of both) can be a tricky thing. Fortunately for us, a tool called YJS exists and takes care of syncing and computing the updated state of the data.

Even better, a module that works with YJS and integrates with IPFS exists: y-ipfs-connector. Right on target:

import Y from 'yjs'
import yIPFSConnector from 'y-ipfs-connector'
import yMemory from 'y-memory'
import yArray from 'y-array'

Y.extend(yMemory, yArray, yIPFSConnector)

// create IPFS node instances
const ipfs = new IPFS({
  EXPERIMENTAL: {
    pubsub: true // need this to work
  }
})

const y = await Y({
  db: { name: 'memory' },
  connector: {
    name: 'ipfs', // use the IPFS connector
    ipfs: ipfs, // instance from above
    room: 'ipfs-shared-data-structure-demo',
  },
  share: { list: 'Array' }
})

y.share.list now holds the shared data structure (array in this case). In order to add to or delete from the array, any peer can use the corresponding methods:

y.share.list.push(['data to be inserted'])

y.share.list.delete(indexToBeDeleted)

To receive updates, peers need to observe the object:

y.share.list.observe(() => {
  // when this method is executed,
  // the updated and computed state of the array can be accessed via
  y.share.list.toArray()
})
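
Inside the observer, one simple way to decide which GIFs to fetch or drop is to compare the new array state with the previous one. Here is a sketch, assuming each array entry is an IPFS hash string; diffPublications is a hypothetical helper of ours, not part of YJS:

```javascript
// Compare two snapshots of the shared array and report what changed.
function diffPublications (previous, current) {
  const before = new Set(previous)
  const after = new Set(current)
  return {
    added: current.filter(hash => !before.has(hash)),
    removed: previous.filter(hash => !after.has(hash))
  }
}

// Example: one publication was deleted and a new one was posted
const changes = diffPublications(['QmA', 'QmB'], ['QmB', 'QmC'])
```

In the observer you would keep the last result of y.share.list.toArray() around, diff it against the new one, fetch the added hashes and discard the removed ones.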

The groundwork was laid. Time to move deeper into Dapp product design challenges.

A mind shift: authentication-free Communities

Things were working out for us! (Mostly.)

What we had achieved so far:

Shoot GIFs (this proved to be really fun)
Store GIFs locally
Add GIFs to the IPFS network
Retrieve GIFs from the IPFS network
Connect peers together in namespaced rooms
Peers in the same room shared an array-like structure for their publications
Peers were receiving real-time updates of other peers inserting/deleting publications into the shared array

Around this time, Sam joined our team and designed initial wireframes, helping us think through the details of how our app could look and function. Later Sarah joined the team and continued designing a beautiful product–we loved it and couldn’t wait to bring it to life. Our Distributed Web experiment could feel like a real app instead of a toy.

And at this point, we ran into a challenge: we had not yet introduced a user identity layer. Any peer could delete any publication–even publications that had been posted by someone else. A user identity layer was an upcoming item in our to-do list, but we had to make a choice: we either researched how to implement a user identity layer, or we worked on the UI for an app using what we had come up with so far.

We found many existing libraries/protocols for building user identity: MetaMask, Blockstack and SSB. Distributed identity is a fascinating topic; if you want to dig deeper we recommend this Identity Panel discussion from Blockstack Summit 2017.

Ultimately, though, we decided that implementing identity was too risky for our timeline. We wanted to make sure we shipped a minimal viable product quickly.

Still, we had to have a way for anonymous users to create content that wouldn’t pollute the app experience for anyone else. We had to give users control over the data they saw and created within Thicket.

Our solution: Communities.

A Thicket Community nicely mirrors a real-world community. In Thicket, a Community is a group of peers that share resources and responsibility for their Community’s content. Each member downloads all the content and stores it locally (shares disk space resource) and redistributes such content to any other peer fetching it (shares bandwidth resource). Each member is responsible for the content posted to and deleted from their Community, and must be mindful of what new members they allow to join the Community.

In Thicket, if you are part of a Community, you are not alone. The Community’s got your back: any content you post to the Community gets downloaded and stored locally on the devices of other members. When you go offline, newly invited Community members can still get your content from any other Community member who has the app open.
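
The availability guarantee can be modeled in a few lines: a publication stays reachable as long as at least one member who stores it is online. This is a toy model, not Thicket’s actual code. The member shape (name, online, hashes) and the reachableContent helper are assumptions for this illustration:

```javascript
// members: [{ name, online, hashes: [...] }] (hypothetical shape)
function reachableContent (members) {
  const reachable = new Set()
  for (const member of members) {
    if (!member.online) continue // offline members can't serve anything
    for (const hash of member.hashes) reachable.add(hash)
  }
  return reachable
}

// The author went offline, but a friend who replicated the GIF is online
const community = [
  { name: 'author', online: false, hashes: ['QmGif1'] },
  { name: 'friend', online: true, hashes: ['QmGif1'] }
]
const stillAvailable = reachableContent(community).has('QmGif1')
```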

Thicket & Next steps

Thicket

This process began with our idea to research tools and protocols for the Distributed Web and we ended up delivering a Dapp that uses IPFS’s protocols and tools to interact with the Distributed Web.

This research experiment can be used as a starting point for future work on decentralized social networks. We’ve taken a first step, uncovered some of the problems, and figured out how to make things work with the tools we chose. Now it is up to you to continue building the layers for the distributed web: Thicket.