The goal of the retrieval bot is to offer a scalable framework for retrieval testing over the Filecoin network.
There is no centralized orchestrator to manage the retrieval queue or the work. Instead, the bot uses MongoDB both to manage the work queue and to store retrieval results.
The retrieval success ratio and count for each storage provider (SP), per day and per protocol, have been exported to result.zip.
The following query was used to generate the data:
db.task_result.aggregate([
  {
    $group: {
      _id: {
        sp: "$task.provider.id",
        type: "$task.module",
        date: {
          $dateToString: {
            format: "%Y-%m-%d",
            date: "$created_at",
            timezone: "UTC",
          }
        },
      },
      count: { $sum: 1 },
      success: { $sum: { $cond: [{ $eq: ["$result.success", true] }, 1, 0] } },
    }
  },
  {
    $project: {
      _id: 0,
      sp: "$_id.sp",
      type: "$_id.type",
      date: "$_id.date",
      success: "$success",
      total: "$count",
      ratio: { $divide: ["$success", "$count"] },
    }
  }
])

Workers are the units that consume the work queue. There are currently four basic types of workers.
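To make the aggregation query above easier to check against exported data, here is a minimal plain-JavaScript sketch of the same grouping. It assumes result documents shaped the way the query reads them (task.provider.id, task.module, result.success, created_at). The function name, sample provider ID, and module values are made up for illustration.

```javascript
// Plain-JavaScript sketch of the $group and $project stages of the query.
// summarize() is a hypothetical helper; the sample documents are invented.
function summarize(results) {
  const groups = new Map();
  for (const doc of results) {
    // Same key as the $group stage: provider, module, UTC calendar day.
    const date = doc.created_at.toISOString().slice(0, 10);
    const key = [doc.task.provider.id, doc.task.module, date].join("|");
    const row = groups.get(key) ??
      { sp: doc.task.provider.id, type: doc.task.module, date, success: 0, total: 0 };
    row.total += 1;
    // Count only results whose success flag is exactly true, like $eq.
    if (doc.result?.success === true) row.success += 1;
    groups.set(key, row);
  }
  // Same output shape as the $project stage, including the ratio.
  return [...groups.values()].map((r) => ({ ...r, ratio: r.success / r.total }));
}

const sample = [
  { task: { provider: { id: "f01000" }, module: "http" },
    result: { success: true }, created_at: new Date("2023-05-01T03:00:00Z") },
  { task: { provider: { id: "f01000" }, module: "http" },
    result: { success: false }, created_at: new Date("2023-05-01T21:30:00Z") },
  { task: { provider: { id: "f01000" }, module: "bitswap" },
    result: { success: true }, created_at: new Date("2023-05-02T00:10:00Z") },
];
const rows = summarize(sample);
console.log(rows);
```

Each returned row corresponds to one SP, protocol, and day, which is the granularity described for result.zip.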
The Bitswap worker currently only supports retrieving a single block from the storage provider:
- Look up the provider's libp2p protocols
- If the provider uses Boost markets, look up the supported retrieval protocols
- Find the Bitswap protocol info and make a single-block retrieval
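The lookup steps above can be sketched roughly as follows. This is not the bot's actual implementation: the protocol ID, the shape of the transports response, and the findTransport and queryTransports names are all assumptions made for illustration.

```javascript
// Assumed protocol ID for the provider's retrieval-transports query.
const TRANSPORTS_PROTOCOL = "/fil/retrieval/transports/1.0.0";

// libp2pProtocols: protocol IDs the provider advertises over libp2p.
// queryTransports: hypothetical callback that asks the provider which
//   retrieval transports it supports, returning [{ name, addresses }].
// wanted: transport name to look for, for example "bitswap".
function findTransport(libp2pProtocols, queryTransports, wanted) {
  if (!libp2pProtocols.includes(TRANSPORTS_PROTOCOL)) {
    return null; // provider does not expose the transports query
  }
  const match = queryTransports().find((t) => t.name === wanted);
  return match ?? null; // null when the wanted transport is not offered
}

// Usage with a stubbed provider response (all values are invented).
const advertised = ["/ipfs/id/1.0.0", TRANSPORTS_PROTOCOL];
const fakeQuery = () => [
  { name: "http", addresses: ["/dns/sp.example.com/tcp/443/https"] },
  { name: "bitswap", addresses: ["/dns/sp.example.com/tcp/8888"] },
];
const bitswap = findTransport(advertised, fakeQuery, "bitswap");
console.log(bitswap);
```

A worker would then attempt the single-block retrieval against the returned addresses, or record a failure when the result is null.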
The Graphsync worker currently only supports retrieving the root block from the storage provider:
- Make a Graphsync retrieval with a selector that matches only the root block
The HTTP worker currently only supports retrieving the first few MiB of a piece from the storage provider:
- Look up the provider's libp2p protocols
- If the provider uses Boost markets, look up the supported retrieval protocols
- Find the HTTP protocol info and retrieve up to the first few MiB
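A partial retrieval like this is typically expressed as an HTTP range request. The sketch below only builds the request description and does not contact a provider. The /piece/<pieceCid> path, the 4 MiB limit, the host name, and the function name are assumptions rather than details taken from the bot.

```javascript
const MiB = 1024 * 1024;

// Build a request description for the first `maxBytes` bytes of a piece.
// The "piece/<pieceCid>" path is an assumed endpoint layout.
function buildPartialPieceRequest(baseUrl, pieceCid, maxBytes) {
  const base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
  const url = new URL(`piece/${pieceCid}`, base);
  return {
    url: url.toString(),
    // HTTP byte ranges are inclusive, so the last index is maxBytes - 1.
    headers: { Range: `bytes=0-${maxBytes - 1}` },
  };
}

// Hypothetical endpoint and placeholder piece CID.
const req = buildPartialPieceRequest(
  "http://sp.example.com:7777", "PIECE_CID_PLACEHOLDER", 4 * MiB);
console.log(req.url, req.headers.Range);
```

A server that honors the range answers with status 206 and at most the requested bytes, which keeps each test cheap compared with downloading a whole piece.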
The fourth type of worker does nothing except save a random result to the database. It is used to test the database connection and the queue.
Integrations are the units that either push work items to the retrieval queue or run as long-running jobs that interact with the database in other ways.
The statemarketdeals integration periodically pulls statemarketdeals.json from the GLIP API and saves it to the database.
The Filplus integration (filplus_integration) pulls random active deals from the StateMarketDeals database and pushes Bitswap, Graphsync, and HTTP retrieval work items into the work queue.
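As a rough illustration of this fan-out, the sketch below filters active deals, samples a few at random, and emits one work item per protocol. The work item shape mirrors the task.module and task.provider.id fields read by the aggregation query. The deal field names, the activity rule, the module names, and all function names are assumptions.

```javascript
const MODULES = ["bitswap", "graphsync", "http"]; // assumed module names

// Assumed rule: a deal is active once its sector has started, while it
// has not been slashed (slashEpoch < 0) and has not yet ended.
function isActive(deal, currentEpoch) {
  return deal.sectorStartEpoch > 0 &&
    deal.slashEpoch < 0 &&
    deal.endEpoch > currentEpoch;
}

// Pick up to n items using a partial Fisher-Yates shuffle on a copy.
function pickRandom(items, n, random = Math.random) {
  const pool = [...items];
  const count = Math.min(n, pool.length);
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, count);
}

// One work item per sampled deal and per retrieval protocol.
function buildWorkItems(deals, currentEpoch, n, random = Math.random) {
  const active = deals.filter((d) => isActive(d, currentEpoch));
  return pickRandom(active, n, random).flatMap((deal) =>
    MODULES.map((module) => ({
      module,
      provider: { id: deal.provider },
      created_at: new Date(),
    })));
}

// Invented sample deals; the third one has already expired.
const deals = [
  { provider: "f01000", sectorStartEpoch: 100, endEpoch: 9000, slashEpoch: -1 },
  { provider: "f02000", sectorStartEpoch: 200, endEpoch: 9000, slashEpoch: -1 },
  { provider: "f03000", sectorStartEpoch: 300, endEpoch: 400, slashEpoch: -1 },
];
const items = buildWorkItems(deals, 1000, 5);
console.log(items.length);
```

In the real system these items would be inserted into the MongoDB-backed queue for the workers to consume, rather than returned as an array.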
- Set up a MongoDB server
- Set up a free ipinfo account and grab a token
- Run make build
- Run the software natively or via Docker with environment variables. You need to run three programs:
  - statemarketdeals, which pulls statemarketdeals.json from the GLIP API and saves it to the database. Check .env.statemarketdeals for environment variables.
  - filplus_integration, which queues retrieval tasks into a task queue. Check .env.filplus for environment variables.
  - retrieval_worker, which consumes the task queue and performs the retrieval. Check .env.retrievalworker for environment variables.
- All programs above load the .env file in the working directory, so you will need to copy the relevant environment variable file to .env
- When running retrieval_worker, make sure bitswap_worker, graphsync_worker, and http_worker are in the working directory as well.