Design

PaStA's integration with Patchwork can be divided into some broad milestones:

  1. An initial import of mail archives from Patchwork to PaStA.
  2. A method for PaStA to stay in sync with Patchwork as Patchwork keeps receiving patches.
  3. Pushing the results of PaStA's computations back to Patchwork.

1. Initial Import

Milestones 1 and 2 are deliberately kept as two independent steps rather than merged into one. We expect the initial import of patches to be a heavy task, since tens to hundreds of GB of patch archives will need to be imported. We therefore want more control over the initial import and choose to run it as a separate step.

The initial import from Patchwork will be carried out using Patchwork's dumparchive command.

2. PaStA Sync

Patchwork IDs will be used to keep track of which patches have arrived and which still need to be pulled. We don't rely on message dates, as mails can arrive out of order.

PaStA knows the highest Patchwork ID in its patch database. It will then ask Patchwork for all patches whose Patchwork ID is greater than the highest Patchwork ID in PaStA. There are two ways of doing this:

A choice between the two methods will need to be made after some performance analysis. Another factor to consider is patches arriving while PaStA is pulling patches from Patchwork. At this stage we simply ignore this case.
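
Independent of which method is chosen, the ID-based selection itself can be sketched as follows. This is only an illustration, not PaStA's implementation: the function names, the dict-shaped patch records with an "id" key, and the use of 0 as the "nothing imported yet" value are all assumptions.

```python
from typing import Iterable, List, Mapping


def highest_patchwork_id(local_ids: Iterable[int]) -> int:
    """Return the highest Patchwork ID known locally.

    Assumption: 0 is returned for an empty database, on the premise
    that real Patchwork IDs are positive integers.
    """
    return max(local_ids, default=0)


def select_new_patches(remote: Iterable[Mapping], local_ids: Iterable[int]) -> List[Mapping]:
    """Pick the remote patches whose ID is above the local maximum.

    Only the Patchwork ID is compared; message dates are never consulted,
    since mails can arrive out of order.
    """
    threshold = highest_patchwork_id(local_ids)
    new = [patch for patch in remote if patch["id"] > threshold]
    # Sort by ID so that patches are processed in a stable order.
    return sorted(new, key=lambda patch: patch["id"])


# Hypothetical records, listed in the order the server happened to return them.
remote_patches = [{"id": 7}, {"id": 3}, {"id": 9}]
new_patches = select_new_patches(remote_patches, local_ids=[1, 3, 5])
```

Here new_patches contains the records with IDs 7 and 9. Patches that arrive while the pull is running are not handled by this sketch.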

3. PaStA Push

todo

Implementation

PaStA has a MailContainer class that is currently inherited by MboxRaw and PubInbox, which handle raw mailboxes and public inboxes respectively. We treat Patchwork as another kind of mail container for PaStA and introduce a new subclass of MailContainer, MboxPatchwork.

The PaStA configuration file for integrating with Patchwork will look something like this:

[mbox.patchwork]
uri = <patchwork-url>
token = <api-token>
project_ids = [1, 3, 5] # list of project ids in a Patchwork instance that PaStA will interact with.

initial_archives = [(1, '/path/to/archive/1'), (3, '/path/to/archive/3'), (5,'/path/to/archive/5')]

A Patchwork project generally keeps track of a single mailing list. A MboxPatchwork instance will be created for each project id defined in the configuration.
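
A minimal sketch of the one-instance-per-project idea is shown below. The MailContainer class here is a stand-in, since PaStA's real base class interface is not described in this document. The constructor arguments, the describe method, the build_containers helper, and the example URL are all hypothetical.

```python
from typing import Dict, List, Optional


class MailContainer:
    """Stand-in for PaStA's MailContainer; the real interface may differ."""

    def describe(self) -> str:
        raise NotImplementedError


class MboxPatchwork(MailContainer):
    """Hypothetical mail container bound to a single Patchwork project."""

    def __init__(self, uri: str, project_id: int, token: Optional[str] = None):
        self.uri = uri
        self.project_id = project_id
        self.token = token

    def describe(self) -> str:
        return f"patchwork project {self.project_id} at {self.uri}"


def build_containers(config: Dict) -> List[MboxPatchwork]:
    """Create one MboxPatchwork per project id listed in the configuration."""
    return [
        MboxPatchwork(config["uri"], project_id, config.get("token"))
        for project_id in config["project_ids"]
    ]


# Values mirror the [mbox.patchwork] section; the URL is a placeholder.
config = {
    "uri": "https://patchwork.example.org",
    "token": "api-token",
    "project_ids": [1, 3, 5],
}
containers = build_containers(config)
```

With this configuration, containers holds three MboxPatchwork objects, one each for projects 1, 3 and 5.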

If initial_archives is set, PaStA will treat the run as an initial import from Patchwork and recreate the index files and the patch directory structure for each project from its respective mailbox.

If initial_archives is not set, PaStA will treat the run as an update instead of an initial import and will start pulling patches from Patchwork for the given projects. PaStA will update both the index files and the patch directory structure for the newly pulled patches.
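
The choice between the two modes can be sketched as a small planning function. The function name, the action tuples, and the decision to raise an error when a project has no archive are assumptions made for illustration. The document does not specify how a missing archive should be handled.

```python
from typing import Iterable, List, Optional, Tuple

# (mode, project id, archive path or None)
Action = Tuple[str, int, Optional[str]]


def plan_actions(
    project_ids: Iterable[int],
    initial_archives: Optional[Iterable[Tuple[int, str]]] = None,
) -> List[Action]:
    """Decide per project whether to import from an archive or to update."""
    project_ids = list(project_ids)
    if initial_archives:
        archives = dict(initial_archives)
        # Assumption: every configured project needs its own archive.
        missing = [pid for pid in project_ids if pid not in archives]
        if missing:
            raise ValueError(f"no initial archive for projects: {missing}")
        return [("import", pid, archives[pid]) for pid in project_ids]
    # No archives configured: pull new patches from Patchwork instead.
    return [("update", pid, None) for pid in project_ids]


import_plan = plan_actions([1, 3], [(1, "/path/to/archive/1"), (3, "/path/to/archive/3")])
update_plan = plan_actions([1, 3])
```

In both modes the index files and the patch directory structure would then be written per project. That part is left out of the sketch.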

USE_PATCHWORK_ID will be removed from the configuration. Instead, Patchwork IDs will be added to the index file to facilitate mapping message IDs to Patchwork IDs. Note that a message ID can be associated with a list of patches in the index (multiple mails with slightly different content may have the same message ID). Each of these patches will be associated with a Patchwork ID.
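
The one-to-many relation between message IDs and Patchwork IDs can be sketched as an in-memory mapping. The on-disk format of the index file is not covered here, and the function name and the sample message IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_index(entries: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Map each message ID to the list of Patchwork IDs that carry it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for message_id, patchwork_id in entries:
        # The same message ID may appear several times with different patches.
        if patchwork_id not in index[message_id]:
            index[message_id].append(patchwork_id)
    return dict(index)


# Two mails share the message ID <a@example.org> but are distinct patches.
entries = [("<a@example.org>", 10), ("<b@example.org>", 11), ("<a@example.org>", 12)]
index = build_index(entries)
```

Looking up index["<a@example.org>"] yields both Patchwork IDs, 10 and 12, while "<b@example.org>" maps to the single ID 11.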