Importing email archives


#1

We could, in principle, import all of the developers list’s archives. That would require some work. The easiest way would be to simply break the current archives into chunks of maximum allowed length (would amount to roughly 10 chunks per year, I think) and post each chunk as a single post. This will not be pretty since it won’t be categorized, but I think it will be searchable.
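To make the chunking idea concrete, here is a minimal Python sketch. It assumes the archive is available as one plain-text string; `chunk_archive` and `max_len` are hypothetical names, and the real limit would be whatever maximum post length the forum is configured with.

```python
def chunk_archive(text: str, max_len: int) -> list[str]:
    """Split text into chunks of at most max_len characters, breaking at line ends."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        # A single over-long line is hard-split so that no chunk exceeds max_len.
        while len(line) > max_len:
            if current:
                chunks.append("".join(current))
                current, size = [], 0
            chunks.append(line[:max_len])
            line = line[max_len:]
        # Start a new chunk when adding this line would overflow the current one.
        if size + len(line) > max_len:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks
```

Joining the chunks back together reproduces the original text, so nothing is lost; the cost is that a chunk boundary can fall in the middle of an email.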

Concern: people’s email addresses are there (in the form of “developer at domain”), so we may need to clean those out first…
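As a rough illustration of the cleanup, a Python sketch along these lines could replace both the obfuscated “developer at domain” form and plain `user@domain` addresses. The pattern and the placeholder string are assumptions for illustration, not something taken from the archive tooling.

```python
import re

# Matches "user@example.org" as well as the obfuscated "user at example.org".
# Requiring a dot in the domain part keeps ordinary phrases like "meet at noon" intact.
EMAIL_RE = re.compile(r"[\w.+-]+(?:@| at )[\w-]+(?:\.[\w-]+)+")


def scrub_addresses(text: str, placeholder: str = "[address removed]") -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub(placeholder, text)
```

A pattern like this can over-match: a sentence such as “look at example.com” would also be replaced, so the output would need a spot check before anything is posted.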

Other concerns? My biggest concern is that it is going to be messy (trying to import)…

A poll for everyone to vote, anonymously.

  • Importing developers list archives is crucial, even if it is messy and hard to do. I would not use the forum otherwise.
  • Importing developers list archives is not crucial and I would use the forum anyway; important stuff will naturally emerge here at some point.



#2

The mailing list is already public, so users who have posted have their email addresses out there anyway. But before any import we should probably give such a disclaimer on the developers list itself.

Also, I’m not in favor of the “chunk” method. I suppose you saw this thread? https://meta.discourse.org/t/import-mailman-archives-into-discourse/18537

I kinda feel that some sort of import or integration with the mailing list is important, otherwise developers might just see this as unnecessary fragmenting of our support forums.


#3

Yes, I saw it. So, do we have the .mbox files for the archives? (I only found txt files.) In that case, we can try using/modifying the script that is mentioned there, although it does not seem like it will be straightforward.

some sort of import or integration with the mailing list is important, otherwise developers might just see this as unnecessary fragmenting of our support forums

Yes, this is probably true. I mean, the “unnecessary” part I think is perhaps debatable (given the usability/discoverability of the existing stuff), but the “fragmenting” part is definitely true.


#4

do we have the .mbox files for the archives? (I only found txt files.)

mbox is a file format for storing email (a single file holding many messages). It’s probably on the server hosting the lists, but it may be possible to clone it with something like mbsync or offlineimap.
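If we do get hold of an mbox file, Python’s standard `mailbox` module can read it. The sketch below groups messages into threads using the `Message-ID` and `In-Reply-To` headers, which is roughly what a per-thread import would need. The function name is hypothetical, and it assumes messages appear in chronological order so that a parent is seen before its replies.

```python
import mailbox


def group_by_thread(path: str) -> dict[str, list[str]]:
    """Map each thread's root Message-ID to the subjects of its messages."""
    root_of: dict[str, str] = {}       # Message-ID -> root Message-ID of its thread
    threads: dict[str, list[str]] = {}
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            msg_id = str(msg.get("Message-ID", "")).strip()
            parent = str(msg.get("In-Reply-To", "")).strip()
            # A reply joins its parent's thread; anything else starts a new one.
            root = root_of.get(parent, msg_id) if parent else msg_id
            if msg_id:
                root_of[msg_id] = root
            threads.setdefault(root, []).append(str(msg.get("Subject", "")))
    finally:
        box.close()
    return threads
```

Messages whose parent is missing from the file simply start their own thread, and messages without a `Message-ID` all end up under the empty key, so real archives would likely need more careful handling.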

I mean, the “unnecessary” part I think is perhaps debatable (given the usability/discoverability of the existing stuff), but the “fragmenting” part is definitely true.

Sorry, I meant that the fragmentation might seem unnecessary to someone who is satisfied with the mailing lists as-is.


#5

Could you elaborate on this, actually? I am not disagreeing with you, just wondering what exactly you meant :).


#6

I was referring to this:


#7

Yes, and your objection to this is…? To clarify, I don’t think it is ideal; the main reason I am suggesting it is that there seems to be no good working solution for importing an email archive.


#8

I wouldn’t want to have one big megathread for a month’s worth of list emails. I don’t think it would be easy to browse, even if it is searchable.

The link I provided above addresses exactly that. The first reply is a bit dismal (basically “don’t bother”), but then Jeff Atwood provides a Ruby script towards the bottom of the thread.


#9

My impression is different: having read the thread you mention a couple of times, along with a few related threads, I don’t think the provided script will work well. In particular, I saw someone say about it: “Our developer was able to use it once, as a proof of concept” :). There are also lengthy discussions about the mess that starts when it tries to automatically create users, send them email digests, etc. Most people also say it is impossible to use without editing the script, which will be tricky for me since I don’t know Ruby. (As it stands, it will also wipe the existing database, but that is not a big issue since we are just starting here.) That said, I am willing to try if most people insist on this. I certainly agree that having megathreads is far from ideal.


#10

It’s included in the official repository (https://github.com/discourse/discourse/blob/master/script/import_scripts/mbox.rb), and linked from the installation documentation (https://github.com/discourse/discourse/blob/master/docs/INSTALL-cloud.md, toward the bottom “See our open source importers”). And here’s a howto: https://meta.discourse.org/t/howto-import-mbox-mailing-list-files/51233

But yes, it’s good to be cautious.

Edit: I now see that you were referencing that howto article. I think a lot of those issues could be mitigated by, e.g., scrubbing email addresses (which I was initially opposed to, but if Discourse automatically sends digests to those addresses then I have a different opinion), backing up before importing, etc. But if it’s going to be a big hassle, then maybe it’s not worth importing the archives.


#11

I added a poll to the original post… Not sure I phrased it well though :)


#12

There is going to be strong confirmation bias from people filling out the poll…

I think migrating from the email lists to this forum should be a long-term aim. By using both, we fragment support. However, the forum can do everything that the mailing lists do! Importing the archives doesn’t just make them easier to search; it also makes it easier to migrate from the mailing lists to this forum.

The other crucial migration task is allowing people to reply to posts via email. Perhaps discussion of that task should be in a separate thread, though.


#13

Agreed about the confirmation bias. Let us not treat it as something that should be decided via a majority vote, then.