dpaste and the wayback machine
February 07, 2016
Dpaste currently does not expire pastes by default. I was thinking it would be nice if it saved them in the Wayback Machine such that they are archived redundantly.

I'm not sure what's the way to do it - probably linking the newly-generated paste URLs from a page that the Wayback Machine already knows of.

I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec (when the WM does not see a link that is searched for, it offers the option to archive it) obtaining https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.


Thoughts?

Andrei
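
For reference, the manual step above could be sketched in code roughly as follows. This is only an illustration: it assumes the Wayback Machine's public https://web.archive.org/save/<url> endpoint and the web/<14-digit UTC timestamp>/<original URL> layout visible in the snapshot link above. The function names are made up for this sketch.

```python
import re
from datetime import datetime, timezone

# Assumption: the public "Save Page Now" endpoint accepts the target URL as a suffix.
SAVE_PREFIX = "https://web.archive.org/save/"
# Snapshot links look like .../web/<YYYYMMDDhhmmss>/<original URL>.
SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def save_request_url(paste_url):
    """Return the URL one would fetch to ask Wayback to capture paste_url."""
    return SAVE_PREFIX + paste_url


def parse_snapshot_url(snapshot_url):
    """Split a snapshot URL into (capture time in UTC, original URL)."""
    match = SNAPSHOT_RE.match(snapshot_url)
    if match is None:
        raise ValueError("not a Wayback snapshot URL: " + snapshot_url)
    stamp, original = match.groups()
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, original
```

Calling parse_snapshot_url on the link above yields the original paste URL together with the capture time encoded in the path.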
February 08, 2016
On Sunday, 7 February 2016 at 21:59:00 UTC, Andrei Alexandrescu wrote:
> Dpaste currently does not expire pastes by default. I was thinking it would be nice if it saved them in the Wayback Machine such that they are archived redundantly.
>
> I'm not sure what's the way to do it - probably linking the newly-generated paste URLs from a page that the Wayback Machine already knows of.
>
> I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec (when the WM does not see a link that is searched for, it offers the option to archive it) obtaining https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.
>
>
> Thoughts?
>
You want it in Wayback?  Sounds like you need some WARC [0].  Since anyone can upload to IA (using a nice S3-like API, even [1]), this should be pretty uncomplicated.  If you can get a list of all the paste URLs, you can use wget [2] to build the WARC fairly trivially. [3]  Then I'd suggest getting a dlang account and making an item [4] out of it.  Just make sure it's set to mediatype:web and it should get ingested by Wayback.

After that?  Generate a WARC when a paste is made and use the dlang S3 keys to add it to the previous item (or maybe just do it daily or weekly so as to not stress the derive queue too much).  I'm pretty sure that's all that's needed.

-Wyatt

[0] http://fileformats.archiveteam.org/wiki/WARC
[1] https://archive.org/help/abouts3.txt
[2] wget option: -i, --input-file=FILE (download URLs found in local or external FILE)
[3] http://www.archiveteam.org/index.php?title=Wget#Creating_WARC_with_wget
[4] https://blog.archive.org/2011/03/31/how-archive-org-items-are-structured/
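
A rough sketch of the two steps described above, assuming wget's --warc-file option and the S3-like endpoint and headers documented in [1]. The item name, file names, and keys in the usage note are placeholders, the helper names are invented for illustration, and nothing here checks whether Wayback actually ingests the result.

```python
import urllib.parse
import urllib.request

# Assumption: the S3-like endpoint described in [1].
IA_S3_ENDPOINT = "https://s3.us.archive.org"


def wget_warc_command(url_list_file, warc_name):
    """Build a wget argv that fetches every listed URL into <warc_name>.warc.gz."""
    return [
        "wget",
        "--input-file=" + url_list_file,  # the -i option quoted in [2]
        "--warc-file=" + warc_name,       # record the traffic as a WARC
        "--delete-after",                 # keep only the WARC, not the fetched pages
        "--no-verbose",
    ]


def ia_upload_headers(access_key, secret_key):
    """Headers for an upload to an item marked mediatype:web."""
    return {
        "authorization": "LOW %s:%s" % (access_key, secret_key),
        "x-archive-auto-make-bucket": "1",   # create the item if it does not exist yet
        "x-archive-meta-mediatype": "web",
    }


def ia_upload_request(item, filename, body, access_key, secret_key):
    """Prepare (but do not send) a PUT that adds one WARC file to an item."""
    url = "%s/%s/%s" % (IA_S3_ENDPOINT, item, urllib.parse.quote(filename))
    return urllib.request.Request(
        url,
        data=body,
        headers=ia_upload_headers(access_key, secret_key),
        method="PUT",
    )
```

The argv from wget_warc_command("urls.txt", "dpaste-2016-02-08") could be handed to subprocess.run, and the prepared request sent with urllib.request.urlopen once real keys and an item name are filled in.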
February 08, 2016
On Sunday, 7 February 2016 at 21:59:00 UTC, Andrei Alexandrescu wrote:
> Dpaste currently does not expire pastes by default. I was thinking it would be nice if it saved them in the Wayback Machine such that they are archived redundantly.
>
> I'm not sure what's the way to do it - probably linking the newly-generated paste URLs from a page that the Wayback Machine already knows of.
>
> I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec (when the WM does not see a link that is searched for, it offers the option to archive it) obtaining https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.
>
>
> Thoughts?
>
> Andrei

I'm not sure the Wayback Machine should be used for version control. If you want to keep a history of your paste, I suggest using gist.github.com.

I view the Wayback Machine as a view of what the web used to look like, not necessarily of what information was in it.
February 08, 2016
On Monday, 8 February 2016 at 20:02:41 UTC, Jesse Phillips wrote:
>
> I'm not sure the Wayback Machine should be used for version control. If you want to keep a history of your paste, I suggest using gist.github.com.
>
> I view the Wayback Machine as a view of what the web used to look like, not necessarily of what information was in it.

I'm pretty sure that's Andrei's thought, too.  It's a pastebin; people use it to make web links to pasted things.  If it were to disappear, a lot of links would break very permanently because Heritrix has no way to index and crawl the site.

-Wyatt
February 09, 2016
On 2/8/16 11:44 AM, Wyatt wrote:
> On Sunday, 7 February 2016 at 21:59:00 UTC, Andrei Alexandrescu
> wrote:
>> Dpaste currently does not expire pastes by default. I was thinking
>> it would be nice if it saved them in the Wayback Machine such that
>> they are archived redundantly.
>>
>> I'm not sure what's the way to do it - probably linking the
>> newly-generated paste URLs from a page that the Wayback Machine
>> already knows of.
>>
>> I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec (when
>>  the WM does not see a link that is searched for, it offers the
>> option to archive it) obtaining
>> https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.
>>
>>
>>
>>
>> Thoughts?
>>
> You want it in Wayback?  Sounds like you need some WARC [0]. Since
> anyone can upload to IA (using a nice S3-like API, even [1]), this
> should be pretty uncomplicated.  If you can get a list of all the
> paste URLs, you can use wget [2] to build the WARC fairly trivially.
> [3]  Then I'd suggest getting a dlang account and making an item [4]
> out of it. Just make sure it's set to mediatype:web and it should get
> ingested by Wayback.
>
> After that?  Generate a WARC when a paste is made and use the dlang
> S3 keys to add it to the previous item (or maybe just do it daily or
> weekly so as to not stress the derive queue too much). I'm pretty
> sure that's all that's needed.

That's intense. I think a simple page (or a chain of linked pages)
containing links to all existing pastes would suffice. For example,
consider having dpaste.dzfl.pl link to dpaste.dzfl.pl/today.html. That
page would contain the links generated today and a "More" button linked
to dpaste.dzfl.pl/2016-02-08.html (which would be yesterday). That in
turn would contain links to yesterday's pastes and a link to the day
before, and so on.

My understanding is this is enough to have wayback archive all pastes.
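
A minimal sketch of such a chained daily page, assuming pages named <YYYY-MM-DD>.html served from the site root. The function names and the markup are illustrative only, not anything dpaste actually provides.

```python
from datetime import date, timedelta
from html import escape


def day_page_name(day):
    """File name of the index page for one day, e.g. 2016-02-08.html."""
    return day.isoformat() + ".html"


def render_day_page(day, paste_urls):
    """Return HTML listing one day's pastes plus a "More" link to the previous day."""
    previous = day - timedelta(days=1)
    lines = [
        "<!DOCTYPE html>",
        "<html><head><title>Pastes for %s</title></head><body>" % day.isoformat(),
        "<ul>",
    ]
    for url in paste_urls:
        safe = escape(url, quote=True)  # keep odd characters from breaking the markup
        lines.append('<li><a href="%s">%s</a></li>' % (safe, safe))
    lines.append("</ul>")
    # The "More" link is what chains each page to the day before it.
    lines.append('<a href="/%s">More</a>' % day_page_name(previous))
    lines.append("</body></html>")
    return "\n".join(lines)
```

Writing the output for the current day to today.html and for older days to their dated files would give the chain described above; a crawler that reaches today.html can then follow "More" back through every day.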

> I'm pretty sure that's Andrei's thought, too. It's a pastebin; people
> use it to make web links to pasted things. If it were to disappear, a
> lot of links would break very permanently because Heritrix has no way
> to index and crawl the site.

Yah.


Andrei