Using the Wayback Machine

Locating a website which has “vanished” from the Internet

Afrolumens Project by George Nagle

The recent availability of Pennsylvania probate records online has rekindled my interest in tracking my Bigham and Richey lines from Drumore Twp., Lancaster Co., Pennsylvania.

One thing I wanted to investigate further was information I had uncovered previously: both the Bigham and the Richey families of Lancaster County were slaveholders in the latter part of the 18th century. A note in my database reminded me that I had acquired this information from the Afrolumens Project, a website by George F. Nagle. This fantastic site once documented information on slavery in the state of Pennsylvania. The information I had previously obtained from the site was transcribed data. Now I wanted to learn whether images of original documents had been posted online in the interim.

So, I typed in the URL (Uniform Resource Locator, i.e. web address) for the site into my Google Chrome browser and immediately noted a problem. The site couldn’t be found at
Other sites were still referencing Afrolumens, but nothing came up from Afrolumens itself. A little more sleuthing indicated that the site had been taken offline in late 2007. How could that be? My research log indicated I had last accessed the site in November 2010. Further sleuthing indicated the site had gone back online some time in 2009. Regardless, it was clearly unavailable today. That’s when I remembered something called the “Wayback Machine”.

The Wayback Machine is one part of the Internet Archive, a non-profit entity established in 1996 to build an Internet library. One of its stated goals is “to prevent the Internet…and other “born digital” materials from disappearing into the past”. [1] One way it does that is by making cached copies of crawled web pages available via the Wayback Machine. As of today, more than 150 billion web pages have been archived. [2]

While the Wayback Machine is an invaluable tool to assist in locating websites that are no longer available, it is not a complete snapshot of every page on every website from 1996 to the present. From the FAQ, it seems that there are four criteria which must be met before a website will appear in the Wayback Machine archive.

1.  The site must be publicly available (i.e. no user login required);
2.  The site must have been online a minimum of six months;
3.  The site must be well-linked to from other sites;
4.  Robots.txt must not exclude crawlers from indexing the site’s content. [3]
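
The fourth criterion can be checked by hand for any site whose robots.txt you can still read. The following is a minimal sketch using Python's standard library, not the Internet Archive's own logic. The crawler name "ia_archiver" is an assumption about the user agent the archive's crawler has historically answered to, and the two sample files and the example.com addresses are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that shuts out one crawler entirely.
# "ia_archiver" is an assumed crawler name, used for illustration only.
SAMPLE_BLOCKING = """\
User-agent: ia_archiver
Disallow: /
"""

# Hypothetical robots.txt that only hides one directory from all crawlers.
SAMPLE_OPEN = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    page = "http://www.example.com/index.html"
    print(crawler_allowed(SAMPLE_BLOCKING, "ia_archiver", page))
    print(crawler_allowed(SAMPLE_OPEN, "ia_archiver", page))
```

A site using the first sample would fail the fourth criterion for that crawler, while a site using the second would only keep its /private/ directory out of the archive.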

Another caveat to using the site is that it is not keyword searchable. You must know the URL to navigate the Wayback Machine at present. With that aside, let’s search for an archived copy of the Afrolumens website.
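
For readers who prefer a script to a browser, the same lookup by URL can be sketched in Python. This assumes the Internet Archive's availability endpoint at archive.org/wayback/available and the JSON layout it returns as I understand them, so check both against the archive's current documentation. The address example.com is a placeholder, not the Afrolumens URL, and the function names are my own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the query address for a site, optionally near a YYYYMMDD date."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Pull the archived copy's address out of a decoded JSON reply.

    Returns None when the reply lists no available snapshot.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(url, timestamp=None):
    """Ask the service for the snapshot closest to timestamp (needs network)."""
    with urlopen(availability_query(url, timestamp), timeout=30) as reply:
        return closest_snapshot(json.load(reply))


if __name__ == "__main__":
    # Placeholder site and date; substitute the URL you are looking for.
    print(lookup("example.com", "20110722"))
```

As with the web form, you still need to know the URL in advance; the script only saves the clicking.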

First, navigate to the Internet Archive website using your favorite web browser. Your result will look something like this:

Internet Archive Main Page

The web page is a bit busy, but you’ll find the Wayback Machine near the top, in the middle of the window.

Wayback Machine

Now, we’ll type in the last known URL for the Afrolumens site:

This is our result showing that the website was crawled by the Wayback Machine 103 times between 2002 and 2011. A calendar is displayed with the last year the site was crawled displayed first, in this case 2011. The selected year is shown in yellow on the timeline strip near the top of the screen. Each of the blue circles on the calendar below the timeline represents a website snapshot.
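
The per-year counts behind that calendar can also be tallied from a list of snapshot timestamps. The sketch below assumes the list comes from the Wayback Machine's CDX search service with JSON output (something like web.archive.org/cdx/search/cdx?url=example.com&output=json), where the first row names the columns and each later row is one capture. The sample rows are invented for illustration and are not real Afrolumens data.

```python
from collections import Counter


def snapshots_per_year(cdx_rows):
    """Count captures per calendar year from CDX-style JSON rows.

    cdx_rows is a list of lists: a header row, then one row per capture.
    Timestamps are assumed to be strings beginning with a four-digit year.
    """
    if not cdx_rows:
        return Counter()
    header, *records = cdx_rows
    ts_column = header.index("timestamp")
    return Counter(row[ts_column][:4] for row in records)


# Invented sample in the assumed layout (only three columns shown).
SAMPLE_ROWS = [
    ["urlkey", "timestamp", "original"],
    ["com,example)/", "20100312081500", "http://example.com/"],
    ["com,example)/", "20101125193000", "http://example.com/"],
    ["com,example)/", "20110722040000", "http://example.com/"],
]

if __name__ == "__main__":
    for year, count in sorted(snapshots_per_year(SAMPLE_ROWS).items()):
        print(year, count)
```

Summing the counts across all years gives the total number of crawls reported at the top of the results page.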

Afrolumens Crawls in 2011

Click on a blue circle to bring up the cached version of the website as it existed on that date. Here’s what the Afrolumens main page looked like on 22 Jul 2011:

From here, you can navigate to other available cached pages by clicking the arrows, or by placing your cursor directly in the timeline strip. However, I’m interested in learning whether images were ever uploaded to the “Slavery in Pennsylvania” section of the site, so that’s where I’ll go next. This is the resulting page:

Note that as we get deeper into the Afrolumens site hierarchy, there are fewer snapshots. This part of the site was only crawled 65 times, and this 25 Nov 2010 snapshot represents the last time this part of the site was archived by the Wayback Machine. You’ll also note that some of the images are missing (probably because they were linked to from another site). I’m interested in drilling down to Lancaster County, Pennsylvania, so I need to click County index.

I continue to explore the archived site, but by this point it is apparent that no images of original records were ever uploaded, since this cached snapshot dates to November 2010, the same month in which I last visited the site and recorded the results of that search in my research log.

What sites will you explore using the Wayback Machine?


[1] Internet Archive > About IA: [ : last accessed 03 Jul 2012]
[2] [ : last accessed 03 Jul 2012]
[3] Internet Archive FAQ > [ : last accessed 03 Jul 2012]