As a member of the Tableau Public team, I'm always on the lookout for new ways to gather data. One of my favorite tricks is using Google Docs to quickly create a spreadsheet. Copying and pasting web tables often works, but I find Google Docs to be more accurate in getting the right data in the right cell in my spreadsheet.
I've been looking into data about the financial crisis, so we're going to scrape the page that lists US bank failures in the US from 2008 to present.
- Sign in to Google Docs.
- Create a new spreadsheet.
- In the new spreadsheet, in cell A1, type: =ImportHtml("http://en.wikipedia.org/wiki/List_of_bank_failures_in_the_United_States_(2008%E2%80%93present)",
"table", 1)
There are three values in the parentheses after ImportHtml. First, the URL that you'd like to scrape. Second, the type of object you want to scrape. We're just going to work with tables. Third, a number that says which table you want - the first one, second one, etc.
- If you've pressed enter, cell A1 should look like this now.
What the ImportHtml command has done is pull in the first table from the Wikipedia page - the table of contents. However, that's not what we want. We want the data from the tables that list the bank failures for a year.
Go back to cell A1 and change the number at the end of the ImportHtml command from '1' to '2' and press enter.
Now you should see the list of banks that failed in 2008 starting with Douglass National Bank.

Voila, you've scraped a Wikipedia table in one fell swoop. To add the data for 2009, scroll down in the spreadsheet to the first open row (Cell A27) and paste the ImportHtml command. Change the last number from '2' to '3', press enter and you've got the 2009 failures too.
To get an Excel file that you can use with Tableau, go to File--> Download as --> Excel.
If you'd like to scrape another Wikipedia page with tables, just switch out the URL in the ImportHtml command, and keep changing the last number until you find the table you want. Happy Scraping.
Comments
Worked for me with ";" instead of ","
Thanks. ^^
Nice tip JWind. It works for me with ";" and ",". So both of these statements will work to pull in the bank failures table:
=ImportHtml("http://en.wikipedia.org/wiki/List_of_bank_failures_in_the_United_States_(2008%E2%80%93present)";
"table"; 2)
=ImportHtml("http://en.wikipedia.org/wiki/List_of_bank_failures_in_the_United_States_(2008%E2%80%93present)",
"table", 2)
Good catch Barbara! Thanks for keeping me honest. I've updated the text above to include the "=". Cheers, Lori
Very cool - except that you forgot to include the "=" in front of the ImportHtml function:
=ImportHtml("http://en.wikipedia.org/wiki/List_of_bank_failures_in_the_United_States_(2008%E2%80%93present)",
"table", 1)
Super idea !!!!!!
Barbara
Post new comment