New Tricks: Data Mining With Google Spreadsheets

March 22, 2010

Happily, I stumbled across the following link:

Now You Can Mine Data With Google Queries Too

The interesting bit is below the comic where they actually reveal a method I hadn’t thought of: 

Using a query embedded in Google Spreadsheets to mine and graph data in Google’s engine.

While the actual instructions are terse, I was able to get things up and running by visiting the actual example, and then copying and pasting the individual cells for examination.

Here is the blow by blow:

First, decide what you want to mine.  One of the examples is for income, we will use this one.

Open up Google spreadsheets and in cell A2 put (complete as printed here):

=””””&”I make $”&B2&” per year”””

[NOTE:  WordPress jacks up the quotes, so you are going to have to replace all of the quotes in the above with double quotes, or it won’t work!]

Initially it is gonna look like this “I make $ per year”.

Now in B2 put a dollar amount: 45,000.

You should see your number populate in B1 now.

Finally, the magic that actually gets the query info.

Put the following in C2:

=importXML(“http://www.google.com/search?num=100&q=”&A2,&#8221://p%5B@id=’resultStats’%5D/b%5B3%5D”)

[NOTE: Same problem here – WordPress tries to mess with the multiple quotes.  Replace all double AND single quotes manually and you will be fine, otherwise you will get an Error.]

After a brief load time you should see a number returned.  This is the number of returns that included your statement in cell A2.

Now copy and paste A2 and C2 down the line and change your values accordingly as you move down.

To create the graph, simply open “Insert->Chart” and choose your graph type.

To populate the graph with your data, make sure to clear the box right under “What Data?” and then click and drag down column C on your spreadsheet.  Make sure to remove Column C as labels.  You should see your data represented in the preview.

That’s it!  The world is now your oyster!  I can’t wait to apply this in some cases I am working on, I am still mulling over where this can be most useful, but the possibilities boggle the mind.