Thursday, February 25, 2010

Answer: Only mollusk to walk on two legs?

The answer, much to my great surprise, is the octopus!

Here's how I found it.

I knew that the search terms "two legs" and "walking"  would be pretty common.  I can easily believe there are millions of pages with "two legs" somewhere on them, and that doing a search for [ mollusk walking two legs ] would just be hopeless.

But I figured that the people most likely to write about this kind of amazing phenomenon in the natural world are probably science writers.  Think about it:  a mollusk that walks on two legs--it's not going to make the front page of the paper, but it would probably make the science page or an article in a nature magazine of some kind.

So my first (and only) query was [ bipedal walking mollusk ]

Sure enough, the first result is to a science media news report from UC Berkeley.  http://berkeley.edu/news/media/releases/2005/03/24_octopus.shtml

I was also encouraged when I noticed that there were several other hits on the page that mentioned octopus in the same snippet with walking.

The details:  I put bipedal first in the query because I wanted the search terms walking and mollusk to be next to each other.

Why did I do that?  Is it important that walking and mollusk be adjacent and in that order?

Well.. yes!  Most people don't know that word order DOES sometimes matter in web searches.  In this case, I was thinking that an article (in a science magazine or a university press report) would probably say something like:

     Blah blah blah walking mollusk blah blah blah... 


In other words, I was trying to anticipate what I thought someone else would have written about our mysterious walking mollusk.

You can try it yourself.  Try [ walking bipedal mollusk ], and the results aren't nearly as good.

By contrast, when I wanted to find a YouTube video of the two-legged walking octopus, I did NOT use the term bipedal because I don't expect YouTube descriptions to use such a technical term.  Instead, I just went to Google and did a query for [ YouTube octopus walking ] and that got me just what I needed.  

Key takeaway:  When you're searching for something slightly obscure, try putting yourself into the mindset of someone who would be writing about the thing you're searching for.  What would they write?  Then, choose the key terms from that, and use those terms as your initial search.

Search on!

Wednesday, February 24, 2010

Wednesday Search Challenge (Feb 24, 2010) - Searching for a mollusk

Time for a relatively straightforward question and a comment.

The search challenge for today is:

     What is the only mollusk known that can walk on two legs?


When you answer this question, take note of how long it takes you from the moment you start trying to solve the question until the time you get the answer.  We call that amount of time the "Time To Result" -- and it's one way of measuring how effective you are as a searcher.

When I solved this challenge, it took me about 5 seconds.  How could I do it so fast?  It's mostly by practice and noticing how people write web pages.  When I think about what to search for, I first think about how people would write about the topic.

But more on this tomorrow, when I reveal the answer!

Monday, February 22, 2010

Why Control-F is the single most important thing you can teach someone about search

One of the biggest surprises of my research life was when I discovered that LOTS of people don't know about Control-F.  Here's the story...

I was doing a field interview with a schoolbus trainer.  She was doing her searches just fine, moving right along, when she was suddenly stymied by a fairly simple search problem.  She was looking for a particular section of the California Vehicle Code and found a fairly long page that clearly should have held the information, but it just didn't seem to be there. 

After about 5 minutes of fruitlessly scrolling up and down the document, I asked "What are you looking for?"  Her rather grimaced reply was "I'M SEARCHING FOR THE CODE..."  

It took me a while to realize that she was visually searching the entire lengthy document, line-by-line, to find the code.  

It was clear that she didn't know how to use the web browser's built-in "Find" function.  I was surprised because I'd thought that this was universal knowledge.  How could you browse the web daily, as she does, and NOT know about Control-F (or Command-F for Mac users)?  

Since then, I've done a fairly extensive survey of US internet users, and the answer still shocks me. 

90% of American internet users do NOT know how to use Control-F (or equivalent) to find a given string on a page.  (Sample size so far:  2,512.)  

Among school teachers, the average isn't much better--it's currently running at 50%.  

When I teach my internet search classes, I always ask, and I'm always surprised.  I just taught a class of twelve K12 teachers, and only one person actually knew how to find a word on the page. 

If YOU don't know, here's the best tip I can give you to improve your search skills:  

Control-F, or Command-F, lets you look for a string somewhere on the page.  This works in all browsers, and nearly every piece of software (e.g., Acrobat, PowerPoint, Word, Excel...).  It usually pops up a "find" box somewhere in the application that lets you type in exactly what you'd like to find.

If you only learn one keyboard shortcut in your entire life, this should be it.  Knowing how to quickly spot the word, phrase or substring you're looking for will change the way you read texts online.  
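For the curious, here's roughly what a "find" box does under the hood.  This is just a minimal sketch in Python, not how any particular browser implements it: the function name find_all and the sample page text are made up for illustration, and it assumes plain ASCII text so that lowercasing doesn't shift character positions.

```python
def find_all(text, needle):
    """Return the start index of every case-insensitive match of needle in text."""
    if not needle:
        return []
    haystack = text.lower()
    target = needle.lower()
    positions = []
    start = haystack.find(target)
    while start != -1:
        positions.append(start)
        # Keep looking just past the start of the previous match.
        start = haystack.find(target, start + 1)
    return positions


# Made-up page text, purely for illustration.
page = "The code is long. Scroll, scroll, scroll. Here is the CODE you wanted."
for position in find_all(page, "code"):
    print(position, page[position:position + 4])
```

The point is that the computer does the line-by-line scanning for you, and it ignores capitalization the way most find boxes do by default.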

Check out my ultra-short YouTube videolet on how to use Control-F in Firefox.  




It will change the way you read.  But more about that later... 

Friday, February 19, 2010

Finding specific kinds of files on the web -- the wonders of filetype:

It's sometimes really useful to be able to find a specific kind of file when doing a Google search.  To do this, we use the filetype:  operator.  (Sounds like a scary word, but "operator" just means "tool" -- the thing we put in the query to get the effect we want.)

For instance, you might want to find a PowerPoint presentation on a particular topic, say, the botanical structure of flowers. A good query for this would be:

     [ flower tutorial filetype:PPT ] 

Here, the part of the query  filetype:PPT limits the kinds of results to just PowerPoint files.  (Note that the capitalization doesn't matter.  I just put the file extension in capitals to make it stand out.)


I most often use filetype:  as a way to look for scholarly papers.  I do a fair bit of reading, and often find myself with the name of a paper by a particular author.  

Turns out that academics love to put papers out onto the web in Acrobat format.  That will give the file a PDF extension.  So when I'm looking for a paper by Richard Nisbett and Timothy Wilson, I'll do the query: 



... and that will give me a whole set of papers by Nisbett & Wilson to read.  This handy trick will often work when a paper is otherwise unavailable.. useful to know when you're in deadline mode.

You can combine the filetype operator with other operators to help clarify what would otherwise be ambiguous searches.  For instance, you can use double quotes to get Acrobat files that I've written.  To do this, use a query like:

     [ filetype:PDF  "Daniel M Russell" ] 

Why the double quotes?  Because it will look for those terms "Daniel"  "M"  and "Russell" spelled exactly like that and in that order.  If you take the quotes off of the search, you'll get a million other hits, most of which have nothing to do with me.  (So why didn't I put quotes around Nisbett and Wilson?  Because Nisbett is a very uncommon name.  Alas, Daniel, M and Russell are all super-common, so I used the double quotes to restrict the search JUST to me.)


The remarkable thing is that you can limit your searches to almost ANY kind of file you'd like.  Here's the deep, dark secret:  There's no magical list of file type extensions.... it can be anything you want to search for.

What that means is that if you're looking for a very odd, very strange kind of file, you can limit your searches to just that kind of document.  For example, TSV often stands for "tab-separated values" and usually indicates a data file where the values are separated by tabs (rather than commas, or some other special character).

     [ filetype:TSV data ] 

Would look for TSV files that also mention the term "data" -- a handy thing to know when you're scanning the web for some data sets.
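If you build queries like these from a script, the pattern is easy to capture.  Here's a small sketch in Python; the helper names build_query and search_url are my own inventions for illustration, and the URL form (a q= parameter on google.com/search) is an assumption about how the query gets passed along, not something covered above.

```python
from urllib.parse import urlencode


def build_query(terms=(), phrase=None, filetype=None):
    """Assemble a query string from plain terms, an exact phrase, and a filetype: limit."""
    parts = []
    if filetype:
        parts.append(f"filetype:{filetype}")
    if phrase:
        # Double quotes ask for these words spelled exactly like this, in this order.
        parts.append(f'"{phrase}"')
    parts.extend(terms)
    return " ".join(parts)


def search_url(query):
    """Turn a query string into a search URL (assumed URL pattern)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


print(build_query(terms=["flower", "tutorial"], filetype="PPT"))
print(build_query(phrase="Daniel M Russell", filetype="PDF"))
print(search_url(build_query(terms=["data"], filetype="TSV")))
```

Nothing in the helper checks the extension against a list, which matches the point above: the filetype can be anything you want to search for.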

Other handy file extensions to know about:

LWP  -- Lotus Word Pro (a word processor format)
XLS -- Microsoft Excel file
PDF -- Adobe Acrobat (often used for documents with special layout)
TXT -- plain text (usually the format for README files)
PS  -- Adobe PostScript files
MP3 -- audio file format
MP4, AVI, MOV -- video file formats


But, if you want to look for [ filetype:CRAZY ], be my guest.  Search on!

Thursday, February 18, 2010

Answer: What does extra-claustral mean?

The story here is that I was really looking up something about the history of books and the relationship between printed copies and how ideas of credibility (Do you believe what a book says, or not?) change around the time of the introduction of the printing press.

In any case, I ran into James O'Donnell's article on why printed books are here to stay, and found the following phrase:

  "I here count the friars and other extra-claustral religious orders..."


And while I had seen the word "claustral" before, I wasn't really sure how O'Donnell meant "extra-claustral."  Did he mean that it was "super-" claustral (with the prefix 'extra' meaning "an increased amount of")?  Or was he referring to a particular set of beliefs of the friars?  (Something like "extra-orthodox"?)  


Here's what I did to figure it out.  


First, I tried my usual Google trick of using the define: operator.  [ define:extra-claustral ]   I didn't think it would work because it was a hyphenated word, but it was worth a try. 


Next I tried my backup method of just asking for [ define extra-claustral ] but that wasn't especially rewarding either.   But I DID notice that there were some definitions for just the word "claustral" (and even "claustrum," a variant I hadn't thought about).  


So I went back to the define: operator to do [ define:claustral ] and learned that the primary definitions have to do with a thin membrane in the brain. 


But the last definition mentioned cloister-like.  That sounded useful--I already knew that cloisters were part of the building where monks would be.  


(Of course, it should be mentioned that I tried the obvious Google search for [ extra-claustral ], but the results were either about the brain membrane or at places like OxfordJournals.com and JSTOR, neither of which are especially useful since they hide their contents behind paywalls--you can't get to them without paying. More on paywalled results another day.)  


I realized that what I needed to do was to see this word used in context across many different settings to see what the range of possibilities could be.  


So I went to Google Books and did a search for the term.  




Just reading the snippets told me a great deal.  (A "snippet" is the black text beneath each document hit that summarizes the searched-term in context.)  The snippets tell me that "extra-claustral" refers to buildings that are "outside of the cloisters"  AND lets me know that "extra-claustral" can also refer to activities that monks pursue that are not a part of their regular churchly duties.  Ahh... now it's beginning to make sense!


Your 4th grade teacher probably told you this--that meanings can often be figured out by context, and they were right.  


But there's one more trick to point out.  The web is full of wonderful resources that amplify our intelligence.  One such resource is Wordnik.com -- a site I turn to when I want to understand a word with a great deal of context.  


So I went to Wordnik.com with just the word "claustral" and found the following:  




And now, understanding the prefix "extra" as meaning "outside of," the intent of O'Donnell's use of the term became clear:  "extra-claustral" in his sense was "activities outside of the regular scope of the church's work."  


Between Wordnik's wonderfully selected quotations providing a richer set of interpretations, and Google Books finding the exact term in many actual instances of use, I came away with a deeper understanding of what the article was all about.  Search on. 



Wednesday, February 17, 2010

Wednesday Search Challenge (Feb 17, 2010) - Finding a definition

I was doing a bit of reading this morning about the history of monasteries, monks and early libraries when I ran across this word:

     extra-claustral 

Can you figure out what it means?  (And no, it's not a misspelled word.)

I'm interested in both the 'straightforward' meaning of the word and the ways in which it's used metaphorically.

Tomorrow I'll write a bit about the obvious strategies for searching out the definitions of words, and how to cope when the obvious methods don't quite work so well.

Thursday, February 11, 2010

Answer: How long was the Haiti earthquake from Jan 12, 2010?

When Scott asked me this question, it seemed pretty straightforward, but it turned out to be fairly subtle and complicated.  And there are two major points I want to make:  (1) finding the answer took a little digging, and (2) I had to learn a few things along the way.

I'm going to walk you through my chain of thinking while I was researching this question, not because it's brilliant, but because you'll see how I thought about it, and pick up what works and what doesn't.

Start:  I began with the obvious query,  [ Haiti earthquake duration ]  -- I also tried [ how long was the Haiti earthquake ] -- but neither of these gave me great results.  I was a bit surprised by this as I'd figured it would be a pretty simple thing.

So I started checking out the USGS.gov site because I knew that they have detailed records of earthquakes.  My next query was  [ Haiti earthquake site:USGS.GOV ]  

And that's when I realized that this wasn't going to be simple.  I looked at a bunch of pages at USGS and quickly learned about fitting models to earthquakes, tensors for representing earthquakes and a bunch of very technical abbreviations for things that I had to keep looking up.

For example: the quake occurred on the EPGFZ.  (That's the Enriquillo-Plantain Garden fault zone for us non-seismologists.) I had to learn that acronym in order to figure out what I was reading.  I also picked up that the quake was considered a shallow quake--but it was around 13km deep.  That's shallow?  Clearly, I have a lot to learn about earthquakes (and I've lived through a bunch of big quakes).

I also found great summaries of the quake and the events on the  EPGFZ from USGS (see: http://earthquake.usgs.gov/earthquakes/eqarchives/poster/2010/20100112.pdf  for a nice summary chart about the geophysics of the earthquake)

But I wasn't getting an answer to the duration!

One strategy to use in questions like this is to try and get the original data for yourself.  I know that seismographs record earthquake data, so I thought I might be able to get the raw data from a seismograph.  Hence, my next query:  [ Haiti earthquake seismograph ]

This also didn't work too well.  When I looked through the results, I found lots of seismographs, but no charts or real data I could inspect.

Hmmmm...  Where were the earthquake charts I'd been expecting?

As I read one description of a seismograph that had recorded the quake, I saw the word "seismogram" and realized what a mistake I'd been making.  A "seismograph" is the device that records quakes, a "seismogram" is the RECORDING of the quake.

So my query searching for a seismograph was looking for the instrument, not the data!

New query:  [ Haiti earthquake seismogram ] -- and I started finding the charts I was expecting.  Boston College (http://bcespquakes.wordpress.com/2010/01/12/351/) has a nice chart, but not a lot of time details.

A few results down is the REV (Rapid Earthquake Viewer) site run by the University of South Carolina Seismology Department.   Now I was getting somewhere!

It was pretty quick to navigate their site and drill down to the Haiti earthquake of Jan 12, 2010.   (Yeah, one of the skills of a good searcher is being able to figure out what deep web information a given site has and use their controls to get to what you really want.)

In a couple of clicks I got to a nice compilation of seismograms from the Univ. South Carolina REV site:  http://rev.seis.sc.edu/earthquakes/?eq_dbid=3286658

It's easy to see that the farther you get from Haiti, the more the earthquake shock gets spread out as the pressure waves propagate through the earth.  Ideally, to see the duration of actual on-the-ground shaking you'd like to see the seismogram from nearby.  In this chart you can see they list eleven charts from nearby (the Dominican Republic) and far away (Christmas Island).

If you look at just the seismogram below, you can see that the intense shaking is for around 10 seconds.


On the other hand...if you look at this next chart, you can see that the main shocks are for about 10 seconds, but then the ground keeps shaking at a lower level for quite some time.  I added the red dotted lines below to estimate the "normal background" shaking.  

So the question now has been transformed:  How do you define the end of earthquake shaking?  
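To see why the definition matters, here's a small sketch in Python.  Everything in it is an assumption for illustration: the trace is made-up numbers (not the Haiti recording), the function name shaking_duration is my own, and "shaking" is defined simply as the stretch from the first to the last sample that rises above a chosen background threshold.

```python
def shaking_duration(amplitudes, sample_seconds, threshold):
    """Seconds from the first to the last sample whose absolute amplitude exceeds threshold."""
    above = [i for i, a in enumerate(amplitudes) if abs(a) > threshold]
    if not above:
        return 0.0
    return (above[-1] - above[0] + 1) * sample_seconds


# Made-up trace, one sample per second: quiet, a strong burst, then a slow decay.
trace = [0.1, -0.2, 0.1, 5.0, -8.0, 9.0, -7.0, 6.0, -3.0, 2.0,
         -1.5, 1.0, -0.8, 0.5, -0.4, 0.2, -0.1]

for threshold in (4.0, 1.0, 0.3):
    print(threshold, shaking_duration(trace, 1.0, threshold))
```

The same recording gives a different "duration" for each threshold you pick, which is exactly the problem: where you draw the background line decides when the shaking ends.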




Uh oh.  No WONDER people don't talk about the duration of the earthquake.  There are many factors that influence how long you feel the earth shaking.  Blog commenter Robin found this great FAQ from USGS that explains all of these different factors in great detail.  

Then, to make matters worse, I also found this chart on Wikipedia:  


                    

Earthquake in Haiti 2010, main shock and after shocks between 12. January and 29. January
with magnitudes larger than 4.0, data from USGS



This chart on Wikipedia from Bezur shows the incidence of fairly large aftershocks for quite some time immediately after the first earthquake.  This just makes the point even clearer--when do you say "the earth stopped moving"?  


Search Lessons:  As I said above, there are multiple lessons to learn from this challenge.  

(1)  Don't get discouraged by all of the arcane and confusing terminology you might find on the way.  Embrace it!  Revel in it!  Realize that as you learn arcane and hyper-specific terms, you're also learning how to find future items in this domain with great precision.  For example, I now know about the EPGFZ and the difference between a seismograph and a seismogram.  Next time I do an earthquake search, I'm going to be a little faster. 

This leads to my first Law of Search:  When searching, expect to learn a lot about the domain of interest.  You need to become an instant expert on the topic you're searching.   

(2) Be certain of your search terminology.  I got hung up on the "seismograph" vs. "seismogram" difference and wasted a bunch of time looking for the wrong thing.  This happens to everyone.  The trick is to not let it stop you. Pay attention to the clues in what you're reading and learn to follow the information scents you pick up along the way.  (I would have SWORN that I was looking for a seismograph chart.  Wrong.)  

(3) Learn to use the site's own features and tools.  Many sites (particularly technical ones like these) have lots of tools for drilling down into their data.  The search engines can't see into the underlying data, so it's up to you, the searcher, to use the tools they give you to "search within" the site.  

(4) Expect your problem to shift as you learn more.  I learned that "duration" is a slippery concept for earthquakes, and I changed my search goal to one of understanding what makes such a simple idea (earthquake duration) so difficult to define.  

But, for Scott, the answer is "around 10 seconds for the initial earthquake."  He's in the 4th grade, so that's a good answer for him.  

Next year we'll talk about aftershocks and differential pressure wave conductance velocities.... 


Wednesday, February 10, 2010

Wednesday Search Challenge (Feb 10, 2010)

This is a true story.

While watching the Superbowl (yes, the very same one that featured the fantastic bhel from earlier), my young friend Scott, age 9, asked me a question that sounds easy, but turns out to be more sophisticated than I would have expected.  His question is today's Search Challenge.

How long was the Haiti earthquake? 

Sounds easy, right?  Just do a search for [ Haiti earthquake ] and poke around for a bit.  (I'll tell you now that this doesn't work.)

Scott asked me this question because neither he nor his Mom could find the answer.

Tomorrow I'll show you my solution, and why this is a particularly complicated (and interesting!) search.

Tuesday, February 9, 2010

Strategies of thought: When do you search for something? When do you stop?

I’ve been thinking… What causes someone to start searching?  When do you get motivated enough to actually stop whatever you’re doing now and expend the energy to look up something?

I got to thinking about this the other day when I looked up the phrase “Revolution is Not a Dinner Party.”  Those of you with a richer reading of history will recognize this as a well-known phrase from Mao’s Little Red Book. 

But I didn’t know that.  It was just the title of my daughter’s book she was reading for school.  I stopped what I was doing and did a search because… because… I’m not really sure why....  I think I did it because the phrase struck me as slightly non-standard.  It’s not a typical title, there was something just a bit odd about it.  (Think about it: most juvenile books of this genre are given titles like “Never a Princess” or “A Wrinkle in Time.”  Titles aren’t typically six words long.)

Now, I’m by nature a curious person, which simply means (when you think about it) that I will, more often than not, spend the energy to look things up.  I even always carry a small notebook with me so I can write down items for checking-out when I get to a computer. 
Bottom line: something was funny/odd about that title, so I looked it up, and found that it’s from a fairly well-known long Mao quote about why the pursuit of the revolution will cause collateral damage to people and institutions.  The idea fits perfectly with the idea of the book (which is about a young girl growing up during the Cultural Revolution). 

In this case, I wasn’t sure what I’d find—it really was just a scratch to satisfy a curiosity itch. 
More often, I’ll find myself looking for a specific piece of information.  This week I kept track of a few of them.  At different times I was looking for good dive sites in Puerto Rico, the population of Uttar Pradesh, and a short history of the development of ice skate blades. 
But then there were the other times I did some research to answer curiosities that aren’t so neatly defined: What is the political relationship between Kashmir and Pakistan?  Or…  How are slime molds related to fungi? 

Early this week I looked for a table of rainfall in the Bay Area with a sub-hour time resolution.  I found that data set in order to draw a chart to convince myself that rain in Seattle falls in a different pattern than rain in San Francisco (I found the data, charted the rain and found that it is different).

These questions weren’t hard.  But sometimes there’s no good answer.  This week there was the fruitless hour looking for the artist of a pointillist picture that a friend found in Paris.  Each time I was searching for something in order to understand the world… or to be able to make a convincing argument… or to write a paper… or… just to fill in a perceived gap in my understanding. 

Curiosity is boundless; the question really is, how much energy, time and skill will you devote to tracking down something? 

Or, as I tried to say at the start—what makes you curious enough to search in the first place?  What are the clues that make you think there’s something interesting enough to look up? 
I admit that I don’t really know the answer to that.  I do know that some people have a predisposition for thinking (in personality theory, that’s called the “Need for Cognition”).  And that people with high NFC tend to think that searching for information is intrinsically valuable and that looking for it is enjoyable. 

So there’s that first itch—a need for information.  You look something up to answer a specific question, or to get something done, or to fill in a gap.

But why? 

When I did the “revolution” search, I didn’t know ahead of time what I’d find.  Often you do—you know the answer will be something very specific: a time / a date / a place / a fact / a plan / an image.  You’re looking for something to answer a question or complete a part of the story.  Search often begins when something breaks down—when you can’t answer a question or explain how something works to your satisfaction. 

However, search is tricky when you don’t know what you’re looking for…  When you’re counting on recognition of the solution  to kick in, it becomes an inexact process at best.  “I know the answer to this question must exist.  I believe it’s somewhere on the web.” 
And search is even more tricky when you don’t know that there’s a question lurking in your thoughts or implicitly in what you’re reading.

So, here’s the deep insight:  Search begins by the recognition of a question, and ends when you recognize the answer… or at least a plausible answer.

Of course, there are many ways to fall off the path along the way.  You might think you recognize the answer, but it turns out to be the answer to some other question and not the answer to your question at all. 

This is probably the biggest error I see searchers make: settling for AN answer that looks plausible, but then really isn’t.  The real trouble is that people often don’t know enough about a topic to estimate whether or not something is a realistic answer and they stop searching too soon.  (Ironically, when you know the least is when you need search the most!) 

In study after study, we ask people to find the answers to fairly simple questions; questions that have answers that are clear and straightforward.  Yet we nearly always get a surprisingly wide range of answers.  Example:  When we ask “What’s the distance from Earth to the moon?”  we’ll get many different responses.  Why?  Because searchers don’t check their answers.  Just as validating your arithmetic was important in 6th grade, so too is double-checking your search results, especially for answers that you can’t fact-check with a back-of-the-envelope computation. 

This seems dumb, but it’s true.  Just getting an answer isn’t the end of your search—it’s just the middle.  Unless you know otherwise, assume that the answer you see is just one possible answer.  The search engines work hard to bring you accurate, up-to-date information, but despite appearances, they really can’t read your mind.

So people begin search out of curiosity, but end when they’ve found something they believe is the answer… or when they’ve tried hard and long enough. 

As I said, I’m a curious person, so I have a high tolerance for failure while searching. Most people don’t: they’d like the answer as quickly as possible.  So they tend to stop searching when the desire to know is exceeded by the pain and time of continuing to look. 

But I know that you can often find what you seek, if only you know how and where to look.  As luck would have it, most of the searches people do can be satisfied by the first page or two of results.   

If what you seek is a bit more obscure, you’ve got to make the call—when do you give up?  When do you shift strategies?  And this is where expertise shows up.  If you’ve already searched for the distance between the planets, you’re likely to believe that you can look up the distance from the Earth to the moon.  But while the distance from Kashmir to Pakistan is easily measurable, the political intrigues between the two aren’t so simple. 

Ultimately, practice with framing searches gives you the sense of what will work and what won’t.  Facts, dates, showtimes, chemical composition, weights and measures—you know that’ll work.  


But political analysis?  It’s tricky and will need you to synthesize information over multiple sources.   You can find a single document that will explain it all to you, and if you’ve done this before, you’ll know it’s guaranteed to be written from a limited perspective.  You’ll have to keep looking—it’s the way this kind of search needs to be done. 

Like all politics, to be an effective searcher, you’ve got to know when to stop. 

And I believe that point is now.  More on switching strategies later.  

Monday, February 8, 2010

Spelling is still problematic (or, Why the Superbowl commercial is more complicated than you think)

Yesterday on the Superbowl broadcast, Google showed a wonderful commercial about Paris, falling in love, getting married, moving to France and having a baby.

14 seconds into the commercial, you can see the query [ translate tu es très mignon ], which works nicely.

However, since then, lots of people are trying [ translate tu es tres mignon ].  (Note that they're using the American spelling of très, using a regular "e" instead of "è", which, for the record, is an e with an accent grave.)  

The only problem is that when you do the "tu es tres mignon" query, Google thinks you're searching in Spanish!  So Google shows a translation from Spanish to English, and the translation output is the French phrase, which isn't particularly helpful.  It makes a certain kind of sense:  Google can't hear that you would have said "tray" (the French word for "very") instead of "tres" (as in the Spanish word for three).  


     UPDATE 11AM:  Well, it used to earlier this morning.
       Google's fixing this even as I write; which is another great object lesson--
       things change on the web constantly.  


This is one of those cases where the spelling really does matter, and while Google's spelling suggestion system is extremely good, in many non-English language cases, these subtle distinctions are important.  


For my part, I went to the home of some Indian friends to watch the Superbowl and feast on Indian snacks.  (No chips and guacamole here!)  Instead, we had something called "bell"... or that's what I thought I heard.


Naturally, I looked it up when I got home and found... NOTHING!  There are lots of hits on Indian bells (bronze or silver), and the use of bell peppers in Indian cuisine, but this wasn't helping.


Then I remembered that it sounded a bit like there might have been an "h" sound in the word.  Perhaps it was "behl" -- but that's a terrible search result as well.

As a last resort, I tried [ bhel Indian food ] and that worked perfectly.  It's a wonderful mix of puffed rice, sev (lentil noodles) and wheat crackers. Mix with diced warm boiled potatoes, top with a melange of finely cut tomatoes, minced onions, cilantro and chillies. Drizzle fresh cilantro chutney/pesto and tamarind chutney.


You've got to love the multinational Superbowl.... especially when you get the spellings correct.



Friday, February 5, 2010

Searching for data and datasets

One of the glorious things about the internet is that it has made it very practical to share source datasets.  Among other things, this means you can reproduce analyses from data that is of historical interest (such as Galileo’s observations) or of deep personal interest to you.

The simplest way to start looking for datasets to analyze is to add the term ‘dataset’ to your query.  If you’re interested in getting your hands into statistical analysis, search for [ online dataset ] and you’ll have more content than you’ll know what to do with.  For instance:

http://lib.stat.cmu.edu/DASL/ - has datasets and stories about statistical analysis, with nice datasets ranging across a number of topics from archaeology (ancient Egyptian skull development over 4000 years) to nature (acorn size as a function of location), to zoology (wild horse population statistics)

http://exploringdata.net/datasets.htm  Draft lottery data, Galileo’s experimental data, Old Faithful eruptions, world population statistics…

If I were teaching math and statistics in high school or college, this wealth of data would be a major game-changer.  You can demonstrate methods and analyses on tiny, demonstration datasets, and then let the students loose to look at moderately sized (and very interesting) data.  With the amount of data content online, every day can be a new science fair!

Let me illustrate this with a story.

It’s been raining in Palo Alto a lot recently.  And I noticed (for about the thousandth time) that… 

Question:  The rain in Palo Alto seems to come in pulses, averaging 10 to 15 minutes per pulse, then a period of quiet, then another pulse.  Is that true? 

Answer:  To test this hypothesis, I grabbed two datasets for comparable rainy days in January, 2010. 

Seattle data is from:
 http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KWASEATT116&graphspan=day&month=1&day=11&year=2010

Palo Alto data is from:
http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KCAPALOA14

Both weather stations report rainfall at 5-minute intervals.

To find this data, I did a search for [ rainfall data palo alto ] and found that one of the first results points to Wunderground.com, which features links to local weather stations, nearly all of which provide data feeds in a CSV format.  It was quick to find a Palo Alto weather station that had the right kind of rainfall data. And from there, it was a simple step to do the same for Seattle.  (I did other cities as well, but that’s another story.)

So why didn’t I search for “dataset”?  Because a dataset is usually a curated collection of data; that is, it’s been collected, usually cleaned up and ready for use.  I chose to search for ‘data’  because I wanted a raw feed of data from an instrument.  The National Weather Service has datasets, but I wanted the data straight from the provider, hence the subtle shift in my query.

I downloaded the CSV data from each of my two sites (Jan 11, 2010 for Seattle, and Jan 19, 2010 for Palo Alto). 

I then imported both CSVs into a Google Spreadsheet, and made a moving average of 3 samples (to smooth the curves a bit and get around sampling uncertainties). 
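If you'd rather do the smoothing step in code than in a spreadsheet, here is a minimal sketch in Python.  It assumes you have already pulled the rainfall column out of the CSV into a plain list of numbers; the sample values below are made up for illustration, and the function name is my own.  It averages each run of 3 consecutive samples, which may or may not be exactly how a given spreadsheet formula lines up its window.

```python
def moving_average(samples, window=3):
    """Average each run of `window` consecutive samples.

    Returns one value per full window, so the result is
    window - 1 items shorter than the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(samples[i:i + window]) / window
        for i in range(len(samples) - window + 1)
    ]


# Made-up 5-minute rainfall readings (not real station data).
readings = [0, 3, 0, 0, 6, 0]
print(moving_average(readings))
```

A window of 3 samples covers 15 minutes of readings, so it smooths out single-sample jitter without hiding a pulse that lasts 10 to 15 minutes.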

The observation is pretty clear:  Palo Alto weather tends to come in bursts much more than Seattle weather. 

Note that the blue (Seattle) line is VERY different from the red line (Palo Alto), with its obvious pulses.




Once it starts raining in Seattle, it tends to keep raining.  The stereotype of Seattle rain is true! 

By contrast, if it's raining in Palo Alto, you can wait 15 minutes until it clears.

(At least on these two days.  For a real test of statistical significance, you'd have to do a bunch more days, testing the average behavior, etc etc.  For advanced classes, this is a great way to talk about hypothesis testing, measurements of significance, etc.)
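If you want to put numbers on the "pulses" rather than eyeball a chart, one simple approach is to measure how long each unbroken stretch of rain lasts.  The sketch below is just that: it assumes a list of rainfall readings taken every 5 minutes, treats any reading above zero as "raining," and uses made-up values.  It is a starting point for the kind of multi-day comparison described above, not a significance test.

```python
def rain_pulses(samples, minutes_per_sample=5):
    """Return the length, in minutes, of each unbroken run of rainy samples."""
    pulses = []
    run = 0  # number of consecutive rainy samples seen so far
    for value in samples:
        if value > 0:
            run += 1
        elif run:
            # A dry sample ends the current pulse.
            pulses.append(run * minutes_per_sample)
            run = 0
    if run:
        # The data ended while it was still raining.
        pulses.append(run * minutes_per_sample)
    return pulses


# Made-up readings: a 10-minute pulse, a dry spell, then a 15-minute pulse.
readings = [0, 1, 2, 0, 0, 3, 3, 3, 0]
print(rain_pulses(readings))
```

Run this over many days for each city and you can compare the typical pulse length directly, instead of relying on one day's chart.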

The point remains:  As a culture, we now have easy access to more data than ever before.  If you search for it, you can often test your own ideas about what’s going on—you don’t need to wait for the intermediaries to give you their version of the story.  Check it out yourself.

Search on!

Thursday, February 4, 2010

Chaining searches / how kids search / teaching kids how to search

Yesterday's search challenge wasn't much of a challenge. The simplest search solution I know is to first find the Gladwell article on basketball with a query like:

[ Gladwell basketball ]

The first result will be the New Yorker article.  Click through to that article and you'll quickly find that the CEO we're looking for is Vivek Ranadivé.  The next search should be obvious:

[ Vivek Ranadivé  ]   

And you will quickly see that Vivek Ranadivé is the CEO of Tibco.

As I said, this isn't that hard of a challenge.  All you had to do was to:

1.  Find the Gladwell article and skim it, looking for the coach's name.
2.  Search for the coach's name, then look for the company name. 

The reason I wanted to start off with a simple problem like this is that "chaining searches" together to solve a problem is a fundamental skill--it's something you actually have to learn at some point in your life. 

When I go out to teach at elementary and middle schools, I'll often find that kids will have a hard time creating their own chain of reasoning like this, at least until they're in the 6th or 7th grades.  It doesn't seem to be a matter of intelligence as much as practice in working out the search chain. 

In her work at the University of Maryland, Allison Druin (and colleagues) often use "The Vice-President's Birthday" problem to assess this chaining skill. 

The problem is this: 

"On what day of the week (Sunday, Monday, Tuesday...) will the Vice-President's birthday be next year?"

Again, it's not that hard: 

1.  What's the vice-president's name?
2.  Once you have the name, look up his birthday (day-month-year). 
3.  Once you have the date, search for a calendar for next year (that is, this year + 1).
4.  Find the date, read off the day-of-the-week. 
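The last two steps of that chain are mechanical enough to write down as code, which makes the point that each step feeds its answer to the next.  This is a minimal sketch in Python: steps 1 and 2 (the name and the birthday) still have to come from searching, and the month and day used below are placeholders, not the real answer.

```python
from datetime import date

# date.weekday() numbers the days 0 (Monday) through 6 (Sunday).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def birthday_weekday_next_year(birth_month, birth_day, this_year):
    """Steps 3 and 4: find the birthday in next year's calendar, read off the day."""
    next_year = this_year + 1
    # Note: date() raises ValueError for Feb 29 when next_year is not a leap year.
    birthday = date(next_year, birth_month, birth_day)
    return DAY_NAMES[birthday.weekday()]


# Placeholder inputs standing in for the result of step 2: February 4.
print(birthday_weekday_next_year(2, 4, 2009))  # prints "Thursday"
```

Notice that the function never guesses a weekday and checks it; it takes the output of the earlier steps and computes the answer directly.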

But figuring out that this is the sequence of steps you must go through to figure out the weekday is just a bit over the heads of many 7-to-11-year-olds.   As Druin (et al.) say, "Despite experience with searching, children tended to fall back on... natural language queries..." and point out that "Frequently those natural language queries were the verbatim questions asked by the researcher." 

To the eye of an expert searcher, this sounds crazy--but we frequently see people in our studies searching for whole phrases that repeat much of what you have asked them to find. 

However, seeing to the essential core of a question and divining good search terms is a real skill, and there is a fair bit of evidence that adults fall back on much the same verbatim queries that kids do. 

So, teaching a kid how to search effectively is partly a matter of showing how to choose search terms (and almost nothing about Boolean ANDs or ORs), and partly a matter of showing how to break up a complex problem into a sequence of easily achievable substeps.  Those two skills interlock, as you can see in my favorite kid query for this problem:  [ vice-president birthday Sunday ]   He then went on to repeat this query with Monday, Tuesday, Wednesday... He was good-natured about it and gave the wry explanation that "well... if his birthday IS on a Sunday, then I'll find it."

Maybe.  But it's not a good strategy to follow.  A much better way is to devise a search that will yield an answer that can be used in the next step, and not to test all of the available options.  (Imagine if I'd asked "when was the last year of the Civil War?"!) 

More on good search strategies in times to come. 

--
Allison Druin (et al.) paper on how kids search is available at her site at University of Maryland:  "How Children Search the Internet with Keyword Interfaces."   (This will be published at the CHI 2010 conference later this year.) 

Wednesday, February 3, 2010

Wednesday Search Challenge

I usually teach search as a hands-on skill.  While I could teach you the theory of search, nothing really beats doing it and feeling the success with your searches as they pass through your fingers.

So I like to give puzzles, little search problems that are real problems that I've seen people in the wild have while they're searching.

Going forward, I'll post a little Challenge problem from my field studies each week.  You'll have 24 hours (or so) to work on it before I post a solution.  If you solve the problem before then, and you have a particularly clever solution, send me mail at DanSearchChallenge@gmail.com.  (DO NOT post a comment until after I've put up the solution.)

Search Challenge #1

I was talking with a friend at Peet's coffee the other day, and he mentioned that there was a great Malcolm Gladwell story in some magazine about how effective the full court press is in basketball, and we wondered why everyone doesn't use it in all their games.

My friend also said that the article was about a Silicon Valley CEO who taught the full court press strategy to his daughter's 12-year-olds' basketball team, and won everything in sight!  That's also amazing: a Silicon Valley CEO with the time and devotion to coach a kids' basketball team!  (Some things are right in the world.)

So we were wondering:  


Who IS the Silicon Valley CEO (that Gladwell mentions in his story) who coaches girls' basketball AND what company is he/she the CEO for?

Comment:  This is a fairly easy challenge (maybe a 1 minute problem), but there's a reason I'm starting with a simple problem like this.  More details and commentary tomorrow.  

Tuesday, February 2, 2010

Time control - changing the time view of your search results

If you want to get the sense of a city, especially some place as active and vibrant as New Orleans (but feel free to substitute your favorite city here), it's often useful to look at the city through different lenses.

One great lens to use is TIME.  Did you know you can search for results filtered by different time boundaries?

For example, if you want to get a sense for New Orleans, you could do the obvious web search [ New Orleans ] and get back decent results.



Now, to see different time-slices of search results, click on the "Show options..." button:



This shows you the Toolbelt.  And down in the middle of the Toolbelt is a set of options for looking at search results in different slices of time.



Any Time is what you normally see on web search results.  The results are the best out of ALL the web that Google could find.

Latest are the results that are just now coming into Google.  That means they're mostly from social media postings (such as Twitter or other user feeds), but results also come in from news organizations.

Past 24 hours means just what you think.  These are the best of the results posted during the last day.

And Specific date range lets you select a time over which you'd like to see the results.  Here, I've put in the day before and the day after Katrina hit New Orleans in 2005.



This time-slice feature is great for searching for specific date topics (e.g., Katrina), but also useful for getting the sense of what's going on in and around the city.

For instance, if you search on [ New Orleans ] and click on the "Past 24 hours" button:


And you'll see what the current topics are for the city and what news outlets generate content repeatedly during the week.  (Apparently, the local football team made it into the Superbowl!)

There's one other timeslicing feature to note.  Farther down in the Toolbelt is the Timeline view option.



Which gives you a timeline view of New Orleans over time.


Obviously, the web didn't exist back at the time of that big peak in the 1860's, so what's going on?

Google has indexed a huge volume of newspapers, magazines and books from that era.  Each article is tagged with an "about date" (meaning, this is the time that the article is about).  Newspaper articles, for instance, always have a publication date--and this is what you're seeing here.  Something major happened in the 1860s (but I'll leave it to you to figure out what that was), and you see it here.

On the chart you can see the peak of Katrina stories on the right hand side.  That right-most peak is in 2005, when Katrina came through.  Note that you can click on the timeline to drill down and see those time periods in more detail.

So if you want to understand a topic, consider looking at it through different times--Latest gives you the up-to-the-second news, while Timeline can give you a great historical perspective.  All points in between are open for searching as well.

Monday, February 1, 2010

When translations fail, you can often search for just the right word

  
Every once in a while I need to translate something from one language to another.  I assume you already know how to translate using Google’s Language Tools (if not, do a search on [ Google language translate ] and you’ll find a bunch of things there). 

Yesterday I found myself translating a Jorge Drexler song from Spanish to English using Google Translate.  And, it mostly worked pretty well.  A few idiomatic expressions don’t come across perfectly, but that’s to be expected.  A few culture-specific references also need  a bit of thought (e.g., when Drexler mentions “the bulb” / “la bombilla” he’s not talking about tulips, but the straw used to drink mate, the typical tea made from yerba mate in South America). 

As you’d expect, looking up these things is pretty straightforward. 

But every so often the translation fails in an odd way.  Here’s an example from Drexler’s song, “Guitarra y vos” (Guitar and you): 

   como tampoco hay                                                   as there is no
   guitarra sin tecnología,                                              guitar without technology,
   tecnología del nylon para las primas                          technology from premium nylon
   tecnología del metal                                                  metal technology
   para el clavijero,                                                       for the headstock,
   la prensa, la gubia y el varniz,                                    press, gouge and varnish,
   las herrmanientas del carpintero.                               the carpenter's herrmanientas


This isn’t a bad translation until you get to the last line.  What happened? 

If you’re a Spanish speaker, you might not even notice the typo. But the translation system doesn’t know what to do with the word “herrmanientas”, and I, as a semi-literate Spanish reader, don’t really know either. 

But if you just pop that word into Google as a search term  [ herrmanientas ] then the spell-checker will kick in and offer up a spell-correction.  In the last few translations I’ve done, this has always worked really well.  And in this case, it suggests the correct word: herramientas. (Note that the typo has "rman" where the correct word has "ram".)  

So the REAL word is “herramientas” – or “tools” in English. 

Pop that in place, and you’re good to go.
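To get a feel for why a near-miss spelling can be matched to the right word, here is a toy illustration in Python.  To be clear, this is NOT how Google's spell-checker works; it's a minimal sketch using the standard library's difflib module, and the tiny word list is a hypothetical stand-in for a real Spanish dictionary.

```python
import difflib

# Hypothetical mini-dictionary, standing in for a real word list.
KNOWN_WORDS = ["herramientas", "hermanas", "guitarra", "clavijero", "carpintero"]


def suggest(word, known_words=KNOWN_WORDS):
    """Return up to three known words similar to `word`, best match first."""
    return difflib.get_close_matches(word, known_words, n=3, cutoff=0.6)


# The typo from the lyrics; the first suggestion is the closest known word.
print(suggest("herrmanientas")[0])  # prints "herramientas"
```

The typo shares most of its letters, in the same order, with the real word, which is why a similarity score picks it out so easily.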