Stephan
The following interview with Google expert Philipp Lenssen comes from my soon-to-be-released ebook (2009 edition) on power searching with Google, available free to Art of SEO newsletter subscribers (sign-up is to the right on the sidebar). It’s over 50 pages and written from a market research and marketing perspective. No prerequisite knowledge of Google or SEO is required of the reader, but it’s no beginner’s text. I think even the advanced SEO will get a lot of value out of it (or your money back!).
Philipp is a fellow O’Reilly author (Google Apps Hacks) and Google Blogoscoped’s founder/blogger. Without any further ado…
Stephan: For what sort of research tasks is a major search engine not well suited?
Philipp: I think a search engine can be part of any research, with three caveats.
First, you need to know how to appropriately evaluate the trustworthiness of a given page you stumble upon, as well as of a given area of research. Any page you stumble upon in Google or others needs to be evaluated based on many criteria, a science on its own, parts of which become intuition after some time. Beyond that, there are areas of research where you need to be especially careful, like when you want to verify if a popular myth holds any truth, such as a quotation being attributed to someone, an anecdote relating to a famous person, an urban legend, or a “truth” in which a lobbying group or popular political party has an interest. With areas like these, even finding a dozen repetitions of that “truth” may not be enough for you to gain appropriate trust in it.
Second, it is hard to research areas where you lack the specific words to describe the idea. In the age of Google, a keyword is also literally a key: without the key, the door remains locked. Imagine you want to find the name of the painter of a painting you came across a year ago. The mainstream search engines of today won’t let you take a pencil and draw what little you may remember of the painting, to then return similar images to you. There are some interesting developments in this area (I’m a big fan of TinEye.com, which lets you upload an image to find likely similar ones, and there are also search engines which let you whistle to find songs), but mostly, things are still keyword-based.
Third, there are certain characters which Google and others ignore. For instance, Microsoft once released a programming language called C# (pronounced c-sharp). There are already other programming languages called C and C++. But how do you separate these languages in Google, when Google ignores characters like “#” or “+”?
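To make the third caveat concrete, here is a minimal sketch of what happens when a search engine drops punctuation before matching. The `naive_tokens` function is a hypothetical stand-in written for this illustration; it is not how Google actually processes queries, only a simple model of "ignoring characters like # or +".

```python
import re


def naive_tokens(query: str) -> list[str]:
    """Lowercase the query and keep only runs of letters and digits.

    Hypothetical tokenizer for illustration: every other character,
    including '#' and '+', is simply thrown away.
    """
    return re.findall(r"[a-z0-9]+", query.lower())


# Three different languages end up as the same tokens.
for name in ["C", "C++", "C#"]:
    print(name, "->", naive_tokens(name + " tutorial"))
```

Under this simplified model, all three queries reduce to the same two tokens, so the engine has no way left to tell the languages apart.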
Added to these, there’s the challenge of being confronted with powers that be which may, every once in a while, work to make your research harder. It’s a more philosophical topic and not just a problem when using search engines. The basic communication approach of the powers is to hide, redefine, and distract. Let’s say John Doe is the leader of Doetonia, and he just got bribed with a big black suitcase filled with a billion Doetonia dollars. In utopia, informed citizens would be all over the place entering words like “john doe suitcase scandal” into the Google of their land. In the reality of our hypothetical example, however, first of all, the scandal may not be known, or if it’s known, it will barely be covered (hiding!). Second, if the incident becomes known, the Doetonia press agency as well as other press houses which have an interest in the established system remaining stable can start to call it the “gentleman’s lapse of reason incident” or so, making it sound much more harmless. Research this topic now using those words, and you may find some pros and cons with regard to the incident, but basically be stuck in a world defined by the Doe-establishment, because you’ve already started to use their words (redefining!). Third, if the incident becomes known and just reframing the issue won’t suffice, the Doe team and like-minded press can now create a red herring controversy in another area. Maybe they’ve just discovered that the pet cat of Ms. Doe has died in a car accident because there was no fence around the Doetonia leader house, and this becomes the discussion of the day, and all Doetonians jump to search engines to research the name of the cat, or whether the country should make fences mandatory for every citizen owning a pet cat (distraction!).
Stephan: For what sort of research tasks is Google not well suited, but another major search engine is? Please specify which search engine(s) you turn to on such occasions.
Philipp: TinEye.com as mentioned above is one. But Google itself has a lot of areas covered through its special search engines. I often go to Google Sets (labs.google.com/sets), for instance, because it lets me find new words for a given topic. You can enter e.g. “superman” and “batman”, and it will expand the list by returning “hulk”, “wonder woman” and the like. A sort of advanced version of this tool is becoming available with Google Squared (www.google.com/squared). These tools can be helpful if you’re still lacking the words to describe a concept, or if you don’t yet have a good overview of a certain area of knowledge.
Knowing a couple of specialized search engines might be helpful for sure, but sometimes Google also releases a specialized search engine of their own which is superior to the original. Take Google Patent Search, which was much more accessible than the US government’s official patent search. The official government search site oddly enough uses images saved in a non-web compatible format, so you’ll have trouble looking at the patent illustrations at their site!
One research tool I can highly recommend isn’t really a tool at all: it’s the paid Q&A site Uclue.com. You set your price, ask a question, and have one of the researchers get back to you. The Uclue researchers formerly worked at Google Answers, before that got shut down.
Stephan: What are your favorite Google query operators, and why?
Philipp: Google has become fuzzier over the years, meaning they more frequently ignore certain words or automatically list results they consider to be targeting the word “you really meant.” But sometimes, you might have actually meant what you’ve originally written. In these cases, the plus (“+”) search operator comes in handy again.
One operator I use a lot is [site:example.com foobar], where example.com is the site you want to search across, and “foobar” is the keyword. You can even throw [intitle:something] into this mix to restrict it to pages from example.com which have the word “something” in their page title… perhaps because such pages are of a different type, and you only want to find that type. Say, [intitle:buy-this-product]. Note that the site operator also works with subdirectories, so [site:example.com/archive/2009/] is an option, too.
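If you build such queries often, a small helper can assemble them for you. The following is a minimal sketch, not an official Google client: the function names `build_query` and `search_url` are made up for this example, and the only thing assumed about Google is that its search page accepts the query in a `q` URL parameter.

```python
from urllib.parse import urlencode


def build_query(keywords="", site=None, intitle=None):
    """Combine the site: and intitle: operators with plain keywords."""
    parts = []
    if site:
        parts.append(f"site:{site}")  # a domain or a subdirectory
    if intitle:
        parts.append(f"intitle:{intitle}")  # word required in the page title
    if keywords:
        parts.append(keywords)
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL with the query safely URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query("foobar", site="example.com/archive/2009/", intitle="something")
print(query)
print(search_url(query))
```

The first line printed is the query as you would type it into the search box; the second is the same query as a link you can bookmark or share.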
Stephan: Besides www.google.com, what are your favorite Google-owned websites, and why?
Philipp: I use Gmail a lot. When I’m in countries where YouTube is accessible, I use that a lot (though at the moment I’m living in China, where I can’t access it). Other Google tools like Google Image Search and Google Maps are very useful, too. Google Images expanded its feature set a lot during the last year or so. Now you can search by color, search for faces only and more. You can also use the site: operator, as described earlier, in conjunction with a Google Image search.
Stephan: What’s your web browser start page set to?
Philipp: It’s always set to a blank page, as is my desktop wallpaper. I like minimalist interfaces; they help me to focus. When I do want to search at Google, I can type “s keyword” in the Firefox address bar, which is connected to a Google search for that keyword. Or, quite often, I type “google” and hit Ctrl + Return, which will complete the domain name to http://www.google.com and let me start my search from there.
Stephan: What are your favorite third-party applications that are based on Google?
Philipp: Google’s search and translation APIs are nice. You can write interesting tools on top of that. I recently looked into writing a text editor based on JavaScript, which would run locally inside the browser but still be able to open files (you can google for “Netpadd B” to see what I came up with). In that plain text editor, you can mark any piece of text, hit a certain shortcut, and a translation of that text into English will pop up. (Or, if the text is English to begin with, it will be translated into Chinese.) There are many interesting tools and sites out there making use of Google’s APIs and gadgets.
Stephan: How/where did you learn so much about Google searching?
Philipp: I guess Google’s query syntax first and foremost targets casual users, so instead of half a dozen tuning options you can type in a casual query and have it return meaningful results. All the things for which that type of search doesn’t suffice, I usually learned from Google’s help files, blogs covering the subject, or from researching when writing for Google Blogoscoped (a blog and forum covering all things Google).
Stephan: Any training or resources (besides the ebook of course!) that you’d particularly recommend to anyone wanting to become an expert Google searcher?
Philipp: I would suggest reading blogs like Google Operating System or Google Blogoscoped, and also keeping track of some of the official Google blogs.
You can pick up a lot of tricks this way.
Stephan: How does one assess the quality or credibility of the information produced by the search and various sources? Any practical tips beyond the obvious “buyer beware” type of advice?
Philipp: A huge number of details is involved. For instance:
- Does the article or page contain a full author name? Does it contain a date?
- Does the author have an About page, perhaps even with picture and full address?
- Can I verify that the author is who he claims he is? Do I know the domain, can I trust it?
- What does the design look like?
- What’s the page’s PageRank? Not that there are no scams on PageRank-5 sites or so, but usually, I would trust a PR7 page more than a PR0 page, simply because apparently it was around for some time and got linked. You just need to make sure this won’t be your only indicator of trust, because some scammers may know how to optimize their PageRank. It’s just part of the mixture of signals.
- How commercial is the field I’m moving into? Conversely, how much of a hobby effort is it? Instinctively, I think hobby efforts with a very small target group are less likely to be a scam. “Cool screensaver” would be a topic where I’d be very, very careful. “Peter’s postage stamp archiving program”, I might be more likely to install, if I find that I get to know Peter on his page and he seems to care about the subject.
- An .edu domain may convey some extra trust, but again, you need to be careful not to rely on any single parameter on its own.
- Is the article well-researched? Spellchecked? Are the ads surrounding it sensible, or just too much?
- Is there a conflict of interest at play? Am I being sold a product?
- Can I confirm the found data from other sources?
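The idea of a “mixture of signals” can be sketched in a few lines of code. This is only an illustration under loud assumptions: the field names, the subset of checklist items chosen, and the threshold of four are all made up for the example, and real judgment is of course fuzzier than a handful of yes/no answers.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    """Yes/no answers to a subset of the checklist above (hypothetical fields)."""
    has_author_and_date: bool = False
    has_about_page: bool = False
    known_domain: bool = False
    high_pagerank: bool = False
    well_written: bool = False
    no_conflict_of_interest: bool = False
    confirmed_elsewhere: bool = False


def positive_signals(page):
    """List the names of all signals answered with yes."""
    return [name for name, value in vars(page).items() if value]


def looks_trustworthy(page, minimum=4):
    """Require several independent signals; the threshold is an arbitrary assumption."""
    return len(positive_signals(page)) >= minimum


# A high PageRank alone is not enough ...
print(looks_trustworthy(PageSignals(high_pagerank=True)))
# ... but several signals pointing the same way are.
print(looks_trustworthy(PageSignals(has_author_and_date=True, has_about_page=True,
                                    well_written=True, confirmed_elsewhere=True)))
```

The point of counting rather than weighting is that no single answer, whether PageRank or an .edu domain, can carry the decision on its own.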
Stephan: What one piece of advice about using Google as a research tool should the reader retain, if they remember nothing else?
Philipp: Sometimes, using a particular word may be necessary to find the right set of pages, even if that word may be synonymous with what you’ve already searched for. Try to imagine how an author of your imaginary ideal target page might phrase their sentences. Perhaps, when describing that baroque painting you vaguely remember and whose artist’s name you forgot, you would need to write “voluptuous lady” instead of “overweight woman” to find the right result.