LLLL.com Where Less is More!

Real-Time Search

10/07/09 5:01 PM

We hear a lot of talk about real-time search and how it is going to play a bigger role in the future. What we don’t hear much about, however, is that real-time search is already here. To begin, let’s go over what exactly real-time search is. I’ll discuss real-time search engines first and conclude with a piece on search engine algorithm history, to present the past and present challenges search engines face and the problems a real-time search engine will run into.

Relevant Search Results

When you use a search engine, what you’re viewing is content that the search engine has previously indexed. This content might have been indexed yesterday, or it may have been indexed 10 years ago and remained unchanged since then. Many parts of search engine algorithms currently favor older pages: an older page is likely to have accrued more relevant and trusted links than a newer page, for example. Some search engines (e.g. Google) even assign significant weight to the age of a website as a ranking factor, putting newer websites at a significant disadvantage in the search engine results pages (SERPs).

When we speak of a real-time search engine, we’re talking about a search engine that could deliver not only relevant results but also up-to-the-minute results for time-sensitive topics. If someone types something along the lines of “Yankees Red Sox score” (without quotes), they’re most likely interested in the score of the most recent baseball game between these two teams, and perhaps the game’s highlights. I highly doubt they are looking for Wikipedia’s article on the history of the New York Yankees and Boston Red Sox rivalry, which currently ranks #3 in Google.

The challenge for search engines will be to determine which searches should return real-time results and which should return the best results according to their existing algorithms. If I search for “GoDaddy” on Google, it’s highly probable that I want to get to the GoDaddy.com website (I might have just seen their commercial on TV, for example), so news from GoDaddy or from other companies about GoDaddy probably isn’t what I’m interested in. On the other hand, if I search for “GoDaddy coupons”, I’m definitely looking for coupons that currently apply, not coupons that might once have been popular but have since expired. How do we find the balance between a search engine that includes little in the way of real-time results (e.g. Google at present) and one that contains only real-time results? Whoever can figure that out is sure to make a lot of money. An inconvenient solution in the meantime is to use real-time search engines for queries where you’d like the latest information, and traditional search engines such as Google, Yahoo, and Bing for queries where the results are unlikely to be time-sensitive.
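To make the balancing problem a bit more concrete, here is a rough sketch of one way an engine might mix freshness into its ranking. Everything in it is an assumption made up for illustration: the hint words, the 24-hour half-life, and the function names are hypothetical, and no real search engine is claimed to work this way.

```python
# Hypothetical hint words; a real engine would presumably learn such signals from query logs.
TIME_SENSITIVE_HINTS = {"score", "coupon", "coupons", "news", "today", "latest"}


def is_time_sensitive(query):
    """Crude guess: does the query contain a word that hints at wanting fresh results?"""
    return any(word in TIME_SENSITIVE_HINTS for word in query.lower().split())


def blended_score(relevance, age_hours, time_sensitive, half_life_hours=24.0):
    """Keep the traditional relevance score, but decay it with age for time-sensitive queries."""
    if not time_sensitive:
        return relevance
    # The score halves every `half_life_hours` (an assumed, tunable value).
    freshness = 0.5 ** (age_hours / half_life_hours)
    return relevance * freshness


def rank(query, pages):
    """pages is a list of (title, relevance, age_hours) tuples; best result first."""
    sensitive = is_time_sensitive(query)
    return sorted(pages, key=lambda p: blended_score(p[1], p[2], sensitive), reverse=True)


pages = [
    ("Rivalry history article", 0.9, 24 * 365),  # very relevant, a year old
    ("Last night's box score", 0.6, 2),          # less relevant, two hours old
]
print(rank("Yankees Red Sox score", pages)[0][0])
print(rank("Yankees Red Sox rivalry", pages)[0][0])
```

With these made-up numbers, the query containing "score" puts the two-hour-old page first, while the query without a hint word falls back to plain relevance. The hard part, of course, is the guess in `is_time_sensitive`, which a word list like this one can't really solve.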

Twitter has real-time search functionality, but it currently covers only tweets on its microblogging platform. That is not exactly a real-time search engine yet, though Twitter is working on it. There are some more complete real-time search engines:

  • Collecta
  • CrowdEye
  • OneRiot
  • Scoopler
  • Yauba

Unfortunately, there’s nothing on the market yet that’s comprehensive like Google yet returns results in real time when necessary. The big problem with real-time search is that it would take an insane amount of computational power to revisit every website in the index every few minutes to keep everything up to date while still analyzing the results comprehensively enough to return the most relevant ones. For now, OneRiot seems like the best bet for real-time news. It’s unfortunate that it only covers pages linked to on social networks, thus missing topics that aren’t frequently discussed there.

 

Search Engine Algorithm History

There are pros and cons to using a search engine algorithm which has on-page, link, trust, and age components. The obvious advantage is that it’s done a reasonably good job of keeping most spam blogs (splogs) from ranking high in the SERPs for competitive keyphrases (i.e. the searches most people make when using a search engine). The disadvantage is that it’s led many people to try to game the system, and as search engines increasingly clamp down on such behavior, manipulation only becomes more profitable for those still able to do it, given that they now have less competition.

Back when search engines were in their infancy, ranking well was as easy as using a word many times on a page. This is where the whole notion of keyword density comes from. Search engines assumed that a page which used a certain word 100 times must be more related to that topic than pages which used it, say, 10 times. This would likely be true if people weren’t trying to alter search engine rankings in their favor: it’s pretty hard to use most words 100 times on a page unless your topic is obviously related to that word, or you’re willing to sacrifice the human usability of your website by repeating the word over and over in a nonsensical fashion.

Obviously, most people don’t want their website spam to come across as website spam. What do you do when you come across a spam website? Most people hit the back button immediately, and no money is made by the website owner, who is often monetizing these websites through AdSense. So what came next were increasingly creative ways of disguising spam that detracted minimally from human usability. People would place keywords in the footer of their websites or hide additional keywords in a color which matched the background of the page so they couldn’t be seen (e.g. if I wrote words in a white font, you wouldn’t see them on a white background). Then came the genius idea of serving humans and search engine robots different pages depending on which one the visitor was identified as (also known as cloaking).
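For anyone who hasn’t seen keyword density spelled out, here is a minimal sketch of how such a number could be computed, and why it was so easy to inflate. The function name and both sample texts are made up for illustration; this is the naive ratio of keyword occurrences to total words, not any search engine’s actual formula.

```python
import re


def keyword_density(text, keyword):
    """Fraction of the words in `text` that are exactly `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


# Two made-up snippets: one written for people, one stuffed for robots.
honest = "Domain names with four letters are short and easy to remember."
stuffed = "domains domains cheap domains buy domains domains best domains"

print(keyword_density(honest, "domains"))
print(keyword_density(stuffed, "domains"))
```

The stuffed snippet scores far higher even though it is useless to a human reader, which is exactly the weakness described above.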

These are all highly frowned upon by search engines today, and I wouldn’t recommend using any of these techniques unless you really don’t care about search engine traffic. There are plenty of other ways of spamming (such as title and h1 spamming, meta tag keyword stuffing, and content scraping). However, I really didn’t write this intending to provide a lesson on black hat SEO that works today or worked in the past, so let’s now move on to links and how they’ve been manipulated.

Search engines which don’t return the results visitors are looking for aren’t likely to stay popular for long. That’s the whole reason search engines are constantly changing their algorithms to make it increasingly difficult to unfairly gain a competitive advantage. Once search engines started cracking down on what I mentioned above, the next thing to be manipulated was links. Google’s PageRank algorithm used to play a very important role in the ranking of search engine results. The problem with PageRank is that the whole algorithm was based on links. Assuming a website is more important because it has more links, or more links from authority websites, is just as flawed as believing a page is more important because it stuffed a keyword 100 times into its 300-word page.

Link farms were common even before Google gained popularity (most free directories are essentially link farms), because Inktomi (a search engine which used to feed Yahoo search engine results) was heavily link-based and many observant webmasters exploited this. Many website owners would link all their sites to each other, so a brand new website could hypothetically have hundreds of backlinks from day 1. It was also common for website owners to hide links (using the same method I described above for keyword stuffing) or to stuff the footer with links. People would buy expiring or existing domains for their PageRank and add links from them back to their own website (this still works with non-expired domains to a certain extent; AOL, for example, has 1800+ DMOZ links). There was guestbook spam, blog comment spam, and wiki spam, all largely done to manipulate search engine rankings.

Anchor text was another particularly bad one. Google used to weigh anchor text enormously in their algorithm (it’s still one of the elements with the most weight despite its abuse). As you would imagine, once people found out the importance of anchor text, they started writing all their links with the keywords they wanted to rank for.
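To see why a purely link-based score invites link farms, here is a small sketch of the simplified, textbook form of PageRank run on a toy web. The page names, the damping factor of 0.85, and the ten-page farm are all assumptions chosen for illustration; this is not Google’s production algorithm, which has never relied on this calculation alone.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration; `links` maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share, then receives rank from its inbound links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outbound links spreads its rank evenly over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# A tiny made-up web: two established pages and a brand new site nobody links to.
honest_web = {"a": ["b"], "b": ["a"], "new_site": ["a"]}

# The same web after the owner points ten throwaway pages at the new site.
farm_web = dict(honest_web)
farm_web.update({f"farm{i}": ["new_site"] for i in range(10)})

print(pagerank(honest_web)["new_site"])
print(pagerank(farm_web)["new_site"])
```

In this toy example the new site’s share of the total rank roughly doubles once the farm exists, even though nothing about the site itself has changed.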

Imagine I wanted to rank for “SEO”, for example. In the past, a strategy that worked surprisingly well was to include this word in all my links. Over in my category section, for instance, I could add the word SEO to all the categories (even better if done in white so it doesn’t detract from human usability). Don’t try that today. We have a similar problem today with anchor text and paid links. I’m surprised Google still puts so much weight on anchor text given that it’s so easy to manipulate. How often do you come across site-wide links that contain keywords instead of the website’s name? That should be a dead giveaway that someone’s bought links. The best way to manipulate the rankings today (and it’s white hat!) really is to just get a domain name which contains the keywords you plan on targeting. It took zero work to get this blog ranked #1 for “LLLL.com” and very little work to get it ranked for “LLLL”. Between these two terms and derivatives of them, such as LLLL.com prices, LLLL.com sales, and LLLL.com price guide, I get about 1000 search engine referrals monthly. Obviously, much more work will be required if you want to rank first for something with much more competition.

Trust is a more difficult algorithm element to manipulate. However, people have even found a way around that by getting links on trusted websites, either with money or through other methods such as donations to charities, educational institutions, etc. Age in the index is, in my opinion, one of the dumbest algorithm elements ever. If we want relevant, accurate information, why would an older site necessarily be better than a newer site? I can understand the sandbox, and I’m not saying 1-day-old websites should be ranked high in the SERPs for competitive keyphrases, but why is a 2-year-old website not ranked as well as a 5-year-old website? This has been the real failure of modern search engines: a site which is old and has lots of trusted links will outrank websites that are far better (take Wikipedia as an example).

Posted by Reece | in Uncategorized, internet/advice |

4 Comments on “Real-Time Search”

  1. Shaun M. Says:

    Hi Reece,

    Here are a few more for your list. I have also read that Google has something planned.

    topsy.com
    tweetmeme.com
    almost.at
    dailyrt.com
    twazzup.com
    friendfeed.com (comes with a built-in search feature)

  2. Reece Says:

    Hi Shaun,

    Thanks for taking the time to post them! :)

  3. Monika Lorincz Says:

    Regarding your real-time search engine list, I would also add surchur.com, as it is one of the first real-time search engines and has some unique features that separate it from the rest. Intriguing post, by the way. Enjoyed reading it.

    Monika Lorincz
    http://surchur.com

  4. Reece Says:

    Thank you Monika :)
