Monday 26 March 2007

Yadabyte SEO Primer 4: Mommy, what's a search engine?

Guaranteed Spider Vision

Every night, while you are sleeping, spiders are crawling the World Wide Web. These "spiders" are computer programs sitting, en masse, in the server houses of the search engine companies. These spiders see the web in one medium and one medium only, and that is text. They don't see images or animations and they don't hear sounds. Now, you might be thinking, if you have read any of my other blogs, that I'm just showing off with the metaphor about spiders seeing in text here. Well, for once, good reader, I am not. If you look at your web page like a spider, you get a new insight into your site's level of optimization. But how...?

There is a free-to-use tool that lets you see like a spider: it's called Lynx View. Take a look at the Yadabyte.com Website and now take a look at it as a web spider sees it with Lynx View. This is how Google sees it, more or less. Google adds extra layers, such as whether text is bold or not, but at heart, that's the Google view.

  • Try this: Look at your website's Spider View here. What you should see, when you read down from the top through the page, are the words that best represent what your site is about. If you don't, then you don't need an SEO expert to tell you that your site is critically badly optimized and, unless you like it quiet, you need to take action. (If your site is totally unoptimized, in a few minutes you can do more good for your SEO than paying Yadabyte Websites or anyone else to optimize it.)
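If you would rather see the idea than take my word for it, here is a rough sketch of spider vision in Python. It is not how Lynx View or Google actually work, just a minimal illustration that assumes a spider keeps the text of a page and throws away images, scripts and styling. The names SpiderView and spider_view and the sample page are made up for this example.

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Collects only the readable text of a page (a simplified 'spider vision')."""

    SKIPPED = {"script", "style"}  # their contents are code, not readable text

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Images and other tags contribute nothing; we only track skipped blocks.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside <script> and <style>.
        if self._skip_depth == 0:
            self.words.extend(data.split())


def spider_view(html):
    """Return the page's text in reading order, top to bottom."""
    parser = SpiderView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.words)


# Hypothetical sample page, invented for this illustration.
sample = """<html><head><title>Blue Widgets Shop</title>
<style>body { color: red; }</style></head>
<body><img src="logo.png"><h1>Hand-made blue widgets</h1>
<script>var x = 1;</script><p>Free delivery.</p></body></html>"""

print(spider_view(sample))
```

The logo image, the styling and the script all vanish, and what is left is the title, the heading and the paragraph, in the order they appear on the page. If those leftover words don't say what your site is about, that is the problem to fix.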


The Long and The Crawl of It


Much of the work that search engines do is looking at the internet with Spider Vision. They do it constantly in what are called Crawls. The way a Crawl works is like this:

  • The search engine has a list of websites. It gets these from the internet's Domain Name Servers (Google has its own DNS) as well as from its own historical database.
  • When it crawls it will start off on a website - or tens of thousands at once - look at it with spider vision and save what it finds in its database (more on this later). It will then follow any links it finds there to other web pages and record what it finds at each one of those. This creates an explosion of searching spiders. It's an explosion you want your site caught up in as many times and from as many directions as possible.
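The two steps above can be sketched in a few lines of Python. This is a toy, not a real crawler: it assumes a made-up four-page web held in memory (TOY_WEB), and the function name crawl is invented for the example. A real search engine fetches pages over the network and has to cope with scale, politeness rules and much else.

```python
from collections import deque

# Hypothetical miniature web: page -> (text on the page, links found on the page)
TOY_WEB = {
    "a.example": ("blue widgets for sale", ["b.example", "c.example"]),
    "b.example": ("widget reviews", ["c.example"]),
    "c.example": ("widget repair guide", ["a.example"]),
    "d.example": ("nobody links here", []),
}


def crawl(seeds, web):
    """Visit the seed pages, save their text, then follow every link found."""
    database = {}          # what the spider has saved so far: page -> text
    queue = deque(seeds)   # pages still waiting to be visited
    while queue:
        url = queue.popleft()
        if url in database or url not in web:
            continue       # already visited, or the page does not exist
        text, links = web[url]
        database[url] = text   # save what spider vision sees
        queue.extend(links)    # follow the links out to other pages
    return database


found = crawl(["a.example"], TOY_WEB)
print(sorted(found))
```

Starting from a.example, the spider reaches b.example and c.example but never d.example, because nothing links to it. That is the whole argument for getting your site linked from as many directions as possible.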

In an ideal world the internet would be searched like this constantly and simultaneously, but the computational power and internet bandwidth aren't available for this at the current time. So short-cuts are taken. We will discuss this in a later, more advanced section of the series, Crawl Frequency and Depth.


So, we have the idea of this spider that reads web pages by crawling across the internet, but that's only part of the search engine; most of the really clever and innovative stuff happens after the spider has saved what it sees into the search engine database.



The Search Engine Database


All the text-based data that the spiders find is saved in the search engine company's big database. This is composed of essentially two parts: the data and the software. The data is the index and the page content - all the stuff the spider finds. The software is the absolutely crazily complex algorithms and functions that are run on the data. As a professional software architect, I can assure you, in case you didn't already know, that what goes on in the engine houses of Google is some of the most sophisticated programming ever.
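To make "the index and the page content" a little more concrete, here is a minimal Python sketch. It assumes the index is a simple inverted index, meaning a lookup from each word to the pages that contain it. That is a standard textbook structure, not a description of what Google actually runs, and the page data and the names build_index and search are invented for the example.

```python
from collections import defaultdict

# Hypothetical page content, as saved by the spider: page -> text
pages = {
    "a.example": "blue widgets for sale",
    "b.example": "widget reviews",
    "c.example": "widget repair guide",
}


def build_index(pages):
    """Build a word -> set-of-pages lookup from the saved page content."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(pages)
print(sorted(search(index, "widget")))
print(sorted(search(index, "widget repair")))
```

Note that this toy treats "widget" and "widgets" as different words and has no notion of ranking at all. The real software layers a great deal on top of a lookup like this, which is where the complexity comes in.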

What's a real paradigm shift in how this software works is that it is dynamic in both directions. The search engine process and results change with what's on the web, and they change depending upon who makes the search, and where and when they make it. The reality of a search engine is that nobody, not even the people at Google or MSN, can say for certain what the results of any search will be.


Where we are in this series.


  • So far we have seen the framework of what it's all about from an SEO point of view.
  • We have seen what web presence is and how to get an idea of your site's web presence.
  • We have seen how traffic entails a good search result and how good search results entail traffic.
  • And in this last section we have seen how a search engine works in a conceptual, and, I hope, useful way.

The conclusion to this part of the series is that the web is so huge and complex, as are the search engine databases and software that represent the web, that nobody, not even Larry and Sergey, can guarantee the results of any search. It's like the weather.

The rest of this series is about how to predict the weather better and, with a little effort, get a little more sunshine on your little patch of the internet. One thing's for sure, there will be more over-stretched metaphors.



SEO Primer Part 5 will be published shortly