The DNA of Search

The internet.  It's a big old place.  Full of stuff.  Files, stories, movies, music, pictures, news, reviews.  You name it, the internet has a virtual online version of it.  But how do you find what you want?  Via a search engine, of course.

The search engine of choice is generally seen to be Google.  There are obviously local variations, such as Baidu in China, and more specialised engines such as ChaCha, which focuses on human analysis of the results rather than purely computational searching.  In general, though, to get the most out of the internet you need to search, index and categorise what you want to view.

The basic idea behind a search engine is that it first creates an index of available web pages.  This index is built by automated robots, or spiders, that crawl as many existing public web pages as possible, checking links and identifying the contents of the HTML pages so that searches can be performed.
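As a rough illustration of the indexing step, the sketch below builds a tiny inverted index (a map from each word to the pages containing it).  The page URLs, the text and the function names are all made up for the example, and a real engine would fetch and parse HTML rather than work from a hard-coded dictionary.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> text extracted from the HTML.
PAGES = {
    "a.example/home": "Search engines index web pages",
    "b.example/news": "News and reviews of music and movies",
    "c.example/blog": "How spiders crawl web pages and follow links",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return dict(index)


index = build_index(PAGES)
print(sorted(index["pages"]))
```

Looking up a word is then a single dictionary access, which is why the index is built ahead of time instead of scanning every page for every search.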

A user then enters a list of keywords (sometimes combined with operators such as AND, OR and NOT) to help explain what they are looking for.  The search engine scans its index trying to perform a basic match.  The result set that the search engine returns is then presented to the user.
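The AND, OR and NOT operators map quite naturally onto set operations over an index.  The following is a minimal sketch, assuming a hypothetical index that maps each keyword to a set of page IDs; the data and the function names are invented for illustration only.

```python
# Hypothetical inverted index: keyword -> set of page IDs containing it.
INDEX = {
    "search": {1, 2, 4},
    "engine": {1, 4},
    "music": {2, 3},
    "reviews": {3, 4},
}
ALL_PAGES = {1, 2, 3, 4}


def lookup(word):
    """Return the pages containing a keyword (empty set if unknown)."""
    return INDEX.get(word.lower(), set())


def search_and(*words):
    """Pages that contain every keyword (set intersection)."""
    results = set(ALL_PAGES)
    for word in words:
        results &= lookup(word)
    return results


def search_or(*words):
    """Pages that contain at least one keyword (set union)."""
    results = set()
    for word in words:
        results |= lookup(word)
    return results


def search_not(included, excluded):
    """Pages in the first set that are not in the second (set difference)."""
    return included - excluded


print(search_and("search", "engine"))
print(search_or("music", "reviews"))
print(search_not(lookup("search"), lookup("music")))
```

Here a query such as "search NOT music" becomes a set difference, so page 2, which mentions both words, drops out of the result set.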

Now this result set is the important part.  The result set could be quite small, in which case it's generally pretty easy for the person searching to quickly validate and discard any results which they deem to be inaccurate, inappropriate or just downright bad.  In general, however, the result set will be too large to process by hand.  It could contain several thousand hits or sites that would need to be verified or ranked, based on their content.

Can you trust what you're looking for?

Most search engines will attempt to perform some basic ranking process.  The ranking could be based on the keywords that other people have used over a period of time, or on values assigned to indexed results, such as the number of links within a site.  Each search engine has a proprietary way of ranking results data, which is why different engines produce different results for the same query.
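To see why two engines can disagree, consider a toy scoring function that combines a keyword match count with a link count.  This is only a sketch under made-up assumptions: the sites, the numbers and the weights are hypothetical, and real ranking algorithms are proprietary and far more involved.

```python
# Hypothetical result set: site -> (keyword match count, link count).
RESULTS = {
    "a.example": (5, 2),
    "b.example": (2, 40),
    "c.example": (3, 10),
}


def rank(results, keyword_weight, link_weight):
    """Order sites by a weighted sum of keyword matches and link count."""
    def score(site):
        matches, links = results[site]
        return keyword_weight * matches + link_weight * links

    return sorted(results, key=score, reverse=True)


# Two imaginary engines that weigh the same signals differently.
engine_one = rank(RESULTS, keyword_weight=10.0, link_weight=0.1)
engine_two = rank(RESULTS, keyword_weight=1.0, link_weight=1.0)

print(engine_one)
print(engine_two)
```

The same three sites come back in opposite orders, purely because of how each imaginary engine weighs keywords against links.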

Many search engines promote the idea of net neutrality, which allows network services, responses and searches to be created unhindered and free from the likes of government, corporate or competitive interference.

But can a search engine be free from bias?  Many search engines utilise advertising to generate a revenue stream, so do those advertised links cloud the true search result?  Google identifies a paid-for link by tagging it with the word 'sponsored' to provide some clarity.

One other major form of search bias is based on previous user search history.  The idea behind this is to try and personalise the result set based on what the user has previously searched for and the websites they have subsequently clicked through to.  But this increased personalisation, whilst it may have its benefits, starts to reduce the opportunity for new and random results.  The user becomes increasingly held within their own bubble of navigation and knowledge, not knowing what they don't know.
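The bubble effect can be shown with a very small re-ranking sketch.  It assumes each result has a base relevance score and that every previous click on a site adds a fixed boost; the sites, the scores and the boost value are hypothetical, and this is not a description of how any particular engine personalises results.

```python
from collections import Counter


def personalise(results, click_history, boost=1.0):
    """Re-rank (site, base_score) pairs, boosting previously clicked sites."""
    clicks = Counter(click_history)

    def score(item):
        site, base_score = item
        return base_score + boost * clicks[site]

    return [site for site, _ in sorted(results, key=score, reverse=True)]


# Hypothetical results with base relevance scores.
results = [
    ("new.example", 3.0),
    ("familiar.example", 2.0),
    ("other.example", 2.5),
]
history = ["familiar.example", "familiar.example"]

print(personalise(results, []))       # no history: ordered by base score
print(personalise(results, history))  # history pushes the familiar site up
```

With no history the most relevant site comes first; after two clicks on the familiar site, it overtakes results with higher base scores, and each further click makes that ordering harder to escape.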

The main concern with such an approach is that the end user has no real knowledge of the results ranking and parsing process, so they remain unaware of other potentially valuable search results at their disposal.

It will be interesting to see over the coming years, as the internet undoubtedly becomes larger and more diverse, whether search engine theory and the underlying ranking algorithms can become sophisticated enough to produce personalised content whilst remaining open to the random and new.
