Archive for January, 2009

New Scientific Research Available at DeepDyve

Monday, January 19th, 2009

One of the challenges of doing any sort of search on the Web is filtering out the noise. In research, the challenge is even greater, because the information you are seeking is often very specific and must come from authoritative sources. A major focus for DeepDyve is bringing these authoritative sources to you, and we are pleased to announce that the following content sources will be added to our index this month:

Welcome to DeepDyve

Thursday, January 15th, 2009

Greetings! This is the inaugural post in our new blog, which will be a forum for discussing what’s on our minds – and yours – in the search and information industry. The vision behind our company is that search is still in its infancy: today’s ‘traditional’ search engines meet only our most basic, albeit common, needs – the fat part of the long tail. Doing research, however, means looking for the needle in the long-tail haystack, and that requires a different approach.

In essence, good search of any kind comes down to the strength of the relevance algorithm, the quality and quantity of the content to search against, and the ease of use of the product. Whereas most searches today are narrow, ‘single-concept’ queries (‘Obama’, ‘restaurant review’, etc.), research queries are inherently ‘multi-concept’, because the user wants broad and deep information about a complex subject. Such a query might range from “pancreatic cancer treatment options, side effects, current research, clinical trials” all the way to “effectiveness of selenium for pancreatic cancer”.

This has several implications:

  1. Relevance: there’s so much information on the Web that searching against all of it inevitably produces tremendous noise. The problem is exacerbated by search algorithms that rely on techniques such as PageRank, which prioritize results based on popularity. Combine these factors and you end up with millions of hits whose top results are irrelevant, leaving you to sift for hours to find the hidden gems.
  2. Content: as mentioned above, the Web is littered with content. Unfortunately, for research purposes, the vast majority of it is news articles or amateurish and/or commercial material. Researchers want access to the best available information from authoritative sources, ideally spanning everything from general, introductory material to highly specialized, peer-reviewed literature.
  3. Short/long queries: single-concept queries can easily be expressed in three words or fewer. Multi-concept queries cannot: researchers want the flexibility to describe exactly what they are looking for in natural language, or alternatively to let “the content be the query” by using a relevant article as an example for finding more articles on that subject. They would also like to search across multiple subject areas without having to repeat their efforts from vertical to vertical.
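
To make the idea of letting “the content be the query” a little more concrete, here is a minimal sketch in Python. It is only an illustration and not a description of DeepDyve's relevance algorithm: it assumes a plain TF-IDF weighting with cosine similarity as a stand-in, and the function names, the tiny stopword list, and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so that every term keeps a positive weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_example(example, corpus):
    """Use a whole text as the query: return (index, score), best match first."""
    vectors = tfidf_vectors(corpus + [example])
    query = vectors[-1]
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors[:-1])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical documents, for illustration only.
    corpus = [
        "Selenium supplementation and pancreatic cancer treatment outcomes in clinical trials",
        "Restaurant review: the best pizza in town",
        "Side effects of chemotherapy for pancreatic cancer patients",
    ]
    example = "effectiveness of selenium for pancreatic cancer"
    for index, score in rank_by_example(example, corpus):
        print(f"{score:.3f}  {corpus[index]}")
```

Because the query is just another document, the same function works whether the researcher types a long natural-language description or pastes in an entire article. Note that nothing in this sketch uses popularity signals; ranking depends only on how much distinctive vocabulary the texts share.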

Our mission at DeepDyve is to build a research engine focused on solving these challenges. Our vision is that the paradigm of ‘research’ is evolving as more individuals take advantage of the incredible wealth of information potentially available to them, not just from content sources but also from each other. The challenge will be making this information, and these individuals, easily accessible to one another. We look forward to sharing our ideas with you and to hearing your thoughts and suggestions as we embark on this journey.