As we start 2012, we want to highlight two major, new features.

  1. Discovery Tag: Our on-demand indexing engine allows you to automatically add content to your search index. You may make use of our on-demand indexing by placing our Discovery Tag (available on the Get Code page in the Admin Center) on your web pages.
  2. Themes for results pages: Our new and improved theme-based results pages help you create a customized look and feel "out of the box." You may view the themes by adding a new site in our admin center. We're also in the process of migrating all customers into this new user interface.

Below are the details for January and, as a reminder, we divide our work into three categories.

  1. Features: Things you actually notice.
  2. Chores: Back-end improvements that you don't notice.
  3. Fixes: Fixes to any code issues that may arise.

Features for Searchers

  • Searchers get better PDF result coverage in results.
  • Searchers only see indexed documents that belong to the site's domain collection.
  • Searchers see MedlinePlus (health topics) and Enhanced Agency GovBoxes, when customers opt to display them via the admin center.
  • Searchers see nicer paging.
  • Searchers don't see Indexed Document GovBox.
  • Searchers can see all Indexed Document results.
  • Searchers on template-based results pages see a nice, matching shaded header with rounded corners.
  • Mobile searchers see colors from selected theme.
  • Searchers see square fade on related searches and other modules.

Features for Agency Customers

  • Customers see new Domains page.
  • Customers have an easy way to pass multiple parameters.
  • By default, customers have a useful header on results pages during signup.
  • Customers may point at a dynamic sitemap generator for hosted sitemaps.
  • Customers may set up sites with keyword filtered results.
  • Customers see an updated left nav in the admin center.

Chores

  • Migrated USA.gov's "enhanced results" Agency and MedlinePlus to GovBoxes.
  • Refactored search classes.
  • Removed website URL from the Add New Site wizard.
  • Shortened unique index for type-ahead suggestions.
  • Upgraded Solr from 1.4 to 3.5.
  • Changed timestamp indexing for Solr to use a trie on NewsItem model.
  • Modified stopword analysis to include Solr CommonGrams filtering.

Bug Fixes

  • Fixed Spanish translations for indexed documents.
  • Fixed display error when search results contain Bing matches and indexed documents.
  • Fixed error received by customers when migrating to themes.
  • Fixed RSS feed readers to handle 503 errors more gracefully.
  • Disabled ActiveRecord observers during database migrations.
  • Removed rule to allow customers to add top level domains (e.g., .gov and .mil) as site domains.
  • Fixed parsing of PDF files.
  • Fixed tests around Bing result counts.
  • Preserved newlines on virtual header/footer/css fields when updated via admin center.
  • Fixed doctype for nested PDF documents.
  • Fixed various problems around discovery of embedded PDFs.
  • Created error page for searchers when the search per-page query parameter is set to blank.
  • Fixed issue where hosted sitemaps still existed for domains with no indexed documents.
  • Weeded out indexed documents for anything that doesn't work with URI.parse() method.
  • Handled race condition where duplicate id/content hash pair gets past Rails but is caught in MySQL.
  • Handled race condition where an IndexedDocument can't be updated because another one was created.
  • Removed assumption that pages are PDF because the URL ends in ".pdf."
  • Selected "Everything" on results page by default.
  • Removed body field from the CSV download of IndexedDocument URLs
  • Fixed infinite loop in PDF discovery and indexing.
  • Verified IndexedDocument URL length is <= 2000 rather than accepting it and silently truncating it.
  • Used document body for documents with a blank meta description tag.