Wednesday, October 1, 2008

Integrating a Notes Connector database with Google Enterprise Search


By Colin Neale

If you read my previous article, you now know a little about how the Domino connector for the Google Search Appliance (GSA) came about. This week, I'll cover a couple of the connector's key features, and you'll see how the system interacts with the GSA to respect Notes database-level and document-level security in search results.

The system is made up of two Notes databases: the Connector database and the optional Access Control database. The Connector database manages the crawl and feed process. The Access Control database can handle the authentication and/or authorization requests that the GSA generates during a secure search.

The Connector database: template/database model

Databases that you want to feed to the GSA are registered through the creation of a "database document", as shown in Figure A. These documents can be created one at a time, in bulk, or via a LotusScript API call.


Figure A: A screenshot of a database registration document.

Each registered database is assigned to a "template profile". A template profile specifies which documents should be fed to the GSA using a standard Notes document selection formula, like the one shown in Figure B.


Figure B: A template profile for Domino.Doc, showing the selection criteria.

Document selection criteria are key to managing your Notes content at the GSA. For example, take another look at Figure B above. This is a sample template profile for a Domino Document Manager (Domino.Doc) file cabinet. The selection formula reads:

Select ObjectType = "Document" & IsLatest = "1" & IsCurrentVer = "1" & (Version > 0 | Draft > 0) & ShouldBeDeleted <> "1"

In plain English, this tells the system to send only current-version and draft documents to the GSA, and to skip any that are flagged for deletion. The crawler ignores any documents that do not meet the criteria.
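
To make the logic of the formula concrete, here is a rough Python restatement of the same conditions. It is only a sketch: the connector itself evaluates the Notes formula, not Python. The function name `matches_selection` and the idea of representing a Notes document as a plain dictionary are assumptions made for illustration, and Notes may treat missing fields differently than this code does.

```python
def matches_selection(doc: dict) -> bool:
    """Illustrative restatement of the Domino.Doc selection formula.

    `doc` is a hypothetical dict of Notes item names to values;
    this is not part of the connector's API.
    """
    return (
        doc.get("ObjectType") == "Document"
        and doc.get("IsLatest") == "1"
        and doc.get("IsCurrentVer") == "1"
        # Either a released version or a draft qualifies.
        and (doc.get("Version", 0) > 0 or doc.get("Draft", 0) > 0)
        and doc.get("ShouldBeDeleted") != "1"
    )


# Made-up sample documents, for illustration only.
current = {
    "ObjectType": "Document",
    "IsLatest": "1",
    "IsCurrentVer": "1",
    "Version": 2,
    "Draft": 0,
    "ShouldBeDeleted": "",
}
superseded = dict(current, IsLatest="0", IsCurrentVer="0", Version=1)
flagged = dict(current, ShouldBeDeleted="1")

results = [matches_selection(d) for d in (current, superseded, flagged)]
```

With these sample values, only `current` satisfies every condition, so it is the only one that would be fed.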

What's more, if a document that meets the criteria subsequently falls outside them (as happens to a V1.0 document when it is changed and checked back in as V2.0), the V1.0 document is automatically removed from the GSA at the point that the V2.0 document is added.
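
The add-and-remove behavior can be pictured as a comparison between what was fed previously and what the selection criteria match now. The following Python sketch shows that idea under stated assumptions. It is not the connector's actual implementation, which this article does not describe. The names `plan_feed` and `is_current`, the document IDs, and the simplified one-field criterion are all hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

Doc = Dict[str, str]


def plan_feed(
    previously_fed: Set[str],
    docs: Dict[str, Doc],
    selects: Callable[[Doc], bool],
) -> Tuple[Set[str], Set[str]]:
    """Return (ids to add, ids to remove) for one crawl pass.

    Hypothetical helper: compares the IDs fed earlier with the IDs
    that match the selection criteria now.
    """
    selected = {doc_id for doc_id, doc in docs.items() if selects(doc)}
    to_add = selected - previously_fed      # newly matching documents
    to_remove = previously_fed - selected   # fed before, no longer matching
    return to_add, to_remove


def is_current(doc: Doc) -> bool:
    # Simplified stand-in for the full selection formula.
    return doc.get("IsCurrentVer") == "1"


# After check-in, V1.0 is no longer current and V2.0 is.
docs_after_checkin = {
    "policy-v1": {"IsCurrentVer": "0"},
    "policy-v2": {"IsCurrentVer": "1"},
}
to_add, to_remove = plan_feed({"policy-v1"}, docs_after_checkin, is_current)
```

Here `to_add` holds the V2.0 document and `to_remove` holds the V1.0 document, mirroring the check-in scenario described above.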

So, in a document management application where you would typically only want to present the current version of policies and procedures to your users via search, selection formulas can be a very powerful tool indeed.

Equally, a selection formula can be applied to any type of Notes database to exclude documents that you do not want to make available to the search index.