I need to confess: I have been working with Endeca for the past 5 months. Initially, I was skeptical about yet another search engine. In the past, we had a custom Apache Lucene implementation, which exposed me to the core functionality of search engines:
- Display Index or Indexes.
I am not going to dive into the technical capabilities that Endeca provides on this blog. All data, including big data, is bound to go through the same three stages: Extraction, Transformation, and Load (ETL). Endeca provides Clover as the de facto tool for ETL. I am going to briefly describe a scenario that I ran into during my ETL process.
During the Load stage of the ETL process I ran into the following error:
Endeca requires a unique record identifier. The identifier is referred to as the “Spec Attribute”. Let me briefly define how Endeca sees a record.
A record in an Endeca index is a collection of key/value pairs uniquely identified by the “Spec Attribute”. If your data does not have a unique key, you can create a sequence that generates the Spec Attribute value. In my scenario, I was passing the Yahoo! GeoPlanet WOEID (Where On Earth ID) as the Spec Attribute.
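To make the record model concrete, here is a minimal sketch in plain Python. It is not Endeca or Clover code: the function names, the `RecordId` attribute, and the sample values are all made up for illustration. It treats each record as a dictionary of key/value pairs, generates a sequential identifier when the data has no unique key, and reports identifier values that appear more than once.

```python
from itertools import count


def assign_spec_attribute(records, spec_name="RecordId"):
    """Return copies of the records with a sequential identifier added.

    Hypothetical helper: it mimics generating the identifier from a
    sequence when the source data has no unique key of its own.
    """
    sequence = count(1)
    return [{**record, spec_name: next(sequence)} for record in records]


def find_duplicate_specs(records, spec_name):
    """Return the set of identifier values that occur more than once."""
    seen, duplicates = set(), set()
    for record in records:
        value = record[spec_name]
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates


# Made-up sample data; the values are placeholders, not real WOEIDs.
places = [{"Name": "Place A"}, {"Name": "Place B"}]
keyed_places = assign_spec_attribute(places)

geo_records = [{"WoeId": 1001}, {"WoeId": 1002}, {"WoeId": 1001}]
duplicate_ids = find_duplicate_specs(geo_records, "WoeId")
```

Running a check like `find_duplicate_specs` before the Load stage is one way to catch records that would collide on the identifier.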
The first time that I ran the graph, I saw the error. My next step was to edit the “Bulk Add/Replace” component, where I had defined the Spec Attribute as “Woeid”. Then I checked the “Reformat” component and could not believe it: I had fat-fingered the value. The name is case-sensitive, so I corrected it to “WoeId” and voilà!
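A typo like this is easy to catch before running the graph. The following is a rough sketch in plain Python, not anything Endeca or Clover provides: `check_spec_attribute` and its messages are hypothetical, and it assumes you have the list of field names produced by the upstream step at hand. It compares the configured name against those fields exactly, then falls back to a case-insensitive comparison to suggest the likely intended name.

```python
def check_spec_attribute(field_names, spec_name):
    """Return None if spec_name matches a field exactly, else a hint.

    Hypothetical helper: the comparison is case-sensitive first, and a
    case-insensitive pass is used only to suggest a correction.
    """
    if spec_name in field_names:
        return None
    close_matches = [
        name for name in field_names if name.lower() == spec_name.lower()
    ]
    if close_matches:
        return (
            f"Field {spec_name!r} not found; "
            f"did you mean {close_matches[0]!r}?"
        )
    return f"Field {spec_name!r} not found."


# Field names as the upstream step might emit them (illustrative only).
fields = ["WoeId", "Name", "Country"]
hint = check_spec_attribute(fields, "Woeid")
```

Here `hint` points at `WoeId`, which is exactly the mismatch described above.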