Out of the box, systems like Solr, Elasticsearch and Endeca all assume that relevance means keyword frequency in a product page, with some weighting depending on the field (title, description, tags, etc.). Delivering relevant results that users might actually want to purchase requires taking these systems, adding to or customizing their NLP techniques, operationalizing historical user search and purchase data to determine intent, personalizing by shopper history, and so on.
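To make the default notion of relevance concrete, here is a minimal sketch of field-weighted keyword counting in Python. It is a deliberate simplification, not how any of these engines actually score documents (production scoring functions such as BM25 also account for document length and term rarity). The field weights, the `keyword_score` and `search` helpers, and the two-product catalog are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical per-field weights; real engines expose similar knobs as field boosts.
FIELD_WEIGHTS = {"title": 3.0, "tags": 2.0, "description": 1.0}


def keyword_score(query, product, weights=FIELD_WEIGHTS):
    """Sum the weighted counts of query terms across a product's text fields."""
    terms = query.lower().split()
    score = 0.0
    for field, weight in weights.items():
        counts = Counter(product.get(field, "").lower().split())
        score += weight * sum(counts[term] for term in terms)
    return score


def search(query, products):
    """Return (score, title) pairs for matching products, best first."""
    scored = [(keyword_score(query, p), p["title"]) for p in products]
    return sorted((pair for pair in scored if pair[0] > 0), reverse=True)


# Toy catalog, invented for this example.
catalog = [
    {
        "title": "Oxford button down shirt",
        "tags": "shirt menswear",
        "description": "Classic button down collar shirt",
    },
    {
        "title": "Mechanical gaming keyboard",
        "tags": "gaming keyboard",
        "description": "Press a button and the down arrow lights up",
    },
]

for score, title in search("button down", catalog):
    print(score, title)
```

Both products match "button down", and the ranking depends only on where and how often the words appear. Nothing in the score knows who is searching or what they intend to buy.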
The challenges of a massive, heterogeneous catalog affect other areas too, chief among them search result personalization. An individual’s gaming purchase history might cause ‘button down’ to return gaming keyboards rather than oxford shirts, while a pet products purchase history could lead a search for ‘turkey’ to return turkey dog food.
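One simple way to picture this kind of personalization is as a re-ranking step applied on top of the engine's base scores. The sketch below, in Python, boosts each result by the share of the shopper's past purchases that fall in the same category. The `category_affinity` and `personalize` helpers, the `boost` parameter, and the scores and categories in the sample data are assumptions made up for illustration, not a description of any retailer's system.

```python
from collections import Counter


def category_affinity(purchase_history):
    """Map each category to its share of the shopper's past purchases."""
    counts = Counter(purchase_history)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


def personalize(results, purchase_history, boost=2.0):
    """Re-rank (base_score, category, title) results using category affinity.

    `boost` is a hypothetical tuning knob: 0 disables personalization.
    """
    affinity = category_affinity(purchase_history)
    rescored = [
        (base * (1.0 + boost * affinity.get(category, 0.0)), title)
        for base, category, title in results
    ]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in rescored]


# Invented base scores for the query 'turkey', as a keyword engine might return them.
turkey_results = [
    (1.0, "grocery", "Whole frozen turkey"),
    (0.8, "pet", "Turkey recipe dog food"),
]

pet_shopper = ["pet", "pet", "pet", "grocery"]
print(personalize(turkey_results, pet_shopper))
print(personalize(turkey_results, []))
```

With a pet-heavy history the dog food moves to the top, while a shopper with no history sees the unmodified keyword ranking. The hard part in practice is less the arithmetic than getting clean category data and enough purchase history across a heterogeneous catalog.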
The fact that Amazon fails to personalize search results is evidence of both the difficulty and the opportunity here. The sort of pervasive personalization found at Airbnb, Facebook and Google is simply out of reach of most ecommerce retailers…