Sunday, February 22, 2009

B2B Product Classification -- UNSPSC


B2B Product Classification
Classifying Customers’ Products with UNSPSC


Introduction
Internet and Web technology is penetrating many aspects of our daily life, and its importance as a medium for business transactions will grow rapidly over the next few years. B2B market places provide new kinds of services to their clients: simple 1-1 connections are being replaced by n-m relationships between customers and vendors. However, this new flexibility in electronic trading also creates serious challenges for the parties that want to realize it. The main problem is the heterogeneity of the information descriptions used by vendors and customers. Intelligent solutions that help to mechanize the process of structuring, classifying, aligning, and personalizing product information are a key requisite for overcoming the current bottlenecks of B2B electronic commerce.

Content Management in E-Commerce

B2B market places are an intermediate layer for business communications, and they offer their clients one serious advantage: a supplier can communicate with a large number of customers through a single communication channel to the market place. One of the major challenges is the heterogeneity and openness of the exchanged content.
Content management is therefore one of the real challenges in successful B2B electronic commerce.

Product descriptions must be structured. Suppliers have product catalogues that describe their products to potential clients, and a B2B market place should make this information available online. A typical content management solution provider has several hundred employees working in content factories to structure the product information manually. In the worst case, they take printed copies of the product catalogues as input.
Product descriptions must be classified. At this stage in the content management process we can assume that the product information is structured in a tabular way: each product corresponds to a row in a table whose columns reflect the different attributes of a product. Each supplier uses different structures and vocabularies to describe its products. This may not cause a problem in a 1-1 relationship, where the buyer can get used to the private terminology of the supplier. B2B market places that enable n-m commerce cannot rely on such an assumption. They must classify all products according to a standard classification schema that helps buyers and suppliers communicate their product information. A widely used classification schema in the US is UNSPSC (for details about UNSPSC, please visit http://www.unspsc.org/). Classifying products according to a schema like UNSPSC is again a difficult and mainly manual task that requires expertise in the product domain. The process is therefore costly, but high quality is important to ensure the maintainability and visibility of product information.
Product descriptions must be re-classified. Bottlenecks in exchanging information have led to a plethora of different standards that are meant to improve the situation. However, there are usually two problems. First, there are too many “standards”, i.e., none of them is an actual standard. Second, most standards lack features that are important for various application problems. Not surprisingly, both problems also appear in B2B electronic commerce.
UNSPSC is a typical example of a horizontal standard: it covers all possible product domains but is not very detailed in any of them.


Implementation
Customers’ products can be categorized according to UNSPSC by two methods:
1) Artificial Intelligence
2) Human Intelligence
Artificial Intelligence is used first to assign the member products to appropriate UNSPSC commodities, on the assumption that 50% of the assignments will be wrong.
After this, Human Intelligence is used to assign the products one by one to the appropriate UNSPSC commodities. For this purpose an application (UNSPSC Browser) will be built so that the BO, Web Marketing and Loyalty departments can easily map the products to UNSPSC.
Artificial Intelligence
There are several industry practices for classifying the products of a B2B portal’s customers into UNSPSC categories. Although these classifications are necessary for B2B organizations to build their strategies, the classification exercise is very expensive and time consuming. Some industry tools available for this classification are:


PIM Accelerator: http://www.zoomix.com/classification.asp
PIM Services: http://www.enventureinc.com/index.php/product-information-management.html
Golden Bullet: http://www.sti-innsbruck.at/about/business-development/projects-events/ and http://excogito.nl/gb/index.jsp


After some research we found that these tools (like Golden Bullet) use different AI algorithms to classify products, such as k-nearest neighbour (KNN), the Vector Space Model, and Naïve Bayes, each of which has its pros and cons.
These AI programs are trained on data and then classify the products, which still yields only 60%-70% accurate results; the vendor then provides an application to support manual product classification to reach 100% accuracy.
We have used the same approach: we used the Extended Topic-Based Vector Space Model (ETVSM) to classify our member products, with WordNet as the training data.
WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. The purpose is twofold: to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications.
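
To make the vector space idea concrete, here is a minimal Python sketch. It is not the ETVSM and it is not the Themis API: it is a plain bag-of-words vector space model with cosine similarity, and a small hand-written synonym table (SYNONYMS, hypothetical) stands in for the role WordNet plays. The product title and the commodity names are illustrative only.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for WordNet synsets:
# every word on the left is folded into the term on the right.
SYNONYMS = {
    "notebook": "laptop",
    "laptops": "laptop",
    "computers": "computer",
    "pc": "computer",
}


def to_vector(text):
    """Lower-case, split on whitespace, fold synonyms, count terms."""
    tokens = [SYNONYMS.get(token, token) for token in text.lower().split()]
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative product and commodity descriptions (not real UNSPSC data).
product = to_vector("Business notebook 15 inch")
commodities = {
    "Laptop computers": to_vector("Laptop computers"),
    "Paper notepads": to_vector("Paper notepads"),
}

# The commodity whose vector is closest to the product vector wins.
best = max(commodities, key=lambda name: cosine(product, commodities[name]))
print(best)
```

Without the synonym table the word "notebook" would share no term with "Laptop computers", which is the kind of vocabulary gap a lexical resource is meant to close.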

For the ETVSM implementation we used Themis, an open source Java API that uses PostgreSQL as its database and implements the ETVSM via stored procedures. These stored procedures are accessed through the Themis Java API, available at http://code.google.com/p/ir-themis/


How our AI classification works
Our AI classification uses two approaches to categorize the approximately 23,000 products into UNSPSC:
1) Relational searches
2) Extended Topic-Based Vector Space Model (ETVSM)


Step 1: Search the commodities for an exact match on the product base keyword; if exactly one row is returned, assign that commodity.


Step 2: Search the commodities for an exact match on the product title; if exactly one row is returned, assign that commodity.


Step 3: Run a loose search of the product title against the commodities; if exactly one row is returned, assign that commodity.


Step 4: Run a loose search of the base keyword and keywords against the commodities; if exactly one row is returned, assign that commodity.


If none of the four steps above yields a commodity, the ETVSM is used to search for the product’s commodity.


Step 5: Search the ETVSM with the base keyword and keywords; take the highest-ranked predicted commodity and move the other predicted commodities to the product suggestions.


Step 6: Search the ETVSM with the product title; take the highest-ranked predicted commodity and move the other predicted commodities to the product suggestions.


Step 7: Search the ETVSM with the product base keyword; take the highest-ranked predicted commodity and move the other predicted commodities to the product suggestions.


These product suggestions are used by our UNSPSC Browser application to let users easily reclassify a product when the AI engine has predicted the wrong commodity.
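
The seven steps can be sketched as a simple cascade. The Python below is only an illustration under several assumptions: "loose search" is taken to mean that every word of the commodity name occurs in the query, steps 5 to 7 are tried in order until one returns a ranking, and a word-overlap (Jaccard) score stands in for the ETVSM ranking that the real system obtains from Themis. All function names, field names and sample data are hypothetical.

```python
def words(text):
    """Lower-case a string and split it into words."""
    return text.lower().split()


def exact_matches(query, commodities):
    """Commodities whose name equals the query, ignoring case and spacing."""
    return [name for name in commodities if words(name) == words(query)]


def loose_matches(query, commodities):
    """Assumed loose search: every word of the commodity occurs in the query."""
    query_words = set(words(query))
    hits = []
    for name in commodities:
        name_words = set(words(name))
        if name_words and name_words <= query_words:
            hits.append(name)
    return hits


def rank_by_overlap(query, commodities):
    """Stand-in for the ETVSM ranking: Jaccard word overlap, best first."""
    query_words = set(words(query))
    scored = []
    for name in commodities:
        name_words = set(words(name))
        union = query_words | name_words
        score = len(query_words & name_words) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


def classify(product, commodities):
    """Return (assigned commodity or None, suggestions, method used)."""
    all_keywords = product["base_keyword"] + " " + product["keywords"]
    relational_steps = [
        (exact_matches, product["base_keyword"]),  # Step 1
        (exact_matches, product["title"]),         # Step 2
        (loose_matches, product["title"]),         # Step 3
        (loose_matches, all_keywords),             # Step 4
    ]
    for search, query in relational_steps:
        rows = search(query, commodities)
        if len(rows) == 1:  # assign only when the match is unambiguous
            return rows[0], [], "relational"
    # Steps 5-7: ranked fallback; the runners-up become suggestions.
    for query in (all_keywords, product["title"], product["base_keyword"]):
        ranked = rank_by_overlap(query, commodities)
        if ranked:
            return ranked[0], ranked[1:], "ranked"
    return None, [], "unclassified"


# Illustrative data, not real UNSPSC commodities or member products.
COMMODITIES = ["Laptop computers", "Desktop computers",
               "Inkjet printer ink", "Laser printer toner"]
laptop = {"base_keyword": "laptop",
          "title": "Business laptop computers 15 inch",
          "keywords": "notebook portable"}
ink = {"base_keyword": "ink",
       "title": "Black inkjet ink cartridge",
       "keywords": "printer cartridge"}

print(classify(laptop, COMMODITIES))
print(classify(ink, COMMODITIES))
```

In this sketch the laptop is resolved by the loose title search in step 3, while the ink cartridge falls through to the ranked fallback, where the best-scoring commodity is assigned and the runner-up is kept as a suggestion for manual review.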


Results
Total Genuine Premium Products: 23,112
Total commodities successfully assigned to products: 22,768
Commodities assigned via AI (ETVSM): 14,035
Commodities assigned via relational searches: 8,733
Total unsuccessful cases: 344
Total time to load UNSPSC commodities, train the engine and classify products: 180 minutes
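
As a quick consistency check on these figures, the short Python snippet below recomputes the totals and derives a few ratios. The variable names are ours. Note that the ratios describe how many products received a commodity, not how many of those assignments are correct, and that the 180 minutes also include loading and training, so the throughput is only a rough figure.

```python
# Figures taken from the results listed above.
total_products = 23_112
assigned_etvsm = 14_035
assigned_relational = 8_733
total_minutes = 180

# Derived values.
assigned = assigned_etvsm + assigned_relational
unassigned = total_products - assigned
coverage = assigned / total_products
etvsm_share = assigned_etvsm / assigned
products_per_minute = total_products / total_minutes

print(f"assigned: {assigned}, unassigned: {unassigned}")
print(f"coverage: {coverage:.2%}")
print(f"share assigned via ETVSM: {etvsm_share:.2%}")
print(f"products per minute (incl. load and training): {products_per_minute:.1f}")
```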

Keywords: Vector Space Model, Extended Topic Based Vector Space Model