The K-Medoids algorithm is a clustering algorithm similar to the K-Means algorithm. It partitions observations into clusters based on their proximity to the cluster center. However, instead of calculating means, K-Medoids uses medoids as the new cluster centers. A medoid is the center of a cluster with the minimal average dissimilarity to all objects in the cluster. This algorithm is more robust to noise and outliers compared to K-Means.

The K-Medoids algorithm works by alternating between two steps: the assignment step, where each observation is assigned to the cluster with the closest center, and the update step, where the new medoid is calculated as the center of the observations for each cluster. This process repeats until the assignments no longer change.

In the context of PAL (Predictive Analytics Library), the K-Medoids algorithm supports multi-threading, data normalization, and different distance level measurements. However, it does not directly support categorical data. To handle categorical attributes, they are converted into binary vectors and treated as numerical attributes. For example, if "Gender" is a categorical attribute with two distinct values (Female and Male), it will be converted into a binary vector with two dimensions (Gender_1 and Gender_2). The Euclidean distance between observations is then calculated using these binary vectors, with a weight (γ) given to the transposed categorical attributes to lessen their impact on clustering. The medoids of each cluster are updated using the traditional method.

Overall, the K-Medoids algorithm is a robust clustering algorithm that can handle numerical data and categorical data through data transformation.
------

SET SCHEMA DM_PAL;

DROP TABLE PAL_KMEDOIDS_DATA_TBL;
CREATE COLUMN TABLE PAL_KMEDOIDS_DATA_TBL(
	"ID" INTEGER,
	"V000" DOUBLE, 
	"V001" VARCHAR(5),
	"V002" DOUBLE
);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (0 , 0.5, 'A', 0.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (1 , 1.5, 'A', 0.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (2 , 1.5, 'A', 1.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (3 , 0.5, 'A', 1.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (4 , 1.1, 'B', 1.2);

INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (5 , 0.5, 'B', 15.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (6 , 1.5, 'B', 15.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (7 , 1.5, 'B', 16.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (8 , 0.5, 'B', 16.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (9 , 1.2, 'C', 16.1);

INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (10, 15.5, 'C', 15.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (11, 16.5, 'C', 15.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (12, 16.5, 'C', 16.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (13, 15.5, 'C', 16.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (14, 15.6, 'D', 16.2); 

INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (15, 15.5, 'D', 0.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (16, 16.5, 'D', 0.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (17, 16.5, 'D', 1.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (18, 15.5, 'D', 1.5);
INSERT INTO PAL_KMEDOIDS_DATA_TBL VALUES (19, 15.7, 'A', 1.6); 

DROP TABLE #PAL_PARAMETER_TBL;
CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_PARAMETER_TBL(
	"PARAM_NAME" NVARCHAR(256), 
	"INT_VALUE" INTEGER, 
	"DOUBLE_VALUE" DOUBLE, 
	"STRING_VALUE" NVARCHAR(1000)
);
INSERT INTO #PAL_PARAMETER_TBL VALUES ('THREAD_RATIO', NULL, 0.2, NULL);
INSERT INTO #PAL_PARAMETER_TBL VALUES ('GROUP_NUMBER', 4, NULL, NULL);
INSERT INTO #PAL_PARAMETER_TBL VALUES ('INIT_TYPE', 1, NULL, NULL);
INSERT INTO #PAL_PARAMETER_TBL VALUES ('DISTANCE_LEVEL', 2, NULL, NULL);
INSERT INTO #PAL_PARAMETER_TBL VALUES ('MAX_ITERATION', 100, NULL, NULL);
INSERT INTO #PAL_PARAMETER_TBL VALUES ('EXIT_THRESHOLD', NULL, 1.0E-6, NULL);
INSERT INTO #PAL_PARAMETER_TBL VALUES ('NORMALIZATION', 0, NULL, NULL);
INSERT INTO #PAL_PARAMETER_TBL VALUES ('CATEGORY_WEIGHTS', NULL, 0.5, NULL);

CALL _SYS_AFL.PAL_KMEDOIDS(PAL_KMEDOIDS_DATA_TBL, "#PAL_PARAMETER_TBL", ?, ?);

