Common-cost: classification via Support Vector Machine

For the common-cost set-up, we consider a classification scenario [NoBü11]. In this example, a linear model is trained by maximizing the distance of the separating hyperplane from the training points. The complete code of this example is given at the end of this page.

Problem formulation

Suppose there are \(N\) agents, where each agent \(i\) is equipped with \(m_i\) points \(p_{i,1}, \ldots, p_{i,m_i} \in \mathbb{R}^d\) (representing training samples in a \(d\)-dimensional feature space). Moreover, suppose the points are associated with binary labels, that is, each point \(p_{i,j}\) is labeled with \(\ell_{i,j} \in \{-1,1\}\), for all \(j \in \{1, \ldots, m_i\}\) and \(i \in \{1, \ldots, N\}\). The problem consists of building a classification model from the training samples. In particular, we look for a separating hyperplane of the form \(\{ z \in \mathbb{R}^d \mid w^\top z + b = 0 \}\) that separates all the points with \(\ell_{i,j} = -1\) from all the points with \(\ell_{i,j} = 1\).

In order to maximize the distance of the separating hyperplane from the training points, one can solve the following (convex) quadratic program:

\[\begin{split}\min_{w, b, \xi} \: & \: \frac{1}{2} \|w\|^2 + C \sum_{i=1}^N \sum_{j=1}^{m_i} \xi_{i,j} \\ \text{subject to} \: & \: \ell_{i,j} ( w^\top p_{i,j} + b ) \ge 1 - \xi_{i,j}, \hspace{1cm} \forall \: j, i \\ & \: \xi \ge 0,\end{split}\]

where \(C > 0\) is a regularization parameter. The optimization problem above is called soft-margin SVM, since it tolerates outliers: the slack variables \(\xi_{i,j}\) are activated whenever a perfectly separating hyperplane does not exist. Notice that the problem can be viewed either as a common-cost problem or as a cost-coupled problem (refer to the general formulations). Here we treat it as a common-cost problem, with the common objective function equal to

\[f(w, b, \xi) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^N \sum_{j=1}^{m_i} \xi_{i,j}\]

and each local constraint set \(X_i\) given by

\[X_i = \{ (w, b, \xi) \mid \xi \ge 0, \ell_{i,j} ( w^\top p_{i,j} + b ) \ge 1 - \xi_{i,j}, \text{ for all } j \in \{1, \ldots, m_i\} \}.\]
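
Note that \(X_i\) depends only on the local data of agent \(i\). As a plain illustration (the helper below is hypothetical and not part of the example code), checking whether a candidate point lies in \(X_i\) amounts to verifying nonnegativity of the slacks and the margin constraints on agent \(i\)'s own samples:

import numpy as np

# hypothetical helper: check membership of (w, b, xi_i) in X_i
# points_i: (d, m_i) samples of agent i; labels_i: (m_i,) entries in {-1, +1};
# xi_i: (m_i,) slack variables associated with agent i's samples
def in_local_set(w, b, xi_i, points_i, labels_i):
    margins = labels_i * (points_i.T @ w + b)  # ell_{i,j} * (w^T p_{i,j} + b)
    return bool(np.all(xi_i >= 0) and np.all(margins >= 1 - xi_i))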

The goal is to make agents agree on a common solution \((w^\star, b^\star, \xi^\star)\), so that all of them can compute the soft-margin separating hyperplane as \(\{ z \in \mathbb{R}^d \mid (w^\star)^\top z + b^\star = 0 \}\).
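
For reference, the problem can also be solved in a centralized fashion with any generic QP solver. The following minimal sketch uses cvxpy on a handful of made-up points (cvxpy and all the data here are illustrative, and are not used by the distributed example below); the last two lines show how a new point is classified once the hyperplane has been computed:

import numpy as np
import cvxpy as cp

P = np.array([[0.0, 0.3], [0.5, -0.2], [3.1, 2.2], [2.8, 1.7]])  # rows = samples
ell = np.array([1.0, 1.0, -1.0, -1.0])  # labels of the samples
n, d = P.shape
C = 10

w, b, xi = cp.Variable(d), cp.Variable(), cp.Variable(n)
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(ell, P @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

z_new = np.array([1.0, 1.0])
label = np.sign(w.value @ z_new + b.value)  # classification rule: sign(w^T z + b)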

Data generation model

We consider a bidimensional sample space (\(d = 2\)). Each agent generates a certain number of samples (between 2 and 5) for each label. For each label, the samples are drawn according to a multivariate Gaussian distribution with covariance matrix equal to the identity and mean equal to \((0,0)\) (for the label \(1\)) and \((3,2)\) (for the label \(-1\)). The regularization parameter is set to \(C = 10\).
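
In standalone form, the sampling scheme of a single agent can be sketched as follows (the seed and the names are illustrative; the complete code below uses the legacy np.random interface instead):

import numpy as np

rng = np.random.default_rng(0)
n_pos, n_neg = rng.integers(2, 6, size=2)  # between 2 and 5 samples per label

pos = rng.multivariate_normal(np.zeros(2), np.eye(2), n_pos)           # label +1
neg = rng.multivariate_normal(np.array([3.0, 2.0]), np.eye(2), n_neg)  # label -1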

Complete code

examples/setups/svm/launcher.py
###############################################################
# COMMON-COST Example
# Support Vector Machine
#
# Each agent has a certain number of randomly generated points, labeled 1 or -1.
# The points are generated by agents according to a multivariate normal distribution,
# with different mean and covariance for the two labels.
#
###############################################################
# Executed Algorithm: Constraints Consensus
###############################################################

import dill as pickle
import numpy as np
from mpi4py import MPI
from disropt.agents import Agent
from disropt.algorithms import ConstraintsConsensus
from disropt.functions import Variable, SquaredNorm
from disropt.utils.graph_constructor import binomial_random_graph
from disropt.problems import Problem

# get MPI info
NN = MPI.COMM_WORLD.Get_size()
agent_id = MPI.COMM_WORLD.Get_rank()

# Generate a common graph (everyone uses the same seed)
Adj = binomial_random_graph(NN, p=0.03, seed=1)

np.random.seed(10*agent_id)
#####################
# Problem parameters
#####################

# means and covariances of the two Gaussians (one per label)
mu = (np.array([0, 0]), np.array([3, 2]))
sigma = (np.eye(2), np.eye(2))

dim = mu[0].shape[0]  # dimension of sample space

# number of samples (min 2 max 5 for each label)
nsamp = (np.random.randint(2, 6), np.random.randint(2, 6))

# regularization parameter
C = 10

#####################
# Generate problem data
#####################

# points
points = np.zeros((dim, nsamp[0]+nsamp[1]))
points[:, 0:nsamp[0]] = np.random.multivariate_normal(mu[0], sigma[0], nsamp[0]).transpose()
points[:, nsamp[0]:] = np.random.multivariate_normal(mu[1], sigma[1], nsamp[1]).transpose()

# labels
labels = np.ones((sum(nsamp), 1))
labels[nsamp[0]:] = -labels[nsamp[0]:]

# cost function
z = Variable(dim+1+NN)
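# z stacks all the optimization variables: z = (w, b, xi_1, ..., xi_N).
# Note: a single slack xi_i is associated with each agent (shared by all of
# its samples), a simplification of the per-sample slacks xi_{i,j} above.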
A = np.zeros((dim+1+NN, dim))
A[0:dim, :] = np.eye(dim)  # w = A @ z
B = np.zeros((dim+1+NN, 1))
B[dim+1:dim+NN+1] = np.ones((NN, 1))  # xi_1 + ... + xi_N = B @ z
D = np.zeros((dim+1+NN, 1))
D[dim] = 1  # b = D @ z
E = np.zeros((dim+1+NN, 1))
E[dim+1+agent_id] = 1  # xi_i = E @ z

obj_func = (1/2) * (A @ z) @ (A @ z) + C * (B @ z)  # (1/2)*||w||^2 + C*sum(xi)

# constraints
F = np.zeros((dim+1+NN, NN))
F[dim+1:dim+NN+1, :] = np.eye(NN)  # xi = F @ z

constr = []
for idx in np.arange(F.shape[1]):
    constr.append(F[:, idx][:, None] @ z >= 0)

for j in range(sum(nsamp)):
    constr.append(float(labels[j]) * (points[:, j].reshape(dim, 1) @ (A @ z) + D @ z) >= 1 - E @ z)  # j-th point

#####################
# Distributed algorithms
#####################

# local agent and problem
agent = Agent(
    in_neighbors=np.nonzero(Adj[agent_id, :])[0].tolist(),
    out_neighbors=np.nonzero(Adj[:, agent_id])[0].tolist())
pb = Problem(obj_func, constr)
agent.set_problem(pb)
# instantiate the algorithm
constrcons = ConstraintsConsensus(agent=agent, enable_log=True)

n_iter = NN*3  # number of iterations

# run the algorithm
constrcons_seq = constrcons.run(iterations=n_iter, verbose=True)

# print results
constrcons_x = constrcons.get_result()

print("Agent {}: {}".format(agent_id, constrcons_x.flatten()))  # save information

if agent_id == 0:
    with open('info.pkl', 'wb') as output:
        pickle.dump({'N': NN, 'size': NN+dim+1, 'iterations': n_iter}, output, pickle.HIGHEST_PROTOCOL)
    with open('objective_function.pkl', 'wb') as output:
        pickle.dump(obj_func, output, pickle.HIGHEST_PROTOCOL)

with open('agent_{}_constr.pkl'.format(agent_id), 'wb') as output:
    pickle.dump(constr, output, pickle.HIGHEST_PROTOCOL)
np.save("agent_{}_seq.npy".format(agent_id), constrcons_seq)
examples/setups/svm/results.py
import numpy as np
import matplotlib.pyplot as plt
from disropt.problems import Problem
import pickle

# initialize
with open('info.pkl', 'rb') as inp:
    info = pickle.load(inp)
NN = info['N']
iters = info['iterations']
size = info['size']

# load agent data
sequence = np.zeros((NN, iters, size))
local_constr = {}
for i in range(NN):
    sequence[i, :, :] = np.load("agent_{}_seq.npy".format(i), allow_pickle=True).reshape((iters, size))
    with open('agent_{}_constr.pkl'.format(i), 'rb') as inp:
        local_constr[i] = pickle.load(inp)
with open('objective_function.pkl', 'rb') as inp:
    obj_func = pickle.load(inp)

# solve centralized problem
global_constr = []
for i in range(NN):
    global_constr.extend(local_constr[i])
global_pb = Problem(obj_func, global_constr)
x_centr = global_pb.solve()
cost_centr = obj_func.eval(x_centr)

# compute cost errors
cost_err = np.zeros((NN, iters))

for i in range(NN):
    for t in range(iters):
        cost_err[i, t] = abs(obj_func.eval(sequence[i, t, :].reshape((size, 1))) - cost_centr)

# compute max violation
vio_err = np.zeros((NN, iters))
for i in range(NN):
    for t in range(iters):
        xt = sequence[i, t, :].reshape((size, 1))
        max_err = np.zeros((len(global_constr), 1))
        for c in range(len(global_constr)):
            max_err[c] = global_constr[c].function.eval(xt)
        vio_err[i, t] = np.max(max_err)

# Plot the evolution of the local estimates
# generate N colors
colors = {}
for i in range(NN):
    colors[i] = np.random.rand(3, 1).flatten()

# plot cost error
plt.figure()
plt.title('Evolution of cost error')
plt.xlabel(r"iteration $k$")
plt.ylabel(r"$|f(x_i^k) - f^\star|$")

for i in range(NN):
    plt.plot(np.arange(iters), cost_err[i, :], color=colors[i])

# plot violation error
plt.figure()
plt.title('Maximum constraint violation')
plt.xlabel(r"iteration $k$")
plt.ylabel(r"$max_j g(x_i^k)$")

for i in range(NN):
    plt.plot(np.arange(iters), vio_err[i, :], color=colors[i])

plt.show()

The two files can be executed by issuing the following commands in the example folder (the number of MPI processes passed to mpirun sets the number of agents \(N\)):

> mpirun -np 30 --oversubscribe python launcher.py
> python results.py

References

NoBü11

Notarstefano, G.; Bullo, F.: Distributed abstract optimization via constraints consensus: Theory and applications. IEEE Transactions on Automatic Control, 56(10):2247–2261, 2011.