# Local linear integration guide for dimensionality reduction

The performance of any machine learning model is highly dependent on the quality of the data used to train the model. When the data to train the model is very large, it should be reduced in size based on several factors. For this, many methods are used to reduce the dimensionality of the model. Some of them work in the case of linear while others work when there are nonlinear relationships between the characteristics of the data. In this article, we will discuss locally-linear embedding which is an approach to nonlinear dimensionality reduction with neighborhood-preserving embeddings of large-dimensional data. The main points covered in this article are listed below.

Contents

`Register for Analytics Olympiad 2021>>`
1. Nonlinear dimensionality reduction
2. Multiple learning
3. Local-linear integration (LLE)
1. Locally linear Hessian cover (Hessian LLE)
2. Modified local-linear integration (MLLE)
3. Local tangent space alignment (LTSA)
4. Implementation of LLE in Python

Let’s start the discussion by understanding the nonlinear dimensionality reduction approach.

Nonlinear dimensionality reduction

Data with higher dimensions require more than two or three dimensions in space to be represented, which can sometimes be difficult to understand how the data is distributed in space or difficult to interpret the data due to their dimensionality. Thus, nonlinear dimensionality reduction is an approach where the simplification of the high dimension is done by assuming that the points of interest in the data lie in the low dimensional space. So if the data is in a lower dimensional space, we can easily interpret it by visualization. Multiple learning is a way to perform nonlinear dimensionality reduction.

Multiple learning

Multiple learning can be considered as the family of nonlinear dimensionality reduction approaches. Where most approaches to multiple learning depend on the idea that the data we have is only artificially high. And we know that these types of datasets are very difficult to interpret by visualization. Visualizing data that has fewer dimensions can represent a correct overview of the data structure where visualizing large data is much less intuitive. To obtain a correct structure of the data, a reduction of dimension is necessary.

We can simply apply the dimension reduction by choosing the random projection of the data. It helps us visualize the data structure to some extent, but random data projection results in the loss of many interesting data structures in big data. To deal with this random projection problem, many methods are presented to us, such as principal component analysis (PCA), independent component analysis, linear discriminant analysis and others. These are also very powerful methods for dimension reduction, but many of them are not taken into account when working with data whose structure is not linear.

The multiple learning approach can be seen as the generalization or updating of PCA so that it can be used for data dimension reduction which consists of a nonlinear structure. The typical multiple learning approach falls under unsupervised learning where algorithms attempt to learn the structure of data without using predetermined classifications. There can be many approaches in the multiple learning family such as isomap, locally linear embedding, local tangent space alignment and others. In this article, we will learn more about the local-linear integration approach for dimension reduction.

Local-linear integration (LLE)

There can be many advantages to using local-linear integration approaches over Isomap or other approaches, like using it with sparse matrix algorithms, we can get faster optimization and better results with many problems. The LLE algorithm begins by finding a set of the nearest neighbors of each point. After finding the nearest neighbors by calculating the weights defined for each point, it helps to describe the point as a linear combination of its neighbors. And in the last step to find the low dimensional embedding for the points, it uses a technique we call eigenvector optimization. The last step also makes it possible to describe each point as a linear combination of its neighbor. LLE does not have an internal model.

To calculate the neighbor Xj from any point Xi the LLE uses the barycentric coordinates of XI. used the linear combination, the algorithm reconstructs the point of origin. The weight matrix WI gives the linear combination at point XI. errors in the reconstruction of the point of origin can be given by the following function.

Image source

Where E (W) can also be called a cost function.

By reconstructing the point of origin XI, the contribution of point Xj can be denoted by the weight matrix WI. The cost function given above is minimized by these two constraints:

• Each point of origin is reconstructed only by the points in the neighborhood of the point of origin. The algorithm can make Wij as zero if any point Xj is not the neighboring point of the point of origin XI.
• Sum of each row in WI (weight matrix) = 1.

The main objective of the algorithm is to make the data of dimensions D of dimension d where D >> d. The idea behind the generation of neighborhood-preserving maps is that the same weight matrix of the D-dimension data that helps in the reconstruction of the origin points can also be used in the construction of the same origin point in the data of. dimensions d. Also, the minimization of the cost function is taken into account and each point Xi in the dimensional data D is mapped to a point Yi in the dimensional data d.

Image source

Unlike other cost functions in different methods, here the algorithm tries to keep the weight matrix the same, and the optimization of the coordinates of the data space is done by minimizing the cost function on the Y pointsI.

If the number of data points is N, the minimization problem can be solved by the eigenvalue problem of an N-dimensional sparse matrix NX. an orthogonal set of coordinates provided by the sparse matrix with N dimensions NX funds’ of non-zero eigenvectors. Typically we can say that there can be two LLE variants.

• Locally linear Hessian cover (Hessian LLE) – it is like a basic LLE based on the sparse matrix solving technique and tends to give much better quality results than the basic LLE. it has no internal models.
• Modified local-linear integration (MLLE) – The modified LLE (MLLE) is another variant of the LLE in the local weighting matrix addressed by multiple weights of each neighborhood. This process leads to distortions in LLE maps. More formally, we can say that if the weights generated from the base LLE are projected orthogonally, we can consider them to be multiple weights.
• Local tangent space alignment (LTSA) – Local Tangent Space Alignment can also be considered as the variant of LLE as it is quite similar to LLE when talking about the LTSA algorithm. The only difference between LLE and LTSA is that LLE focuses on preserving the neighborhood distance where LTSA characterizes the local geometry of each neighborhood.

In the next section of the article, we will see how to implement LLE and its different variants using python. The implementation below is inspired by the example given by SK-Learn.

Implementation of LLE in Python

In this article, we will implement LLE and its variant on the load digits dataset provided by python’s sklearn library.

``````from sklearn.datasets import load_digits
import matplotlib.pyplot as plt, offsetbox
import numpy as np
X = data.data
y =  data.target
n_samples = 1797
n_features = 64
n_neighbors = 30
fig, axs = plt.subplots(nrows=10, ncols=10, figsize=(7, 7))
for idx, ax0 in enumerate(axs.ravel()):
ax0.imshow(X[idx].reshape((8, 8)), cmap=plt.cm.binary)
ax0.axis("off")

_ = fig.suptitle("digits dataset", fontsize=18)``````

Go out:

The image above is a representation of the digits dataset where we have imported all the classes available in the dataset package.

• Definition of an assistance function to plot the LLE and its variants:
``````def helper_function(X, title, ax):
from sklearn.preprocessing import MinMaxScaler
X = MinMaxScaler().fit_transform(X)
show_image = np.array([[1.0, 1.0]])
for i in range(X.shape):
ax.text(
X[i, 0],
X[i, 1],
str(y[i]),
color=plt.cm.Dark2(y[i]),
fontdict={"weight": "bold", "size": 9},
)

dist = np.sum((X[i] - show_image) ** 2, 1)
if np.min(dist) < 4e-3:
continue
show_image = np.concatenate([show_image, [X[i]]], axis=0)

import sklearn
imagebox = offsetbox.AnnotationBbox(
)

ax0.set_title(title)
ax0.axis("off")
``````

The above functions will use the different integration techniques on the digital data set and on the generated integration, they will plot the projection of the different data points available on the digital data set and by visualizing we will be able to understand the integration space, whether they are well groped or dispersed through the embedding space.

• Comparison of the different LLE techniques:

We can use the scikit-learn library provided by the LocallyLinearEmbedding module to do the LLE and its different variations by simply changing the method argument when implementing the module.

from sklearn.manifold import LocallyLinearEmbedding

``````    ax0.axis("off")

"Standard LLE": LocallyLinearEmbedding(
n_neighbors=n_neighbors, n_components=2, method="standard"
),

"Modified LLE": LocallyLinearEmbedding(
n_neighbors=n_neighbors, n_components=2, method="modified"
),

"Hessian LLE": LocallyLinearEmbedding(
n_neighbors=n_neighbors, n_components=2, method="hessian"
),

"LTSA LLE": LocallyLinearEmbedding(
n_neighbors=n_neighbors, n_components=2, method="ltsa"
),

}``````

By declaring these methods, we can now visualize the projection of the original data. Moreover, we can compare the computation time for each LLE using the following function.

``````

from time import time

projections, timing = {}, {}
for name, transformer in embeddings.items():

if name.startswith("Standard LLE"):
data = X.copy()
data.flat[:: X.shape + 1] += 0.01  # Make X invertible
else:
data = X
print(f"{name}...")
start_time = time()

projections[name] = transformer.fit_transform(data, y)
timing[name] = time() - start_time``````

Go out:

Finally, we can plot the resulting projection given by each method.

``````

from itertools import zip_longest

fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(17, 24))

for name, ax0 in zip_longest(timing, axs.ravel()):
if name is None:
ax0.axis("off")
continue
title = f"{name} (time {timing[name]:.3f}s)"
helper_function(projections[name], title, ax0)
plt.show()``````

Go out:

Here in the output above we can see the difference between the different LLE variants where the standard LLE took the shortest compute time and we can tell by looking at the visualization that we have good results for the LLE hessian and the modified LLE variants of the LLE.

Final words

Here in this article we have seen how the locally linear integration algorithm works and we have applied LLE and all its variants on the numeric data and seen the comparison between them. We can say that these LLE variants are a good option for us to apply in the situation where the data is large dimension and also follow the nonlinear dimensional structure.

The references

`Join our Telegram Group. Be part of an engaging community`