Hyperparameter Tuning: Understanding Randomized Search

Hi everyone!
This is the third in a series of articles on cross-validation and hyperparameter tuning. Here are the links to the first two, if you'd like to give them a read before going ahead with this one.

Let us quickly outline the key ideas that we've covered thus far.

  • In the first article, we observed how the train/test split evaluation depends heavily on the particular data records that end up in the training and test sets, which led us to the need for cross-validation as a way to evaluate a model's performance more reliably.

  • We then saw how cross-validation can be used in hyperparameter search; if you remember, we chose the hyperparameter value that yielded the highest cross-validated accuracy.

  • In the second article, we looked at searching for the best hyperparameters more conveniently using GridSearchCV in scikit-learn. If you remember, towards the end of the article, we touched upon the fact that performing Grid Search to tune hyperparameters is actually computationally inefficient.

Let us now restate that point a little more formally.

  • Say we have to search over M hyperparameters; let p_1, p_2, ..., p_M denote them. Let the number of values that we would like to search over be n_1 for p_1, n_2 for p_2, and so on, with n_M values for p_M.
  • Grid Search takes all possible hyperparameter settings (combinations) into account and builds a model for each setting in order to choose the one with the optimal hyperparameters.

  • To understand this better, suppose that out of the M hyperparameters, we freeze the values of all but one, say the M-th parameter p_M. Grid Search then searches through the list of n_M values for this hyperparameter, so n_M models are created.

  • Suppose we now freeze the values of all hyperparameters except the last two, p_(M-1) and p_M, which have n_(M-1) and n_M candidate values respectively. We now have to search through all possible combinations of p_(M-1) and p_M.

  • We can do this by fixing one value of p_(M-1) and searching through all n_M values of p_M; to account for all possible combinations, we then repeat this for each of the n_(M-1) values of p_(M-1). So, this process leaves us with n_(M-1) * n_M models.

Hopefully it's clear how the complexity scales as the number of values each hyperparameter can take increases. For the above example with M hyperparameters, Grid Search would build n_1 * n_2 * ... * n_M models. This is why we said that things can scale up quickly and become computationally intractable with Grid Search.
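To make the multiplication concrete, here's a minimal sketch that counts how many models a full Grid Search would have to fit. The grid below is purely illustrative: n_neighbors and weights echo the KNN example we use later, and 'p' (the Minkowski power parameter) is just an extra hyperparameter thrown in to show a third factor.

from math import prod

# a hypothetical parameter grid with M = 3 hyperparameters
param_grid = {
    'n_neighbors': list(range(1, 31)),      # n_1 = 30 values
    'weights': ['uniform', 'distance'],     # n_2 = 2 values
    'p': [1, 2],                            # n_3 = 2 values
}

# Grid Search fits one model per combination: n_1 * n_2 * n_3
n_models = prod(len(values) for values in param_grid.values())
print(n_models)  # 30 * 2 * 2 = 120 settings, each fit once per CV fold

With 10-fold cross-validation, those 120 settings already mean 1200 model fits, and every extra hyperparameter multiplies that number again.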

With this motivation to make hyperparameter search computationally more efficient, let us proceed to understand Randomized Search.

Understanding RandomizedSearchCV

In contrast to GridSearchCV, not all parameter combinations are tried out in RandomizedSearchCV; rather, a fixed number of parameter settings is sampled from the specified distributions or lists of values.

If some of the hyperparameters that we're searching over are continuous, then we should specify a distribution rather than a list of values when defining the parameter distributions. How do we fix the number of parameter settings to try?

The number of parameter settings that are tried is given by n_iter. There's a quality vs. computational cost trade-off in picking n_iter.

A very small value of n_iter would imply that we're more likely to find a suboptimal solution, because we are considering too few combinations.

A very high value of n_iter would mean we can ideally get closer to finding the best hyperparameters that yield the best model, but this again comes with a high computational cost as before. In fact, if we set n_iter = n_1 * n_2 * ... * n_M from the previous example, then we're essentially considering all possible hyperparameter combinations, and Randomized Search and Grid Search become equivalent.
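As a quick illustration of sampling from a distribution rather than a list, here's a minimal sketch that draws n_iter=10 settings. The estimator and hyperparameters here (a LogisticRegression with a continuous C and a categorical penalty) are chosen purely for illustration and are not part of the example that follows.

from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# C is continuous, so we hand RandomizedSearchCV a distribution to sample from;
# penalty is categorical, so a plain list is fine
param_dist = {
    'C': uniform(loc=0.01, scale=10),   # samples C uniformly from [0.01, 10.01)
    'penalty': ['l2'],
}

rand = RandomizedSearchCV(LogisticRegression(max_iter=1000), param_dist,
                          n_iter=10, cv=5, scoring='accuracy', random_state=5)
rand.fit(X, y)
print(rand.best_params_)

Only 10 of the (infinitely many) possible settings are ever fitted, which is exactly where the computational savings come from.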

Let us build on the same KNN classifier example from the previous blog posts. In addition to n_neighbors, let us also search for the optimal weighting strategy: 'uniform', where all points are weighted equally, and 'distance', which weights points by the inverse of their distance. Let us now implement Randomized Search in scikit-learn, following the same steps as we did for Grid Search.
1. Import the RandomizedSearchCV class

from sklearn.model_selection import RandomizedSearchCV

2. Define the parameter grid

# specify "parameter distributions" rather than a "parameter grid"param_dist = dict(n_neighbors=k_range, weights=weight_options)

3. Instantiate the search with n_iter=10, fit it & view the results

# n_iter controls the number of searches
rand = RandomizedSearchCV(knn, param_dist, cv=10, scoring='accuracy',
                          n_iter=10, random_state=5, return_train_score=False)
rand.fit(X, y)
pd.DataFrame(rand.cv_results_)[['mean_test_score', 'std_test_score', 'params']]

# DataFrame
   mean_test_score  std_test_score  params
0  0.973333         0.032660        {'weights': 'distance', 'n_neighbors': 16}
1  0.966667         0.033333        {'weights': 'uniform', 'n_neighbors': 22}
2  0.980000         0.030551        {'weights': 'uniform', 'n_neighbors': 18}
3  0.966667         0.044721        {'weights': 'uniform', 'n_neighbors': 27}
4  0.953333         0.042687        {'weights': 'uniform', 'n_neighbors': 29}
5  0.973333         0.032660        {'weights': 'distance', 'n_neighbors': 10}
6  0.966667         0.044721        {'weights': 'distance', 'n_neighbors': 22}
7  0.973333         0.044222        {'weights': 'uniform', 'n_neighbors': 14}
8  0.973333         0.044222        {'weights': 'distance', 'n_neighbors': 12}
9  0.973333         0.032660        {'weights': 'uniform', 'n_neighbors': 15}

4. Examine the best score and best hyperparameters

# examine the best model
print(rand.best_score_)
print(rand.best_params_)

# Output
0.9800000000000001
{'weights': 'uniform', 'n_neighbors': 18}
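Since RandomizedSearchCV refits the best model on the full dataset by default (refit=True), the search object can be used for predictions directly. A quick sketch (the sample point below is a made-up, iris-like measurement, just for illustration):

# the search object predicts with the refitted best estimator
print(rand.best_estimator_)                    # e.g. KNeighborsClassifier(n_neighbors=18)
print(rand.predict([[5.1, 3.5, 1.4, 0.2]]))    # hypothetical sample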

Parameters of the best model

  • Surprisingly, we see that the highest accuracy score obtained in this case, where we only looked at 10 different parameter settings instead of 60 in Grid Search, is the same as before: 0.98
  • And the best value found, n_neighbors = 18, is also one of the optimal values that we got when we initially searched for the best n_neighbors. (Recall the earlier blog post with the same example.)

Maybe we just got lucky?
What is the guarantee that we will always get the best results?
Ah, this question makes perfect sense, doesn't it?

Let us do the following now: run RandomizedSearchCV multiple times and see how often we really end up getting lucky!

We run RandomizedSearchCV 20 times and see what happens, logging the best_score_ for every run.

# run RandomizedSearchCV 20 times (with n_iter=10) and record the best score
best_scores = []
for _ in range(20):
    rand = RandomizedSearchCV(knn, param_dist, cv=10, scoring='accuracy',
                              n_iter=10, return_train_score=False)
    rand.fit(X, y)
    best_scores.append(round(rand.best_score_, 3))

Let us examine all the 20 best scores now.

print(best_scores)

# Output: Best Scores
[0.973, 0.98, 0.98, 0.98, 0.973, 0.98, 0.98, 0.973, 0.98, 0.973, 0.973, 0.98, 0.98, 0.98, 0.98, 0.973, 0.98, 0.98, 0.98, 0.973]
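To tally how often each best score shows up across the 20 runs, a quick summary over the scores logged above could use collections.Counter:

from collections import Counter

# count how many runs achieved each best score
print(Counter(best_scores))
# with the scores logged above, this gives Counter({0.98: 13, 0.973: 7})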

Upon examining the best scores above for all 20 runs, we see that we get the best accuracy score of 0.98 in 13 of them.
Looks like we're lucky indeed! What about the other 7 runs, where we didn't quite get the best accuracy score? Those runs gave an accuracy of 0.973, which is pretty close to 0.98.

This observation suggests that even though Randomized Search may not always find the hyperparameters of the best-performing model, the models obtained with the hyperparameters it does find do not perform much worse than the best model obtained from Grid Search. In other words, the models found by Randomized Search are very close to the optimal model.

In essence, these may not be the absolute best hyperparameters, but they are certainly close to the best, and they are found with far less computation. I hope you now understand how Randomized Search can be used for hyperparameter tuning.
Thanks for reading! Happy Learning! Until next time!

References

[1] Here's the link to the Google Colab notebook for the example discussed above
[2] Introduction to Machine Learning in Python with scikit-learn by DataSchool.
[3] Scikit-learn documentation: RandomizedSearchCV
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html

Photo by Chris Lawton on Unsplash

