August 1, 2020 05:00 am GMT

What's New in Kedro 0.16.4

Looking at the release notes, I see one major feature improvement on the list: auto-discovery of hooks.

## Major features and improvements

* Enabled auto-discovery of hooks implementations coming from installed plugins.

This one comes as a bit of a surprise, as it was only casually mentioned in #435.

auto enabled plugins mentioned in issue 435

Think pytest

As mentioned in #435, this is the model that pytest uses. Not every pytest plugin starts doing things right out of the box, though; some require a CLI argument.
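To make the pytest comparison concrete, here is a rough sketch of an opt-in plugin. The plugin itself, the `--announce` flag, and the `announce_enabled` helper are all made up for illustration; only the `pytest_addoption` and `pytest_sessionstart` hook names and `config.getoption` come from pytest. Such a module would be discovered through pytest's `pytest11` entry point group once installed, yet it does nothing until the flag is passed.

```python
# Module of a hypothetical pytest plugin. Installing it makes pytest load it,
# but it stays idle unless the user opts in with --announce.


def pytest_addoption(parser):
    """Register the opt-in flag (hypothetical name) with pytest's parser."""
    parser.addoption(
        "--announce",
        action="store_true",
        default=False,
        help="print a line when the test session starts",
    )


def announce_enabled(config):
    """Return True only when the user passed --announce."""
    return bool(config.getoption("--announce"))


def pytest_sessionstart(session):
    """Act at session start, but only if the plugin was switched on."""
    if announce_enabled(session.config):
        print("announce plugin is active")
```

Auto-discovered Kedro hooks have no such switch in this release, which is why installing a plugin is enough to change how a pipeline runs.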

Simplicity

It feels a bit crazy that simply installing a package will change the way that your pipeline gets executed. I do like that it requires a bit less reaching into the framework internals for the average user. Most folks will be able to write in the catalog and nodes without much change to the rest of the project.

Implementation

Reading through the docs, we can make our hooks register automatically by adding a kedro.hooks entry point that points to a singleton instance of our hook.

from the docs

    setup(
        ...
        entry_points={"kedro.hooks": ["plugin_name = plugin_name.plugin:hooks"]},
    )

    import logging

    from kedro.framework.hooks import hook_impl


    class MyHooks:
        @hook_impl
        def after_catalog_created(self, catalog):  # pylint: disable=unused-argument
            logging.info("Reached after_catalog_created hook")


    hooks = MyHooks()
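To get a feel for what auto-discovery means, the sketch below uses only the standard library to look up whatever is registered under an entry point group. This is not Kedro's own loading code, and `discover_hooks` is a name invented here; it just shows the general mechanism of finding objects that installed packages advertise.

```python
from importlib.metadata import entry_points


def discover_hooks(group="kedro.hooks"):
    """Return {entry point name: loaded object} for one entry point group.

    Illustrative helper only; Kedro's real registration may work differently.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions expose a select() API.
        selected = eps.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = eps.get(group, [])
    # Loading imports the plugin module and returns the referenced object,
    # for example the module-level `hooks` instance.
    return {ep.name: ep.load() for ep in selected}


if __name__ == "__main__":
    for name, obj in discover_hooks().items():
        print(name, obj)
```

Running this inside a project environment is a quick way to see which plugins would be picked up just by being installed.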

Careful with the singletons

hook authors beware

I will be a bit cautious before installing a plugin that is automatically registered. I know it's not a common pattern, but if you were to use any part of two Kedro projects at the same time, and project-specific data was stored on the hook instance, it would likely break.

As long as the hook doesn't store data on the instance you will be ok. Hooks like what they have in the examples will be ok. They generally just take some information from the lifecycle arguments and do something at their prescribed lifecycle point.

Many of the hooks I am seeing in the wild are already more complicated: they require the hook author to use an __init__ method and store data on the instance. If you were to do this with two pipelines simultaneously, it would break.
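Here is a minimal sketch of how that can go wrong. `StatefulHooks` and both catalogs are hypothetical, the catalogs are plain lists standing in for Kedro's catalog object, and the `@hook_impl` decorator is left off so the snippet runs without Kedro installed.

```python
class StatefulHooks:
    """Hypothetical hook class that keeps project-specific data on the instance."""

    def __init__(self):
        self.dataset_names = []

    def after_catalog_created(self, catalog):
        # Remember every dataset name this instance has ever been handed.
        self.dataset_names.extend(catalog)


# One module-level instance, the way an entry point would expose it.
hooks = StatefulHooks()

# Two projects in the same process end up sharing that single instance.
project_a_catalog = ["cars", "trucks"]
project_b_catalog = ["iris"]

hooks.after_catalog_created(project_a_catalog)
hooks.after_catalog_created(project_b_catalog)

# Project B now sees project A's datasets mixed in with its own.
leaked = hooks.dataset_names
print(leaked)
```

A hook that only reads its lifecycle arguments and keeps nothing on `self` avoids this, since there is no shared state to mix up.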

Use virtual environments

Whatever virtual environment manager you use, it is more important than ever to make sure you DO NOT install plugins in your global environment. In general, you should always run projects, even toys or tests, in a virtual environment.

I use conda

conda create -n my-sample-env python=3.8 -y
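If you want a guard against working in the global environment by accident, a small check along these lines could run before installing anything. The `in_isolated_env` helper is made up for this post, and treating a conda environment named `base` as "global" is my own assumption.

```python
import os
import sys


def in_isolated_env(environ=None, prefix=None, base_prefix=None):
    """Best-effort guess at whether Python runs inside a venv or a named conda env."""
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix

    # venv/virtualenv: the active prefix differs from the base interpreter's.
    in_venv = prefix != base_prefix

    # conda: an activated environment sets CONDA_DEFAULT_ENV.
    # Assumption: "base" counts as the global environment.
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    in_conda = conda_env not in (None, "", "base")

    return in_venv or in_conda


if __name__ == "__main__":
    if not in_isolated_env():
        print("Looks like the global environment; activate an env first.")
```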

Overall

I think this is a really interesting direction for the project to go. Hooks are still really early. The implementation is good, but I foresee us getting more functionality that may let us rely on the __init__ method a little less.


Original Link: https://dev.to/waylonwalker/what-s-new-in-kedro-0-16-4-5g75
