October 12, 2020 10:01 am GMT

How to beat Python's pip: A brief intro

Python's package installer, pip, is known to have issues when resolving software stacks. In this upcoming series of articles, I will briefly discuss an approach that resolves library versions for applications faster than pip's resolution algorithm. Moreover, the resolved software stacks are scored on various aspects to help with shipping high-quality software.

Python is one of the fastest-growing programming languages out there. There is no doubt it's becoming the programming language of choice for data scientists, machine learning engineers, and software developers. In my eyes, Python code is pseudo-code that simply runs: easy to write, easy to maintain. Creating an API server using Flask, doing data analysis in Jupyter notebooks, or building a neural network using TensorFlow can all be done in a few lines of code. Any performance-critical parts can be optimized thanks to CPython's C API. Python is a very effective weapon in anyone's inventory.

Python, pip & resolvers

The Python Packaging Authority (PyPA) is a working group that maintains a core set of projects used in Python packaging. One of these components is pip, the PyPA recommended tool for installing Python packages. If you have developed any Python application, you have most probably used it, or at least considered using it, to install libraries for your project. Similar tools are Pipenv (also maintained by PyPA) and Poetry.

pip does its job pretty well in most cases: it can install your desired software from PyPI, the Python Package Index, which hosts open-source projects. Alternatively, you can use your privately hosted Python indexes as a source of software to be installed. Unfortunately, pip lacks a proper implementation of a resolver, which can in some cases lead to painful situations. As of today, PyPA is working on a new resolver implementation for pip. Resolvers are mostly discussed in terms of their implementation, but let's have a look at the resolution problem from the other side.

A state-space made out of Python packages

Let's say we want to create an application that uses two libraries called simplelib and anotherlib. These libraries can be installed in different versions, and the versions chosen can have a different impact on the resulting software shipped, e.g. a performance impact, a security impact, or, in the worst case, an application that does not assemble at all. Now, let's create a function that captures such observations and performs "scoring" with respect to the versions included in the installed software. Such a function would have discrete values, and for our artificial example it could look like this visualization (assuming the libraries do not have any transitive dependencies):

[Figure: discrete values of the scoring function for each combination of simplelib and anotherlib versions]
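A minimal sketch of such a discrete scoring function, in Python. The version numbers and scores below are invented purely for illustration (the article does not give any concrete values), and `None` is an assumed convention for a combination that does not assemble at all.

```python
from itertools import product

# Hypothetical releases of the two example libraries (not taken from the article).
SIMPLELIB_VERSIONS = ["1.0.0", "1.1.0", "2.0.0"]
ANOTHERLIB_VERSIONS = ["0.9.0", "1.0.0", "1.1.0"]

# Made-up observations: (simplelib version, anotherlib version) -> score.
# None marks a combination that does not assemble at all.
OBSERVED_SCORES = {
    ("1.0.0", "0.9.0"): 0.2,
    ("1.0.0", "1.0.0"): 0.5,
    ("1.0.0", "1.1.0"): 0.3,
    ("1.1.0", "0.9.0"): 0.4,
    ("1.1.0", "1.0.0"): 0.9,
    ("1.1.0", "1.1.0"): 0.6,
    ("2.0.0", "0.9.0"): None,
    ("2.0.0", "1.0.0"): 0.7,
    ("2.0.0", "1.1.0"): 0.1,
}


def score(simplelib, anotherlib):
    """Return the score of one software stack, or None if it does not assemble."""
    return OBSERVED_SCORES[(simplelib, anotherlib)]


def score_table():
    """Enumerate the whole state space and score every combination of versions."""
    return {
        stack: score(*stack)
        for stack in product(SIMPLELIB_VERSIONS, ANOTHERLIB_VERSIONS)
    }


if __name__ == "__main__":
    for stack, value in score_table().items():
        print(stack, value)
```

With two libraries and three releases each, the state space has nine points. Transitive dependencies would add further dimensions, which is why the example assumes there are none.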

To make it more intuitive, let's interpolate the values and plot the resulting figure:

[Figure: interpolated surface of the scoring function over simplelib and anotherlib versions]
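One way such an interpolated surface could be computed is bilinear interpolation over a grid of scores. The sketch below is an assumption about the technique, not the method used to produce the figure. The grid values are invented, and releases are treated as equally spaced points 0, 1, 2 on each axis, which is only a visualization aid because versions are not a continuous quantity.

```python
import math

# Hypothetical scores: rows index simplelib releases, columns index anotherlib releases.
GRID = [
    [0.2, 0.5, 0.3],
    [0.4, 0.9, 0.6],
    [0.0, 0.7, 0.1],
]


def interpolate(grid, x, y):
    """Bilinearly interpolate a grid (at least 2x2) at fractional position (x, y)."""
    max_x, max_y = len(grid) - 1, len(grid[0]) - 1
    if not (0 <= x <= max_x and 0 <= y <= max_y):
        raise ValueError("position lies outside the grid")
    # Lower corner of the enclosing cell, clamped so the upper corner stays in range.
    x0 = min(int(math.floor(x)), max_x - 1)
    y0 = min(int(math.floor(y)), max_y - 1)
    x1, y1 = x0 + 1, y0 + 1
    tx, ty = x - x0, y - y0
    # Interpolate along the anotherlib axis first, then along the simplelib axis.
    low = grid[x0][y0] * (1 - ty) + grid[x0][y1] * ty
    high = grid[x1][y0] * (1 - ty) + grid[x1][y1] * ty
    return low * (1 - tx) + high * tx


if __name__ == "__main__":
    # Sample the surface at a finer resolution than the original grid.
    for i in range(5):
        print([round(interpolate(GRID, i / 2, j / 2), 2) for j in range(5)])
```

At grid points the function returns the original scores, so the interpolation only fills in the space between known releases.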

On the horizontal axes, you can see the different versions of the simplelib and anotherlib libraries. On the vertical axis, you can see the values of the scoring function. If you used pip, Pipenv, or Poetry, all of these tools would resolve to the most recent versions of the libraries possible; on our graph, that is the rightmost value:

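[Figure: the scoring function plot with the rightmost value, i.e. the most recent versions of both libraries, marked]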

But what if we want to ship better software? What if the most recent releases are broken? That would require manual work and increase maintenance costs, and one can easily end up in dependency hell.
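The difference between "the latest" and "the best scored" stack can be sketched in a few lines of Python. Everything here is hypothetical: the scores are invented, `latest_stack` is a simplification of what an installer does when there are no version constraints and no transitive dependencies, and `greatest_stack` is a brute-force search over a tiny state space rather than a real resolver.

```python
# Made-up scores: (simplelib version, anotherlib version) -> score.
# None marks a combination that does not assemble at all.
SCORES = {
    ("1.0.0", "0.9.0"): 0.2,
    ("1.0.0", "1.0.0"): 0.5,
    ("1.0.0", "1.1.0"): 0.3,
    ("1.1.0", "0.9.0"): 0.4,
    ("1.1.0", "1.0.0"): 0.9,
    ("1.1.0", "1.1.0"): 0.6,
    ("2.0.0", "0.9.0"): None,
    ("2.0.0", "1.0.0"): 0.7,
    ("2.0.0", "1.1.0"): 0.1,
}


def version_key(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_stack(scores):
    """Pick the most recent release of each library, ignoring the scores."""
    simplelib = max((s for s, _ in scores), key=version_key)
    anotherlib = max((a for _, a in scores), key=version_key)
    return simplelib, anotherlib


def greatest_stack(scores):
    """Pick the best-scored stack among those that assemble."""
    candidates = {stack: value for stack, value in scores.items() if value is not None}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    latest = latest_stack(SCORES)
    greatest = greatest_stack(SCORES)
    print("latest:  ", latest, SCORES[latest])
    print("greatest:", greatest, SCORES[greatest])
```

In this invented data set the most recent releases score poorly, while an older combination scores best. Exhaustive search like this stops being practical once transitive dependencies multiply the number of possible stacks.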

Thoth's advice: install the right software

The idea described above gave birth to a project called Thoth. Thoth is a recommendation engine for Python applications that resolves not the latest, but the greatest set of libraries to install for your application. Thoth's resolver resolves and scores software stacks based on its aggregated knowledge, hence it's a server-side resolution. You can submit the requirements you have for your application, and Thoth's recommendation engine can resolve a software stack that satisfies them.

In the upcoming articles, I will dive deeper into Thoth's internals: how the resolution is performed, what key concepts are implemented, and how the implementation can resolve and score tens, hundreds, or thousands of software stacks a second. One of the concepts used there is reinforcement learning, which helps to resolve high-quality software stacks based on observations in Thoth's knowledge base, so stay tuned!


Original Link: https://dev.to/fridex/how-to-beat-python-s-pip-a-brief-intro-4bec
