Python is rapidly gaining ground across many areas of software development. Thanks to its readability and the wealth of open-source packages behind it, developers find it both easy to use and powerful. One field where Python has already achieved real dominance is machine learning, a status it owes mostly to extensive libraries built to support data analysis.
Even if Python is not the most popular programming language (yet!), it's definitely winning the race for the fastest-growing one, and it's the language the largest number of developers mention as the next one they want to learn.
Its broad range of uses and relatively gentle learning curve make it a perfect tool for advanced, complex applications, including sophisticated algorithms and complicated engines. That’s why Python is a great match for machine learning and is widespread in artificial intelligence solutions. Today we want to take a closer look at open-source libraries (most of them under BSD-style licenses) that cut the time and effort software developers put into different areas of ML work.
Pandas is a powerful open-source Python data science library which has been used in applications like Google Maps or Uber. It was created by Wes McKinney to capture data in neat, clear structures that make analysis intuitive and pleasant. Pandas offers two main object types: DataFrame (a two-dimensional, spreadsheet-like table with rows and columns) and Series (a single column), along with multiple methods for filtering, reshaping, pivoting, subsetting, and indexing data sets. It also makes your work less tedious thanks to automatic data alignment and a plethora of other useful utilities. Moreover, it’s friendly not only to the user. Pandas welcomes a wide variety of input formats, including flat files such as CSV and TSV as well as Excel, JSON, HTML, SQL, and HDF5, and it’s compatible with other Python packages, so you can combine it with, for example, Plotly to create interactive graphs straight from data frames.
- Commits: 18 997
- Releases: 101
- Contributors: 1 431
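To make the Pandas description above a bit more concrete, here is a minimal sketch of the two object types plus filtering, pivoting, and automatic alignment. The ride data and all variable names are made up for illustration; they don't come from any real application.

```python
import pandas as pd

# Hypothetical ride data, invented purely for illustration.
rides = pd.DataFrame(
    {
        "city": ["Warsaw", "Warsaw", "Krakow", "Krakow"],
        "day": ["Mon", "Tue", "Mon", "Tue"],
        "trips": [120, 95, 60, 80],
    }
)

# A single column of a DataFrame is a Series.
trips = rides["trips"]

# Filtering: keep only the rows with at least 80 trips.
busy = rides[rides["trips"] >= 80]

# Pivoting: one row per city, one column per day.
by_day = rides.pivot(index="city", columns="day", values="trips")

# Automatic alignment: values are matched by index label, not by position.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["z", "x"])
total = a + b  # "y" has no partner in b, so the result there is NaN

print(by_day)
print(total)
```

Note how the addition pairs `x` with `x` and `z` with `z` even though the two Series list their labels in a different order.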
When you think about machine learning, Google is probably what comes to mind first. It was just a matter of time before the Mountain View giant delivered a smashing ML tool of its own. It came in 2015, when the Google Brain team released TensorFlow, a library for deep neural network research. Although research is its main purpose and TensorFlow feels like a fish in water in a scientific environment, the package finds plenty of business applications too. A chief advantage of this library is its support for distributed computing, which comes in handy when graphs need to be computed in separate processes and on different servers. Like Pandas, TensorFlow loves teaming up with other packages. In the company of Keras, a minimalistic but extensible API perfect for effective prototyping and advanced research, building a neural network may take only a few lines of code. It can also run on multiple platforms, such as CPUs, GPUs, TPUs, or mobile devices. No wonder TensorFlow, with all its assets, is widely applied to voice recognition and object identification in pictures. Just look at how Airbnb used it to help categorize its listing photos.
- Commits: 50 845
- Releases: 79
- Contributors: 1 871
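As a rough illustration of the "few lines of code" claim above, here is a minimal sketch of a tiny classifier built with the Keras API bundled in TensorFlow 2.x. The layer sizes, the 4-feature input, and the 3 output classes are arbitrary assumptions, and the model is left untrained, so its predictions are meaningless until you call `fit` on real data.

```python
import numpy as np
import tensorflow as tf

# A tiny, untrained classifier: 4 input features, 3 output classes (both arbitrary).
model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),
    ]
)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Run a forward pass on two dummy samples; each row is a probability distribution.
dummy_batch = np.zeros((2, 4), dtype="float32")
probs = model(dummy_batch)
print(probs.shape)
```

The same code runs on a CPU or a GPU without changes; TensorFlow picks whichever device is available.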
NumPy is the professor type in this machine learning and Python fellowship. Its roots go back to Numeric, written by Jim Hugunin, and NumPy itself was later created by Travis Oliphant as Numeric's successor. It is the library most closely tied to data science tasks, as it supports multidimensional arrays and matrices and provides advanced mathematical functions. Its structures make it easier to manipulate data that in plain Python would normally sit in not-so-pleasant lists. A collection of tools enables easy work and calculations on high-performance arrays. It’s frequently compared with MATLAB, as it offers similar utilities for efficient linear algebra and matrix manipulation. And if there’s some operation in those areas that NumPy can’t handle, then it’s probably the moment to reach for the SciPy library, which contains a lot of reliable and convenient numerical routines.
- Commits: 19 813
- Releases: 151
- Contributors: 738
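To show what working with arrays instead of plain lists looks like in NumPy, here is a minimal sketch with a small matrix and vector. The numbers are arbitrary and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# An arbitrary 2x2 matrix and a vector, for illustration only.
m = np.array([[2.0, 1.0], [1.0, 3.0]])
v = np.array([3.0, 5.0])

# Element-wise arithmetic works on the whole array at once, no loops needed.
doubled = m * 2

# Matrix-vector product: [2*3 + 1*5, 1*3 + 3*5] = [11, 18].
product = m @ v

# Solve the linear system m @ solution = v.
solution = np.linalg.solve(m, v)

print(doubled)
print(product)
print(solution)
```

With nested Python lists, each of these steps would need explicit loops; `m * 2` on a list would even repeat the list instead of doubling its values.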
Theano is NumPy's best buddy with pretty much the same role: defining multidimensional arrays and evaluating optimized mathematical expressions on them. Theano was originally developed by the Montreal Institute for Learning Algorithms at the Université de Montréal in 2007, so it’s one of the pioneers in its field, deep learning. It is commonly used to build neural networks with advanced algorithms and to simplify the whole process of creating models. What’s really great about it is its ability to run efficiently on both CPU and GPU architectures.
- Commits: 28 079
- Releases: 33
- Contributors: 334
Last but not least, a slightly different item on the list, which might be considered the cherry on the cake. Matplotlib, a brainchild of John Hunter, is a marvelous library designed to deliver simple and decent 2D visualizations. It “tries to make easy things easy and hard things possible” by providing an extensive range of graphs such as line, scatter, and stem plots, histograms, error charts, spectrograms, and pie charts. With a little effort, you can also customize other elements like labels, legends, or grids. Matplotlib can work with different GUI toolkits and runs on multiple platforms.
- Commits: 28 938
- Releases: 78
- Contributors: 793
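Here is a minimal sketch of the Matplotlib features mentioned above: a line plot with labels, a legend, and a grid, next to a histogram. The data is generated on the spot and carries no meaning. The `Agg` backend is an assumption made so the example also runs on a machine without a display, and the figure is rendered into an in-memory buffer; pass a filename to `savefig` instead to write it to disk.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display

import matplotlib.pyplot as plt
import numpy as np

# Made-up data, for illustration only.
x = np.linspace(0, 2 * np.pi, 100)
samples = np.random.default_rng(0).normal(size=500)

fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot with customized labels, legend, and grid.
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.set_xlabel("x")
ax_line.set_ylabel("value")
ax_line.legend()
ax_line.grid(True)

# Histogram of the random samples.
ax_hist.hist(samples, bins=20)
ax_hist.set_title("Histogram of random samples")

fig.tight_layout()

# Render the figure as PNG into memory.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```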
Of course, those are only a few of the tools that make your machine learning work in Python nice and smooth. There’s a bunch of other libraries, frameworks, and APIs, like Google Cloud Vision, which enables powerful image analysis, or PyTorch, a relatively fresh library commonly used for deep learning apps and natural language processing (BTW, it might be even more attractive for developers working in software houses in Poland, as its co-inventor is our homie, Adam Paszke :) ).