Amazing, but little-known Python libraries

Python programmers are always looking for new libraries that can improve their work on data engineering and business intelligence projects.

This article looks at some little-known but very useful Python libraries:

1. Pendulum

Although many libraries are available in Python for dates and times, I find Pendulum easy to use for any date operation. Pendulum is my favorite library for daily use at work. It extends the built-in Python datetime module, adding a more intuitive API for managing time zones and performing date and time operations such as adding time intervals, subtracting dates, and converting between time zones. It also provides a simple API for formatting dates and times.

Installation
!pip install pendulum
Example
# import library

import pendulum
dt = pendulum.datetime(2023, 1, 31)
print(dt)
 
# local() creates a datetime instance in the local timezone

local = pendulum.local(2023, 1, 31)
print("Local Time:", local)
print("Local Time Zone:", local.timezone.name)

# Printing UTC time

utc = pendulum.now('UTC')
print("Current UTC time:", utc)
 
# Converting UTC timezone into Europe/Paris time

europe = utc.in_timezone('Europe/Paris')
print("Current time in Paris:", europe)

2. ftfy

Have you ever come across data in which foreign-language text does not display correctly? This is called Mojibake. Mojibake is a term used to describe garbled or scrambled text that occurs as a result of encoding or decoding problems. It usually occurs when text written with one character encoding is incorrectly decoded using a different encoding. The ftfy Python library will help you fix Mojibake, which is very useful in NLP use cases.
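To see how Mojibake arises, here is a small sketch using only the Python standard library. It encodes a string as UTF-8 and then wrongly decodes the bytes as Windows-1252, which is one common way the problem occurs. The sample string is my own, and the manual repair at the end only works because we already know which two encodings were involved; ftfy's job is to work that out for you.

```python
# Text as it was originally written
original = "réflexion ✔"

# Bytes are written as UTF-8 but read back as Windows-1252: Mojibake
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)

# Reversing the mistake by hand, which requires knowing both encodings
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)
```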

Installation
!pip install ftfy
Example
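# import library
import ftfy

# Each input string below is garbled text (Mojibake) for ftfy to repair
print(ftfy.fix_text('Correct the sentence using â€œftfyâ€\x9d.'))
print(ftfy.fix_text('âœ” No problems with text'))
print(ftfy.fix_text('Ã perturber la rÃ©flexion'))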

In addition to Mojibake, ftfy will fix bad encodings, bad line endings, and bad quotes. It can make sense of text that was wrongly decoded as one of the following encodings:

  • Latin-1 (ISO-8859-1)
  • Windows-1252 (cp1252 — used in Microsoft products)
  • Windows-1251 (cp1251 — the Russian version of cp1252)
  • Windows-1250 (cp1250 — the Eastern European version of cp1252)
  • ISO-8859-2 (which is not exactly the same as Windows-1250)
  • MacRoman (used on Mac OS 9 and earlier)
  • cp437 (used in MS-DOS and some versions of the Windows command prompt)

3. Sketch

Sketch is a unique AI coding assistant designed specifically for users working with the pandas library in Python. It uses machine learning algorithms to understand the context of user data and provides relevant code suggestions to make data manipulation and analysis tasks easier and more efficient. Sketch does not require users to install any additional plug-ins in their IDE, making it quick and easy to use. This can significantly reduce the time and effort required for data-related tasks and help users write better, more efficient code.

Installation
!pip install sketch
Example

Importing sketch adds a .sketch extension to pandas DataFrames, and the library is used through that extension.

.sketch.ask

ask is a feature of Sketch that allows users to ask questions about their data in natural language. It provides a text-based response to the user's query.

# Importing libraries
import sketch
import pandas as pd

# Reading the data (using Twitter data as an example)
df = pd.read_csv("tweets.csv")
print(df)

# Asking which columns are category type
df.sketch.ask("Which columns are category type?")

# To find the shape of the dataframe
df.sketch.ask("What is the shape of the dataframe")

.sketch.howto

howto is a feature that provides a block of code that can be used as a starting or ending point for various data-related tasks. We can ask for code snippets to normalize our data, create new features, plot data, and even build models. This saves time and makes it easy to copy and paste the code; you don't have to write it manually from scratch.

# Asking for a code snippet to visualize the emotions
df.sketch.howto("Visualize the emotions")

.sketch.apply

The .apply function helps generate new features, parse fields, and perform other data manipulations. To use this feature, we need an OpenAI account and an API key to perform the tasks. I haven't tried this feature.

I enjoyed using this library, especially the way it works, and I find it useful.

4. pgeocode

“pgeocode” is an excellent library that I recently stumbled upon that has been incredibly useful for my spatial analysis projects. For example, it allows you to find the distance between two postal codes and provides geographic information by taking a country and postal code as input.

Installation
!pip install pgeocode
Example

Get geographic information for specific postcodes

# Importing library
import pgeocode

# Checking for country "India"
nomi = pgeocode.Nominatim('IN')

# Getting geo information by passing the postcodes
nomi.query_postal_code(["620018", "620017", "620012"])

“pgeocode” calculates the distance between two postcodes by taking the country and postcodes as input. The result is expressed in kilometres.

# Finding the distance between two postcodes
distance = pgeocode.GeoDistance('IN')
distance.query_postal_code("620018", "620012")
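pgeocode does the lookup and the distance calculation for you. To give an idea of what a distance in kilometres between two points involves, here is a minimal sketch of the haversine (great-circle) formula in plain Python. It assumes a spherical Earth with a radius of 6371 km. The function name haversine_km and the coordinates are made up for illustration and are not part of pgeocode's API.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    earth_radius_km = 6371.0  # assumed mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Hypothetical coordinates, not real postal code centres
point_a = (10.80, 78.70)
point_b = (10.85, 78.65)
print(round(haversine_km(*point_a, *point_b), 2))
```

In practice you would let pgeocode supply the coordinates for each postal code; the sketch only shows the arithmetic behind a straight-line distance on the globe.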

5. rembg

rembg is another useful library that easily removes the background from images.

Installation
!pip install rembg
Example
# Importing libraries
from rembg import remove
import cv2

# Path of the input image (my file: image.jpeg)
input_path = 'image.jpeg'

# Path for the output image; PNG keeps the transparent background, which JPEG cannot store
output_path = 'output.png'

# Reading the input image
input_image = cv2.imread(input_path)

# Removing the background
output_image = remove(input_image)

# Saving the file
cv2.imwrite(output_path, output_image)

You may already be familiar with some of these libraries, but for me, Sketch, Pendulum, pgeocode, and ftfy are indispensable for my data engineering work. I rely on them a lot for my projects.

6. Humanize

“Humanize” provides simple, easy-to-read string formatting for numbers, dates, and times. The goal of the library is to take the data and make it more user-friendly, for example by converting a number of seconds into a more readable string like "2 minutes ago". The library can format data in a variety of ways, including formatting numbers with commas, converting timestamps to relative times, and more.

I often use it for integers and timestamps in my data engineering projects.

Installation
!pip install humanize
Example (Integers)
# Importing library
import humanize
import datetime as dt

# Formatting numbers with commas
a = humanize.intcomma(951009)

# Converting large numbers into words
b = humanize.intword(10046328394)

# Printing
print(a)
print(b)
Example (date and time)
import humanize
import datetime as dt

a = humanize.naturaldate(dt.date(2012, 6, 5))
b = humanize.naturalday(dt.date(2012, 6, 5))

print(a)
print(b)

Ercole Palmeri

Tags: python
