Planet Python

Last update: May 17, 2021 09:40 PM UTC

May 17, 2021

PyCharm

Thank You for Supporting the PyCharm and DSF Campaign

This April we joined forces with the Django Software Foundation to get as much support as possible from our audience for Django.

In one month we managed to reach out to thousands of Pythonistas who helped us with this fundraiser campaign promotion. Thanks to your active participation, shares, reposts, and purchases, we managed to raise $45K for the DSF.

This represents roughly 20% of the DSF’s overall budget, which goes directly towards funding the continued development and support of Django via the Django Fellowship program and Django conferences worldwide.

Apart from the fundraiser itself, we created educational materials for those interested in Django. Take a look at:

This episode of our Early Access PyCharm podcast about Django support in PyCharm. It will help you understand how exactly you can benefit from PyCharm while developing in Django.
This extensive tutorial on developing Django apps on AWS.
This webinar on tips and tricks for Django with PyCharm by Paul Everitt.

If you would like to contribute to the Django community, the fundraiser is still ongoing and you can donate directly at djangoproject.com.

Thanks again to all contributors for your support!

May 17, 2021 07:41 PM UTC

Paolo Amoroso

Python with Replit: A Journey in the Cloud

Can I use only Replit for all my Python development? It’s what I set out to find.

Follow along my journey to coding in Python on Chrome OS only with the tools and resources of Replit. I want to learn to live off the land in Replit; to develop, test, check into version control, run, document, deploy, and host Python code with Replit. I’ll share my experiences in Python with Replit, a blog post series documenting my ongoing efforts.

Python REPL in Replit on an ASUS Chromebox 3

A Python REPL in Replit on my ASUS Chromebox 3.

This is not a philosophical quest for cloud purity or a “use only brand X for 30 days” blog challenge. It’s rather the realization of how much my tools shape the way I work. When in Chrome, do as the chromies do.

I want Replit to be my main Python environment, figure out how to work around its limitations, and push the boundaries of what it can do.

I’m a hobby programmer and a Python beginner, not a professional developer. These constraints define the journey and frame my setup and tooling decisions.

Why use Replit on Chrome OS

Why do I want to use Replit on Chrome OS, anyway?

Because Replit is the platform that best matches my browser-based desktop setup and workflow. Replit is a development and collaboration environment fully in the cloud with dozens of programming languages and frameworks, including Python that’s among the best supported.

Understanding my motivations requires a step back. In 2015 I switched to Chrome OS as my only desktop operating system. Doing everything in a browser became second nature and shaped how I work on the desktop. Which means apps and resources outside of the browser, such as traditional desktop IDEs, create friction and impose tradeoffs.

Although Chrome OS seems “just a browser” (which is why it’s eating the world), the ability to run Android and Linux apps is also backed in. There are amazing astronomy Android apps I use on my Chrome OS devices, as well as great astronomy tools for Linux. What about Python IDEs for Android and Linux?

The only usable Android Python IDE, although an engineering marvel, has incompatibilities and limitations. Its major drawback is the IDE and Python code run and live only on a single device. When you’re used to accessing all your apps and data from the cloud, anything less is limiting.

On Crostini, the Chrome OS Linux container system, there's an ample choice of Python development tools such as PyCharm and VS Code. But these great environments don’t fit well into my browser-based cloud workflow. Again, the IDE and Python code on a device run and live there. Installation, maintenance, and data backups (Chrome OS can back up a full Linux image and is not granular enough) are additional burdens desktop IDEs require. Plus, traditional IDEs are overkill for my Python skills and where I’m aiming.

Why Replit is a good Python environment

Enter Replit.

Replit runs fully in the cloud. Firing it up is as simple as visiting a website from any desktop or mobile device with a browser and an Internet connection. And, most importantly for me, a web app like Replit is a first-class citizen on Chrome OS.

Replit has not always been that good at Python though.

Early versions of the platform offered basic features that were best for creating console programs. But Replit looked promising and already had a significant benefit for Python development. A REPL is a virtual environment in disguise: you don’t need to explicitly use virtual environment tools like venv, either from a shell or an IDE.

In early 2020, the Replit team must have strapped a rocket on the platform because they began cranking out new features and performance improvements on an almost weekly basis. These enhancements had a major positive impact also on Python. For example, now the default Python REPL can run most graphical apps out of the box, whereas earlier you needed specialized REPLs with a GUI framework or library built in such as Tkinter.

The new Replit helped me develop Spacestills, my first Python project. I’m continuing to experiment with Replit and work on new Python code. I can’t wait to share my experiences with you in the Python with Replit series.

May 17, 2021 03:04 PM UTC

Real Python

Embedded Python: Build a Game on the BBC micro:bit

Writing code that runs in the terminal or in your web browser is good fun. Writing code that affects the real world, however, can be satisfying on a whole other level. Writing this sort of code is called embedded development, and Python is making it more accessible than ever!

In this tutorial, you’ll learn:

What embedded development is and why you would use Python to do it
What your hardware and software options are for running Python on an embedded system
When Python is a good fit for an embedded system and when it’s not
How to write a basic game on the BBC micro:bit with MicroPython

This tutorial contains code snippets that allow you to build a simple game on the BBC micro:bit. To access the full code and get a sneak preview on what you’ll be building, click the link below:

Get Sample Code: Click here to get the sample code you’ll use to learn about embedded development with Python in this tutorial.

What Is Embedded Development?

Embedded development is writing code for any device that isn’t a general-purpose computer. This definition is a little bit ambiguous, so some examples might help:

General-purpose computers include laptops, desktop PCs, smartphones, and so on.
Embedded systems include washing machines, digital machines, robots, and so on.

As a general rule of thumb, if you wouldn’t call something a computer, but it still has code running on it, then it’s probably an embedded system. The name comes from the idea of embedding a computer into a physical system to perform some task.

Embedded systems tend to be designed to do a single task, which is why we refer to regular computers as “general purpose”: they are designed to do more than one task.

In the same way that you need a computer to run regular code, to run embedded code, you need some kind of hardware. These pieces of hardware are usually referred to as development boards, and this tutorial will introduce you to a few designed to run Python.

Python for Embedded Development

One of the best things about learning Python is that it’s applicable in so many places. You can write code that runs anywhere, even on embedded systems. In this section, you’ll learn about the trade-offs that come with using Python for your embedded project and some things to be aware of when starting out.

Benefits of Using Python

The core benefit that Python brings when building an embedded system is development speed. Python has libraries available for most tasks, and this still mostly holds true for its embedded implementations. You can focus on building your system since many of the problems you’d encounter have been solved already.

Since Python is higher level than other common embedded languages, the code you’ll write will be more concise. This helps development speed, meaning you’ll write code faster, but it also helps keep your code understandable.

Python is memory managed. C++, a common choice for embedded development, is not. In C++, you are responsible for freeing up memory when you’re done with it, something that is very easy to forget, leading to your program running out of memory. Python does this for you.

Disadvantages of Using Python

While Python’s memory management is a big help, it does incur a minor speed and memory cost. The MicroPython docs have a good discussion on memory issues.

Another thing to consider is that the Python interpreter itself takes up space. With a compiled language, the size of your program depends just on your program, but Python programs need the interpreter that runs them. The Python interpreter also takes up RAM. On the micro:bit, you can’t write Bluetooth code with Python since there’s not enough room for Python and Bluetooth at the same time.

Since Python is interpreted, it can never be quite as fast as a compiled language. An interpreted language needs to decode each instruction before running it, but a compiled language can just run. In practice, though, this rarely matters as Python programs still run fast enough for most use cases.

Things to Watch Out for When New to Embedded Development

Modern computers have lots of memory to work with. When you’re programming them, you don’t have to worry too much about the size of lists you create or loading a whole file at once. Embedded systems, however, have limited memory. You have to be careful when writing your programs not to have too many things in memory at once.

Similarly, processor speeds on embedded systems are much slower than on desktop computers. The processor speed determines how quickly your code gets executed, so running a program on an embedded computer will take longer than running it on a desktop computer. It’s more important to think about the efficiency of embedded code—you don’t want it to take forever to run!

Perhaps the biggest change when programming embedded systems is power requirements. Laptops, phones and desktop computers either plug into the wall or have large batteries. Embedded systems often have tiny batteries and have to last for a really long time, sometimes even years. Every line of code that you run costs a little bit of battery life, and it all adds up.

Read the full article at https://realpython.com/embedded-python/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

May 17, 2021 02:00 PM UTC

EuroPython

EuroPython 2021: Talk Voting is Open

Talk voting is your chance to tell us what you’d like to see at EuroPython 2021. We will leave talk voting open until:

Sunday, May 23, 23:59:59 CEST

In order to vote, please log in to the website and navigate to the talk voting page:

https://ep2021.europython.eu/events/talk-voting/

Who can participate?

Any registered attendee of EuroPython 2021 with a paid ticket, as well as attendees of one of the past EuroPython conferences (going back to 2015) can vote. If you have submitted a proposal this year, you are also eligible to vote.

If you have not attended EuroPython before, but want to participate in talk voting, you have to buy a ticket before you can vote.

How talk voting works

The talk voting interface lists all submitted proposals, including talks, helpdesks and posters. You can then vote on the proposal you'd like to see at the event.

Details on the voting process are described on our talk voting page.

Talk Selection

After the talk voting phase, the EuroPython Program Workgroup (WG) will use the votes to select the talks and build a schedule.

The talk voting is a good and strong indicator of what attendees are interested to see. Submissions are also selected based on editorial criteria, for example, to increase diversity, give a chance to less mainstream topics and make sure that topics don't overlap too much.

In general, the Program WG will try to give as many speakers a chance to talk as possible. If speakers have submitted multiple talks, the one with the highest rating will most likely get selected.

Enjoy,
EuroPython 2021 Team
EuroPython Society
EuroPython 2021 Website

May 17, 2021 10:53 AM UTC

Paolo Amoroso

A NASA TV Still Frame Viewer in Python

I wrote Spacestills, a Python program for viewing NASA TV still frames.

The main window of Spacestills running on Replit.

As a hobbyist wishing to improve my Python programming skills, for some time I’ve wanted to work on learning projects more substantial than code snippets, throwaway tools, or short scripts.

Spacestillschecks several boxes. The problem domain is one of my primary interests, space exploration. At about 350 lines of code, it’s a non-trivial system with a GUI. It accesses the network to download data from the web. Finally, the program relies on a few Python libraries.

About the program

Spacestills periodically downloads NASA TV still frames from a web feed and displays them in a GUI.

The program allows to correct the aspect ratio of the frames and save them in PNG format. It downloads the latest frame automatically and gives the option to reload manually, disable the automatic reload, or change the download frequency.

As a learning exercise, Spacestillsis a basic program with minimal features.

However, it does something useful: capturing and saving images of space events NASA TV covers. To complement the commentary and discussion, space enthusiasts often live blog events by sharing to social networks or forums the screenshots they manually take from NASA TV. Spacestillsspares the effort of using screen capture tools and saves image files that are ready for sharing.

Visit the Spacestills project site for the full source code as well as the installation and usage instructions. You can run Spacestills online on Replit.

Development environment

I developed Spacestills with Replit. Replit is a development, deployment, and collaboration environment in the cloud that supports dozens of programming languages and frameworks, including Python. As a Chrome OS and cloud enthusiast, I love Replit because it works fully in the browser and there’s nothing to download or install.

The full workspace where I developed and maintain Spacestills is available on Replit where you can also run the program online.

On my ASUS Chromebox 3, which I use for working on Spacestills with Replit, I could have run in a Crostini Linux container a traditional Python IDE such as PyCharm. But Replit is perfect for my needs, is not overkill, starts up fast, and demands fewer resources.

Resources and dependencies

Spacestills relies on a number of resources and Python libraries.

This is by design. I wanted a programming project simple enough to complete at my learning stage, but complex enough to pull together network data and a few Python libraries.

NASA TV feed

The Kennedy Space Center website has a page with a selection of NASA video feeds, including the NASA TV Public Channel. The feeds show the latest still frames and are updated automatically.

Each feed comes with frames in three sizes and Spacestills relies on the largest NASA TV feed featuring 704x408 pixel frames. The maximum update frequency is once every 45 seconds. Therefore, retrieving the latest still frame is as simple as downloading a JPEG image from the feed’s URL.

The raw images are stretched vertically and look odd. So the program can correct the aspect ratio by squeezing the images and producing an undistorted 16:9 version.

Python

Although Spacestills may work with earlier versions, I recommend using Python 3.6. The program doesn't use any language features specific to version 3.6 or later, but PySimpleGUI will eventually require 3.6.

Libraries

Spacestills depends on these Python libraries:

Pillow: image processing
PySimpleGUI: GUI framework (Spacestills uses the Tkinter backend)
Requests: HTTP requests

Design

The program displays one image at a time, a still frame from the NASA TV feed.

I decided to store the image in memory for simplicity and performance. This does away with the complexity of keeping track of temporary files. Memory storage is an acceptable tradeoff because the single image is small and doesn’t raise resource usage concerns.

I could simplify the logic by having the program unconditionally download, resize, and display a new still frame from the feed, even when the user changes the aspect ratio.

Aside from performance issues, this may create inconsistencies. What if the user wanted to resize the currently displayed image when a new one was available in the feed? Downloading unconditionally would discard the image the user still wanted. Instead, Spacestills caches the original image of the current frame and resizes a copy.

The cache solves another problem. Cycling several times between making the same bitmap larger and smaller degrades quality and introduces artifacts, so Spacestills fetches the cached original whenever it needs to change the size.

Image representation and storage

The PySimpleGUI Image user interface element is a natural choice for visualizing a still frame.

Since PySimpleGUI’s Tkinter backend accepts few image formats as input to the PySimpleGUI Image element, so there are constraints on the format for storing the image downloaded from NASA TV’s feed. Since PySimpleGUI.Image doesn’t support JPEG Spacestills converts the downloaded image to PNG which PySimpleGUI.Image accepts.

The program stores the image data in a Pillow PIL.Image object. The library has convenient methods for creating images from files, resizing images, and saving them to files.

The StillFrame class

StillFrame is a Spacestills class that holds a still frame downloaded from the NASA TV feed.

The StillFrame.image attribute stores the image as a PIL.Image instance in PNG format and all the StillFrame methods maintain the format invariant, converting to PNG if necessary. The cached original image, again a PIL.Image instance, goes in the StillFrame.original_image attribute.

The class has methods for returning the raw image data PySimpleGUI.Image needs, calculating the aspect ratio, resizing the image, and converting to PNG.

The decision to keep the image in memory brings the side benefit of simplifying some methods, for example StillFrame.bytes() that returns the raw bytes of a frame image. The implementation relies on Python’s BytesIO binary streams and is straightforward:

def bytes(self):

file = BytesIO()

self.image.save(file, 'png')

file.seek(0)

return file.read()

Recall that to maintain the format invariant, the image needs to be converted to PNG. So, after creating a BytesIO stream, the method saves to it the PIL.Image in self.image, rewinds the stream, reads it back, and returns the bytes.

The StillFrame.topng() method for converting the image to PNG was originally private. Since Spacestills has no APIs or clients, private methods and attributes would probably over-engineer the program at this early stage. If I decide to re-use some of the code, for example, to employ a different GUI or visualization frontend, I may provide better encapsulation.

Why use a class at all? Isn’t it overkill for managing a cache and not much else?

I experimented with holding a still frame in a PIL.Image and using Python’s ability to dynamically create a new attribute for the cache without changing the original class. This way the image processing methods became ordinary functions, taking the image as an argument.

However, I didn’t see a clear advantage of one solution over the other and went with the class-based one. This leaves the door open to storing additional state, for example metadata such as the download time.

Still frame download and processing

Function download_image() is at the core of the program as it downloads an image from the feed and returns a PIL.Image instance. The function creates a PIL.Image that reads from a BytesIO stream which, in turn, reads the response content an HTTP request to the feed URL returns.

In case of network errors or other issues, download_image() returns a blank image with a blue background. This way no empty area is left in the GUI where the image is supposed to be. Function make_blank_image() takes care of creating and returning the blank image.

To refresh the view with the latest still frame, either automatically or manually by the user, the code calls function refresh(). The function has to be called with an argument indicating whether to resize the still frame if the aspect ratio correction option is active. refresh() can update the PySimpleGUI.Image element that displays the still frame with just one line of code:

window['-IMAGE-'].update(data=still.bytes())

The code retrieves the element identified by the key -IMAGE- from the window dictionary, then updates its data parameter with the raw bytes of the still frame still, which holds the current still frame. The data parameter of the PySimpleGUI.Image element accepts the bytes of an image.

Function change_aspect_ratio() does the actual resizing of the still frame and updates the PySimplegUI.Image element with similar code.

To save the still frame, the program calls the straightforward function save(). It takes as arguments a StillFrame instance and a filename returned by the PySimpleGUI popup_get_file() function, which presents to the user a file save dialog.

The image download loop

Spacestills runs a loop that downloads the still frames from the feed and displays them, unless the user turns off the automatic update option. The user can also change the frame update frequency or manually reload the frame.

The download loop is implemented in part as a clause of the program’s event loop, a feature of the PySimpleGUI GUI framework. The download loop relies also on the next_timeout() function, to calculate the time when to download the next frame, and the timeout_due() predicate to check whether the next download is due.

The event loop

The program’s main() function mostly contains PySimpleGUI boilerplate code to create a window from a layout and run the event loop.

The loop branches on the event and values read from the program window, updates some state, and calls the functions to perform the requested actions. There’s a clause for every user interface element plus a check of whether the next automatic image reload is due.

GUI layout

I experimented with several GUI layouts, and none were completely satisfactory.

Aside from the natural grouping of the auto-reload settings, I found no good way of further organizing the user interface elements into groups or hierarchies that would make the program more intuitive. In the end I went with a layout that arranges the elements in two rows, the first one containing all the elements except for the auto-reload settings that go on the second row.

Open issues

Spacestills is my largest Python project so far, and I’m mostly satisfied. It has all the features I planned, works well, and the code is moderately Pythonic.

However, the system has a few issues and ways it didn’t turn out as expected.

Overall, Spacestills feels like a beginner project, which is not necessarily a bad thing at my learning stage. A first hint is some identifiers don’t have consistent names and others aren’t descriptive enough.

I also feel the design leaves something to be desired. I’m not sure I picked the right abstractions, functions, and classes to implement the logic. This is likely a challenge beginners face when scaling their code to larger systems.

The event loop seems long. But wrapping its functionality in several small functions for the sake of brevity wouldn’t probably help much. Although not encapsulated, the current code is straightforward, makes the logic clear, and improves readability.

Speaking of design and abstractions, the way refresh() and other functions depend on the layout of the GUI via hard-coded window keys seems off. It’s in part a consequence of the way the domain logic interleaves with framework code. I can’t think of a better way of decoupling the references to user interface elements from the domain logic.

The download loop is based on polling. Comparing times in a tight loop feels suboptimal, and I do not know the impact on performance. But the program seems responsive enough.

There’s a minor issue I’m not sure is a bug of my program, of PySimpleGUI, or my misunderstanding of the framework. Typing a filename in the save dialog without going through the browse dialog doesn't add the default .png extension.

Finally, I added comprehensive docstrings and some comments, as well as a README file. Is all this documentation overkill for such a small program? I don’t know, but I wanted to practice with thoroughly documenting a system.

Although I’m aware of these issues and problems, I’m unable to figure better ways of addressing them. This is likely another consequence of my early stage of learning. Hey, it’s progress!

Possible improvements

Spacestills has room for improvement beyond fixing the open issues.

A further step may be to break down the code from a monolithic, single-file program into modules implementing different parts of the system. This would probably simplify the tests, which are currently missing altogether.

Modularizing the program would bring another benefit. Once the domain logic is sufficiently decoupled from the presentation, it shouldn’t be difficult to provide an alternate user interface such as a web frontend.

Spacestills is a step of my journey to using Python with Replit in the cloud.

May 17, 2021 09:27 AM UTC

Python Pool

6 Ways to Read a CSV file with Numpy in Python

Welcome to another module of numpy. In our previous module, we had got insights on numpy in python. But the task becomes difficult while dealing with files or CSV files in python as there are a humongous amount of data in a file. To make this task easier, we will have to deal with the numpy module in python. If you have not studied numpy, then I would recommend studying my previous tutorial to understand numpy.

Introduction

One of the difficult tasks is when working with data and loading data properly. The most common way the data is formatted is CSV. You might wonder if there is a direct way to import the contents of a CSV file into a record array much in the way that we do in R programming?

Why CSV file format is used?

CSV is a plain-text file that makes it easier for data manipulation and is easier to import onto a spreadsheet or database. For example, You might want to export the data of certain statistics to a CSV file and then import it to the spreadsheet for further data analysis. It makes users working experience very easy programmatically in python. Python supports a text file or string manipulation with CSV files directly.

Ways to load CSV file in python

There are many ways to load a CSV file in python. The three common approaches in Python are the following: –

Load CSV using numpy.
Using Standard Library function.
Load CSV using pandas.
Using PySpark.

Out of all the three today, we will discuss only how to read a CSV file using numpy. Moving ahead, let’s see how Python natively uses CSV.

Reading of a CSV file with numpy in python

As mentioned earlier, numpy is used by data scientists and machine learning engineers extensively because they have to work with a lot with the data that are generally stored in CSV files. Somehow numpy in python makes it a lot easier for the data scientist to work with CSV files. The two ways to read a CSV file using numpy in python are:-

Without using any library.
numpy.loadtxt() function
Using numpy.genfromtxt() function
Using the CSV module.
Use a Pandas dataframe.
Using PySpark.

1.Without using any built-in library

Sounds unreal, right! But with the help of python, we can achieve anything. There is a built-in function provided by python called ‘open’ through which we can read any CSV file. The open built-in function copies everything that is there is a CSV file in string format. Let us go to the syntax part to get it more clear.

Syntax:-

open('File_name')

Parameter

All we need to do is pass the file name as a parameter in the open built in function.

Return value

It returns the content of the file in string format.

Let’s do some coding.

file_data = open('sample.csv')
for row in file_data:
    print(row)

OUTPUT:-

Name,Hire Date,Salary,Sick Days Left 
Graham Bell,03/15/19,50000.00,10 
John Cleese,06/01/18,65000.00,8 
Kimmi Chandel,05/12/20,45000.00,10 
Terry Jones,11/01/13,70000.00,3 
Terry Gilliam,08/12/20,48000.00,7 
Michael Palin,05/23/20,66000.00,8

2. Using numpy.loadtxt() function

It is used to load text file data in python. numpy.loadtxt( ) is similar to the function numpy.genfromtxt( ) when no data is missing.

Syntax:

numpy.loadtxt(fname)

The default data type(dtype) parameter for numpy.loadtxt( ) is float.

import numpy as np
data = np.loadtxt("sample.csv", dtype=int)
print(data)# Text file data converted to integer data type

OUTPUT:-

[[1. 2. 3.]  [4. 5. 6.]]

Explanation of the code

Imported numpy library having alias name as np.
Loading the CSV file and converting the file data into integer data type by using dtype.
Print the data variable to get the desired output.

3. Using numpy.genfromtxt() function

The genfromtxt() function is used quite frequently to load data from text files in python. We can read data from CSV files using this function and store it into a numpy array. This function has many arguments available, making it a lot easier to load the data in the desired format. We can specify the delimiter, deal with missing values, delete specified characters, and specify the datatype of data using the different arguments of this function.

Lets do some code to get the concept more clear.

Syntax:

numpy.genfromtxt(fname)

Parameter

The parameter is usually the CSV file name that you want to read. Other than that, we can specify delimiter, names, etc. The other optional parameters are the following:

Name	Description
fname	file, file name, list to read.
dtype	The data type of the resulting array. If none, then the data type will be determined by the content of each column.
comments	All characters occurring on a line after a comment are discarded.
delimiter	The string is used to separate values. By default, any whitespace occurring consecutively acts as a delimiter.
skip_header	The number of lines to skip at the beginning of a file.
skip_footer	The number of lines to skip at the end of a file.
missing_values	The set of strings corresponding to missing data.
filling_values	A set of values that should be used when some data is missing.
usecols	The columns that should be read. It begins with 0 first. For example, usecols = (1,4,5) will extract the 2nd,5th and 6th columns.

Description of the paramters

Return Value

It returns ndarray.

from numpy import genfromtxt
data = genfromtxt('sample.csv', delimiter=',', skip_header = 1)
print(data)

OUTPUT:

[[1. 2. 3.]  [4. 5. 6.]]

Explanation of the code

From the package, numpy imported genfromtxt.
Stored the data into the variable data that will return the ndarray bypassing the file name, delimiter, and skip_header as the parameter.
Print the variable to get the output.

4. Using CSV module in python

The CSV the module is used to read and write data to CSV files more efficiently in Python. This method will read the data from a CSV file using this module and store it into a list. Then it will further proceed to convert this list to a numpy array in python.

The code below will explain this.

import csv
import numpy as np

with open('sample.csv', 'r') as f:
    data = list(csv.reader(f, delimiter=";"))

data = np.array(data)
print(data)

OUTPUT:-

[[1. 2. 3.]  [4. 5. 6.]]

Explanation of the code

Imported the CSV module.
Imported numpy as we want to use the numpy.array feature in python.
Loading the file sample.csv in reading mode as we have mention ‘r.’
After separating the value using a delimiter, we store the data into an array form using numpy.array
Print the data to get the desired output.

5. Use a Pandas dataframe in python

We can use a dataframe of pandas to read CSV data into an array in python. We can do this by using the value() function. For this, we will have to read the dataframe and then convert it into a numpy array by using the value() function from the pandas’ library.

from pandas import read_csv
df = read_csv('sample.csv')
data = df.values
print(data)

OUTPUT:-

[[1 2 3]  [4 5 6]]

To show some of the power of pandas CSV capabilities, I’ve created a slightly more complicated file to read, called hrdataset.csv. It contains data on company employees:

hrdataset CSV file

Name,Hire Date,Salary,Sick Days Left 
Graham Bell,03/15/19,50000.00,10 
John Cleese,06/01/18,65000.00,8 
Kimmi Chandel,05/12/20,45000.00,10 
Terry Jones,11/01/13,70000.00,3 
Terry Gilliam,08/12/20,48000.00,7 
Michael Palin,05/23/20,66000.00,8

import pandas
dataframe = pandas.read_csv('hrdataset.csv')
print(dataFrame)

OUTPUT:-

         Name      Hire Date   Salary   Sick Days Left 
0   Graham Bell    03/15/19    50000.0          10 
1   John Cleese    06/01/18    65000.0           8 
2   Kimmi Chandel  05/12/20    45000.0          10 
3   Terry Jones    11/01/13    70000.0           3 
4   Terry Gilliam  08/12/20    48000.0           7 
5   Michael Palin  05/23/20    66000.0           8

6. Using PySpark in Python

Reading and writing data in Spark in python is an important task. More often than not, it is the outset for any form of Big data processing. For example, there are different ways to read a CSV file using pyspark in python if you want to know the core syntax for reading data before moving on to the specifics.

Syntax:-

spark.format("...").option(“key”, “value”).schema(…).load()

Parameters

DataFrameReader is the foundation for reading data in Spark, it can be accessed via spark.read attribute.

format — specifies the file format as in CSV, JSON, parquet, or TSV. The default is parquet.
option — a set of key-value configurations. It specifies how to read data.
schema — It is an optional one that is used to specify if you would like to infer the schema from the database.

3 ways to read a CSV file using PySpark in python.

df = spark.read.format(“CSV”).option(“header”, “True”).load(filepath).

2. df = spark.read.format(“CSV”).option(“inferSchema”, “True”).load(filepath).

3. df = spark.read.format(“CSV”).schema(csvSchema).load(filepath).

Lets do some coding to understand.

diamonds = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv")

OUTPUT:-

3 ways to read a CSV file using PySpark in python.

diamonds

Conclusion

This article has covered the different ways to read data from a CSV file using the numpy module. This brings us to the end of our article, “How to read CSV File in Python using numpy.” I hope you are clear with all the concepts related to CSV, how to read, and the different parameters used. If you understand the basics of reading CSV files, you won’t ever be caught flat-footed when dealing with importing data.

Make sure you practice as much as possible and gain more experience.

Got a question for us? Please mention it in the comments section of this “6 ways to read CSV File with numpy in Python” article, and we will get back to you as soon as possible.

FAQs

How do I skip the first line of a CSV file in python?

Ans:- Use csv.reader() and next() if you are not using any library. Lets code to understand.

Let us consider the following sample.csv file to understand.

sample.csv

fruit,count 
apple,1 
banana,2

file = open('sample.csv')
csv_reader = csv.reader(file)
next(csv_reader)

for row in csv_reader:
    print(row)

OUTPUT:-

['apple', '1'] 
['banana', '2']

As you can see the first line which had fruit, count is eliminated.

2. How do I count the number of rows in a csv file?

Ans:- Use len() and list() on a csv reader to count the number of lines.

lets go to this sample.csv data

1,2,3
4,5,6
7,8,9

file_data = open("sample.csv")
reader = csv.reader(file_data)
Count_lines= len(list(reader))
print(Count_lines)

OUTPUT:-

As you can see from the sample.csv file that there were three rows that got displayed with the help of the len() function.

The post 6 Ways to Read a CSV file with Numpy in Python appeared first on Python Pool.

May 17, 2021 07:56 AM UTC

[Solved] Valueerror: Too Many Values to Unpack (Expected 2)

Errors are illegal operations or mistakes. As a result, a program behaves unexpectedly. In python, there are three types of errors – Syntax errors, logic errors, and exceptions. Valuerror is one such error. Python raises valueerror when a function receives an argument that has a correct type but an invalid value. ‘Python valueerror: too many values to unpack (expected 2)’ occurs when you are trying to access too many values from an iterator than expected.

Introduction to Valueerror: too many values to unpack (expected 2)

Functions in python have the ability to return multiple variables. These multiple values returned by functions can be stored in other variables. ‘Python valueerror: too many values to unpack (expected 2)’ occurs when more objects have to be assigned variables than the number of variables or there aren’t enough objects present to assign to the variables.

What is Unpacking?

Unpacking in python refers to the operation where an iterable of values must be assigned to a tuple or list of variables. The three ways for unpacking are:

1. Unpacking using tuple and list:

When we write multiple variables on the left-hand side of the assignment operator separated by commas and tuple or list on the right-hand side, each tuple/list value will be assigned to the variables left-hand side.

Example:

x,y,z = [10,20,30]
print(x)
print(y)
print(z)

Output is:

10
20
30

2. Unpacking using underscore:

Any unnecessary and unrequired values will be assigned to underscore.

Example:

x,y,_ = [10,20,30]
print(x)
print(y)
print(_)

Output is:

10
20
30

3. **Unpacking using asterisk(*):**

When the number of variables is less than the number of elements, we add the elements together as a list to the variable with an asterisk.

Example:

x,y,*z = [10,20,30,40,50]
print(x)
print(y)
print(z)

Output is:

10
20
[30, 40, 50]

We unpack using an asterisk in the case when we have a function receiving multiple arguments. And we want to call this function by passing a list containing the argument values.

def my_func(x,y,z):
    print(x,y,z)
 
my_list = [10,20,30]
 
my_func(my_list)#This throws error

The above code shall throw an error because it will consider the list ‘my_list’ as a single argument. Thus, the error thrown will be:

      4 my_list = [10,20,30]
      5 
----> 6 my_func(my_list)#This throws error

TypeError: my_func() missing 2 required positional arguments: 'y' and 'z'

To resolve the error, we shall pass my_list by unpacking with an asterisk.

def my_func(x,y,z):
    print(x,y,z)
 
my_list = [10,20,30]
 
my_func(*my_list)

Now, it shall print the output.

10 20 30

What exactly do we mean by Valueerror: too many values to unpack (expected 2)?

The error message indicates that too many values are being unpacked from a value. The above error message is displayed in either of the below two cases:

While unpacking every item from a list, not every item was assigned a variable.
While iterating over a dictionary, its keys and values are unpacked separately.

Also, Read | [Solved] ValueError: Setting an Array Element With A Sequence Easily

Valueerror: too many values to unpack (expected 2) while working with dictionaries

In python, a dictionary is a set of unordered items. Each dictionary is stored as a key-value pair. Lets us consider a dictionary here named college_data. It consists of three keys: name, age, and grade. Each key has a respective value stored against it. The values are written on the right-hand side of the colon(:),

college_data = {
      'name' : "Harry",
      'age' : 21,
      'grade' : 'A',
}

Now, to print keys and values separately, we shall try the below code snippet. Here we are trying to iterate over the dictionary values using a for loop. We want to print each key-value pair separately. Let us try to run the below code snippet.

for keys, values in college_data:
  print(keys,values)

It will throw a ‘Valueerror: too many values to unpack (expected 2)’ error on running the above code.

----> 1 for keys, values in college_data:
      2   print(keys)
      3   print(values)

ValueError: too many values to unpack (expected 2)

This happens because it does not consider keys and values are separate entities in our ‘college_data’ dictionary.

For solving this error, we use the items() function. What do items() function do? It simply returns a view object. The view object contains the key-value pairs of the college_data dictionary as tuples in a list.

for keys, values in college_data.items():
  print(keys,values)

Now, it shall now display the below output. It is printing the key and value pair.

name Harry
age 21
grade A

Valueerror: too many values to unpack (expected 2) while unpacking a list to a variable

Another example where ‘Valueerror: too many values to unpack (expected 2)’ is given below.

Example: Lets us consider a list of length four. But, there are only two variables on the left hand of the assignment operator. So, it is likely that it would show an error.

var1,var2=['Learning', 'Python', 'is', 'fun!']

The error thrown is given below.

ValueError                            Traceback (most recent call last)
<ipython-input-9-cd6a92ddaaed> in <module>()
----> 1 var1,var2=['Learning', 'Python', 'is', 'fun!']

ValueError: too many values to unpack (expected 2)

While unpacking a list into variables, the number of variables you want to unpack must be equal to the number of items in the list.

The problem can be avoided by checking the number of elements in the list and have the exact number of variables on the left-hand side. You can also unpack using an asterisk(*). Doing so will store multiple values in a single variable in the form of a list.

var1,var2, *temp=['Learning', 'Python', 'is', 'fun!']

In the above code, var1 and var2 are variables, and the temp variable is where we shall be unpacking using an asterisk. This will assign the first two strings to var1 and var2, respectively, and the rest of the elements would be stored in the temp variable as a list.

print(var1, var2, temp)

The output is:

Learning Python ['is', 'fun!']

Valueerror: too many values to unpack (expected 2) while using functions

Another example where Valueerror: too many values to unpack (expected 2) is thrown is calling functions.

Let us consider the python input() function. Input() function reads the input given by the user, converts it into a string, and assigns the value to the given variable.

Suppose if we want to input the full name of a user. The full name shall consist of first name and last name. The code for that would be:

fname, lname = input('Enter Name:')

It would list it as a valueerror because it expects two values, but whatever you would give as input, it would consider it a single string.

Enter Name:Harry Potter
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-4d08b7961d29> in <module>()
----> 1 fname, lname = input('Enter Name:')

ValueError: too many values to unpack (expected 2)

So, to solve the valuerror, we can use the split() function here. The split() method in python is returning a list of substrings from a given string. The substring is created based on the delimiter mentioned: a space(‘ ‘) by default. So, here, we have to split a string containing two subparts.

fname, lname = input('Enter Name:').split()

Now, it will not throw an error.

Summarizing the solutions:

Match the numbers of variables with the list elements
Use a loop to iterate over the elements one at a time
While separating key and value pairs in a dictionary, use items()
Store multiple values while splitting into a list instead or separate variables

FAQ’s

Q. Difference between TypeError and ValueError.

A. TypeError occurs when the type of the value passed is different from what was expecting. E.g., When a function was expecting an integer argument, but a list was passed. Whereas, ValueError occurs when the value mentioned is different than what was expected. E.g., While unpacking a list, the number of variables is less than the length of the list.

Q. What is valueerror: too many values to unpack (expected 2) for a tuple?

A. The above valuerror occurs while working with a tuple for the same reasons while working with a list. When the number of elements in the tuple is less than or greater than the number of variables for unpacking, the above error occurs.

Happy Learning!

The post [Solved] Valueerror: Too Many Values to Unpack (Expected 2) appeared first on Python Pool.

May 17, 2021 05:19 AM UTC

Mike Driscoll

PyDev of the Week: Tim Arnold

This week we welcome Tim Arnold (@jtimarnold) as our PyDev of the Week! Tim co-authored Black Hat Python, 2nd Edition: Python Programming for Hackers and Pentesters. If you’re interested in hacking or pentesting, you might want to check out that book!

You can see what else Tim is up to on his website / blog or by checking out his GitHub profile.

Tim Arnold

Let’s spend some time getting to know Tim better!

Can you tell us a little about yourself (hobbies, education, etc): for No Starch.

I really, really, like to read just about anything. Besides that, I like hacking, lock picking, and to get into the woods which has been especially important during the pandemic.

Also, I’m interested in photography, finding new work by others and practicing it myself.

Why did you start using Python?

A long time ago, I took over a project at my job that was all Python, so I had no choice.

But it didn’t take long for me to fall in love with its simplicity and expressiveness.

What other programming languages do you know and which is your favorite?

Definitely, Python is my favorite language but I also use Javascript for some node-based projects. And (this is off the wall), I really love LaTeX, a typesetting macro language.

What projects are you working on now?

My day job is building and maintaining a publishing system from end-to-end. It starts with technical source content marked up in LaTeX and ends with documentation in PDF, XML, and HTML. I created the system completely from open source software, and the core is written in Python.

Which Python libraries are your favorite (core or 3rd party)?

I think we all take the standard library for granted. It has some devilishly useful libs. The os package? indispensable!

For 3rd party, my workhorse lib is plasTeX, which converts LaTeX source documents into a document object model. And of course I really love lxml the xml/html parsing lib. It makes dealing with either xml or HTML very easy.

How did your book, Black Hat Python, come about?

It was a twisty path. Justin Seitz wrote the Black Hat Python, First Edition in 2015. I was teaching some cybersecurity classes for our local ISSA in 2018 and I did a web-based course using his book as the text.

I loved the book and got in touch with Justin and NoStarch Press and in 2019 we decided to work on a new Edition. So with the Second Edition, I refactored the code and updated it to Python 3.x. Also I included a little more code explanation, based on what I experienced in teaching with the book.

It keeps the great scenarios from Justin and I recoded the examples using some of the enhancements in Python like context managers, f-strings, the ipaddress module, and so on.

What have you learned about writing or being an author since writing the 2nd edition?

Writing the second edition was a roller coaster. Sometimes I felt in over my head and that I’d hit a wall. And sometimes it was just plain fun: knowing where I wanted to go and that I knew how to do it.

Building stuff (and writing about it) is exciting. Sometimes I think of programming as interacting with a user but separated in space and time. After writing this edition, now I think of writing as teaching but separated in space and time.

Is there anything else you’d like to say?

I’ve seen a few language-specific cultures and I am impressed with the Python community and glad to be part of it. I remember a while back on UseNet that I’d see a beginner ask a poorly formed question and think ‘uh oh, this is going to be bad…’, but the responses would be so helpful and patient. I think the community is so respectful, helpful, and inclusive. Impressive people make for an impressive culture.

Thanks for doing the interview, Tim!

The post PyDev of the Week: Tim Arnold appeared first on Mouse Vs Python.

May 17, 2021 05:05 AM UTC

Python Pool

Matplotlib 2D Histogram Plotting in Python

A histogram is commonly used to plot frequency distributions from a given dataset. Whenever we have numerical data, we use histograms to give an approximate distribution of that data. It shows how often a given value occurs in a given dataset. Matplotlib 2D Histogram is used to study the frequency variation of a given parameter with time.

We use 2D histograms when in a given dataset, the data is distributed discretely. As a result, we only want to determine where the frequency of the variable distribution is more among the dense distribution. There is a predefined function ‘matplotlib.pyplot.hist2d()’ present in python . It is present in the matplotlib library in python and is used to plot the matplotlib 2D histogram.

Matplotlib Library

Matlplotlib is a library in python which is used for data visualization and plotting graphs. It helps in making 2D plots from arrays. The plots help in understanding trends, discovering patterns, and find relationships between data. We can plot several different types of graphs. The common ones are line plots, bar plots, scatter plots and histograms.

What is a Histogram in ‘Matplotlib 2D Histogram’ ?

Histograms are frequency distribution graphs. From a continuous dataset, a histogram will tell about the underlying distribution of data. It highlights various characteristics of data such as outliers in a dataset, imbalance in data, etc. We split the data into intervals, and each interval signifies a time period. It is not the height but the area covered by the histogram, which denotes frequency. To calculate frequency, we need to multiply the width of the histogram by its height.

Parameters:

x: is a vector containing the ‘x’ co-ordinates of the graph.

y: is a vector containing the ‘y’ co-ordinates of the graph.

bins: is the number of bins/bars in the histogram.

range: is the leftmost and rightmost edge for each bin for each dimension. The values occurring outside this range will be considered as outliers.

density: is a boolean variable that is false by default, and if set to true, it returns the probability density function.

weights: is an optional parameter which is an array of values weighing each sample.

cmin is an optional scalar value that is None by default. Thus, the bins whose count is less than cmin value would not be displayed.

cmax is an optional scalar value that is None by default. The bins whose count is greater than cmax value would not be displayed.

Return values

h: A 2D array where the x values are plotted along the first dimension and y values are plotted along the second dimension.

xedges is a 1D array along the x-axis

yedges is a 1D array along the y axis

image is the plotted histogram

Example Matplotlib 2D Histogram:

Here, we shall consider a height distribution scenario, and we will construct a histogram for the same.

Let us first create a height distribution for 100 people. We shall do this by using Normal Data Distribution in NumPy. We want the average height to be 160 and the standard deviation as 10.

First, we shall import numpy library.

import numpy as np

Now, we shall generate random values using random() function.

heights = np.random.normal(160, 10, 100)

Now, we shall plot the histogram using hist() function.

plt.hist(heights)

Also Read | Numpy histogram() Function With Plotting and Examples

Understanding the hist2d() function used in matplotlib 2D histogram

The hist2d() function comes into use while plotting two-dimensional histograms. The syntax for the hist2d() function is:

def hist2d(x, y, bins=10, range=None, density=False, weights=None, cmin=None, cmax=None, *, data=None, **kwargs)

2D Histograms

Unlike a 1D histogram, a 2D histogram is formed by a counting combination of values in x and y class intervals. 2D Histogram simplifies visualizing the areas where the frequency of variables is dense. In the matplotlib library, the function hist2d() is used to plot 2D histograms. It is a graphical technique of using squares of different color ratios. Here, each square groups its number into ranges. Higher the color ratio in 2D histograms, the higher the data that falls into that bin.

Let us generate 50 values randomly.

x = np.random.standard_normal(50)
y = x + 10

Now, we shall plot using hist2d() function.

plt.hist2d(x,y)

Now, we shall try to change the bin size.

x = np.random.standard_normal(1000000)
y = 3.0 * x + 2.0 * np.random.standard_normal(1000000)
plt.hist2d(x,y,bins=50)

The output would be:

Now, we shall change the color map of the graph. The function hist2d() has parameter cmap for changing the color map of the graph.

plt.hist2d(x,y,bins=50,cmap=plt.cm.jet)

Another way to plot the 2d histogram is using hexbin. Instead of squares, a regular hexagon shape would be the plot in the axes. We use plt.hexbin() for that.

plt.hexbin(x,y,bins=50,cmap=plt.cm.jet)

The output after using hexbin() function is:

hist2d() vs hexbin() vs gaussian_kde()

hist2d() is a function used for constructing two dimensional histogram. It does it by plotting rectangular bins.

hexbin() is also a function used for constructing a two-dimensional histogram. But instead of rectangular bins, hexbin() plots hexagonal bins.

In gaussian_kde(), kde stands for kernel density estimation. It is used to estimate the probability density function for a random variable.

FAQ’s on matplotlib 2D histogram

Q. What are seaborn 2d histograms?

A. Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing statistical graphics. For example, we can plot histograms using the seaborn library.

Q. What are bins in histogram?

A. A histogram displays numerical data by grouping it into ‘bins’ of different widths. Each bin is plotted as a bar. And the area of the bar determines the frequency and density of the hat group.

Q. What is the difference between histogram and bar graph?

A. A bar graph helps in comparing different categories of data. At the same time, histogram helps in displaying the frequency of occurrence of data.

Have any doubts? Feel free to tell us in the comment section below.

Happy Learning!

The post Matplotlib 2D Histogram Plotting in Python appeared first on Python Pool.

May 17, 2021 04:19 AM UTC

May 16, 2021

Python Software Foundation

The 2021 Python Language Summit

Every year, a small group of core developers from Python implementations such as CPython, PyPy, Jython, and more come together to share information, discuss problems, and seek consensus in order to help Python continue to flourish.

The Python Language Summit features short presentations followed by group discussions. The topics can relate to the language itself, the standard library, the development process, documentation, packaging, and more! In 2021, the summit was held over two days by videoconference and was led by Mariatta Wijaya and Łukasz Langa.

If you weren't able to attend the summit, then you can still stay up to date with what's happening in the world of Python by reading blog posts about all of the talks that were given. Over the next few weeks, you'll be able to dive into all of the news from the summit so you can join in on the big conversations that are happening in the Python community.

Day 1

Welcome, Introductions, Guidelines

Łukasz Langa

PEP 654 — Exception Groups and except*

Irit Katriel, Yury Selivanov, and Guido van Rossum

Progress on Running Multiple Python Interpreters in Parallel in the Same Process

Victor Stinner and Dong-hee Na

CPython Improvements at Instagram

Dino Viehland

Making CPython Faster

Guido van Rossum

HPy — Present and Future

Antonio Cuni

Lightning Talks, Round 1

Petr Viktorin, Lorena Mesa, Scott Shawcroft, and Jeff Allen

Day 2

Welcome Back and Introductions

Łukasz Langa

Challenges Packaging Python for a Linux Distro

Matthias Klose

The Python Documentation Work Group

Mariatta Wijaya and Carol Willing

What Is the stdlib?

Brett Cannon

What Should I Work on as a Core Dev?

Eric Snow

Fuzzing and Testing Python With Properties

Zac Hatfield-Dodds

Lightning Talks, Round 2

Ronny Pfannschmidt, Pablo Galindo, Batuhan Taskaya, Luciano Ramalho, Jason R. Coombs, Mark Shannon, and Tobias Kohn

We hope you enjoy diving into what went on at the summit, and we're looking forward to seeing how the Python community continues these discussions.

May 16, 2021 11:54 PM UTC

The 2021 Python Language Summit: Progress on Running Multiple Python Interpreters in Parallel in the Same Process

Victor Stinner and Dong-hee Na gave a presentation at the 2021 Python Language Summit about running multiple Python interpreters in parallel in the same process.

Use Cases

Victory Stinner started by explaining why we would need to make the changes that they're discussing. One use case would be if you wanted to embed Python and extend the features of your application, like Vim, Blender, LibreOffice, and pybind11. Another use case is subinterpreters. For example, to handle HTTP requests, there is Apache mod_wsgi, which uses subinterpreters. There are also plugins for WeeChat, which is an IRC client written in C.

Embedding Python

One of the current issues with embedding Python is that it doesn't explicitly release memory at exit. If you use a tool to track memory leaks, such as Valgrind, then you can see a lot of memory leaks when you exit Python.

Python makes the assumption that the process is done as soon as you exit, so you wouldn't need to release memory. But that doesn't work for embedded Python because applications can survive after calling Py_Finalize(), so you have to modify Py_Finalize() to release all memory allocations done by Python. Doing that is even more important for Py_EndInterpreter(), which is used to exit the subinterpreter.

Running Multiple Interpreters in Parallel

The idea is to run one interpreter per thread and one thread per CPU, so you use as many interpreters as you have CPUs to distribute the workload. It's similar to multiprocessing use cases, such as distributing machine learning.

Why Do We Need a Single Process?

There are multiple advantages to using a single process. Not only can it be more convenient, but it can also be more efficient for some uses cases. Admin tools are designed for handling a single process rather than multiple. Some APIs don't work with cross-processes since they are designed for single processes. On Windows, creating a thread is faster than creating a process. In addition, macOS decided to ban fork(), so multiprocessing uses spawn by default and is slower.

No Shared Object

The issue with running multiple interpreters is that all CPUs have access to the same memory. There is concurrent access on the refcnt object. One way to make sure that the code is correct is to put a lock on the reference counter or use an atomic operation, but that can create a performance bottleneck. One solution would be to not share any objects between interpreters, even if they're immutable objects.

What Drawbacks Do Subinterpreters Have?

If you have a crash, like a segfault, then all subinterpreters will be killed. You need to make sure that all imported extensions support subinterpreters.

C API & Extensions

Next, Dong-hee Na shared the current status of the extension modules that support heap types, module state, and multiphase initialization. In order to support multiple subinterpreters, you need to support multiphase initialization (PEP 489), but first you need to convert static types to heap types and add module state. PEP 384 and PEP 573 support heap types, and we mostly use PyTypeFromSpec() and PyTypeFromModuleAndSpec() APIs. Dong-hee Na walked the summit attendees through an example with the _abc module extension.

Work Done So Far

Victor Stinner outlined some of the work that has already been done. They had to deal with many things to make interpreters not share objects anymore, such as free lists, singletons, slice cache, pending calls, type attribute lookup cache, interned strings, and Unicode identifiers. They also had to deal with the states of modules because there are some C APIs that directly access states, so they needed to be per interpreter rather than per module instance.

One year ago, Victor Stinner wrote a proof of concept to check if the design for subinterpreters made sense and if they're able to scale with the number of CPUs:

Work That Still Needs to Be Done

Some of the easier TODOs are:

Converting remaining extensions and static types
Making _PyArg_Parser per interpreter
Dealing with the GIL itself

Some of the more challenging TODOs are:

Removing static types from the public C API
Making None, True, and False singletons per interpreter
Getting the Python thread state (tstate) from a thread local storage (TLS)

There are some ideas for the future:

Having an API to directly share Python objects
Sharing data and use one Python object per interpreter with locks
Supporting spawning subprocesses (fork)

If you want to know more, you can play around with this yourself:

./configure --with-experimental-isolated-subinterpreters

#ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS

May 16, 2021 11:53 PM UTC

"Morphex's Blogologue"

An Open Source license for scripts, small code bits and programs

I have some miscellaneous code here:

https://github.com/morphex/misc

Which hasn't been given a license yet, and I was wondering what license to give it. Just to make things clear, and make it easier to make use of the software. Using imapsync I was reminded of this.

It's been put up there for sharing and re-use obviously, so what are the best suggestions for a license? I was thinking BSD, GPL or LGPL.

Not sure whether to use GPL 2 or 3 though.

My email is morphex@gmail.com

May 16, 2021 08:20 PM UTC

An IMAP migration script

So, last December I got an email from the email hosting provider for Nidelven IT that the email server would be taken down in 6 months time.

I didn't like the timing, as I was in court process, the third one in 7 years about my kids, but understand that things are expensive to maintain, a potential security hole etc. when they age.

So I wrote a little script that pretty much would do what was necessary.

Then after some thinking, it struck me that this is something others would need to do, and it wasn't completely straightforward. So I decided I could model a script based on the process I was using.

Here's the script:

https://github.com/morphex/misc/blob/master/migrate-imap.py

I found the imapsync script:

https://github.com/imapsync/imapsync

Which can be used to do the heavy lifting. I read the license file for that project, and although I'm not a lawyer, it seems straightforward enough that I can use it for my needs. It might've been a better choice to use a known license, but whatever, it is very minimalist and straightforward in its wording.

The script just lists folders for now, then I guess it could build a shell script file which calls imapsync, and that can be inspected and executed.

I was scratching my head a bit as I was writing the script, as the print() statement printed parentheses, then I saw I was running it with python 2 and not 3.

Other than that, I wasn't able to figure out a way to parse command line options for the script using just getopt, am I missing something or is there another module?

[Update on the 13th of May]

The script is now more or less complete. Gilles also responded to an email, saying imapsync imapsync also has --justfolderlists.

I couldn't quite understand the getopt module, haven't used it much before.

[Update on the 15th of May]

I'm now using this script to run imapsync, and imapsync is chugging away, at around 5-6 messages per second.

After posting the previous update I looked over the script a few times, and spotted a print() statement too much, in the generation of the shell script. That goes to show that just looking over code is useful.

Latest commit here: https://github.com/morphex/misc/commit/bcf34c85e93237e79f1920a7184bf0f4e7f5032f

I also made SSL mandatory, it's the kind of mistake someone could make, not using SSL, and it's easy to edit the script file afterwards to remove it, if you know what you're doing.

[Update on the 16th of May]

So the build migration script script is working, and imapsync looks like a sturdy piece of software, it ran through hundreds of thousands of messages in one run. Had to add a command line flag to copy header-less messages, imapsync suggested it might be Draft messages etc. and was on about a Message-ID. A second pass copying over remaining messages was uneventful.

May 16, 2021 08:17 PM UTC

BreadcrumbsCollector

The disenchantment of Python web frameworks

tl;dr Popular Python web frameworks have less significant differences than it appears. Then, there’s Django which makes all competition look micro. Even given the rising popularity of FastAPI, I strongly believe there’s room for at least one another big framework.

Comparing Python frameworks

Back when I worked for a software house – STX Next I contributed several times to an article on their blog entitled A Beginner’s Introduction to Python Web Frameworks. The goal there was to provide a rough overview of available solutions. During work for the same company, I got hands-on experience with Django, Flask and Pyramid. In my opinion, Flask and Pyramid are quite alike. Others that I’ve been using for shorter periods of time are Sanic, aiohttp and Falcon. Currently, I’m working mostly with FastAPI and graphene.

How does the process of learning a new framework looks like? “Okay, here’s some hello world tutorial that shows how to manage routing, views, etc.”

Although the syntax may differ, for the most of the time the end result is exactly the same.

If one needs anything else, their best bet would be to type [name of framework]-[3rd party lib / missing function] into Google:

aiohttp-jwt
flask-sqlalchemy
flask-login
pyramid-openapi3

A common denominator for them is to leave design decisions up to developers – no imposed ORM or even type of databases, etc. There’s huge freedom in how to arrange code and your logic. Also, there are countless ways to do it wrong.

The most recent game-changer is having async support. I’m glad to see that new frameworks have it and existing ones (e.g. Flask or Django) are adding it. So it would be safe to assume that in a near future it will be available at hand. The other significant difference is what’s included. One hast to mention FastAPI that skillfully combines several libraries and Starlette framework to provide superior developer experience. I believe that prior to FastAPI, the only option to get such automatically updating documentation was to manually integrate several smaller libraries.

And then, there’s Django.

Django is a state of mind

Paraphrase of “Mac is a state of mind” from 2007 “Mac or PC” Rap Music Video

It has an order of magnitude more “goodies” bundled. It could be also characterised as opinionated. I don’t see it as a bad thing – but it’s a trait that cannot be ignored. To me, Django is Python’s Ruby On Rails. The most apparent similarity is their approach to persistence – both rely on the Active Record pattern. Django ORM sounds more familiar, but it’s just a reincarnation of an old pattern that says “attach save method to the model!”.

It’s so appealing, that this concept has been copied in the Python world several times – Masonite’s Orator, async-ready Tortoise ORM or peewee.

Beyond frameworks

To sum up, you’re either left to Django that makes a lot of decisions for you or a microframework where you have to figure out many things on your own. Microframeworks can provide building blocks (such as blueprint in Flask or router in FastAPI) but you can arrange them as you see fit. It’s a completely different experience from Django that has startapp CLI command.

In the longer term, what makes sense is how the project was divided (not with what means), how well it is tested or how easy it is for newcomers to grasp the overall structure. Of course, that counts if and only if the project is successful and has a chance to grow old.

My ideal framework

In my opinion, in Python’s ecosystem there’s a room for at least one more framework. It should have:

first-citizen dependency injection container (no, not like in FastAPI – not bound to views),
async support,
an ORM as powerful as SQLAlchemy (or the SQLAlchemy itself) but as an option,
testability from the ground up, starting from test client to tips on making app testable in parallel using pytest-sdist,
basic web framework stuff such as routes, views, etc,
GraphQL support or tips on integrating it would be nice,
+ other boring stuff like configuration supporting several mechanisms etc

What can you have today from this list?

As far as I know, there are no frameworks in Python that would have all these features. My closest bet is to create one’s own mix using:

injector (no async support – but there’s also Dependency Injector)
some configuration lib, which will be convenient with your deployment environment,
Starlette
SQLAlchemy (with recent advancements in supporting async it’s really promising!)
pytest + pytest-sdist + python-mockito + a few others

Summary

This post is mostly an old man’s rant, but the point is simple – don’t sweat it when choosing the next framework. Choose what seems to be the most comfortable in the mid-term (e.g. built-in features), but consider what it will take to support the solution in the long-term.

The post The disenchantment of Python web frameworks appeared first on Breadcrumbs Collector.

May 16, 2021 08:03 PM UTC

AI Pool

Visualization with Seaborn

This article will enable you to use the seaborn python package to visualize your structured data with seaborn barchart, scatter plot, histogram, line, and distplot

May 16, 2021 02:26 PM UTC

Python Software Foundation

The 2021 Python Language Summit: Welcome, Introductions, Guidelines

As attendees slowly filtered into the virtual event leading up to the official start time, they were clearly happy to see each other and have the chance to get together virtually even though PyCon US and the language summit have had to be remote for two years in a row. Although we would like to see each other in person again, one benefit of keeping the summit virtual this year was that more people were able to participate than usual.

Normally, the number of people attending would be small enough that there would be time for each person to take a moment to introduce themselves to the group. Since there were more participants than usual this year, Łukasz Langa walked us through a slide deck that had a page for each of the attendees. It was an international event, with participants attending from North America, South America, Europe, Africa, the Middle East, Asia, and Oceania.

After Łukasz finished the introductions, Ewa Jodlowska told us about the code of conduct and the procedures in place to help all participants feel welcome and have a positive experience.

With the 2021 Python Language Summit off to a good start, we took a group photo and were ready to launch into the talks!

May 16, 2021 10:17 AM UTC

The 2021 Python Language Summit: PEP 654 — Exception Groups and except*

PEP 654 was authored by Irit Katriel, Yury Selivanov, and Guido van Rossum. This PEP is currently at the draft stage. At the 2021 Python Language Summit, the authors shared what it is, why we need it, and which ideas they rejected.

What Is PEP 654?

The purpose of this PEP is to help Python users handle unrelated exceptions. Right now, if you're dealing with several unrelated exceptions, you can:

Raise one exception and throw away the others, in which case you're losing exceptions
Return a list of exceptions instead of raising them, in which case they become error codes rather than exceptions, so you can't handle them with exception-handling mechanisms
Wrap the list of exceptions in a wrapper exception and use it as a list of error codes, which still can't be handled with exception-handling mechanisms

PEP 654 proposes:

A built-in exception type that wraps other exceptions
New syntax with except* to handle exception groups

Each except* clause will be executed once, at most. Each leaf exception will be handled by one except* clause, at most.

In the discussions about the PEP that have happened so far, there were no major objections to these ideas, but there are still disagreements about how to represent an exception group. Exception groups can be nested, and each exception has its own metadata:

Originally, the authors thought that they could make exception groups iterable, but that wasn't the best option because metadata has to be preserved. Their solution was to use a .split() operation to take a condition on a leaf and copy the metadata:

Splitting exception groups with .split()

Why Do We Need Exception Groups and except*?

There are some differences between operational errors and control flow errors that you need to take into account when you're dealing with exceptions:

Operational errors vs control flow errors in Python

In Example 1, there is a clearly defined operation with a straightforward error. But in Example 2, there are concurrent tasks that could contain any number of lines of code, so you don't know what caused the KeyError. In this case, handling one KeyError could potentially be useful for logging, but it isn't helpful otherwise. But there are other exceptions that it could make more sense to handle:

It's important to understand the differences between operational errors and control flow errors, as they relate to try-except statements:

Operational errors are typically handled right where they happen and work well with try-except statements.
Control flow errors are essentially signals, and the current semantics of the try-except statement doesn't handle them adequately.

Before asyncio, it wasn't as big of a problem that there weren't advanced mechanisms to react to these kinds of control flow errors, but now it's more important that we have a better way to deal with these sorts of issues. asyncio.gather() is an unusual API because it has two entirely different operation modes controlled by one keyword argument, return_exceptions:

The problem with this API is that, if an error happens, you still wait for all of the tasks to complete. In addition, you can't use try-except to handle the exceptions but instead have to unpack the results of those tasks and manually check them, which can be cumbersome.

The solution to this problem was to implement another way of controlling concurrent tasks:

If one tasks fails, then all other tasks will be cancelled. Users of asyncio have been requesting this kind of solution, but it needed a new way of dealing with exceptions and was part of the inspiration behind PEP 654.

Which Ideas Were Rejected?

Whether or not exception groups should be iterable is still an open question. For that to work, tracebacks would need to be concatenated, with shared parts copied, which isn't very efficient. But iteration isn't usually the right approach for working with exception groups anyway. A potential compromise could be to have an iteration utility in traceback.py.

The authors considered teaching except to handle exception groups instead of adding except*, but there would be too many bakwards compatibility problems. They also thought about using an except* clause on one exception at a time. Backwards compatibility issues wouldn't apply there, but this would essentially be iteration, which wouldn't help.

May 16, 2021 10:15 AM UTC

May 15, 2021

William Minchin

Seafoam 2.5.0 Released

It’s time for a new update to Seafoam, the website theme currently in use here (on my Blog) and by my wider site.

The biggest change this update brings is the addition of period archive (i.e. daily, month, and yearly) archive pages. I’m actually not sure why they weren’t include previously, although it is possible that the feature (in Pelican) didn’t yet exist at the time this theme was first created.

This update also moves from my namespace plugins to the same plugins maintained by the larger Pelican community (see upgrading for configuration changes required on your side).

Upgrading

Upgrading should is straight forward. I haven’t broken anything on purpose since v2.0.0 came out.

To install or to upgrade, you can use pip:

pip install seafoam --upgrade

If you’re already running Pelican v4.5 (or newer) and only using namespace plugins, then the required plugins will automatically load. However, most will have to update your pelicanconf.py to point to the new plugin names:

# pelicanconf.py
PLUGINS = [
    # 'minchin.pelican.jinja_filters',  # <-- remove this line
    # 'minchin.pelican.plugins.image_process',  # <-- remove this line
    'pelican.plugins.jinja_filters',
    'pelican.plugins.image_process',
    # others, as desired...
]

To be clear, Seafoam still supports Pelican 3 (i.e. you don’t need to upgrade to Pelican 4.5 quite yet) and the latest versions of the two required plugins support back to Pelican 3 as well.

Future Plans

I’ve been working a bunch of late to update the plugins used by this blog, and it got me thinking that perhaps I could/should write a plugin to complement the theme. At a very basic level, it could used to feed the theme version into the global (Pelican) configuration so it could be included in the footer. But expanding on that idea, it could semi-automatically ensure that your Pelican site is configured as needed (plugins included, image process configured, theme selected) to speed up first setting up your site. The other place that it could be interesting is to use it for certain formatting pieces; for example, v2.4.7 was released to fix table formatting, but it did that by applying Bootstrap’s table formatting rules to all tables on the site, whereas a plugin could apply the right HTML class to only those tables within the body of articles (so if you use tables for formatting somewhere, it won’t blow up your site). Nothing has been started yet, but I’m excited by the possibilities.

Also, as I wrote previously, this theme is based on Bootstrap 3, and I’d figured I’d skip Bootstrap 4 and go straight to Bootstrap 5. Bootstrap 5 is still in alpha testing, and I haven’t done anything on this since last time, so this is likely a long way out.

Changelog

See my previous post for earlier changelog entries.

Version 2.4.7 — April 17, 2021

bug apply table formatting without requiring the .table class (as is normally required by Bootstrap)

Version 2.5.0 — May 15, 2021

feature: add stylized period archive pages
bug: fix 404 page layout issues and typos
support: upgrades from minchin.pelican.jinja-filters to pelican-jinja-filters (It’s the same plugin, just under a new name on PyPI and packaged as a namespace plugin for Pelican 4.5 or newer.)
support: upgrades from minchin.pelican.plugins.image-process to pelican-image-process (It’s the same plugin, just under a new name on PyPI and packaged as a namespace plugin for Pelican 4.5 or newer.)

May 15, 2021 09:24 PM UTC

Image Process Plugin 1.2.1 & 2.1.1 for Pelican Released

Image Process is a plugin for Pelican, a static site generator written in Python.

Image Process let you automate the processing of images based on their HTML class attributes. Use this plugin to minimize the overall page weight and to save you a trip to Gimp or Photoshop each time you include an image in your post.

Image Process is used by this blog’s theme to resize the source images so they are the correct size for thumbnails on the main index page and the larger size they are displayed at on top of the articles.

This Release

This post actually covers five releases:

v1.2.1 doesn’t add any functionality or bugfixes directly, but is designed to point users to the new v2 releases.
v1.3.0 returned the plugin to the stewardship of Whisky Echo Bravo, who wrote the first versions of this plugin. This is the first version of the plugin available on PyPI as pelican-image-process.
v2.0.0 reorganized the project codebase to make this work as a “namespace plugin”. Added by Pelican 4.5 is a feature to automatically activate such plugins. This release also fixed a bug with the crop API, and added the ability to create progressive JPEGs and to work within Atom feeds. It also transfers the code repo (and project stewardship) to the Pelican-Plugins organization.
v2.1.0 adds the ability to copy EXIF data to processed photos.
v2.1.1 lowers the minimum Pelican version to 3 (from 4.5). Under the hood, it also updates the local development infrastructure to work better on Windows.

Upgrading

to v1.2.1

To upgrade simply use pip:

pip install minchin.pelican.plugins.image-process --upgrade

If you run v1.2.1, you will get a warning message when you generate your site with Pelican encouraging you to upgrade to v2. This is mostly for those who won’t stumble upon this blog entry! That said, the plugin will continue to work as it has previously without further effort on your part.

to v1.3

I’d recommend you skip this update, at this point, and go straight to v2. There’s nothing wrong with this release, pre se, but I’m not in a position to test any installation instructions.

to v2

v1.3 introduced a different package name, so you’ll have to uninstall the old package and install the new one. Again, pip is the simplest way:

pip install pelican-image-process --upgrade
pip uninstall minchin.pelican.plugins.image-process

The new package name and file layout is to make the plugin a “namespace plugin”. Namespace plugins are actually a really cool idea that if you create your package in the right way, your “host” program can find the plugins simply by having them installed on your system! For Pelican, they need to be in the pelican.plugins namespace.

Two caveats of this approach is that you’ll need Pelican version 4.5 (or later) to automatically load these namespace plugins, and (at least if my understanding is correct) you have to either rely on namespace plugins alone OR the PLUGINS setting of your pelicanconf.py; i.e. if you specify PLUGINS in your settings, auto-loading of namespace plugins is turned off. Neither of these are deal breakers, but this background may prove useful in debugging your setup. Overall, I think namespace plugins are an awesome idea, and I hope it doesn’t take too long to get everything switched over.

So if you’re using other non-namespace plugins, or a Pelican version before 4.5, you’ll also need to update your pelicanconf.py with the new plugin name:

# pelicanconf.py

PLUGINS = [
    # others...
    # minchin.pelican.plugins.image_process  # <-- remove this line
    "pelican.plugins.image_process",
]

Finally, v2.0.0 bumps the minimum Pelican version up to 4.5; if you’re using an older version of Pelican and don’t want to upgrade yet, then use v2.1.1 of the plugin.

The new features (generating progressive JPEGs and applying to Atom feed images) are automatically enabled.

As for the change in the crop API, it’s a bugfix so the plugin behaviour should now match the documented (anticipated) behaviour; specifically crop .

to v2.1.0

Assuming you’ve done the steps listed above to upgrade to v2.0.0, pip remains the simplest way to upgrade:

pip install pelican-image-process --upgrade

To copy over EXIF data, you’ll need to set IMAGE_PROCESS_COPY_EXIF_TAGS (in your pelicanconf.py) to True. You will also need to install ExifTool. I haven’t tried it but it looks like ExifTool supports Windows, just be sure that it’s been added to your PATH.

to v2.1.1

Assuming you’ve done the steps listed above to upgrade to v2.1.0, pip remains the simplest way to upgrade:

pip install pelican-image-process --upgrade

This version lowers the minimum Pelican version 3 (which is something I needed to incrementally upgrade my site; I’m stuck at v3.7.1 for a bit yet while I upgrade some other plugins).

Thoughts on These Releases and the Future

This part is more of a personal than technical note, and continuation of my thoughts about the last Jinja Filters release.

The “ownership” of this code is even more involved that the Jinja Filters plugin. With Jinja Filters, that was code that I’d written myself, packaged, and eventually moved (at my request) to be under the Pelican Plugins organization. Here, I adopted someone else’s existing code, packaged it and used it myself, and eventually they returned from the woodwork to reclaim it (and then transferred it to the Pelican Plugins organization). On one hand, this represent the wonder of Open Source in that I resurrect a “dead” plugin; on the other, it raises an interesting question of what does ownership mean in such a landscape? Did I ever “own” this code? Was it mine to give away or surrender? I think the language fails here, and so perhaps the term “stewardship” rather than “ownership” is more helpful.

In any case, I’m excited to see that the plugin is being maintained without requiring a bunch of my personal effort and is getting features added as well. When I had assumed stewardship for maintenance, I always felt at a disadvantage because I didn’t have the deep understanding that would have come from writing the original code, so I’m happy to let someone else take that on. I’m slightly sad though because this plugin represented my most starred repo on GitHub, and was the one Pelican plugin that I’d put out to the world that I knew other people were actually using.

Moving forward, I’m not sure if every release will get a release post. I suspect the releases I’m involved in will get a post, but hopefully there will continue to be some without my involvement!

Now, only 10 more plugins to go! I want to move all the plugins I use to namespace plugins and then upgrade from Pelican 3.7.1 to 4.6 (or whatever the then-current version is). I’m a little bit closer. :)

May 15, 2021 08:46 PM UTC

Weekly Python StackOverflow Report

(cclxxv) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2021-05-15 18:54:54 GMT

May 15, 2021 06:55 PM UTC

PyCon

PyCon US 2024 and 2025 Announcement

We’re pleased to announce our location for PyCon US for 2024 and 2025: It’s Pittsburgh, Pennsylvania!

As PyCon US 2021 is taking place virtually, we’re excited to let you know about the next four years of in-person PyCon US conferences.

When we announced the cancellation of our in-person events at PyCon US 2021, and confirmed our venue for 2022 and 2023 (Salt Lake City, which we’d locked in before the pandemic even began), we heard from so many of you that they were disappointed that we’d have to miss out on Pittsburgh entirely.

We also recognize that as people are reluctant to travel long distances, we want people on the eastern side of North America to be able to look forward to another nearby PyCon with some certainty.

With that in mind, we’re so pleased you’ll get another chance to take part in PyCon US and enjoy everything Pittsburgh has to offer!

PyCon US will be held at the David L. Lawrence Convention Center in Pittsburgh on the following dates:

In 2024:

May 15-16, Tutorials
May 17-19, Conference
May 20-23, Sprints

(Note: Pentecost falls on May 19th and 20th.)

In 2025:

May 14-15, Tutorials
May 16-18, Conference
May 19-22, Sprints

If you’d like to learn more about why Pittsburgh is a great location for PyCon US, our announcement post for our original Pittsburgh plans is a great place to start.

Our colleagues at Visit Pittsburgh were great to work with before the pandemic started, and their close cooperation was instrumental in helping the Python Software Foundation come out of 2020 with a strong financial footing. We’re excited that we’re going to have the opportunity to work with them again.

From an operational perspective, in usual years, the Python Software Foundation spends hundreds of staff hours evaluating proposals and travelling to PyCon US host cities. Returning to a city where we already have a working relationship means we’re able to avoid this expense — we won’t need to do another site visit until 2023 at the earliest — and put our resources into continuing to focus on the rest of the Python ecosystem. We’re excited about the possibilities this opens up for us.

Post written by: Chris Neugebauer

May 15, 2021 01:47 PM UTC

AI Pool

Using Autoencoder to generate digits with Keras

This article contains a real-time implementation of an autoencoder which we will train and evaluate using very known public benchmark dataset called MNIST data.

May 15, 2021 12:19 PM UTC

Understanding of Support Vector Machine (SVM)

Explanation of the support vector machine algorithm, the types, how it works, and its implementation using the python programming language with the sklearn machine learning package

May 15, 2021 10:22 AM UTC

Python Pool

3 Proven Ways to Convert List to Set in Python

In this article, we will explore the list and set and conversion from the list to set in python. Before jumping to this exciting thing, first of all, let’s have an overview of the concept of lists and set.

What is List in python?

List in python is defined as the ordered collection values of different data types inside a square bracket [].

List in python

From the above image, we can have an overview of the list. Here, 0,1 and 2 are the index. Alia, Meera, and Arya are the values or elements of the list having ‘string’ data type.
But the list can be Li = [0, ‘Suman’, 0.24, True]. The length of this list will be 4, having the following indexes: – Li [0] = 0, Li [1] = ‘Suman’, Li [2] = 0.24, Li [3] = True.

What is a set in python?

Set in python is defined as the unordered collection of unique items having different data types inside a curly bracket {}. For example, Se = {1, ‘Mandeep’, True} or may be Se = {‘Sheetal’, ‘Karun’}.

Sets in python

Difference between list and set

You may be wondering if both list and set can hold items having different data types than what is the difference between them?

List	Set
The list is an Ordered Collection of elements.	Set is an Unordered Collection of elements.
Elements of the list can be modified and replaced.	Elements in a set cannot be altered, modified, or replaced.

Till now you have understood the concepts of list and set. But the question is why do we need this conversion from the list to set?
The answer is that set does not allow duplicate elements. If we use a set then the elements will be unique.

Why do we need to remove duplicate elements?

we need to remove duplicate elements because there are many instances when we don’t need duplicate values. Well, let me explain this with the help of an example. In real life whenever a user makes an entry into the database, there are high chances that he might commit a mistake while entering data. Suppose a teacher is entering the marks of students into an excel sheet, he ended up entering a student name along with the marks and roll number twice into the excel sheet. So, practically we need to write some code to not let that happen. Hence, we can convert a list into a set.

List to set conversion

You have understood why we need to convert List to Set. Now let’s explore how to convert the list to a set. There are many approaches to doing this.

1. Using set() Function

This approach is one of the simplest methods of converting a list into a set. All you need is to use the set() constructor and pass the list as an argument.

Syntax: set(list).

#create a list
my_list = ['Rice', 'Potato', 'Tomato']

#convert the list using set
se = set(my_list)

#display the set
print(se)

Explanation of the code

Created a list having few elements.
Converted the list to a set using set data structure by passing the list as a parameter.
Displayed the items in the set.

2. Using Custom Function

This approach is using a function call in python.

def list_to_set_conversion(list1):
    se = set()
    for x in list1:
        se.add(x)
    return se
Names = ['Alia', 'Bob', 'Ana', 'Sita', 'Alia']
s = list_to_set_conversion (Names)
print(s)

OUTPUT: {'Ana', 'Alia', 'Bob', 'Sita'}

3. Using dict.fromkeys()

The disadvantage of the above two approaches was that we were not getting the set in an orderly manner. So, for this, we are using our next approach to preserve the order.

list1 = ['Alia', 'Bobby', 'Bobby', 1, 1, 2, 3]
x = list(dict.fromkeys(list1))
se = set(x)
print(se)

dict.fromkeys()

Time Complexity of Converting List into a Set

Every algorithm and every code in this world has a certain space and time complexity. The time complexity of converting a list to a set is linear i.e., the number of elements present in the list. So, if the set has ‘n’ elements, the time complexity is O(n). The reason behind this is that the loop will iterate over each element present in the list that is O(n), and add this element to the set will cost O(1). Together the time complexity will be formulated as O(n) * O(1) = O(n).

Also See

Conclusion

As we all know python is a very simple language to understand because of its simplicity. Due to this the conversion of the python list becomes simpler and easier to read. It’s even simple to understand at what cost we can convert through the time complexity. The time complexity is O(n) which is minimal.

Summary

List in python is ordered collection values.
It can have any type of data.
List is mutable.
It can have duplicate elements.
The element of the list can be accessed by its index.
Sets are an unordered collection of elements.
Sets are immutable.
It cannot have duplicate elements.
Set has a highly optimized method for checking whether an element is contained in the list.
Set is based on the Hash table data structure.
Elements in sets cannot be accessed by its index.
A set() method is used to convert the list into a set by simply passing the list as the parameter.
The time complexity for the conversion is O(n) using a loop.

QNA

Let’s go through some questions to make our learning fun and interactive. We can add a small exercise after every topic so that learners can perform them to gain their confidence.

Predict the output of the code?

def list_to_set_conversion(list1):
   se = set()
    for x in list1:
se.add(x)
    return se
Names = ['Tina', 'Kimmi', 'Chanda', 'Sita', 'Alia', 'Chanda', 'Tina']
s = list_to_set_conversion (Names)
print(s)

Ans: –

2. Complete the missing part of the code so that it displays the following as the correct output.

{‘Abhishek’, ‘Ramesh’, ‘Mohan’, ‘John’, ‘Riya’}

names = ['Mohan', 'Abhishek', 'Ramesh', 'Mohan', 'John', 'Riya']
s =?
print(s)

Ans:-

3.Find the length of the given code snippet.

names = ['Mohan', 'Abhishek', 'Ramesh', 'Mohan', 'John', 'Riya', 'John']
print(len(names))

Ans: –

4. What is the time complexity of the following code snippet?

def list_to_set_conversion(list1):
    se = set()
    for x in list1:
        se.add(x)
    return se
Names = ['Arunima', 'Bobita', 'Annam', 'Sita', 'Alia', 'Annam', 'Alia']
s = list_to_set_conversion (Names)
print(s)

Ans: –

5. Find the length of the given code snippet.

names = {'Mohan', 'Abhishek', 'Ramesh', 'Mohan', 'John', 'Riya', 'John'}
print(len(names))

Ans: –

The post 3 Proven Ways to Convert List to Set in Python appeared first on Python Pool.

May 15, 2021 05:46 AM UTC

Brett Cannon

Unravelling the `pass` statement

This is the next post in my series on Python&aposs syntactic sugar. It&aposs unfortunately been a while since my last post due to Python 3.10 and PyCon US 2021 taking up a lot of my time. But with those no longer being a distraction, I can get back into a rhythm of doing posts, so I&aposm going to do probably the easiest bit of Python syntax that I can unravel: pass.

The definition for pass is extremely simple:

pass is a null operation — when it is executed, nothing happens.

That means pass does as much as "pass" or 42 does on their own: absolutely nothing but take up a line of syntax. You can look at the bytecode and see it does absolutely nothing (loading and returning None is automatically appended to all functions, so you can ignore that bit):

>>> import dis
>>> def spam(): pass
... 
>>> dis.dis(spam)
  1           0 LOAD_CONST               0 (None)
              2 RETURN_VALUE

Disassembly of pass

The reason pass even exists is so that you can syntactically signal that something is purposefully empty versus accidentally leaving something out. For instance, in the following if statement:

if x:
    pass
else:
    "pass"

Example of if statement is do-nothing branches

In the first block you know for certain that nothing is supposed to be there. But in that second block using "pass", you can&apost be certain that the line was actually supposed to be y = "pass" or any other valid use of the "pass" string.

The use of pass is also uniquely Python due to our lack of curly braces. In most languages, if you wanted an empty block, you would just have { /* This space intentially left blank. */ } . But since we don&apost have curly braces ( from __future__ import curly_braces makes that abundantly clear 😉) we need some other way to make an empty block work, and that way is pass.

While it&aposs easy to replace pass syntactically since it&aposs purely a syntactic construct, it does provide a simple yet useful bit of semantics for developers.

May 15, 2021 04:56 AM UTC

Planet Python

May 17, 2021

Why use Replit on Chrome OS

Why Replit is a good Python environment

What Is Embedded Development?

Python for Embedded Development

Benefits of Using Python

Disadvantages of Using Python

Things to Watch Out for When New to Embedded Development

Who can participate?

How talk voting works

Talk Selection

About the program

Development environment

Resources and dependencies

NASA TV feed

Python

Libraries

Design

Image representation and storage

The StillFrame class

Still frame download and processing

The image download loop

The event loop

GUI layout

Open issues

Possible improvements

Introduction

Why CSV file format is used?

Ways to load CSV file in python

Reading of a CSV file with numpy in python

1.Without using any built-in library

Syntax:-

Parameter

Return value

2. Using numpy.loadtxt() function

Syntax:

3. Using numpy.genfromtxt() function

Syntax:

Parameter

Return Value

4. Using CSV module in python

5. Use a Pandas dataframe in python

6. Using PySpark in Python

Syntax:-

Parameters

Conclusion

FAQs

Introduction to Valueerror: too many values to unpack (expected 2)

What is Unpacking?

1. Unpacking using tuple and list:

2. Unpacking using underscore:

3. Unpacking using asterisk(*):

What exactly do we mean by Valueerror: too many values to unpack (expected 2)?

Valueerror: too many values to unpack (expected 2) while working with dictionaries

Valueerror: too many values to unpack (expected 2) while unpacking a list to a variable

Valueerror: too many values to unpack (expected 2) while using functions

Summarizing the solutions:

FAQ’s

Q. Difference between TypeError and ValueError.

Q. What is valueerror: too many values to unpack (expected 2) for a tuple?

Matplotlib Library

What is a Histogram in ‘Matplotlib 2D Histogram’ ?

Parameters:

Return values

Example Matplotlib 2D Histogram:

Understanding the hist2d() function used in matplotlib 2D histogram

2D Histograms

hist2d() vs hexbin() vs gaussian_kde()

FAQ’s on matplotlib 2D histogram

May 16, 2021

Day 1

Day 2

Use Cases

Embedding Python

Running Multiple Interpreters in Parallel

Why Do We Need a Single Process?

No Shared Object

What Drawbacks Do Subinterpreters Have?

C API & Extensions

3. **Unpacking using asterisk(*):**