April 20, 2022
Presentation Material of Data Quality with Python for Beginners (Doc. DQLab)
Tangerang – Continuing the discussion of Python for beginners, DQLab and Kominfo presented material on Data Quality with Python for Beginners. The material aims to introduce beginners to the basic concepts of data quality using Python, an essential part of the pre-analysis process. DQLab and Kominfo invited Shella Theresya Pandiangan, a Data Scientist at United Tractor, as the speaker. The live session was held online on 4 April 2022.
Shella opened the discussion by defining data quality. Data quality measures the condition of data based on accuracy, completeness, consistency, reliability, and timeliness. Measuring data quality helps identify errors that need to be resolved and assess whether the data in an IT system is fit for its intended purpose.
“The key to good data is data accuracy. So before processing data or modeling, you should understand the data [you] have,” Shella said.
Data quality work happens before feature engineering, during data pre-processing and Exploratory Data Analysis (EDA). As previously explained, EDA is a statistical approach for exploring and summarizing a dataset and for uncovering its structure and the relationships between its variables. This process includes data cleansing and data profiling.
Data cleansing means identifying data that is incorrect, incomplete, inaccurate, irrelevant, or missing, and then modifying, replacing, or deleting it as needed. When implementing data quality, several things must be considered: missing values, data duplication, anomalies and outliers, data types, data type correction, and feature extraction.
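A few of the checks listed above can be sketched in Pandas. This is a minimal illustration, not the code shown in the session: the toy data and the column names `unit_id` and `hours` are hypothetical, and the 1.5 × IQR rule is only one common convention for flagging outliers, chosen here as an assumption.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "unit_id": ["A1", "A2", "A2", "A3", "A4"],
    "hours": ["120", "95", "95", "110", "4000"],  # numbers stored as text
})

# Data duplication: drop rows that repeat an earlier row exactly.
deduped = df.drop_duplicates().reset_index(drop=True)

# Data type correction: convert the text column to a numeric type.
# errors="coerce" turns unparseable entries into NaN instead of raising.
deduped["hours"] = pd.to_numeric(deduped["hours"], errors="coerce")

# Anomalies and outliers: flag values outside 1.5 * IQR (assumed rule).
q1 = deduped["hours"].quantile(0.25)
q3 = deduped["hours"].quantile(0.75)
iqr = q3 - q1
is_outlier = (deduped["hours"] < q1 - 1.5 * iqr) | (deduped["hours"] > q3 + 1.5 * iqr)

print(deduped[is_outlier])
```

Whether a flagged row is actually an error or a legitimate extreme value still has to be judged against knowledge of the data.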
“Before cleaning, you should know the data operation,” Shella said.
Data operations include selection, filtering, addition, deletion, renaming, and sorting. Selection picks the data relevant to the analysis from the existing data collection. Filtering keeps only the data that meets specific criteria. Addition adds data as a column or row. Deletion removes data from a column. Renaming gives data a new name. Finally, sorting puts the data in order.
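The six operations can be sketched on a small Pandas data frame as follows. The data, the column names, and the pass mark of 75 are hypothetical and serve only as an illustration; Pandas offers several other ways to do each step.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "name": ["Ani", "Budi", "Citra", "Dewi"],
    "city": ["Jakarta", "Tangerang", "Bandung", "Tangerang"],
    "score": [82, 67, 91, 75],
})

# Selection: keep only the columns relevant to the analysis.
selected = df[["name", "score"]]

# Filtering: keep only the rows that meet a criterion.
filtered = df[df["score"] >= 75]

# Addition: add a new column derived from existing data.
added = df.assign(passed=df["score"] >= 75)

# Deletion: remove a column that is no longer needed.
dropped = added.drop(columns=["city"])

# Rename: give a column a clearer name.
renamed = dropped.rename(columns={"score": "final_score"})

# Sorting: order the rows from highest to lowest score.
ordered = renamed.sort_values("final_score", ascending=False)

print(ordered)
```

Each step here returns a new data frame and leaves `df` unchanged, which makes it easier to compare the data before and after an operation.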
Shella then explained the basic functions of Pandas. The “head” function displays the first few rows of the data, and its opposite, the “tail” function, displays the last few rows. The “describe” function displays the count, mean, standard deviation, minimum, maximum, and percentiles, and is usually used for numeric data. Individual statistics, such as the mean or minimum, can also be computed directly on the data frame.
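A short sketch of these functions, using a hypothetical single-column data frame (the column name `value` is an assumption made for the example):

```python
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60]})

top = df.head(3)        # first three rows
bottom = df.tail(2)     # last two rows
summary = df.describe() # count, mean, std, min, percentiles, max

# Individual statistics computed directly on a column.
mean_value = df["value"].mean()
min_value = df["value"].min()

print(top)
print(bottom)
print(summary)
print(mean_value, min_value)
```

Called without an argument, `head()` and `tail()` return five rows by default.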
To assess data quality, data engineers must check for missing values. “.isnull()” flags the values that are missing (NaN), while “.notnull()” flags the values that are not missing. Both serve the same goal: finding out where data is missing.
“.isnull() and .notnull() are the opposite,” Shella said.
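The relationship between the two methods can be seen in a minimal sketch. The data frame and its columns `age` and `city` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "age": [25, np.nan, 31],
    "city": ["Jakarta", "Bandung", None],
})

missing = df.isnull()   # True where a value is missing
present = df.notnull()  # True where a value is present

# Summing the True values counts the missing entries per column.
missing_per_column = df.isnull().sum()

print(missing)
print(present)
print(missing_per_column)
```

Because the two results are exact opposites, either one is enough to locate the gaps. Counting with `.isnull().sum()` is a common first check before deciding how to handle them.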
This material was the final discussion of the session. Throughout the session, Shella also demonstrated live how to apply what she discussed. To become a data practitioner, you need programming skills, and Python is one option. To start, you can learn Python for beginners at DQLab.
*by Agnes Nurlisa | DQLab