March 22, 2022
TANGERANG – R is one of the programming languages that data practitioners must understand. Many large companies use R to speed up statistical data processing. To help build the competencies that companies look for, DQLab held a virtual session introducing R through the Tetris Program on Monday (14/03/22). DQLab invited Erika Siregar, Co-Founder of R-Ladies Jakarta, to introduce R, RStudio, and how to code with R.
R is a programming language and software environment for statistical data processing. However, R is not used only for statistics. It also covers other stages of the data science life cycle, such as data acquisition, data exploration, data manipulation, data visualization, machine learning, and dashboard creation.
Erika shared that learning R can be one of the first steps in preparing for a career in data science. Apart from being free and open source, R runs on various operating systems such as Windows, Linux and macOS. R also supports data visualization, especially with ggplot2 and Shiny. Other advantages are that it is easy to understand and that it is one of the most popular languages among statisticians and academics. As a result, many R communities are available to help when you face difficulties while learning R.
Other than R, understanding RStudio is also important. Erika gave a simple explanation of the difference between the two: “R is like the machine, and RStudio is the interface. RStudio makes communicating with R easier, allowing statistical and data science functions to work effortlessly. It’s important to install R first before RStudio.” Erika also showed beginners how to open and use RStudio.
Before writing R code, Erika briefly explained the do’s and don’ts of naming variables. She explained that variable names are case-sensitive. Here are some important rules to note:
- A name cannot start with a number or a symbol such as “_”.
- A name must not contain spaces. If the name is more than one word, join the words with “_” or “.”. A hyphen (“-”) is read as the minus operator, so it cannot be used inside a name.
- To assign a value to a variable, use the operator “<-” or “=”. Do not use symbols like “^”, “!”, “$”, “@”, “+”, “=”, “/”, “%”, “*”, or “:” inside the name itself.
- Variables can hold numbers, text, objects, formulas, etc.

The first step in writing code is to create a new R file by clicking “File”, then “New File”, then “R Script”. The “First_code R” tab will appear.
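The naming and assignment rules can be tried out with a minimal sketch like the one below. The variable names (total_sales, unit.price and so on) are made up for illustration and were not part of the session. The check relies on the base R function make.names(), which rewrites an invalid name into a valid one, so a name that comes back unchanged is treated here as valid.

```r
# Names are case-sensitive: these are two different variables.
total_sales <- 100
Total_Sales <- 250

# Both "<-" and "=" assign a value at the top level of a script.
unit.price = 12.5

# Hypothetical names that break the rules: starts with a number,
# contains a space, contains a hyphen, starts with an underscore.
candidate_names <- c("2nd_value", "my value", "my-value", "_hidden")

# make.names() (base R) repairs invalid names.
repaired_names <- make.names(candidate_names)

# A name is treated as valid here if make.names() leaves it unchanged.
is_valid <- candidate_names == repaired_names

print(data.frame(candidate_names, repaired_names, is_valid))
```

Running the script prints each candidate next to its repaired form, which makes it easy to see which character caused the problem.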
Erika introduced the “R Package,” a collection of R functions, code and sample data. Packages extend R’s functionality for processing data. She also showed the steps to install, load, uninstall and unload packages. Several packages are available for importing data, such as readxl for Excel, RMySQL for MySQL, mongolite for MongoDB, jsonlite for JSON, googlesheets4 for Google Sheets, and haven and foreign for SPSS, SAS and Stata.
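The four package operations can be sketched roughly as follows. This is an illustration and not the exact code shown in the session. Installing and uninstalling need internet access and change the library on disk, so those two calls are left commented out. Loading and unloading are demonstrated with splines, a package that ships with R, chosen here only so the example runs without installing anything.

```r
# Install / uninstall (commented out: they modify the installed library).
# "readxl" is one of the packages named in the session.
# install.packages("readxl")   # install from CRAN
# remove.packages("readxl")    # uninstall

# Check whether a package is installed. This loads its namespace if it
# is available, but does not attach it to the search path.
has_readxl <- requireNamespace("readxl", quietly = TRUE)

# Load: attach a package so its functions can be called by name.
library(splines)
attached_after_load <- "package:splines" %in% search()

# Unload: detach the package from the current session.
detach("package:splines", unload = TRUE)
attached_after_unload <- "package:splines" %in% search()

print(c(load = attached_after_load, unload = attached_after_unload))
```

Unloading only affects the current session. The package stays installed and can be loaded again with library().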
“The packages that I think are very important are dplyr for data manipulation, ggplot2 for data visualization, tidyverse as a universe of packages and Shiny for creating dashboards. These four are also my personal favorites,” Erika shared.
Erika said that there are two types of data: single (atomic) and non-single. Single data types include character, numeric, categorical, logical (boolean), integer, date, complex and raw. Non-single data types consist of vector, list, factor and table (data frame). Data can be obtained in R internally or externally. Internally, you can use datasets that are already built into R or create your own data. Externally, you can import data using the packages mentioned earlier.
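The data types and the two internal data sources can be explored with a short sketch such as the one below. All variable names are invented for illustration. typeof() reports how R stores a value, and mtcars is one of the sample datasets that ships with R.

```r
# Single values: typeof() reports the underlying storage type.
city    <- "Jakarta"              # character
height  <- 1.68                   # numeric (stored as double)
n_items <- 5L                     # integer
passed  <- TRUE                   # logical
z       <- 2 + 3i                 # complex
byte    <- as.raw(255)            # raw
session_date <- as.Date("2022-03-14")  # date (a number with class "Date")

atomic_types <- c(typeof(city), typeof(height), typeof(n_items),
                  typeof(passed), typeof(z), typeof(byte))

# Non-single structures.
scores   <- c(80, 95, 70)                         # vector
level    <- factor(c("low", "high", "low"))       # factor (categorical)
profile  <- list(city = city, scores = scores)    # list
table_df <- data.frame(id = 1:3, score = scores, level = level)  # table

# Internal data, option 1: a dataset that ships with R.
first_rows <- head(mtcars, 3)

# Internal data, option 2: data you create yourself (table_df above).
print(atomic_types)
print(table_df)
print(first_rows)
```

Note that a date is stored as a number with a class attribute, and a categorical value is usually represented as a factor, so class() is often more informative than typeof() for those two.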
Erika’s explanation in this session shows that, apart from Python, R is also an important programming language for data practitioners to learn. Start learning R by getting to know the basics and practicing at DQLab.id.
by Lathifa Lisa – DQLab