Data Science Using R ‘Value Added’ (Quiz 1 And Assignment 1)

Data Science Using R

Assignment

Write an R program to create a sequence of numbers from 20 to 50 and find the sum
and mean of the numbers from 20 to 50.

Solution :

```r
# Create the sequence, then report its mean and sum
print("Sequence of numbers from 20 to 50:")
print(seq(20, 50))
print("Mean of numbers from 20 to 50:")
print(mean(20:50))
print("Sum of numbers from 20 to 50:")
print(sum(20:50))
```
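As a cross-check on the solution above, the sum of consecutive integers can also be computed from the arithmetic-series formula n × (first + last) / 2. The sketch below is only illustrative; the variable names (`nums`, `total`, `average`, `closed_form`) are assumptions and do not come from the assignment.

```r
# Store the sequence once and reuse it
nums <- 20:50

total <- sum(nums)     # sum of all elements
average <- mean(nums)  # arithmetic mean

# Closed-form check: n * (first + last) / 2
n <- length(nums)
closed_form <- n * (nums[1] + nums[n]) / 2

print(total)
print(average)
print(total == closed_form)  # TRUE when both approaches agree
```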

Write an R program to create a data frame holding the details of 5 employees. Write a
command to retrieve rows 2, 3, and 4 from the employee data frame.

Solution :

```r
# Build a data frame with one row per employee
Employees <- data.frame(
  Name = c("Employee 1", "Employee 2", "Employee 3", "Employee 4", "Employee 5"),
  Gender = c("M", "M", "F", "F", "M"),
  Age = c(20, 28, 23, 29, 32)
)

# Rows 2 to 4, all columns
print("Details of the employees in rows 2 to 4:")
print(Employees[2:4, ])
```
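Row selection with `[rows, columns]` is not limited to a range of positions. The sketch below rebuilds a copy of the employee data frame (named `employees` here purely for illustration) and shows two other ways to pick rows: excluding rows with negative indices, and filtering by a condition.

```r
# Illustrative copy of the employee data frame from the solution above
employees <- data.frame(
  Name = c("Employee 1", "Employee 2", "Employee 3", "Employee 4", "Employee 5"),
  Gender = c("M", "M", "F", "F", "M"),
  Age = c(20, 28, 23, 29, 32)
)

# Rows 2 to 4 by position
rows_by_position <- employees[2:4, ]

# The same rows, obtained by dropping rows 1 and 5 with negative indices
rows_by_exclusion <- employees[-c(1, 5), ]

# Rows chosen by a condition rather than a position
over_25 <- employees[employees$Age > 25, ]

print(rows_by_position)
print(rows_by_exclusion)
print(over_25)
```

Leaving the column position empty after the comma keeps every column; writing `employees[2:4, "Name"]` instead would return only the names.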

Write the commands to install and load a package in R. Briefly explain the tidyverse package.

Solution :

The tidyverse is a one-stop shop for data science and data analysis. Its purpose is to make the key data import, transformation, and visualization functions available through a single package, so you don't have to install a separate package every time you want to apply a new transformation to your data. The packages in the tidyverse share a common philosophy of data and R programming, and are designed to work together naturally. The core packages include ggplot2, tibble, tidyr, readr, purrr, and dplyr, among others.

```r
# Install the package from CRAN (needed only once)
install.packages("tidyverse")

# Load the package into the current R session
library(tidyverse)
```
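Because installation is only needed once while loading is needed in every session, a common pattern is to install a package only when it is missing. The sketch below uses base R's `requireNamespace()` for that check; the helper name `load_package` is hypothetical and not part of R or the tidyverse.

```r
# Hypothetical helper: install a package only if it is missing, then load it
load_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() take the name from a variable
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so this call never triggers an install
load_package("stats")

# For the package discussed above, the call would be:
# load_package("tidyverse")
```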

Explain, with examples, the various data structures in R.

Solution :

Data structures are important to understand because they are the objects you will manipulate on a day-to-day basis in R. Converting between object types is one of the most common sources of frustration for beginners.

R has 6 basic data structures:

Vectors: In R, a sequence of elements that share the same data type is known as a vector. A vector can hold logical, integer, double, character, complex, or raw data.

Code Example:

```r
a <- c(1, 2, 3, 4, 5, 6)
```

Lists: Lists are R objects that can contain elements of different types, such as numbers, strings, vectors, and even other lists. A list can also contain a function or a matrix as an element. In short, a list is a data structure whose components have mixed data types.

Code Example:

```r
list1 <- list("Sam", "Green", c(8, 2, 67), TRUE, 51.99, 11.78, FALSE)
```

Arrays: In R, an array is created with the array() function. It takes a vector as input and uses the values of the dim parameter to set the size of each dimension.

Code Example:

```r
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)

# Two 2 x 3 layers filled from the combined vector
a1 <- array(c(vec1, vec2), dim = c(2, 3, 2))
```

Matrices: A matrix is created by passing a vector to the matrix() function. Matrices support element-wise addition, subtraction, multiplication, and division.

Code Example:

```r
matrix1 <- matrix(c(11, 13, 15, 12, 14, 16), nrow = 2, ncol = 3, byrow = TRUE)
```

Data Frames: A data frame is a two-dimensional, table-like structure in which each column contains the values of one variable and each row contains one set of values, one from each column. A data frame is a special case of a list in which every component has the same length.

Code Example:

```r
empid <- 1:4
empname <- c("Sam", "Rob", "Max", "John")
empdept <- c("Sales", "Marketing", "HR", "R & D")
emp.data <- data.frame(empid, empname, empdept)
```

Factors: Factors are used in data analysis for statistical modeling. They categorize the unique values in a column, such as “Male”, “Female”, “TRUE”, “FALSE”, etc., and store them as levels. They can store both strings and integers, and are useful in columns that have a limited number of unique values.

Code Example:

```r
data <- c("Male", "Female", "Male", "Child", "Child", "Male", "Female", "Female")
print(data)

# factor() stores the unique values as levels
factor.data <- factor(data)
print(factor.data)
```
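Since conversions between these structures are a common stumbling block, it can help to inspect an object with `class()` and to convert it explicitly with the `as.*()` family of functions. The following is a minimal sketch; all object names and values are illustrative and not taken from the assignment.

```r
# Small instances of each structure (values are illustrative)
vec <- c(1, 2, 3, 4, 5, 6)
lst <- list("Sam", c(8, 2, 67), TRUE)
arr <- array(1:12, dim = c(2, 3, 2))
mat <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
df <- data.frame(id = 1:3, name = c("Sam", "Rob", "Max"))
fct <- factor(c("Male", "Female", "Male"))

# class() reports what kind of object each one is
for (obj in list(vec, lst, arr, mat, df, fct)) print(class(obj))

# A factor converts to its labels or to its internal integer codes
labels <- as.character(fct)
codes <- as.integer(fct)

# A matrix converts to a data frame with one column per matrix column
mat_df <- as.data.frame(mat)

# A numeric vector converts to a character vector
vec_chr <- as.character(vec)

print(labels)
print(codes)
print(mat_df)
print(vec_chr)
```

Note that `as.integer()` on a factor returns the level codes, not the original values, which is an easy source of mistakes when a factor holds numbers stored as text.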

What is the recycling of elements in a vector? Give an example.

Solution :

In R, recycling occurs when an operation is performed on two vectors of different lengths. The elements of the shorter vector are repeated until it matches the length of the longer vector, and the operation is then applied element by element.

Code Example:

```r
# INPUT
val1 <- c(4, 1, 0, 6)
val2 <- c(2, 4)
print(val1 * val2)  # val2 is recycled as c(2, 4, 2, 4)

# OUTPUT
# [1]  8  4  0 24
```
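Two further cases of recycling are worth seeing: a single value is recycled across every element, and when the longer length is not a multiple of the shorter one, R still recycles but issues a warning. The sketch below captures that warning with `tryCatch()` so the script keeps running; the names `doubled`, `msg`, and `partial` are illustrative.

```r
val1 <- c(4, 1, 0, 6)

# A single value is recycled across every element
doubled <- val1 * 2

# Length 3 does not divide length 4: R recycles anyway and warns.
# tryCatch() returns the warning text instead of the result.
msg <- tryCatch(
  val1 + c(10, 20, 30),
  warning = function(w) conditionMessage(w)
)

# suppressWarnings() shows the value R computes in that case:
# the shorter vector is used as c(10, 20, 30, 10)
partial <- suppressWarnings(val1 + c(10, 20, 30))

print(doubled)
print(msg)
print(partial)
```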

Data Science Using R

Quiz

1. Select all tools required for building a data science project

1. Transform
2. Visualize
3. Statistics
4. Tidy
5. Analysis

2. CRAN is a

1. R community
2. central software repository
3. library
4. R package

3. Identify the class of the vector X <- c(1, "a", TRUE)

1. character
2. numeric
3. logical
4. factor

4. Consider a vector X <- c("jan","feb","march","april","may") and a vector Y <- X[c(TRUE,FALSE,TRUE,FALSE,FALSE)]. What will be the output of print(Y)?

1. [1] "jan" "march" "may"
2. [1] "jan" "march"
3. [1] "feb" "april" "may"
4. [1] "march" "april" "may"

5. What will be the output of X <- V1*V2, where V1 <- c(1,2,3,4) and V2 <- c(1,2)?

1. [1] 1 4 3 8
2. [1] 2 4 4 6
3. error in code
4. [1] 0 0 2 2

6. Consider a list, list1 <- list("Sam", "Green", c(8,2,67), TRUE, 51.99, 11.78, FALSE). What will be the output of list1[[3]][1]?

1. [1] "Green"
2. [1] 8 2 67
3. ERROR
4. [1] 8

7. Which of the following is a non-homogeneous data structure in R?

1. vector
2. list
3. matrix
4. array

8. Consider the data frame named emp.data (as created in the Data Frames example of the assignment). What will be the output of emp.data[2:3,2]?

1. 2 2 Rob Marketing
2. [1] "Marketing" "HR"
3. [1] "Rob" "Max"
4. 3 3 Max HR

9. Consider the emp.data data frame from the previous question. How can we add a new column/variable to the existing data frame?

1. emp.data%>%salary<-c(30000,20000,40000,50000)
2. emp.data$salary<-c(30000,20000,40000)
3. emp.data$salary<-c(30000,20000,40000,50000)
4. emp.data&salary<-c(30000,20000,40000,50000)

10. A __________ is used to categorize the unique values in a column, which are stored as “levels”

1. Factor
2. Data frame
3. Matrix
4. List

11. An array is a ___________ data structure in R

1. two-dimensional
2. one-dimensional
3. multi-dimensional
4. none of the above

12. What will be the output of the code:

1. [1] 1 [1] 1.414214 [1] 1.732051 [1] 2 [1] 2.236068
2. [1] 1 [1] 2 [1] 3 [1] 4 [1] 5
3. [1] 1.732051 [1] 1.414214 [1] 1.732051
4. 1 4 6 16 25

13. The command to load a package into the current R environment is:

1. library(package_name)
2. install.packages("package_name")
3. installed.packages("package_name")

14. Select all packages included in the “tidyverse” package

1. dplyr
2. purrr
4. ggplot2
5. MySQL
6. tibble
7. tidyr

15. Consider M1 <- matrix(c(1:9), nrow = 3, ncol = 3, byrow = TRUE). What will be the output of M1[2,]?

1. [1] 4 5 6
2. [1] 1 2 3
3. [1] 7 8 9
4. [1] 2 4 6