Data Science Using R Assignment
Question 1: Make a basic histogram using the age data from the Titanic data set using ggplot.
Dataset: Titanic (titanic.csv from the datasciencedojo/datasets repository on GitHub).
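library(ggplot2)
# Use the raw file URL: the github.com/.../blob/... address returns an HTML page, not CSV
dt1 <- "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
data <- read.csv(dt1)
names(data)
# OUTPUT
# "PassengerId" "Survived" "Pclass" "Name" "Sex" "Age" "SibSp" "Parch" "Ticket" "Fare" "Cabin" "Embarked"
# Histogram of Age in 5-year bins; passengers with a missing Age are dropped (ggplot2 warns about removed rows)
ggplot(data, aes(x = Age)) +
  geom_histogram(binwidth = 5, colour = "blue")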
Question 2: Use the data set Cereals.csv (read from the GitHub gist URL in part a).
a) Read in the dataset and print the first five rows.
dt2 <- "https://gist.githubusercontent.com/lisawilliams/a91ffcea96ac3af9500bbf6b92f1408e/raw/728e9b2e4fb0da2baa34e2da2a9d732d74b484ab/cereal.csv"
dta <- read.csv(dt2)
names(dta)
# OUTPUT
# "Cereal.Name" "Manufacturer" "Type" "Calories" "Protein..g." "Fat" "Sodium" "Dietary.Fiber" "Carbs" "Sugars" "Display.Shelf" "Potassium" "Vitamins.and.Minerals" "Serving.Size.Weight" "Cups.per.Serving"
head(dta, 5)   # first five rows
b) Add a new variable/column to the dataset called 'totalcarb', which is the sum of 'carbo' and 'sugars'. Recall Section
# In this copy of the data the columns are named Carbs and Sugars (not carbo and sugars)
dtb2 <- dta$Carbs + dta$Sugars
dtb2
dta$totalcarb <- dtb2
head(dta, 5)
c) How many unique manufacturers are included in the dataset?
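One way to answer part c) is to count the distinct values of the Manufacturer column. The sketch below wraps this in a small helper, count_unique (a hypothetical name), that ignores missing values, and runs it on a toy data frame with hypothetical values so it works offline; on the real data the same call is made on the dta data frame read in part a).

```r
# Count the distinct non-missing values of a vector (hypothetical helper)
count_unique <- function(x) {
  length(unique(x[!is.na(x)]))
}

# Hypothetical toy data standing in for dta from part a)
toy <- data.frame(
  Manufacturer = c("Kellogg's", "General Mills", "Kellogg's", "Quaker Oats", NA)
)
count_unique(toy$Manufacturer)   # 3
table(toy$Manufacturer)          # rows per manufacturer; NA is not tabulated by default

# With the real data:
# count_unique(dta$Manufacturer)
```

table() is shown alongside because it also reveals how many cereals each manufacturer contributes, which is a quick check that no manufacturer name is spelled two different ways.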
Question 3: The ‘cars’ data set gives the speed of cars and the distances taken to stop. Note that the data were recorded in the 1920s. Plot the ‘cars’ data set as a scatter plot.
library(ggplot2)
ggplot(cars, mapping = aes(x = speed, y = dist)) + geom_point()
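The basic scatter plot is easier to read with axis labels that carry units. R's help page for cars gives speed in mph and stopping distance in ft; the sketch below adds these with labs(). It assumes ggplot2 is installed, and the title text and point colour are illustrative choices.

```r
library(ggplot2)

# cars ships with base R (datasets package): 50 rows, columns speed and dist
p <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point(colour = "steelblue") +
  labs(
    title = "Stopping distance vs speed (cars data, 1920s)",
    x = "Speed (mph)",
    y = "Stopping distance (ft)"
  )
print(p)
```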
Question 4: Write commands to connect a MySQL database with R.
To connect R to a MySQL database, install and load the RMySQL package, which provides the MySQL() driver. A connection object is created with the dbConnect() function.
Syntax: dbConnect(drv, user, password, dbname, host)
where
drv: the database driver (MySQL() for RMySQL),
user: the user name,
password: the password for that user on the database server,
dbname: the name of the database,
host: the host name of the server.
Example:
install.packages("RMySQL")
library(RMySQL)
mysqlconn <- dbConnect(MySQL(), user = 'root', password = 'welcome', dbname = 'mydb', host = 'localhost')
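The example above opens a connection but never closes it. A common pattern is to wrap the work in a function that always disconnects, even if an error occurs, using on.exit(). The sketch below uses the DBI interface with an in-memory SQLite database (the RSQLite package) as a stand-in so it runs without a MySQL server; with RMySQL you would pass MySQL() and the user, password, dbname and host arguments instead. The helper name with_db is hypothetical.

```r
library(DBI)

# Hypothetical helper: open a connection, run f(con), and always disconnect
with_db <- function(drv, ..., f) {
  con <- dbConnect(drv, ...)
  on.exit(dbDisconnect(con), add = TRUE)   # runs even if f() throws an error
  f(con)
}

# Stand-in: in-memory SQLite instead of MySQL (requires the RSQLite package)
tables <- with_db(RSQLite::SQLite(), ":memory:", f = function(con) {
  dbListTables(con)
})
tables   # character(0): a fresh in-memory database has no tables

# With MySQL the call would look like:
# with_db(MySQL(), user = 'root', password = 'welcome', dbname = 'mydb',
#         host = 'localhost', f = function(con) dbListTables(con))
```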
a) Create a table student having student ID as primary key, studentname, class, marks, percentage.
mysqlconn <- dbConnect(MySQL(), user = 'root', password = 'welcome', dbname = 'mydb', host = 'localhost')
# Column names cannot contain spaces, so "Student Name" becomes studentname
dbExecute(mysqlconn, 'CREATE TABLE student (Id INTEGER PRIMARY KEY, studentname VARCHAR(20), class INT, marks INT, percentage DECIMAL(5,2))')
b) Write a command to insert data of 5 students into table student
mysqlconn <- dbConnect(MySQL(), user = 'root', password = 'welcome', dbname = 'mydb', host = 'localhost')
# The table has five columns, so each row must supply five values; marks and percentage are left NULL here
dbExecute(mysqlconn, "INSERT INTO student VALUES (1, 'St1', 12, NULL, NULL)")
dbExecute(mysqlconn, "INSERT INTO student VALUES (2, 'St2', 12, NULL, NULL)")
dbExecute(mysqlconn, "INSERT INTO student VALUES (3, 'St3', 12, NULL, NULL)")
dbExecute(mysqlconn, "INSERT INTO student VALUES (4, 'St4', 12, NULL, NULL)")
dbExecute(mysqlconn, "INSERT INTO student VALUES (5, 'St5', 12, NULL, NULL)")
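Instead of one INSERT statement per student, the rows can be held in a data frame and appended in a single call with DBI's dbAppendTable(). This is a sketch assuming the DBI and RSQLite packages are installed: an in-memory SQLite database stands in for the MySQL connection, and the names and marks are hypothetical sample values (percentages assume, hypothetically, that marks are out of 500).

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")   # stand-in for the MySQL connection

dbExecute(con, "CREATE TABLE student (
  Id INTEGER PRIMARY KEY,
  studentname VARCHAR(20),
  class INT,
  marks INT,
  percentage DECIMAL(5,2)
)")

# Hypothetical sample rows for five students
students <- data.frame(
  Id = 1:5,
  studentname = paste0("St", 1:5),
  class = 12L,
  marks = c(410L, 385L, 450L, 362L, 398L)
)
students$percentage <- students$marks / 500 * 100   # hypothetical: marks out of 500

dbAppendTable(con, "student", students)   # returns the number of rows written
n <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM student")$n
n   # 5
dbDisconnect(con)
```

Appending a data frame keeps the values in R, where they can be checked before they reach the database, and avoids building SQL strings by hand.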
c) Write a command to retrieve all data from student table
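query <- "SELECT * FROM student"
rs <- dbSendQuery(mysqlconn, query)
students <- dbFetch(rs)   # retrieve all rows as a data frame
dbClearResult(rs)         # release the result set
students
dbDisconnect(mysqlconn)   # close the connection when finished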
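dbSendQuery() only submits the query; the rows are pulled with dbFetch(), which can also read a large result in chunks, and the result set is then released with dbClearResult(). For small tables dbGetQuery() performs all three steps in one call. The sketch below compares the two on an in-memory SQLite stand-in (DBI and RSQLite assumed installed) holding a hypothetical two-column student table.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")   # stand-in for the MySQL connection
dbWriteTable(con, "student", data.frame(Id = 1:5, studentname = paste0("St", 1:5)))

# Option 1: send, fetch in chunks of 2 rows, then clear the result set
rs <- dbSendQuery(con, "SELECT * FROM student")
chunks <- list()
while (!dbHasCompleted(rs)) {
  chunks[[length(chunks) + 1]] <- dbFetch(rs, n = 2)
}
dbClearResult(rs)
all_rows <- do.call(rbind, chunks)

# Option 2: one call that sends, fetches everything and clears
same_rows <- dbGetQuery(con, "SELECT * FROM student")

dbDisconnect(con)
nrow(all_rows)   # 5
```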
Question 5: Explain the steps of reading and writing into a CSV file.
Steps of reading and writing into a CSV file:
Reading a CSV file
The contents of a CSV file can be read into a data frame with the read.csv(…) function. The file must either be in the current working directory (shown by getwd()) or be given by its full path; alternatively, the working directory can be changed with the setwd(…) command.
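A minimal sketch of the reading step. To stay self-contained it first writes a small CSV file with hypothetical contents into R's temporary directory; file.path() builds the full path, so the working directory does not need to change.

```r
# Create a small CSV file in R's temporary directory (hypothetical contents)
csv_path <- file.path(tempdir(), "marks.csv")
writeLines(c("name,marks", "St1,82", "St2,77", "St3,90"), csv_path)

getwd()                      # current working directory, where relative paths are resolved
marks <- read.csv(csv_path)  # header = TRUE by default: the first line gives column names
str(marks)                   # data frame with 3 rows and columns name, marks
mean(marks$marks)            # 83
```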
Writing into a CSV file
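A data frame can be written to a CSV file with write.csv(data frame, output CSV name). Unless a full path is given, the file is created in the current working directory under the specified name. By default write.csv() also writes the row names as an extra first column; pass row.names = FALSE to leave them out.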
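A sketch of the writing step: write.csv() saves a hypothetical data frame to a temporary file (so no particular working directory is assumed), and read.csv() reads it back to check the round trip. With row.names = FALSE the file contains only the data columns.

```r
# Hypothetical data frame to save
results <- data.frame(name = c("St1", "St2", "St3"), marks = c(82L, 77L, 90L))

out_path <- file.path(tempdir(), "results.csv")
write.csv(results, out_path, row.names = FALSE)

readLines(out_path)   # header and rows; character values are quoted by default
back <- read.csv(out_path)
identical(names(back), names(results))   # TRUE: no extra "X" column for row names
```

Without row.names = FALSE, the header starts with an empty field, and read.csv() turns that into a column named X on re-reading.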