How in Microsoft SQL Server to retrieve data from Google Analytics using R

In this article I want to show in detail how you can use R in Microsoft SQL Server to implement data retrieval from Google Analytics (and generally from any API).







Acknowledgments:







Since I’ve never been a marketer, I needed the help of a specialist. The test room and access to Google Analytics (GA) was organized by Alexei Seleznev , and also gave practical advice.

He is a professional analyst in marketing. And as a gratitude for the help, Aleksey’s telegram channel is mentioned here, where he conducts his activity.







The task - we have an MS SQL server and we want to receive data in DWH by API



To connect to Google Analytics (GA) we will use the googleAnalyticsR package.







This package is selected, for example, due to its popularity. You can use another package, for example: RGoogleAnalytic .

Approaches to solving the problem will be the same.







Install R on MS SQL Server



this is done through the standard interface for installing MS SQL components.

























  1. This is R that SQL Server will work directly with (called in SQL queries).
  2. A client copy of R can be used with it from RStudio without fear of breaking something on the database server.


We agree with the license and pay attention that not ordinary R will be installed, but Microsoft R Open













In a nutshell, what is it:

Microsoft takes R Open ennobles it with its packages and also distributes it for free.

Accordingly, packages of this version of R are available for download not in CRAN but in MRAN .







But that is not all. In fact, when installing MS SQL, we get not a pure MRAN, but something more - Microsoft ML Server .







For us, this means that there will be additional packages in the set of R libraries - RevoScaleR .







RevoScaleR is designed to process big data and build machine learning models on significant datasets.







This information should be kept in mind because there is a high probability of questions related to different versions of R. packages.







After installing the components, we get the default interface for interacting with R from Microsoft.













This console is not the most convenient that you can use, so immediately download and install the free version of RStudio .







We configure SQL server to work with R



In SSMS we execute the following scripts:







We allow to execute scripts on the SQL server







sp_configure 'external scripts enabled', 1; RECONFIGURE WITH OVERRIDE;
      
      





Restart Server SQL







Make sure that the scripts R scripts are executed







 EXECUTE sp_execute_external_script @language =N'R', @script=N'print(version)';
      
      





Find the location of the R packages that are used by the SQL server







 declare @Rscript nvarchar(max) set @Rscript = N' InstaledLibrary <- library() InstaledLibrary <- as.data.frame(InstaledLibrary$results ) OutputDataSet <- InstaledLibrary ' EXECUTE sp_execute_external_script @language = N'R' , @script = @Rscript WITH RESULT SETS (([Package] varchar(255) NOT NULL, [LibPath] varchar(255) NOT NULL, [Title] varchar(255) NOT NULL));
      
      











In my case, the path to R MS SQL packages:

C: / Program Files / Microsoft SQL Server / MSSQL14.MSSQLSERVER / R_SERVICES / library







We start RStudio.







It is possible that several versions of R will be installed on the computer and you need to make sure that we are working with a version of SQL server.



















Settings will be applied after restarting RStudio.







Install the googleAnalyticsR package



In RStudio team







 library()
      
      





find out the path to the package library of the client version of R (with which RStudio works)













In my case, this way:

C: / Program Files / Microsoft SQL Server / 140 / R_SERVER / library







Install the googleAnalyticsR package through RStudio



















Here there is an unobvious nuance:

You can’t just write something to the MS SQL system folders. Packages will be saved in a temporary directory as ZIP archives.













In Explorer, go to the temporary folder and unzip all the packages.













Unzipped packages must be copied to the R Services library directory (which the MS SQL server works with).







In my example, this is a folder

C: / Program Files / Microsoft SQL Server / MSSQL14.MSSQLSERVER / R_SERVICES / library







Also, unzipped packages must be copied to the client version of R (which RStudio works with)







In my example, this is a folder

C: / Program Files / Microsoft SQL Server / 140 / R_SERVER / library







(we learned these paths from previously executed scripts)







Before copying to the R Services folder, it is better to save a copy of the library folder, as practice shows, there are different cases and it is better to be able to return to existing packages.







When copying, replace all available packages.







To consolidate the acquired skill, repeat the exercise.

Only now we do not install packages, but update all available ones.

(to connect to GA, this is not necessary, but it is better to have the latest versions of all packages)







In RStudio we check for new packages













Packages will be downloaded to a temporary folder.

We perform the same actions with them as when installing new packages.







Checking MS SQL Internet Access



 declare @Rscript nvarchar(max) set @Rscript = N' library(httr) HEAD("https://www.yandex.ru", verbose()) ' EXECUTE sp_execute_external_script @language = N'R' , @script = @Rscript
      
      





Since SQL Server does not have Internet access by default, most likely your previous script will cause the following error.













We open Internet access for R scripts from SQL.







SQL 2017













SQL2019













In SSMS







 --      create database Demo go use Demo go --  ,       Google Analytics create schema GA go --        GA drop table if exists [GA].[token] create table [GA].[token]( [id] varchar(200) not null, [value] varbinary(max) constraint unique_id unique (id))
      
      





Get the Google Analytics Token



In RStudio, we execute the following code:

At the same time, an authentication window in Google services will open in the browser, you will need to log in and give permission to access Google Analytics.







 #       Sys.setenv(TZ="Europe/Berlin") library(googleAnalyticsR) #   ga_auth() PathTokenFile <- paste ( getwd(),"/", ".httr-oauth" , sep="") TokenFile <- readBin(PathTokenFile, "raw", file.info(PathTokenFile)$size) #     conStr <- 'Driver={SQL Server};Server=EC2AMAZ-68OQ7JV;Database=Demo;Uid=Sa;Pwd=SaSql123' ds <- RxOdbcData(table="ga.token", connectionString=conStr) #     rxWriteObject(ds, "ga_TokenFile", TokenFile)
      
      





In SSMS, make sure that the token from Google is received and recorded in the database







 Select * from [GA].[token]
      
      





Check connection to GA via RStudio



 #    RStudio Sys.setenv(TZ="Europe/Berlin") library(googleAnalyticsR) #     conStr <- 'Driver={SQL Server};Server=EC2AMAZ-68OQ7JV;Database=Demo;Uid=Sa;Pwd=SaSql123' #      ds <- RxOdbcData(table="ga.token", connectionString=conStr) PathTokenFile <- paste ( getwd(),"/", ".httr-oauth" , sep="") TokenFile <- rxReadObject(ds, "ga_TokenFile") write.filename = file(PathTokenFile, "wb") writeBin(TokenFile, write.filename) close(write.filename) Sys.setenv("GA_AUTH_FILE" = PathTokenFile) #   ga_auth() #  ga_id account_list <- ga_account_list() ga_id <- account_list$viewId #    OutputDataSet <-google_analytics(ga_id, start="2019-01-01", end="2019-08-01", metrics = "sessions", dimensions = "date") OutputDataSet
      
      





If all goes well, add the R script to SQL and execute the query.







 drop table if exists #GA_session create table #GA_session ( [date] date, [sessions] int ) declare @Rscript nvarchar(max) set @Rscript = N' #    RStudio Sys.setenv(TZ="Europe/Berlin") library(googleAnalyticsR) #     conStr <- ''Driver={SQL Server};Server=EC2AMAZ-68OQ7JV;Database=Demo;Uid=Sa;Pwd=SaSql123'' #      ds <- RxOdbcData(table="ga.token", connectionString=conStr) PathTokenFile <- paste ( getwd(),"/", ".httr-oauth" , sep="") TokenFile <- rxReadObject(ds, "ga_TokenFile") write.filename = file(PathTokenFile, "wb") writeBin(TokenFile, write.filename) close(write.filename) Sys.setenv("GA_AUTH_FILE" = PathTokenFile) #   ga_auth() #  ga_id account_list <- ga_account_list() ga_id <- account_list$viewId #    OutputDataSet <-google_analytics(ga_id, start="2019-01-01", end="2019-08-01", metrics = "sessions", dimensions = "date") OutputDataSet$date <- as.character(OutputDataSet$date) ' -- print @Rscript insert into #GA_session ([date],[sessions]) EXECUTE sp_execute_external_script @language = N'R', @script = @Rscript Select * from #GA_session order by [date] asc
      
      





Please note that the script uses Login and Password - this is not very good.

Therefore, we change the connection string to Windows authentication.







 conStr <- ''Driver={SQL Server};Server=EC2AMAZ-68OQ7JV;Database=Demo;Trusted_Connection=true'' #   <<<===     –     
      
      





After changing the authentication method, it will be necessary to add the access to the database to the calling R.













(Of course, it’s better to use user groups, as part of the demo I simplified the solution)







We execute the SQL query as a procedure







 Create procedure Ga.Get_session @Date_start date ='2019-01-01', @Date_End date ='2019-08-01' as drop table if exists #GA_session create table #GA_session ( [date] date, [sessions] int ) declare @Rscript nvarchar(max) set @Rscript =CONCAT( N' #    RStudio Sys.setenv(TZ="Europe/Berlin") library(googleAnalyticsR) #     conStr <- ''Driver={SQL Server};Server=EC2AMAZ-68OQ7JV;Database=Demo;Trusted_Connection=true'' #   <<<===     -    ds <- RxOdbcData(table="ga.token", connectionString=conStr) PathTokenFile <- paste ( getwd(),"/", ".httr-oauth" , sep="") TokenFile <- rxReadObject(ds, "ga_TokenFile") write.filename = file(PathTokenFile, "wb") writeBin(TokenFile, write.filename) close(write.filename) Sys.setenv("GA_AUTH_FILE" = PathTokenFile) #   ga_auth() #  ga_id account_list <- ga_account_list() ga_id <- account_list$viewId #    OutputDataSet <-google_analytics(ga_id, start="' , @Date_start ,N'", end="' , @Date_End ,N'", metrics = "sessions", dimensions = "date") OutputDataSet$date <- as.character(OutputDataSet$date) ' ) -- print @Rscript insert into #GA_session ([date],[sessions]) EXECUTE sp_execute_external_script @language = N'R', @script = @Rscript Select * from #GA_session order by [date] asc
      
      





Check the operation of the procedure







 --    exec Ga.Get_session --      exec Ga.Get_session @Date_start ='2019-08-01', @Date_End ='2019-09-01'
      
      





R script is not complicated; it can always be copied to R Studio. Modify and save in SQL procedure.

For example, I only changed the dimensions parameter and can already load the landingPage by date.







 Create procedure [GA].[Get_landingPage_session] @Date_start date ='2019-01-01', @Date_End date ='2019-08-01' as drop table if exists #GA_session create table #GA_session ( [date] date, landingPagePath nvarchar(max), [sessions] int ) declare @Rscript nvarchar(max) set @Rscript =CONCAT( N' #    RStudio Sys.setenv(TZ="Europe/Berlin") library(googleAnalyticsR) #     conStr <- ''Driver={SQL Server};Server=EC2AMAZ-68OQ7JV;Database=Demo;Trusted_Connection=true'' #   <<<===     -    ds <- RxOdbcData(table="ga.token", connectionString=conStr) PathTokenFile <- paste ( getwd(),"/", ".httr-oauth" , sep="") TokenFile <- rxReadObject(ds, "ga_TokenFile") write.filename = file(PathTokenFile, "wb") writeBin(TokenFile, write.filename) close(write.filename) Sys.setenv("GA_AUTH_FILE" = PathTokenFile) #   ga_auth() #  ga_id account_list <- ga_account_list() ga_id <- account_list$viewId #    OutputDataSet <-google_analytics(ga_id, start="' , @Date_start ,N'", end="' , @Date_End ,N'", metrics = "sessions", dimensions = c("date" ,"landingPagePath")) OutputDataSet$date <- as.character(OutputDataSet$date) ' ) -- print @Rscript insert into #GA_session ([date],landingPagePath,[sessions]) EXECUTE sp_execute_external_script @language = N'R', @script = @Rscript Select * from #GA_session order by [date] asc
      
      





are checking







 exec [GA].[Get_landingPage_session]
      
      





In principle, everything is ready.







I would like to note that using R via SQL, you can get data from any API

For example: getting the exchange rate







 -- https://www.cbr-xml-daily.ru Declare @script nvarchar(max) set @script = N' encoding = "utf-8" Sys.setlocale("LC_CTYPE", "russian") Sys.setenv(TZ="Europe/Berlin") library(httr) url <- "https://www.cbr-xml-daily.ru/daily_json.js" resp <- GET(url) library(jsonlite) Response <- fromJSON(content(resp, as = "text")) OutputDataSet <- data.frame(matrix(unlist(Response$Valute$USD), nrow=1, byrow=T),stringsAsFactors=FALSE) OutputDataSet <- rbind(OutputDataSet,data.frame(matrix(unlist(Response$Valute$EUR), nrow=1, byrow=T),stringsAsFactors=FALSE)) ' EXEC sp_execute_external_script @language = N'R' , @script = @script with result SETS UNDEFINED
      
      





or getting data from the first API you get, some farms in Australia ...







 -- https://dev.socrata.com/ Declare @script nvarchar(max) set @script = N' library(httr) url <- "https://data.ct.gov/resource/y6p2-px98.json?category=Fruit&item=Peaches" resp <- GET(url) library(jsonlite) Response <- fromJSON(content(resp, as = "text")) OutputDataSet <- as.data.frame(Response) OutputDataSet <- OutputDataSet [, c("category" , "item" , "farmer_id" , "zipcode" , "business" , "l" , "location_1_location", "location_1_city" , "location_1_state" , "farm_name", "phone1" , "website", "suite")] ' EXEC sp_execute_external_script @language = N'R' , @script = @script with result SETS UNDEFINED
      
      





Total:





Backup database MS SQL 2017 with all the code is available here

(for playback, you need to install packages, give out rights, specify the name of your server)








All Articles