The Federal Energy Regulatory Commission (FERC) requires oil and natural gas pipelines to submit annual (and in some cases quarterly) revenue and cost data related to their operations in the Form 6 (oil pipelines) and Form 2 (natural gas pipelines). Historic data is made publicly available (at the links provided above) in FoxPro databases. Given that this is not the easiest format to work with the data, I have written a small script in Python which reads the database and converts it to flat files using the Python package DBF. I have also included a short messy script that reads the data into R. It is possible to note use Python at all, and read the files directly into R using the package RODBC. This package only works with 32-bit R and requires you to download the Visual FoxPro driver from Windows, which is difficult to find. The below program is set-up to read in the Form 6 data, although the Form 2 data is in an identical format.
Python Script
################################################################ # Converting FoxPro Database to Flat Files ################################################################ import csv, os, re from dbfread import DBF targetFolder = "f6_2015" inputFolder = r"[inputPath]" % targetFolder outputFolder = r"[outputPath]\%s" % targetFolder # Function to generate contents of folder def findDBF( folder = inputFolder ): return [re.sub("\.DBF", "", f) for f in os.listdir( folder ) if re.search("DBF", f)] # Function to create folder if it doesn't exist def createFolder( folder ): if not os.path.exists(folder): os.makedirs(folder) # Function to convert DBF to CSV def myConverter(fileName = "F6_CNTRL_OVER_RESP", inputPath = inputFolder, outputPath = outputFolder): # Open the DBF file print(fileName) table = DBF(r"%s\%s.DBF" % (inputPath, fileName), encoding = "cp1252", char_decode_errors='replace' ) # Write to CSV file with open(r"%s\%s.csv" % (outputPath, fileName), "w", newline = '', encoding="utf8") as outFile: writer = csv.writer(outFile) rc = writer.writerow(table.field_names) for record in table: rc = writer.writerow( list( record.values() ) ) # Function to convert all records def convertFolder( folder = targetFolder ): # Create the lowest subdirectory for r, d, f in os.walk(inputFolder): if d: inputFolder = inputFolder + "\%s" % d[0] # Create folder if it doesn't exist createFolder( outputFolder ) # Loop through files and convert for myFile in [f for f in findDBF( folder = inputFolder ) if re.search("NOTE", f) == None]: myConverter( fileName = myFile, inputPath = inputFolder, outputPath = outputFolder )
R Script
################################################################# # Reading Form 6 Data ################################################################# if (!require(dplyr)) { install.packages("dplyr") library(dplyr) } if (!require(sqldf)) { install.packages("sqldf") library(sqldf) } if (!require(stringr)) { install.packages("stringr") library(stringr) } if (!require(reshape2)) { install.packages("reshape2") library(reshape2) } options(stringsAsFactors = FALSE) ################################################################# # Import Function ################################################################# # Datasets of Interset toKeep <- tolower( c("F6_COMP_BAL_SHEET", "F6_INCOME_STMNT", "F6_STATS_OPER_A", "F6_CARRIER_PROPERTY", "F6_UNDIV_JOINT_INT_PROP", "F6_ACC_DEPR_CARRIER", "F6_CARRIER_PROPERTY", "F6_UNDIV_JOINT_INT_PROP", "F6_ACC_DEPR_CARRIER", "F6_ACC_DEPR_SYSTEM", "F6_OP_REV_ACCTS", "F6_OP_EXP_ACCTS", "F6_ANNUAL_COST_SERV") ) # Base Folder Path year <- 2015 filepath <- paste("C:/Users/Ponde001/Desktop/Econmic Research/Oil & Gas/Rate Index/Data/f6_", year, sep = "") ####### Import Relevant Files setwd(filepath) F6_S0_ROW_LITERALS <- read.csv("F6_S0_ROW_LITERALS.csv", sep = ",", header = TRUE) %>% #filter(str_trim(table_name) == toupper(j)) %>% dplyr::select(row_number, report_year, row_literal) F6_S0_RESPONDENT_ID <- read.csv("F6_S0_RESPONDENT_ID.csv", sep = ",") %>% dplyr::select(respondent_id, respondent_name) for (j in toKeep){ X <- X %>% distinct(respondent_id, report_year, row_number, report_prd) %>% left_join(F6_S0_RESPONDENT_ID) %>% left_join(F6_S0_ROW_LITERALS) assign(paste(j,"_",year ,sep=""), X) }