Stock Price Data Analysis - 2

Let’s start by loading the data into memory. The data consists of per-minute stock prices for the SNP500 bucket for the 10 days prior to 17 Dec. With 390 trading minutes per day, that gives 390 x 10 = 3900 rows per stock. In practice this is not exactly true, since some of the data has holes in it, i.e. some stocks are missing values for some minutes. This needs to be taken care of after loading the data.
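To make the "holes" concrete, here is a minimal sketch of one way to detect and patch missing minutes for a single stock on a single day. It is not the method used in the post. It assumes each row is stamped with the start of its minute, that the session opens at 09:30 (which gives the 390 minutes mentioned above), and that carrying the last seen price forward is an acceptable fill. The names session_minutes and fill_missing_minutes, and the sample prices, are hypothetical.

```python
from datetime import datetime, timedelta

MINUTES_PER_SESSION = 390  # minutes the market is open per day, as stated in the post


def session_minutes(day, open_time="09:30"):
    """Return the 390 expected minute timestamps for one trading day.

    Assumption: the session opens at 09:30 and rows are stamped
    with the start of each minute.
    """
    start = datetime.strptime(f"{day} {open_time}", "%Y-%m-%d %H:%M")
    return [start + timedelta(minutes=i) for i in range(MINUTES_PER_SESSION)]


def fill_missing_minutes(prices, day):
    """Forward-fill holes in one stock's minute prices for one day.

    prices: dict mapping datetime -> price, possibly with gaps.
    Returns (filled, missing): a full list of (timestamp, price) pairs
    and the list of timestamps that were absent from the input.
    """
    filled, missing = [], []
    last = None
    for ts in session_minutes(day):
        if ts in prices:
            last = prices[ts]
        else:
            missing.append(ts)
        # The price stays None until the first observed value of the day.
        filled.append((ts, last))
    return filled, missing


# Hypothetical data: a flat price of 100.0 with two minutes removed.
day = "2016-12-16"
grid = session_minutes(day)
prices = {ts: 100.0 for ts in grid}
del prices[grid[5]], prices[grid[6]]

filled, missing = fill_missing_minutes(prices, day)
print(len(filled), len(missing))
```

Forward-filling keeps every stock on the same 390-minute grid, which matters later when the stocks are compared column by column. Dropping the incomplete minutes for all stocks would be another reasonable choice.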

Interpretation of PCA Results

Say we already have the results from the last post. Before going into the minute ticker data, let’s take a look at how PCA works and why we need it. PCA stands for Principal Component Analysis. It is one of the methods for reducing the number of dimensions in a dataset. It works by finding the directions along which the data varies most and realigning the data along those axes; keeping only the first few of those axes is what reduces the dimensions.
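The idea can be sketched in a few lines of plain Python. This is an illustration rather than the code behind the results in the post: it finds only the first principal component, and it uses power iteration on the covariance matrix instead of a library routine. The function names and the toy data (points scattered around the line y = 2x) are made up for the example.

```python
import math
import random


def center(rows):
    """Subtract the column means so every dimension has mean zero."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [[r[j] - means[j] for j in range(dims)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dims = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def first_component(cov, iterations=200, seed=0):
    """Power iteration: converges to the unit eigenvector of cov
    with the largest eigenvalue, i.e. the direction of most variance."""
    rng = random.Random(seed)
    v = [rng.random() for _ in cov]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, direction):
    """Coordinate of each row along the given direction (2-D -> 1-D here)."""
    return [sum(x * d for x, d in zip(r, direction)) for r in centered]


# Toy data: two dimensions that move together, roughly y = 2x.
rows = [[0, 0.1], [1, 1.9], [2, 4.1], [3, 5.9], [4, 8.0]]
centered = center(rows)
direction = first_component(covariance(centered))
scores = project(centered, direction)
print(direction)  # roughly proportional to (1, 2)
print(scores)     # one number per row instead of two
```

Because the two columns are almost perfectly related, a single coordinate along the first component captures nearly all of the variation, which is the sense in which PCA reduces dimensions.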

Stock Price Data Analysis - 1

Introduction

I recently got my hands on raw stock price data (open, high, low, close and volume) for the Standard and Poor’s 500 index (SNP500). The data covers each stock in the SNP500 bucket (actually 504 stocks, since some of the stocks are listed twice for different types of shares issued) at two intervals: daily, for each market open date from Jan 1 to Dec 15 of 2016, and per minute, for each of the 390 minutes the market is open over the 10 days prior to 17 Dec ‘16.