layout: post
title: "ML4T Notes | 01-05 Incomplete data"
date: "2019-01-12 01:12:12"
categories: Computer Science
auth: conge
tags: Machine_Learning Trading OMSCS

1 - Introduction

Time: 00:00:27

2 - Pristine data

What people think financial data is like: pristine, with no gaps.

The reality is that our data is an amalgamation created from many, many sources.

Time: 00:01:56

3 - Why data goes missing

SPY

JAVA

What to do about NaN?

Two more examples:

FAKE1: it did not exist before a certain date, so the FAKE1 data has NaN values before that date.

FAKE2: it did not exist before a certain date, its data is then absent between some later dates, and so on. Data like this exists for thinly traded stocks.
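The two gap patterns can be reproduced with a small data frame. This is only a sketch: the prices and dates below are made up, and the column names simply reuse the symbols from these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical prices on six consecutive dates; all values are invented.
dates = pd.date_range("2019-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {
        "SPY": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0],    # no gaps
        "FAKE1": [np.nan, np.nan, 10.0, 10.5, 11.0, 11.5],    # missing at the start
        "FAKE2": [np.nan, 20.0, np.nan, np.nan, 21.0, 22.0],  # missing at the start and in between
    },
    index=dates,
)

# Count the missing values per symbol.
missing_counts = df.isna().sum()

# First date on which each symbol has a real value.
first_valid = {col: df[col].first_valid_index() for col in df.columns}

print(missing_counts)
print(first_valid)
```

Checking `isna().sum()` per column before doing anything else is a quick way to see which symbols need attention.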

Time: 00:04:04

4 - Why this is bad - what can we do

What do we do when we have no data between two dates?

How should we treat missing data at the beginning?
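These notes leave both questions open. One common convention, assumed here rather than taken from the notes, is to fill forward first so that interior gaps reuse the last known price, and only then fill backward so that the leading gap takes the first known price. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical price series with a leading gap and an interior gap.
prices = pd.Series(
    [np.nan, np.nan, 10.0, np.nan, np.nan, 12.0],
    index=pd.date_range("2019-01-01", periods=6, freq="D"),
    name="FAKE2",
)

# Step 1: carry the last known price forward across interior gaps.
forward_filled = prices.ffill()

# Step 2: fill what is still missing at the start with the first known price.
filled = forward_filled.bfill()

print(filled)
```

Doing the forward fill first means an interior gap never borrows a price from a later date; only the values before the first observation are taken from later data.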

Time: 00:03:12

5 - Pandas fillna quiz

Pandas provides the fillna() function to fill in missing data.

Find the documentation for DataFrame.fillna() on the pandas documentation site, read through its options and how to call it, then answer:

How would you call fillna() to fill forward missing values?

Answer: fillna(method='ffill')

Documentation: pandas

Documentation: pandas.DataFrame.fillna()

You could also use the 'pad' method, same as 'ffill': fillna(method='pad')
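As a rough illustration of forward filling, the sketch below uses the dedicated `ffill()` method, because the `method` argument of `fillna()` is deprecated in recent pandas releases. The series values are made up, and `limit=1` is shown only as one example of the available options.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

# Same idea as fillna(method='ffill') from the quiz answer:
# each NaN takes the most recent non-missing value before it.
filled = s.ffill()

# Optional: fill at most one consecutive NaN after each known value.
filled_limited = s.ffill(limit=1)

print(filled.tolist())
print(filled_limited.tolist())
```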

6 - Using fillna()

So, let's do some coding.

After reading the CSV into a data frame, plot the data and see what turns up.

So, here is the graph.

A single statement fills those gaps: df_data.fillna(method='ffill', inplace=True)

ffill

Time: 00:01:20

7 - Fill missing values quiz

fillna quiz

Fill these gaps using the fillna() method. It works for multiple stocks, that is, multiple columns of the data frame, simultaneously.
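A minimal sketch of filling several columns at once, not the course's official solution: the data frame below is invented, and `ffill(inplace=True)` stands in for the `fillna(method='ffill', inplace=True)` call used earlier in these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for three symbols; FAKE1 and FAKE2 have interior gaps.
df_data = pd.DataFrame(
    {
        "SPY": [100.0, 101.0, 102.0, 103.0],
        "FAKE1": [10.0, np.nan, np.nan, 11.0],
        "FAKE2": [20.0, 21.0, np.nan, 22.0],
    },
    index=pd.date_range("2019-01-01", periods=4, freq="D"),
)

# One call fills forward within every column independently.
df_data.ffill(inplace=True)

print(df_data)
```

Each column is filled from its own earlier values, so a gap in one stock never picks up a price from another.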

Documentation: pandas.DataFrame.fillna()

Instructions:

Time: 00:00:34

solution:

Time: 00:00:30

Total Time: 00:13:18

2019-01-12 first draft