2018-03-31 | Appetizer Learning Data Series - Basics 1

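.csv processing

Here we learn to do basic iteration over a .csv file in order to build dictionaries and collect summary statistics.
If one word had to describe this csv approach as a whole, though, it would be "tedious".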

import csv

%precision 2  # IPython magic: display floats with a precision of 2

with open('mpg.csv') as csvfile:
    # Read mpg.csv with csv.DictReader and convert it to a list of dictionaries.
    mpg = list(csv.DictReader(csvfile))

mpg[:1]  # the first dictionary in our list
>>> [OrderedDict([('', '1'), ('manufacturer', 'audi'), ('model', 'a4'), ('displ', '1.8'), ('year', '1999'), ('cyl', '4'), ('trans', 'auto(l5)'), ('drv', 'f'), ('cty', '18'), ('hwy', '29'), ('fl', 'p'), ('class', 'compact')])]

len(mpg)  # there are 234 dictionaries, so mpg[233] is the last one
>>> 234

keys() gives us the column names of our csv.

mpg[233].keys()
>>> odict_keys(['', 'manufacturer', 'model', 'displ', 'year', 'cyl', 'trans', 'drv', 'cty', 'hwy', 'fl', 'class'])

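The following shows how to find the average city (cty) mpg and the average highway (hwy) mpg across all cars.
Because every value in the dictionaries is a string, each one has to be converted to float before it can be used in a calculation.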

sum(float(d['cty']) for d in mpg) / len(mpg)  # average city mpg
sum(float(d['hwy']) for d in mpg) / len(mpg)  # average highway mpg
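As a self-contained illustration of the string-to-float step, here is a minimal sketch that runs without mpg.csv. The three-row in-memory sample and the helper name column_mean are assumptions made for this example, not part of the original walkthrough; statistics.mean is used in place of the manual sum / len.

```python
import csv
import io
from statistics import mean

# Hypothetical three-row sample standing in for mpg.csv (illustrative values).
sample_csv = """manufacturer,cty,hwy
audi,18,29
ford,14,20
honda,28,33
"""

# csv.DictReader accepts any iterable of lines, so a StringIO works like a file.
rows = list(csv.DictReader(io.StringIO(sample_csv)))

def column_mean(rows, column):
    """Average a numeric column whose values were read as strings."""
    return mean(float(row[column]) for row in rows)

avg_cty = column_mean(rows, "cty")
avg_hwy = column_mean(rows, "hwy")
print(avg_cty, avg_hwy)
```

With the real file, the same helper could be called as column_mean(mpg, 'cty').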

Now let's get all the distinct cylinder values that appear in the dataset:

cylinders = set(d['cyl'] for d in mpg)
cylinders
>>> {'4', '5', '6', '8'}

Here we group the cars by number of cylinders and find the average city mpg for each group.

CtyMpgByCyl = []  # create an empty list

for c in cylinders:  # loop over the distinct cylinder values
    summpg = 0
    cyltypecount = 0
    for d in mpg:  # iterate over all the dictionaries
        if d['cyl'] == c:  # if this car has the current cylinder value
            summpg += float(d['cty'])  # add up the cty mpg
            cyltypecount += 1  # increment the count
    CtyMpgByCyl.append((c, summpg / cyltypecount))  # append the tuple ('cylinder', 'avg mpg')

CtyMpgByCyl.sort(key=lambda x: x[0])
CtyMpgByCyl
>>> [('4', 21.01), ('5', 20.50), ('6', 16.22), ('8', 12.57)]
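The nested loop above rescans the whole dataset once per cylinder value. One possible alternative, sketched here under assumptions, accumulates totals and counts in a single pass with collections.defaultdict. The function name average_by and the sample rows are hypothetical and only mimic the shape that csv.DictReader produces.

```python
from collections import defaultdict

# Hypothetical rows in the same shape csv.DictReader produces (all strings).
rows = [
    {"cyl": "4", "cty": "18"},
    {"cyl": "4", "cty": "22"},
    {"cyl": "6", "cty": "16"},
    {"cyl": "8", "cty": "12"},
    {"cyl": "8", "cty": "14"},
]

def average_by(rows, group_key, value_key):
    """Return sorted (group, average) tuples using one pass over rows."""
    totals = defaultdict(float)  # running sum per group
    counts = defaultdict(int)    # number of rows per group
    for row in rows:
        totals[row[group_key]] += float(row[value_key])
        counts[row[group_key]] += 1
    # Sorting the tuples orders them by group label first.
    return sorted((g, totals[g] / counts[g]) for g in totals)

cty_mpg_by_cyl = average_by(rows, "cyl", "cty")
print(cty_mpg_by_cyl)
```

The same helper would cover other groupings too, for example average_by(mpg, 'class', 'hwy'), though that version sorts by group label rather than by average.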

An example of grouping by a different variable:

vehicleclass = set(d['class'] for d in mpg)  # what are the class types
vehicleclass
>>> {'2seater', 'compact', 'midsize', 'minivan', 'pickup', 'subcompact', 'suv'}

# average hwy mpg for each class of vehicle
HwyMpgByClass = []

for t in vehicleclass:  # iterate over all the vehicle classes
    summpg = 0
    vclasscount = 0
    for d in mpg:  # iterate over all dictionaries
        if d['class'] == t:  # if the vehicle class matches,
            summpg += float(d['hwy'])  # add the hwy mpg
            vclasscount += 1  # increment the count
    HwyMpgByClass.append((t, summpg / vclasscount))  # append the tuple ('class', 'avg mpg')

HwyMpgByClass.sort(key=lambda x: x[1])  # sort by average mpg, ascending
HwyMpgByClass
>>> [('pickup', 16.88), ('suv', 18.13), ('minivan', 22.36), ('2seater', 24.80), ('midsize', 27.29), ('subcompact', 28.14), ('compact', 28.30)]

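time and datetime

Prerequisite: some basic Python knowledge.
Be aware that dates and times can be stored in many different ways.
One of the most common legacy methods for storing dates and times in online systems is as an offset from the epoch, which is January 1, 1970.
So if you see a very large number where you expected a date and time, you need to convert it before the data makes sense.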

import datetime as dt
import time as tm

# time returns the current time in seconds since the Epoch (January 1st, 1970).
tm.time()
>>> 1523682711.76

dtnow = dt.datetime.fromtimestamp(tm.time())  # convert the timestamp to datetime
dtnow
>>> datetime.datetime(2018, 4, 14, 4, 51, 12, 996246)

# A more convenient form: get year, month, day, etc. from a datetime.
dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second
>>> (2018, 4, 14, 4, 51, 12)
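To make the epoch offset concrete, here is a small sketch that converts a fixed timestamp instead of the current time. The timestamp value is an arbitrary choice for illustration, and passing tz=dt.timezone.utc is an addition not used in the original, made so the result does not depend on the machine's local time zone.

```python
import datetime as dt

# Arbitrary example timestamp: seconds since 1970-01-01 00:00:00 UTC.
timestamp = 1_500_000_000

# Passing tz makes the result independent of the machine's local time zone.
moment = dt.datetime.fromtimestamp(timestamp, tz=dt.timezone.utc)
print(moment.isoformat())  # 2017-07-14T02:40:00+00:00

# The epoch itself corresponds to an offset of zero.
epoch = dt.datetime.fromtimestamp(0, tz=dt.timezone.utc)
print(epoch.isoformat())   # 1970-01-01T00:00:00+00:00

# Going the other way: an aware datetime back to its epoch offset.
round_trip = moment.timestamp()
print(round_trip)
```

Without the tz argument, fromtimestamp returns a naive datetime in local time, which is why the hour in the output above depends on where the code was run.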

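timedelta is the difference between two times. It can be used to compute an earlier or later time, and it lives in the datetime package.
dt.date.today() returns today's date.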

delta = dt.timedelta(days=100)  # create a timedelta of 100 days
delta
>>> datetime.timedelta(100)

today = dt.date.today()
today - delta  # the date 100 days ago
>>> datetime.date(2018, 1, 4)

# Comparing dates
today > today - delta
>>> True
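A timedelta can also be obtained by subtracting one date from another. The sketch below uses two fixed dates (taken from the outputs shown above, so the result does not depend on when it runs) and also shows timedelta normalizing mixed units. The variable names are chosen for this example only.

```python
import datetime as dt

# Fixed dates so the result does not depend on when this runs.
start = dt.date(2018, 1, 4)
end = dt.date(2018, 4, 14)

gap = end - start  # subtracting two dates yields a timedelta
print(gap.days)    # 100

# Adding the timedelta back to the earlier date recovers the later one.
recovered = start + gap
print(recovered == end)

# timedelta accepts mixed units and normalizes them to days and seconds.
mixed = dt.timedelta(weeks=1, hours=36)
print(mixed.days, mixed.seconds)  # 8 43200
```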