python|snowflake 数据库_Snowflake数据分析教程

snowflake 数据库
目录 (Table of Contents)

  • Introduction
    介绍
  • Creating a Snowflake Datasource
    创建雪花数据源
  • Querying Your Datasource
    查询数据源
  • Analyzing Your Data and Adding Visualizations
    分析数据并添加可视化
  • Using Drilldowns on Your Visualizations
    在可视化上使用明细
  • Using Search-Based Analytics to Query Your Data
    使用基于搜索的分析来查询数据
  • Summary
    摘要
介绍 (Introduction) Snowflake is a purpose-built SQL cloud data platform that has grown at a nearly unprecedented rate since launching in 2014; Okta Inc.’s 2020 Businesses @ Work report found that Snowflake was the world’s fastest-growing app. Snowflake’s growth is easy to reconcile given their unmatched flexibility, top-of-the-line security, automatic scaling of storage, and seamless integration with various BI tools.
Snowflake是一个专用SQL云数据平台,自2014年推出以来,其增长速度几乎达到前所未有的水平; Okta Inc.的2020年Businesss @ Work报告发现,雪花是世界上增长最快的应用程序。 凭借其无与伦比的灵活性,顶级安全性,存储的自动扩展以及与各种BI工具的无缝集成,Snowflake的增长易于协调。
Among the BI tools that offer Snowflake integration, only one is fully native to Snowflake with support for nested objects and arrays: Knowi. This allows users to simultaneously analyze their data while reaping the benefits of Snowflake’s scaling and security. If you’d like to learn more about using Knowi to analyze your Snowflake data, this tutorial is for you.
在提供Snowflake集成的BI工具中,只有一个完全是Snowflake原生的,支持嵌套对象和数组: Knowi 。 这使用户可以同时分析其数据,同时充分利用Snowflake的扩展性和安全性。 如果您想了解更多有关使用Knowi分析雪花数据的信息,本教程非常适合您。
创建雪花数据源 (Creating a Snowflake Datasource) Once you’ve set up your free Knowi trial account and logged in, follow these steps:
设置免费的Knowi试用帐户并登录后,请按照以下步骤操作:
1. Locate and click on “Data sources” on the panel on the left side of your screen.
1.在屏幕左侧的面板上找到并单击“数据源”。
2. Scroll down to “Data Warehouses” and click on Snowflake.
2.向下滚动到“数据仓库”,然后单击Snowflake。
3. Right now, the default Schema Name is set to TPC-DS which contains data on products, orders, and customers. We’re going to change the schema to TPC-H, which contains data on decision support systems. In order to do this, change Schema Name from TPCDS_SF100TCL to TPCH_SF1 and click “Test Connection.”
3.现在,默认的架构名称设置为TPC-DS,其中包含有关产品,订单和客户的数据。 我们将模式更改为TPC-H,其中包含有关决策支持系统的数据。 为此,请将Schema Name从TPCDS_SF100TCL更改为TPCH_SF1,然后单击“测试连接”。
4. In a few moments, Knowi should tell you that your connection was successful; click “Save” once it does.
4.稍后,Knowi应该告诉您您的连接已成功; 单击“保存”。
Congratulations on setting up your first Snowflake datasource!
祝贺您建立了第一个Snowflake数据源!
查询数据源 (Querying Your Datasource) Now that you’ve created a datasource, you can run queries on your data by following these steps:
现在,您已经创建了数据源,可以按照以下步骤对数据运行查询:
1. As soon as you saved your datasource, you should’ve received a “Datasource Added. Configure Queries.” alert at the top of your page. Click on the word Queries. (You can also just go back to the panel on the left side of your screen, go right below “Data Sources,” click on “Queries,” and select “New Query +” from the top right.)
1.保存数据源后,您应该立即收到“添加的数据源”。 配置查询。” 在页面顶部发出警报。 单击查询一词。 (您也可以返回屏幕左侧的面板,转到“数据源”下方,单击“查询”,然后从右上角选择“新建查询+”。)
2. Name your report inside the “Report Name*” bar on the very top left of your screen. The query that we’re using here will be closely modeled off of the default functional query that is provided in the TPC-H schema, so let’s name this one “Functional Query.”
2.在屏幕左上角的“报告名称*”栏中命名报告。 我们将在此处使用的查询与TPC-H模式中提供的默认功能查询紧密相似,因此,我们将其命名为“功能查询”。
3. This default query schema lists the totals and averages for extended price, discounted extended price, and discount extended price plus tax, as well as total charge and a count of the number of line items, and groups this data by return flag and line status. In order to enter this query, head over to “Snowflake Query” in your Query Builder and enter the following syntax:
3.此默认查询架构列出了扩展价格,扩展的折扣价,扩展的折扣价和税费的总计和平均值,以及总费用和行项目数的计数,并按返回标志和行将数据分组状态。 为了输入此查询,请转到查询生成器中的“雪花查询”,然后输入以下语法:
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1-l_discount)) as sum_disc_price,
sum(l_extendedprice * (1-l_discount) * (1+l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= dateadd(day, -90, to_date('1998-12-01'))
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;

4. Before you run this query, we want to add one more column that concatenates return flag and line status. This will make it much easier to visualize our data. In order to do this post-processing with Cloud9QL — Knowi’s powerful SQL style language — enter the following syntax into “Cloud9QL Query:”
4.在运行此查询之前,我们要再添加一列以将返回标志和行状态连接在一起。 这将使可视化我们的数据变得更加容易。 为了使用Cloud9QL(Knowi的强大SQL样式语言)进行此后处理,请在“ Cloud9QL Query:”中输入以下语法:
select concat(l_returnflag, " - ",l_linestatus) as Flag - Status, *

5. Head to the bottom of your screen and click the blue “Preview” button. This should return four rows and eleven columns worth of data. Once you’ve confirmed that it does, click the green “Save & Run Now” button on the bottom right corner of your screen.
5.转到屏幕底部,然后单击蓝色的“预览”按钮。 这将返回四行和十一列的数据。 确认确实如此后,请点击屏幕右下角的绿色“立即保存并立即运行”按钮。
When you ran your query, Knowi automatically saved your results as a virtual dataset and stored those results as a dataset in its elastic data warehouse. Every time you successfully run a query, Knowi does this.
运行查询时,Knowi自动将结果保存为虚拟数据集,并将这些结果作为数据集存储在其弹性数据仓库中 。 每次您成功运行查询时,Knowi都会这样做。
分析数据并添加可视化 (Analyzing Your Data and Adding Visualizations) As you saw in the data preview, there are 4 different combinations for return flag and line status: A-F, N-F, N-O, and R-F. Let’s say we want to visualize various metrics, such as the total sum of order quantity, grouped by these separate flag — status combinations. Knowi allows us to efficiently visualize this in just a few steps:
如您在数据预览中所看到的,返回标志和线路状态有4种不同的组合:AF,NF,NO和RF。 假设我们要可视化各种指标,例如订单数量的总和,并按这些单独的标记-状态组合进行分组。 Knowi使我们能够仅需几个步骤就可以有效地实现可视化:
1. Return to the top of the panel on the left side of your screen and click on “Dashboards.” Click the orange plus icon and name your dashboard “TPC-H Visualizations.”
1.返回屏幕左侧面板的顶部,然后单击“仪表板”。 单击橙色加号图标,然后将仪表板命名为“ TPC-H Visualizations”。
2. Go just below “Dashboards” on your panel and click “Widgets.” Drag your “Functional Query” widget onto your new dashboard.
2.转到面板上“仪表板”下方,然后单击“窗口小部件”。 将“功能查询”窗口小部件拖动到新的仪表板上。
3. Right now, your widget is just a data grid containing the results from our query. We want to keep this data grid, but to add a more compelling visualization that more quickly conveys a message. Head over to the top right corner of your widget, click on the 3 dot icon and click on “Analyze.”
3.现在,您的小部件只是一个数据网格,其中包含我们查询的结果。 我们希望保留此数据网格,但要添加更具吸引力的可视化效果,以更快地传达消息。 转到小部件的右上角,单击3点图标,然后单击“分析”。
4. Drag the “Flag — Status” and “Sum_QTY” bars over from the left side of your screen into the “Fields/Metrics:” box. This shows us the total quantity for each combination. Now, head to the top of your screen and click on “Visualization.” Scroll over to “Visualization Type” in the top left corner of your screen and change this from “Data Grid” to “Donut.”
4.将“标记-状态”和“ Sum_QTY”栏从屏幕左侧拖到“字段/指标:”框中。 这向我们显示了每种组合的总量。 现在,转到屏幕顶部,然后单击“可视化”。 滚动到屏幕左上角的“可视化类型”,然后将其从“数据网格”更改为“甜甜圈”。
5. This donut chart already rather clearly conveys that roughly one quarter of the total quantity falls into each of R-F and A-F, roughly falls into N-F, and roughly half falls into N-O, but it may still help to add value labels and percentages to the chart. To do this, scroll down to “Options” and click on it, then check the boxes underneath “Display as Percent” and “Label-Value.”
5.这个甜甜圈图已经很清楚地表明,大约总量的四分之一落入RF和AF中,大约落入NF中,大约一半落入NO,但仍可能有助于在标签中添加价值标签和百分比。图表。 为此,向下滚动到“选项”并单击它,然后选中“显示为百分比”和“标签值”下方的框。
6. Now, head back to the top right corner of your screen and click the “Clone” icon that looks like two small pieces of paper. When you do this, you’ll be asked to name your cloned widget; name it “Total Quantity by Flag — Status.”
6.现在,回到屏幕的右上角,然后单击看起来像两张小纸片的“克隆”图标。 执行此操作时,系统将要求您命名克隆的小部件。 将其命名为“按标志总计数量-状态”。
7. Click “Clone” and then click “Add to Dashboard” to add your new widget to your dashboard.
7.单击“克隆”,然后单击“添加到仪表板”,将新的小部件添加到仪表板。
There is a clear conclusion that comes from this visualization: roughly one quarter of the total quantity falls into each of R-F and A-F, roughly one half falls into N-O, and roughly one percent falls into N-F.
该可视化有一个明确的结论:RF和AF分别占总数的四分之一,NO占大约一半,NF则占大约百分之一。
在可视化上使用明细 (Using Drilldowns on Your Visualization) Drilldowns add an interactive component to your visualizations that allows you to dive into a filtered section of your data with just one click. Follow this process to add a drilldown:
向下钻取为您的可视化添加了一个交互式组件,使您只需单击一下即可进入数据的过滤部分。 请按照以下过程添加明细:
1. Click on the 3 dot icon in the top right corner of the “Total Quantity by Flag — Status” widget that you just created, then scroll down and click on “Drilldowns.”
1.单击刚刚创建的“按标志总计数量-状态”小部件右上角的3点图标,然后向下滚动并单击“向下钻取”。
2. Set “Widget” as your drilldown type, set it to drill into “Functional Query” when “SUM_QTY” is clicked. Then add an optional drilldown filter that sets “SUM_QTY” equal to “SUM_QTY” and click “SAVE” and “Close.”
2.将“窗口小部件”设置为钻取类型,单击“ SUM_QTY”后将其设置为钻取到“功能查询”。 然后添加一个可选的向下钻取过滤器,将“ SUM_QTY”设置为等于“ SUM_QTY”,然后单击“保存”和“关闭”。
3. Test your drilldown by clicking on N-O, the Flag — Status combination with the largest share of the total quantity. This shows you all of the raw data for that combination. To return to your original visualization, go to the top right corner of your widget and click on the left arrow icon in the middle.
3.单击“否”,即“标志—状态”组合,占总数的最大份额,以测试您的向下钻取。 这将显示该组合的所有原始数据。 要返回原始的可视化效果,请转到小部件的右上角,然后单击中间的左箭头图标。
使用基于搜索的分析来查询数据 (Using Search-Based Analytics to Query Your Data) Now that you’ve set up a dashboard and become familiar with creating visualizations in Knowi, the next step is to start querying your data using search-based analytics. This feature makes your dashboard accessible to all English speakers — even those who aren’t data savvy or familiar with Knowi. Here’s how to query your data with search-based analytics:
既然您已经设置了仪表板并熟悉在Knowi中创建可视化文件,那么下一步就是开始使用基于搜索的分析来查询数据。 此功能使所有讲英语的人都可以访问您的仪表板,即使是那些不懂数据或不熟悉Knowi的人也可以使用。 以下是使用基于搜索的分析查询数据的方法:
1. Head to the top right corner of your original “Functional Query” widget and click on the 3 dot icon. Scroll down and click on “Analyze.”
1.转到原始“功能查询”窗口小部件的右上角,然后单击3点图标。 向下滚动并单击“分析”。
2. Let’s say that you don’t care about the return flag for any of this information; you just want to group your data by line status, and look at the total amount that has been charged for those with a line status of F and O. Just head to the search bar at the top of your screen that currently says “Ask a question of your data” and type “total sum charge by line status.”
2.假设您不关心任何此类信息的返回标志; 您只想按行状态对数据进行分组,并查看对行状态为F和O的用户收取的总金额。只需转到屏幕顶部当前显示“请问一个您的数据问题”,然后输入“按行状态收取的总费用”。
3. Now, to visualize this data, we’re going to follow the exact same process that we did before. Return to “Visualization” and set the visualization type to “Donut.” As you did before, scroll down and click on “Options,” then check the boxes underneath “Display as Percent” and “Label — Value.”
3.现在,为了可视化此数据,我们将遵循与之前完全相同的过程。 返回“可视化”并将可视化类型设置为“甜甜圈”。 与以前一样,向下滚动并单击“选项”,然后选中“显示为百分比”和“标签-值”下方的框。
4. Return to the top right and click on the “Clone” icon again. Name this widget “Charge by Status” and add it to your dashboard.
4.返回右上角,然后再次单击“克隆”图标。 将此小部件命名为“按状态收费”并将其添加到您的仪表板。
This visualization clearly conveys that orders with a status of F and orders with a status of O both share roughly 50% of the total charge.
该可视化清楚地表明,状态为F的订单和状态为O的订单均约占总费用的50%。
摘要 (Summary) To review, we started off by connecting to a sample Snowflake Datasource and ran a functional query on it. The results of this query were stored as a dataset in Knowi’s elastic data warehouse. Afterwards, we asked a question about our data, created a visualization that clearly answered it, and added drilldowns to our visualization that let the user dive deeper into the data behind the visualization. Finally, we asked our data another question in plain English using search-based analytics, and created another visualization which conveyed our answer.
为了进行回顾,我们首先连接到样本Snowflake数据源,然后在其上运行功能查询。 该查询的结果作为数据集存储在Knowi的弹性数据仓库中。 之后,我们问了一个关于数据的问题,创建了一个清晰地回答了它的可视化文件,并在可视化文件中添加了向下钻取,使用户可以更深入地了解可视化文件背后的数据。 最后,我们使用基于搜索的分析以简单的英语问了我们的数据另一个问题,并创建了另一个可视化的图像,传达了我们的答案。
翻译自: https://towardsdatascience.com/snowflake-data-analytics-tutorial-c9d4dd9b06d
【python|snowflake 数据库_Snowflake数据分析教程】snowflake 数据库

    推荐阅读