Mengran's Blog

「We spent our lives making a living」

Python Data Science: Decision Tree

Algorithm in decision tree

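Decision Tree: A decision tree goes through two stages: constructing and pruning. Constructing: Construction means growing a complete decision tree; it is the process of choosing which attribute becomes each node. Three kinds of nodes appear during construction. Root node: the topmost node of the tree, where the tree starts. In the figure above, ...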

Data Analysis: MySQL Basics - Common Table Expressions

Basics of Common Table Expressions

Common Table Expression (CTE): A common table expression is a named temporary result set that exists only within the execution scope of a single SQL statement, e.g., SELECT, INSERT, UPDATE, or DELET...
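As a minimal sketch of the idea in this excerpt, the snippet below defines a CTE and shows that it is gone once the statement finishes. Python's built-in `sqlite3` module stands in for MySQL here, which is an assumption made so the example runs without a server; the `orders` table, its columns, and the `customer_totals` name are all hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; this simple WITH clause is written the same way in both.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 120.0), ("bob", 40.0), ("alice", 80.0), ("carol", 300.0)],
)

# customer_totals is the CTE: a named result set visible only inside this one statement.
query = """
WITH customer_totals AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
)
SELECT customer, total
FROM customer_totals
WHERE total >= 200
ORDER BY customer
"""
big_spenders = conn.execute(query).fetchall()
print(big_spenders)

# A second statement cannot see the CTE, because its scope ended with the query above.
try:
    conn.execute("SELECT * FROM customer_totals")
    cte_still_visible = True
except sqlite3.OperationalError:
    cte_still_visible = False
print(cte_still_visible)
```

The second query fails because `customer_totals` was never stored anywhere, which is what separates a CTE from a regular table or view.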

Data Analysis: MySQL Basics - Subqueries

Basics of subqueries

Subqueries: Basic clauses: Subquery – shows how to nest a query (inner query) within another query (outer query) and use the result of the inner query in the outer query. Derived table – i...
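The inner/outer relationship described in this excerpt can be sketched as follows. Again, `sqlite3` is used in place of MySQL as an assumption, and the `products` table and its rows are invented for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("pen", 2.0), ("book", 18.0), ("lamp", 40.0), ("desk", 60.0)],
)

# The inner query computes one value (the average price);
# the outer query uses that value in its WHERE clause.
query = """
SELECT name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products)
ORDER BY price
"""
above_average = conn.execute(query).fetchall()
print(above_average)
```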

Data Analysis: MySQL Basics - Grouping Data

Basics of Grouping data

Grouping Data: Basic clauses: GROUP BY – shows how to group rows into groups based on columns or expressions. HAVING – filters the groups by a specific condition. ROLLUP – generates multip...
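A small sketch of GROUP BY and HAVING working together is shown below. It assumes SQLite (via Python's `sqlite3`) as a stand-in for MySQL, and the `sales` table is hypothetical. ROLLUP is left out because SQLite does not support it.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 30), ("south", 5), ("east", 20), ("east", 25)],
)

# GROUP BY collapses rows into one row per region;
# HAVING then filters those groups (WHERE would filter rows before grouping).
query = """
SELECT region, COUNT(*) AS n_sales, SUM(amount) AS total
FROM sales
GROUP BY region
HAVING SUM(amount) > 20
ORDER BY region
"""
large_regions = conn.execute(query).fetchall()
print(large_regions)
```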

Data Analysis: MySQL Basics - Joining Tables

Basics of joining tables

Joining Tables: Basic statements for joining tables: Table & Column Aliases – introduces table and column aliases. Joins – gives an overview of the joins supported in MySQL including ...

Data Analysis: MySQL Basics - Querying, Sorting & Filtering Data

Learn basic MySQL statements

MySQL Basics: More details can be found here: MySQL Tutorial. SQL keywords are case-insensitive, so you can write a SQL statement in lowercase, uppercase, or mixed case. Example: select select_list from table_na...

Python Data Science: Data Cleaning

Key principles of data cleaning

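Data Cleaning: Data cleaning is the process of ensuring that your data is correct, consistent and usable. Principles of Data Cleaning: The rules of data cleaning come down to the following four key points, collectively nicknamed “完全合一” (roughly, "complete and unified"): Completeness: whether any single record contains null values, and whether the fields being collected are complete. ...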
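The completeness check named in this excerpt (does a record have null values, are the expected fields present) could look like the sketch below. The sample records, the `required_fields` list, and the `missing_fields` helper are all hypothetical. Treating an empty string as missing is an assumption too.

```python
# Hypothetical sample records; the second and third are incomplete.
records = [
    {"name": "Ann", "age": 31, "city": "Paris"},
    {"name": "Ben", "age": None, "city": "Oslo"},
    {"name": "Cy", "city": ""},
]
required_fields = ["name", "age", "city"]  # assumed schema


def missing_fields(record, fields):
    """Return the fields that are absent, None, or an empty string."""
    return [f for f in fields if record.get(f) in (None, "")]


# Map each incomplete record's index to the fields it is missing.
incomplete = {}
for index, record in enumerate(records):
    missing = missing_fields(record, required_fields)
    if missing:
        incomplete[index] = missing

print(incomplete)
```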

Python Data Science: How to automatically collect data?

How to collect data sources?

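Data Collection: Multiple Data Sources: The trend of any piece of data is shaped by multiple dimensions. We need to collect from multiple sources to capture as many data dimensions as possible while ensuring data quality; only then can we get high-quality data mining results. Four types of data sources: Open Data Sources: generally industry-specific databases, from government agencies or ...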

Python Data Science: User Profiling in Data Analysis

How to do user profiling and user segmentation

User Profiling: User Profiling Modeling: First Step: Unifying users. Second Step: Segmentation (labeling). Third Step: Business Operation. 1. Unifying Users: The unique user identifier is the core of the whole user profile. Take an app as an example: it...

Python Data Science: Basic Concepts of Data Analysis

Learn the concepts of BI, DW, DM

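BI, DW & DM: Three Key Concepts: BI, Business Intelligence: using data to predict customers' shopping behavior. DW, Data Warehouse: the accumulated records of customers' purchasing habits are stored in the data warehouse. DM, Data Mining: the patterns derived by analyzing individuals' purchasing behavior. BI - Business Int...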