Notes: Session 1: The rise of analytics

Preparation
  • Prepare/bring tags for groups
  • Prepare a Teams session for students to share their work

Lecture

Note: organizational intro: 20 min

Time (min)   Duration   Topic                      Additional materials
20-35        15         More data
35-50        15         More computing power
50-65        15         New algorithms
65-80        15         New analytics processes
Objective

In this session, our goal is to explain why modern data analytics is successful, with reference to and examples from the four areas.

Focus on algorithms (enable more elaborate models and analyses)

  • AlphaFold: also illustrates how science and algorithmic competitions drive progress
  • AlphaFold enables a range of commercial use cases in the pharmaceutical and biotech industries ()

Analytical processes

Key trends of improvement:

  • Maturing: from descriptive to predictive and prescriptive (reducing ambiguity and the need for human involvement in business decisions)
  • Pervasive: extending to different areas (e.g., understanding customers with A/B testing), departments (logistics, financial, …) with specific disciplines refining more specialized models (forecasting, supply chain, scheduling, queueing, …)
  • Standardized: algorithms and processes are shared and standardized across companies and industries (e.g., ML/LLMs; analytics software/environments; process models like CRISP-DM) -> intensifies competition

CRISP-DM: most widely used analytics model (https://www.forbes.com/sites/metabrown/2015/07/29/what-it-needs-to-know-about-the-data-mining-process/#2065f3a3515f)

Transition: CRISP-DM is a well-established model for data analytics, so it also serves as a structure for this course…

Exercise

Time (min)   Duration   Topic                      Additional materials
0-30         30         Introduction and setup
30-90        60         Data handling in Python
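The data-handling segment typically starts with loading and inspecting a tabular dataset. A minimal sketch of those first steps, assuming pandas is available (the data and column names are made up for illustration; in class this would start from pd.read_csv on a real file):

```python
import pandas as pd

# Small example frame built in memory (stands in for pd.read_csv("some_file.csv"))
df = pd.DataFrame({
    "customer": ["A", "B", "A", "C"],
    "revenue": [120.0, 80.5, 95.0, 210.0],
})

print(df.head())                                 # first rows
print(df.dtypes)                                 # column types
print(df["revenue"].describe())                  # summary statistics
print(df.groupby("customer")["revenue"].sum())   # simple aggregation
```

These four calls (head, dtypes, describe, groupby) cover most of what students need to get oriented in a new dataset.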

Distribute tags (1.1, 1.1, 1.2, 1.2, 2.1, 2.1, 2.2, 2.2) -> work in pairs.

I will work with the notebook group.

Benefits of Jupyter notebooks

  • one document instead of multiple files (easier to keep in sync, harder for files to get lost; this matters more as analyses grow more complex and have more “moving parts”)

  • option to collapse/hide cells (communicating with business stakeholders)

  • Notebook scaffolding also gives context to LLMs

Jupyter notebooks are the standard environment in data science courses worldwide.

Setup

Ask students to create a GitHub account

Explain the Jupyter notebook and GitHub setup. Mention that students can always use software like Spyder (see Tipps im Umgang mit Spyder.docx).

Start in groups of two (random?)

Introduce the system (badges) - similar to https://eduki.com/de/material/306565/schilder-fragen-fertig-ich-arbeite. Check the classrooms beforehand - can the badges be attached to the desks?

“Simple” Amazon question: you could say yes, but you would get no points for it. All exam questions are about selecting the appropriate concepts from the lecture and applying them to the case. Explain that this signals the need for a rationale:

  • yes: 0 points
  • no - it is about data/computation/algorithms/processes: 2 points
  • yes, but it is only one part of the equation. It is enabled by large-scale data collection about customers, the computational resources in cloud centers like AWS, new algorithms such as deep learning, and analytical processes like CRISP-DM with mature prescriptive capabilities: 5 points

TODO: explicitly address the question why we select Python/jupyter (give an overview of the landscape), argue that Python is challenging (not a low/no-code platform), very popular (supports many analytical use cases), and allows you to quickly learn other tools

-> LLMs are language models: they are good at handling language (not necessarily at handling data). So if we use a programming language to analyze data, LLMs can help us more than they could help us operate a GUI. LLMs are not directly trained on user-GUI interactions (such workflows are weakly documented and harder to analyze/version/control).

Useful Jupyter Notebook Tricks

Autocompletion

  • Tab → autocomplete variables, functions, file paths
  • Shift + Tab → show function documentation
  • Shift + Tab (twice) → expanded documentation

Inspect objects

  • variable? → quick help
  • variable?? → show source code (if available)

Example:

pd.read_csv?
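Outside IPython, the same kind of inspection is available through the standard library. A small sketch using `inspect` (the function `clean_column` is made up for illustration):

```python
import inspect

def clean_column(name: str) -> str:
    """Lowercase a column name and replace spaces with underscores."""
    return name.strip().lower().replace(" ", "_")

# Roughly what `clean_column?` shows: signature and docstring
print(inspect.signature(clean_column))   # (name: str) -> str
print(inspect.getdoc(clean_column))

# Roughly what `clean_column??` shows: the source code (when Python can locate it)
try:
    print(inspect.getsource(clean_column))
except OSError:
    print("source not available")
```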

List variables in memory

  • %who → list variables
  • %whos → list variables with type and size

Similarly: the “Jupyter variables” button
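For reference, a plain-Python approximation of what `%whos` reports (names, types, and a rough size), filtering dunder names and imported modules out of `globals()`:

```python
import sys
import types

# Some example variables to list
x = 42
label = "revenue"
values = [1.5, 2.5, 3.0]

for name, obj in sorted(globals().items()):
    # Skip internals and imported modules, as %whos does
    if name.startswith("_") or isinstance(obj, types.ModuleType):
        continue
    print(f"{name:10} {type(obj).__name__:8} {sys.getsizeof(obj)} bytes")
```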

Reset notebook variables

  • %reset → clear all variables from memory (prompts for confirmation; %reset -f skips the prompt)

Two particularly useful ones for beginners

If you only show two, I recommend:

  • Tab → autocomplete
  • %whos → see all variables

These immediately help students understand what data is currently in memory.

Line-by-line execution:

right-click, create console

Notebook exercise

See how far you get: helps me understand how much students already know.

Ask students who shared their solutions whether they can send them to me so that I can make them available.

Expectation management: you bring different levels of experience with Jupyter Notebooks, so we can also learn from each other, and I believe we should all be very comfortable with Jupyter notebooks by the end of the course.

Jupyter notebooks: Zoom in a lot (close the left sidebar)

Before starting with the question sets: throw your help-card back into the bucket.

Survey

  • Use the last field to give feedback on the setup - did Codespaces work for you? Would you prefer to work locally?
  • You have seen other courses. Let me know if there is anything (practice, tool, …) that I could learn from one of my colleagues.