Skills for Data Analysts
SQL depth, storytelling, and the tools that actually matter
Data analyst job postings list everything from SQL to machine learning. Most of it is noise. The real signal is whether you can go from a vague business question to a defensible analysis with stakeholder-ready output — faster than the PM can Google it. This guide covers what the hiring bar actually looks like across analytics roles in 2026, which certifications are worth the time, and how to demonstrate analytical depth on a resume without overstating your experience.
Must-have: 7 skills • Nice-to-have: 5 • Emerging: 3
SQL (window functions, CTEs, subqueries, query optimization)
Technical
SQL is the non-negotiable floor for every data analyst role. But "knows SQL" is not a skill — hiring managers mean specifically: window functions (ROW_NUMBER, RANK, LAG, LEAD, SUM OVER), CTEs for readable multi-step queries, and the ability to explain why a query is slow and how to fix it. Surface-level SELECT/JOIN knowledge will not clear a technical screen at any company that takes analytics seriously.
How to prove it
A GitHub repository or data portfolio with a query file that uses at least one window function on a real dataset (NYC 311, NYC Taxi, Kaggle public datasets). Include a comment explaining why you structured the query this way, not just what it does.
Time to acquire
4-8 weeks of deliberate practice against a real database with 1M+ rows; Mode Analytics, StrataScratch, and DataLemur all have practice sets specifically calibrated to interview difficulty.
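To make that bar concrete, here is a minimal runnable sketch of the query pattern screens actually test (a CTE feeding ROW_NUMBER, LAG, and a running SUM OVER), executed through Python's built-in sqlite3 module; window functions require SQLite 3.25 or newer. The orders table and its values are invented for illustration.

```python
import sqlite3

# Hypothetical orders table, small enough to read at a glance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (user_id INTEGER, order_date TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, '2026-01-03', 40.0), (1, '2026-01-10', 55.0),
  (2, '2026-01-05', 20.0), (2, '2026-01-06', 35.0), (2, '2026-01-20', 15.0);
""")

# CTE + window functions: number each user's orders, measure the gap
# since the previous order with LAG, and keep a running spend total.
query = """
WITH ordered AS (
  SELECT
    user_id,
    order_date,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS order_num,
    LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_date,
    SUM(amount)     OVER (PARTITION BY user_id ORDER BY order_date) AS running_spend
  FROM orders
)
SELECT user_id, order_date, order_num,
       julianday(order_date) - julianday(prev_date) AS days_since_prev,
       running_spend
FROM ordered;
"""
for row in conn.execute(query):
    print(row)
```

Explaining why the query is partitioned and ordered this way, not just what it returns, is exactly the commentary the portfolio advice above asks for.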
Excel or Google Sheets (VLOOKUP → INDEX/MATCH, pivot tables, named ranges, data validation)
Tool
Despite what data science marketing says, most analysts spend 20-40% of their time in spreadsheets. The hiring bar is not basic cell editing — it is knowing the difference between VLOOKUP and INDEX/MATCH (and why the latter is usually better), building pivot tables stakeholders can filter without breaking, and designing a spreadsheet model someone else can audit.
How to prove it
A downloadable spreadsheet model (Google Sheets link in your portfolio) that demonstrates advanced functions, clean layout, and structured data validation — not a dump of raw data with color-coded cells.
Time to acquire
2-4 weeks of targeted practice; focus on pivot tables, INDEX/MATCH, and basic financial modeling exercises.
Data visualization (Tableau, Power BI, or Looker)
Tool
Building a chart that accurately represents data is not the skill — the skill is building a dashboard that a non-technical stakeholder can use correctly without training and cannot easily misinterpret. Hiring managers evaluate chart choice, axis labeling, filter design, and whether you understand the difference between a chart for exploration and a chart for decisions.
How to prove it
A published Tableau Public or Power BI dashboard built on a real dataset with a coherent narrative ("here is the question, here is what I found, here is the recommended action"). Not a default template with different colors.
Time to acquire
4-8 weeks to reach proficiency with one tool; Tableau Public has free access and a gallery to benchmark your work against.
Statistics fundamentals (distributions, hypothesis testing, A/B test interpretation)
Domain
The ability to design and interpret an A/B test — including statistical significance, p-values, confidence intervals, and common misinterpretations (peeking, multiple comparisons) — is tested in analytics interviews across consumer tech, e-commerce, and SaaS. Getting this wrong in practice costs companies millions in bad product decisions.
How to prove it
A written case study (blog post or portfolio piece) where you describe an A/B test, the statistical design decisions you made, how you handled the results, and one limitation of the test you would call out proactively.
Time to acquire
6-10 weeks; Khan Academy Statistics and Probability plus one practical project running a real experiment (even on a personal email list).
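For a concrete reference point, here is a minimal sketch of the arithmetic behind an A/B test readout: a two-proportion z-test plus per-group confidence intervals, using statsmodels. The counts are invented; in a real test, fix the sample size with a power analysis before launch rather than peeking at interim results.

```python
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

# Hypothetical conversion counts for a variant/control comparison.
conversions = [1210, 1100]    # variant, control
exposures = [22500, 22500]    # users assigned to each arm

# Two-proportion z-test: is the difference in rates plausibly chance?
stat, p_value = proportions_ztest(conversions, exposures)
print(f"z = {stat:.2f}, p = {p_value:.4f}")

# 95% confidence intervals for each arm's conversion rate.
for name, conv, n in zip(["variant", "control"], conversions, exposures):
    lo, hi = proportion_confint(conv, n, alpha=0.05)
    print(f"{name}: {conv / n:.2%} (95% CI {lo:.2%} to {hi:.2%})")
```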
Python for data analysis (pandas, numpy, matplotlib/seaborn)
Technical
Python has crossed from "nice-to-have" to table stakes at most mid-size and large tech companies. The specific expectation: using pandas for data manipulation (groupby, merge, pivot), numpy for numerical operations, and at least one visualization library. Not machine learning — pure data wrangling and EDA.
How to prove it
A Jupyter notebook (or GitHub-rendered) that does a full EDA on a real dataset: data loading, cleaning decisions with rationale, summary statistics, visualizations with labeled axes, and a written conclusion. Not just code — write the narrative.
Time to acquire
6-12 weeks; "Python for Data Analysis" (Wes McKinney, the pandas author) is the definitive reference.
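A minimal sketch of the operations that expectation names (merge, groupby, pivot) follows; the file names and columns are placeholders for whatever real dataset you pick.

```python
import pandas as pd

# Placeholder files; substitute any real dataset with a similar shape.
orders = pd.read_csv("orders.csv", parse_dates=["order_date"])
users = pd.read_csv("users.csv")

# Join order facts to user attributes, then summarize spend by segment.
df = orders.merge(users, on="user_id", how="left")
print(df.groupby("segment")["amount"].agg(["count", "mean", "sum"]))

# Pivot: monthly revenue per segment, the shape stakeholders usually want.
df["month"] = df["order_date"].dt.to_period("M")
print(df.pivot_table(index="month", columns="segment",
                     values="amount", aggfunc="sum").head())
```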
Business acumen (translating data into decisions stakeholders act on)
Soft
The most common failure mode for data analysts is producing technically correct analyses that no one uses. The skill is framing the output for the audience: what decision does this inform, what is the recommendation, and what is the one number a VP will remember in a meeting. This is consistently ranked as the hardest thing to hire for and the easiest thing to demonstrate in an interview.
How to prove it
In behavioral interviews: describe a time when your analysis changed a decision — not when you delivered an analysis. The distinction (changed a decision vs. delivered) is how interviewers calibrate business impact vs. reporting.
Time to acquire
Ongoing; develops through practice in cross-functional settings. Reading business press (FT, HBR) to understand how executives think accelerates development.
Data cleaning and ETL fundamentals (null handling, deduplication, joins across sources)
Technical
Real-world data is dirty. Analysts who cannot handle missing values intelligently (not just dropping nulls), deduplicate records across sources, or join tables with inconsistent keys produce analyses that are technically wrong. This is a hiring filter at companies with real data — not a nicety.
How to prove it
Document a data cleaning decision in your portfolio with specifics: "I found 3% null rate in the user_id column, determined from context that these were pre-registration events, and imputed 'anonymous' rather than dropping them — here is why." The decision-making matters more than the code.
Time to acquire
4-8 weeks; best learned through a messy real-world dataset (Kaggle competitions have many deliberately dirty ones).
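In pandas, the documented decision above might look like the sketch below. The file, the null rate, and the pre-registration rationale are all taken from the hypothetical in this entry.

```python
import pandas as pd

events = pd.read_csv("events.csv")  # hypothetical event log

# Quantify the problem before deciding anything.
null_rate = events["user_id"].isna().mean()
print(f"user_id null rate: {null_rate:.1%}")

# Decision: context says these are pre-registration events, so they
# carry signal. Impute a sentinel value instead of dropping the rows.
events["user_id"] = events["user_id"].fillna("anonymous")

# Deduplicate repeated deliveries of the same event on a business key.
events = events.drop_duplicates(subset=["user_id", "event_ts"])
```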
dbt (data build tool) for analytics engineering
Tool
dbt is the fastest-growing tool in the modern data stack and is now explicitly required in an increasing share of analytics job postings. It brings software engineering practices (version control, testing, documentation) to SQL transformations. Analytics engineers who know dbt can build reliable data models that scale; analysts who only know ad-hoc SQL cannot.
How to prove it
A public dbt project (GitHub) with at least one model, one test (schema test or custom test), and a completed schema.yml with column-level documentation. Not a demo — something that runs against a real or realistic dataset.
Time to acquire
4-8 weeks with the official dbt Learn courses (free) and a personal project using a cloud data warehouse.
R or advanced Python (statistical modeling, regression analysis)
Technical
Relevant at companies with a strong quantitative culture (tech analytics, financial analytics, research-heavy orgs). R remains the standard for statistical modeling and publication-quality figures in academic-adjacent analytics work. For pure product analytics at tech companies, Python covers the same ground.
How to prove it
An analysis notebook using a regression model — linear or logistic — with proper train/test split, model evaluation metrics, and a plain-language interpretation of coefficients for a non-technical audience.
Time to acquire
8-12 weeks to reach useful proficiency in R; faster for Python users adding scikit-learn.
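A minimal sketch of the notebook described above, assuming an invented churn dataset: logistic regression with a train/test split, and coefficients translated into odds ratios a non-technical reader can follow.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("churn.csv")  # hypothetical file and feature names
X = df[["tenure_months", "support_tickets", "monthly_spend"]]
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Odds ratios are easier to narrate than raw coefficients, e.g. "each
# extra support ticket multiplies the odds of churning by 1.4".
for name, coef in zip(X.columns, model.coef_[0]):
    print(f"{name}: odds ratio {np.exp(coef):.2f}")
```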
Stakeholder communication and presentation skills
Soft
The ability to present an analysis to a room of non-technical stakeholders — verbally, not just in a slide — and field questions without getting defensive is one of the highest-value skills for senior analyst progression. Most analytics candidates underinvest in this relative to technical skills.
How to prove it
A recorded or written case study where you present findings: set up the question, show the key finding early (not at the end), use one visualization per insight, and state a clear recommendation. Share this in your portfolio or be ready to walk through it in an interview.
Time to acquire
Ongoing; practice with real internal or external presentations, not PowerPoint templates.
Git basics for versioning analysis work
Tool
Analytics teams that version their SQL and Python in Git catch errors faster and collaborate better. Not a universal requirement — some companies still email CSVs — but increasingly expected at companies with a modern data stack. Shows engineering maturity.
How to prove it
A GitHub profile with SQL or Python analysis files that have meaningful commit messages (not just "update notebook") and at least a basic README explaining the analysis.
Time to acquire
1-2 weeks for the basics needed in an analytics context.
Product metrics fluency (DAU, retention, LTV, CAC, funnel analysis)
Domain
Product analytics roles specifically look for candidates who speak the language of product metrics without needing a dictionary. Knowing what DAU/MAU ratio signals, how to calculate LTV correctly, and what a healthy retention curve looks like for a SaaS vs. e-commerce product is prerequisite knowledge at consumer tech companies.
How to prove it
A written or portfolio piece analyzing a publicly available product dataset (e.g., app store data, Steam review data) using at least two product metrics. Interpret what the metric tells you about product health, not just its value.
Time to acquire
4-8 weeks reading product analytics content (Lenny's Newsletter covers this well) plus hands-on analysis.
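As a sketch, here is how two of these metrics fall out of a raw event log in pandas: DAU/MAU stickiness and a day-7 retention figure. The schema (user_id, event_date) is assumed, and both calculations are deliberately simplified.

```python
import pandas as pd

events = pd.read_csv("events.csv", parse_dates=["event_date"])

# Stickiness: average daily uniques over trailing-30-day uniques.
dau = events.groupby(events["event_date"].dt.date)["user_id"].nunique().mean()
cutoff = events["event_date"].max() - pd.Timedelta(days=29)
mau = events.loc[events["event_date"] >= cutoff, "user_id"].nunique()
print(f"DAU/MAU: {dau / mau:.2f}")

# Day-7 retention: of all users, what share was active exactly 7 days
# after first being seen? (Ignores cohorts too recent to reach day 7.)
first_seen = (events.groupby("user_id")["event_date"].min()
              .rename("cohort_date").reset_index())
joined = events.merge(first_seen, on="user_id")
joined["day_n"] = (joined["event_date"] - joined["cohort_date"]).dt.days
d7 = joined.loc[joined["day_n"] == 7, "user_id"].nunique() / len(first_seen)
print(f"Day-7 retention: {d7:.1%}")
```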
LLM-assisted analysis (using AI for data exploration, code generation, narrative writing)
Tool
Analysts who use LLMs effectively (Claude, ChatGPT, Copilot) to accelerate SQL writing, interpret visualizations, draft stakeholder narratives, and explore datasets faster are measurably more productive. The skill is knowing when AI output is reliable and when to check it — not using it blindly.
How to prove it
In an interview: describe a specific workflow where you use an LLM — what prompts you use, what you verify manually, and one example where the AI was wrong and how you caught it.
Time to acquire
2-4 weeks of deliberate integration into your analysis workflow.
Cloud data warehouses (Snowflake, BigQuery, Redshift)
Tool
The majority of modern analytics work runs on a cloud data warehouse, not on-premise databases. Snowflake, BigQuery, and Redshift have specific syntax and performance characteristics that differ from standard SQL. Companies increasingly list specific warehouse experience as a requirement.
How to prove it
A project or analysis that ran against one of these warehouses — even using a free tier (BigQuery has a generous free tier). Mention specific features you used: partitioning, clustering, query cost optimization.
Time to acquire
2-4 weeks; the SQL is largely portable, but warehouse-specific features take targeted study.
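One warehouse-specific habit worth demonstrating is estimating cost before execution. The sketch below, assuming a configured GCP project and credentials, uses a BigQuery dry run to report bytes scanned without running the query; githubarchive is a real public dataset, though the exact table is illustrative.

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes project/credentials in your environment

sql = """
SELECT repo.name, COUNT(*) AS events
FROM `githubarchive.day.20260101`  -- public dataset; exact table illustrative
GROUP BY repo.name
ORDER BY events DESC
LIMIT 10
"""

# Dry run: BigQuery validates the query and reports the bytes it would
# scan, which maps directly to on-demand query cost.
job = client.query(sql, job_config=bigquery.QueryJobConfig(dry_run=True))
print(f"Would scan {job.total_bytes_processed / 1e9:.2f} GB")
```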
ML model basics (classification, regression — interpret outputs, not build from scratch)
Technical
Senior analysts at data-mature companies are increasingly asked to evaluate or interpret ML model outputs — feature importance, model drift, precision-recall tradeoffs — even when a data scientist owns the model. Not building ML from scratch, but literacy in what the outputs mean and how to validate them.
How to prove it
A notebook where you trained a simple classification model (logistic regression or decision tree), evaluated it with a confusion matrix, and wrote a plain-English interpretation of what the model is doing and where it would fail.
Time to acquire
6-10 weeks; fast.ai or "Hands-On Machine Learning with Scikit-Learn" are practical starting points.
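A sketch of that literacy in scikit-learn, using a built-in toy dataset so it runs as-is: a shallow decision tree, a confusion matrix, per-class precision and recall, and feature importances. The plain-English interpretation is the part you write yourself.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
pred = clf.predict(X_test)

# Rows are actual classes, columns predicted; the off-diagonal cells
# (false negatives vs. false positives) are what stakeholders ask about.
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))  # precision/recall per class

# Which inputs drive the tree's splits?
top = sorted(zip(X.columns, clf.feature_importances_),
             key=lambda t: -t[1])[:5]
print(top)
```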
Certifications: what's worth it
Google Data Analytics Certificate (Coursera)
Google / Coursera • $200-$300 (Coursera subscription) • 80-120 hours over 6 months
The most recognizable entry-level analytics certification and genuinely useful for career-changers who need to establish foundational credibility. Covers SQL, spreadsheets, Tableau, and R basics. Does not move the needle for candidates with 2+ years of analytics experience — at that point, your portfolio and past projects are the signal. For someone pivoting from a non-technical background, this is the right starting point.
Tableau Desktop Specialist
Tableau (Salesforce) • $250 • 20-40 hours prep
Useful if Tableau is explicitly listed as a job requirement and you are competing against candidates who already have years of hands-on experience. Does not substitute for a strong Tableau Public portfolio — hiring managers look at your published dashboards before your cert. If you have neither, prioritize building the portfolio first, then consider the cert.
Microsoft Power BI Data Analyst Associate (PL-300)
Microsoft • $165 • 40-70 hours prep
Relevant in Microsoft-ecosystem environments (enterprise, finance, and healthcare shops running Office 365). At tech companies running Looker, Tableau, or Metabase, this cert signals very little. If your target employers are in the Microsoft ecosystem, it is a credible signal. If not, invest the time in building dashboards instead.
dbt Analytics Engineering Certification
dbt Labs • $200 • 30-60 hours prep
One of the few data analytics certifications that has genuine signal in 2026 — because dbt itself is genuinely valued and the cert validates real hands-on proficiency (not just multiple-choice knowledge). If you are targeting analytics engineering or senior analytics roles at tech companies with a modern data stack, this certification is worth it.
Certified Analytics Professional (CAP)
INFORMS • $695 (member) / $895 (non-member) • 100+ hours prep plus eligibility requirements
Almost never mentioned by tech analytics hiring managers. Valuable in government contracting and traditional enterprise analytics roles where formal credentials matter more than portfolio work. For anyone targeting a tech or data-forward company, the time and cost are better spent on dbt, SQL depth, or a portfolio project.
Salesforce Certified Business Analyst
Salesforce • $200 • 40-60 hours prep
Relevant only if your target role involves CRM analytics or Salesforce-specific reporting. It signals Salesforce platform literacy more than general analytics depth. Do not pursue this if your goal is product analytics, growth analytics, or data engineering-adjacent work.
ATS keywords that get data analysts through screening
Group these correctly on your resume. The wrong section placement costs you the match.
Core analytics tools
Where on resume: List tools in a dedicated Technical Skills section. Include specific versions or modules where relevant ("Python: pandas, numpy, matplotlib"). Do not rely on bullets alone — ATS scanners often extract skills sections separately.
Why it matters: SQL is a disqualifying omission at most analytics roles. Resumes without SQL listed explicitly are often auto-filtered regardless of experience described in bullets.
Database and data warehouse platforms
Where on resume: If you have experience with cloud data warehouses specifically, name them — "BigQuery" is more ATS-effective than "cloud data warehouse."
Why it matters: Senior and analytics engineering roles increasingly filter on specific warehouse keywords. Missing these at L4+ levels drops resume ranking.
Analytics and statistics terms
Where on resume: Use these in bullet context, not just skills lists. "Designed and analyzed 12 A/B tests across the checkout funnel" is far stronger than listing "A/B testing" in a skills block.
Why it matters: "A/B testing" is a threshold keyword at consumer tech companies. Missing it from a growth or product analytics resume is a common auto-filter trigger.
Business and domain terminology
Where on resume: Use these sparingly and only when substantiated by bullets. "Data-driven" alone is filler; "Built data-driven customer segmentation model that identified a $2M upsell opportunity" is not.
Why it matters: Business and BI keywords distinguish data analysts from data scientists in ATS categorization. Missing them can cause mis-routing to data engineering pipelines.
Methodologies and practices
Where on resume: Include ETL and data pipeline in bullets describing work on data infrastructure. "Built ETL pipeline ingesting 500K daily event rows from three sources" is concrete and keyword-rich.
Why it matters: Data pipeline and ETL keywords filter heavily for analytics engineering and senior analyst roles. Missing them narrows your reach to pure reporting roles.
How to weave these skills into resume bullets
Demonstrates: SQL and data extraction
Before: Wrote SQL queries to pull data for business reports.
After: Wrote SQL CTEs across three joined tables (orders, users, events) to build a weekly retention cohort report, replacing a manual Excel process that took 4 hours — new automated version ran in 90 seconds and caught a data discrepancy the manual process had missed for 3 months.
Demonstrates: A/B testing and experimentation
Before: Ran A/B tests to improve conversion rates.
After: Designed and analyzed a checkout-flow A/B test (n=45K users, 14-day runtime) with pre-registered hypothesis and power analysis; found a 12% lift in completion rate for the simplified form variant (95% CI: 8-16%); flagged a secondary effect on mobile drop-off that the PM team had not anticipated.
Demonstrates: Data visualization and stakeholder communication
Before: Created dashboards for the marketing team.
After: Built a Looker dashboard for the marketing VP tracking 8 paid-channel KPIs across 4 regions; replaced 6 disconnected weekly email reports, reducing time-to-insight from 3 days to 4 hours and becoming the primary decision tool for $1.2M monthly ad spend allocation.
Demonstrates: dbt and analytics engineering
Before: Helped with data modeling using dbt.
After: Migrated 14 ad-hoc SQL views into tested dbt models with schema tests and column-level documentation; reduced analyst onboarding time from 2 weeks to 3 days and eliminated 3 recurring data discrepancies that had driven weekly escalations.
Demonstrates: Business impact of analysis
Before: Performed customer segmentation analysis.
After: Segmented 180K active users by 90-day engagement and LTV quartile using K-means clustering (Python, scikit-learn); identified a high-value segment with 3x average LTV that the sales team had not targeted — subsequent campaign to that segment drove $340K in incremental ARR in Q3.
Portfolio signals that work for data analysts
A Tableau Public or Power BI published dashboard built on a real public dataset with a clear business question and recommendation
Why: Hiring managers look at Tableau Public profiles before interviews. A dashboard that tells a story (question → finding → recommendation) signals communication skill, not just tool proficiency. Generic sample dashboards from tutorials are immediately recognizable.
How to build it: Pick a topic you genuinely find interesting (sports, housing, public health). Find the corresponding public dataset (data.gov, Kaggle, Google Dataset Search). Build three views: an overview, a time-series trend, and a segment comparison. Write a text summary on the dashboard itself explaining what you found.
A GitHub repository with a Jupyter notebook doing a full end-to-end analysis on a real dataset — data cleaning decisions documented, not just code
Why: The notebook shows Python depth AND analytical thinking. Hiring managers care whether you explain the "why" of your cleaning decisions. Dropping null rows without comment signals inexperience; choosing to impute and documenting why signals maturity.
How to build it: Use a messy public dataset (NYC 311 complaints, Airbnb listings). Write the notebook in sections: (1) load and inspect, (2) cleaning decisions with explicit rationale, (3) EDA with labeled visualizations, (4) one finding written in plain English as a conclusion.
A written analysis case study (blog post or PDF) where you answer a specific business question with data — framed as a memo, not a notebook
Why: This is the closest thing to a job-relevant work sample. Most candidates can run code. Fewer can write a clear memo that states the question, the answer, the caveats, and the recommendation in under 500 words. That rarity is signal.
How to build it: Write it in Google Docs or Medium. Format: (1) question, (2) key finding in one sentence, (3) supporting evidence in 3 bullets, (4) recommendation, (5) limitations. Link to your supporting notebook.
Where to actually learn this
Python for Data Analysis (Wes McKinney)
book • paid
Written by the creator of pandas. More reference than tutorial — you will return to it repeatedly. Covers the full pandas API with examples on realistic datasets. The best single investment for Python data analysis depth.
Best for: Python for data analysis; must-have tier.
Mode Analytics SQL School
course • free
Freely available, structured SQL curriculum that covers window functions and advanced queries in a real query editor. Better than LeetCode for analytics-specific SQL because the problems are framed as business questions, not abstract algorithms.
Best for: SQL depth; must-have tier.
dbt Learn (official dbt Labs curriculum)
course • free
The official dbt learning path, freely accessible. Covers dbt fundamentals, testing, documentation, and deployment. Taught through hands-on projects, not lectures. The fastest path to a credible dbt portfolio project.
Best for: dbt analytics engineering; nice-to-have tier.
Storytelling with Data (Cole Nussbaumer Knaflic)
book • paid
The most useful book on data visualization for non-statisticians. Teaches which chart to use when, how to reduce chart junk, and how to direct the reader's attention. Before-and-after examples are directly actionable.
Best for: Data visualization and stakeholder communication; must-have tier.
StrataScratch (Python and SQL interview practice)
course • mixed
Unlike LeetCode, StrataScratch problems are framed as business questions with real company contexts (Airbnb, Lyft, Microsoft). Window function and cohort analysis problems map directly to what analytics interview screens test.
Best for: SQL and Python interview prep; must-have tier.
Google BigQuery sandbox (free tier for hands-on SQL and data warehouse practice)
documentation • free
BigQuery's sandbox gives you 1TB of free monthly query processing with access to public datasets. The best way to learn cloud data warehouse SQL patterns without a paid subscription. Use it with public datasets like Wikipedia or GitHub activity.
Best for: Cloud data warehouse skills; emerging tier.
Lenny's Newsletter (product metrics and analytics patterns)
community • mixed
Covers product metrics, growth frameworks, and experimentation design with examples from real consumer tech companies. Not a teaching resource — context-building for analysts who want to understand how PMs and executives think about data.
Best for: Product metrics fluency; nice-to-have tier.
Khan Academy Statistics and Probability
course • free
The best free statistics curriculum for non-statisticians. Goes from basic probability through hypothesis testing and confidence intervals. The explanations are visual and intuitive — better than most textbooks for building genuine understanding vs. procedural knowledge.
Best for: Statistics fundamentals; must-have tier.