Skills for Data Analysts
SQL depth, storytelling, and the tools that actually matter
Data analyst job postings list everything from SQL to machine learning. Most of it is noise. The real signal is whether you can go from a vague business question to a defensible analysis with stakeholder-ready output — faster than the PM can Google it. This guide covers what the hiring bar actually looks like across analytics roles in 2026, which certifications are worth the time, and how to demonstrate analytical depth on a resume without overstating your experience.
Must-have: 7 skills • Nice-to-have: 5 • Emerging: 3
SQL (window functions, CTEs, subqueries, query optimization)
Technical
SQL is the non-negotiable floor for every data analyst role. But "knows SQL" is not a skill — hiring managers mean specifically: window functions (ROW_NUMBER, RANK, LAG, LEAD, SUM OVER), CTEs for readable multi-step queries, and the ability to explain why a query is slow and how to fix it. Surface-level SELECT/JOIN knowledge will not clear a technical screen at any company that takes analytics seriously.
How to prove it
A GitHub repository or data portfolio with a query file that uses at least one window function on a real dataset (NYC 311, NYC Taxi, Kaggle public datasets). Include a comment explaining why you structured the query this way, not just what it does.
Time to acquire
4-8 weeks of deliberate practice against a real database with 1M+ rows; Mode Analytics, StrataScratch, and DataLemur all have practice sets specifically calibrated to interview difficulty.
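To make that bar concrete, here is a minimal runnable sketch of the query pattern screens actually test (a CTE feeding ROW_NUMBER, LAG, and a running SUM OVER), executed through Python's built-in sqlite3 module; window functions require SQLite 3.25 or newer. The orders table and its values are invented for illustration.

```python
import sqlite3

# Hypothetical orders table, small enough to read at a glance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (user_id INTEGER, order_date TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, '2026-01-03', 40.0), (1, '2026-01-10', 55.0),
  (2, '2026-01-05', 20.0), (2, '2026-01-06', 35.0), (2, '2026-01-20', 15.0);
""")

# CTE + window functions: number each user's orders, measure the gap
# since the previous order with LAG, and keep a running spend total.
query = """
WITH ordered AS (
  SELECT
    user_id,
    order_date,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS order_num,
    LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_date,
    SUM(amount)     OVER (PARTITION BY user_id ORDER BY order_date) AS running_spend
  FROM orders
)
SELECT user_id, order_date, order_num,
       julianday(order_date) - julianday(prev_date) AS days_since_prev,
       running_spend
FROM ordered;
"""
for row in conn.execute(query):
    print(row)
```

Explaining why the query is partitioned and ordered this way, not just what it returns, is exactly the commentary the portfolio advice above asks for.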
Excel or Google Sheets (VLOOKUP → INDEX/MATCH, pivot tables, named ranges, data validation)
Tool
Despite what data science marketing says, most analysts spend 20-40% of their time in spreadsheets. The hiring bar is not basic cell editing — it is knowing the difference between VLOOKUP and INDEX/MATCH (and why the latter is usually better), building pivot tables stakeholders can filter without breaking, and designing a spreadsheet model someone else can audit.
How to prove it
A downloadable spreadsheet model (Google Sheets link in your portfolio) that demonstrates advanced functions, clean layout, and structured data validation — not a dump of raw data with color-coded cells.
Time to acquire
2-4 weeks of targeted practice; focus on pivot tables, INDEX/MATCH, and basic financial modeling exercises.
Data visualization (Tableau, Power BI, or Looker)
Tool
Building a chart that accurately represents data is not the skill — the skill is building a dashboard that a non-technical stakeholder can use correctly without training and cannot easily misinterpret. Hiring managers evaluate chart choice, axis labeling, filter design, and whether you understand the difference between a chart for exploration and a chart for decisions.
How to prove it
A published Tableau Public or Power BI dashboard built on a real dataset with a coherent narrative ("here is the question, here is what I found, here is the recommended action"). Not a default template with different colors.
Time to acquire
4-8 weeks to reach proficiency with one tool; Tableau Public has free access and a gallery to benchmark your work against.
Statistics fundamentals (distributions, hypothesis testing, A/B test interpretation)
Domain
The ability to design and interpret an A/B test — including statistical significance, p-values, confidence intervals, and common misinterpretations (peeking, multiple comparisons) — is tested in analytics interviews across consumer tech, e-commerce, and SaaS. Getting this wrong in practice costs companies millions in bad product decisions.
How to prove it
A written case study (blog post or portfolio piece) where you describe an A/B test, the statistical design decisions you made, how you handled the results, and one limitation of the test you would call out proactively.
Time to acquire
6-10 weeks; Khan Academy Statistics and Probability plus one practical project running a real experiment (even on a personal email list).
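For a concrete reference point, here is a minimal sketch of the arithmetic behind an A/B test readout: a two-proportion z-test plus per-group confidence intervals, using statsmodels. The counts are invented; in a real test, fix the sample size with a power analysis before launch rather than peeking at interim results.

```python
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

# Hypothetical conversion counts for a variant/control comparison.
conversions = [1210, 1100]    # variant, control
exposures = [22500, 22500]    # users assigned to each arm

# Two-proportion z-test: is the difference in rates plausibly chance?
stat, p_value = proportions_ztest(conversions, exposures)
print(f"z = {stat:.2f}, p = {p_value:.4f}")

# 95% confidence intervals for each arm's conversion rate.
for name, conv, n in zip(["variant", "control"], conversions, exposures):
    lo, hi = proportion_confint(conv, n, alpha=0.05)
    print(f"{name}: {conv / n:.2%} (95% CI {lo:.2%} to {hi:.2%})")
```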
Python for data analysis (pandas, numpy, matplotlib/seaborn)
Technical
Python has crossed from "nice-to-have" to table stakes at most mid-size and large tech companies. The specific expectation: using pandas for data manipulation (groupby, merge, pivot), numpy for numerical operations, and at least one visualization library. Not machine learning — pure data wrangling and EDA.
How to prove it
A Jupyter notebook (or GitHub-rendered) that does a full EDA on a real dataset: data loading, cleaning decisions with rationale, summary statistics, visualizations with labeled axes, and a written conclusion. Not just code — write the narrative.
Time to acquire
6-12 weeks; "Python for Data Analysis" (Wes McKinney, the pandas author) is the definitive reference.
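A minimal sketch of the operations that expectation names (merge, groupby, pivot) follows; the file names and columns are placeholders for whatever real dataset you pick.

```python
import pandas as pd

# Placeholder files; substitute any real dataset with a similar shape.
orders = pd.read_csv("orders.csv", parse_dates=["order_date"])
users = pd.read_csv("users.csv")

# Join order facts to user attributes, then summarize spend by segment.
df = orders.merge(users, on="user_id", how="left")
print(df.groupby("segment")["amount"].agg(["count", "mean", "sum"]))

# Pivot: monthly revenue per segment, the shape stakeholders usually want.
df["month"] = df["order_date"].dt.to_period("M")
print(df.pivot_table(index="month", columns="segment",
                     values="amount", aggfunc="sum").head())
```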
Business acumen (translating data into decisions stakeholders act on)
Soft
The most common failure mode for data analysts is producing technically correct analyses that no one uses. The skill is framing the output for the audience: what decision does this inform, what is the recommendation, and what is the one number a VP will remember in a meeting. This is consistently ranked as the hardest thing to hire for and the easiest thing to demonstrate in an interview.
How to prove it
In behavioral interviews: describe a time when your analysis changed a decision — not when you delivered an analysis. The distinction (changed a decision vs. delivered) is how interviewers calibrate business impact vs. reporting.
Time to acquire
Ongoing; develops through practice in cross-functional settings. Reading business press (FT, HBR) to understand how executives think accelerates development.
Data cleaning and ETL fundamentals (null handling, deduplication, joins across sources)
Technical
Real-world data is dirty. Analysts who cannot handle missing values intelligently (not just dropping nulls), deduplicate records across sources, or join tables with inconsistent keys produce analyses that are technically wrong. This is a hiring filter at companies with real data — not a nicety.
How to prove it
Document a data cleaning decision in your portfolio with specifics: "I found 3% null rate in the user_id column, determined from context that these were pre-registration events, and imputed 'anonymous' rather than dropping them — here is why." The decision-making matters more than the code.
Time to acquire
4-8 weeks; best learned through a messy real-world dataset (Kaggle competitions have many deliberately dirty ones).
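In pandas, the documented decision above might look like the sketch below. The file, the null rate, and the pre-registration rationale are all taken from the hypothetical in this entry.

```python
import pandas as pd

events = pd.read_csv("events.csv")  # hypothetical event log

# Quantify the problem before deciding anything.
null_rate = events["user_id"].isna().mean()
print(f"user_id null rate: {null_rate:.1%}")

# Decision: context says these are pre-registration events, so they
# carry signal. Impute a sentinel value instead of dropping the rows.
events["user_id"] = events["user_id"].fillna("anonymous")

# Deduplicate repeated deliveries of the same event on a business key.
events = events.drop_duplicates(subset=["user_id", "event_ts"])
```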
dbt (data build tool) for analytics engineering
Tool
dbt is the fastest-growing tool in the modern data stack and is now explicitly required in an increasing share of analytics job postings. It brings software engineering practices (version control, testing, documentation) to SQL transformations. Analytics engineers who know dbt can build reliable data models that scale; analysts who only know ad-hoc SQL cannot.
How to prove it
A public dbt project (GitHub) with at least one model, one test (schema test or custom test), and a completed schema.yml with column-level documentation. Not a demo — something that runs against a real or realistic dataset.
Time to acquire
4-8 weeks with the official dbt Learn courses (free) and a personal project using a cloud data warehouse.
R or advanced Python (statistical modeling, regression analysis)
Technical
Relevant at companies with a strong quantitative culture (tech analytics, financial analytics, research-heavy orgs). R remains the standard for statistical modeling and publication-quality figures in academic-adjacent analytics work. For pure product analytics at tech companies, Python covers the same ground.
How to prove it
An analysis notebook using a regression model — linear or logistic — with proper train/test split, model evaluation metrics, and a plain-language interpretation of coefficients for a non-technical audience.
Time to acquire
8-12 weeks to reach useful proficiency in R; faster for Python users adding scikit-learn.
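A minimal sketch of the notebook described above, assuming an invented churn dataset: logistic regression with a train/test split, and coefficients translated into odds ratios a non-technical reader can follow.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("churn.csv")  # hypothetical file and feature names
X = df[["tenure_months", "support_tickets", "monthly_spend"]]
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Odds ratios are easier to narrate than raw coefficients, e.g. "each
# extra support ticket multiplies the odds of churning by 1.4".
for name, coef in zip(X.columns, model.coef_[0]):
    print(f"{name}: odds ratio {np.exp(coef):.2f}")
```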
Stakeholder communication and presentation skills
Soft
The ability to present an analysis to a room of non-technical stakeholders — verbally, not just in a slide — and field questions without getting defensive is one of the highest-value skills for senior analyst progression. Most analytics candidates underinvest in this relative to technical skills.
How to prove it
A recorded or written case study where you present findings: set up the question, show the key finding early (not at the end), use one visualization per insight, and state a clear recommendation. Share this in your portfolio or be ready to walk through it in an interview.
Time to acquire
Ongoing; practice with real internal or external presentations, not PowerPoint templates.
Git basics for versioning analysis work
Tool
Analytics teams that version their SQL and Python in Git catch errors faster and collaborate better. Not a universal requirement — some companies still email CSVs — but increasingly expected at companies with a modern data stack. Shows engineering maturity.
How to prove it
A GitHub profile with SQL or Python analysis files that have meaningful commit messages (not just "update notebook") and at least a basic README explaining the analysis.
Time to acquire
1-2 weeks for the basics needed in an analytics context.
Product metrics fluency (DAU, retention, LTV, CAC, funnel analysis)
Domain
Product analytics roles specifically look for candidates who speak the language of product metrics without needing a dictionary. Knowing what DAU/MAU ratio signals, how to calculate LTV correctly, and what a healthy retention curve looks like for a SaaS vs. e-commerce product is prerequisite knowledge at consumer tech companies.
How to prove it
A written or portfolio piece analyzing a publicly available product dataset (e.g., app store data, Steam review data) using at least two product metrics. Interpret what the metric tells you about product health, not just its value.
Time to acquire
4-8 weeks reading product analytics content (Lenny's Newsletter covers this well) plus hands-on analysis.
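As a sketch, here is how two of these metrics fall out of a raw event log in pandas: DAU/MAU stickiness and a day-7 retention figure. The schema (user_id, event_date) is assumed, and both calculations are deliberately simplified.

```python
import pandas as pd

events = pd.read_csv("events.csv", parse_dates=["event_date"])

# Stickiness: average daily uniques over trailing-30-day uniques.
dau = events.groupby(events["event_date"].dt.date)["user_id"].nunique().mean()
cutoff = events["event_date"].max() - pd.Timedelta(days=29)
mau = events.loc[events["event_date"] >= cutoff, "user_id"].nunique()
print(f"DAU/MAU: {dau / mau:.2f}")

# Day-7 retention: of all users, what share was active exactly 7 days
# after first being seen? (Ignores cohorts too recent to reach day 7.)
first_seen = (events.groupby("user_id")["event_date"].min()
              .rename("cohort_date").reset_index())
joined = events.merge(first_seen, on="user_id")
joined["day_n"] = (joined["event_date"] - joined["cohort_date"]).dt.days
d7 = joined.loc[joined["day_n"] == 7, "user_id"].nunique() / len(first_seen)
print(f"Day-7 retention: {d7:.1%}")
```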
LLM-assisted analysis (using AI for data exploration, code generation, narrative writing)
Tool
Analysts who use LLMs effectively (Claude, ChatGPT, Copilot) to accelerate SQL writing, interpret visualizations, draft stakeholder narratives, and explore datasets faster are measurably more productive. The skill is knowing when AI output is reliable and when to check it — not using it blindly.
How to prove it
In an interview: describe a specific workflow where you use an LLM — what prompts you use, what you verify manually, and one example where the AI was wrong and how you caught it.
Time to acquire
2-4 weeks of deliberate integration into your analysis workflow.
Cloud data warehouses (Snowflake, BigQuery, Redshift)
Tool
The majority of modern analytics work runs on a cloud data warehouse, not on-premise databases. Snowflake, BigQuery, and Redshift have specific syntax and performance characteristics that differ from standard SQL. Companies increasingly list specific warehouse experience as a requirement.
How to prove it
A project or analysis that ran against one of these warehouses — even using a free tier (BigQuery has a generous free tier). Mention specific features you used: partitioning, clustering, query cost optimization.
Time to acquire
2-4 weeks; the SQL is largely portable, but warehouse-specific features take targeted study.
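One warehouse-specific habit worth demonstrating is estimating cost before execution. The sketch below, assuming a configured GCP project and credentials, uses a BigQuery dry run to report bytes scanned without running the query; githubarchive is a real public dataset, though the exact table is illustrative.

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes project/credentials in your environment

sql = """
SELECT repo.name, COUNT(*) AS events
FROM `githubarchive.day.20260101`  -- public dataset; exact table illustrative
GROUP BY repo.name
ORDER BY events DESC
LIMIT 10
"""

# Dry run: BigQuery validates the query and reports the bytes it would
# scan, which maps directly to on-demand query cost.
job = client.query(sql, job_config=bigquery.QueryJobConfig(dry_run=True))
print(f"Would scan {job.total_bytes_processed / 1e9:.2f} GB")
```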
ML model basics (classification, regression — interpret outputs, not build from scratch)
Technical
Senior analysts at data-mature companies are increasingly asked to evaluate or interpret ML model outputs — feature importance, model drift, precision-recall tradeoffs — even when a data scientist owns the model. Not building ML from scratch, but literacy in what the outputs mean and how to validate them.
How to prove it
A notebook where you trained a simple classification model (logistic regression or decision tree), evaluated it with a confusion matrix, and wrote a plain-English interpretation of what the model is doing and where it would fail.
Time to acquire
6-10 weeks; fast.ai or "Hands-On Machine Learning with Scikit-Learn" are practical starting points.
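A sketch of that literacy in scikit-learn, using a built-in toy dataset so it runs as-is: a shallow decision tree, a confusion matrix, per-class precision and recall, and feature importances. The plain-English interpretation is the part you write yourself.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
pred = clf.predict(X_test)

# Rows are actual classes, columns predicted; the off-diagonal cells
# (false negatives vs. false positives) are what stakeholders ask about.
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))  # precision/recall per class

# Which inputs drive the tree's splits?
top = sorted(zip(X.columns, clf.feature_importances_),
             key=lambda t: -t[1])[:5]
print(top)
```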
Certifications: what's worth it
Google Data Analytics Certificate (Coursera)
Google / Coursera • $200-$300 (Coursera subscription) • 80-120 hours over 6 months
The most recognizable entry-level analytics certification and genuinely useful for career-changers who need to establish foundational credibility. Covers SQL, spreadsheets, Tableau, and R basics. Does not move the needle for candidates with 2+ years of analytics experience — at that point, your portfolio and past projects are the signal. For someone pivoting from a non-technical background, this is the right starting point.
Tableau Desktop Specialist
Tableau (Salesforce) • $250 • 20-40 hours prep
Useful if Tableau is explicitly listed as a job requirement and you are competing against candidates who already have years of hands-on experience. Does not substitute for a strong Tableau Public portfolio — hiring managers look at your published dashboards before your cert. If you have neither, prioritize building the portfolio first, then consider the cert.
Microsoft Power BI Data Analyst Associate (PL-300)
Microsoft • $165 • 40-70 hours prep
Relevant in Microsoft-ecosystem environments (enterprise, finance, and healthcare shops running Office 365). At tech companies running Looker, Tableau, or Metabase, this cert signals very little. If your target employers are in the Microsoft ecosystem, it is a credible signal. If not, invest the time in building dashboards instead.
dbt Analytics Engineering Certification
dbt Labs • $200 • 30-60 hours prep
One of the few data analytics certifications that has genuine signal in 2026 — because dbt itself is genuinely valued and the cert validates real hands-on proficiency (not just multiple-choice knowledge). If you are targeting analytics engineering or senior analytics roles at tech companies with a modern data stack, this certification is worth it.
Certified Analytics Professional (CAP)
INFORMS • $695 (member) / $895 (non-member) • 100+ hours prep plus eligibility requirements
Almost never mentioned by tech analytics hiring managers. Valuable in government contracting and traditional enterprise analytics roles where formal credentials matter more than portfolio work. For anyone targeting a tech or data-forward company, the time and cost are better spent on dbt, SQL depth, or a portfolio project.
Salesforce Certified Business Analyst
Salesforce • $200 • 40-60 hours prep
Relevant only if your target role involves CRM analytics or Salesforce-specific reporting. It signals Salesforce platform literacy more than general analytics depth. Do not pursue this if your goal is product analytics, growth analytics, or data engineering-adjacent work.
ATS keywords that get data analysts through screening
Group these correctly on your resume. The wrong section placement costs you the match.
Core analytics tools
Where on resume: List tools in a dedicated Technical Skills section. Include specific versions or modules where relevant ("Python: pandas, numpy, matplotlib"). Do not rely on bullets alone — ATS scanners often extract skills sections separately.
Why it matters: SQL is a disqualifying omission at most analytics roles. Resumes without SQL listed explicitly are often auto-filtered regardless of experience described in bullets.
Database and data warehouse platforms
Where on resume: If you have experience with cloud data warehouses specifically, name them — "BigQuery" is more ATS-effective than "cloud data warehouse."
Why it matters: Senior and analytics engineering roles increasingly filter on specific warehouse keywords. Missing these at L4+ levels drops resume ranking.
Analytics and statistics terms
Where on resume: Use these in bullet context, not just skills lists. "Designed and analyzed 12 A/B tests across the checkout funnel" is far stronger than listing "A/B testing" in a skills block.
Why it matters: "A/B testing" is a threshold keyword at consumer tech companies. Missing it from a growth or product analytics resume is a common auto-filter trigger.
Business and domain terminology
Where on resume: Use these sparingly and only when substantiated by bullets. "Data-driven" alone is filler; "Built data-driven customer segmentation model that identified a $2M upsell opportunity" is not.
Why it matters: Business and BI keywords distinguish data analysts from data scientists in ATS categorization. Missing them can cause mis-routing to data engineering pipelines.
Methodologies and practices
Where on resume: Include ETL and data pipeline in bullets describing work on data infrastructure. "Built ETL pipeline ingesting 500K daily event rows from three sources" is concrete and keyword-rich.
Why it matters: Data pipeline and ETL keywords filter heavily for analytics engineering and senior analyst roles. Missing them narrows your reach to pure reporting roles.
How to weave these skills into resume bullets
Demonstrates: SQL and data extraction
Before: Wrote SQL queries to pull data for business reports.
After: Wrote SQL CTEs across three joined tables (orders, users, events) to build a weekly retention cohort report, replacing a manual Excel process that took 4 hours — new automated version ran in 90 seconds and caught a data discrepancy the manual process had missed for 3 months.
Demonstrates: A/B testing and experimentation
Before: Ran A/B tests to improve conversion rates.
After: Designed and analyzed a checkout-flow A/B test (n=45K users, 14-day runtime) with pre-registered hypothesis and power analysis; found a 12% lift in completion rate for the simplified form variant (95% CI: 8-16%); flagged a secondary effect on mobile drop-off that the PM team had not anticipated.
Demonstrates: Data visualization and stakeholder communication
Before: Created dashboards for the marketing team.
After: Built a Looker dashboard for the marketing VP tracking 8 paid-channel KPIs across 4 regions; replaced 6 disconnected weekly email reports, reducing time-to-insight from 3 days to 4 hours and becoming the primary decision tool for $1.2M monthly ad spend allocation.
Demonstrates: dbt and analytics engineering
Before: Helped with data modeling using dbt.
After: Migrated 14 ad-hoc SQL views into tested dbt models with schema tests and column-level documentation; reduced analyst onboarding time from 2 weeks to 3 days and eliminated 3 recurring data discrepancies that had driven weekly escalations.
Demonstrates: Business impact of analysis
Before: Performed customer segmentation analysis.
After: Segmented 180K active users by 90-day engagement and LTV quartile using K-means clustering (Python, scikit-learn); identified a high-value segment with 3x average LTV that the sales team had not targeted — subsequent campaign to that segment drove $340K in incremental ARR in Q3.
Portfolio signals that work for data analysts
A Tableau Public or Power BI published dashboard built on a real public dataset with a clear business question and recommendation
Why: Hiring managers look at Tableau Public profiles before interviews. A dashboard that tells a story (question → finding → recommendation) signals communication skill, not just tool proficiency. Generic sample dashboards from tutorials are immediately recognizable.
How to build it: Pick a topic you genuinely find interesting (sports, housing, public health). Find the corresponding public dataset (data.gov, Kaggle, Google Dataset Search). Build three views: an overview, a time-series trend, and a segment comparison. Write a text summary on the dashboard itself explaining what you found.
A GitHub repository with a Jupyter notebook doing a full end-to-end analysis on a real dataset — data cleaning decisions documented, not just code
Why: The notebook shows Python depth AND analytical thinking. Hiring managers care whether you explain the "why" of your cleaning decisions. Dropping null rows without comment signals inexperience; choosing to impute and documenting why signals maturity.
How to build it: Use a messy public dataset (NYC 311 complaints, Airbnb listings). Write the notebook in sections: (1) load and inspect, (2) cleaning decisions with explicit rationale, (3) EDA with labeled visualizations, (4) one finding written in plain English as a conclusion.
A written analysis case study (blog post or PDF) where you answer a specific business question with data — framed as a memo, not a notebook
Why: This is the closest thing to a job-relevant work sample. Most candidates can run code. Fewer can write a clear memo that states the question, the answer, the caveats, and the recommendation in under 500 words. That rarity is signal.
How to build it: Write it in Google Docs or Medium. Format: (1) question, (2) key finding in one sentence, (3) supporting evidence in 3 bullets, (4) recommendation, (5) limitations. Link to your supporting notebook.
Where to actually learn this
Python for Data Analysis (Wes McKinney)
book • paid
Written by the creator of pandas. More reference than tutorial — you will return to it repeatedly. Covers the full pandas API with examples on realistic datasets. The best single investment for Python data analysis depth.
Best for: Python for data analysis; must-have tier.
Mode Analytics SQL School
course • free
Freely available, structured SQL curriculum that covers window functions and advanced queries in a real query editor. Better than LeetCode for analytics-specific SQL because the problems are framed as business questions, not abstract algorithms.
Best for: SQL depth; must-have tier.
dbt Learn (official dbt Labs curriculum)
course • free
The official dbt learning path, freely accessible. Covers dbt fundamentals, testing, documentation, and deployment. Taught through hands-on projects, not lectures. The fastest path to a credible dbt portfolio project.
Best for: dbt analytics engineering; nice-to-have tier.
Storytelling with Data (Cole Nussbaumer Knaflic)
book • paid
The most useful book on data visualization for non-statisticians. Teaches which chart to use when, how to reduce chart junk, and how to direct the reader's attention. Before-and-after examples are directly actionable.
Best for: Data visualization and stakeholder communication; must-have tier.
StrataScratch (Python and SQL interview practice)
course • mixed
Unlike LeetCode, StrataScratch problems are framed as business questions with real company contexts (Airbnb, Lyft, Microsoft). Window function and cohort analysis problems map directly to what analytics interview screens test.
Best for: SQL and Python interview prep; must-have tier.
Google BigQuery sandbox (free tier for hands-on SQL and data warehouse practice)
documentation • free
BigQuery's sandbox gives you 1TB of free monthly query processing with access to public datasets. The best way to learn cloud data warehouse SQL patterns without a paid subscription. Use it with public datasets like Wikipedia or GitHub activity.
Best for: Cloud data warehouse skills; emerging tier.
Lenny's Newsletter (product metrics and analytics patterns)
community • mixed
Covers product metrics, growth frameworks, and experimentation design with examples from real consumer tech companies. Not a teaching resource — context-building for analysts who want to understand how PMs and executives think about data.
Best for: Product metrics fluency; nice-to-have tier.
Khan Academy Statistics and Probability
course • free
The best free statistics curriculum for non-statisticians. Goes from basic probability through hypothesis testing and confidence intervals. The explanations are visual and intuitive — better than most textbooks for building genuine understanding vs. procedural knowledge.
Best for: Statistics fundamentals; must-have tier.