Foundations: Data, Data, Everywhere
Weekly Challenge 1
Question 1
Which of the following options describes data analysis?
- Creating new ways of modeling and understanding the unknown by using raw data
- The collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision-making
- The various elements that interact with one another in order to provide, manage, store, organize, analyze, and share data
- Using facts to guide business strategy
Data analysis is the collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision-making.
Question 2
A business collects and analyzes information about its employees in order to gain insights that unlock potential and create a more productive workplace. What practice does this describe?
- Workforce observation
- Employee retention
- Team collaboration
- People analytics
People analytics — also known as human resources or workforce analytics — involves collecting and analyzing information on a company’s employees in order to gain insights that unlock potential and create a more productive workplace.
Question 3
In data analytics, a model is a group of elements that interact with one another.
- True
- False
In data analytics, a data ecosystem is a group of elements that interact with one another.
Question 4
Fill in the blank: The term _____ is defined as an intuitive understanding of something with little or no explanation.
- awareness
- personal opinion
- rational thought
- gut instinct
Gut instinct is an intuitive understanding of something with little or no explanation.
Question 5
A company defines a problem it wants to solve. Then, a data analyst gathers relevant data, analyzes it, and uses it to draw conclusions. The analyst shares their analysis with subject-matter experts, who validate the findings. Finally, a plan is put into action. What does this scenario describe?
- Identification of trends
- Data-driven decision-making
- Customer service
- Data science
This company has put data at the heart of its business strategy in order to achieve data-driven decision-making.
Question 6
Fill in the blank: The people very familiar with a business problem are called _____. They are an important part of data-driven decision-making.
- subject-matter experts
- stakeholders
- competitors
- customers
Subject-matter experts are very familiar with the business problem and can look at the results of data analysis to validate the choices being made.
Question 7
A data analyst finishes analyzing data for a marketing project. The results are clear, so they present findings to the client and ask for conclusions and recommendations. What should they have done first?
- Archived the datasets in order to keep them secure
- Created a model based on the results of the analysis
- Shared the results with subject-matter experts from the marketing team for their input
- Surveyed customers about results, conclusions, and recommendations
Including insights from people who are familiar with the business problem is an example of data-driven decision-making.
Question 8
You have recently subscribed to an online data analytics magazine. You really enjoyed an article and want to share it in the discussion forum. Which of the following would be appropriate in a post? Select all that apply.
- Including an advertisement for how to subscribe to the data analytics magazine
- Checking your post for typos or grammatical errors
- Giving credit to the original author
- Including your own thoughts about the article
Sharing informative articles is an appropriate use of the forum as long as you give credit to the original author. Also, posts should be relevant to data analytics and checked for typos and grammatical errors.
Weekly Challenge 2
Question 1
Fill in the blank: Analytical skills are defined as _____.
- qualities and characteristics associated with solving problems using facts
- an analyst's intuition, inner voice, or gut instinct
- the management of people, processes, and tools
- the ability to break things down into smaller steps
Analytical skills are qualities and characteristics associated with solving problems using facts.
Question 2
A junior data analyst is seeking out new experiences in order to gain knowledge. They watch videos and read articles about data analytics. They ask experts questions. Which analytical skill are they using?
- Understanding context
- Curiosity
- Data strategy
- Having a technical mindset
Curious people seek out new experiences, which leads to knowledge.
Question 3
Identifying the motivation behind data collection and gathering additional information are examples of which analytical skill?
- Data design
- Understanding context
- Data strategy
- A technical mindset
Identifying the motivation behind data collection and gathering additional information are examples of understanding context. Context is the condition in which something exists.
Question 4
Having a technical mindset is an analytical skill involving what?
- Managing people, processes, and tools
- Breaking things down into smaller steps or pieces
- Understanding the condition in which something exists or happens
- Balancing roles and responsibilities
Having a technical mindset involves the ability to break things down into smaller steps or pieces and work with them in an orderly and logical way.
Question 5
Which analytical skill involves managing the people, processes, and tools used in data analysis?
- Curiosity
- Understanding context
- Data strategy
- Data design
Data strategy involves managing the people, processes, and tools used in data analysis.
Question 6
Correlation is the aspect of analytical thinking that involves figuring out the specifics that help you execute a plan.
- True
- False
Correlation involves being able to identify a relationship between two or more pieces of data.
Question 7
Fill in the blank: Detail-oriented thinking is about figuring out all of the _____ that will help you execute a plan.
- specifics
- datasets
- instructions
- information
Detail-oriented thinking is about figuring out all of the specifics that will help you execute a plan.
Question 8
The five whys is a technique that involves asking, “Why?” five times in order to achieve what goal?
- Identify the root cause of a problem
- Visualize how a process should look in the future
- Use facts to guide business strategy
- Put a plan into action
In the five whys, you ask, “Why?” five times to reveal the root cause of a problem.
Question 9
What method involves examining and evaluating how a process works currently in order to get it where you want it to be in the future?
- Gap analysis
- The five whys
- Strategy
- Data visualization
Gap analysis is a method for examining and evaluating how a process works currently in order to get where you want to be in the future.
Question 10
Data-driven decision-making involves five analytical skills: curiosity, understanding context, having a technical mindset, data design, and data strategy. Each plays a role in data-driven decision-making.
- True
- False
Data-driven decision-making involves curiosity, understanding context, having a technical mindset, data design, and data strategy.
Weekly Challenge 3
Question 1
The manage stage of the data life cycle is when a business decides what kind of data it needs, how the data will be handled, and who will be responsible for it.
- True
- False
During planning, a business decides what kind of data it needs, how it will be managed throughout its life cycle, who will be responsible for it, and the optimal outcomes.
Question 2
A data analyst has finished an analysis project that involved private company data. They erase the digital files in order to keep the information secure. This describes which stage of the data life cycle?
- Manage
- Destroy
- Plan
- Archive
This describes the destroy phase, during which data analysts use secure data-erasure software and shred paper files to protect private information.
Question 3
In the analyze phase of the data life cycle, what might a data analyst do? Select all that apply.
- Use a formula to perform calculations
- Use spreadsheets to aggregate data
- Create a report from the data
- Choose the format of spreadsheet headings
In the analyze phase of the data life cycle, a data analyst might use formulas to perform calculations, create a report from the data, or use spreadsheets to aggregate data.
Question 4
Describe how the data life cycle differs from data analysis.
- The data life cycle deals with transforming and verifying data; data analysis is using the insights gained from the data.
- The data life cycle deals with the stages that data goes through during its useful life; data analysis is the process of analyzing data.
- The data life cycle deals with making informed decisions; data analysis is using tools to transform data.
- The data life cycle deals with identifying the best data to solve a problem; data analysis is about asking effective questions.
The data life cycle involves stages for identifying needs and managing data. Data analysis involves process steps to make meaning from data.
Question 5
A company takes insights provided by its data analytics team, validates them, and finalizes a strategy. They then implement a plan to solve the original business problem. This describes which step of the data analysis process?
- Analyze
- Process
- Share
- Act
The act phase is when insights are put into action.
Question 6
Fill in the blank: Spreadsheets are _____ that can be used to store, organize, and sort data.
- digital worksheets
- formulas and functions
- interactive dashboards
- visual representations
Spreadsheets are digital worksheets that can be used to store, organize, and sort data.
Question 7
Fill in the blank: A formula is a set of instructions used to perform a specified calculation; whereas a function is _____.
- a question written by the user
- a predefined operation
- a computer programming language
- a particular value
A formula is a set of instructions used to perform a specified calculation; a function is a preset command that automatically performs a specified process.
Question 8
Fill in the blank: A query is used to _____ information from a database. Select all that apply.
- update
- request
- retrieve
- analyze
A query enables data analysts to request, retrieve, and update information from a database.
Question 9
Structured query language (SQL) enables data analysts to communicate with a database.
- True
- False
SQL allows a data analyst to communicate with a database in order to retrieve and manipulate data.
Question 10
The graphical representation of information helps stakeholders understand data insights. Formulas and functions make this possible.
- True
- False
The graphical representation of information is made possible by data visualization tools. These tools help stakeholders understand data insights.
Weekly Challenge 4
Question 1
In the following spreadsheet, the column labels in row 1 are called what?
| | A | B | C | D |
---|---|---|---|---|
1 | Rank | Name | Population | County |
2 | 1 | Charlotte | 885,708 | Mecklenburg |
3 | 2 | Raleigh | 474,069 | Wake (seat), Durham |
4 | 3 | Greensboro | 296,710 | Guilford |
5 | 4 | Durham | 278,993 | Durham (seat), Wake, Orange |
6 | 5 | Winston-Salem | 247,945 | Forsyth |
7 | 6 | Fayetteville | 211,657 | Cumberland |
8 | 7 | Cary | 170,282 | Wake, Chatham |
9 | 8 | Wilmington | 123,784 | New Hanover |
10 | 9 | High Point | 112,791 | Guilford, Randolph, Davidson, Forsyth |
11 | 10 | Concord | 96,341 | Cabarrus |
- Attributes
- Characteristics
- Descriptors
- Criteria
The column labels in row 1 are attributes that refer to the data in the column. An attribute is a characteristic or quality of data used to label a column in a table.
Question 2
In the following spreadsheet, the observation of Greensboro describes all of the data in row 4.
| | A | B | C | D |
---|---|---|---|---|
1 | Rank | Name | Population | County |
2 | 1 | Charlotte | 885,708 | Mecklenburg |
3 | 2 | Raleigh | 474,069 | Wake (seat), Durham |
4 | 3 | Greensboro | 296,710 | Guilford |
5 | 4 | Durham | 278,993 | Durham (seat), Wake, Orange |
6 | 5 | Winston-Salem | 247,945 | Forsyth |
7 | 6 | Fayetteville | 211,657 | Cumberland |
8 | 7 | Cary | 170,282 | Wake, Chatham |
9 | 8 | Wilmington | 123,784 | New Hanover |
10 | 9 | High Point | 112,791 | Guilford, Randolph, Davidson, Forsyth |
11 | 10 | Concord | 96,341 | Cabarrus |
- True
- False
The observation of Greensboro describes all of the data in row 4. An observation is all of the attributes for something contained in a row of a data table.
Question 3
If a data analyst wants to list the cities in this spreadsheet alphabetically, instead of numerically, what feature can they use in column B?
| | A | B | C | D |
---|---|---|---|---|
1 | Rank | Name | Population | County |
2 | 1 | Charlotte | 885,708 | Mecklenburg |
3 | 2 | Raleigh | 474,069 | Wake (seat), Durham |
4 | 3 | Greensboro | 296,710 | Guilford |
5 | 4 | Durham | 278,993 | Durham (seat), Wake, Orange |
6 | 5 | Winston-Salem | 247,945 | Forsyth |
7 | 6 | Fayetteville | 211,657 | Cumberland |
8 | 7 | Cary | 170,282 | Wake, Chatham |
9 | 8 | Wilmington | 123,784 | New Hanover |
10 | 9 | High Point | 112,791 | Guilford, Randolph, Davidson, Forsyth |
11 | 10 | Concord | 96,341 | Cabarrus |
- Organize range
- Sort range
- Name range
- Randomize range
Sort range would be used to alphabetize the city names in column B. Sorting a range of data from A to Z helps data analysts organize and find data more quickly.
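As a rough illustration outside the spreadsheet, the same A-to-Z sort can be sketched in Python. The rows are copied from the table in this question; the variable names are illustrative only, and this is not how the spreadsheet feature itself is implemented.

```python
# Rows copied from the spreadsheet in this question: (rank, name, population).
cities = [
    (1, "Charlotte", 885708),
    (2, "Raleigh", 474069),
    (3, "Greensboro", 296710),
    (4, "Durham", 278993),
    (5, "Winston-Salem", 247945),
    (6, "Fayetteville", 211657),
    (7, "Cary", 170282),
    (8, "Wilmington", 123784),
    (9, "High Point", 112791),
    (10, "Concord", 96341),
]

# Sort whole rows by the Name field (index 1), A to Z, so each city
# keeps its own rank and population, as sorting a range would.
sorted_by_name = sorted(cities, key=lambda row: row[1])
names = [row[1] for row in sorted_by_name]
print(names)
```

Sorting whole rows rather than the name column alone matters: sorting only column B would detach each city from its rank and population.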
Question 4
A data analyst types =POPULATION(C2:C11) to find the average population of the cities in this spreadsheet. However, they realize that they have used the wrong function. What syntax will correct this function? Type your answer below.
| | A | B | C | D |
---|---|---|---|---|
1 | Rank | Name | Population | County |
2 | 1 | Charlotte | 885,708 | Mecklenburg |
3 | 2 | Raleigh | 474,069 | Wake (seat), Durham |
4 | 3 | Greensboro | 296,710 | Guilford |
5 | 4 | Durham | 278,993 | Durham (seat), Wake, Orange |
6 | 5 | Winston-Salem | 247,945 | Forsyth |
7 | 6 | Fayetteville | 211,657 | Cumberland |
8 | 7 | Cary | 170,282 | Wake, Chatham |
9 | 8 | Wilmington | 123,784 | New Hanover |
10 | 9 | High Point | 112,791 | Guilford, Randolph, Davidson, Forsyth |
11 | 10 | Concord | 96,341 | Cabarrus |
=AVERAGE(C2:C11)
The correct AVERAGE function syntax is =AVERAGE(C2:C11). AVERAGE returns an average of values from a selected range. C2:C11 is the specified range.
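To check what =AVERAGE(C2:C11) should return for this table, the calculation can be sketched in Python. The values are the ten populations from column C; the variable names are illustrative.

```python
# Populations from cells C2:C11 of the spreadsheet in this question.
populations = [
    885708, 474069, 296710, 278993, 247945,
    211657, 170282, 123784, 112791, 96341,
]

# AVERAGE adds the values in the range and divides by how many there are.
average_population = sum(populations) / len(populations)
print(average_population)  # 289828.0
```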
Question 5
In the following query, what is the asterisk (*) telling the database to do?
SELECT *
- Return one specific field.
- Select all of the data from the table.
- Filter certain information.
- Use proper syntax.
In a query, data analysts use SELECT and then an asterisk (*) to select all of the data from the table.
Question 6
In the following query, what is FROM telling the database to do?
SELECT * FROM Orders
- From which field data should be stored
- From which table to select data
- From which filter data should be selected
- From which field data should be updated
In a query, data analysts use FROM to indicate the table from which the data will be retrieved.
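The query above can be tried against a throwaway database. This is a minimal sketch using Python's built-in sqlite3 module; only the table name Orders comes from the question, while the column names and rows are hypothetical.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical columns and rows; only the table name Orders is from the question.
conn.execute("CREATE TABLE Orders (order_id INTEGER, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 5656, 19.99), (2, 1234, 5.00)],
)

# SELECT * asks for every column; FROM names the table to read.
cursor = conn.execute("SELECT * FROM Orders")
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(column_names)
print(len(rows))
conn.close()
```

Because of the asterisk, the result contains all three columns without any of them being named in the query.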
Question 7
You are writing a query that asks a database to retrieve data about the customer with identification number 5656. The column name for customer identification numbers is customer_id. What is the correct WHERE clause syntax? Type your answer below.
WHERE customer_id = 5656
The correct WHERE clause syntax is WHERE customer_id = 5656. WHERE is used to extract only those records that meet specified criteria. The condition customer_id = 5656 tells the database to return only information about the customer whose ID is 5656.
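A short sketch of this WHERE clause in a full query, run with Python's built-in sqlite3 module. The question does not name the table, so the table name customers, the name column, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and rows; only customer_id and the value 5656
# come from the question.
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(5656, "Example Customer A"), (1234, "Example Customer B"), (9012, "Example Customer C")],
)

# WHERE keeps only the rows whose customer_id equals 5656.
matches = conn.execute(
    "SELECT * FROM customers WHERE customer_id = 5656"
).fetchall()

print(matches)  # [(5656, 'Example Customer A')]
conn.close()
```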
Question 8
Fill in the blank: A data analyst creates a table, but they realize this isn’t the best visualization for their data. To fix the problem, they decide to use the _____ feature to change it to a column chart.
- image
- rename
- chart editor
- filter view
The chart editor enables data analysts to choose the type of chart they're making and customize its appearance.
Question 9
A data analyst wants to demonstrate how the population in Charlotte has increased over time. They create the chart below. What is this type of chart called?
- Column chart
- Area chart
- Line chart
- Bar chart
This is a line chart. Line charts are effective for demonstrating trends and patterns, such as how population changes over time.
Weekly Challenge 5
Question 1
An online gardening magazine wants to understand why its subscriber numbers have been increasing. What kind of reports can a data analyst provide to help answer that question? Select all that apply.
- Reports that predict the success of sales leads to secure future subscribers
- Reports that compare past weather patterns to the number of people taking up gardening recently
- Reports that show how many customers shared positive comments about the gardening magazine on social media in the past year
- Reports that examine how a recent 50%-off sale affected the number of subscription purchases
Analyzing historical data such as weather patterns, social media comments, and past sales would provide useful insights into the increase in subscription numbers.
Question 2
A doctor’s office has discovered that patients are waiting 20 minutes longer for their appointments than in past years. A data analyst could help solve this problem by analyzing how many doctors and nurses are on staff at a given time compared to the number of patients with appointments.
- True
- False
Analyzing staffing and patient numbers would likely provide useful insights into why patients are waiting longer for their appointments and help solve this problem.
Question 3
What is the process of using facts to guide business strategy?
- Data programming
- Data visualization
- Data ethics
- Data-driven decision-making
Data-driven decision-making is using facts to guide business strategy.
Question 4
Fill in the blank: A business task is described as the problem or _____ a data analyst answers for a business.
- solution
- comment
- question
- complaint
A business task is described as the problem or question a data analyst answers for a business.
Question 5
Data-driven decision-making is using facts to guide business strategy. The benefits include which of the following? Select all that apply.
- Getting a complete picture of a problem and its causes
- Using data analytics to find the best possible solution to a problem
- Making the most of intuition and gut instinct
- Combining observation with objective data
Data-driven decision-making enables companies to use data analytics to find the best possible solution to a problem, complement observation with objective data, and get a complete picture of a problem and its causes.
Question 6
It’s possible for conclusions drawn from data analysis to be both true and unfair.
- True
- False
Sometimes, a conclusion may be true, but it’s unfair because it doesn’t represent all groups or it ignores social context and other systemic factors.
Question 7
Fill in the blank: Fairness is achieved when data analysis doesn't create or _____ bias.
- resolve
- reinforce
- constrain
- highlight
Fairness is achieved when data analysis doesn’t create or reinforce bias.
Question 8
A gym wants to start offering exercise classes. A data analyst plans to survey 10 people to determine which classes would be most popular. To ensure the data collected is fair, what steps should they take? Select all that apply.
- Collect data anonymously.
- Survey only people who don’t currently go to the gym.
- Increase the number of participants.
- Ensure participants represent a variety of profiles and backgrounds.
Ensuring participants represent a variety of profiles and backgrounds, collecting data anonymously, and surveying more than just 10 people would all help ensure the data analysis is fair.
Course challenge
Scenario 1, questions 1-5
Question 1
You’ve just started a new job as a data analyst. You’re working for a midsized pharmacy chain with 38 stores in the American Southwest. Your supervisor shares a new data analysis project with you.
She explains that the pharmacy is considering discontinuing a bubble bath product called Splashtastic. Your supervisor wants you to analyze sales data and determine what percentage of each store’s total daily sales come from that product. Then, you’ll present your findings to leadership.
You know that it's important to follow each step of the data analysis process: ask, prepare, process, analyze, share, and act. So, you begin by defining the problem and making sure you fully understand stakeholder expectations.
One of the questions you ask is where to find the dataset you’ll be working with. Your supervisor explains that the company database has all the information you need.
Next, you continue to the prepare step. You access the database and write a query to retrieve data about Splashtastic. You notice that there are only 38 rows of data, representing the company’s 38 stores. In addition, your dataset contains five columns: Store Number, Average Daily Customers, Average Daily Splashtastic Sales (Units), Average Daily Splashtastic Sales (Dollars), and Average Total Daily Sales (All Products).
Considering the size of your dataset, you decide a spreadsheet will be the best tool for your project. You proceed by downloading the data from the database. Describe why this is the best choice.
- Only spreadsheets let you download and upload data.
- Databases can’t be used for analysis.
- Spreadsheets work well for processing and analyzing a small dataset, like the one you’re using.
- Spreadsheets are most effective when working with queries.
A spreadsheet is a smart choice when working with a dataset of 38 rows and five columns.
Question 2
You may click the link to create a copy of the spreadsheet: Pharmacy Data. Please refer to Pharmacy Data - Part 1 tab.
Now, it’s time to process the data. As you know, this step involves finding and eliminating errors and inaccuracies that can get in the way of your results. While cleaning the data, you notice there’s an issue you need to fix. Identify the problem.
- Column E is formatted for currency.
- The data in column A is sorted alphabetically.
- There is missing information in row 16.
- The headers in row 1 are bold.
Part of the process step is identifying any missing information and ensuring your dataset is complete.
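A check for missing information can also be automated. The following is a minimal sketch in Python; the three rows, their field names, and their values are hypothetical stand-ins and are not taken from the Pharmacy Data spreadsheet.

```python
# Hypothetical rows; None marks a cell that was left blank.
rows = [
    {"store_number": 1, "avg_daily_customers": 210, "avg_total_daily_sales": 4200.0},
    {"store_number": 2, "avg_daily_customers": None, "avg_total_daily_sales": 3900.0},
    {"store_number": 3, "avg_daily_customers": 185, "avg_total_daily_sales": 3650.0},
]

# Collect the store number of every row that has at least one blank cell.
incomplete = [
    row["store_number"]
    for row in rows
    if any(value is None for value in row.values())
]

print(incomplete)  # [2]
```

A list like this only tells the analyst where to look; the missing values still have to be found from a reliable source rather than guessed.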
Question 3
Once you’ve found the missing information, you analyze your dataset. You use a formula to determine how much of each store’s daily sales come from sales of Splashtastic.
You may click the link to create a copy of the spreadsheet: Pharmacy Data. Please refer to Pharmacy Data - Part 2 tab.
During analysis, you create a new column F. At the top of the column, you add: Average Percentage of Total Sales - Splashtastic. In data analytics, this column label is called an attribute.
- True
- False
This column label is an attribute, which is a characteristic or quality of data used to label a column.
Question 4
Next, you determine the average percentage of sales that Splashtastic sales represent for all 38 stores. To do this, you use the AVERAGE function in cell H2. Identify the correct way to write your function.
- =AVERAGE (F:F)
- =AVERAGE (C:D)
- *AVERAGE (E:F)
- *AVERAGE (D:D)
The function begins with an equal sign (=), and the range is all of column F, represented by F:F.
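The two calculations in this scenario, the per-store percentage in column F and the overall average in cell H2, can be sketched in Python. The three stores and their dollar amounts below are hypothetical and are not the values in the Pharmacy Data spreadsheet.

```python
# Hypothetical data: (store number, average daily Splashtastic sales in dollars,
# average total daily sales in dollars).
stores = [
    (1, 50.0, 1000.0),
    (2, 30.0, 1500.0),
    (3, 80.0, 2000.0),
]

# Column F: Splashtastic's share of each store's total daily sales, in percent.
percent_of_sales = [100 * splash / total for _, splash, total in stores]

# Cell H2: the average of column F, as =AVERAGE(F:F) would compute it.
average_percent = sum(percent_of_sales) / len(percent_of_sales)

print(percent_of_sales)            # [5.0, 2.0, 4.0]
print(round(average_percent, 2))   # 3.67
```

Note that this averages the per-store percentages, which is what AVERAGE over column F does; it is not the same as dividing total Splashtastic sales by total sales across all stores.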
Question 5
You’ve reached the share phase of the data analysis process. It involves which of the following? Select all that apply.
- Present your findings about Splashtastic to stakeholders.
- Create a data visualization to highlight the Splashtastic sales insights you've discovered.
- Stop selling Splashtastic because it doesn't represent a large percentage of total sales.
- Prepare a slideshow about Splashtastic’s sales and practice your presentation.
The share phase involves creating data visualizations, preparing your presentation, and communicating your findings to stakeholders.
Scenario 2, questions 6-10
Question 6
You’ve been working for the nonprofit National Dental Society (NDS) as a junior data analyst for about two months. The mission of the NDS is to help its members advance the oral health of their patients. NDS members include dentists, hygienists, and dental office support staff.
The NDS is passionate about patient health. Part of this involves automatically scheduling follow-up appointments after crown replacement, emergency dental surgery, and extraction procedures. NDS believes the follow-up is an important step to ensure patient recovery and minimize infection.
Unfortunately, many patients don’t show up for these appointments, so the NDS wants to create a campaign to help its members learn how to encourage their patients to take follow-up appointments seriously. If successful, this will help the NDS achieve its mission of advancing the oral health of all patients.
Your supervisor has just sent you an email saying that you’re doing very well on the team, and he wants to give you some additional responsibility. He describes the issue of many missed follow-up appointments. You are tasked with analyzing data about this problem and presenting your findings using data visualizations.
An NDS member with three dental offices in Colorado offers to share its data on missed appointments. So, your supervisor uses a database query to access the dataset from the dental group. The query instructs the database to retrieve all patient information from the member’s three dental offices, located in zip code 81137.
The table is dental_data_table, and the column name is zip_code. You have written the following query, but received an error when it ran. What is the proper WHERE clause syntax that will correct this query?
SELECT *
FROM dental_data_table
WHERE dental_data_table = 81137
Type your answer below.
WHERE zip_code = 81137
The correct syntax is WHERE zip_code = 81137. WHERE indicates where to look for information. The column name is zip_code. And the database is being asked to return only records matching zip code 81137.
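The difference between the failing and the corrected query can be reproduced with Python's built-in sqlite3 module. This is a sketch under assumptions: the table and column names come from the question, but the patient_id column and the sample rows are hypothetical, and the exact error message depends on the database system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical rows; patient_id is an assumed column for illustration.
conn.execute("CREATE TABLE dental_data_table (patient_id INTEGER, zip_code INTEGER)")
conn.executemany(
    "INSERT INTO dental_data_table VALUES (?, ?)",
    [(1, 81137), (2, 81137), (3, 80202)],
)

# The original query filters on the table name, which is not a column.
broken_query = "SELECT * FROM dental_data_table WHERE dental_data_table = 81137"
try:
    conn.execute(broken_query)
    broken_error = None
except sqlite3.OperationalError as exc:
    broken_error = str(exc)

# The corrected query filters on the zip_code column instead.
fixed_rows = conn.execute(
    "SELECT * FROM dental_data_table WHERE zip_code = 81137"
).fetchall()

print(broken_error)
print(sorted(fixed_rows))  # [(1, 81137), (2, 81137)]
conn.close()
```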
Question 7
The dataset your supervisor retrieved and imported into a spreadsheet includes a list of patients, their demographic information, dental procedure types, and whether they attended their follow-up appointment.
You may click the link to create a copy of the spreadsheet: Dental Patient Data.
The patient demographic information includes data such as age, gender, and home address. The fact that the dataset includes people who all live in the same zip code might get in the way of what?
- Spreadsheet formulas or functions
- Future dental procedures
- Data visualization
- Fairness
It’s your responsibility as a data analyst to make sure your analysis is fair. Although many zip codes do reflect diverse populations, a better choice would be to include data about people who live in multiple zip codes.
Question 8
As you’re reviewing the dataset, you notice that there are a disproportionate number of senior citizens. So, you investigate further and find out that this zip code represents a rural community in Colorado with about 800 residents. In addition, there’s a large assisted-living facility in the area. Nearly 300 of the residents in the 81137 zip code live in the facility.
You recognize that’s a sizable number, so you want to find out if age has an effect on a patient’s likelihood to attend a follow-up dental appointment. You analyze the data, and your analysis reveals that older people tend to miss follow-ups more than younger people.
So, you do some research online and discover that people over the age of 60 are 50% more likely to miss dentist appointments. Sometimes this is because they’re on a fixed income. Also, many senior citizens lack transportation to get to and from appointments.
With this new knowledge, you write an email to your supervisor expressing your concerns about the dataset. He agrees with your concerns, but he’s also impressed with what you’ve learned and thinks your findings could be very important to the project. He asks you to change the business task. Now, the NDS campaign will be about educating dental offices on the challenges faced by senior citizens and finding ways to help them access quality dental care.
Fill in the blank: Changing the business task involves defining a new _____.
- graphical representation of the data
- question or problem to be solved
- gap analysis plan
- data-cleaning strategy
A business task is the question or problem data analysis answers for a business.
Question 9
You continue with your analysis. In the end, your findings support what you discovered during your online research: As people get older, they’re less likely to attend follow-up dental visits.
But you’re not done yet. You know that data should be combined with human insights in order to lead to true data-driven decision-making. So, your next step is to share this information with people who are familiar with the problem. They’ll help verify the results of your data analysis.
The people who are familiar with a problem and help verify the results of data analysis are called subject-matter experts. What are their roles in the process? Select all that apply.
- Collect, transform, and organize data
- Offer insights into the business problem
- Validate the choices being made
- Identify inconsistencies in the analysis
Subject-matter experts can offer insights into the business problem, identify inconsistencies in the analysis, and validate the choices being made.
Question 10
The subject-matter experts are impressed by your analysis. The team agrees to move to the next step: data visualization. You know it’s important that stakeholders at NDS can quickly and easily understand that older people are less likely to attend important follow-up dental appointments. This will help them create an effective campaign for members.
It’s time to create your presentation to stakeholders. It will include a data visualization that demonstrates the trend of people being less likely to attend follow-up appointments as they get older. Which type of chart will be most effective?
- A doughnut chart
- A table
- A pie chart
- A line chart
A line chart is effective for tracking trends over time, such as people attending fewer follow-up appointments as they get older.
Ask Questions to Make Data-Driven Decisions
Weekly Challenge 1
Question 1
Structured thinking involves which of the following processes? Select all that apply.
- Organizing available information
- Recognizing the current problem or situation
- Asking SMART questions
- Revealing gaps and opportunities
Structured thinking involves recognizing the current problem or situation, organizing available information, revealing gaps and opportunities, and identifying the options.
Question 2
The prepare step of the data analysis process involves defining the problem you're trying to solve and understanding stakeholder expectations.
- True
- False
The ask step involves defining the problem you're trying to solve and understanding stakeholder expectations.
Question 3
The share phase of the data analysis process typically involves which of the following activities? Select all that apply.
- Putting analysis into action to solve a problem
- Creating a slideshow to present to stakeholders
- Summarizing results using data visualizations
- Communicating findings
The share phase of the data analysis process typically involves communicating findings, summarizing results using data visualizations, and creating a slideshow to present to stakeholders.
Question 4
A garden center wants to attract more customers. A data analyst in the marketing department suggests advertising in popular landscaping magazines. This is an example of what practice?
- Developing a data analytics case study
- Collecting customer information
- Monitoring social media feedback
- Reaching your target audience
This is an example of reaching your target audience. In this scenario, people who read landscaping magazines are the target audience because they’re likely to be interested in shopping at the garden center.
Question 5
A data analyst is working for a local power company. Recently, many new apartments have been built in the community, so the company wants to determine how much electricity it needs to produce for the new residents in the future. A data analyst uses data to help the company make a more informed forecast. This is an example of which problem type?
- Spotting something unusual
- Discovering connections
- Identifying themes
- Making predictions
This is an example of making predictions. Making predictions deals with making informed decisions about how things may be in the future.
Question 6
Describe the key difference between the problem types of categorizing things and identifying themes.
- Categorizing things involves determining how items are different from each other. Identifying themes brings different items back together in a single group.
- Categorizing things involves assigning items to categories. Identifying themes takes those categories a step further, grouping them into broader themes.
- Categorizing things involves assigning grades to items. Identifying themes involves creating new classifications for items.
- Categorizing things involves taking inventory of items. Identifying themes deals with creating labels for items.
Categorizing things involves assigning items to categories. Identifying themes takes those categories a step further, grouping them into broader themes.
Question 7
Which of the following examples are closed-ended questions? Select all that apply.
- What are your thoughts about math?
- Is math your favorite subject?
- What grade did you get in your math class?
- How old are you?
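The closed-ended questions are “Is math your favorite subject?”, “What grade did you get in your math class?”, and “How old are you?” Closed-ended questions don’t encourage people to elaborate and share valuable details.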
Question 8
The question, “Why don’t our employees complete their timesheets each Friday by noon?” is not action-oriented. Which of the following questions are action-oriented and more likely to lead to change? Select all that apply.
- What functionalities would make our timesheet web page more user-friendly?
- What features could we add to our calendar app as a weekly timesheet reminder to employees?
- Why don’t employees prioritize filling out their timesheets by noon on Fridays?
- How could we simplify the time-keeping process for our employees?
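The questions about timesheet web page functionalities, calendar app reminder features, and simplifying the time-keeping process are action-oriented. That means they’re more likely to result in specific answers that can be acted on to lead to change.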
Question 9
In the SMART methodology, time-bound questions are simple, significant, and focused on a single topic or a few closely related ideas.
- True
- False
In the SMART methodology, specific questions are simple, significant, and focused on a single topic or a few closely related ideas.
Question 10
Which of the following questions make assumptions? Select all that apply.
- It must be frustrating waiting on hold for so long, right?
- Wouldn’t you agree that product A is better than product B?
- Did you get through to customer service?
- Keeping employees engaged is important, isn’t it?
A common example of an unfair question is one that makes assumptions. Unfair questions assume the respondent’s answer to the question.
Weekly Challenge 2
Question 1
Fill in the blank: In data analytics, a process or set of rules to be followed for a specific task is _____.
- an algorithm
- a domain
- a pattern
- a value
In data analytics, a process or set of rules to be followed for a specific task is an algorithm.
Question 2
Fill in the blank: In data analytics, qualitative data _____. Select all that apply.
- measures numerical facts
- measures qualities and characteristics
- is always time bound
- is subjective
In data analytics, qualitative data is subjective and measures qualities and characteristics.
Question 3
In data analytics, reports use live, incoming data from multiple datasets; dashboards use static collections of data.
- True
- False
Dashboards monitor live, incoming data from multiple datasets; reports use static collections of data.
Question 4
A pivot table is a data-summarization tool used in data processing. Which of the following tasks can pivot tables perform? Select all that apply.
- Group data
- Calculate totals from data
- Clean data
- Reorganize data
Pivot tables are used to reorganize, group, and calculate totals from data.
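To make "group" and "calculate totals" concrete, here is a minimal Python sketch of what a pivot table does with its rows. The rows and the column names (region, product, amount) are invented for illustration; in a spreadsheet the pivot table feature does this without any code.

```python
from collections import defaultdict

# Hypothetical rows; the regions, products, and amounts are invented for illustration.
rows = [
    {"region": "East", "product": "Books", "amount": 120},
    {"region": "West", "product": "Books", "amount": 80},
    {"region": "East", "product": "Zines", "amount": 45},
    {"region": "East", "product": "Books", "amount": 30},
]

def pivot_totals(records, group_key, value_key):
    """Group records by one column and total another, like a one-dimensional pivot table."""
    totals = defaultdict(int)
    for record in records:
        totals[record[group_key]] += record[value_key]
    return dict(totals)

totals_by_region = pivot_totals(rows, "region", "amount")    # group by region
totals_by_product = pivot_totals(rows, "product", "amount")  # same data, reorganized by product
print(totals_by_region)
print(totals_by_product)
```

Changing the grouping column is the "reorganize" part: the underlying rows stay the same, only the summary changes.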
Question 5
A metric is a single, quantifiable type of data that can be used for what task?
- Defining a problem type
- Setting and evaluating goals
- Sorting and filtering data
- Cleaning data
A metric is a single, quantifiable type of data used when setting and evaluating goals.
Question 6
Fill in the blank: A _____ goal is measurable and evaluated using single, quantifiable data.
- metric
- finite
- benchmark
- conceptual
A metric goal is measurable and evaluated using single, quantifiable data.
Question 7
If a data analyst compares the cost of an investment to the net profit of that investment over a period of time, they’re analyzing the investment scope.
- True
- False
If a data analyst compares the cost of an investment to the net profit of that investment over a period of time, they’re analyzing the return on investment.
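As a rough sketch of that comparison, return on investment is commonly expressed as net profit divided by the cost of the investment. The function name and the figures below are hypothetical and are not taken from the course materials.

```python
def return_on_investment(net_profit, investment_cost):
    """Return ROI as a fraction: net profit over a period divided by the cost of the investment."""
    if investment_cost <= 0:
        raise ValueError("investment_cost must be positive")
    return net_profit / investment_cost

# Hypothetical figures, chosen only to make the arithmetic easy to follow.
roi = return_on_investment(net_profit=25_000, investment_cost=100_000)
print(f"ROI: {roi:.0%}")
```

The two inputs are exactly the two metrics named in the answer: the net profit over a period of time and the cost of the investment.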
Question 8
Fill in the blank: A data analyst is using data to address a large-scale problem. This type of analysis would most likely require _____. Select all that apply.
- small data
- data that reflects change over time
- data represented by a limited number of metrics
- big data
A data analyst using data to address a large-scale problem would most likely require big data that reflects change over time.
Weekly Challenge 3
Question 1
Both formulas and functions in spreadsheets begin with what symbol?
- Vertical line (|)
- Equals sign (=)
- Plus-minus sign (±)
- Lowercase x
Both formulas and functions in spreadsheets begin with an equals sign.
Question 2
Attributes are used in spreadsheets for what purpose?
- Label the data in each column
- Insert data into each column
- Analyze the data in a row
- Add a new column
Attributes are used to label the type of data in each column in a spreadsheet.
Question 3
Which of the following tasks might be performed using spreadsheets?
- Land a new client
- Develop communication skills
- Maintain information about accounts
- Write a sales pitch
A spreadsheet could be used to maintain information about accounts.
Question 4
Fill in the blank: Combining formulas and functions enables the function to run based on a _____ set by the formula.
- change
- cell
- count
- criteria
Combining formulas and functions enables the function to run based on a criteria set by the formula.
Question 5
Which of the following statements describes a key difference between formulas and functions?
- Formulas are used in graphs, and functions are not.
- Formulas span two or more cells, and functions exist in only one cell.
- Formulas contain words and numbers, and functions contain numbers only.
- Formulas are written by the user, and functions are already defined.
Formulas are written by the user, and functions are already defined.
Question 6
Fill in the blank: Putting data into context helps data analysts eliminate _____.
- fairness
- intolerance
- labels
- bias
Putting data into context helps data analysts eliminate bias.
Question 7
Defining the problem domain is part of which data analytics process?
- Balanced thinking
- Logical thinking
- Organized thinking
- Structured thinking
Defining the problem domain is part of the structured-thinking process.
Question 8
A data analyst uses structured thinking to recognize the current problem or situation. Select the final step to structured thinking.
- Identify options
- Monitor options
- Clean data
- Sort data
The final step in the structured-thinking process is to identify options.
Weekly Challenge 4
Question 1
A data analytics team is working on a project to measure the success of a company’s new financial strategy. The vice president of finance is most likely to be the _____.
- project manager
- analyst
- secondary stakeholder
- primary stakeholder
The vice president of finance is most likely to be the primary stakeholder.
Question 2
A data analyst is researching the buying behavior of people who shop at a company’s retail store and those who might shop there in the future. During the analysis, it will be important to stay in communication with the team that most often interacts with these shoppers. What is the name of this team?
- Project management team
- Executive team
- Data science team
- Customer-facing team
The customer-facing team includes anyone in an organization who interacts with customers or potential customers, such as the shoppers at a company’s retail store.
Question 3
To communicate clearly with stakeholders and team members, there are four key questions data analysts ask themselves. One of them is: What does my audience need to know? Identify the remaining three questions. Select all that apply.
- How can I communicate effectively to my audience?
- What does my audience already know?
- Who is my audience?
- Why are stakeholders and team members important?
The four key questions data analysts ask themselves when communicating with stakeholders are: Who is my audience? What do they already know? What do they need to know? And how can I communicate effectively with them?
Question 4
A data analyst feels overworked. They often stay late to finish work, and have started missing deadlines. Their supervisor emails them another project to complete, and this causes the analyst even more stress. How should they handle this situation?
- Respond immediately, letting the supervisor know the expectations at this company are unreasonable.
- Accept the new project right away and hope to not miss another deadline.
- Walk into the supervisor’s office and tell them to give the project to someone else.
- Wait a few minutes to think it over, then respond with a meeting request to discuss this project and the general workload.
They should wait a few minutes to think it over, then respond with a meeting request to discuss this project and the general workload. When people are feeling angry or emotional, it’s best to wait until things calm down. Then, give everyone the opportunity to share their perspectives.
Question 5
Data analysts pay attention to sample size in order to achieve what goals? Select all that apply.
- To make sure a few unusual responses don’t skew results
- To make sure the data represents a diverse set of perspectives
- To avoid a small sample size leading to inaccurate judgements
- To fully understand the scope of the analytics project
Data analysts pay attention to sample size in order to represent a diverse set of perspectives and avoid skewed results or inaccurate judgements.
Question 6
A data analyst has been invited to a meeting. They review the agenda and notice that their data analysis project is one of the topics that will be discussed. They plan to arrive on time and have a pen and paper to take notes. But they do not spend time considering project updates they could share or questions they may be asked. This is okay because they’re not the one running the meeting.
- True
- False
Even if the data analyst isn’t running the meeting, if their project is on the agenda, it’s a good idea to prepare to share updates and answer questions. This helps keep everyone informed and ensures effective communication.
Question 7
Which of the following steps are key to leading a professional online meeting? Select all that apply.
- Maintaining control of the meeting by keeping everyone else on mute
- Sitting in a quiet area that’s free of distractions
- Keeping an eye on your inbox during the meeting in case of an important email
- Making sure your technology is working properly before starting the meeting
When leading an online meeting, acting professionally involves encouraging others to contribute, testing technology beforehand, and eliminating distractions.
Question 8
Conflict is a natural part of working on a team. What are some ways to help shift a situation from problematic to productive? Select all that apply.
- Identify the person who caused the issue so they can take responsibility.
- Ask for a conversation to help you better understand the big picture.
- Take a moment to check your emotions before engaging in an argument.
- Reframe the question by asking, “How can I help?”
To help shift a situation from problematic to productive, reframe the question, keep your emotions in check, and establish open lines of communication.
Course challenge
Scenario 1, questions 1-5
Question 1
You’ve just started a job as a data analyst at a small software company that provides data analytics and business intelligence solutions. Your supervisor asks you to kick off a project with a new client, Athena’s Story, a feminist bookstore. They have four existing locations, and the fifth shop has just opened in your community.
Athena’s Story wants to produce a campaign to generate excitement for an upcoming celebration and introduce the bookstore to the community. They share some data with your team to help make the event as successful as possible.
Your task is to review the assignment and the available data, then present your approach to your supervisor.
Then, review the email and the Customer Survey and Historical Sales datasets:
- You may click the link to create a copy of the dataset: Customer Survey
- You may click the link to create a copy of the dataset: Historical Sales
After reading the email, you notice that the acronym WHM appears in multiple places. You look it up online, and the most common result is web host manager. That doesn’t seem right to you, as it doesn’t fit the context of a feminist bookstore. How do you proceed?
- Call the client to ask what WHM means and inform them that using acronyms is not a professional business practice.
- Proceed with the project assuming WHM must mean web host manager.
- Schedule a meeting with your supervisor, the client, and another analyst on your team to figure out the meaning.
- Send your supervisor a polite, concise email, asking them to confirm the meaning of WHM.
You should send your supervisor a polite, concise email, asking them to confirm the meaning of WHM.
Question 2
Scenario 1, continued
Now that you know WHM stands for Women’s History Month, you continue reviewing the datasets. You notice the Customer Survey dataset contains both qualitative and quantitative data.
The qualitative data includes information from which columns? Select all that apply.
- Column B (Survey Q2: If answered "Yes" to Q1, how do you plan to celebrate?)
- Column F (Survey Q6: What types of books would you like to see more of at Athena's Story?)
- Column E (Survey Q5: What do you like most about Athena's Story?)
- Column D (Survey Q4: If answered "Yes" to Q3, how many books do you typically purchase during March?)
The qualitative data includes information from columns B, E, and F.
Question 3
Next, you review the customer feedback in column F of the Customer Survey.
The attribute of column F is, “Survey Q6: What types of books would you like to see more of at Athena's Story?” In order to verify that children’s literature and feminist zines are among the most popular genres, you create a visualization. This will help you clearly identify which genres are most likely to sell well during the Women’s History Month campaign.
Fill in the blank: The visualization you create demonstrates the percentages of each book genre that make up the total number of survey responses. It’s called a _____ chart.
- pie
- area
- doughnut
- bubble
The visualization is called a pie chart.
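The percentages a pie chart displays are each category's share of the total. The sketch below computes those shares in Python; the responses are hypothetical stand-ins, not the actual Customer Survey data.

```python
from collections import Counter

# Hypothetical survey answers; the real Customer Survey responses are not reproduced here.
responses = [
    "children's literature", "feminist zines", "poetry",
    "children's literature", "feminist zines", "children's literature",
    "memoir", "feminist zines",
]

counts = Counter(responses)    # how many times each genre was requested
total = sum(counts.values())   # total number of responses
shares = {genre: round(100 * n / total, 1) for genre, n in counts.items()}
print(shares)  # each value is that genre's slice of the pie, in percent
```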
Question 4
Now that you’ve confirmed that children’s literature and feminist zines are among the most requested book genres, you review the Historical Sales dataset.
You’re pleased to see that columns D and E have something in common: They both contain data that’s specific to children’s literature and feminist zines. This will provide you with the information you need to make data-inspired decisions. In addition, the children’s literature and feminist zines metrics will help you organize and analyze the data about each genre in order to determine if they’re likely to be profitable.
Next, you use the SUM function to calculate the total sales over 52 weeks for feminist zines. What is the correct syntax? Type your answer below.
=SUM(E2:E53)
The correct syntax is =SUM(E2:E53). The SUM function adds the values of a range of cells. E2:E53 is the specified range.
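To see why the range ends at row 53, note that row 1 holds the column header, so 52 weeks of data occupy rows 2 through 53. The Python sketch below mirrors that arithmetic with stand-in values; the numbers are assumptions, not the real Historical Sales figures.

```python
# Hypothetical weekly sales for feminist zines: one value per week for 52 weeks.
# Row 1 of the sheet holds the column header, so the data sits in rows 2 through 53.
weekly_sales = [10 + week for week in range(52)]  # stand-in values, not the real dataset

first_row, last_row = 2, 53
cells_in_range = last_row - first_row + 1  # E2:E53 is inclusive on both ends
total_sales = sum(weekly_sales)            # what =SUM(E2:E53) would return for these values

print(cells_in_range, total_sales)
```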
Question 5
After familiarizing yourself with the project and available data, you present your approach to your supervisor. You provide a scope of work, which includes important details, a schedule, and information on how you plan to prepare and validate the data. You also share some of your initial results and the pie chart you created.
In addition, you identify the problem type, or domain, for the data analysis project. You decide that the historical sales data can be used to provide insights into the types of books that will sell best during Women’s History Month this coming year. This will also enable you to determine if Athena’s Story should begin selling more children’s literature and feminist zines.
Using historical data to make informed decisions about how things may be in the future is an example of discovering connections.
- True
- False
Using historical data to make informed decisions about how things may be in the future is an example of making predictions.
Scenario 2, questions 6-10
Question 6
You’ve completed this program and are now interviewing for your first junior data analyst position. You’re hoping to be hired by an event planning company, Patel Events Plus.
So far, you’ve successfully completed the first round of interviews with the human resources manager and director of data and strategy. Now, the vice president of data and strategy wants to learn more about your approach to managing projects and clients.
You arrive Thursday at 1:45 PM for your 2 PM interview. Soon, you’re taken into the office of Mila Aronowicz, vice president of data and strategy. After welcoming you, she begins the behavioral interview.
First, she hands you a copy of Patel Events Plus’s organizational chart.
As you’ve learned in this course, stakeholders are people who invest time, interest, and resources into the projects you’ll be working on as a data analyst. Let’s say you’re working on a project involving data and strategy. Based on what you find in the organizational chart, if you need information from the secondary stakeholders, who can you ask? Select all that apply.
- Project manager, analytics
- Vice president, data and strategy
- Chief executive officer
- Data analytics coordinator
If you need information from the secondary stakeholders, you can ask the project manager and the data analytics coordinator.
Question 7
Next, the vice president wants to understand your knowledge about asking effective questions. Consider and respond to the following question. Select all that apply.
Let’s say we just completed a big event for a client and wanted to find out if they were satisfied with their experience. Provide some examples of measurable questions that you could include in the customer feedback survey.
- Why did you enjoy the event planned by Patel Events Plus?
- Would you recommend Patel Events Plus to a colleague or friend? Yes or no?
- On a scale from 1 to 5, please rate your satisfaction with the event we planned for you.
- How would you describe your event experience?
In the SMART methodology, measurable questions can be quantified and assessed. This might include a 1-to-5 scale or questions with yes-or-no responses.
Question 8
Now, the vice president presents a situation having to do with resolving challenges and meeting stakeholder expectations. Consider and respond to the following question.
You’re working with a dataset that the associate data analyst should have cleaned, but it turns out that it wasn’t. Your supervisor thought the dataset was ready for use, but you discover nulls, redundant data, and other issues. The project is due in less than two weeks. How would you handle that situation?
- Contact the associate data analyst and insist they clean the dataset immediately so you don’t miss your project deadline.
- Call a formal meeting with the data analytics team to solve the problem. Do not invite the associate data analyst, as they clearly don’t have time to help.
- Email your supervisor to let them know the associate data analyst did not complete their assigned task.
- Communicate with the associate data analyst about the issue and offer to work together to clean the data so the project doesn’t fall behind.
This situation presents an opportunity to communicate, collaborate, and foster positive working relationships.
Question 9
Your next interview question deals with sharing information with stakeholders. Consider and respond to the following question.
Let’s say you want to share information about an upcoming event with stakeholders. It’s important that they’re able to access and interact with the data in real time. Would you create a report or a dashboard?
- Dashboard
- Report
Dashboards offer live monitoring of incoming data and enable stakeholders to interact with the data.
Question 10
Your final behavioral interview question involves using metrics to answer business questions. Your interviewer hands you a copy of PatelEventsData.
Then, she asks:
Recently, Patel Events Plus purchased a new venue for our events. If we asked you to calculate the return on investment of this purchase, which metrics would you use?
- Purchase date
- 2019 events held at new venue (column D)
- Net profit in 2019 (column F)
- Purchase price (column C)
Return on investment is made up of two metrics: the net profit over a period of time and the cost of the investment. By comparing these two metrics, you can determine the profitability of the investment.
Prepare Data for Exploration
Weekly Challenge 1
Question 1
If you have a short time frame for data collection and need an answer immediately, you would have to use historical data.
- True
- False
If you have a short time frame for data collection and need an answer immediately, you would have to use historical data.
Question 2
Which of the following is an example of continuous data?
- Box office returns
- Movie run time
- Movie budget
- Leading actors in movie
Movie run time is an example of continuous data.
Question 3
Which of the following questions collects nominal qualitative data?
- Is this your first time dining at this restaurant?
- How many people do you usually dine with?
- On a scale of 1-10, how would you rate your service today?
- How many times have you dined at this restaurant?
“Is this your first time dining at this restaurant?” is a question that collects nominal qualitative data.
Question 4
Which of the following is a benefit of internal data?
- Internal data is less vulnerable to biased collection.
- Internal data is more relevant to the problem.
- Internal data is more reliable and easier to collect.
- Internal data is less likely to need cleaning.
A benefit of internal data is that it’s more reliable and easier to collect than external data.
Question 5
A social media post is an example of structured data.
- True
- False
A social media post is an example of unstructured data.
Question 6
Fill in the blank: A Boolean data type can have _____ possible values.
- two
- three
- infinite
- 10
A Boolean data type can have two possible values.
Question 7
In long data, separate columns contain the values and the context for the values, respectively. What does each column contain in wide data?
- A specific constraint
- A unique format
- A specific data type
- A unique data variable
In wide data, each column contains a unique data variable. In long data, separate columns contain the values and the context for the values, respectively.
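The difference is easier to see side by side. This Python sketch reshapes a small wide table into long form; the store names, years, and the helper name wide_to_long are all hypothetical.

```python
# Hypothetical wide data: one row per store, one column per year (each column is its own variable).
wide_rows = [
    {"store": "A", "2019": 100, "2020": 120},
    {"store": "B", "2019": 90, "2020": 95},
]

def wide_to_long(rows, id_key, value_name="sales", context_name="year"):
    """Turn each non-id column into its own row with a context column and a value column."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue
            long_rows.append({id_key: row[id_key], context_name: column, value_name: value})
    return long_rows

long_rows = wide_to_long(wide_rows, "store")
for row in long_rows:
    print(row)
```

In the long result, the "year" column carries the context and the "sales" column carries the values, so two wide rows become four long rows.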
Question 8
A data analyst is working in a spreadsheet application. They use Save As to change the file type from .XLS to .CSV. This is an example of a data transformation.
- True
- False
A data analyst using Save As to change a file type from .XLS to .CSV is an example of a data transformation.
Weekly Challenge 2
Question 1
Fill in the blank: A preference in favor of or against a person, group of people, or thing is called _____. It is an error in data analytics that can systematically skew results in a certain direction.
- data interoperability
- data collection
- data anonymization
- data bias
Data bias is a type of error that systematically skews results in a certain direction.
Question 2
A university surveys its student-athletes about their experience in college sports. The survey only includes student-athletes with scholarships. What type of bias is this an example of?
- Interpretation bias
- Confirmation bias
- Sampling bias
- Observer bias
This is an example of sampling bias, which is when a sample isn’t representative of the population as a whole.
Question 3
Which of the following are qualities of unreliable data? Select all that apply.
- Biased
- Vetted
- Inaccurate
- Incomplete
Unreliable data is inaccurate, incomplete, and biased.
Question 4
In data ethics, consent gives an individual the right to know the answers to which of the following questions? Select all that apply.
- How will my data be used?
- Why am I being forced to share my data?
- Why is my data being collected?
- How long will my data be stored?
In data ethics, consent gives individuals the right to know why their data is being collected, how it will be used, and how long it will be stored.
Question 5
An individual who provides their data has the right to know and understand all of the data-processing activities and algorithms used on that data. This concept refers to which aspect of data ethics?
- Transaction transparency
- Ownership
- Consent
- Currency
This refers to transaction transparency, which is the idea that an individual who provides their data has the right to know and understand all of the data-processing activities and algorithms used on that data.
Question 6
What is data privacy?
- Providing free access, usage, and sharing of data
- Applying well-founded standards of right and wrong that dictate how data is collected, shared, and used
- Searching for or interpreting supporting information
- Preserving a data subject’s information and activity for all data transactions
Data privacy refers to preserving a data subject’s information and activity for all data transactions.
Question 7
Data anonymization applies to both text and images.
- True
- False
Data anonymization applies to all personally identifiable information, including text and images.
Question 8
The government of a large city collects data on the quality of the city’s infrastructure. Any business, nonprofit organization, or citizen can access the government’s databases and re-use or redistribute the data. Is this an example of open data?
- Yes
- No
This is an example of open data. Everyone must be able to use, re-use, and redistribute open data.
Weekly Challenge 3
Question 1
Primary and foreign keys are two connected identifiers within separate tables. These tables exist in what kind of database?
- Primary
- Relational
- Normalized
- Metadata
Primary and foreign keys are two connected identifiers within separate tables in a relational database.
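As a minimal sketch of how the two keys connect tables, the example below uses Python's built-in sqlite3 module with an in-memory database. The table and column names (customers, orders, customer_id) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

# customer_id is the primary key of customers and a foreign key in orders.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(customer_id),"
    " total REAL NOT NULL)"
)

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1, 19.99)")

# The shared identifier lets the two tables be joined.
joined = conn.execute(
    "SELECT o.order_id, c.name FROM orders AS o"
    " JOIN customers AS c ON c.customer_id = o.customer_id"
).fetchall()
print(joined)

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 999, 5.00)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```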
Question 2
Metadata is data about data. What kinds of information can metadata offer about a particular dataset? Select all that apply.
- How to combine the data with another dataset
- Which analyses to perform on the data
- If the data is clean and reliable
- What kinds of data it contains
Metadata helps data analysts identify the type of data, if it is clean and reliable, and how it can be combined with another dataset.
Question 3
Think about data as a student at a high school. In this metaphor, which of the following are examples of metadata? Select all that apply.
- Classes the student is enrolled in
- Student’s ID number
- Grades the student earns
- Student’s enrollment date
The student ID number, enrollment date, and classes the student is enrolled in represent structural metadata.
Question 4
Think about data as a refrigerator. Which kind of metadata is the refrigerator’s product number?
- Redundant
- Administrative
- Structural
- Descriptive
The refrigerator’s product number is descriptive metadata because it is information that can help identify the refrigerator at a later date.
Question 5
What is the process that data analysts use to ensure the formal management of their company’s data assets?
- Data integrity
- Data governance
- Data mapping
- Data aggregation
Data governance is the process of ensuring the formal management of a company’s data assets.
Question 6
Describe the key differences between a star and a snowflake schema. Select all that apply.
- A star schema enables very fast data processing.
- A snowflake schema enables very fast data processing.
- A snowflake schema has one or more fact tables referencing any number of dimension tables. A star schema is an extension of a snowflake schema, with more dimensions and subdimensions.
- A star schema has one or more fact tables referencing any number of dimension tables. A snowflake schema is an extension of a star schema, with more dimensions and subdimensions.
A star schema has one or more fact tables referencing any number of dimension tables, and it enables very fast data processing. A snowflake schema is an extension of a star schema, with more dimensions and subdimensions.
Question 7
What are some key benefits of using external data? Select all that apply.
- External data is always reliable.
- External data is free to use.
- External data has broad reach.
- External data provides industry-level perspectives.
Some key benefits of using external data are that it has a broad reach and it provides industry-level perspectives.
Question 8
A data analyst reviews a database of Wisconsin car sales to find the last five car models sold in Milwaukee in 2019. How can they sort and filter the data to return the last five cars at the top? Select all that apply.
- Filter out sales outside of Milwaukee
- Filter out sales not in 2019
- Sort by date in ascending order
- Sort by date in descending order
The analyst can filter out sales outside of Milwaukee, filter out sales not in 2019, and sort by date in descending order.
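The same filter-then-sort logic can be sketched in Python. The records below are invented for illustration and do not come from any real Wisconsin sales database.

```python
from datetime import date

# Hypothetical sales records; cities, dates, and models are invented for illustration.
sales = [
    {"city": "Milwaukee", "sold_on": date(2019, 3, 14), "model": "Model A"},
    {"city": "Madison", "sold_on": date(2019, 11, 2), "model": "Model B"},
    {"city": "Milwaukee", "sold_on": date(2018, 12, 30), "model": "Model C"},
    {"city": "Milwaukee", "sold_on": date(2019, 12, 20), "model": "Model D"},
    {"city": "Milwaukee", "sold_on": date(2019, 7, 1), "model": "Model E"},
]

# Filter: keep only Milwaukee sales made in 2019.
milwaukee_2019 = [s for s in sales if s["city"] == "Milwaukee" and s["sold_on"].year == 2019]

# Sort by date in descending order so the most recent sales come first.
newest_first = sorted(milwaukee_2019, key=lambda s: s["sold_on"], reverse=True)

# Take up to five rows from the top of the sorted list.
last_five_models = [s["model"] for s in newest_first[:5]]
print(last_five_models)
```

Sorting in ascending order instead would put the oldest sales at the top, which is why that option is not selected.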
Weekly Challenge 4
Question 1
Fill in the blank: Naming conventions are _____ that describe a file's content, creation date, or version.
- frequent suggestions
- common verifications
- general attributes
- consistent guidelines
Naming conventions are consistent guidelines that describe a file's content, creation date, or version.
Question 2
A data analytics team uses data about data to indicate consistent naming conventions for a project. What type of data is involved in this scenario?
- Metadata
- Long data
- Aggregated data
- Big data
Metadata is data about data. Metadata practices can help analytics teams create consistent naming conventions and storage practices for their files.
Question 3
A data analyst creates a file that lists people who donated to their organization’s fund drive. An effective name for the file is: FundDriveDonors_Feb2022_V3.
- True
- False
FundDriveDonors_Feb2022_V3 is an effective file name because it is an appropriate length and references the project name, creation date, and version.
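One way a team might check names against such a convention is a small pattern test. The pattern below (ProjectName_MonYYYY_V<number>) is an assumption modeled on the example file name, not a standard, and the function name is hypothetical.

```python
import re

# Assumed pattern: ProjectName_MonYYYY_V<number>, modeled on FundDriveDonors_Feb2022_V3.
NAME_PATTERN = re.compile(r"[A-Za-z0-9]+_[A-Z][a-z]{2}\d{4}_V\d+")

def follows_convention(file_name):
    """Return True when the whole name matches the assumed project_date_version pattern."""
    return NAME_PATTERN.fullmatch(file_name) is not None

print(follows_convention("FundDriveDonors_Feb2022_V3"))
print(follows_convention("donors final copy (2)"))
```

A real team would adapt the pattern to whatever convention it has agreed on; the point is only that a consistent guideline can be checked mechanically.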
Question 4
Foldering may be used by data analysts to organize folders into what?
- Databases
- Subfolders
- Versions
- Tables
Foldering may be used by data analysts to organize folders into subfolders.
Question 5
Data analysts use archiving to separate current from past work. What does this process involve?
- Reviewing current data files to confirm they’ve been cleaned
- Moving files from completed projects to another location
- Reorganizing and renaming current files
- Using secure data-erase software to destroy old files
Archiving involves moving files from completed projects to a separate location.
Question 6
Fill in the blank: Data analysts create _____ to structure their folders.
- hierarchies
- ladders
- sequences
- scales
Data analysts create hierarchies to structure their folders.
Question 7
A data analyst wants to ensure only people on their analytics team can access, edit, and download a spreadsheet. They can use which of the following tools? Select all that apply.
- Sharing permissions
- Encryption
- Templates
- Filtering
To control who can access or edit a spreadsheet, data analysts use encryption and sharing permissions.
Question 8
To reduce clutter, a data analyst hides cells that contain long, complex formulas. To view the formulas again, the analyst will need to adjust the spreadsheet sharing or encryption settings.
- True
- False
Hidden cells can be easily unhidden using the unhide feature. Hiding does not protect data.
Course challenge
Scenario 1, questions 1-5
Question 1
You’ve been working at a data analytics consulting company for the past six months. Your team helps restaurants use their data to better understand customer preferences and identify opportunities to become more profitable.
To do this, your team analyzes customer feedback to improve restaurant performance. You use data to help restaurants make better staffing decisions and drive customer loyalty. Your analysis can even track the number of times a customer requests a new dish or ingredient in order to revise restaurant menus.
Currently, you’re working with a vegetarian sandwich restaurant called Garden. The owner wants to make food deliveries more efficient and profitable. To accomplish this goal, your team will use delivery data to better understand when orders leave Garden, when they get to the customer, and overall customer satisfaction with the orders.
Before project kickoff, you attend a discovery session with the vice president of customer experience at Garden. He shares information to help your team better understand the business and project objectives. As a follow-up, he sends you an email with datasets.
Click below to read the email: C3 Scenario 1_Client Email.pdf
And click below to access the datasets:
Course 3 Final Challenge Data Sets - Customer survey data (1).csv
Course 3 Final Challenge Data Sets - Delivery times_distance (1).csv
Reviewing the data enables you to describe how you will use it to achieve your client’s goals. First, you notice that all of the data is first-party data. What does this mean?
- It’s subjective data that measures qualities and characteristics.
- It’s data that was collected by Garden employees using their own resources.
- It’s a type of data that’s categorized without a set order.
- It’s data that was collected from outside sources.
First-party data is data collected by an individual or group using their own resources.
Question 2
Next, you review the customer satisfaction survey data:
CustomerSurveyData - Customer survey data.csv
The question in column E asks, “Was your order accurate? Please respond yes or no.” What kind of data is this?
- Clean data
- Ordinal data
- Second-party data
- Boolean data
This is Boolean data, which has only two possible values, such as yes or no.
Question 3
Now, you review the data on delivery times and the distance of customers from the restaurant:
DeliveryTimes_DistanceData - Delivery times_distance.csv
The data in column E shows the duration of each delivery. What type of data is this? Select all that apply.
- Quantitative data
- Qualitative data
- Discrete data
- Continuous data
This is an example of continuous data, which is measured and can have almost any numeric value. It is also quantitative data, which is specific and measures numerical facts.
Question 4
The next thing you review is the file containing pictures of sandwich deliveries over a period of 30 days. This is an example of structured data.
- True
- False
This is an example of unstructured data, which is not organized in an easily identifiable manner.
Question 5
Now that you’re familiar with the data, you want to build trust with the team at Garden.
What actions should you take when working with their data? Select all that apply.
- Keep the data safe by implementing data-security measures, such as password protection and user permissions.
- Organize the data using effective naming conventions.
- Share the client’s data with other delivery restaurants to compare performance.
- Post on social media that you’re working with Garden and would like feedback from any of your contacts who have ordered there before.
You can build trust by showing a client that you will organize their data effectively and keep it safe by implementing appropriate data-security measures.
Scenario 2, questions 6-10
Question 6
You’ve completed this program and are interviewing for a junior data scientist position at a company called Sewati Financial Services.
Click below to review the job description:
C3 Course Challenge Junior Data Scientist Job Description .pdf
So far, you’ve successfully completed the first interview with a recruiter. They arrange your second interview with the team at Sewati Financial Services.
Click below to read the email from the human resources director:
Course 3 Scenario 2_Second Interview Email.pdf
You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Kai Harvey, the senior manager of strategy. After welcoming you, he begins the behavioral interview.
Consider and respond to the following question. Select all that apply.
Our data analytics team often surveys clients to get their feedback. If you were on the team, how would you ensure the results do not favor a particular person, group of people, or thing?
- Instruct participants to share their name and contact information.
- Ensure the survey sample represents the population as a whole.
- Make sure the wording of the survey question does not encourage a specific response from participants.
- Give participants enough time to answer each survey question.
The way questions are written, the amount of time given to answer each question, and the inclusivity of the participants can help ensure survey results are unbiased.
Question 7
Consider and respond to the following question. Select all that apply.
Our data analytics team often uses both internal and external data. Describe the difference between the two.
- Internal data lives within a company’s own systems. External data lives outside the organization.
- External data is typically generated from within the company. Internal data is generated outside the organization.
- Internal data is typically generated from within the company. External data is generated outside the organization.
- External data lives within a company’s own systems. Internal data lives outside the organization.
Internal data lives within a company’s own systems and is typically generated from within the company. External data lives in and is generated outside the organization.
Question 8
Consider and respond to the following question. Select all that apply.
Our analysts often work with the same spreadsheet, but for different purposes. How would you use filtering to help in this situation?
- Use filters to highlight the header row
- Use filters to simplify a spreadsheet by showing only the information you need
- Use filters to sort the data in a meaningful order
- Use filters to show only the data that meets specific criteria while hiding the rest
Filters enable data analysts on the same team to use the same dataset for different purposes.
Question 9
Next, your interviewer wants to better understand your knowledge of basic SQL commands. He asks: How would you write a query that retrieves only data about people with the last name Hassan from the Clients table in our database?
- SELECT DATA FROM Clients WHERE 'Hassan'
- SELECT Clients WHERE Last_Name= 'Hassan' FROM *
- SELECT * FROM Clients WHERE Last_Name= 'Hassan'
- SELECT All WHERE Last_Name 'Hassan' FROM Clients
To write a query that retrieves only data about people with the last name Hassan from the Clients table, type SELECT * FROM Clients WHERE Last_Name='Hassan'.
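To try the query without a database server, one option is Python's built-in `sqlite3` module. The sketch below assumes a two-column `Clients` table and made-up names; only the `SELECT` statement itself comes from the question.

```python
import sqlite3

# In-memory database with a hypothetical Clients table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients (First_Name TEXT, Last_Name TEXT)")
conn.executemany(
    "INSERT INTO Clients VALUES (?, ?)",
    [("Amal", "Hassan"), ("Luis", "Ortega"), ("Noor", "Hassan")],
)

# SELECT * returns every column; WHERE keeps only the matching rows.
rows = conn.execute(
    "SELECT * FROM Clients WHERE Last_Name = 'Hassan'"
).fetchall()
print(rows)
```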
Question 10
For your final question, your interviewer explains that Sewati Financial Services cares about its clients’ trust, and this is an important responsibility for the data analytics team. They do this by:
- protecting clients from unauthorized access to their private data
- ensuring freedom from inappropriate use of client data
- giving consent to use someone’s data
He asks: Which data analytics practice does this describe?
- Encryption
- Data privacy
- Sharing permissions
- Bias
This describes data privacy, which involves protecting an individual’s private data.
Process Data from Dirty to Clean
Weekly Challenge 1
Question 1
Which of the following conditions are necessary to ensure data integrity? Select all that apply.
- Statistical power
- Completeness
- Accuracy
- Privacy
Accuracy and completeness are necessary to ensure data integrity.
Question 2
What is one potential problem associated with data manipulation that analysts must be aware of?
- Data manipulation can help organize a dataset.
- Data manipulation can separate a dataset among different locations.
- Data manipulation can make a dataset easier to read.
- Data manipulation can introduce errors.
Data manipulation is the process of changing data to make it more organized and easier to read. However, it can sometimes introduce errors.
Question 3
A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Based on the available data, an analyst will be able to determine which country was the most populous from 2016 to 2017.
- True
- False
Based on the available data, an analyst will be able to determine which country was the most populous from 2016 to 2017.
Question 4
A data analyst is given a dataset for analysis.
June 2014 Invoices - Sheet1.csv
Which of the following has duplicate data?
- Data for Valando on 2/18/2014
- Data for Valando on 1/1/2014
- Data for Symteco on 5/20/2014
- Data for Symteco on 2/21/2014
Valando on 2/18/2014 contains duplicate data because the spreadsheet contains the same data in two different rows.
Question 5
A data analyst is working on a project about the global supply chain. They have a dataset with lots of relevant data from Europe and Asia. However, they decide to generate new data that represents all continents. What type of insufficient data does this scenario describe?
- Data that keeps updating
- Data that's outdated
- Data that's geographically limited
- Data from only one source
This example describes data that is insufficient because it’s geographically limited. If the analytics project has a global focus, the dataset should also be global.
Question 6
A car manufacturer wants to learn more about the brand preferences of electric car owners. There are millions of electric car owners in the world. Who should the company survey?
- A sample of car owners who most recently bought an electric car
- A sample of all electric car owners
- A sample of car owners who have owned more than one electric car
- The entire population of electric car owners
The company should survey a sample of all electric car owners.
Question 7
Fill in the blank: Sampling bias in data collection happens when a sample isn’t representative of _____.
- a dataset about the population
- the population most affected by the data
- a subset of the population
- the population as a whole
Sampling bias in data collection happens when a sample isn’t representative of the population as a whole.
Question 8
Which of the following processes helps ensure a close alignment of data and business objectives?
- Completing data replication
- Transferring data multiple times
- Having data update automatically during analysis
- Maintaining data integrity
Maintaining data integrity helps ensure a close alignment of data and business objectives because the data is likely to be accurate, complete, consistent, and trustworthy.
Weekly Challenge 2
Question 1
Which of the following terms describe dirty data? Select all that apply.
- Irrelevant
- Incomplete
- Infallible
- Incorrect
Dirty data is incomplete, incorrect, and irrelevant to the problem being solved.
Question 2
Field length is a spreadsheet tool for determining if a field has been duplicated.
- True
- False
Field length determines the number of characters that may be typed into a field.
Question 3
A data analyst notices that the customer in row 2 shares the same Customer ID as the customer in row 6. What does this scenario describe?
| | A | B | C | D |
|---|---|---|---|---|
| 1 | Last name | First name | Middle initial | Customer ID |
| 2 | Smith | Leonardo | R. | 64078 |
| 3 | Lee | Natasha | E. | 92862 |
| 4 | Wallace | Luciana | M. | 55107 |
| 5 | Xiao | Hua | A. | 88492 |
| 6 | Smith | Leo | R. | 64078 |
| 7 | Chaudhuri | Toby | T. | 34694 |
| 8 | Lee | Tasha | P. | 18295 |
| 9 | Walton | Mason | Q. | 58239 |
| 10 | Richards | Felix | S. | 12765 |
| 11 | Guillermo | Beth | I. | 27593 |
| 12 | Walton | Nadine | J. | 67292 |
- Duplicate data
- Mislabeled data
- Inconsistent data
- Obsolete data
This is duplicate data because the customer data in row 2 is a duplicate of the customer data in row 6.
Question 4
Fill in the blank: Conditional formatting is a spreadsheet tool that changes how _____ appear when values meet a specific condition.
- filters
- cells
- queries
- charts
Conditional formatting is a spreadsheet tool that changes how cells appear when values meet a specific condition.
Question 5
A data analyst uses the SPLIT function to divide a text string around a specified character and put each fragment into a new, separate cell. What is the specified character separating each item called?
- Delimiter
- Unit
- Partition
- Substring
When using the SPLIT function, the specified character separating each item is called a delimiter.
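The role of the delimiter can be illustrated with Python's `str.split`. This is a rough analogue of the spreadsheet function, not a reimplementation: the helper name `split_cell` and the sample text are hypothetical, and stripping the surrounding spaces is an added convenience.

```python
def split_cell(text, delimiter):
    """Break text at each delimiter and return the fragments as a list.

    Each list item stands in for one new spreadsheet cell. Stripping
    whitespace around each fragment is an extra step added here.
    """
    return [fragment.strip() for fragment in text.split(delimiter)]


print(split_cell("blue, green, off-white", ","))
```

Choosing a different delimiter changes where the text is cut: splitting the same string on `"-"` would break "off-white" apart instead.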
Question 6
For a function to work properly, data analysts must follow each function’s predetermined structure. What is this structure called?
- Syntax
- Validation
- Summary
- Algorithm
This structure is called syntax. Syntax is a predetermined structure that includes all required information and its proper placement.
Question 7
You are working with the following selection of a spreadsheet:
| | A | B |
|---|---|---|
| 1 | Customer | Address |
| 2 | Sally Stewart | 9912 School St. North Wales, PA 19454 |
| 3 | Lorenzo Price | 8621 Glendale Dr. Burlington, MA 01803 |
| 4 | Stella Moss | 372 W. Addison Street Brandon, FL 33510 |
| 5 | Paul Casey | 9069 E. Brickyard Road Chattanooga, TN 37421 |
In order to extract the five-digit postal code from Burlington, MA, what is the correct function?
- =LEFT(5,B3)
- =RIGHT(B3,5)
- =RIGHT(5,B3)
- =LEFT(B3,5)
The correct syntax is =RIGHT(B3,5). The RIGHT function returns a set number of characters from the right side of a text string. B3 is the specified cell. And 5 is the number of characters to return.
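The argument order (text first, then the number of characters) can be mirrored in a small Python helper. The function `right` below is a hypothetical stand-in for the spreadsheet function, applied to the address from cell B3.

```python
def right(text, num_chars):
    """Return the last num_chars characters of text, like RIGHT(text, num_chars)."""
    return text[-num_chars:] if num_chars > 0 else ""


address_b3 = "8621 Glendale Dr. Burlington, MA 01803"
print(right(address_b3, 5))  # prints 01803
```

The result is text, not a number, which is what preserves the leading zero in the postal code.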
Question 8
A data analyst in a human resources department is working with the following selection of a spreadsheet:
| | A | B | C | D |
|---|---|---|---|---|
| 1 | Year Hired | Last 4 of SS# | Department | Employee ID |
| 2 | 2019 | 1192 | Marketing | |
| 3 | 2014 | 2683 | Operations | |
| 4 | 2020 | 1939 | Strategy | |
| 5 | 2009 | 3208 | Graphics | |
They want to create employee identification numbers (IDs) in column D. The IDs should include the year hired plus the last four digits of the employee’s Social Security Number (SS#). What function will create the ID 20093208 for the employee in row 5?
- =CONCATENATE(A5,B5)
- =CONCATENATE(A5+B5)
- =CONCATENATE(A5:B5)
- =CONCATENATE(A5*B5)
To create the ID 20093208 for the employee in row 5, the function is =CONCATENATE(A5,B5). CONCATENATE joins together two or more text strings. (A5,B5) are the locations of the strings to be joined.
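The difference between joining the two values and adding them can be checked quickly in Python. This is a sketch using the row 5 values; the variable names are made up for illustration.

```python
year_hired = 2009   # cell A5
last_four = 3208    # cell B5

# Joining as text, which is what CONCATENATE(A5,B5) does.
employee_id = str(year_hired) + str(last_four)

# Adding as numbers, which is what the A5+B5 option would compute first.
arithmetic_sum = year_hired + last_four

print(employee_id)     # prints 20093208
print(arithmetic_sum)  # prints 5217
```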
Question 9
An analyst is cleaning a new dataset containing 500 rows. They want to make sure the data contained from cell B2 through cell B300 does not contain a number greater than 50. Which of the following COUNTIF function syntaxes could be used to answer this question? Select all that apply.
- =COUNTIF(B2:B300,>50)
- =COUNTIF(B2:B300,"<=50")
- =COUNTIF(B2:B300,<=50)
- =COUNTIF(B2:B300,">50")
One possible syntax is =COUNTIF(B2:B300,">50"). This returns the number of cells that are greater than 50. Another option is =COUNTIF(B2:B300,"<=50"). This returns the number of cells that are less than or equal to 50. Either one can confirm that the data does not contain a number greater than 50. In both cases, the criterion must be enclosed in quotation marks.
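The two counts can be compared on a small sample in Python. This is a sketch only: `countif` is a hypothetical helper that takes a condition as a function rather than a quoted criterion string, and `column_b` stands in for the range B2:B300.

```python
def countif(values, condition):
    """Count how many values satisfy the condition."""
    return sum(1 for value in values if condition(value))


column_b = [12, 50, 7, 51, 49]  # hypothetical sample data

over_50 = countif(column_b, lambda v: v > 50)
at_most_50 = countif(column_b, lambda v: v <= 50)

print(over_50)     # prints 1, so one value breaks the rule
print(at_most_50)  # prints 4, one fewer than the 5 values checked
```

A clean range would give 0 for the first count, or a second count equal to the number of numeric cells.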
Question 10
The V in VLOOKUP stands for what?
- Virtual
- Vertical
- Visual
- Variable
The V in VLOOKUP stands for vertical. VLOOKUP is a spreadsheet function that vertically searches for a certain value in a column to return a corresponding piece of information.
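The vertical search can be sketched in Python as a top-to-bottom scan of the first column. The helper below is a simplified, hypothetical analogue that handles exact matches only, and the sample rows are made up.

```python
def vlookup(search_key, table, index):
    """Scan the first column top to bottom and return the value in
    column `index` (1-based) of the first row that matches exactly."""
    for row in table:
        if row[0] == search_key:
            return row[index - 1]
    return None  # a spreadsheet would show #N/A here


customers = [
    (64078, "Smith"),
    (92862, "Lee"),
    (55107, "Wallace"),
]

print(vlookup(92862, customers, 2))  # prints Lee
```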
Question 11
Fill in the blank: Data mapping is the process of _____ fields from one data source to another.
- matching
- linking
- merging
- extracting
Data mapping is the process of matching fields from one data source to another.
Question 12
Describe the relationship between a primary key and a foreign key.
- A primary key references a row in which each value is unique. A foreign key is a column within a table that is a primary key in another table.
- A primary key is a field within a table that is a foreign key in another table. A foreign key references a column in which each value is unique
- A primary key references a column in a table in which each value is unique. A foreign key is a field within a table that is a primary key in another table.
- A primary key references a field within a table that is a foreign key in another table. A foreign key references a row in which each value is unique.
A primary key references a column in a table in which each value is unique. A foreign key is a field within a table that is a primary key in another table.
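The relationship can be demonstrated with two small tables in SQLite, run through Python's `sqlite3` module. The table and column names (`customers`, `orders`, `customer_id`) are hypothetical. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`; other databases behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

# customer_id is the primary key here: each value must be unique.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, last_name TEXT)"
)
# In orders, customer_id is a foreign key pointing at that primary key.
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(customer_id))"
)

conn.execute("INSERT INTO customers VALUES (64078, 'Smith')")
conn.execute("INSERT INTO orders VALUES (1, 64078)")  # matches a customer

try:
    conn.execute("INSERT INTO orders VALUES (2, 99999)")  # no such customer
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # prints True
```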
Weekly Challenge 3
Question 1
Data analysts choose SQL for which of the following reasons? Select all that apply.
- SQL is a programming language that can also create web apps
- SQL is a powerful software program
- SQL is a well-known standard in the professional community
- SQL can handle huge amounts of data
Data analysts choose SQL because it can handle huge amounts of data. SQL is also a well-known standard in the professional community.
Question 2
In which of the following situations would a data analyst use spreadsheets instead of SQL? Select all that apply.
- When visually inspecting data
- When working with a dataset with more than 1,000,000 rows
- When working with a small dataset
- When using a language to interact with multiple database programs
An analyst would choose to use spreadsheets instead of SQL when visually inspecting data or working with a small dataset.
Question 3
A data analyst creates many new tables in their company’s database. When the project is complete, the analyst wants to remove the tables so they don’t clutter the database. What SQL commands can they use to delete the tables?
- INSERT INTO
- CREATE TABLE IF NOT EXISTS
- UPDATE
- DROP TABLE IF EXISTS
The analyst can use the DROP TABLE IF EXISTS query to delete the tables so they don’t clutter the database.
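The `IF EXISTS` part is what makes the cleanup safe to repeat. A minimal sketch in SQLite via Python's `sqlite3` module, with a hypothetical table name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS temp_results (id INTEGER)")

conn.execute("DROP TABLE IF EXISTS temp_results")
# Running it again is harmless: without IF EXISTS this would raise an error.
conn.execute("DROP TABLE IF EXISTS temp_results")

remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining)  # prints []
```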
Question 4
A data analyst is cleaning customer data for an online retail company. They are working with the following section of a database:
The analyst wants to find out if the state data is consistent and if any text strings contain more than two characters. What is the correct SQL clause to use to find any text strings containing more than two characters?
- WHERE(state) > 2
- DISTINCT(state) > 2
- SUBSTR(state) > 2
- LENGTH(state) > 2
The correct LENGTH statement is LENGTH(state) > 2.
Question 5
Fill in the blank: The _____ function counts the number of characters a string contains.
- SUBSTR
- CAST
- LENGTH
- TRIM
The LENGTH function counts the number of characters the string contains.
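A `LENGTH` check like the one in the previous two questions can be run in SQLite through Python's `sqlite3` module. The `customers` table and its rows are hypothetical, since the database section is not reproduced in the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("A", "OH"), ("B", "Ohio"), ("C", "PA")],
)

# Rows whose state value is longer than a two-letter abbreviation.
too_long = conn.execute(
    "SELECT name, state FROM customers WHERE LENGTH(state) > 2"
).fetchall()
print(too_long)
```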
Question 6
In SQL databases, what data type refers to a number that contains a decimal?
- Integer
- String
- Boolean
- Float
In SQL databases, the float data type refers to a number that contains a decimal.
Question 7
Fill in the blank: In SQL databases, the _____ function can be used to convert data from one datatype to another.
- TRIM
- LENGTH
- SUBSTR
- CAST
The CAST function can be used to convert data from one datatype to another.
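As an illustration, the sketch below stores prices as text and converts them with `CAST` in SQLite, run through Python's `sqlite3` module. The table is hypothetical, and type names vary by database: SQLite uses `REAL`, while others use names such as `FLOAT64`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (price TEXT)")  # imported as text
conn.executemany("INSERT INTO purchases VALUES (?)", [("19.99",), ("5",)])

# CAST converts each text value to a floating-point number.
prices = [
    row[0]
    for row in conn.execute(
        "SELECT CAST(price AS REAL) FROM purchases ORDER BY rowid"
    )
]
print(prices)
```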
Question 8
Fill in the blank: The _____ function can be used to return non-null values in a list.
- CONCAT
- TRIM
- COALESCE
- CAST
The COALESCE function can be used to return non-null values in a list.
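`COALESCE` returns its first argument that is not null, which makes it useful for fallback values. A small SQLite sketch via Python's `sqlite3` module, using a hypothetical `products` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, display_name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Paint 01", None), ("Paint 02", "Sky Blue")],
)

# Use display_name when present; otherwise fall back to product_name.
names = conn.execute(
    "SELECT COALESCE(display_name, product_name) "
    "FROM products ORDER BY product_name"
).fetchall()
print(names)
```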
Weekly Challenge 4
Question 1
The data collected for an analysis project has just been cleaned. What are the next steps for a data analyst? Select all that apply.
- Verification
- Reporting
- Certification
- Validation
Verification and reporting are the next steps for a data analyst after the data is cleaned.
Question 2
A data analyst is in the verification step. They consider the business problem, the goal, and the data involved in their analytics project. What scenario does this describe?
- Reporting on the data
- Seeing the big picture
- Considering the stakeholders
- Visualizing the data
To see the big picture when verifying data cleaning, consider the business problem, the goal, and the data.
Question 3
Which function removes leading, trailing, and repeated spaces in data?
- CUT
- CROP
- TRIM
- TIDY
TRIM is a function that removes leading, trailing, and repeated spaces in data.
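The three effects (leading, trailing, and repeated spaces) can be reproduced with a short Python helper. The name `spreadsheet_trim` is hypothetical, and the helper only approximates the spreadsheet behavior for ordinary space characters.

```python
def spreadsheet_trim(text):
    """Drop leading and trailing spaces and collapse repeated spaces to one."""
    return " ".join(piece for piece in text.split(" ") if piece)


print(spreadsheet_trim("  Meer-Kitty   Interior  Design "))
```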
Question 4
A data analyst uses the COUNTA function to count which of the following?
- The total number of headers in a specific range.
- The total number of values within a specified range.
- The total number of entries in a changelog.
- The specific numbers in a dataset.
A data analyst uses the COUNTA function to count the total number of values within a specified range.
Question 5
A WHEN statement considers one or more conditions and returns a value as soon as that condition is met.
- True
- False
A CASE statement considers one or more conditions and returns a value as soon as that condition is met.
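The "as soon as that condition is met" behavior shows up when a row satisfies more than one branch. A sketch in SQLite via Python's `sqlite3` module, with a hypothetical `responses` table and labels:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (respondent TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [("r1", 5), ("r2", 3), ("r3", 1)],
)

# A rating of 5 satisfies both WHEN branches, but CASE stops at the first.
labels = conn.execute(
    """
    SELECT respondent,
           CASE
               WHEN rating >= 4 THEN 'positive'
               WHEN rating >= 2 THEN 'neutral'
               ELSE 'negative'
           END AS sentiment
    FROM responses
    ORDER BY respondent
    """
).fetchall()
print(labels)
```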
Question 6
What is the process of tracking changes, additions, deletions, and errors during data cleaning?
- Recording
- Documentation
- Observation
- Cataloging
Documentation is the process of tracking changes, additions, deletions, and errors during data cleaning.
Question 7
Fill in the blank: A changelog contains a _____ list of modifications made to a project.
- approximate
- random
- chronological
- synchronized
A data analyst uses a changelog to access the information needed. A changelog is a file that contains a chronological list of modifications made to a project.
Question 8
Reviewing version history is an effective way to view a changelog in SQL.
- True
- False
Reviewing version history is an effective way to view a changelog in spreadsheets.
Course challenge
Scenario 1, questions 1-5
Question 1
You are a data analyst at a small analytics company. Your company is hosting a project kick-off meeting with a new client, Meer-Kitty Interior Design. The agenda includes reviewing their goals for the year, answering any questions, and discussing their available data.
Meer-Kitty Interior Design About Us Page.pdf
Meer-Kitty Interior Design Business Plan.pdf
Meer-Kitty Interior Design has two goals. They want to expand their online audience, which means getting their company and brand known by as many people as possible. They also want to launch a line of high-quality indoor paint to be sold in-store and online. You decide to consider the data about indoor paint first.
Kitty Survey Feedback - Meer-Kitty survey feedback.csv
You are pleased to find that the available data is aligned to the business objective. However, you do some research about confidence level for this type of survey and learn that you need at least 120 unique responses for the survey results to be useful. Therefore, the dataset has two limitations: First, there are only 40 responses; second, a Meer-Kitty superfan, User 588, completed the survey 11 times.
As the survey has too few responses and numerous duplicates that are skewing results, what are your options? Select all that apply.
- Repeat the survey in order to create a new, improved dataset.
- Locate another dataset about indoor paint.
- Remove the duplicates from the data and proceed with analysis.
- Talk with stakeholders and ask for more time.
With numerous duplicates, the best option is to talk with stakeholders and ask for more time. Then, you can repeat the survey in order to create a new, improved dataset.
Question 2
During the meeting, you also learn that Meer-Kitty videos are hosted on their website. For each product offered, there is an accompanying video for customers to learn more. So, more views for a video suggests greater consumer interest.
Your goal is to identify which videos are most popular, so Meer-Kitty knows what topics to explore in the future. Unfortunately, Meer-Kitty has just three months of data available because they only recently launched the videos on their site.
Without enough data to identify long-term trends about the video subjects that people prefer, what should you do?
- Find an alternate data source that will still enable you to meet your objective.
- Watch the videos and use your gut instinct to identify which are most successful.
- Tell the client you’re sorry, but there is no way to meet their objective.
- Move ahead with the data you have to determine the top video subjects.
Without enough data to identify long-term trends, one option is to find an alternate data source that will still enable you to meet your objective. In this case, you could find data from a similar company and learn about its consumer interest and trends.
Question 3
Now that you’ve identified some limitations with Meer-Kitty’s data, you want to communicate your concerns to stakeholders. In addition to insufficient video trend data, your main concern with the indoor paint survey is that the data isn’t representative of the population as a whole.
Clearly, one particular respondent, the superfan, is overrepresented. This means the data doesn’t represent the population as a whole.
When surveying people for Meer-Kitty in the future, what are some best practices you can use to address some of the issues associated with sampling bias? Select all that apply.
- Increase sample size
- Use data that keeps updating
- Use data from only one source
- Use random sampling
To address some of the issues associated with sampling bias, random sampling helps select a sample from a population so that every possible type of the sample has an equal chance of being chosen. In addition, by increasing sample size, you’re more likely to survey part of a population that is representative of the whole.
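Simple random sampling can be sketched with Python's `random` module. The population of 1,000 respondent IDs, the sample size, and the helper name are all hypothetical. The optional seed only makes the draw repeatable.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


respondents = list(range(1, 1001))  # hypothetical respondent IDs
sample = simple_random_sample(respondents, 150, seed=42)
print(len(sample), len(set(sample)))  # prints 150 150
```

Because the draw is made without replacement, no respondent can appear twice in the sample.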
Question 4
The stakeholders understand your concerns and agree to repeat the indoor paint survey. In a few weeks, you have a much better dataset with more than 150 responses and no duplicates.
Kitty Survey Feedback - New Meer-Kitty survey feedback.csv
You notice that questions 4 and 5 are dependent on the respondent’s answer to question 3. So, you need to determine how many people answered Yes to question 3, then compare that to responses to questions 4 and 5. That way, you will know if questions 4 and 5 have any nulls.
You decide to use a spreadsheet tool that changes how cells appear when they contain the word Yes. Which tool do you use?
- Data validation
- Conditional formatting
- Filtering
- CONCATENATE
To change how cells appear when they meet a certain value, use conditional formatting.
Question 5
You continue cleaning the data. You use tools such as remove duplicates and COUNTIF to ensure the dataset is complete, correct, and relevant to the problem you’re trying to solve. Then, you complete the verification and reporting processes to share the details of your data-cleaning effort with your team.
While reviewing, your team notes one aspect of data cleaning that would improve the dataset even more. They point out that the new survey also has a new question in Column G: “What are your favorite indoor paint colors?” This was a free-response question, so respondents typed in their answers. Some people included multiple different colors of paint. In order to determine which colors are most popular, it will be necessary to put each color in its own cell.
What spreadsheet function enables you to put each of the colors in Column G into a new, separate cell?
- Delimit
- MID
- Divide
- SPLIT
To put each of the colors in Column G into a new, separate cell, use SPLIT. SPLIT is a spreadsheet function that divides text around a specified character and puts each fragment into a new, separate cell.
Scenario 2, questions 6-10
Question 6
You’ve completed this program and are interviewing for a junior data scientist position. The job is at B.Spoke Market Research, a company that analyzes market conditions using customer surveys and other research methods. The detailed job description can be found below:
C4 B.Spoke Market Research Job Description.pdf
So far, you’ve had a phone interview with a recruiter and you’ve secured a second interview with the B.Spoke team. The recruiter’s email can be found below:
C4 S2 Email from Recruiter.pdf
You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Jodie Choi, the data science lead. After welcoming you, the behavioral interview begins.
For your first question, your interviewer wants to learn about your experience with spreadsheets. She says: Sometimes the team needs data that is stored in different spreadsheets. So, we use a spreadsheet function to find the information we need.
There is a spreadsheet function that searches for a value in the first column of a given range and returns the value of a specified cell in the row in which it is found. It is called SEARCH.
- True
- False
The VLOOKUP function searches for a certain value in a column to return a corresponding piece of information.
Question 7
Next, your interviewer wants to know more about your understanding of tools that work in both spreadsheets and SQL. She explains that the data her team receives from customer surveys sometimes has many duplicate entries.
She says: Spreadsheets have a great tool for that called remove duplicates. In SQL, you can include DISTINCT to do the same thing. In which part of the SQL statement do you include DISTINCT?
- The FROM statement
- The WHERE statement
- The UPDATE statement
- The SELECT statement
To remove duplicates in SQL, include DISTINCT in your SELECT statement.
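The placement of `DISTINCT` directly after `SELECT` can be seen in this SQLite sketch, run through Python's `sqlite3` module. The `survey` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (user_id INTEGER, answer TEXT)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [(588, "Yes"), (588, "Yes"), (101, "No"), (588, "Yes")],
)

# DISTINCT collapses rows that are identical across the selected columns.
unique_rows = conn.execute(
    "SELECT DISTINCT user_id, answer FROM survey ORDER BY user_id"
).fetchall()
print(unique_rows)
```

The query only hides the repeats in its result; the underlying table still holds all four rows.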
Question 8
Now, your interviewer explains that the data team usually works with very large amounts of customer survey data. After receiving the data, they import it into a SQL table. But sometimes, the new dataset imports incorrectly and they need to change the format.
She asks: What function would you use to convert data in a SQL table from one datatype to another?
- CONVERT
- CHANGE
- CAST
- COALESCE
The CAST function is used to convert data in a SQL table from one datatype to another.
Question 9
Next, your interviewer explains that one of their clients is an online retailer that needs to create product numbers for a vast inventory. Her team does this by combining the text strings for product number, manufacturing date, and color.
She asks: Which SQL function would you use to add strings together to create new text strings?
- COMBINE
- CREATE
- COALESCE
- CONCAT
To add strings together to create new text strings, use the CONCAT function.
Question 10
For your final question, your interviewer explains that her team often comes across data with extra spaces.
She asks: Which function would enable you to eliminate those extra spaces? You respond: To eliminate extra spaces for consistency, use the TRIM function.
- True
- False
To eliminate extra spaces for consistency, use the TRIM function.
Analyze Data to Answer Questions
Weekly Challenge 1
Question 1
In the data analysis process, which of the following refers to a phase of analysis? Select all that apply.
- Visualize the data
- Organize data into understandable sections
- Get input from others
- Format data using sorts and filters
There are four phases of analysis: organize data, format and adjust data, get input from others, and transform data by observing relationships between data points and making calculations.
Question 2
During which phase of analysis can you find a correlation between two variables?
- Format and adjust data
- Get input from others
- Organize data
- Transform data
Finding a correlation between two variables occurs while transforming data.
Question 3
You are performing a calculation during your analysis of a dataset. Which phase of analysis are you in?
- Transform data
- Get input from others
- Organize data
- Format and adjust data
You are in the transform data phase of analysis. Performing a calculation is an example of identifying relationships and patterns in the data.
Question 4
Typically, a data analyst uses filters when they want to expand the amount of data they are working with.
- True
- False
Typically, a data analyst uses filters when they want to narrow down the amount of data they are working with.
Question 5
A data analyst is sorting spreadsheet data. They want to make sure that, when they rearrange the data, data across rows is kept together. What technique should they use to sort the data?
- Sort Column
- Sort Sheet
- Sort Together
- Sort Rows
Sort Sheet sorts all of the data in a spreadsheet by a specific column. Data across rows is kept together during the sort.
Question 6
A data analyst uses a function to sort a spreadsheet range between cells H1 and K65. They sort in ascending order by the first column, Column H. What is the syntax they are using?
- =SORT(H1:K65, 1, TRUE)
- =SORT(H1:K65, A, FALSE)
- =SORT(H1:K65, A, TRUE)
- =SORT(H1:K65, 1, FALSE)
The syntax is =SORT(H1:K65, 1, TRUE). The first argument is the range to sort. The 1 represents the first column of that range. And TRUE sorts in ascending order.
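For readers who think in code, the same sort can be sketched in Python. The rows below are made-up stand-ins for the range H1:K65; index 0 plays the role of column 1, and reverse=False corresponds to TRUE (ascending).

```python
# Made-up rows standing in for the range H1:K65 (four columns, H through K).
rows = [
    [30, "c", "x", 1.5],
    [10, "a", "y", 2.5],
    [20, "b", "z", 3.5],
]

# Sort whole rows by the first column, ascending, like =SORT(H1:K65, 1, TRUE).
# Each row moves as a unit, so data across a row stays together.
sorted_rows = sorted(rows, key=lambda row: row[0], reverse=False)
print(sorted_rows)
```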
Question 7
A data analyst is querying a database that contains data about dental equipment inventory. They are only interested in data related to cleaning products. Which of the following sections of an SQL statement would return the correct result?
- WHERE "Cleaning"
- WHERE product = "Cleaning"
- ORDER BY "Cleaning"
- ORDER BY product = "Cleaning"
The correct section is WHERE product = "Cleaning". A WHERE statement in SQL includes the name of the column, an equals sign, and the value(s) in the column to include.
Question 8
A data analyst would write the following section of a SQL query to sort Golden Retrievers, ordered by birth date, in ascending order:
WHERE Breed = "Golden Retriever" ORDER BY Birth_date
- True
- False
The query will return Golden Retrievers, ordered by birth date, in ascending order.
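A small runnable sketch of the WHERE plus ORDER BY pattern, assuming SQLite through Python's sqlite3 module. The dogs table, the name column, and the rows are hypothetical, and single quotes are used for the string literal because that is the form SQLite expects.

```python
import sqlite3

# Hypothetical table of dogs; birth dates are stored as ISO text so they sort correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (name TEXT, Breed TEXT, Birth_date TEXT)")
conn.executemany(
    "INSERT INTO dogs VALUES (?, ?, ?)",
    [
        ("Biscuit", "Golden Retriever", "2020-06-01"),
        ("Rex", "Poodle", "2018-01-10"),
        ("Sunny", "Golden Retriever", "2019-03-15"),
    ],
)

# WHERE keeps only Golden Retrievers; ORDER BY sorts ascending by default.
golden_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM dogs WHERE Breed = 'Golden Retriever' ORDER BY Birth_date"
    )
]
print(golden_names)
```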
Weekly Challenge 2
Question 1
An analyst notes that the “160” in cell A9 is formatted as text, but it should be Australian dollars. What spreadsheet tool can help them select the right format?
- CURRENCY
- Format as Currency
- EXCHANGE
- Format as Dollar
The Format as Currency tool can be used to change the text to Australian dollars.
Question 2
You are creating a spreadsheet to help you with your job search. Every time you find an interesting job, you add it to the spreadsheet. Then, you want to indicate two possible options: Need to Apply or Applied. What spreadsheet tool will save you time by enabling you to create a dropdown list with Need to Apply and Applied as the possible options?
- Data validation
- FIND
- Conditional formatting
- Pop-up menus
Data validation can be used to add drop-down lists with predetermined options for Need to Apply and Applied.
Question 3
You are using a spreadsheet to keep track of your newspaper subscriptions. You add color to indicate if a subscription is current or has expired. Which spreadsheet tool changes how cells appear when values meet each expiration date?
- Add color
- CONVERT
- Data validation
- Conditional formatting
You are using conditional formatting. Conditional formatting changes how cells appear when values meet specific conditions.
Question 4
A data analyst wants to write a SQL query to combine data from two columns into a new column. What function can they use?
- CONCAT
- JOIN
- COMBINE
- GROUP
They can use CONCAT, which joins multiple text strings from multiple sources.
Question 5
You are querying a database of ice cream flavors to determine which stores are selling the most mint chip. For your project, you only need the first 80 records. What clause should you add to the following SQL query?
SELECT flavors FROM ice_cream_table WHERE flavor = "mint_chip"
- LIMIT = 80
- LIMIT_80
- LIMIT,80
- LIMIT 80
To return only the first 80 records, type LIMIT 80.
Question 6
A data analyst is working with a spreadsheet that has very long text strings. They use a function to count the number of characters in cell G11. What is the correct syntax?
- =LEN(G,11)
- =LEN(G11)
- =LEN(G:G11)
- =LEN(“G11”)
The correct syntax is =LEN(G11). The LEN function counts the number of characters in a text string and the parameter for the function is the cell reference.
Question 7
Spreadsheet cell L6 contains the text string “Function.” To return the substring “Fun,” what is the correct syntax?
- =RIGHT(3,L6)
- =LEFT(L6, 3)
- =RIGHT(L6, 3)
- =LEFT(3,L6)
The function =LEFT(L6, 3) will return “Fun.” The LEFT function returns a set number of characters from the left side of a text string. In this case, it returns the first three characters of the string in L6.
Question 8
Fill in the blank: When working with a database, data analysts can use the _____ function to locate specific characters in a string.
- IDENTIFY
- WHERE
- FIND
- FROM
When working with a database, data analysts can use the FIND function to locate specific characters in a string.
Weekly Challenge 3
Question 1
Fill in the blank: Data aggregation involves creating a _____ collection of data that originally came from multiple sources.
- modified
- summarized
- localized
- expanded
Data aggregation involves creating a summarized collection of data from multiple sources.
Question 2
A data analyst uses the SUM function to add together numbers from a spreadsheet. However, after getting a zero result, they realize the numbers are actually text. What function can they use to convert the text to a numeric value?
- FIGURE
- DIGIT
- VALUE
- CONVERT
The analyst can use the VALUE function to convert the text that represents a number to a numeric value.
Question 3
When using VLOOKUP, there are some common limitations that data analysts should be aware of. One of these limitations is that VLOOKUP can only return a value from the data to the left of the matched value.
- True
- False
One limitation of VLOOKUP is that it can only return a value from the data to the right of the matched value.
Question 4
Fill in the blank: When writing a function, a data analyst wraps a table array in dollar signs. This is an _____ , which is used to lock the array so rows and columns don’t change if the function is copied.
- arbitrary reference
- accurate reference
- absolute reference
- authentic reference
Wrapping a table array in dollar signs creates an absolute reference, which locks the array so rows and columns don’t change if the function is copied.
Question 5
The following is a selection from a spreadsheet:
| | A | B | C |
|---|---|---|---|
| 1 | Country | Population in 2020 | Growth in population 2000-2020 |
| 2 | China | 1,439,323,776 | 13.4% |
| 3 | India | 1,380,004,385 | 37.1% |
| 4 | United States | 331,002,651 | 17.3% |
| 5 | Indonesia | 273,523,615 | 27.7% |
| 6 | Pakistan | 220,892,340 | 44.9% |
| 7 | Brazil | 212,559,417 | 21.9% |
| 8 | Nigeria | 206,139,589 | 66.3% |
| 9 | Bangladesh | 164,689,383 | 27.9% |
| 10 | Russia | 145,934,462 | -0.8% |
To search for the population of Pakistan, what is the correct VLOOKUP syntax?
- =VLOOKUP(Pakistan, A2:B10, 3, false)
- =VLOOKUP("Pakistan", A2:B10, 3, false)
- =VLOOKUP(Pakistan, A2*B10, 2, false)
- =VLOOKUP("Pakistan", A2:B10, 2, false)
To search for the population of Pakistan, the syntax is =VLOOKUP("Pakistan", A2:B10, 2, false). “Pakistan” is the reference. A2:B10 is the table array. The 2 indicates the number of the column from which the value should be returned. And the word false instructs the function to return an exact match.
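The lookup logic can be sketched in plain Python. The vlookup helper below is a hypothetical illustration of exact-match behavior, not a spreadsheet API; the data is the A2:B10 range from the question.

```python
# The table array A2:B10 from the question, as (country, population) rows.
table_array = [
    ("China", 1_439_323_776),
    ("India", 1_380_004_385),
    ("United States", 331_002_651),
    ("Indonesia", 273_523_615),
    ("Pakistan", 220_892_340),
    ("Brazil", 212_559_417),
    ("Nigeria", 206_139_589),
    ("Bangladesh", 164_689_383),
    ("Russia", 145_934_462),
]


def vlookup(search_key, table, column_index):
    """Hypothetical helper: return the value in the 1-based column_index of the
    first row whose first column exactly matches search_key."""
    for row in table:
        if row[0] == search_key:
            return row[column_index - 1]
    return None  # a spreadsheet would report an error such as #N/A instead


# Mirrors =VLOOKUP("Pakistan", A2:B10, 2, false).
pakistan_population = vlookup("Pakistan", table_array, 2)
print(pakistan_population)
```

Because the helper only searches the first column and returns a column by position, it also shows why VLOOKUP can only return values to the right of the matched column.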
Question 6
When creating a SQL query, which JOIN clause returns all matching records in two or more database tables?
- OUTER
- RIGHT
- INNER
- LEFT
The INNER JOIN clause returns all matching records in two or more database tables.
Question 7
A data analyst writes a query that asks a database to return only distinct values in a specified range, rather than including repeating values. Which function do they use?
- RETURN
- COUNT DISTINCT
- RETURN VALUES
- COUNT
When writing SQL queries, an analyst can use the COUNT DISTINCT function to return only distinct values in a range.
Question 8
Which of the following terms describe a subquery? Select all that apply.
- Inner select
- Nested query
- Inner query
- Small query
A subquery can also be called an inner query, inner select, or nested query.
Weekly Challenge 4
Question 1
You are analyzing sales data in a spreadsheet. Which of the following could you find out by using the MAX function?
- Total sales for the year
- Difference between two months of sales
- The month with the highest sales
- Sales per month over a year
You could find out the month with the highest sales using the MAX function. The MAX function returns the largest numeric value from a range of cells.
Question 2
A data analyst is working with a spreadsheet from a furniture company.
The analyst inputs a function to find the number of product prices that are less than $150.00. Which formula will return that result?
- =SUMIF(G2:G30, ">150")
- =COUNTIF(G2:G30, "<150")
- =SUMIF(G2:G30, "<150")
- =COUNTIF(G2:G30, ">=150")
The COUNTIF formula =COUNTIF(G2:G30, "<150") will allow the analyst to count all product price values in Column G that are less than $150.
Question 3
A data analyst is working in a spreadsheet and uses the SUMIF function in the formula below as part of their analysis.
=SUMIF(A1:A25, "<10", C1:C25)
Which part of this formula is the criteria or condition?
- "<10"
- A1:A25
- C1:C25
- =SUMIF
The criteria or condition for this SUMIF formula is "<10". This means that if any values in the range A1 through A25 are less than 10, their corresponding values in the range C1 through C25 will be added together.
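The pairing of a criteria range with a sum range can be sketched in Python. The two short lists below are made-up stand-ins for A1:A25 and C1:C25.

```python
# Made-up values standing in for A1:A25 (criteria range) and C1:C25 (sum range).
criteria_range = [4, 12, 9, 10, 2]
sum_range = [100, 200, 300, 400, 500]

# Like =SUMIF(A1:A25, "<10", C1:C25): add the C value wherever the paired A value is below 10.
sumif_result = sum(c for a, c in zip(criteria_range, sum_range) if a < 10)

# Like =COUNTIF(A1:A25, "<10"): count the matching cells instead of summing.
countif_result = sum(1 for a in criteria_range if a < 10)

print(sumif_result, countif_result)
```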
Question 4
A data analyst is working in a spreadsheet and uses the SUMPRODUCT function in the formula below as part of their analysis.
=SUMPRODUCT(A2:A10,B2:B10)
How does the SUMPRODUCT function calculate the cell ranges identified in the parentheses?
- It multiplies the values in the first range, then multiplies the values in the second range.
- It adds the ranges, then multiplies them by the last value in the second array.
- It adds the values in the first range, then adds the values in the second range.
- It multiplies the ranges, then adds the sum of the products of the two ranges.
=SUMPRODUCT(A2:A10,B2:B10) calculates the cell ranges by multiplying each value in the first range by its corresponding value in the second range (the results are the products). Then, the formula adds those products together.
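The multiply-then-add order can be checked with a few made-up numbers in Python; the two lists stand in for A2:A10 and B2:B10.

```python
# Made-up values standing in for the ranges A2:A10 and B2:B10.
quantities = [2, 3, 4]
unit_prices = [10, 20, 30]

# Multiply each pair of corresponding values, then add the products together.
products = [q * p for q, p in zip(quantities, unit_prices)]
sumproduct = sum(products)

print(products, sumproduct)
```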
Question 5
A data analyst creates a pivot table in a spreadsheet containing movie data.
If the analyst wants to summarize the data using the AVERAGE function in the Values menu, which spreadsheet columns could they add data from? Select all that apply.
- Box Office Revenue
- Budget
- Movie Title
- Genre
To summarize the data using the AVERAGE function, the analyst could use the Budget column or the Box Office Revenue column. Both have numeric values that the AVERAGE function could calculate.
Question 6
A data analyst uses the following SQL query to perform basic calculations on their data. Which types of operators is the analyst using in this SQL query? Select all that apply.
SELECT
Yes_Responses,
No_Responses,
Total_Surveys,
(Yes_Responses + No_Responses) / Total_Surveys AS Responses_Per_Survey
FROM
Survey_1
- Subtraction
- Multiplication
- Addition
- Division
The analyst is using the addition operator (+) and the division operator (/) in this SQL query: the query adds the "yes" and "no" responses, then divides that sum by the total number of surveys.
Question 7
A data analyst uses the following query to perform a calculation on a company's inventory. Which of the following will be the return in the "Overstock" column for this query?
SELECT
Total_Inventory % Total_Stores AS Overstock
FROM
Shipment_1
- The remainder when the values in "Total_Inventory" are divided by the values in "Total_Stores"
- The percentage of the "Total_Inventory" that is located in "Total_Stores"
- The difference between the values in "Total_Inventory" and the values in "Total_Stores"
- The combined total of the values in "Total_Inventory" and the values in "Total_Stores"
The return for this query will be the remainder when the total inventory is divided by the total number of stores. The modulo operator (%) calculates the remainder when two values are divided.
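The modulo query can be run against a tiny table to see the remainders. This sketch assumes SQLite through Python's sqlite3 module, and the two rows of inventory data are made up.

```python
import sqlite3

# Hypothetical shipment rows: (Total_Inventory, Total_Stores).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Shipment_1 (Total_Inventory INTEGER, Total_Stores INTEGER)")
conn.executemany("INSERT INTO Shipment_1 VALUES (?, ?)", [(100, 8), (45, 6)])

# % returns what is left over after dividing inventory evenly among stores.
overstock = [
    row[0]
    for row in conn.execute(
        "SELECT Total_Inventory % Total_Stores AS Overstock FROM Shipment_1 ORDER BY rowid"
    )
]
print(overstock)
```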
Question 8
A data analyst completes a calculation in a SQL query using the AVG function. Which of the following best describes the return for this query?
SELECT
AVG (salary) AS avg_employee_salary
FROM
employees
WHERE
salary < 30000
- The number of all salaries in the "employees" table
- A single average of all of the salaries less than $30,000
- A single count of salaries that average less than $30,000
- The annual salary for each employee
The return for this query would be a single average of all of the salaries less than $30,000. The AVG function is an aggregate function that returns the average value of a group. In this query, the group is "salary" and the condition is salaries less than $30,000.
Question 9
Use the following SQL query to answer the question:
SELECT
location,
SUM(customer_orders) AS total_orders
FROM
bulk_orders
Which statement should you add after the FROM statement to organize rows by location?
- EXTRACT location
- WHERE location
- AS location
- GROUP BY location
You should add the GROUP BY statement to organize rows by location. In this query, GROUP BY groups rows from the bulk_orders table with the same location value into summary rows.
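A minimal sketch of the completed query, assuming SQLite through Python's sqlite3 module. The rows and location names are made up, and the ORDER BY clause is an addition that only keeps the printed output in a predictable order.

```python
import sqlite3

# Hypothetical bulk order rows: (location, customer_orders).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bulk_orders (location TEXT, customer_orders INTEGER)")
conn.executemany(
    "INSERT INTO bulk_orders VALUES (?, ?)",
    [("Denver", 10), ("Boise", 5), ("Denver", 7), ("Boise", 1)],
)

# GROUP BY collapses rows with the same location into one summary row each.
totals = conn.execute(
    "SELECT location, SUM(customer_orders) AS total_orders "
    "FROM bulk_orders GROUP BY location ORDER BY location"
).fetchall()
print(totals)
```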
Question 10
Fill in the blank: The data validation process involves checking and rechecking the quality of your data to make sure that it is complete and _____. Select all that apply.
- cited
- accurate
- consistent
- secure
Data validation involves checking and rechecking the quality of your data to make sure it is complete, accurate, secure, and consistent.
Course challenge
Scenario 1, Questions 1-7
For the past six months, you have been working for a direct-mail marketing firm as a junior marketing analyst. Direct mail is advertising material sent to people through the mail. These people can be current or prospective customers, clients, or donors. Many charities depend on direct mail for financial support.
Your company, Directly Dynamic, creates direct-mail pieces with its in-house staff of graphic designers, expert mail list services, and on-site printing. Your team has just been hired by a local nonprofit, Food Justice Rock Springs. The mission of Food Justice Rock Springs is to eliminate food deserts by establishing local gardens, providing mobile pantries, educating residents, and more. Click below to read the email from Tayen Bell, vice president of marketing and outreach.
C5 Course Challenge, Email From Tayen Bell, Directly Dynamic.pdf
You begin by reviewing the dataset: Dynamic Dataset.
The client has asked you to send two separate mailings: one to people within 50 miles of Rock Springs; the other to anyone outside that area. So, to research each donor’s distance from the city, you first need to find out where all of these people live.
You could scroll through 209 rows of data, but you know there is a more efficient way to organize the cities.
Which of the following tools will enable you to sort your spreadsheet by city (Column K) in ascending order?
- Sort Sheet by Column K from Z to A
- Sort Sheet by Column K from A to Z
- Sort Range by Column K from A to Z
- Sort Range by Column K from Z to A
To sort your spreadsheet by city in ascending order, use Sort Sheet by Column K from A to Z. You can also use the SORT function syntax =SORT(A2:R210, 11, TRUE).
Question 2
You notice that many cells in the city column, Column K, are missing a value. So, you use the zip codes to research the correct cities. Now, you want to add the cities to each donor’s row. However, you are concerned about making a mistake, such as a spelling typo.
Fill in the blank: To add drop-down lists to your worksheet with predetermined options for each city name, you decide to use _____.
- the LIST function
- VLOOKUP
- data validation
- the find tool
You decide to use data validation. Data validation allows you to control what can and cannot be entered in your worksheet in order to avoid typos. It does this by adding drop-down lists with predetermined options, such as each city name.
Question 3
Now, you decide to address Tayen’s request to include a handwritten note in the direct-mail piece for anyone who gave at least $100 last year.
Which of the following spreadsheet tools will enable you to change how cells appear if they contain a value of $100 or more?
- Conditional formatting
- The MAX function
- The COUNTA function
- Data validation
To change how cells appear, use conditional formatting. Choose to format cells if they are greater than or equal to 100.
Question 4
At this point, you notice that the information about state and zip code is in the same row. However, your company’s mailing list software requires states to be on a separate line from zip codes.
What function will enable you to move the 2-character state abbreviation in cell L2 into its own column?
- =LEFT(L2,2)
- =LEFT(2,L2)
- =RIGHT(2,L2)
- =RIGHT(L2,2)
To move the 2-character state abbreviations in Column L into their own column, use the LEFT function: =LEFT(L2,2).
Question 5
Next, you duplicate your dataset twice using the Sheet Menu. You rename the first sheet Donation Form List, and you remove the cities that are further than 50 miles from Rock Springs. You rename the second sheet Postcard List, and you remove the cities that are within 50 miles of Rock Springs.
Then, you import these datasets into your company’s mailing list database. In a mailing list database, you create two tables: Donation_Form_List and Postcard_List. You decide to clean the Donation_Form_List first.
Your company’s mailing list software requires units to be on the same line as street addresses. However, they are currently in two separate columns (street_address and unit).
What portion of your SQL statement will instruct the database to combine these two columns into a new column called "address"?
- JOIN(street_address to unit) AS address
- JOIN(street_address, " to ", unit) AS address
- CONCAT(street_address to unit) AS address
- CONCAT(street_address, " to ", unit) as address
The portion of your SQL statement used to instruct the database to combine these two columns into a new column called "address" is CONCAT(street_address, " to ", unit) AS address.
Question 6
Your database contains people who live in many areas of Wyoming. However, it’s important to align your in-house data with the data from Food Justice Rock Springs. You also need to separate your data into the two lists: Donation_Form_List and Postcard_List. They will be based on each city’s distance from Rock Springs.
The zip codes are in a column called zip_code. To select all data from the Donation_Form_List organized by zip code, you use the ORDER BY function. The syntax is:
SELECT * FROM Donation_Form_List ORDER BY zip_code
- True
- False
To organize your data by zip code, the correct query is:
SELECT * FROM Donation_Form_List ORDER BY zip_code
Question 7
You finish cleaning your datasets, so you decide to review Tayen’s email one more time to make sure you completed the task fully. It’s a good thing you checked because you forgot to identify people who have served on the board of directors or board of trustees. She wants to write them a thank-you note, so you need to locate them in the database.
To retrieve only those records that include people who have served on the board of trustees or on the board of directors, you use the WHERE function. The syntax is:
SELECT *
FROM Donation_Form_List
WHERE Board_Member = "TRUE" AND Trustee = "TRUE"
- True
- False
To retrieve only those records that include people who have served on the board of trustees or on the board of directors, the syntax must include "OR." Including "AND" will only retrieve records of people who served on both boards. The correct syntax is:
SELECT * FROM Donation_Form_List WHERE Board_Member = "TRUE" OR Trustee = "TRUE"
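The difference between AND and OR is easy to check on a tiny table. This sketch assumes SQLite through Python's sqlite3 module; the name column and the four donors are made up, and single quotes are used for string literals as SQLite expects.

```python
import sqlite3

# Hypothetical donors: (name, Board_Member, Trustee), with flags stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Donation_Form_List (name TEXT, Board_Member TEXT, Trustee TEXT)")
conn.executemany(
    "INSERT INTO Donation_Form_List VALUES (?, ?, ?)",
    [
        ("Ana", "TRUE", "FALSE"),
        ("Ben", "FALSE", "TRUE"),
        ("Cy", "TRUE", "TRUE"),
        ("Dee", "FALSE", "FALSE"),
    ],
)

query = "SELECT name FROM Donation_Form_List WHERE Board_Member = 'TRUE' {} Trustee = 'TRUE' ORDER BY name"

# AND requires both conditions; OR requires at least one.
served_on_both = [row[0] for row in conn.execute(query.format("AND"))]
served_on_either = [row[0] for row in conn.execute(query.format("OR"))]
print(served_on_both, served_on_either)
```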
Scenario 2, continued
Question 8
Your company’s direct-mail campaign was very successful, and Food Justice Rock Springs has continued partnering with Directly Dynamic. One thing you’ve been working on is assigning all donors identification numbers. This will enable you to clean and organize the lists more effectively.
Meanwhile, another team member has been creating a prospect list that contains data about people who have indicated interest in getting involved with Food Justice Rock Springs. These people are also assigned a unique ID. Now, you need to compare your donor list with the dataset in your database and collect certain data from both.
What SQL function will return all records from the left table and only the matching records from the right?
- OUTER JOIN
- INNER JOIN
- RIGHT JOIN
- LEFT JOIN
A LEFT JOIN function will return all records from the left table and only the matching records from the right.
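To see how LEFT JOIN differs from INNER JOIN, both can be run on the same pair of small tables. This is a sketch assuming SQLite through Python's sqlite3 module; the donors and prospects tables, their columns, and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donors (donor_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE prospects (donor_id INTEGER, interest TEXT)")
conn.executemany("INSERT INTO donors VALUES (?, ?)", [(1, "Ana"), (2, "Ben"), (3, "Cy")])
conn.executemany(
    "INSERT INTO prospects VALUES (?, ?)",
    [(1, "gardens"), (3, "pantries"), (4, "education")],
)

query = (
    "SELECT d.name, p.interest FROM donors AS d "
    "{} JOIN prospects AS p ON d.donor_id = p.donor_id ORDER BY d.donor_id"
)

# LEFT JOIN keeps every donor; donors with no matching prospect get NULL (None in Python).
left_rows = conn.execute(query.format("LEFT")).fetchall()
# INNER JOIN keeps only donors that have a matching prospect row.
inner_rows = conn.execute(query.format("INNER")).fetchall()
print(left_rows)
print(inner_rows)
```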
Question 9
Your next task is to identify the average contribution given by donors over the past two years. Tayen will use this information to set a donation minimum for inviting donors to an upcoming event.
You start with 2019. To return average contributions in 2019 (contributions_2019), you use the AVG function. What portion of your SQL statement will instruct the database to find this average and store it in the AvgLineTotal variable?
- AVG ("contributions_2019") IN AvgLineTotal
- AVG ("contributions_2019") AS AvgLineTotal
- AVG (contributions_2019) AS AvgLineTotal
- AVG (contributions_2019) = "AvgLineTotal"
To return average contributions in 2019, the correct portion of the SQL query is:
AVG (contributions_2019) AS AvgLineTotal
Question 10
Now that you provided her with the average donation amount, Tayen decides to invite 50 people to the grand opening of a new community garden. You return to your New Donor List spreadsheet to determine how much each donor gave in the past two years. You will use that information to identify the 50 top donors and invite them to the event.
What is the correct syntax to add the contribution amounts in cells O2 and P2?
- =SUM("O2,P2")
- =SUM(O2*P2)
- =SUM(O2/P2)
- =SUM(O2,P2)
To add cells O2 and P2, use the function =SUM(O2,P2). You can also use the formula =O2+P2.
Question 11
Tayen informs you that she’s thinking about inviting anyone who donated at least $100 in 2018, as well. However, she only has five open spaces. She asks you to report how many people gave at least $100 so she can determine if they can also be invited to the event.
What is the correct syntax to count how many donations of $100 or greater appear in Column Q (Contributions 2018)?
- =COUNTIF(Q2:Q210,">=100")
- =COUNTIF(Q2:Q210>=100)
- =COUNTIF(Q2:Q210,>=100)
- =COUNTIF(Q2:Q210">=100")
To count how many donations of $100 or greater appear in Column Q, the correct syntax is =COUNTIF(Q2:Q210,">=100").
Question 12
The community garden grand opening was a success. In addition to the 55 donors Food Justice Rock Springs invited, 20 other prospects attended the event. Now, Tayen wants to know the percentage of donations that came in that day from the new prospects compared to the original donors.
Which SQL query can be used to calculate the percentage of contributions from prospects?
- SELECT event_contributions, Total_donors, Total_prospects, ("Total_prospects" / "Total_donors" * 100) AS Prospects_Percent FROM contributions_data
- SELECT event_contributions, Total_donors, Total_prospects, (Total_prospects / Total_donors x 100) AS Prospects_Percent FROM contributions_data
- SELECT event_contributions, Total_donors, Total_prospects, (Total_prospects / Total_donors * 100) AS Prospects_Percent FROM contributions_data
- SELECT event_contributions, Total_donors, Total_prospects, (Total_prospects AND Total_donors = 100) AS Prospects_Percent FROM contributions_data
To identify the percentage of contributions from prospects, the correct query is:
SELECT event_contributions, Total_donors, Total_prospects, (Total_prospects / Total_donors * 100) AS Prospects_Percent FROM contributions_data
Question 13
Your team creates a highly effective prospects list for Food Justice Rock Springs. After a few months, many of these prospects become donors. Now, Tayen wants to know the top three cities in which these new donors live. She will use that information to determine if it’s still true that people who live closer to Rock Springs are more likely to donate.
What clause do you add to the following query to sort the donors in each city from high to low?
SELECT COUNT (DonorID), City
FROM new_donor_list
GROUP BY City
- ORDER BY CITY(DonorID) ASC
- ORDER BY COUNT(DonorID) ASC
- ORDER BY CITY(DonorID) DESC
- ORDER BY COUNT(DonorID) DESC
To retrieve the number of donors in each city, sorted high to low, the correct SQL query is:
SELECT COUNT (DonorID), City FROM new_donor_list GROUP BY City ORDER BY COUNT(DonorID) DESC
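The full query can be run on sample data to confirm the ordering. This sketch assumes SQLite through Python's sqlite3 module; the six donors and their cities are made up for illustration.

```python
import sqlite3

# Hypothetical donor rows: (DonorID, City).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_donor_list (DonorID INTEGER, City TEXT)")
conn.executemany(
    "INSERT INTO new_donor_list VALUES (?, ?)",
    [
        (1, "Rock Springs"),
        (2, "Rock Springs"),
        (3, "Rock Springs"),
        (4, "Green River"),
        (5, "Green River"),
        (6, "Laramie"),
    ],
)

# Count donors per city, then sort the counts from high to low.
donors_per_city = conn.execute(
    "SELECT COUNT(DonorID), City FROM new_donor_list "
    "GROUP BY City ORDER BY COUNT(DonorID) DESC"
).fetchall()
print(donors_per_city)
```

Adding LIMIT 3 at the end of the query would keep only the top three cities.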
Share Data Through the Art of Visualization
Weekly Challenge 1
Question 1
A data analyst wants to create a visualization that demonstrates how often data values fall into certain ranges. What type of data visualization should they use?
- Scatter plot
- Histogram
- Correlation chart
- Line graph
To demonstrate how often data values fall into certain ranges, the data analyst should use a histogram.
Question 2
A data analyst notices that two variables in their data seem to rise and fall at the same time. They recognize that these variables are related somehow. What is this an example of?
- Causation
- Tabulation
- Visualization
- Correlation
When a data analyst notices that two variables rise and fall at the same time, this is an example of correlation. Correlation is the measure of the degree to which two variables change in relationship to each other.
Question 3
Fill in the blank: A data analyst creates a presentation for stakeholders. They include _____ visualizations because they want them to be interactive and automatically change over time.
- geometric
- aesthetic
- dynamic
- static
They include dynamic visualizations. Dynamic visualizations are interactive and can automatically change over time.
Question 4
What are the key elements of effective visualizations you should focus on when creating data visualizations? Select all that apply.
- Sophisticated use of contrast
- Refined execution
- Visual form
- Clear meaning
The elements for effective visualization are clear meaning, sophisticated use of contrast, and refined execution.
Question 5
Fill in the blank: Design thinking is a process used to solve problems in a _____ way.
- critical
- design-centric
- analytical
- user-centric
Design thinking is a process used to solve complex problems in a user-centric way.
Question 6
You are in the ideate phase of the design process. What are you doing at this stage?
- Generating visualization ideas
- Sharing data visualizations with a test audience
- Making changes to their data visualization
- Creating data visualizations
There are five phases of the design process: empathize, define, ideate, prototype, and test. The ideate phase is when you start to generate your data visualization ideas.
Question 7
A data analyst wants to make their visualizations more accessible by adding text explanations directly on the visualization. What is this called?
- Labeling
- Subtitling
- Simplifying
- Distinguishing
This is labeling. Labeling data directly instead of relying on legends can make data visualizations more accessible.
Question 8
Distinguishing elements of your data visualizations makes the content easier to see. This can help make them more accessible for audience members with visual impairments. What are some methods data analysts use to distinguish elements?
- Add a legend
- Ensure all elements are highlighted equally
- Separate the foreground and background
- Use contrasting colors and shapes
Data analysts distinguish elements of data visualizations by separating the foreground and background and using contrasting colors and shapes.
Weekly Challenge 2
Question 1
Fill in the blank: When using Tableau, people can control what data they see in a visualization. This is an example of Tableau being _____.
- interpretive
- interactive
- indefinable
- inanimate
People being able to control what data they see is an example of Tableau being interactive.
Question 2
A data analyst is using the Color tool in Tableau to apply a color scheme to a data visualization. They want the visualization to be accessible for people with color vision deficiencies, so they use a color scheme with lots of contrast. What does it mean to have contrast?
- The color scheme uses a range of different colors
- The color scheme is graphically pleasing
- The color scheme is monotone
- The color scheme is uniform
Having contrast means the color scheme uses a range of different colors. The data analyst makes sure the color scheme has contrast in order to make the visualization accessible for people with color vision deficiencies.
Question 3
What could a data analyst do with the Lasso tool in Tableau?
- Select a data point
- Zoom in on a data point
- Move a data point
- Pan across data points
A data analyst could use the Lasso tool to select a data point.
Question 4
A data analyst is using the Pan tool in Tableau. What are they doing?
- Moving a data point to another location in the visualization
- Rotating the perspective while keeping a certain object in view
- Deselecting a data point from within the visualization
- Taking a screenshot of the visualization
They are using the Pan tool to rotate the perspective while keeping a certain object in view.
Question 5
You are working with the World Happiness data in Tableau. To display the population of each country on the map, which Marks shelf tool do you use?
- Tooltip
- Detail
- Size
- Label
To display the population of each country on the map, you use the Label property.
Question 6
When working with the World Happiness data in Tableau, what could you use the Filter tool to do?
- Show only countries with a World Happiness score of 3.5 or lower
- Permanently delete countries without a happiness score
- Reformat every country in Asia
- Zoom out to reveal the entire world
You could use the Filter tool to show only those countries with a World Happiness score of 3.5 or lower.
Question 7
By default, all visualizations you create using Tableau Public are available to other users. What icon do you click to hide a visualization?
- Eye
- Show/Hide
- Close
- Source
To hide a visualization from other users, click the Eye icon.
Question 8
Fill in the blank: In Tableau, a _____ palette displays two ranges of values. It uses a color to show the range where a data point is from and color intensity to show its magnitude.
- diverging
- overlaying
- inverting
- contrasting
In Tableau, a diverging palette displays two ranges of values. It uses a color to show the range where a data point is from and color intensity to show its magnitude.
Weekly Challenge 3
Question 1
Engaging your audience, creating compelling visuals, and using an interesting narrative are all part of what practice?
- Data composition
- Data design
- Data strategy
- Data storytelling
Engaging your audience, creating compelling visuals, and using an interesting narrative are all part of data storytelling.
Question 2
A data analyst wants to communicate to others about their analysis. They ensure the communication has a beginning, a middle, and an end. Then, they confirm that it clearly explains important insights from their analysis. What aspect of data storytelling does this scenario describe?
- Takeaways
- Narrative
- Spotlighting
- Setting
This scenario describes the data storytelling narrative. An effective narrative has a beginning, a middle, and an end. It also clearly explains important insights from the analysis.
Question 3
You are preparing to communicate to an audience about an analysis project. You consider the roles that your audience members play and their stake in the project. What aspect of data storytelling does this scenario describe?
- Engagement
- Theme
- Discussion
- Takeaways
Considering the roles your audience members play and their stake in the project describes audience engagement. Engagement is capturing and holding someone’s interest and attention.
Question 4
When designing a dashboard, how can data analysts ensure that charts and graphs are most effective? Select all that apply.
- Include as many visual elements as possible
- Incorporate all of the data points from the analysis
- Make good use of available space
- Place them in a balanced layout
When designing a dashboard, data analysts can ensure that charts and graphs are most effective by placing them in a balanced layout and making good use of available space.
Question 5
A data analyst is creating a dashboard using Tableau. In order to layer objects over other items, which layout should they choose?
- Tiled
- Floating
- Itemized
- Layered
In order to layer objects over other items in a Tableau dashboard, they should choose a floating layout. Floating items can be layered over other objects.
Question 6
Which of the following are appropriate uses for filters in Tableau? Select all that apply.
- Highlighting individual data points
- Providing data to different users based on their particular needs
- Limiting the number of rows or columns in view
- Hiding outliers that do not support the hypothesis
Appropriate uses for filters in Tableau include highlighting individual data points, limiting the number of rows or columns in view, and providing data to different users based on their particular needs.
Question 7
A data analyst creates a dashboard in Tableau to share with stakeholders. They want to save stakeholders time and direct them to the most important data points. To achieve these goals, they can pre-filter the dashboard.
- True
- False
To achieve these goals, they can pre-filter the dashboard. Pre-filtering is useful because it saves time and effort while directing stakeholders to the most important data.
Question 8
An effective slideshow guides your audience through your main communication points. What are some best practices to use when writing text for a slideshow? Select all that apply.
- Choose a font size that audience members can read easily.
- Avoid slang terms.
- Use numerous different text colors and styles.
- Define unfamiliar abbreviations.
Best practices for writing text for a slideshow include choosing a readable font size, avoiding slang terms, and defining unfamiliar abbreviations.
Question 9
You are creating a slideshow for a client presentation. There is a pivot table in a spreadsheet that you want to include. In order for the pivot table to update whenever the spreadsheet source file changes, how should you incorporate it into your slideshow? Select all that apply.
- Insert a PDF of the pivot table
- Embed the pivot table
- Link the pivot table
- Copy and paste the pivot table
In order for the pivot table to update whenever the spreadsheet source file changes, you should link or embed it into the slideshow. This keeps the two files connected, so changes to the spreadsheet will automatically appear in your slideshow.
Weekly Challenge 4
Question 1
A data analyst gives a presentation about predicting upcoming investment opportunities. How does establishing a hypothesis help the audience understand their predictions?
- It visualizes the data clearly and concisely
- It provides context about the presentation’s purpose
- It describes the data thoroughly
- It summarizes the findings succinctly
Establishing a hypothesis provides the audience with context about the analyst’s presentation. In this scenario, it establishes what the analyst wants to prove or disprove about which investment opportunities are most promising.
Question 2
According to the McCandless Method, what is the most effective way to first present a data visualization to an audience?
- Introduce the graphic by name
- Answer obvious questions before they’re asked
- Tell the audience why the graphic matters
- State the insight of the graphic
According to the McCandless Method, the most effective way to introduce a data visualization is to state the name of the graphic.
Question 3
An analyst introduces a graph to their audience to explain an analysis they performed. Which strategy would allow the audience to absorb the data visualizations? Select all that apply.
- Starting with broad ideas
- Practicing breathing exercises
- Using the five-second rule
- Improving body language
When introducing a data visualization, an analyst can use the five-second rule to allow their audience to absorb the data visualizations presented. They can also start with broad ideas to simplify the explanation about the visualization’s purpose.
Question 4
You are preparing for a presentation and want to make sure your nerves don’t distract you from your presentation. Which practices can help you stay focused on an audience? Select all that apply.
- Use short sentences
- Speak as quickly and briefly as possible
- Be mindful of nervous habits
- Keep the pitch of your voice level
Some helpful ways to focus on an audience include being mindful of nervous habits, using short sentences, and speaking with an even pitch. By using these strategies, you can reduce the risk of getting distracted during your presentation.
Question 5
You run a colleague test on your presentation before getting in front of an audience. Your coworker asks a question about a section of your analysis, but addressing their concern would mean adding information you didn’t plan to include. How should you proceed with building your presentation?
- Expand your presentation by including the information
- Remove the section of the analysis that prompted the question
- Keep the concern in mind and anticipate that stakeholders may ask the same question
- Leave the presentation as-is
In this scenario, adding the information can help elaborate on important information. If your colleague has a question about your presentation, it is likely that your audience will too. Addressing concerns brought up during a Colleague Test can help you improve your presentation in ways you might not have anticipated.
Question 6
Your stakeholders are concerned about the source of your data. They are unfamiliar with the organization that ran the analyses you referenced in your presentation. Which kind of objection are they making?
- Data
- Presentation skills
- Analysis
- Findings
When a stakeholder is concerned about the source of your data, they are making an objection about your data. This is when someone objects to the source or relevance of the data you use.
Question 7
A stakeholder objects to the steps of your analysis. What are some appropriate ways to respond to this objection? Select all that apply.
- Explain why you think any discrepancies exist
- Take steps to investigate your analysis question further
- Communicate the assumptions you made in your analysis
- Defend the results of your analysis
When responding to a concerned or objecting stakeholder, you can communicate the assumptions you made to clarify if they are accurate. You can also explain why you think the discrepancies exist and promise to investigate the matter further.
Question 8
You are presenting to a large audience and want to keep everyone engaged during your Q&A. What can you do to ensure your audience doesn’t grow disinterested despite its size?
- Repeat your key findings
- Ask your audience for insights
- Wait longer for the audience to ask questions
- Keep your pitch level
One way to engage a large audience is to ask them if they know anything about the topic you’re presenting about. In a large audience, it is more likely that an audience member may have information or anecdotes to contribute. You can enrich the discussion if they would like to share their insights.
Course challenge
Scenario 1, questions 1-9
Question 1
You have been working as a junior data analyst at Bowling Green Business Intelligence for nearly a year. Your supervisor, Kate, tells you that she believes you are ready for more responsibility. She asks you to lead an upcoming client presentation. You will be responsible for creating the data story, identifying the right tools to use, building the slideshow, and delivering the presentation to stakeholders.
Your client is Gaea, an automotive manufacturer that makes eco-friendly electric cars. For the past year, you have been working with the data team in Gaea’s Bowling Green, Kentucky, headquarters. For the presentation, you will engage the data team, as well as its regional sales representatives and distributors. Your presentation will inform their business strategy for the next three-to-five years.
You begin by getting together with your team to discuss the data story you want to tell. You know the first step in data storytelling is to engage your audience.
You use spotlighting to help you identify the most important insights. Which of the following activities are involved with spotlighting? Select all that apply.
- Noticing repeated words or numbers
- Identifying connections or patterns
- Determining the data’s partiality
- Finding ideas or concepts that keep arising
Spotlighting enables data analysts to identify broad, universal ideas and messages. This may involve identifying connections or patterns, finding ideas or concepts that keep arising, or noticing repeated words or numbers.
Question 2
After you identify the most important insights, it’s time to create your primary message. Your team’s analysis has revealed three key insights:
- Electric vehicle sales demand is expected to grow by more than 400% by 2025.
- The number of publicly available vehicle charging stations is a significant factor in consumer buying decisions. Currently, there are many locations with so few charging stations that electric car owners would run out of power when traveling between stations.
- Vehicle battery range is also a significant factor for consumers. In 2020, the average battery range was 210 miles. However, the vast majority of survey respondents report they will not buy an electric car until the battery range is at least 300 miles per charge.
Based on these insights, you create your primary message. Which of the following reflect the expectations of a primary message?
- The number of publicly available vehicle charging stations is a significant factor in consumer buying decisions. Therefore, Gaea must begin building vehicle charging stations
- Electric vehicle sales demand is expected to grow by more than 400% by 2025. However, the number of publicly available vehicle charging stations is a significant factor in consumer buying decisions. Currently, there are many locations with so few charging stations that electric car owners would run out of power when traveling between stations. Vehicle battery range is also a significant factor for consumers. In 2020, the average battery range was 210 miles. However, the vast majority of people say they will not buy an electric car until the battery range is at least 300 miles per charge
- Electric vehicle demand is skyrocketing
- Although electric vehicle sales demand is on the rise, low availability of charging stations and short battery range are significant hurdles that Gaea must overcome
A primary message should be clear, direct, and succinct. Your primary message states: Although electric vehicle sales demand is on the rise, low availability of charging stations and short battery range are significant hurdles that Gaea must overcome.
Question 3
Next, you decide on your data narrative’s characters, setting, plot, big reveal, and aha moment. The characters are the people affected by your story. This includes your stakeholders, Gaea’s customers, and Gaea’s potential future customers. For the setting, you describe the current situation, potential tasks, and background information about the analysis project.
As you begin to work on the plot for the data narrative, which of the following ideas would you include? Select all that apply.
- How your data analysis can help Gaea solve its business problems
- The challenges associated with the current lack of vehicle charging stations
- A list of your recommendations and details about why they will help Gaea be successful
- Why it’s important for Gaea to increase its cars’ battery range by 2025
The plot, or conflict, is used to create tension in the data story. For Gaea’s situation, the plot would include two ideas: First, the plot should explain the challenges associated with lack of vehicle charging stations. Second, it should address why it’s important for Gaea to increase its cars’ battery range by 2025.
Question 4
Now, it’s time to consider which tools to use to create data visualizations that will clearly communicate the results of your analysis. You and your team decide to make both spreadsheet charts and Tableau data visualizations. In addition, you agree to build a dashboard to share live, incoming data with your stakeholders. This will help them achieve the following goals:
- Organize multiple datasets about electric vehicle battery ranges into a central location
- Enable tracking and analysis of electric vehicle data
- Simplify data visualizations about the number of available charging stations using maps of the different geographies
Another key benefit of dashboards is that they enable you to maintain control of your data narrative.
- True
- False
Sharing dashboards with stakeholders likely means that you will lose control of the narrative. This is because you won’t be there to tell the story of your data and share your key message.
Question 5
Now that you have finished planning the data story with your team, it’s time to create data visualizations. First, you consider electric vehicle sales worldwide in 2015 compared to 2020. You use a spreadsheet to create the following bar graph to compare the two values:
You add information on the x-axis to represent a scale of values for the total electric vehicle sales and on the y-axis to represent the time periods (2015 and 2020).
- True
- False
In bar graphs with vertical bars, the x-axis is used to represent time periods, categories, or other variables. The y-axis is used to represent a scale of values for the variables.
Question 6
Next, you explore how access to public car-charging stations is influencing electric vehicle purchases. As your analysis has revealed, there are many areas without enough places for people to plug in and charge their cars. This lack of charging stations has a negative impact on demand for electric cars and overall vehicle sales.
You use Tableau to create the following draft of a visualization, which organizes the charging station data geographically:
After reviewing your draft, you realize that it could be improved.
Fill in the blank: To improve your draft, you select more varied hues and make the color intensity stronger. In addition, you choose lighter _____ in order to reflect more light.
- views
- variables
- visuals
- values
Value indicates how much light is being reflected.
Question 7
Now, you want to highlight what your team’s analysis discovered about the number of charging stations available compared to the number of cars purchased. Your data has confirmed that the lack of charging stations causes the effect of fewer car sales. To communicate this effectively, you will need to convey causation to the stakeholders.
You explain that causation is the measure of the degree to which two variables move in relationship to each other. In the case of Gaea’s business, charging station numbers and car sales move in the same direction.
- True
- False
You explain that causation is when an action directly leads to an outcome, such as a cause-effect relationship. In the case of Gaea’s business, the lack of charging stations directly leads to the outcome of fewer car sales.
Question 8
Once you finish creating data visualizations about the current state of the electric vehicle market, you turn to projections for the future. You want to communicate to stakeholders about the importance of longer vehicle battery range to consumers.
Your team’s data includes feedback from a consumer survey that investigated the importance of longer battery range when choosing whether to purchase an electric car. The current average battery range is about 210 miles. By 2025, that distance is expected to grow to 450 miles per charge.
You create the following pie chart:
Fill in the blank: After reviewing your pie chart, you realize that it could be improved. You resize the _____ so they visually show the different values.
- axes
- segments
- labels
- values
To make this chart more effective, you resize the segments so they visually show the different values. When the segments are all the same size, even though they represent different values, this will confuse the audience.
Question 9
It’s time to build your Tableau dashboard for stakeholders. You consider what type of layout to use.
Describe the differences between vertical and horizontal layouts. Select all that apply.
- Vertical layouts adjust the height of the views and objects contained
- Vertical layouts prevent items from being layered over other objects
- Horizontal layouts adjust the width of the views and objects contained
- Horizontal layouts prevent items from being layered over other objects
Vertical layouts adjust the height of the views and objects contained. Horizontal layouts adjust the width of the views and objects contained.
Scenario 2, questions 10-15
Question 10
You have created your narrative and visuals, so now it’s time to build a professional and appealing slideshow. You choose a theme that matches the tone of your presentation. Then, you create a title slide with a title, subtitle, and the date.
Next, you create the following slide to communicate information about electric vehicle sales in 2015 compared to 2020:
To improve the slide, you remove the text box at the bottom. For what reasons will this make your slide more effective? Select all that apply.
- Slide text should be no more than 10 lines total
- The text shouldn’t simply repeat the words you say
- The font size is too small for your audience to read
- Slide text should be fewer than 25 words total
Removing the text box at the bottom improves your slide in three ways: First, it eliminates text with a font size that is too small to read. Second, it reduces the slide’s word count to fewer than 25 words. Third, it ensures that the text does not simply repeat the words you say.
Question 11
You then create the following slide to demonstrate the challenges associated with battery range and charging stations:
After reviewing your slide, you realize that the visual elements could be improved. You do this by first choosing one data visualization to share on this slide, then creating another slide for the second data visualization.
In addition, you make sure to use _____ font sizes and colors for all of your data visualization titles.
- colorful
- consistent
- different
- unique
To make the visual elements more effective, you use a consistent font size and color for data visualization titles.
Question 12
You complete your slideshow and share it with your team. Once it is approved by your supervisor, you begin preparing to give your presentation. You consider maintaining good posture, being aware of nervous habits, and making eye contact. In addition, you think about how you will speak.
What strategies can help you speak effectively? Select all that apply.
- Using short words and sentences
- Speaking quickly so you are sure to have time to include all important data points
- Building in intentional pauses to give your audience time to think about what you have just said
- Keeping the pitch of your sentences level so that your statements are not confused for questions
To speak effectively, you should practice using short words and sentences, keeping the pitch of your sentences level, and building in intentional pauses.
Question 13
Next, you prepare for the question-and-answer session that will follow your presentation. To predict what questions they may ask, you do a colleague test of your presentation. You should choose a colleague who has deep expertise in the electric vehicle industry.
- True
- False
You choose a colleague who has no previous knowledge of the industry. This will help you confirm that you aren’t making any assumptions or including jargon your audience might not understand.
Question 14
Now that you have some idea of the questions the stakeholders will ask, you and a team member consider different objections that might arise.
Your team member asks you how you will respond if someone from Gaea questions your data-cleaning process. How do you prepare for this objection? Select all that apply.
- Be prepared to explain why data cleaning is not relevant at this stage of the project
- Keep a detailed log of your data-cleaning process
- Practice answering questions about your data-cleaning process
- Add your data-cleaning log to the slideshow appendix
You prepare by keeping a detailed log of your data-cleaning process. Then, you add your data-cleaning log to the slideshow appendix and practice answering questions about your data-cleaning process.
Question 15
Scenario 2, continued
As a final step in the data-sharing process, you think about how to respond during the Q&A session. What strategies will you employ when answering questions? Select all that apply.
- Provide detailed, comprehensive responses
- Understand the context of the question
- Listen to the whole question, and repeat it, if necessary
- Involve your whole audience
When answering questions, you should listen to the whole question, and repeat it, if necessary. You should also be sure you understand the context of the question and involve the whole audience.
Data Analysis with R Programming
Weekly Challenge 1
Question 1
A data analyst uses words and symbols to give instructions to a computer. What are the words and symbols known as?
- Syntax language
- Function language
- Programming language
- Coded language
Programming languages are the words and symbols you use to write instructions for computers to follow.
Question 2
Many data analysts prefer to use a programming language for which of the following reasons? Select all that apply.
- To choose a topic for analysis
- To easily reproduce and share an analysis
- To clarify the steps of an analysis
- To save time
Many data analysts prefer to use a programming language in order to easily reproduce and share an analysis, save time, and clarify the steps of an analysis.
Question 3
Which of the following are benefits of open-source code? Select all that apply.
- Anyone can fix bugs in the code
- Anyone can create an add-on package for the code
- Anyone can pay a fee for access to the code
- Anyone can use the code for free
The benefits of open-source code include the following: anyone can use the code for free, fix bugs in the code, and create add-on packages for the code.
Question 4
Fill in the blank: The benefits of using _____ for data analysis include the ability to quickly process lots of data and create high quality visualizations.
- the R programming language
- a dashboard
- a spreadsheet
- structured query language
The benefits of using the R programming language for data analysis include the ability to quickly process lots of data and create high quality visualizations.
Question 5
A data analyst needs to quickly create a series of scatterplots to visualize a very large dataset. What should they use for the analysis?
- Structured query language
- A slide presentation
- A dashboard
- R programming language
The analyst should use the R programming language to quickly create a series of scatterplots to visualize a very large dataset. R can quickly process lots of data and create high quality visualizations.
Question 6
RStudio’s integrated development environment lets you perform which of the following actions? Select all that apply.
- Install R packages
- Create data visualizations
- Import data from spreadsheets
- Stream online videos
RStudio’s integrated development environment lets you install R packages, import data from spreadsheets, and create data visualizations.
Question 7
In which two parts of RStudio can you execute code? Select all that apply.
- The environment pane
- The plots pane
- The source editor pane
- The R console pane
In RStudio, you can execute code in the R console pane and the source editor pane.
Question 8
Fill in the blank: In RStudio, the _____ is where you can find all the data you currently have loaded, and can easily organize and save it.
- environment pane
- plots pane
- R console pane
- source editor pane
In RStudio, the environment pane is where you can find all the data you currently have loaded, and can easily organize and save it.
Weekly Challenge 2
Question 1
Which of the following is an example of a piece of R code that contains both a function and an argument?
print("peaches")
weekly_sales <- 7450
#filter
mass > 1000
The piece of code print("peaches") is an example of R code that contains a function and an argument. The function is print, and the argument in parentheses ("peaches") follows the function.
Question 2
A data analyst is assigning a variable to a value in their company’s sales dataset for 2020. Which variable name uses the correct syntax?
_2020sales
sales_2020
-sales-2020
2020_sales
The variable with the correct syntax is sales_2020. A variable name in R may contain numbers and underscores, but it cannot begin with either one.
Question 3
You want to create a vector with the values 12, 23, 51, in that exact order. After specifying the variable, what R code chunk allows you to create the vector?
v(12, 23, 51)
c(12, 23, 51)
c(51, 23, 12)
v(51, 23, 12)
The code chunk c(12, 23, 51) allows you to create a vector with the values 12, 23, 51. A vector is a group of data elements of the same type stored in a sequence in R. You can create a vector by putting the values you want inside the parentheses of the combine function, c().
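As a minimal sketch of this answer, the snippet below builds the vector with c() and prints it. The variable name vec_1 is an assumption made for illustration, since the question does not name the variable.

```r
# Combine the three values into a numeric vector, in the order given.
# `vec_1` is a hypothetical variable name chosen for this example.
vec_1 <- c(12, 23, 51)

# Elements keep the order in which they were passed to c().
print(vec_1)

# All elements of an atomic vector share one type.
print(class(vec_1))
```

Reversing the arguments, as in c(51, 23, 12), would produce a different vector, which is why that option is incorrect.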
Question 4
An analyst comes across dates listed as strings in a dataset, for example December 10th, 2020. To convert the strings to a date/time data type, which function should the analyst use?
- mdy()
- now()
- datetime()
- lubridate()
To convert the strings to date/time data types, the analyst should use the function mdy(). The mdy() function and other variations of the ymd() function convert string dates and times into date/time data types that are compatible with R.
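The conversion can be sketched as follows, assuming the lubridate package is installed and the session uses an English locale so that the month name is recognized. The variable names date_string and parsed_date are illustrative only.

```r
# Requires the lubridate package: install.packages("lubridate") if needed.
library(lubridate)

# A date stored as text, in month-day-year order as in the question.
date_string <- "December 10th, 2020"

# mdy() parses month-day-year strings into a Date object.
parsed_date <- mdy(date_string)

print(parsed_date)
print(class(parsed_date))
```

The function name mirrors the order of the components in the string, so a string such as "2020-12-10" would call for ymd() instead.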
Question 5
A data analyst inputs the following code in RStudio:
sales_1 <- (3500.00 * 12)
Which of the following types of operators does the analyst use in the code? Select all that apply.
- Assignment
- Arithmetic
- Logical
- Relational
In the code sales_1 <- (3500.00 * 12), the analyst uses an assignment (<-) and an arithmetic (*) operator. The assignment operator assigns the calculated value in parentheses to the variable sales_1 and the arithmetic operator multiplies the values in parentheses to complete the calculation.
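To see the operator types side by side, here is a short sketch that runs the code from the question and then, for contrast, adds one relational and one logical expression. The comparison thresholds are made-up values for illustration.

```r
# `<-` is the assignment operator; `*` is an arithmetic operator.
sales_1 <- (3500.00 * 12)
print(sales_1)

# For contrast: relational and logical operators return TRUE or FALSE
# instead of computing or storing a number.
is_above_target <- sales_1 > 40000                  # relational
is_in_range <- sales_1 > 40000 & sales_1 < 50000    # logical (&) joining two comparisons
print(is_above_target)
print(is_in_range)
```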
Question 6
A data analyst is deciding on naming conventions for an analysis that they are beginning in R. Which of the following rules are widely accepted stylistic conventions that the analyst should use when naming variables? Select all that apply.
- Use single letters, such as "x" to name all variables
- Use an underscore to separate words within a variable name
- Use all lowercase letters in variable names
- Begin all variable names with an underscore
The analyst should use all lowercase letters in variable names and should separate words with underscores. These are widely accepted stylistic conventions that help keep code readable.
Question 7
Which of the following are included in R packages? Select all that apply.
- Tests for checking your code
- Sample datasets
- Reusable R functions
- Naming conventions for R variable names
R packages include reusable R functions, sample datasets, and tests for checking your code. R packages also include documentation about how to use the included functions.
Question 8
Packages installed in RStudio are called from CRAN. CRAN is an online archive with R packages and other R-related resources.
- True
- False
Packages installed in RStudio are called from CRAN. CRAN is an online archive with R packages and other R-related resources.
Question 9
When programming in R, what is a pipe used as an alternative for?
- Variable
- Vector
- Nested function
- Installed package
A pipe can be used as an alternative for a nested function. You can use both pipes and nested functions to complete multiple operations on data. However, a pipe is often the preferred method because it makes your code easier to read and understand.
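The difference is easier to see with both versions next to each other. The sketch below assumes the dplyr package is installed (it provides the %>% pipe along with filter() and arrange()) and uses R's built-in ToothGrowth dataset. The choice of filtering on dose and sorting by len is only an example.

```r
# Requires the dplyr package: install.packages("dplyr") if needed.
library(dplyr)

# Nested version: the steps are read from the inside out.
nested_result <- arrange(filter(ToothGrowth, dose == 0.5), len)

# Piped version: the same steps, read top to bottom, one per line.
piped_result <- ToothGrowth %>%
  filter(dose == 0.5) %>%
  arrange(len)

# Check that both approaches produce the same data frame.
print(identical(nested_result, piped_result))
```

With only two steps the nested form is still manageable, but each additional step adds another layer of parentheses, while the piped form simply gains one more line.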
Weekly Challenge 3
Question 1
A data analyst is creating a new data frame. Their dataset has dates, currency, and text strings. What characteristic of data frames is this an instance of?
- Data stored can be many different types
- Columns should contain the same number of items
- Columns should be named
- Variables should be named
A data frame is a collection of columns. Characteristics of data frames include: all columns should be named, data stored can be many different types, and all columns should contain the same number of items. The dataset in question has a variety of data types, which is related to the idea that data stored can be many different types.
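A small made-up example can illustrate these characteristics. The data frame name orders, its column names, and all values below are hypothetical.

```r
# A small, invented data frame with a date, a currency amount, and text.
orders <- data.frame(
  order_date = as.Date(c("2020-01-15", "2020-02-03", "2020-03-22")),
  amount_usd = c(19.99, 250.00, 74.50),
  customer = c("Avery", "Blake", "Casey"),
  stringsAsFactors = FALSE  # keep text as character in older R versions too
)

# Every column is named, has its own type, and holds the same number of items.
str(orders)
```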
Question 2
A data analyst is considering using tibbles instead of basic data frames. What are some of the limitations of tibbles? Select all that apply.
- Tibbles can overload a console
- Tibbles can never create row names
- Tibbles won’t automatically change the names of variables
- Tibbles can never change the input type of the data
Tibbles are useful when working with large datasets because they make printing easier. But tibbles can never change the input type of the data, create row names, or change the names of variables.
Question 3
A data analyst is working with a large data frame. It contains so many columns that they don’t all fit on the screen at once. The analyst wants a quick list of all of the column names to get a better idea of what is in their data. What function should they use?
- colnames()
- head()
- str()
- mutate()
The colnames() function will return a list of all the column names in a data frame for easy reference.
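For a quick illustration, the sketch below calls colnames() on ToothGrowth, a dataset that ships with R. That dataset has only three columns, so it stands in here for the much wider data frame described in the question.

```r
# colnames() returns the column names as a character vector.
column_names <- colnames(ToothGrowth)
print(column_names)

# length() then tells you how many columns there are.
print(length(column_names))
```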
Question 4
A data analyst is working with the ToothGrowth dataset in R. What code chunk will allow them to get a quick summary of the dataset?
glimpse(ToothGrowth)
min(ToothGrowth)
separate(ToothGrowth)
colnames(ToothGrowth)
The code chunk is glimpse(ToothGrowth). The glimpse() function provides the analyst with a quick summary of the data in the ToothGrowth dataset. This function shows what all of the column names are and how many rows there are.
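A minimal way to try this, assuming the dplyr package is installed (it makes glimpse() available), is shown below. The base R function str() is included only as a comparison and is not part of the quiz answer.

```r
# Requires the dplyr package: install.packages("dplyr") if needed.
library(dplyr)

# glimpse() prints the row and column counts, then one line per column
# with its type and the first few values.
glimpse(ToothGrowth)

# Base R's str() gives a similar overview without any extra packages.
str(ToothGrowth)
```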
Question 5
A data analyst is working with the penguins dataset. What code chunk does the analyst write to make sure all the column names are unique and consistent and contain only letters, numbers, and underscores?
drop_na(penguins)
clean_names(penguins)
rename(penguins)
select(penguins)
The code chunk is clean_names(penguins). The clean_names() function ensures that there are only letters, numbers, and underscores in the names used in the data frame.
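Because the penguins column names are already tidy, the effect is easier to see on deliberately messy names. The sketch below assumes the janitor package is installed. The data frame messy and its contents are invented for illustration.

```r
# Requires the janitor package: install.packages("janitor") if needed.
library(janitor)

# Hypothetical data frame with spaces, capitals, and punctuation in its names.
messy <- data.frame(
  "Species Name" = c("Adelie", "Gentoo"),
  "Body Mass (g)" = c(3750, 5000),
  check.names = FALSE  # stop data.frame() from altering the names itself
)

# clean_names() returns a copy of the data frame with standardized names.
cleaned <- clean_names(messy)

print(colnames(messy))
print(colnames(cleaned))
```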
Question 6
A data analyst is working with the penguins data. They write the following code:
penguins %>%
The variable species includes three penguin species: Adelie, Chinstrap, and Gentoo. What code chunk does the analyst add to create a data frame that only includes the Gentoo species?
filter(Gentoo == species)
filter(species <- "Gentoo")
filter(species == "Gentoo")
filter(species == "Adelie")
The code chunk is filter(species == "Gentoo"). The filter() function allows the data analyst to specify which part of the data they want to view. Two equal signs in an argument mean "exactly equal to." Using this comparison operator, rather than the assignment operator <-, keeps only the rows for Gentoo penguins in the resulting data frame.
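The filtering step can be sketched on a small stand-in for the penguins data, assuming dplyr is installed. The data frame penguins_small and its values are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the penguins data; only two columns are included.
penguins_small <- data.frame(
  species     = c("Adelie", "Gentoo", "Chinstrap", "Gentoo"),
  body_mass_g = c(3700, 5100, 3800, 4900)
)

gentoo_only <- penguins_small %>%
  filter(species == "Gentoo")  # == compares values; <- would assign

print(gentoo_only)
```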
Question 7
A data analyst is working with the penguins dataset. They write the following code:
penguins %>%
group_by(species) %>%
What code chunk does the analyst add to find the mean value for the variable body_mass_g?
summarize(=body_mass_g)
summarize(max(body_mass_g))
summarize(mean(body_mass_g))
summarize(body_mass_g(mean))
The code chunk is summarize(mean(body_mass_g)). The summarize() function gives high-level information about a dataset. Because the data is grouped by species first, it returns one mean value per species.
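A grouped summary of this kind might look like the following sketch, assuming dplyr is installed. The data values and the column name mean_mass are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the penguins data.
penguins_small <- data.frame(
  species     = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  body_mass_g = c(3700, 3900, 5000, 5200)
)

mass_by_species <- penguins_small %>%
  group_by(species) %>%                     # one group per species
  summarize(mean_mass = mean(body_mass_g))  # one mean per group

print(mass_by_species)
```

If the column contains missing values, mean() returns NA unless na.rm = TRUE is supplied.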
Question 8
A data analyst is working with a data frame named salary_data. They want to create a new column named wages that includes data from the rate column multiplied by 40. What code chunk lets the analyst create the wages column?
mutate(salary_data, rate = wages * 40)
mutate(wages = rate * 40)
mutate(salary_data, wages = rate * 40)
mutate(salary_data, wages = rate + 40)
The code chunk is mutate(salary_data, wages = rate * 40). The analyst can use the mutate() function to create a new column called wages that includes data from the rate column multiplied by 40. The mutate() function can create a new column without affecting any existing columns.
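A minimal sketch of this call, assuming dplyr is installed. The contents of salary_data are hypothetical.

```r
library(dplyr)

# Hypothetical salary data with an hourly rate column.
salary_data <- data.frame(
  employee = c("A", "B"),
  rate     = c(20, 32.5)
)

# Adds wages while leaving employee and rate untouched.
with_wages <- mutate(salary_data, wages = rate * 40)
print(with_wages)
```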
Question 9
A data analyst is working with a data frame named customers. It has separate columns for area code (area_code) and phone number (phone_num). The analyst wants to combine the two columns into a single column called phone_number, with the area code and phone number separated by a hyphen. What code chunk lets the analyst create the phone_number column?
unite(customers, area_code, phone_num, sep="-")
unite(customers, "phone_number", area_code, phone_num)
unite(customers, "phone_number", area_code, sep="-")
unite(customers, "phone_number", area_code, phone_num, sep="-")
The code chunk unite(customers, "phone_number", area_code, phone_num, sep="-") lets the analyst create the phone_number column. The unite() function lets the analyst combine the area code and phone number data into a single column. In the parentheses of the function, the analyst writes the name of the data frame, then the name of the new column in quotation marks, followed by the names of the two columns they want to combine. Finally, the argument sep="-" places a hyphen between the area code and phone number data in the phone_number column.
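The call described above can be sketched as follows, assuming the tidyr package (the source of unite()) is installed. The customer rows are hypothetical.

```r
library(tidyr)

# Hypothetical customer data; phone values are stored as text.
customers <- data.frame(
  name      = c("Kim", "Lee"),
  area_code = c("212", "415"),
  phone_num = c("5550100", "5550199")
)

combined <- unite(customers, "phone_number", area_code, phone_num, sep = "-")
print(combined)
```

By default unite() drops the two input columns once they are combined.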
Question 10
A data analyst wants to summarize their data with the sd(), cor(), and mean() functions. What kind of measures are these?
- Statistical
- Numerical
- Summary
- Standard
Standard deviation, correlation, mean, maximum, and minimum are statistical measures which can be used to summarize data.
Question 11
In R, which statistical measure demonstrates how strong the relationship is between two variables?
- Standard deviation
- Correlation
- Average
- Maximum
Correlation measures how strong the relationship between two variables is. This is represented by the cor() function.
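These measures are all available in base R. The sketch below uses two hypothetical vectors, hours_studied and exam_score, purely for illustration.

```r
# Hypothetical paired observations.
hours_studied <- c(1, 2, 3, 4, 5)
exam_score    <- c(52, 58, 65, 70, 79)

avg_score    <- mean(exam_score)               # average
spread       <- sd(exam_score)                 # standard deviation
relationship <- cor(hours_studied, exam_score) # strength of the linear relationship

print(avg_score)
print(spread)
print(relationship)
```

A correlation close to 1 or -1 indicates a strong relationship, while a value near 0 indicates a weak one.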
Question 12
A data analyst is studying weather data. They write the following code chunk:
bias(actual_temp, predicted_temp)
What will this code chunk calculate?
The minimum difference between the actual and predicted values
The maximum difference between the actual and predicted values
The average difference between the actual and predicted values
The total average of the values
The bias() function can be used to calculate the average amount a predicted outcome and actual outcome differ in order to determine if the data model is biased.
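The bias() function is not part of base R and comes from an add-on package, so the sketch below computes the same kind of quantity by hand. The temperature values are hypothetical, and the sign convention (actual minus predicted) is an assumption made for this example.

```r
# Hypothetical actual and predicted temperatures.
actual_temp    <- c(68, 70, 72, 71)
predicted_temp <- c(67, 70.5, 71, 69.5)

# Average difference between actual and predicted values.
avg_difference <- mean(actual_temp - predicted_temp)
print(avg_difference)  # 0.75
```

A result close to zero suggests the predictions are not systematically too high or too low.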
Weekly Challenge 4
Question 1
Which of the following are benefits of using ggplot2? Select all that apply.
- Automatically clean data before creating a plot
- Easily add layers to your plot
- Combine data manipulation and visualization
- Customize the look and feel of your plot
The benefits of using ggplot2 include easily adding layers to your plot, customizing the look and feel of your plot, and combining data manipulation and visualization.
Question 2
In ggplot2, what symbol do you use to add layers to your plot?
- The equal sign (=)
- The ampersand symbol (&)
- The pipe operator (%>%)
- The plus sign (+)
In ggplot2, you use the plus sign (+) to add layers to your plot.
Question 3
A data analyst creates a plot using the following code chunk:
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
Which of the following represents a variable in the code chunk? Select all that apply.
body_mass_g
x
flipper_length_mm
y
The two variables in the code are flipper_length_mm and body_mass_g. The two variables are part of the penguins dataset. The aesthetic x maps the variable flipper_length_mm to the x-axis of the plot. The aesthetic y maps the variable body_mass_g to the y-axis of the plot.
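The mapping of variables to aesthetics can be sketched on a small stand-in data frame, assuming ggplot2 is installed. The data frame penguins_small and its values are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for the penguins data.
penguins_small <- data.frame(
  flipper_length_mm = c(181, 195, 210, 220),
  body_mass_g       = c(3750, 3800, 4500, 5200)
)

# x and y are aesthetics; flipper_length_mm and body_mass_g are variables.
# The plus sign adds the geom_point() layer to the plot.
p <- ggplot(data = penguins_small) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

print(p)
```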
Question 4
A data analyst uses the aes() function to define the connection between their data and the plots in their visualization. What argument is used to refer to matching up a specific variable in your data set with a specific aesthetic?
- Faceting
- Mapping
- Jittering
- Annotating
Mapping is an argument that matches up a specific variable in your data set with a specific aesthetic. You use the aes() function to define the mapping between your data and your plot.
Question 5
A data analyst is working with the penguins data. The analyst creates a scatterplot with the following code:
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, alpha = species))
What does the alpha aesthetic do to the appearance of the points on the plot?
- Makes some points on the plot more transparent
- Makes the points on the plot more colorful
- Makes the points on the plot smaller
- Makes the points on the plot larger
The alpha aesthetic makes some points on a plot more transparent, or see-through, than others.
Question 6
You are working with the penguins dataset. You create a scatterplot with the following code chunk:
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
How do you change the second line of code to map the aesthetic size to the variable species?
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, species = size)
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, size = species))
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, species + size)
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, size + species))
You change the second line of code to geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, size = species)) to map the aesthetic size to the variable species. Inside the parentheses of the aes() function, add a comma after y = body_mass_g to add a new aesthetic attribute, then write size = species to map the aesthetic size to the variable species. The data points for each of the three penguin species will now appear in different sizes.
Question 7
Fill in the blank: The _____ creates a scatterplot and then adds a small amount of random noise to each point in the plot to make the points easier to find.
- geom_bar() function
- geom_jitter() function
- geom_smooth() function
- geom_point() function
The geom_jitter() function creates a scatterplot and then adds a small amount of random noise to each point in the plot to make the points easier to find.
Question 8
You have created a plot based on data in the diamonds dataset. What code chunk can be added to your existing plot to create wrap around facets based on the variable color?
facet_wrap(~color)
facet_wrap(color)
facet_wrap(color~)
facet(~color)
The code chunk is facet_wrap(~color). Inside the parentheses of the facet_wrap() function, type a tilde symbol (~) followed by the name of the variable you want to facet.
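A faceted plot of this kind might be built as in the sketch below, assuming ggplot2 is installed (the diamonds dataset ships with it). The choice of a scatterplot of carat against price, and the use of only the first 500 rows, are assumptions for illustration, since the question does not say what the existing plot looks like.

```r
library(ggplot2)

# Use a small slice of diamonds so the plot draws quickly.
diamonds_small <- diamonds[1:500, ]

p <- ggplot(data = diamonds_small) +
  geom_point(mapping = aes(x = carat, y = price)) +
  facet_wrap(~color)  # one panel per value of color

print(p)
```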
Question 9
A data analyst uses the annotate() function to create a text label for a plot. Which attributes of the text can the analyst change by adding code to the argument of the annotate() function? Select all that apply.
- Change the size of the text
- Change the font style of the text
- Change the color of the text
- Change the text into a title for the plot
By adding code to the argument of the annotate() function, the analyst can change the font style, color, and size of the text.
Question 10
You are working with the penguins dataset. You create a scatterplot with the following lines of code:
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
What code chunk do you add to the third line to save your plot as a jpeg file with "penguins" as the file name?
ggsave(penguins)
ggsave("penguins.jpeg")
ggsave(penguins.jpeg)
ggsave("jpeg.penguins")
You add the code chunk ggsave("penguins.jpeg") to save your plot as a jpeg file with "penguins" as the file name. Inside the parentheses of the ggsave() function, type a quotation mark followed by the file name (penguins), then a period, then the type of file (jpeg), then a closing quotation mark.
Weekly Challenge 5
Question 1
A data analyst wants to create a shareable report of their analysis with documentation of their process and notes explaining their code to stakeholders. What tool can they use to generate this?
- Code chunks
- Filters
- Dashboards
- R Markdown
R Markdown is a file format for making dynamic documents with R. R Markdown documents can be used to save, organize, and document code; create a record of your cleaning process; and generate reports with executable code for stakeholders.
Question 2
Fill in the blank: R Markdown notebooks can be converted into HTML, PDF, and Word documents, slide presentations, and _____.
- dashboards
- spreadsheets
- tables
- YAML
R Markdown notebooks can be converted into HTML, PDF, and Word documents, slide presentations, and dashboards.
Question 3
A data analyst notices that their header is much smaller than they wanted it to be. What happened?
- They have too few hashtags
- They have too few asterisks
- They have too many hashtags
- They have too many asterisks
Hashtags can be used to change the font size of headers. The more hashtags you add, the smaller the header.
Question 4
A data analyst wants to include a line of code directly in their .rmd file in order to explain their process more clearly. What is this code called?
- Inline code
- YAML
- Documented
- Markdown
Inline code is code that can be inserted directly into a .rmd file.
Question 5
What symbol can be used to add bullet points in R Markdown?
- Backticks
- Asterisks
- Brackets
- Exclamation marks
Asterisks can be used to add bullet points to an .rmd file. Hyphens can also be used.
Question 6
A data analyst adds a section of executable code to their .rmd file so users can execute it and generate the correct output. What is this section of code called?
- Data plot
- YAML
- Documentation
- Code chunk
Code added to a .rmd file is usually referred to as a code chunk. Code chunks allow users to execute R code from within the .rmd file.
Question 7
A data analyst is inserting a line of code directly into their .rmd file. What will they use to mark the beginning and end of the code?
- Hashtags
- Delimiters
- Asterisks
- Markdown
A delimiter is a character that indicates the beginning or end of a data item. In R Markdown, backticks are the delimiters that mark the beginning and end of inline code.
Question 8
If an analyst creates the same kind of document over and over or customizes the appearance of a final report, they can use _____ to save them time.
- a filter
- a template
- an .rmd file
- a code chunk
A template can save time when creating the same kind of document over and over or when customizing the appearance of a final report.
Course challenge
Scenario 1, questions 1-7
Question 1
As part of the data science team at Gourmet Analytics, you use data analytics to advise companies in the food industry. You clean, organize, and visualize data to arrive at insights that will benefit your clients. As a member of a collaborative team, sharing your analysis with others is an important part of your job.
Your current client is Chocolate and Tea, an up-and-coming chain of cafes.
The eatery combines an extensive menu of fine teas with chocolate bars from around the world. Their diverse selection includes everything from plantain milk chocolate, to tangerine white chocolate, to dark chocolate with pistachio and fig. The encyclopedic list of chocolate bars is the basis of Chocolate and Tea’s brand appeal. Chocolate bar sales are the main driver of revenue.
Chocolate and Tea aims to serve chocolate bars that are highly rated by professional critics. They also continually adjust the menu to make sure it reflects the global diversity of chocolate production. The management team regularly updates the chocolate bar list in order to align with the latest ratings and to ensure that the list contains bars from a variety of countries.
They’ve asked you to collect and analyze data on the latest chocolate ratings. In particular, they’d like to know which countries produce the highest-rated bars of super dark chocolate (a high percentage of cocoa). This data will help them create their next chocolate bar menu.
Your team has received a dataset that features the latest ratings for thousands of chocolates from around the world. Given the data and the nature of the work you will do for your client, your team agrees to use R for this project.
You create a short document about the benefits of using R for the project and share the document with your team. You write that the benefits include R’s ability to quickly process lots of data and easily reproduce and share an analysis. What is another benefit of using R for the project?
- Choose a topic for analysis
- Define a problem and ask the right questions
- Automatically clean data
- Create high-quality visualizations
Another benefit of using R for the project is R’s ability to create high-quality data visualizations.
Question 2
Before you begin working with your data, you need to import it and save it as a data frame. To get started, you open your RStudio workspace and load the tidyverse library. You upload a .csv file containing the data to RStudio and store it in your project folder as flavors_of_cacao.csv.
You use the read_csv() function to import the data from the .csv file. Assume that the name of the data frame is bars_df and the .csv file is in the working directory. What code chunk lets you create the data frame?
bars_df <- read_csv("flavors_of_cacao.csv")
bars_df = read_csv("flavors_of_cacao.csv")
read_csv("flavors_of_cacao.csv") = bars_df
bars_df %>% read_csv("flavors_of_cacao.csv")
The code chunk bars_df <- read_csv("flavors_of_cacao.csv") lets you create the data frame. In this code chunk:
- bars_df is the name of the data frame that will store the data.
- <- is the assignment operator, which assigns values to the data frame.
- read_csv() is the function that will import the data to the data frame.
- "flavors_of_cacao.csv" is the file name that the read_csv() function takes as its argument.
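The import step can be tried without the real file. The sketch below assumes the readr package is installed and writes a tiny hypothetical CSV to a temporary folder first. The file name flavors_sample.csv and its rows are made up for illustration and are not the actual dataset.

```r
library(readr)

# Create a small hypothetical CSV so the example is self-contained.
csv_path <- file.path(tempdir(), "flavors_sample.csv")
writeLines(
  c("Rating,Cocoa.Percent,Company.Location",
    "3.75,70,France",
    "4.00,80,Peru"),
  csv_path
)

# Import the file and store the result in a data frame.
bars_df <- read_csv(csv_path)
print(bars_df)
```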
Question 3
Now that you’ve created a data frame, you want to find out more about how the data is organized. The data frame has hundreds of rows and lots of columns.
Assume the name of your data frame is flavors_df. What code chunk lets you get a glimpse of the contents of the data frame?
glimpse(flavors_df)
glimpse %>% flavors_df
glimpse = flavors_df
glimpse <- flavors_df
You write the code chunk glimpse(flavors_df). In this code chunk:
- glimpse() is the function that will give you a glimpse of the contents of the data frame, and give you high-level information like column names and the type of data contained in those columns.
- flavors_df is the name of the data frame that the glimpse() function takes for its argument.
Question 4
Next, you begin to clean your data. When you check out the column headings in your data frame you notice that the first column is named Company...Maker.if.known. (Note: The period after known is part of the variable name.) For the sake of clarity and consistency, you decide to rename this column Maker (without a period at the end).
Assume the first part of your code chunk is:
flavors_df %>%
What code chunk do you add to change the column name?
You write the code chunk rename(Maker = Company...Maker.if.known.). In this code chunk:
- rename() is the function that lets you change the name of a column.
- Inside the parentheses of the rename() function, the new name (Maker) goes to the left of the equals sign and the current name (Company...Maker.if.known.) goes to the right.
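The renaming step that this question describes can be sketched as follows, assuming dplyr is installed. The rows in flavors_df are hypothetical placeholders, not values from the real dataset.

```r
library(dplyr)

# Hypothetical two-row stand-in for flavors_df.
flavors_df <- data.frame(
  Company...Maker.if.known. = c("Maker One", "Maker Two"),
  Rating = c(3.75, 4.0)
)

# rename() takes new_name = old_name.
renamed_df <- flavors_df %>%
  rename(Maker = Company...Maker.if.known.)

print(names(renamed_df))  # "Maker" "Rating"
```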
Question 5
After previewing and cleaning your data, you determine what variables are most relevant to your analysis. Your main focus is on Rating, Cocoa.Percent, and Company.Location. You decide to use a function to create a new data frame with only these three variables.
Assume the first part of your code chunk is:
trimmed_flavors_df <- flavors_df %>%
What code chunk do you add to choose the three variables?
select(Rating, Cocoa.Percent, Company.Location)
filter(Rating & Cocoa.Percent & Company.Location)
arrange(Rating + Cocoa.Percent + Company.Location)
summarize(Rating, Cocoa.Percent, Company.Location)
You write the code chunk select(Rating, Cocoa.Percent, Company.Location). In this code chunk:
- select() is the function that lets you select specific variables for your new data frame.
- select() takes the names of the variables you want to choose as its argument: Rating, Cocoa.Percent, Company.Location.
Question 6
Next, you select the basic statistics that can help your team better understand the ratings system in your data.
Assume the first part of your code chunk is:
trimmed_flavors_df %>%
What code chunk do you add to determine the mean rating for your data?
summarize(mean(Rating))
arrange <- mean(Rating)
arrange(mean, Rating)
summarize %>% mean(Rating))
You write the code chunk summarize(mean(Rating)). In this code chunk:
- summarize() is the function that lets you display summary statistics.
- In this case, you calculate the mean statistic for the variable Rating.
Question 7
After completing your analysis of the rating system, you determine that any rating equal to or greater than 3.9 can be considered a high rating. You also know that Chocolate and Tea considers any bar that contains at least 75% cocoa to be super dark chocolate. You decide to use code to find out which chocolate bars meet these two conditions.
Assume the first part of your code chunk is:
best_trimmed_flavors_df <- trimmed_flavors_df %>%
What code chunk do you add to filter the data frame for chocolate bars that contain at least 75% cocoa and have a rating of at least 3.9 points?
filter(Cocoa.Percent >= 75, Rating > 3.9)
filter(Cocoa.Percent > 75, Rating > 3.9)
filter(Cocoa.Percent == 75, Rating >= 3.9)
filter(Cocoa.Percent >= 75, Rating >= 3.9)
You write the code chunk filter(Cocoa.Percent >= 75, Rating >= 3.9). In this code chunk:
- filter() is the function that lets you filter your data frame based on specific criteria.
- Cocoa.Percent and Rating refer to the variables you want to filter.
- The >= operator signifies "greater than or equal to." The new data frame will keep only the rows where Cocoa.Percent is greater than or equal to 75 and Rating is greater than or equal to 3.9.
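The select, summarize, and filter steps from the last three questions can be sketched together, assuming dplyr is installed. The four rows below are hypothetical, and the sketch assumes Cocoa.Percent is stored as a number. If the real column is stored as text, it would need converting first.

```r
library(dplyr)

# Hypothetical stand-in for flavors_df.
flavors_df <- data.frame(
  Maker            = c("M1", "M2", "M3", "M4"),
  Rating           = c(4.0, 3.5, 3.9, 4.25),
  Cocoa.Percent    = c(75, 80, 70, 85),
  Company.Location = c("France", "Peru", "Canada", "Japan")
)

# Keep only the three variables of interest.
trimmed_flavors_df <- flavors_df %>%
  select(Rating, Cocoa.Percent, Company.Location)

# Mean rating across all bars.
mean_rating <- trimmed_flavors_df %>%
  summarize(mean(Rating))

# Conditions separated by a comma must both be true for a row to be kept.
best_trimmed_flavors_df <- trimmed_flavors_df %>%
  filter(Cocoa.Percent >= 75, Rating >= 3.9)

print(mean_rating)
print(best_trimmed_flavors_df)
```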
Scenario 2, questions 8-13
Question 8
Now that you’ve cleaned and organized your data, you’re ready to create some useful data visualizations. Your team assigns you the task of creating a series of visualizations based on requests from the Chocolate and Tea management team. You decide to use ggplot2 to create your visuals.
Assume the first part of your code chunk is:
ggplot(data = best_trimmed_flavors_df) +
What code chunk do you add to the second line to create a bar chart with the variable Company.Location on the x-axis?
geom_bar(mapping = aes(x <- Company.Location))
geom_bar(mapping = x(Company.Location))
geom_bar(aes(Company.Location))
geom_bar(mapping = aes(x = Company.Location))
You write the code chunk geom_bar(mapping = aes(x = Company.Location)). In this code chunk:
- geom_bar() is the geom function that uses bars to create a bar chart.
- Inside the parentheses of the aes() function, the code x = Company.Location maps the x aesthetic to the variable Company.Location.
- Company.Location will appear on the x-axis of the plot.
- By default, R will put a count of the variable Company.Location on the y-axis.
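A bar chart of this kind might be produced as in the sketch below, assuming ggplot2 is installed. The locations in best_trimmed_flavors_df are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for best_trimmed_flavors_df.
best_trimmed_flavors_df <- data.frame(
  Company.Location = c("France", "France", "Peru", "Japan", "Japan", "Japan")
)

# geom_bar() counts the rows for each location and draws one bar per location.
bar_plot <- ggplot(data = best_trimmed_flavors_df) +
  geom_bar(mapping = aes(x = Company.Location))

print(bar_plot)
```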
Question 9
Your bar chart reveals the locations that produce the highest rated chocolate bars. To get a better idea of the specific rating for each location, you’d like to highlight each bar.
Assume that you are working with the code chunk:
ggplot(data = best_trimmed_flavors_df) +
geom_bar(mapping = aes(x = Company.Location))
How do you change the second line of code to outline each bar with a different color?
geom_bar(mapping = aes(x = Company.Location, alpha = Rating))
geom_bar(mapping = aes(x = Company.Location, color = Rating))
geom_bar(mapping = aes(x = Company.Location, fill = Rating)
geom_bar(mapping = aes(x = Company.Location, size = Rating)
You change the second line of code to geom_bar(mapping = aes(x = Company.Location, color = Rating)) to outline each bar with a different color. In this code chunk:
- Inside the parentheses of the aes() function, add a comma after x = Company.Location to add a new aesthetic attribute, then write color = Rating to map the aesthetic color to the variable Rating.
- The specific rating of each location will appear as a specific color that outlines each bar of your bar chart.
Question 10
A teammate creates a new plot based on the chocolate bar data. The teammate asks you to make some revisions to their code.
Assume your teammate shares the following code chunk:
ggplot(data = best_trimmed_flavors_df) +
geom_bar(mapping = aes(x = Company)) +
What code chunk do you add to the third line to create wrap around facets of the variable Company?
facet_wrap(=Company)
facet(Company)
facet_wrap(+Company)
facet_wrap(~Company)
You write the code chunk facet_wrap(~Company). In this code chunk:
- facet_wrap() is the function that lets you create wrap around facets of a variable.
- Inside the parentheses of the facet_wrap() function, type a tilde symbol (~) followed by the name of the variable (Company).
Question 11
Your team has created some basic visualizations to explore different aspects of the chocolate bar data. You’ve volunteered to add titles to the plots. You begin with a scatterplot.
Assume the first part of your code chunk is:
ggplot(data = trimmed_flavors_df) +
geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +
What code chunk do you add to the third line to add the title Recommended Bars to your plot?
labs(title = Recommended Bars)
labs("Recommended Bars")
labs(title + "Recommended Bars")
labs(title = "Recommended Bars")
You write the code chunk labs(title = "Recommended Bars"). In this code chunk:
- labs() is the function that lets you add a title to your plot.
- In the parentheses of the labs() function, write the word title, then an equals sign, then the specific text of the title in quotation marks ("Recommended Bars").
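Adding the title as a third layer can be sketched like this, assuming ggplot2 is installed. The values in trimmed_flavors_df are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for trimmed_flavors_df.
trimmed_flavors_df <- data.frame(
  Cocoa.Percent = c(70, 75, 80, 85),
  Rating        = c(3.5, 4.0, 3.75, 4.25)
)

titled_plot <- ggplot(data = trimmed_flavors_df) +
  geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +
  labs(title = "Recommended Bars")  # title text must be in quotation marks

print(titled_plot)
```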
Question 12
Next, you create a new scatterplot to explore the relationship between different variables. You want to save your plot so you can access it later on. You know that the ggsave() function defaults to saving the last plot that you displayed in RStudio, so you’re ready to write the code to save your scatterplot.
Assume your first two lines of code are:
ggplot(data = trimmed_flavors_df) +
geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +
What code chunk do you add to the third line to save your plot as a pdf file with "chocolate" as the file name?
ggsave(chocolate.pdf)
ggsave("pdf.chocolate")
ggsave("chocolate.pdf")
ggsave("chocolate.png")
You add the code chunk ggsave("chocolate.pdf") to save your plot as a pdf file with "chocolate" as the file name. In this code chunk:
- Inside the parentheses of the ggsave() function, type a quotation mark followed by the file name (chocolate), then a period, then the type of file format (pdf), then a closing quotation mark.
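Saving a plot can be sketched as follows, assuming ggplot2 is installed. Here ggsave() is called as its own statement with the plot passed explicitly, and the file is written to a temporary folder. Both are choices made for this example rather than part of the quiz. The data values, the plot size, and the names scatter and pdf_path are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for trimmed_flavors_df.
trimmed_flavors_df <- data.frame(
  Cocoa.Percent = c(70, 75, 80, 85),
  Rating        = c(3.5, 4.0, 3.75, 4.25)
)

scatter <- ggplot(data = trimmed_flavors_df) +
  geom_point(mapping = aes(x = Cocoa.Percent, y = Rating))

# The file extension (.pdf) tells ggsave() which format to write.
pdf_path <- file.path(tempdir(), "chocolate.pdf")
ggsave(pdf_path, plot = scatter, width = 6, height = 4)

print(file.exists(pdf_path))
```

When no plot argument is given, ggsave() saves the last plot that was displayed.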
Question 13
As a final step in the analysis process, you create a report to document and share your work. Before you share your work with the management team at Chocolate and Tea, you are going to meet with your team and get feedback. Your team wants the documentation to include all your code and display all your visualizations.
Fill in the blank: You want to record and share every step of your analysis, let teammates run your code, and display your visualizations. You decide to create _____ to document your work.
- a data frame
- an R Markdown notebook
- a spreadsheet
- a database
You use an R Markdown notebook to document your work. The notebook lets you record and share every step of your analysis, lets your teammates run your code, and displays your visualizations.
Google Data Analytics Capstone: Complete a Case Study
Professional case studies
Question 1
Fill in the blank: A _____ is a collection of case studies that you can share with potential employers.
- personal website
- portfolio
- problem statement
- capstone
A portfolio is a collection of case studies that you can share with potential employers. A capstone is a final project that brings everything you’ve learned together.
Question 2
Which of the following are important strategies when completing a case study? Select all that apply.
- Document the steps you’ve taken to reach your conclusion
- Communicate the assumptions you made about the data
- Answer the question being asked
- Use a programming language
When completing a case study, it’s important to answer the question being asked. It’s also important to communicate the steps you’ve taken to reach your conclusion and the assumptions you made about the data.
Question 3
To successfully complete a case study, your answer to the question the case study asks has to be perfect.
- True
- False
To successfully complete a case study, your answer to the question the case study asks does not have to be perfect. It’s more important to show off your thought process so that the interviewers can understand how you approach the problem.
Question 4
Which of the following are qualities of the best portfolios for a junior data analyst? Select all that apply.
- Personal
- Simple
- Large
- Unique
The best portfolios are personal, unique, and simple. Your portfolio’s a chance to show people who you are and what you’re interested in. You want to keep your portfolio pretty simple, and focus on your skills as a data analyst.
Question 5
Which of the following are places where you can store and share your portfolio? Select all that apply.
- RStudio
- Tableau
- Kaggle
- GitHub
Portfolios can be stored and shared on public websites, including GitHub, Kaggle, and Tableau, or on your personal website.