Data Questions in the AP Computer Science Principles AP CSP Exam


AP CSP Exam Preparation

In this series of articles we explain some of the sample questions that are provided to help AP CSP students with exam preparation.

Data and Information

This article focuses on the AP CSP exam questions related to data.

The AP CSP exam will contain questions about data. A common type of question shows a table of data and then tests whether you understand what the data can and cannot tell you. We have examples below.

Other areas include big data, data in cloud storage, data security, data analysis, data mining and metadata.

Sample Exam Questions

We have taken examples from the two practice question sets from 2021 and from previous practice exercises provided by the College Board.

Although this does not mean these questions will be in the exam, we assume that the primary areas of computing, programming, networks, data and algorithms will form most of the CSP exam.

These articles will present and explain questions and answers in these areas.

What type of data questions will be in the CSP exam?

Nobody knows. But we do know that during the AP CSP course we have learned about data and how to use it.

Practice Exam questions

Read the questions and check your answer with the explanation below. We want to help everybody so if you think the question is easy then move on to the next question without reading the solution if you wish.

Some questions have more than one answer – these multiple answer questions are also in the AP CSP exam.

Table data

A soccer league tracks certain stats by team as seen in the table above. Which of the following CANNOT be determined by the data?
  • A. If a team wins more often when the number of red cards is less than 1 per game
  • B. If a team wins more when they were the home team
  • C. If days where the temperature was 80°F were also rainy
  • D. Team winning percentage

Which is the one that can’t be determined? A can be determined, and B can be determined by comparing the second and last columns.

C can’t be known. Not only is it not possible to know which days had both, but the rain data is also for days before a game.

D is tricky as we don’t know whether a result is win or lose, or win, draw or lose. But we already know the answer is C.

Answer: C


Which of the following can be determined using this data?

  • which city has the most permits
  • what start time are permits given
  • how many permits have an end date in January 2022
  • which city has the highest charge for permits

A store keeps transaction records as above. Which of the following cannot be determined using this data?

  • the most expensive item sold
  • the date of the most sales
  • the average amount paid per transaction
  • the most popular payment method

Big data

Which of the following will help organizations gain insights about their business?

  • A. Collecting and analyzing big data to identify patterns and trends they can use to their advantage
  • B. Separating big data into smaller data sets and analyzing those for faster results
  • C. Developing decryption data techniques to be able to drill down and analyze data the government posts online
  • D. Creating copies of company data to let each division do their own analysis without impacting others

Identifying patterns and trends is what we want from big data, so A looks good. We could use smaller data sets but not for faster results.

Decryption is a different area, and creating separate copies of the data for each division is counter-productive. So we know it is A.

Answer: A

Machine learning

A loan application system always refuses applicants in a particular zip code area. Why?

  • the training data of that zip code only contained refused applications
  • a programming error
  • training data was incomplete
  • the training data of that zip code only contained poor applicants

What type of machine learning finds patterns in unlabelled data?

  • supervised learning
  • unsupervised learning
  • reinforcement learning
  • early learning

Data mining

When listening to an online music service, you request songs “like” a specific song. How does the music site determine what to play?

  • A. It plays songs you have played previously
  • B. It plays the most requested songs from all listeners
  • C. It uses data-mining techniques to determine patterns in the music that are similar
  • D. It plays a random selection

To find songs ‘like’ the one you are listening to, there are several approaches, such as grouping listeners and grouping the songs they like. The most common songs in a group are likely to be ‘similar’, although this would be machine learning.

This is not A, as that is just your history. B is just the most popular songs. C talks about patterns, which is what data mining looks for, so C is possible. D is random, so not similar.

A, B and D are all wrong and this is a data mining question, therefore C is the correct answer.

Answer: C
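The idea of grouping listeners and the songs they like can be sketched in a few lines of Python. This is only a minimal illustration, not how any real music service works: the listening histories, the song names and the function `songs_like` are all made up for the example.

```python
from collections import Counter

# Hypothetical listening histories: each set holds the songs one listener likes.
listeners = [
    {"Song A", "Song B", "Song C"},
    {"Song A", "Song B"},
    {"Song A", "Song D"},
    {"Song C", "Song D"},
]


def songs_like(target, histories, top_n=2):
    """Return the songs most often liked alongside `target`."""
    counts = Counter()
    for liked in histories:
        if target in liked:
            # Count every other song this listener also likes.
            counts.update(liked - {target})
    return [song for song, _ in counts.most_common(top_n)]


# "Song B" is liked by two of the three listeners who like "Song A".
print(songs_like("Song A", listeners, top_n=1))  # ['Song B']
```

The pattern here is simply "songs that appear together in many listeners' histories", which is the kind of similarity option C describes.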

Which of the following techniques would be best to use to further analyze patterns that emerged during data mining?

i Classifying data to categorize it into distinct groups,

ii Cleaning data to determine which data to include in the processing,

iii Clustering data to separate data with similarities into subclasses,

iv Filtering to set conditions so only records meeting the criteria are included

  • A. i, ii, iii
  • B. i, iii, iv
  • C. i, ii, iv
  • D. i, ii, iii, iv

What analyzes patterns here? Classifying is a valid technique, so i is OK. Cleaning data is normally necessary for data sets, and clustering similar data is also valid.

Filtering can help identify data that is relevant for a particular question. All four are potential techniques to use, and only D includes all of them.

Answer: D
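To see how the four techniques differ, here is a small Python sketch that applies each one in turn. The customer records, the spending threshold and the helper names `classify` and `cluster_by_gap` are assumptions made for illustration; the gap-based grouping is a deliberately simple stand-in for a real clustering algorithm.

```python
# Hypothetical sales records; None marks a missing value.
records = [
    {"customer": "Ana", "age": 17, "spent": 20.0},
    {"customer": "Ben", "age": 34, "spent": 250.0},
    {"customer": "Cal", "age": None, "spent": 90.0},
    {"customer": "Dee", "age": 61, "spent": 240.0},
    {"customer": "Eli", "age": 29, "spent": 30.0},
]

# Cleaning: decide which data to include by dropping incomplete records.
cleaned = [r for r in records if all(v is not None for v in r.values())]

# Filtering: keep only the records that meet a condition (adults here).
adults = [r for r in cleaned if r["age"] >= 18]


# Classifying: put each record into a category that is defined in advance.
def classify(record):
    return "high spender" if record["spent"] >= 100 else "low spender"


labels = {r["customer"]: classify(r) for r in adults}


# Clustering: group similar values without naming the groups in advance.
# Values are sorted and a new cluster starts wherever the gap is large.
def cluster_by_gap(values, max_gap):
    clusters = []
    for value in sorted(values):
        if clusters and value - clusters[-1][-1] <= max_gap:
            clusters[-1].append(value)
        else:
            clusters.append([value])
    return clusters


clusters = cluster_by_gap([r["spent"] for r in adults], max_gap=50)

print(labels)    # {'Ben': 'high spender', 'Dee': 'high spender', 'Eli': 'low spender'}
print(clusters)  # [[30.0], [240.0, 250.0]]
```

Classifying needs the categories to be chosen beforehand, while clustering discovers the groups from the data itself.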

Data storage

New data is available to add to a company’s existing data. The IT director wants to store the new data on the cloud. What is a concern that needs to be addressed before implementing the plan?

  • A. The security of the data being transmitted back and forth
  • B. The latency delay in requesting and receiving access to the data
  • C. The redundancy of the Internet increasing the cost
  • D. The cost the ISP will charge to access the cloud

Before we can put data on the internet we need to be sure it is secure, accessible only to those with permission (authorized), and available in the desired format when needed.

So we need security. Internet speed is not really a problem these days for cloud storage, and the internet is not ‘redundant’. The costs are normally one-off or charged over a time period, rather than each time that you access data.

Answer: A

Which of the following daily shop records will use the most storage capacity?

  • the digital sales records
  • recordings of customer phone conversations
  • all forms completed by staff online
  • the camera footage of shop exits


A group that is watching sea turtle nests records data about their nests. Which of the following is metadata?

  • A. Daily temperature of the nest
  • B. Date the eggs were laid
  • C. Nest tag
  • D. Number of data fields tracked

Metadata is data about the data. So which of these options describes the data rather than the nests?

The nest temperature, the date the eggs were laid and the nest tag are all data about a nest. The number of data fields tracked is about the data rather than the subject.

Answer: D
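The difference between data and metadata can be shown with a short Python sketch. The nest records, field names and values below are invented for illustration and do not come from the exam question.

```python
# Hypothetical nest records; field names and values are made up.
nests = [
    {"nest_tag": "N-01", "date_laid": "2023-06-02", "temperature_c": 29.5},
    {"nest_tag": "N-02", "date_laid": "2023-06-05", "temperature_c": 30.1},
]

# Data: values that describe the nests themselves.
tags = [nest["nest_tag"] for nest in nests]

# Metadata: facts about the data set, not about any particular nest.
metadata = {
    "field_names": sorted(nests[0].keys()),
    "number_of_fields": len(nests[0]),
    "number_of_records": len(nests),
}

print(tags)                          # ['N-01', 'N-02']
print(metadata["number_of_fields"])  # 3
```

Changing a temperature reading changes the data, but the number of fields stays the same, which is why option D is the metadata.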

A camera records car speeds and keeps a record of the car number, date and time and car speed. Which of the following can be answered using the metadata rather than the camera recording?

  • the color and model of the car
  • highest speed of any car on a given day
  • the distance between cars
  • the total number of cars on a given day

Data analysis

A new discovery has been made from analyzing data. Which of the following methods will most effectively share their discovery?

  • A. Create a video explaining the highlights and wait for it to go viral
  • B. Use diagrams and images and publish the discovery on a professional website for peer review
  • C. Post it on the Web but with a password to ensure only those in the field of study can view it
  • D. Publish the findings in a local newspaper

This question is about an effective way to share information. A video about data is unlikely to go viral. Using diagrams and images and publishing online for peer review sounds good.

Using a password protected site is restrictive, whilst a local newspaper is also restricted to the local audience.

Answer: B


Based on the information in the table above, which of the following tasks is likely to take the longest amount of time when scaled up for a very large company of approximately 100,000 customers?

  • A. Backing up data
  • B. Deleting entries from data
  • C. Searching through data
  • D. Sorting data

This question is more complicated so we have left it to last, as you might do in the exam. Answering easier questions first and leaving the more difficult ones until later is a useful tactic in exams.

We can see how each option increases with size, so we can make the same calculations on growth for the 100K option.

Option A multiplies by 10, so 200*10 = 2000 hours.

Option B adds 100, so 300+100 = 400 hours.

Option C adds 50, so 350+50=400 hours.

Finally, option D multiplies by 100, so 100*100 = 10,000 hours.

Answer: D
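The same calculations can be checked with a short Python sketch. The starting hours and growth rules are the ones stated in the explanation above; the dictionary layout and variable names are our own choices for the example.

```python
# Figures from the explanation above: (current hours, rule for the next size up).
tasks = {
    "A. Backing up data": (200, lambda hours: hours * 10),
    "B. Deleting entries from data": (300, lambda hours: hours + 100),
    "C. Searching through data": (350, lambda hours: hours + 50),
    "D. Sorting data": (100, lambda hours: hours * 100),
}

# Apply each task's growth rule to estimate the time for 100,000 customers.
projected = {name: grow(hours) for name, (hours, grow) in tasks.items()}

# The task with the largest projected time is the answer.
longest = max(projected, key=projected.get)

for name, hours in projected.items():
    print(f"{name}: {hours} hours")
print("Longest:", longest)  # Longest: D. Sorting data
```

Tasks that grow by multiplication quickly overtake tasks that grow by addition, even when they start from a smaller number.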

Data privacy and security

A website lists products similar to others that you have recently purchased. What has been used?

  • web cookies
  • rogue access point
  • false password
  • copyright issue

If a data breach allowed credit card data to be exposed, which of the following would not be a concern?

  • the use of data to make online purchases
  • to sell the data illegally
  • to access personal records such as names and addresses
  • to use the data to access usernames and passwords

What two factors might be used for multi-factor authentication?

  • password & code sent to another device
  • username & password
  • username & PIN code
  • PIN code & password

More AP CSP exam practice questions

This is the last of a five-part series of articles that takes questions from the sample practice exams.

The first article was about computing questions. Click on the following link to see this article: Computing Questions in the Computer Science Principles AP CSP Exam

The second article was about network questions. Click on the following link to see this article: Network Questions in the Computer Science Principles AP CSP Exam

The third article was about programming questions. Click on the following link to see this article: Programming Questions in the Computer Science Principles AP CSP Exam

The fourth article was about algorithm questions. Click on the following link to see this article: Algorithm Questions in the Computer Science Principles AP CSP Exam

If you wish to see an overview of the sample questions related to the AP CSP exam, please visit the following article:

Ultimate Guide to AP Computer Science Principles Exam Questions

Big Ideas exam practice questions

There are also practice questions grouped by the ‘big ideas’, available here in PDF format.

So, if you want to download free AP CSP questions and answers, click on the big idea:

  1. Creative Development
  2. Data
  3. Algorithms and Programming
  4. Computer Systems and Networks
  5. Impact of Computing

Multiple Choice Answers

  • how many permits have an end date in January 2022
  • the most expensive item sold
  • the training data of that zip code only contained refused applications
  • unsupervised learning
  • the camera footage of shop exits
  • highest speed of any car on a given day
  • web cookies
  • to use the data to access usernames and passwords
  • password & code sent to another device
