2019.04.09

ANALYTICS SPECIALIZATIONS & APPLICATIONS, 2019
COURSEWORK 1 - Customer Analytics Study
Final Report Deadline: Tuesday 9th Apr 2019, 11.59pm
Submission: Via the Analytics Specialization and Applications Moodle Site
Work Plan Deadline: Tuesday 26th Mar 2019, 11.59pm
Submission: Via the Analytics Specialization and Applications Moodle Site
1. The Problem Definition:
In this coursework you will perform a market segmentation on a transactional dataset that
has been provided by a national convenience store chain (4 files describing 3000 customers
over 6 months). Through analysis of the company’s point-of-sale data you will produce
profiles for 5-7 customer segments. These will include a statistical summary and a pen
profile for each segment (these would be used by the retailer to better understand their
customer base and to inform future marketing campaigns). Your final deliverable in this task
will be a report written as though it is to be read by the company’s chief data officer and
marketing director, and presented accordingly. This report will be accompanied by your
technical implementation (expected to be in the form of Python and/or SQL scripts) and a
CSV file that lists each customer’s assignment to the segments you have generated (simply
linking their customer ids to a segment id as indicated in your report).
2. Expected Approach:
Because customer segmentation is a subjective process, this study has been deliberately left
relatively open-ended. There are numerous ways of approaching a task of this sort, and the key
is simply to justify the method you take. Nevertheless, it is broadly expected that you will
implement your segmentation study in three parts:
1. Exploration of the sample data provided to establish which indicators you believe are
important to describe customer behaviour (justifying your decisions);
2. A quantitative analysis, considering the customer base in light of selected/engineered
features before implementing a clustering approach to settle upon 5-7 customer
segments (justifying your chosen methodology);
3. An analysis of results in order to produce a tangible description of each cluster in the
form of a statistical summary, customer archetypes (pen profiles) and identification of
the two most attractive targets for the company.
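To illustrate part 2, the snippet below is a minimal sketch of one way to settle on k inside the client's 5-7 window: standardise the features, run k-means for k = 5, 6 and 7, and keep the k with the highest mean silhouette coefficient. K-means and the silhouette criterion are assumptions here (the specification leaves the method open), and the function names (standardise, kmeans, silhouette, choose_k) are hypothetical. It uses only NumPy and runs on synthetic data rather than the coursework files.

```python
import numpy as np


def standardise(X):
    """Z-score each column so that no single feature dominates the distances."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    return (X - X.mean(axis=0)) / std


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability proportional
    to its squared distance from the nearest centroid chosen so far."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        C = np.array(centroids)
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means with several k-means++ restarts; returns the labels of the
    run with the lowest within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centroids = kmeans_pp_init(X, k, rng)
        for _ in range(max_iter):
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Move each centroid to the mean of its points (keep it if the cluster emptied).
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels, inertia = d2.argmin(axis=1), d2.min(axis=1).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distance (O(n^2) memory; fine for ~3000 rows)."""
    sq = (X ** 2).sum(axis=1)
    D = np.sqrt(np.clip(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0, None))
    np.fill_diagonal(D, 0.0)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1 or len(clusters) == 1:
            continue  # singleton clusters score 0 by convention
        a = D[i, same].sum() / (n_same - 1)  # mean distance to the *other* members
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s.mean()


def choose_k(X, k_values=(5, 6, 7), seed=0):
    """Cluster the standardised features for each candidate k and keep the k with
    the highest mean silhouette; returns (best_k, labels_for_best_k, scores)."""
    Z = standardise(np.asarray(X, dtype=float))
    fits = {k: kmeans(Z, k, seed=seed) for k in k_values}
    scores = {k: silhouette(Z, labels) for k, labels in fits.items()}
    best_k = max(scores, key=scores.get)
    return best_k, fits[best_k], scores


# Demo on synthetic data with six well-separated groups (not the coursework data).
rng = np.random.default_rng(42)
centres = np.array([[0, 0, 0], [10, 0, 0], [0, 10, 0], [0, 0, 10], [10, 10, 0], [10, 0, 10]], float)
X_demo = np.vstack([c + rng.normal(scale=0.3, size=(50, 3)) for c in centres])
best_k, labels, scores = choose_k(X_demo)
print(best_k, {k: round(v, 3) for k, v in scores.items()})
```

scikit-learn's KMeans and sklearn.metrics.silhouette_score offer tested equivalents; the hand-written version only makes the mechanics visible. The silhouette score is one criterion among several: inertia (elbow) plots, cluster sizes and how interpretable the resulting profiles are all matter for a business segmentation.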
You should also submit a very short bullet-pointed work plan (of absolutely no more than one
page) prior to implementing the final task (deadline 11.59pm, Tuesday 26th March); it is
worth 5% of your mark. This work plan should set out the form your analytical approach is
expected to take, including a list of the features that you expect to use/engineer in order
to describe customers (whether RFM (recency, frequency, monetary) features, product-based
spends, temporal features, etc.), your proposed technical strategy for the segmentation and
brief details of how you intend to implement it. (Note that this work plan is not binding,
and your final report may differ from it.)
3. Customer Data Provided
The company has asked you to use a sample of 3000 customers’ behavioural data directly,
as recorded in loyalty card transaction logs. You must use this data alone to provide the
client with a summary of their customer base in the form of 5-7 cluster archetypes,
selected in a manner and number that you feel best describes the variation in the
types of customers the client interacts with. The files available to you are as follows:
customers_sample.csv: A summary file detailing the consumer behaviour of 3000 customers
(referenced by an anonymized but consistent “customer_id”). This data gives the total number
of “baskets” (i.e. visits) each customer has processed at the retailer’s stores over the
6-month period, the “total_quantity” of items they have purchased, the “average_quantity”
per basket, the “total_spend” the customer has made over the period, and the
“average_spend” per visit.

category_spends_sample.csv: This file again lists the 3000 customers in the sample, but this
time splits their spend over the period into 20 item categories, which represent the range
of items sold across the retailer. The names of these categories are included in the file’s
header and are self-explanatory.

baskets_sample.csv: This file details each individual visit made by the 3000 customers in the
sample. Specifically, it gives the timestamp of the visit (“purchase_time”), the quantity of
items purchased (“basket_quantity”), the amount of money spent on that visit
(“basket_spend”) and the number of different categories the purchases cut across on that
trip (“basket_categories”).

lineitem_sample.csv: A final dataset that breaks down each basket into its individual
product purchase ids, along with the category each item belongs to. It is provided only in
case you want to extend your analysis into new features, and all customer_ids it references
correspond to those in the other data files.
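Per-customer RFM-style and temporal features can be derived from baskets_sample.csv by grouping visits by customer. The sketch below assumes that file has a customer_id column alongside the columns named above, and that purchase_time parses with pandas.to_datetime; the function name basket_features and the output feature names are hypothetical choices, and the small inline DataFrame stands in for the real file.

```python
import pandas as pd


def basket_features(baskets: pd.DataFrame) -> pd.DataFrame:
    """Collapse one-row-per-visit data into one row of features per customer."""
    b = baskets.copy()
    b["purchase_time"] = pd.to_datetime(b["purchase_time"])
    b["hour"] = b["purchase_time"].dt.hour
    # 1.0 for Saturday (5) or Sunday (6), else 0.0, so its mean is a share of visits.
    b["is_weekend"] = (b["purchase_time"].dt.dayofweek >= 5).astype(float)

    feats = b.groupby("customer_id").agg(
        last_visit=("purchase_time", "max"),
        n_visits=("purchase_time", "count"),        # Frequency
        total_spend=("basket_spend", "sum"),        # Monetary
        avg_basket_spend=("basket_spend", "mean"),
        avg_basket_quantity=("basket_quantity", "mean"),
        avg_basket_categories=("basket_categories", "mean"),
        share_weekend_visits=("is_weekend", "mean"),
        mean_visit_hour=("hour", "mean"),
    )
    # Recency: whole days between the customer's last visit and the last visit in the sample.
    feats["recency_days"] = (b["purchase_time"].max() - feats["last_visit"]).dt.days
    return feats.drop(columns="last_visit")


# Tiny stand-in for pd.read_csv("baskets_sample.csv") (hypothetical values).
demo = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "purchase_time": ["2018-01-06 10:00", "2018-01-08 18:30", "2018-01-10 09:15"],
    "basket_quantity": [3, 5, 10],
    "basket_spend": [4.5, 7.5, 20.0],
    "basket_categories": [2, 3, 4],
})
feats = basket_features(demo)
print(feats)
```

Customer 1 visits on a Saturday and a Monday, so half of their visits fall at the weekend and their last visit is one whole day before the sample's final visit.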
Report Structure
You must provide a report that clearly describes your proposed customer segmentation of
the client’s consumer base, based on the sample of transactional data they have provided.
The client company has requested that the number of groups be between 5 and 7 inclusive,
but the exact number is up to you. The customer segmentation process will require a stage
of basic statistical analysis, a stage of feature selection, a stage of clustering, a stage of
generating segments and then a stage analysing the results to generate pen profiles; these
must all be described (please refer to the marking scheme to decide how much of the report
you should assign to each). Note that you may use any software you like for your analysis,
but your clustering should be undertaken in Python 3 for this coursework. While the report is
relatively flexible in structure, it is expected that you will include the following sections:
1. An Executive Summary: including a description of the task, a summary of
your technical approach, a summary of the data that underpins it, a summary
of the results, and a summary of the insights you have arrived at.
2. A Feature Description section: a summary of the features you have selected and/or
generated from the data to describe customers, justifying the strategy you have taken.
3. A Customer Base Summary section: a cursory exploratory analysis of the data you’ve
received, summarizing the company’s market according to the features you have
developed.
4. A Segmentation Methodology section: a description of the clustering approach you
have taken, justifying what you consider a good value of k, the number of groups to use.
5. A Results section: having produced your clusters, in this section you must name
and summarize them. This should be done both statistically, describing how the features
you have used vary across the clusters, and descriptively (each cluster should be
accompanied by a single paragraph giving a brief vignette, or “pen profile”, that denotes
the customer ‘archetype’ reflected by that cluster).
6. A Summary section: here you will provide a summary of your results, the business
case for your clustering solution and a recommendation for the two segments that you
believe will be most important for the company to focus its attention on (including your
reasons why). Draw together any key take-home points that have emerged from your
analysis, any marketing recommendations you may have as a result of your analysis,
and any suggestions you have for the client regarding further analysis.
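For the statistical summaries in the Results section, one common layout is a table of per-segment feature means, with each mean also expressed as an index against the whole customer base (100 = average), plus each segment's share of customers and of spend. The sketch below assumes a per-customer feature DataFrame indexed by customer_id and a sequence of cluster labels in the same order; cluster_profile, write_assignments and the output column names (including segment_id) are hypothetical. It also writes the customer-to-segment CSV requested as a deliverable.

```python
import io

import pandas as pd


def cluster_profile(features: pd.DataFrame, labels) -> pd.DataFrame:
    """Per-segment size, feature means, and each mean indexed against the whole base (100 = average)."""
    df = features.assign(segment_id=list(labels))
    grouped = df.groupby("segment_id")
    means = grouped.mean()
    index = (means / features.mean() * 100).round(0)
    profile = means.add_suffix("_mean").join(index.add_suffix("_index"))
    profile.insert(0, "n_customers", grouped.size())
    profile.insert(1, "share_of_customers", grouped.size() / len(df))
    if "total_spend" in features.columns:
        # Share of revenue per segment: useful when arguing which segments to target.
        profile["share_of_spend"] = grouped["total_spend"].sum() / features["total_spend"].sum()
    return profile


def write_assignments(customer_ids, labels, path_or_buf) -> None:
    """Write the customer -> segment mapping as a two-column CSV (no index column)."""
    pd.DataFrame({"customer_id": list(customer_ids), "segment_id": list(labels)}).to_csv(
        path_or_buf, index=False)


# Hypothetical four-customer example with two segments.
feats = pd.DataFrame(
    {"total_spend": [10.0, 30.0, 200.0, 260.0], "n_visits": [2, 4, 20, 24]},
    index=pd.Index([101, 102, 103, 104], name="customer_id"),
)
labels = [0, 0, 1, 1]
profile = cluster_profile(feats, labels)
buf = io.StringIO()  # in practice, a file path such as "segments.csv"
write_assignments(feats.index, labels, buf)
print(profile)
print(buf.getvalue())
```

Here segment 1 holds half the customers but 92% of the spend, with a total_spend index of 184; an index table of this kind makes it easy to describe how each feature varies across clusters.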
4. Marking Criteria
Your submission will be assessed based on the following mark scheme:
● Work Plan (submitted on March 26th) (5 marks)
● Executive Summary (5 marks)
● Appropriateness of Data preparation (5 marks)
● Appropriateness of Feature Selection/Engineering (20 marks)
● Description and Justification of Methodology (10 marks)
● Analysis and Description of Results (25 marks)
- Overall Customer base summary (5 marks)
- Individual Statistical Summaries of Clusters (10 marks)
- Pen Portraits of Clusters (10 marks)
● Efficacy of Insights and Recommendations (5 marks)
● Technical Implementation (20 marks)
- Functionality (15 marks)
- Clarity and Commenting (5 marks)
● Overall presentation and professionalism of Report (5 marks)
4. Submission Guidelines
Your Work Plan must be submitted by 11.59pm on 26th March via Moodle, using the document
title “Student ID - Your Name - ASA Coursework 1 – Workplan”.
Your Final Report must be submitted by 11.59pm on 9th Apr via Moodle. In your submission
please attach: 1. your Final Report (do not forget, this has a strict maximum of 8 pages or
3000 words, with no appendices); 2. your technical implementation / code; and 3. your
results file denoting customer ids and their attributed segment. Please put your Student ID
and your name at the beginning of each document name.
In line with University guidelines, late submissions will lose 5% of their final mark per day.
All text, code and workflows will also be examined to ensure there is no duplication between
submissions. Any plagiarised work will immediately receive zero marks.
5. Final Message from the company’s Chief Data Officer
“Dear consultant, we are aware customer segmentation is somewhat subjective, so we are happy to
offer the following guidance. We will use the segmentations you produce to explore our
customer base in the first instance, and subsequently for marketing, and your suggestions
should be guided by this usage. We also understand that a key part of the job will be deciding
how best to describe each customer (i.e. feature selection). While we would be very interested
in new ways you can think of to describe our consumers’ behaviour, the following ways of
describing customer behaviour might be worth investigating:
Spend: It is often said that 20% of a customer base accounts for 80% of a company's
profits. This may or may not be true, but it does isolate the fact that some groups will produce
more revenue for the company than others, and this needs to be a part of the analysis. Hence
a customer’s “total_spend” might be included as a feature.
Frequency: We have noted that some of our customers exhibit different shopping patterns
over time. The frequency of visitation in a shopper’s patterns could well help to underpin
customer segments, so the “number_of_visits” may be worth considering.
Average Spend: We definitely want to consider the average item spend of each group
generated by the clustering process - are customers in a group purchasing many small items or
single higher cost products? This will give us an indication of whether that customer type is
focussed on low or high involvement products, and thus help shape our relationship with them.
You may wish to include a customer’s average_spend_per_item among your features. Along the
same lines, we also think you may want to include the average_basket_spend and
average_item_count per visit as features.
And finally, the ability to distinguish customers in each segment by the types of
products or categories they buy would allow us to tailor marketing efforts
to different groups based on their actual interests, preferences and shopping missions.
Of course, these are just some of the core items you might want to take into account when you design
the features you are using to input into your clustering algorithm. Please extend as you see fit. We
appreciate your efforts and understand that time constraints mean that you won’t be able to
cover everything - nonetheless we look forward to seeing your individual ‘cut’ through this data in
the time you have available for the study.” - CDO.
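The CDO's suggested features map fairly directly onto the columns of customers_sample.csv and category_spends_sample.csv. The following minimal sketch assumes both files carry a customer_id column, that every other column in category_spends_sample.csv is one of the 20 categories, and that every customer has a non-zero total_quantity; the function names and the demo category names ("dairy", "alcohol") are hypothetical. It also includes a quick check of how much of total spend comes from the top 20% of customers (spend rather than profit, since no margin data is provided).

```python
import numpy as np
import pandas as pd


def cdo_features(customers: pd.DataFrame, category_spends: pd.DataFrame) -> pd.DataFrame:
    """Build the CDO's suggested features, one row per customer_id."""
    c = customers.set_index("customer_id")
    f = pd.DataFrame(index=c.index)
    f["total_spend"] = c["total_spend"]               # Spend
    f["number_of_visits"] = c["baskets"]              # Frequency
    f["average_basket_spend"] = c["average_spend"]    # spend per visit
    f["average_item_count"] = c["average_quantity"]   # items per visit
    # Low values suggest many cheap items, high values fewer pricier ones.
    # Assumes every customer bought at least one item (total_quantity > 0).
    f["average_spend_per_item"] = c["total_spend"] / c["total_quantity"]
    # Category mix as shares of each customer's own spend, so heavy and light
    # shoppers with the same interests look alike.
    cats = category_spends.set_index("customer_id")
    shares = cats.div(cats.sum(axis=1), axis=0).fillna(0.0).add_prefix("share_")
    return f.join(shares)


def top_share_of_spend(total_spend: pd.Series, top_fraction: float = 0.2) -> float:
    """Fraction of all spend that comes from the top `top_fraction` of customers."""
    s = np.sort(total_spend.to_numpy())[::-1]
    n_top = max(1, int(round(top_fraction * len(s))))
    return float(s[:n_top].sum() / s.sum())


# Hypothetical stand-ins for the two sample files.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "baskets": [10, 2, 5, 1, 20],
    "total_quantity": [50, 4, 20, 1, 100],
    "average_quantity": [5.0, 2.0, 4.0, 1.0, 5.0],
    "total_spend": [100.0, 10.0, 40.0, 5.0, 845.0],
    "average_spend": [10.0, 5.0, 8.0, 5.0, 42.25],
})
category_spends = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "dairy": [50.0, 10.0, 0.0, 5.0, 400.0],
    "alcohol": [50.0, 0.0, 40.0, 0.0, 445.0],
})
features = cdo_features(customers, category_spends)
print(features.round(2))
print(top_share_of_spend(features["total_spend"]))  # share of spend from the top 20%
```

Spend-type features are usually right-skewed, so a log transform (for example np.log1p) before standardising is a common choice; whether to apply it is a judgment call worth justifying in the report.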
END OF COURSEWORK SPECIFICATION