Module code: INMR77
Module name: Business Intelligence and Data Mining
Work to be handed in by: 13 May 2024 @2pm
Assignment Specification:
The module is assessed 100% through this coursework assignment.
This coursework aims to assess your knowledge of business intelligence and data mining, and your ability to perform data mining tasks by applying suitable concepts, methods and techniques learned during the lectures and practical lab sessions for business intelligence.
The coursework is carried out individually. You are required to produce a report of 20 pages of A4 (+/- 10%), including tables and diagrams but excluding references and appendices, based on the case as described in this coursework document.
An appendix can be used to include support materials to back up main body points where necessary.
You are also required to submit any supplementary materials of your work using SAS Enterprise Miner on Blackboard by the specified deadline.
1. The Case
Opportunities and Challenges of the Sharing Economy: Airbnb and Inside Airbnb
Airbnb - Holiday Lets, Homes, Experiences & Places
Airbnb (Airbnb.co.uk) is an online marketplace for arranging or offering short-term rental/lodging, i.e. temporary accommodation, primarily homestays, or tourism experiences. It was founded in August 2008 by Brian Chesky and friends, and it had 6,300 employees as of 2021.
Airbnb went public with a valuation of over $100 billion on December 10, 2020, making it one of the largest IPOs (Initial Public Offerings) of 2020. It was reported that Airbnb's market capitalisation was greater than that of the three largest hotel chains (Marriott, Hilton, and Intercontinental) combined. Although some call this an overvaluation, the company does not carry the traditional mortgages, employee fees, and maintenance fees that burden hotels. Airbnb hosts pay their own mortgages and clean their own apartments, leaving the company much freer of debt and therefore far more valuable.
Airbnb service overview
Airbnb provides a platform for hosts to accommodate guests with short-term lodging and tourism-related activities. Guests can search for accommodation using filters such as location, price, and specific types of home. Before booking, users must provide personal and payment information. Some hosts also require a scan of government-issued identification before accepting a reservation. Hosts provide prices and other details for their rental or listing, e.g. number of guests included in the price, type of property, type of room, number of bathrooms, number of bedrooms, number of beds and type of bed, minimum number of nights for a reservation, and amenities.
In addition, Airbnb provides a guest review system where hosts and guests can leave reviews about their experience, and rate each other after a stay. However, the truthfulness and impartiality of reviews may be adversely affected by concerns of future stays because prospective hosts may refuse to host a user who generally leaves negative reviews. Besides, the company's policy requires users to forego anonymity, which may also detract from users' willingness to leave negative reviews.
Criticism of Airbnb
Airbnb has attracted criticism for increasing housing and residential rental prices in the cities where it operates, for creating nuisance and security issues for those living near let properties, for negatively affecting the quality of life in residential areas, and for contributing to the housing crisis in cities across the UK, USA and Europe. The company has attracted regulatory attention from cities such as San Francisco and New York City, and from the European Union, over the past number of years. It has also faced challenges from the hotel industry and from other, similar companies.
Airbnb made a quarter (25%) of its global workforce redundant in 2020 due to the global pandemic. The news was nevertheless welcomed by some campaigners who were fighting against soaring rents in cities with large numbers of Airbnb hosts. The number of longer-term rental properties (i.e. residential as opposed to short-term/holiday lets) in central Dublin was up 71% on the comparable period last year, as landlords abandoned short-term lets through Airbnb.
Inside Airbnb – Adding Data to the Debate (insideairbnb.com)
Inside Airbnb (insideairbnb.com) is an independent, non-commercial set of tools and data that allows individuals to explore how Airbnb is really used in cities around the world. It was set up by Murray Cox and John Morris in 2016.
Airbnb claims to be part of the “sharing economy” and to be disrupting the hotel industry by offering short-term rental/lodging. However, data shows that most Airbnb listings in most cities are entire homes, many of which are rented all year round (i.e. illegal short-term rentals), disrupting housing and communities.
Most recently, New York City’s plan to crack down on illegal short-term rentals, which could remove as many as 10,000 Airbnb listings, is sparking fierce debate about housing, hotels, the tourist market and residents’ rights.
By analysing publicly available information about a city’s Airbnb listings, Inside Airbnb provides filters and key metrics so that users can see how Airbnb is being used to compete with the residential housing market. With Inside Airbnb, users can ask fundamental questions about Airbnb in any neighbourhood, or across the city as a whole, such as:
• how many listings are in my neighbourhood and where are they?
• how many houses and apartments are being rented out frequently to tourists and not to long-term residents?
• how much are hosts making from renting to tourists (compare that to long-term residential rentals)?
• which hosts are likely to be running a business with multiple listings, and where are they?
These questions (and the answers) get to the core of the debate for many cities around the world, with Airbnb claiming that their hosts only occasionally rent out the homes in which they live. In addition, the legislation or ordinances of many cities and states that address residential housing, short-term or vacation rentals, and zoning usually make reference to allowed use, including:
• how many nights a dwelling is rented per year
• minimum nights stay
• whether the host is present
• how many rooms are being rented in a building
• the number of occupants allowed in a rental
• whether the listing is licensed
The Inside Airbnb tool or data can be used to answer some of these questions. Some understanding of how the Airbnb platform is being used will help clear up the laws as they change.
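As a rough illustration of how the questions above might be put to the listing data, the following dependency-free Python sketch counts listings per neighbourhood, finds hosts with several listings, and flags entire homes with high availability. The rows are invented for illustration only. The column names (`neighbourhood_cleansed`, `room_type`, `host_id`, `availability_365`) are assumed to follow the Inside Airbnb listings layout and should be checked against the data dictionary. The thresholds are arbitrary placeholders, not legal limits.

```python
from collections import Counter

# Toy rows invented for illustration; real values come from the listings file.
# Column names are assumptions based on the Inside Airbnb listings layout.
listings = [
    {"id": 1, "host_id": 10, "neighbourhood_cleansed": "Camden",
     "room_type": "Entire home/apt", "availability_365": 340},
    {"id": 2, "host_id": 10, "neighbourhood_cleansed": "Camden",
     "room_type": "Entire home/apt", "availability_365": 300},
    {"id": 3, "host_id": 11, "neighbourhood_cleansed": "Hackney",
     "room_type": "Private room", "availability_365": 40},
    {"id": 4, "host_id": 12, "neighbourhood_cleansed": "Hackney",
     "room_type": "Entire home/apt", "availability_365": 30},
    {"id": 5, "host_id": 10, "neighbourhood_cleansed": "Westminster",
     "room_type": "Entire home/apt", "availability_365": 365},
]


def listings_per_neighbourhood(rows):
    """How many listings are in each neighbourhood?"""
    return Counter(row["neighbourhood_cleansed"] for row in rows)


def multi_listing_hosts(rows, threshold=2):
    """Hosts with at least `threshold` listings (hypothetical cut-off)."""
    counts = Counter(row["host_id"] for row in rows)
    return {host: n for host, n in counts.items() if n >= threshold}


def highly_available_entire_homes(rows, min_days=90):
    """Ids of entire homes available for more than `min_days` days ahead.

    `min_days` is an illustrative placeholder, not a regulatory limit.
    """
    return [
        row["id"]
        for row in rows
        if row["room_type"] == "Entire home/apt"
        and row["availability_365"] > min_days
    ]


per_area = listings_per_neighbourhood(listings)
busy_hosts = multi_listing_hosts(listings)
flagged = highly_available_entire_homes(listings)
print(per_area, busy_hosts, flagged)
```

Note that availability in the calendar is only a proxy: it records nights offered in the future, not nights actually rented.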
Additional Information:
For further information on Airbnb, please visit: https://www.airbnb.co.uk/
For further information on Inside Airbnb, please visit: http://insideairbnb.com/index.html
2. Coursework requirements
The sharing economy has brought opportunities and challenges to homeowners, society, residents, communities and governments. One of the biggest issues with Airbnb is whether hosts are sharing the primary residence in which they live "occasionally" (i.e. genuine short-term rentals) or are renting out residential properties permanently as hotels (i.e. not genuine short-term rentals).
Airbnb could easily answer this question, but instead it is up to us to shape our communities, to meet the urgent need to house tourists, to tackle the housing shortage/crisis, and to address the nuisance, security and safety issues faced by those living near properties let through Airbnb.
In this assignment, you are required to carry out data mining tasks using data on Airbnb listings in London, UK from Inside Airbnb, and to report the findings that result from your data mining and analysis.
2.1 The DATA
The data is available to download from Inside Airbnb, as shown in Figure 1 below. A copy of the data, for the purpose of this assignment, is provided and available to download on Blackboard.
http://insideairbnb.com/get-the-data
Figure 1: London data compiled by Inside Airbnb as on 10 December 2023 (http://insideairbnb.com/get-the-data)
As shown in Figure 1:
1) Listings.csv.gz contains detailed listing data for London. The data was compiled on 10 December 2023. Each row of the data represents a single listing and contains information about the host of the property, the property’s characteristics, and the overall rating of the property and its features by guests. There are 91,778 listings and 60+ variables in the data set. Listings can be deleted from the Airbnb platform, so the data presented is a snapshot of the listings available at a particular point in time, as on and up to 10 December 2023.
2) Reviews.csv.gz contains the detailed review data for each listing. This data was used to derive a number of variables in the detailed listing data, e.g. number_of_reviews, number_of_reviews_ltm, first_review, last_review, and reviews_per_month.
3) Calendar.csv.gz contains detailed calendar data, i.e. the availability calendar for 365 days in the future for each listing. In addition
4) A data dictionary can be viewed and downloaded from
https://docs.google.com/spreadsheets/d/1iWCNJcSutYqpULSQHlNyGInUvHg2BoUGoNRIGa6Szc4/edit#gid=1322284596
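For readers who want to inspect the compressed files outside SAS Enterprise Miner, the sketch below shows one way a `.csv.gz` file could be read with the Python standard library. It is an illustration only, not part of the required workflow. The helper name `load_listings` is hypothetical, and the demo writes a tiny throwaway file with made-up rows so that the example runs without the real download.

```python
import csv
import gzip
import os
import tempfile


def load_listings(path):
    """Read a gzip-compressed CSV into a list of dicts, one per row.

    All values are returned as strings; numeric columns need converting.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


# Demo: write a small made-up file, then read it back.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "listings.csv.gz")
with gzip.open(demo_path, mode="wt", encoding="utf-8", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["id", "host_id", "room_type"])
    writer.writerow(["1", "10", "Entire home/apt"])
    writer.writerow(["2", "11", "Private room"])

rows = load_listings(demo_path)
print(len(rows), rows[0]["room_type"])
```

With the real file, `load_listings` would be pointed at the downloaded path instead of `demo_path`.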
2.2 Your tasks
You are required to use the detailed listing data (listings.xlsx) to find meaningful patterns and rules that indicate whether hosts in London are renting out residential properties as hotels/businesses or are genuinely sharing the primary residence in which they live “occasionally” (legitimate/genuine short-term rentals).
You are expected to conduct cluster analysis (unsupervised learning) to differentiate hosts/listings that are likely to be genuine short-term lets (genuine short-term rentals) from hosts/listings that are likely to be operating as a business (not genuine short-term rentals), based on information about the host of the property, the property’s characteristics, the overall rating of the property and its features by guests, etc. You are also expected to build a classification model that can differentiate hosts and/or listings that are genuine (occasional) short-term rentals from those that are not.
You are also expected to conduct a literature search in the area, and to conduct data exploration and data preparation for the model building using the methods and techniques learned during the module. You are expected to justify the approaches taken (e.g. use existing literature to back up the points made), and your findings have to be backed up by your analyses/results (e.g. use of screenshots/figures).
2.3 What to deliver
You are required to produce a report of 20 pages of A4 (+/- 10%) including tables and diagrams but excluding references and appendices. An appendix can be used to include support materials to back up main body points where necessary. You are also required to submit the supplementary materials of your work using SAS Enterprise Miner on Blackboard by the specified deadline.
The total of 100 marks will be allocated to the following aspects of the report, which should also be used as a guideline to structure the report.
1. Introduction (15%)
The introduction section should include the background, problems, and issues of Airbnb in the context of the sharing economy, and a clear statement of your data mining goal. You are expected to conduct a literature review in the area, and to justify your statements using relevant literature and sources.
2. Data Understanding (15%)
In this section, you are expected to conduct exploratory data analysis, e.g. summary statistics and data visualisation, using suitable and relevant techniques and methods, and to report your key findings, including the variables and measurements identified for your subsequent data preparation tasks.
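The kind of per-variable summary meant here can be sketched in a few lines of Python using only the standard library. This is an illustration, not the SAS Enterprise Miner workflow the coursework requires. The sample values are invented, the variable name `minimum_nights` is assumed from the Inside Airbnb layout, and `None` is used to mark a missing value.

```python
import statistics


def summarise(values):
    """Summary statistics for one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
        "min": min(present),
        "max": max(present),
    }


# Invented sample: one missing value and one extreme value (30 nights).
minimum_nights = [1, 2, 2, 3, None, 30, 2]
summary = summarise(minimum_nights)
print(summary)
```

A large gap between the mean and the median, as in this sample, is one sign of skew or outliers that the data preparation step would then need to address.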
3. Data Preparation (15%)
In this section, you are expected to take the variables identified in the previous step and prepare them for your model building. This should include:
a) data cleaning e.g. missing data handling
b) data transformation e.g. creating new derived variables
c) data reduction (e.g. correlation analysis)
You are expected to justify the approaches taken, backed up by relevant literature/sources. Make sure to include figures and tables (screenshots) to support your analyses and findings.
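The three preparation steps listed above (cleaning, transformation, reduction) can be illustrated with a small standard-library Python sketch. It assumes, without confirmation from this brief, that prices arrive as text such as `$1,250.00` and that a blank string means missing. Median imputation and a price-per-guest variable are example choices only, and all values are invented.

```python
import math
import statistics


def parse_price(text):
    """Convert a price string such as '$1,250.00' to a float; blank -> None."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def impute_median(values):
    """Data cleaning: replace missing values (None) with the column median."""
    present = [v for v in values if v is not None]
    fill = statistics.median(present)
    return [fill if v is None else v for v in values]


def pearson(xs, ys):
    """Data reduction aid: Pearson correlation between two numeric columns."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented sample values for illustration only.
raw_prices = ["$100.00", "", "$1,250.00", "$80.00"]
accommodates = [2, 4, 5, 1]
bedrooms = [1, 2, 3, 4]
beds = [1, 2, 3, 4]

prices = impute_median([parse_price(p) for p in raw_prices])

# Data transformation: a derived variable (hypothetical choice).
price_per_guest = [p / a for p, a in zip(prices, accommodates)]

# Two perfectly correlated inputs carry the same information,
# so one of them could be dropped before modelling.
r = pearson(bedrooms, beds)
print(prices, price_per_guest, r)
```

Whichever imputation or reduction rule is chosen, the report still needs to justify it with reference to the data and the literature.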
4. Cluster Analysis and Results Interpretation (15%)
In this section, you are expected to conduct cluster analysis, i.e. identifying clusters/segments of listings based on a set or combination of variables, e.g. the host’s characteristics, the listing’s/property’s characteristics and availability, and reviews from guests. This should include:
a) a list of variables and clustering techniques used with reasoning for your cluster analysis
b) result interpretations and comments on the characteristics of the clusters/segments obtained.
You are expected to justify the approaches taken, backed up by relevant literature/sources. Make sure to include figures and tables (screenshots) to support your findings and analysis.
Supplementary materials can be provided in the appendix.
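To make the idea of segmenting listings concrete, here is a minimal, dependency-free Python sketch of k-means clustering on standardised variables. K-means is only one possible technique, and the brief does not prescribe it. The three input variables (host listing count, availability over 365 days, minimum nights), the six rows, and the fixed starting centroids are all invented for illustration. In practice SAS Enterprise Miner or an established library would do this work.

```python
import math
import statistics


def standardise(rows):
    """Z-score each column so that no variable dominates the distance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows
    ]


def k_means(points, initial_centroids, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its members, for a fixed number
    of iterations."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [statistics.fmean(col) for col in zip(*members)]
    return labels, centroids


# Invented rows: [host listing count, availability_365, minimum_nights]
raw = [
    [1, 20, 2], [1, 35, 3], [2, 50, 1],
    [8, 330, 1], [12, 350, 2], [9, 300, 1],
]
points = standardise(raw)

# Fixed starting centroids keep the example deterministic.
labels, centroids = k_means(points, [points[0], points[3]])
print(labels)
```

In this toy data the second cluster groups hosts with many listings and near year-round availability, which is the kind of profile the interpretation step would then examine and describe.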
5. Classification Model Building and Model Evaluation (15%)
In this section, you are expected to build a classification model based on the results obtained from your cluster analysis above. Since this information would most likely be used to identify listings/hosts that are likely to be “not genuine short-term rentals”, it would be more meaningful to use the segment/cluster(s) that would likely be defined as “not genuine short-term rental” as the target variable of your classification model.
You are expected to:
a) Conduct further data preparation where necessary and justify the variables used for your model building – target/dependent variable and independent variables.
b) Build and evaluate the model, stating the classification methods used and providing your reasoning.
Make sure to include figures and tables (screenshots) to support your model building, analyses and findings. Supplementary materials can be provided in the appendix.
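The step from cluster labels to a classification target, followed by model building and evaluation, can be sketched as below. This is a deliberately simple illustration in standard-library Python. The one-variable threshold rule (a decision stump) stands in for whatever classification method is actually chosen, the choice of cluster 1 as the "not genuine" segment is an assumption, and all values are invented.

```python
def make_target(cluster_labels, business_clusters):
    """1 = 'not genuine short-term rental' (assumed business cluster), else 0."""
    return [1 if c in business_clusters else 0 for c in cluster_labels]


def fit_stump(feature, target):
    """Pick the threshold on one feature that maximises training accuracy.

    A listing is predicted 1 when its feature value is >= the threshold.
    """
    best_threshold, best_accuracy = None, -1.0
    for threshold in sorted(set(feature)):
        predictions = [1 if x >= threshold else 0 for x in feature]
        accuracy = sum(p == t for p, t in zip(predictions, target)) / len(target)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold


def evaluate(predictions, target):
    """Accuracy, precision and recall for the positive class (1)."""
    tp = sum(p == 1 and t == 1 for p, t in zip(predictions, target))
    fp = sum(p == 1 and t == 0 for p, t in zip(predictions, target))
    fn = sum(p == 0 and t == 1 for p, t in zip(predictions, target))
    tn = sum(p == 0 and t == 0 for p, t in zip(predictions, target))
    return {
        "accuracy": (tp + tn) / len(target),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Invented training data: availability over 365 days and cluster labels.
train_availability = [20, 35, 50, 330, 350, 300]
train_clusters = [0, 0, 0, 1, 1, 1]
train_target = make_target(train_clusters, business_clusters={1})

threshold = fit_stump(train_availability, train_target)

# Invented hold-out data, kept separate from training for evaluation.
test_availability = [10, 310, 120, 360]
test_target = [0, 1, 1, 1]
test_predictions = [1 if x >= threshold else 0 for x in test_availability]
metrics = evaluate(test_predictions, test_target)
print(threshold, metrics)
```

Because the target is derived from the clusters, a predictor that was also a clustering input will tend to separate the classes almost by construction. This is worth weighing when selecting independent variables and when discussing limitations.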
6. Conclusion, critical reflection, and suggestions for improvement (15%)
In this section, you should summarise the outcomes of your findings in relation to the data mining goal. Reflect on your learning and discuss the limitations of your data mining process. This might include an assessment of the suitability of the data and variables, the methods and techniques used, and the assumptions made. You should also provide suggestions for model improvement.
In addition, there are 10 marks allocated to structure (clarity of organisation and structure: addresses all components of the assignment brief with appropriate weighting across each component, and a logical structure to the overall argument that is easy to follow) and presentation (e.g. effective use of tables and diagrams, proper use of citation and referencing in an Author-Year format such as Harvard or APA, and length/page limit).