eCommerce, Prediction Markets, and Data Mining
bridges vol. 18, July 2008 / News from the Network: Austrian Researchers Abroad by Wolfgang Jank
We are living in an increasingly data-driven world. Companies are amassing ever-larger data repositories about their customers through loyalty cards, cookies, and crawlers. This puts pressure on decision makers to extract new and actionable knowledge from that data. Data Mining is a relatively young discipline with roots in Statistics, Machine Learning, and Visualization; its goal is the discovery of interesting (i.e., non-trivial, previously unknown, and potentially useful) knowledge from huge amounts of data. While Data Mining has an impact in many different areas, it plays a special role in electronic commerce (eCommerce). One reason is that e-tailers never actually see their customers and thus have to base business decisions on the only piece of information that is observable: a customer’s click-through behavior. Another reason is that click-through information is easily recorded online, so valuable information is readily available.
Data Mining and eCommerce
There are many prominent examples of Data Mining in eCommerce. One of the most visible current examples is the Netflix Challenge (www.netflixprize.com). Netflix offers $1 million to the person or team that can improve its movie recommendation engine by at least 10 percent. Currently, over 30,000 teams from 172 countries are competing to derive the best Data Mining algorithm and win the prize. Another example is Google Analytics (www.google.com/analytics/). Google Analytics offers a plethora of information about a Web site’s visitors, e.g., where they came from, how long they stayed, which keywords they searched, which ads they clicked, and which products they bought.
Figure courtesy of Chris Volinsky, AT&T Research
AT&T uses massive graph mining to detect fraud in its telephone network data. One type of fraud involves customers who are delinquent on a bill and subsequently sign up for new service under another phone number. AT&T detects the fraud by comparing the social network of each new account to those of known delinquent accounts. If there is a large intersection, it is likely that both telephone numbers belong to a single fraudulent individual.
Data Mining can be divided into four or five fundamental tasks: it can find clusters or segments in the data, it can predict or classify, it can detect anomalies in the data, and it can visualize (or unearth) new relationships. For instance, BlogPulse (www.blogpulse.com/) is a site that visually mines data about blogging activity from around the world. At the heart of Data Mining is the ability to obtain good data. Data can be obtained either indirectly, by observing customers’ actions and choices (e.g., via loyalty cards, clickstreams, or cookies), or directly, by asking customers for their opinions (via polls or surveys). The following is a description of a new approach that extracts and aggregates data via a market mechanism. This approach is referred to as a prediction market.
Prediction Markets
Prediction markets (PMs), also known as information markets, idea markets, or betting exchanges, are increasingly used to aggregate the “wisdom of crowds” from online communities. PMs are speculative markets created for the sole purpose of making predictions. Assets are created whose final cash value is tied to a particular event (e.g., will the next US president be a Republican?) or parameter (e.g., total sales next quarter). The current market prices can then be interpreted as predictions of the probability of the event or the expected value of the parameter. One of the most famous PMs is the Iowa Electronic Market (IEM).
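To make the price interpretation above concrete, here is a minimal sketch. It assumes winner-take-all contracts that pay $1 if the event occurs; the article does not specify contract terms, and the function name and prices are hypothetical. Because quoted prices for mutually exclusive outcomes rarely sum to exactly the payoff, the sketch rescales them so the implied probabilities sum to 1.

```python
def implied_probabilities(prices, payoff=1.0):
    """Turn prices of mutually exclusive winner-take-all contracts
    into implied probabilities that sum to 1.

    Assumption: each contract pays `payoff` if its event occurs, else 0.
    """
    raw = {name: price / payoff for name, price in prices.items()}
    total = sum(raw.values())
    return {name: p / total for name, p in raw.items()}


# Hypothetical prices in dollars (not real market data).
prices = {"Republican": 0.42, "Democrat": 0.60}
probs = implied_probabilities(prices)

for name, p in probs.items():
    print(f"{name}: {p:.3f}")
```

For a contract tied to a parameter rather than an event (e.g., total sales next quarter), the price would be read as an expected value instead, and no rescaling would apply.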
IEM has achieved fame because, since its creation in 1988, it has predicted the US Presidential elections more accurately than traditional polls 75 percent of the time. PMs have many interesting applications, such as forecasting economic trends (e.g., HedgeStreet), natural disasters (Hurricane Futures Market at the University of Miami), outcomes of political campaigns (e.g., IEM), or sporting events (e.g., TradeSports). PMs are also used by an increasing number of major corporations, such as HP, Intel, Microsoft, Google, and Yahoo, to tap internal, future-focused knowledge about sales, supplier behavior, project completion time, and new product release timing.
Mining Prediction Markets
We applied Data Mining to one of the best-known PMs, the Hollywood Stock Exchange (HSX), to forecast the release-week theatrical revenues of motion pictures. Distributors (Hollywood studios) and exhibitors (theater owners) have long considered pre-release forecasting of demand for movies to be one of their most important, yet most difficult, tasks. We focused on the release-week revenues since movies have extremely short life cycles: the release week often accounts for 40 percent of a movie’s total theatrical revenues. The release-week revenue is also widely recognized by the industry as the best indicator of a movie’s overall success in theaters and in subsequent distribution channels, such as videos and international markets.
Movie Demand Decay Rates (Y-axis: revenue (on log-scale), X-axis: weeks after release)
The graphs show actual movie demand decay rates from the time of release. Data Mining tries to predict these decay rates in advance of the release.
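As a rough illustration of what a decay rate on a log scale means, the sketch below fits a straight line to log revenue against weeks since release. This is an ordinary least-squares fit on made-up numbers, not the method used in the study; the function name and the revenue figures are hypothetical.

```python
import math


def fit_decay_rate(weekly_revenue):
    """Least-squares fit of log(revenue) = intercept + slope * week.

    Returns (intercept, slope). A negative slope means revenue decays;
    exp(slope) is the fraction of revenue retained from week to week.
    """
    weeks = range(len(weekly_revenue))
    logs = [math.log(r) for r in weekly_revenue]
    n = len(logs)
    mean_w = sum(weeks) / n
    mean_l = sum(logs) / n
    sxx = sum((w - mean_w) ** 2 for w in weeks)
    sxy = sum((w - mean_w) * (l - mean_l) for w, l in zip(weeks, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_w
    return intercept, slope


# Hypothetical weekly revenues in $ millions, halving each week.
revenue = [40.0, 20.0, 10.0, 5.0, 2.5]
intercept, slope = fit_decay_rate(revenue)
weekly_retention = math.exp(slope)

print(f"slope on log scale: {slope:.3f}")
print(f"week-over-week retention: {weekly_retention:.2f}")
```

A movie whose revenue falls along a straight line on the log-scale plot has a constant weekly retention; steeper lines correspond to faster decay.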
Since its establishment in 1996, HSX has attracted 1.7 million registered users who trade virtual stocks of upcoming movies and other assets, such as movie directors and stars. A registered user is given a free membership and $2 million in virtual currency for trading. Each movie has its initial public offering (IPO) months or years prior to its theatrical release. Trading of a movie is halted on the day (typically Friday) of the movie’s release. Trading resumes on the following Monday, after the price has been adjusted to 2.8 times the movie’s release-week revenue. The stock is de-listed at the end of the fourth week after the movie’s release. Traders can then cash out based on the de-list price of the stock.
We mined the information on HSX using a novel Data Mining technique called Functional Shape Analysis (FSA). In essence, FSA identifies patterns (or shapes) in the trading paths, which lead to predictions that incorporate the changing rate of information diffusion and herding effects. We found that while conventional methods resulted in forecasting errors as high as 60 percent, our method yielded a forecasting error of less than 7 percent. Moreover, FSA can also be used in a dynamic fashion to produce very early forecasts, which are particularly valuable in the movie industry. More details about this study can be found at the author’s Web site: http://www.smith.umd.edu/faculty/wjank/ or by e-mailing the author directly at: wjank[at]rhsmith.umd.edu.
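The HSX price adjustment and the notion of a percentage forecasting error can be sketched as follows. This is a naive baseline, not FSA: it assumes that the last price before the trading halt reflects traders' expectation of the adjusted price, so dividing by 2.8 gives an implied release-week revenue. That assumption, the function names, and all numbers are hypothetical; only the 2.8 multiplier comes from the article.

```python
HSX_MULTIPLIER = 2.8  # from the article: adjusted price = release-week revenue x 2.8


def implied_release_week_revenue(halt_price):
    """Naive forecast: invert the HSX adjustment rule on the pre-release price."""
    return halt_price / HSX_MULTIPLIER


def absolute_percentage_error(forecast, actual):
    """Forecasting error as a percentage of the actual value."""
    return abs(forecast - actual) / actual * 100.0


# Hypothetical values; price and revenue are assumed to share the same units.
halt_price = 84.0
actual_revenue = 25.0

forecast = implied_release_week_revenue(halt_price)
error = absolute_percentage_error(forecast, actual_revenue)

print(f"implied release-week revenue: {forecast:.1f}")
print(f"absolute percentage error: {error:.1f}%")
```

A baseline of this kind uses only a single price point, whereas the approach described above draws on the shape of the whole trading path.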
***
The author, Wolfgang Jank, is associate professor of Decisions, Operations &
Information Technologies at the Robert H. Smith School of Business,
University of Maryland, and is affiliated with the Center for
Electronic Markets & Enterprises.

