Recent Developments in Data Warehousing

Hugh J. Watson
Terry College of Business
University of Georgia
hwatson@terry.uga.edu
http://www.terry.uga.edu/hwatson/dw_tutorial.ppt

Tutorial Objectives
• Provide an overview of data warehousing
• Provide materials to support the teaching of data warehousing
• Discuss recent developments in data warehousing

The Importance of Data Warehousing
• Provide a “single version of the truth”
• Improve decision making
• Support key corporate initiatives such as performance management, B2C and B2B e-commerce, and customer relationship management
• Estimated to be a $113.5 billion market in 2002 for systems, software, services, and in-house expenditures (Palo Alto Management Group)

Data Warehouse Characteristics
• Subject oriented - data are organized around sales, products, etc.
• Integrated - data are integrated to provide a comprehensive view
• Time variant - historical data are maintained
• Nonvolatile - data are not updated by users

Topics Covered
• Definitions and concepts
• Two case studies: Harrah's Entertainment (first) and Owens

... street number and street name; and city and state.

Correcting
• Corrects parsed individual data components using sophisticated data algorithms and secondary data sources.
• Examples include replacing a vanity address and adding a zip code.

Standardizing
• Applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules.
• Examples include adding a pre name, replacing a nickname, and using a preferred street name.

Matching
• Searching and matching records within and across the parsed, corrected, and standardized data based on predefined business rules to eliminate duplications.
• Examples include identifying similar names and addresses.

Consolidating
• Analyzing and identifying relationships between matched records and consolidating/merging them into ONE representation.

Data Staging
• Often used as an interim step between data extraction and later steps
• Accumulates data from asynchronous sources using native interfaces, flat files, FTP sessions, or other processes
• At a predefined cutoff time, data in the staging file is transformed and loaded to the warehouse
• There is usually no end user access to the staging file
• An operational data store may be used for data staging

Data Transformation
• Transforms the data in accordance with the business rules and standards that have been established
• Examples include format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates

Data Loading
• Data are physically moved to the data warehouse
• The loading takes place within a “load window”
• The trend is toward near-real-time updates of the data warehouse as the warehouse is increasingly used for operational applications

Meta Data
• Data about data
• Needed by both information technology personnel and users
• IT personnel need to know data sources and targets; database, table, and column names; refresh schedules; data usage measures; etc.
• Users need to know entity/attribute definitions; reports/query tools available; report distribution information; help desk contact information; etc.

Recent Development: Meta Data Integration
• A growing realization that meta data is critical to data warehousing success
• Progress is being made on getting vendors to agree on standards and to incorporate the sharing of meta data among their tools
• Vendors like Microsoft, Computer Associates, and Oracle have entered the meta data marketplace with significant product offerings

Database Vendors
• High end (i.e., terabyte plus) vendors include IBM (DB2) and NCR-Teradata (Teradata)
• Oracle (8i) and Microsoft (SQL Server 7) are major players for smaller databases

On-line Analytical Processing (OLAP)
• A set of functionality that facilitates multidimensional analysis
• Allows users to analyze data in ways that are natural to them
• Comes in many varieties - ROLAP, MOLAP, DOLAP, etc.

ROLAP
• Relational OLAP
• Uses an RDBMS to implement an OLAP environment
• Typically involves a star schema to provide the multidimensional capabilities
• OLAP tool manipulates RDBMS star schema data
• Called “slowlap” by MOLAP vendors

MOLAP
• Multidimensional OLAP
• Uses a MDDBS (e.g., Essbase) to store and access data
• Usually requires proprietary (non SQL) data access tools
• Provides exceptionally fast response times

Star Schema
• Creates non-normalized data structures
• Easier for users to understand
• Optimized for OLAP
• Uses fact (facts or measures in the business) and dimension (establishes the context of the facts) tables

OLAP Tools
• Products come from vendors such as Brio, Cognos, Hyperion, and BusinessObjects
• Typically available as a fat or thin (i.e., browser) client
• In a web environment, the browser communicates with a web server, which talks to an application server, which connects to backend databases
• The application server provides query, reporting, and OLAP analysis functionality ove
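
Illustration: a star schema queried ROLAP-style

The Star Schema and ROLAP slides above describe a fact table surrounded by dimension tables, with the OLAP tool issuing SQL against a relational database. The sketch below is one minimal way to picture that, using Python's built-in sqlite3 module as a stand-in RDBMS. The table names (fact_sales, dim_product, dim_date), the column names, and the sample rows are all hypothetical and are not taken from the tutorial; a production warehouse would use a server database and far larger tables.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema: one fact table, two dimensions."""
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            category   TEXT
        );
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            year    INTEGER,
            quarter TEXT
        );
        -- The fact table holds the measure (amount) plus a key to each dimension.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id    INTEGER REFERENCES dim_date(date_id),
            amount     REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?)",
        [(1, "Hardware"), (2, "Software")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(1, 2001, "Q4"), (2, 2002, "Q1")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 1, 100.0), (1, 2, 150.0), (2, 1, 200.0), (2, 2, 50.0), (2, 2, 25.0)],
    )


def sales_by_category_and_year(conn):
    """Aggregate the measure along two dimensions, as a ROLAP tool might."""
    return conn.execute(
        """
        SELECT p.category, d.year, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date    AS d ON d.date_id    = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
rows = sales_by_category_and_year(conn)
for category, year, total in rows:
    print(category, year, total)
```

Each dimension adds one join and one GROUP BY column, which is why the layout is considered easy for users to follow. A MOLAP product would instead precompute such aggregates in a multidimensional store rather than run the joins at query time.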