数据库系统原理英文课件：ch12 Indexing and Hashing

资源描述

《数据库系统原理英文课件：ch12 Indexing and Hashing》由会员分享，可在线阅读，更多相关《数据库系统原理英文课件：ch12 Indexing and Hashing（84页珍藏版）》请在金锄头文库上搜索。

1、Database System Concepts, 5th Ed.Silberschatz, Korth and SudarshanSee www.db- for conditions on re-use Chapter 12: Indexing and HashingSilberschatz, Korth and Sudarshan12.2Database System Concepts - 5th Edition, Oct 4, 2006Chapter 12: Indexing and HashingnBasic ConceptsnOrdered Indices nB+-Tree Inde

2、x FilesnB-Tree Index FilesnStatic HashingnDynamic Hashing nComparison of Ordered Indexing and Hashing nIndex Definition in SQLnMultiple-Key AccessSilberschatz, Korth and Sudarshan12.3Database System Concepts - 5th Edition, Oct 4, 2006Basic ConceptsnIndexing mechanisms used to speed up access to desi

3、red data.lE.g., author catalog in librarynSearch Key - attribute to set of attributes used to look up records in a file.nAn index file consists of records (called index entries) of the formnIndex files are typically much smaller than the original file nTwo basic kinds of indices:lOrdered indices: se

4、arch keys are stored in sorted orderlHash indices: search keys are distributed uniformly across “buckets” using a “hash function”. search-keypointerSilberschatz, Korth and Sudarshan12.4Database System Concepts - 5th Edition, Oct 4, 2006Index Evaluation MetricsnAccess types supported efficiently. E.g

5、., lrecords with a specified value in the attributelor records with an attribute value falling in a specified range of values.nAccess timenInsertion timenDeletion timenSpace overheadSilberschatz, Korth and Sudarshan12.5Database System Concepts - 5th Edition, Oct 4, 2006Ordered IndicesnIn an ordered

6、index, index entries are stored sorted on the search key value. E.g., author catalog in library.nPrimary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.lAlso called clustering indexlThe search key of a primary index is usually but not nec

7、essarily the primary key.nSecondary index: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index.nIndex-sequential file: ordered sequential file with a primary index.Silberschatz, Korth and Sudarshan12.6Database System Concepts

8、 - 5th Edition, Oct 4, 2006Dense Index FilesnDense index Index record appears for every search-key value in the file. Silberschatz, Korth and Sudarshan12.7Database System Concepts - 5th Edition, Oct 4, 2006Sparse Index FilesnSparse Index: contains index records for only some search-key values.lAppli

9、cable when records are sequentially ordered on search-keynTo locate a record with search-key value K we:lFind index record with largest search-key value KlSearch file sequentially starting at the record to which the index record pointsSilberschatz, Korth and Sudarshan12.8Database System Concepts - 5

10、th Edition, Oct 4, 2006Sparse Index Files (Cont.)nCompared to dense indices:lLess space and less maintenance overhead for insertions and deletions.lGenerally slower than dense index for locating records.nGood tradeoff: sparse index with an index entry for every block in file, corresponding to least

11、search-key value in the block.Silberschatz, Korth and Sudarshan12.9Database System Concepts - 5th Edition, Oct 4, 2006Multilevel IndexnIf primary index does not fit in memory, access becomes expensive.nSolution: treat primary index kept on disk as a sequential file and construct a sparse index on it

12、.louter index a sparse index of primary indexlinner index the primary index filenIf even outer index is too large to fit in main memory, yet another level of index can be created, and so on.nIndices at all levels must be updated on insertion or deletion from the file.Silberschatz, Korth and Sudarsha

13、n12.10Database System Concepts - 5th Edition, Oct 4, 2006Multilevel Index (Cont.)Multilevel Index (Cont.)Silberschatz, Korth and Sudarshan12.11Database System Concepts - 5th Edition, Oct 4, 2006Index Update: DeletionnIf deleted record was the only record in the file with its particular search-key va

14、lue, the search-key is deleted from the index also.nSingle-level index deletion:lDense indices deletion of search-key:similar to file record deletion.lSparse indices 4 if an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value

15、 in the file (in search-key order). 4If the next search-key value already has an index entry, the entry is deleted instead of being replaced.Silberschatz, Korth and Sudarshan12.12Database System Concepts - 5th Edition, Oct 4, 2006Index Update: InsertionnSingle-level index insertion:lPerform a lookup

16、 using the search-key value appearing in the record to be inserted.lDense indices if the search-key value does not appear in the index, insert it.lSparse indices if index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created. 4If a new bloc

17、k is created, the first search-key value appearing in the new block is inserted into the index.nMultilevel insertion (as well as deletion) algorithms are simple extensions of the single-level algorithmsSilberschatz, Korth and Sudarshan12.13Database System Concepts - 5th Edition, Oct 4, 2006Secondary

18、 IndicesnFrequently, one wants to find all the records whose values in a certain field (which is not the search-key of the primary index) satisfy some condition.lExample 1: In the account relation stored sequentially by account number, we may want to find all accounts in a particular branchlExample

19、2: as above, but where we want to find all accounts with a specified balance or range of balancesnWe can have a secondary index with an index record for each search-key valueSilberschatz, Korth and Sudarshan12.14Database System Concepts - 5th Edition, Oct 4, 2006Secondary Indices ExamplenIndex recor

20、d points to a bucket that contains pointers to all the actual records with that particular search-key value.nSecondary indices have to be denseSecondary index on balance field of accountSilberschatz, Korth and Sudarshan12.15Database System Concepts - 5th Edition, Oct 4, 2006Primary and Secondary Ind

21、icesnIndices offer substantial benefits when searching for records.nBUT: Updating indices imposes overhead on database modification -when a file is modified, every index on the file must be updated, nSequential scan using primary index is efficient, but a sequential scan using a secondary index is e

22、xpensive lEach record access may fetch a new block from disklBlock fetch requires about 5 to 10 milliseconds4 versus about 100 nanoseconds for memory accessSilberschatz, Korth and Sudarshan12.16Database System Concepts - 5th Edition, Oct 4, 2006B+-Tree Index FilesnDisadvantage of indexed-sequential

23、fileslperformance degrades as file grows, since many overflow blocks get created. lPeriodic reorganization of entire file is required.nAdvantage of B+-tree index files: lautomatically reorganizes itself with small, local, changes, in the face of insertions and deletions. lReorganization of entire fi

24、le is not required to maintain performance.n(Minor) disadvantage of B+-trees: lextra insertion and deletion overhead, space overhead.nAdvantages of B+-trees outweigh disadvantageslB+-trees are used extensivelyB+-tree indices are an alternative to indexed-sequential files.Silberschatz, Korth and Suda

25、rshan12.17Database System Concepts - 5th Edition, Oct 4, 2006B+-Tree Index Files (Cont.)nAll paths from root to leaf are of the same lengthnEach node that is not a root or a leaf has between n/2 and n children.nA leaf node has between (n1)/2 and n1 valuesnSpecial cases: lIf the root is not a leaf, i

26、t has at least 2 children.lIf the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n1) values.A B+-tree is a rooted tree satisfying the following properties:Silberschatz, Korth and Sudarshan12.18Database System Concepts - 5th Edition, Oct 4, 2006B+-Tree Node

27、 StructurenTypical nodelKi are the search-key values lPi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).nThe search-keys in a node are ordered K1 K2 K3 . . . Kn1Silberschatz, Korth and Sudarshan12.19Database System Concepts - 5th Edition,

28、Oct 4, 2006Leaf Nodes in B+-TreesnFor i = 1, 2, . . ., n1, pointer Pi either points to a file record with search-key value Ki, or to a bucket of pointers to file records, each record having search-key value Ki. Only need bucket structure if search-key does not form a primary key.nIf Li, Lj are leaf

29、nodes and i k.2.If such a value exists, assume it is Ki. Then set N = Pi3.Otherwise k Kn1. Set N = Pn Until N is a leaf node3.If for some i, key Ki = k follow pointer Pi to the desired record or bucket. 4.Else no record with search-key value k exists.Silberschatz, Korth and Sudarshan12.25Database Sy

30、stem Concepts - 5th Edition, Oct 4, 2006Queries on B+-Trees (Cont.)nIf there are K search-key values in the file, the height of the tree is no more than logn/2(K).nA node is generally the same size as a disk block, typically 4 kilobytesland n is typically around 100 (40 bytes per index entry).nWith

31、1 million search key values and n = 100lat most log50(1,000,000) = 4 nodes are accessed in a lookup.nContrast this with a balanced binary tree with 1 million search key values around 20 nodes are accessed in a lookuplabove difference is significant since every node access may need a disk I/O, costin

32、g around 20 millisecondsSilberschatz, Korth and Sudarshan12.26Database System Concepts - 5th Edition, Oct 4, 2006Updates on B+-Trees: Insertion1.Find the leaf node in which the search-key value would appear2.If the search-key value is already present in the leaf node1.Add record to the file2.If nece

33、ssary add a pointer to the bucket.3.If the search-key value is not present, then 1.add the record to the main file (and create a bucket if necessary)2.If there is room in the leaf node, insert (key-value, pointer) pair in the leaf node3.Otherwise, split the node (along with the new (key-value, point

34、er) entry) as discussed in the next slide.Silberschatz, Korth and Sudarshan12.27Database System Concepts - 5th Edition, Oct 4, 2006Updates on B+-Trees: Insertion (Cont.)nSplitting a leaf node:ltake the n (search-key value, pointer) pairs (including the one being inserted) in sorted order. Place the

35、first n/2 in the original node, and the rest in a new node.llet the new node be p, and let k be the least key value in p. Insert (k,p) in the parent of the node being split. lIf the parent is full, split it and propagate the split further up.nSplitting of nodes proceeds upwards till a node that is n

36、ot full is found. lIn the worst case the root node may be split increasing the height of the tree by 1. Result of splitting node containing Brighton and Downtown on inserting ClearviewNext step: insert entry with (Downtown,pointer-to-new-node) into parentSilberschatz, Korth and Sudarshan12.28Databas

37、e System Concepts - 5th Edition, Oct 4, 2006Updates on B+-Trees: Insertion (Cont.)B+-Tree before and after insertion of “Clearview”Silberschatz, Korth and Sudarshan12.29Database System Concepts - 5th Edition, Oct 4, 2006RedwoodInsertion in B+-Trees (Cont.)nSplitting a non-leaf node: when inserting (

38、k,p) into an already full internal node NlCopy N to an in-memory area M with space for n+1 pointers and n keyslInsert (k,p) into MlCopy P1,K1, , K n/2-1,P n/2 from M back into node NlCopy Pn/2+1,K n/2+1,Kn,Pn+1 from M into newly allocated node NlInsert (K n/2,N) into parent NnRead pseudocode in book

39、!Downtown Mianus PerryridgeDowntown Mianus Silberschatz, Korth and Sudarshan12.30Database System Concepts - 5th Edition, Oct 4, 2006Updates on B+-Trees: DeletionnFind the record to be deleted, and remove it from the main file and from the bucket (if present)nRemove (search-key value, pointer) from t

40、he leaf node if there is no bucket or if the bucket has become emptynIf the node has too few entries due to the removal, and the entries in the node and a sibling fit into a single node, then merge siblings:lInsert all the search-key values in the two nodes into a single node (the one on the left),

41、and delete the other node.lDelete the pair (Ki1, Pi), where Pi is the pointer to the deleted node, from its parent, recursively using the above procedure.Silberschatz, Korth and Sudarshan12.31Database System Concepts - 5th Edition, Oct 4, 2006Updates on B+-Trees: DeletionnOtherwise, if the node has

42、too few entries due to the removal, but the entries in the node and a sibling do not fit into a single node, then redistribute pointers:lRedistribute the pointers between the node and a sibling such that both have more than the minimum number of entries.lUpdate the corresponding search-key value in

43、the parent of the node.nThe node deletions may cascade upwards till a node which has n/2 or more pointers is found. nIf the root node has only one pointer after deletion, it is deleted and the sole child becomes the root. Silberschatz, Korth and Sudarshan12.32Database System Concepts - 5th Edition,

44、Oct 4, 2006Examples of B+-Tree DeletionnDeleting “Downtown” causes merging of under-full leavesl leaf node can become empty only for n=3!Before and after deleting “Downtown”Silberschatz, Korth and Sudarshan12.33Database System Concepts - 5th Edition, Oct 4, 2006Examples of B+-Tree Deletion (Cont.)nL

45、eaf with “Perryridge” becomes underfull (actually empty, in this special case) and merged with its sibling.nAs a result “Perryridge” nodes parent became underfull, and was merged with its sibling lValue separating two nodes (at parent) moves into merged nodelEntry deleted from parentnRoot node then

46、has only one child, and is deletedDeletion of “Perryridge” from result of previous exampleSilberschatz, Korth and Sudarshan12.34Database System Concepts - 5th Edition, Oct 4, 2006Example of B+-tree Deletion (Cont.)nParent of leaf containing Perryridge became underfull, and borrowed a pointer from it

47、s left siblingnSearch-key value in the parents parent changes as a resultBefore and after deletion of “Perryridge” from earlier exampleSilberschatz, Korth and Sudarshan12.35Database System Concepts - 5th Edition, Oct 4, 2006B+-Tree File OrganizationnIndex file degradation problem is solved by using

48、B+-Tree indices.nData file degradation problem is solved by using B+-Tree File Organization.nThe leaf nodes in a B+-tree file organization store records, instead of pointers.nLeaf nodes are still required to be half fulllSince records are larger than pointers, the maximum number of records that can

49、be stored in a leaf node is less than the number of pointers in a nonleaf node.nInsertion and deletion are handled in the same way as insertion and deletion of entries in a B+-tree index.Silberschatz, Korth and Sudarshan12.36Database System Concepts - 5th Edition, Oct 4, 2006B+-Tree File Organizatio

50、n (Cont.)nGood space utilization important since records use more space than pointers. nTo improve space utilization, involve more sibling nodes in redistribution during splits and mergeslInvolving 2 siblings in redistribution (to avoid split / merge where possible) results in each node having at le

51、ast entriesExample of B+-tree File OrganizationSilberschatz, Korth and Sudarshan12.37Database System Concepts - 5th Edition, Oct 4, 2006Indexing StringsnVariable length strings as keyslVariable fanoutlUse space utilization as criterion for splitting, not number of pointersnPrefix compressionlKey val

52、ues at internal nodes can be prefixes of full key4Keep enough characters to distinguish entries in the subtrees separated by the key valueE.g. “Silas” and “Silberschatz” can be separated by “Silb”lKeys in leaf node can be compressed by sharing common prefixesSilberschatz, Korth and Sudarshan12.38Dat

53、abase System Concepts - 5th Edition, Oct 4, 2006B-Tree Index FilesnSimilar to B+-tree, but B-tree allows search-key values to appear only once; eliminates redundant storage of search keys.nSearch keys in nonleaf nodes appear nowhere else in the B-tree; an additional pointer field for each search key

54、 in a nonleaf node must be included.nGeneralized B-tree leaf nodenNonleaf node pointers Bi are the bucket or file record pointers.Silberschatz, Korth and Sudarshan12.39Database System Concepts - 5th Edition, Oct 4, 2006B-Tree Index File ExampleB-tree (above) and B+-tree (below) on same dataSilbersch

55、atz, Korth and Sudarshan12.40Database System Concepts - 5th Edition, Oct 4, 2006B-Tree Index Files (Cont.)nAdvantages of B-Tree indices:lMay use less tree nodes than a corresponding B+-Tree.lSometimes possible to find search-key value before reaching leaf node.nDisadvantages of B-Tree indices:lOnly

56、small fraction of all search-key values are found early lNon-leaf nodes are larger, so fan-out is reduced. Thus, B-Trees typically have greater depth than corresponding B+-TreelInsertion and deletion more complicated than in B+-Trees lImplementation is harder than B+-Trees.nTypically, advantages of

57、B-Trees do not out weigh disadvantages. Silberschatz, Korth and Sudarshan12.41Database System Concepts - 5th Edition, Oct 4, 2006Multiple-Key AccessnUse multiple indices for certain types of queries.nExample: select account_numberfrom accountwhere branch_name = “Perryridge” and balance = 1000nPossib

58、le strategies for processing query using indices on single attributes:1. Use index on branch_name to find accounts with branch name Perryridge; test balance = 1000 2. Use index on balance to find accounts with balances of $1000; test branch_name = “Perryridge”.3. Use branch_name index to find pointe

59、rs to all records pertaining to the Perryridge branch. Similarly use index on balance. Take intersection of both sets of pointers obtained.Silberschatz, Korth and Sudarshan12.42Database System Concepts - 5th Edition, Oct 4, 2006Indices on Multiple KeysnComposite search keys are search keys containin

60、g more than one attributelE.g. (branch_name, balance)nLexicographic ordering: (a1, a2) (b1, b2) if either la1 b1, or la1=b1 and a2 b2Silberschatz, Korth and Sudarshan12.43Database System Concepts - 5th Edition, Oct 4, 2006Indices on Multiple Attributesn With the where clause where branch_name = “Per

61、ryridge” and balance = 1000the index on (branch_name, balance) can be used to fetch only records that satisfy both conditions.lUsing separate indices in less efficient we may fetch many records (or pointers) that satisfy only one of the conditions.nCan also efficiently handle where branch_name = “Pe

62、rryridge” and balance 1000nBut cannot efficiently handle where branch_name “Perryridge” and balance = 1000lMay fetch many records that satisfy the first but not the second conditionSuppose we have an index on combined search-key(branch_name, balance).Silberschatz, Korth and Sudarshan12.44Database Sy

63、stem Concepts - 5th Edition, Oct 4, 2006Non-Unique Search KeysnAlternatives:lBuckets on separate block (bad idea)lList of tuple pointers with each key4Extra code to handle long lists4Deletion of a tuple can be expensive if there are many duplicates on search key (why?)4Low space overhead, no extra c

64、ost for querieslMake search key unique by adding a record-identifier4Extra storage overhead for keys4Simpler code for insertion/deletion4Widely usedSilberschatz, Korth and Sudarshan12.45Database System Concepts - 5th Edition, Oct 4, 2006Other Issues in IndexingnCovering indiceslAdd extra attributes

65、to index so (some) queries can avoid fetching the actual records4Particularly useful for secondary indices Why?lCan store extra attributes only at leafnRecord relocation and secondary indiceslIf a record moves, all secondary indices that store record pointers have to be updated lNode splits in B+-tr

66、ee file organizations become very expensivelSolution: use primary-index search key instead of record pointer in secondary index4Extra traversal of primary index to locate recordHigher cost for queries, but node splits are cheap4Add record-id if primary-index search key is non-uniqueDatabase System C

67、oncepts, 5th Ed.Silberschatz, Korth and SudarshanSee www.db- for conditions on re-use HashingSilberschatz, Korth and Sudarshan12.47Database System Concepts - 5th Edition, Oct 4, 2006Static HashingnA bucket is a unit of storage containing one or more records (a bucket is typically a disk block). nIn

68、a hash file organization we obtain the bucket of a record directly from its search-key value using a hash function.nHash function h is a function from the set of all search-key values K to the set of all bucket addresses B.nHash function is used to locate records for access, insertion as well as del

69、etion.nRecords with different search-key values may be mapped to the same bucket; thus entire bucket has to be searched sequentially to locate a record. Silberschatz, Korth and Sudarshan12.48Database System Concepts - 5th Edition, Oct 4, 2006Example of Hash File OrganizationnThere are 10 buckets,nTh

70、e binary representation of the ith character is assumed to be the integer i.nThe hash function returns the sum of the binary representations of the characters modulo 10lE.g. h(Perryridge) = 5 h(Round Hill) = 3 h(Brighton) = 3Hash file organization of account file, using branch_name as key (See figur

71、e in next slide.)Silberschatz, Korth and Sudarshan12.49Database System Concepts - 5th Edition, Oct 4, 2006Example of Hash File Organization Hash file organization of account file, using branch_name as key(see previous slide for details).Silberschatz, Korth and Sudarshan12.50Database System Concepts

72、- 5th Edition, Oct 4, 2006Hash FunctionsnWorst hash function maps all search-key values to the same bucket; this makes access time proportional to the number of search-key values in the file.nAn ideal hash function is uniform, i.e., each bucket is assigned the same number of search-key values from t

73、he set of all possible values.nIdeal hash function is random, so each bucket will have the same number of records assigned to it irrespective of the actual distribution of search-key values in the file.nTypical hash functions perform computation on the internal binary representation of the search-ke

74、y. lFor example, for a string search-key, the binary representations of all the characters in the string could be added and the sum modulo the number of buckets could be returned. .Silberschatz, Korth and Sudarshan12.51Database System Concepts - 5th Edition, Oct 4, 2006Handling of Bucket OverflowsnB

75、ucket overflow can occur because of lInsufficient buckets lSkew in distribution of records. This can occur due to two reasons:4multiple records have same search-key value4chosen hash function produces non-uniform distribution of key valuesnAlthough the probability of bucket overflow can be reduced,

76、it cannot be eliminated; it is handled by using overflow buckets.Silberschatz, Korth and Sudarshan12.52Database System Concepts - 5th Edition, Oct 4, 2006Handling of Bucket Overflows (Cont.)nOverflow chaining the overflow buckets of a given bucket are chained together in a linked list.nAbove scheme

77、is called closed hashing. lAn alternative, called open hashing, which does not use overflow buckets, is not suitable for database applications.Silberschatz, Korth and Sudarshan12.53Database System Concepts - 5th Edition, Oct 4, 2006Hash IndicesnHashing can be used not only for file organization, but

78、 also for index-structure creation. nA hash index organizes the search keys, with their associated record pointers, into a hash file structure.nStrictly speaking, hash indices are always secondary indices lif the file itself is organized using hashing, a separate primary hash index on it using the s

79、ame search-key is unnecessary. lHowever, we use the term hash index to refer to both secondary index structures and hash organized files. Silberschatz, Korth and Sudarshan12.54Database System Concepts - 5th Edition, Oct 4, 2006Example of Hash IndexSilberschatz, Korth and Sudarshan12.55Database Syste

80、m Concepts - 5th Edition, Oct 4, 2006Deficiencies of Static HashingnIn static hashing, function h maps search-key values to a fixed set of B of bucket addresses. Databases grow or shrink with time. lIf initial number of buckets is too small, and file grows, performance will degrade due to too much o

81、verflows.lIf space is allocated for anticipated growth, a significant amount of space will be wasted initially (and buckets will be underfull).lIf database shrinks, again space will be wasted.nOne solution: periodic re-organization of the file with a new hash functionlExpensive, disrupts normal oper

82、ationsnBetter solution: allow the number of buckets to be modified dynamically. Silberschatz, Korth and Sudarshan12.56Database System Concepts - 5th Edition, Oct 4, 2006Dynamic HashingnGood for database that grows and shrinks in sizenAllows the hash function to be modified dynamicallynExtendable has

83、hing one form of dynamic hashing lHash function generates values over a large range typically b-bit integers, with b = 32.lAt any time use only a prefix of the hash function to index into a table of bucket addresses. lLet the length of the prefix be i bits, 0 i 32. 4Bucket address table size = 2i. I

84、nitially i = 04Value of i grows and shrinks as the size of the database grows and shrinks.lMultiple entries in the bucket address table may point to a bucket (why?)lThus, actual number of buckets is ij (more than one pointer to bucket j)lallocate a new bucket z, and set ij = iz = (ij + 1)lUpdate the

85、 second half of the bucket address table entries originally pointing to j, to point to zlremove each record in bucket j and reinsert (in j or z)lrecompute new bucket for Kj and insert record in the bucket (further splitting is required if the bucket is still full)nIf i = ij (only one pointer to buck

86、et j)lIf i reaches some limit b, or too many splits have happened in this insertion, create an overflow bucket lElse4increment i and double the size of the bucket address table.4replace each entry in the table by two entries that point to the same bucket.4recompute new bucket address table entry for

87、 KjNow i ij so use the first case above. To split a bucket j when inserting record with search-key value Kj:Silberschatz, Korth and Sudarshan12.60Database System Concepts - 5th Edition, Oct 4, 2006Deletion in Extendable Hash StructurenTo delete a key value, llocate it in its bucket and remove it. lT

88、he bucket itself can be removed if it becomes empty (with appropriate updates to the bucket address table). lCoalescing of buckets can be done (can coalesce only with a “buddy” bucket having same value of ij and same ij 1 prefix, if it is present) lDecreasing bucket address table size is also possib

89、le4Note: decreasing bucket address table size is an expensive operation and should be done only if number of buckets becomes much smaller than the size of the table Silberschatz, Korth and Sudarshan12.61Database System Concepts - 5th Edition, Oct 4, 2006Use of Extendable Hash Structure: Example Init

90、ial Hash structure, bucket size = 2Silberschatz, Korth and Sudarshan12.62Database System Concepts - 5th Edition, Oct 4, 2006Example (Cont.)nHash structure after insertion of one Brighton and two Downtown recordsSilberschatz, Korth and Sudarshan12.63Database System Concepts - 5th Edition, Oct 4, 2006

91、Example (Cont.)Hash structure after insertion of Mianus recordSilberschatz, Korth and Sudarshan12.64Database System Concepts - 5th Edition, Oct 4, 2006Example (Cont.)Hash structure after insertion of three Perryridge recordsSilberschatz, Korth and Sudarshan12.65Database System Concepts - 5th Edition

92、, Oct 4, 2006Example (Cont.)nHash structure after insertion of Redwood and Round Hill recordsSilberschatz, Korth and Sudarshan12.66Database System Concepts - 5th Edition, Oct 4, 2006Extendable Hashing vs. Other SchemesnBenefits of extendable hashing: lHash performance does not degrade with growth of

93、 filelMinimal space overheadnDisadvantages of extendable hashinglExtra level of indirection to find desired recordlBucket address table may itself become very big (larger than memory)4Cannot allocate very large contiguous areas on disk either4Solution: B+-tree structure to locate desired record in b

94、ucket address tablelChanging size of bucket address table is an expensive operationnLinear hashing is an alternative mechanism lAllows incremental growth of its directory (equivalent to bucket address table)lAt the cost of more bucket overflowsSilberschatz, Korth and Sudarshan12.67Database System Co

95、ncepts - 5th Edition, Oct 4, 2006Comparison of Ordered Indexing and HashingComparison of Ordered Indexing and HashingnCost of periodic re-organizationnRelative frequency of insertions and deletionsnIs it desirable to optimize average access time at the expense of worst-case access time?nExpected typ

96、e of queries:lHashing is generally better at retrieving records having a specified value of the key.lIf range queries are common, ordered indices are to be preferrednIn practice:lPostgreSQL supports hash indices, but discourages use due to poor performancelOracle supports static hash organization, b

97、ut not hash indiceslSQLServer supports only B+-treesSilberschatz, Korth and Sudarshan12.68Database System Concepts - 5th Edition, Oct 4, 2006Bitmap IndicesnBitmap indices are a special type of index designed for efficient querying on multiple keysnRecords in a relation are assumed to be numbered seq

98、uentially from, say, 0lGiven a number n it must be easy to retrieve record n4Particularly easy if records are of fixed sizenApplicable on attributes that take on a relatively small number of distinct valueslE.g. gender, country, state, lE.g. income-level (income broken up into a small number of leve

99、ls such as 0-9999, 10000-19999, 20000-50000, 50000- infinity)nA bitmap is simply an array of bitsSilberschatz, Korth and Sudarshan12.69Database System Concepts - 5th Edition, Oct 4, 2006Bitmap Indices (Cont.)nIn its simplest form a bitmap index on an attribute has a bitmap for each value of the attr

100、ibutelBitmap has as many bits as recordslIn a bitmap for value v, the bit for a record is 1 if the record has the value v for the attribute, and is 0 otherwiseSilberschatz, Korth and Sudarshan12.70Database System Concepts - 5th Edition, Oct 4, 2006Bitmap Indices (Cont.)nBitmap indices are useful for

101、 queries on multiple attributes lnot particularly useful for single attribute queriesnQueries are answered using bitmap operationslIntersection (and)lUnion (or)lComplementation (not) nEach operation takes two bitmaps of the same size and applies the operation on corresponding bits to get the result

102、bitmaplE.g. 100110 AND 110011 = 100010 100110 OR 110011 = 110111 NOT 100110 = 011001lMales with income level L1: 10010 AND 10100 = 100004Can then retrieve required tuples.4Counting number of matching tuples is even fasterSilberschatz, Korth and Sudarshan12.71Database System Concepts - 5th Edition, O

103、ct 4, 2006Bitmap Indices (Cont.)nBitmap indices generally very small compared with relation sizelE.g. if record is 100 bytes, space for a single bitmap is 1/800 of space used by relation. 4If number of distinct attribute values is 8, bitmap is only 1% of relation sizenDeletion needs to be handled pr

104、operlylExistence bitmap to note if there is a valid record at a record locationlNeeded for complementation4not(A=v): (NOT bitmap-A-v) AND ExistenceBitmapnShould keep bitmaps for all values, even null valuelTo correctly handle SQL null semantics for NOT(A=v):4 intersect above result with (NOT bitmap-

105、A-Null)Silberschatz, Korth and Sudarshan12.72Database System Concepts - 5th Edition, Oct 4, 2006Efficient Implementation of Bitmap OperationsEfficient Implementation of Bitmap OperationsnBitmaps are packed into words; a single word and (a basic CPU instruction) computes and of 32 or 64 bits at oncel

106、E.g. 1-million-bit maps can be and-ed with just 31,250 instructionnCounting number of 1s can be done fast by a trick:lUse each byte to index into a precomputed array of 256 elements each storing the count of 1s in the binary representation4Can use pairs of bytes to speed up further at a higher memor

107、y costlAdd up the retrieved countsnBitmaps can be used instead of Tuple-ID lists at leaf levels of B+-trees, for values that have a large number of matching recordslWorthwhile if 1/64 of the records have that value, assuming a tuple-id is 64 bitslAbove technique merges benefits of bitmap and B+-tree

108、 indicesSilberschatz, Korth and Sudarshan12.73Database System Concepts - 5th Edition, Oct 4, 2006Index Definition in SQLnCreate an indexcreate index on ()E.g.: create index b-index on branch(branch_name)nUse create unique index to indirectly specify and enforce the condition that the search key is a

109、 candidate key is a candidate key.lNot really required if SQL unique integrity constraint is supportednTo drop an index drop index nMost database systems allow specification of type of index, and clustering.Database System Concepts, 5th Ed.Silberschatz, Korth and SudarshanSee www.db- for conditions

110、on re-use End of ChapterSilberschatz, Korth and Sudarshan12.75Database System Concepts - 5th Edition, Oct 4, 2006Partitioned HashingnHash values are split into segments that depend on each attribute of the search-key.(A1, A2, . . . , An) for n attribute search-keynExample: n = 2, for customer, searc

111、h-key being (customer-street, customer-city)search-key value hash value(Main, Harrison)101 111(Main, Brooklyn) 101 001(Park, Palo Alto) 010 010(Spring, Brooklyn)001 001(Alma, Palo Alto) 110 010nTo answer equality query on single attribute, need to look up multiple buckets. Similar in effect to grid

112、files. Silberschatz, Korth and Sudarshan12.76Database System Concepts - 5th Edition, Oct 4, 2006Sequential File For account RecordsSilberschatz, Korth and Sudarshan12.77Database System Concepts - 5th Edition, Oct 4, 2006Sample account FileSilberschatz, Korth and Sudarshan12.78Database System Concept

113、s - 5th Edition, Oct 4, 2006Figure 12.2Silberschatz, Korth and Sudarshan12.79Database System Concepts - 5th Edition, Oct 4, 2006Figure 12.14Silberschatz, Korth and Sudarshan12.80Database System Concepts - 5th Edition, Oct 4, 2006Figure 12.25Silberschatz, Korth and Sudarshan12.81Database System Conce

114、pts - 5th Edition, Oct 4, 2006Grid FilesnStructure used to speed the processing of general multiple search-key queries involving one or more comparison operators.nThe grid file has a single grid array and one linear scale for each search-key attribute. The grid array has number of dimensions equal t

115、o number of search-key attributes.nMultiple cells of grid array can point to same bucketnTo find the bucket for a search-key value, locate the row and column of its cell using the linear scales and follow pointerSilberschatz, Korth and Sudarshan12.82Database System Concepts - 5th Edition, Oct 4, 200

116、6Example Grid File for accountSilberschatz, Korth and Sudarshan12.83Database System Concepts - 5th Edition, Oct 4, 2006Queries on a Grid FilenA grid file on two attributes A and B can handle queries of all following forms with reasonable efficiency l(a1 A a2)l(b1 B b2)l(a1 A a2 b1 B b2),.nE.g., to a

117、nswer (a1 A a2 b1 B b2), use linear scales to find corresponding candidate grid array cells, and look up all the buckets pointed to from those cells.Silberschatz, Korth and Sudarshan12.84Database System Concepts - 5th Edition, Oct 4, 2006Grid Files (Cont.)nDuring insertion, if a bucket becomes full,

118、 new bucket can be created if more than one cell points to it. lIdea similar to extendable hashing, but on multiple dimensionsl If only one cell points to it, either an overflow bucket must be created or the grid size must be increasednLinear scales must be chosen to uniformly distribute records across cells. l Otherwise there will be too many overflow buckets.nPeriodic re-organization to increase grid size will help.lBut reorganization can be very expensive.nSpace overhead of grid array can be high.nR-trees (Chapter 23) are an alternative

展开阅读全文

数据库系统原理英文课件：ch12 Indexing and Hashing

最新文档