数据结构英文教学课件：chapter10 Hashing

资源描述

《数据结构英文教学课件：chapter10 Hashing》由会员分享，可在线阅读，更多相关《数据结构英文教学课件：chapter10 Hashing（47页珍藏版）》请在金锄头文库上搜索。

1、Chapter 10HashingADT of Dictionary ADT of Dictionary const int DefaultSize = 26;templateclass Dictionarypublic: Dictionary ( int size = DefaultSize ); int IsIn ( Name name ); Attribute *Find ( Name name ); void Insert ( Name name, Attribute attr ); void Remove ( Name name ); Hash tableSupport the fo

2、llowing operationsFind InsertDelete. (deletions may be unnecessary in some applications)Unlike binary search tree, AVL tree and B+-tree, the following functions cannot be done:Minimum and maximumSuccessor and predecessorReport data within a given rangeList out the data in orderUnrealistic solution E

3、ach position (slot) corresponds to a key in the universe of keysTk corresponds to an element with key kIf the set contains no element with key k, then Tk=NULLUnrealistic solutioninsert, delete and find all take O(1) (worst-case) timeProblem:The scheme wastes too much space if the universe is too lar

4、ge compared with the actual number of elements to be stored. E.g. student IDs are 8-digit integers, so the universe size is 108, but we only have about 4000 students in Software College.HashingUsually, m m, where m is the size of the hash tableDesign a good hash functionthat is fast to compute and c

5、an minimize the number of collisionsDesign a method to resolve the collisions when they occurHash FunctionThe division method h(k) = k mod me.g. m=12, k=100, h(k)=4 Requires only a single division operation (quite fast)Certain values of m should be avoidede.g. if m=2p, then h(k) is just the p lowest

6、-order bits of k; the hash function does not depend on all the bitsSimilarly, if the keys are decimal numbers, should not set m to be a power of 10Hash FunctionIts a good practice to set the table size m to be a prime numberGood values for m: primes not too close to exact powers of 2e.g. the hash ta

7、ble is to hold 2000 numbers, and we dont mind an average of 3 numbers being hashed to the same entrychoose m=701Hash Function.Can the keys be strings?Most hash functions assume that the keys are natural numbersif keys are not natural numbers, a way must be found to interpret them as natural numbersH

8、ash Function.Method 1Add up the ASCII values of the characters in the stringProblems:Different permutations of the same set of characters would have the same hash valueIf the table size is large, the keys are not distribute well. e.g. Suppose m=10007 and all the keys are eight or fewer characters lo

9、ng. Since ASCII value a reasonably equitable distributionProblemEnglish is not randomOnly 28 percent of the table can actually be hashed to (assuming a table size of 10,007)Method 3computesinvolves all characters in the key and be expected to distribute wellCollision Handling: (1) Separate ChainingI

10、nstead of a hash table, we use a table of linked listkeep a linked list of keys that hash to the same valueh(K) = K mod 10Separate ChainingTo insert a key KCompute h(K) to determine which list to traverse If Th(K) contains a null pointer, initiatize this entry to point to a linked list that contains

11、 K alone. If Th(K) is a non-empty list, we add K at the beginning of this list.To delete a key Kcompute h(K), then search for K within the list at Th(K). Delete K if it is found.Separate Chaining Assume that we will be storing n keys. Then we should make m the next larger prime number. If the hash f

12、unction works well, the number of keys in each linked list will be a small constant.Therefore, we expect that each search, insertion, and deletion can be done in constant time.Disadvantage: Memory allocation in linked list manipulation will slow down the program. Advantage: deletion is easy.Collisio

13、n Handling:(2) Open AddressingOpen addressing: relocate the key K to be inserted if it collides with an existing key. That is, we store K at an entry different from Th(K). Two issues arisewhat is the relocation scheme?how to search for K later?Three common methods for resolving a collision in open a

14、ddressingLinear probingQuadratic probingDouble hashingOpen AddressingTo insert a key K, compute h0(K). If Th0(K) is empty, insert it there. If collision occurs, probe alternative cell h1(K), h2(K), . until an empty cell is found.hi(K) = (hash(K) + f(i) mod m, with f(0) = 0f: collision resolution str

15、ategyLinear Probingf(i) =icells are probed sequentially (with wraparound) hi(K) = (hash(K) + i) mod mInsertion:Let K be the new key to be inserted. We compute hash(K)For i = 0 to m-1compute L = ( hash(K) + I ) mod mTL is empty, then we put K there and stop. If we cannot find an empty entry to put K,

16、 it means that the table is full and we should report an error. Linear Probinghi(K) = (hash(K) + i) mod mE.g, inserting keys 89, 18, 49, 58, 69 with hash(K)=K mod 10To insert 58, probe T8, T9, T0, T1To insert 69, probe T9, T0, T1, T2 Primary ClusteringWe call a block of contiguously occupied table e

17、ntries a clusterOn the average, when we insert a new key K, we may hit the middle of a cluster. Therefore, the time to insert K would be proportional to half the size of a cluster. That is, the larger the cluster, the slower the performance. Primary ClusteringLinear probing has the following disadva

18、ntages:Once h(K) falls into a cluster, this cluster will definitely grow in size by one. Thus, this may worsen the performance of insertion in the future.If two cluster are only separated by one entry, then inserting one key into a cluster can merge the two clusters together. Thus, the cluster size

19、can increase drastically by a single insertion. This means that the performance of insertion can deteriorate drastically after a single insertion.Large clusters are easy targets for collisions.Quadratic Probingf(i) = i2hi(K) = ( hash(K) + i2 ) mod mE.g., inserting keys 89, 18, 49, 58, 69 with hash(K

20、) = K mod 10To insert 58, probe T8, T9, T(8+4) mod 10To insert 69, probe T9, T(9+1) mod 10, T(9+4) mod 10Quadratic ProbingTwo keys with different home positions will have different probe sequencese.g. m=101, h(k1)=30, h(k2)=29probe sequence for k1: 30,30+1, 30+4, 30+9probe sequence for k2: 29, 29+1,

21、 29+4, 29+9If the table size is prime, then a new key can always be inserted if the table is at least half empty Quadratic ProbingSecondary clusteringKeys that hash to the same home position will probe the same alternative cellsTo avoid secondary clustering, the probe sequence need to be a function

22、of the original key value, not the home positionDouble HashingTo alleviate the problem of clustering, the sequence of probes for a key should be independent of its primary position = use two hash functions: hash() and hash2()f(i) = i + hash2(K)E.g. hash2(K) = R - (K mod R), with R is a prime smaller

23、 than mDouble Hashinghi(K) = ( hash(K) + f(i) ) mod m; hash(K) = K mod mf(i) = i + hash2(K); hash2(K) = R - (K mod R),Example: m=10, R = 7 and insert keys 89, 18, 49, 58, 69To insert 49, hash2(49)=7, 2nd probe is T(9+7) mod 10To insert 58, hash2(58)=5, 2nd probe is T(8+5) mod 10To insert 69, hash2(6

24、9)=1, 2nd probe is T(9+1) mod 10Choice of hash2()Hash2() must never evaluate to zeroFor any key K, hash2(K) must be relatively prime to the table size m. Otherwise, we will only be able to examine a fraction of the table entries. E.g.,if hash(K) = 0 and hash2(K) = m/2, then we can only examine the e

25、ntries T0, Tm/2, and nothing else!Choice of hash2()One solution is to make m prime, and choose R to be a prime smaller than m, and set hash2(K) = R (K mod R)Quadratic probing, however, does not require the use of a second hash function likely to be simpler and faster in practice例：例：例：例：给定关键字序列为给定关键

26、字序列为给定关键字序列为给定关键字序列为 1919，1414，2323，1 1，6868，2020，8484，2727，5555，1111，1010，7979 散列函数：散列函数：散列函数：散列函数：HH（keykey）=key%13 =key%13 散列表空间地址为散列表空间地址为散列表空间地址为散列表空间地址为0 01212，试用线性探查法建立散列表试用线性探查法建立散列表试用线性探查法建立散列表试用线性探查法建立散列表01234567891011121919141423231 11 11 16868202084848484 84848484272727272727 27272727 5

27、5551111 10107979存储的是存储的是存储的是存储的是数据元素数据元素数据元素数据元素1 8 8 9 9101011117 76 6 5 53 3 12124 4 2 21 1 0 014 27 55 68 84 19 20 10 23 11 Deletion in open addressingActual deletion cannot be performed in open addressing hash tablesotherwise this will isolate records further down the probe sequenceSolution: Add

28、 an extra bit to each table entry, and mark a deleted slot by storing a special value DELETED (tombstone)class HashTable /用线性探查法处理溢出时散列表类的定义用线性探查法处理溢出时散列表类的定义public: enum KindOfEntry Active, Empty, Deleted ; HashTable ( ) : TableSize ( DefaultSize ) AllocateHt ( ); CurrentSize = 0; HashTable ( ) del

29、ete ht; const HashTable & operator = ( const HashTable & ht2 ); int Find ( const Type & x ); int Insert ( const Type & x ); int Remove ( const Type & x ); int IsIn ( const Type & x ) return ( i = Find (x) ) = 0 ? 1 : 0; void MakeEmpty ( );private: struct HashEntry Type Element; KindOfEntry info; int

30、 operator= ( HashEntry &, HashEntry & ); int operator!= ( HashEntry &, HashEntry & ); HashEntry ( ) : info (Empty ) HashEntry (const Type & E, KindOfEntry i = Empty ) : Element (E), info (i) ; enum DefaultSize = 11 HashEntry *ht; int CurrentSize, TableSize; void AllocateHt ( ) ht = new HashEntryTabl

31、eSize ; int FindPos ( const Type & x ) const; template int HashTable:Find ( const Type & x ) /线性探查法的搜索算法，函数返回找到位置。线性探查法的搜索算法，函数返回找到位置。/若返回负数可能是空位，若为若返回负数可能是空位，若为-TableSize则失败。则失败。 int i = FindPos ( x ), j = i; while ( htj.info != Empty & htj.Element != x ) j = ( j + 1 ) % TableSize; if ( j = i ) ret

32、urn -TableSize; if ( htj.info = Active ) return j; else -j;template void HashTabe : MakeEmpty ( ) /置表中所有表项为空置表中所有表项为空 for ( int i = 0; i TableSize; i+) hti.info = Empty; CurrentSize = 0;template const HashTable & HashTable :operator = ( const HashTable &ht2 ) /重载函数：从散列表重载函数：从散列表ht2复制到当前散列表复制到当前散列表 i

33、f ( this != &ht2 ) delete ht; TableSize = ht2.TableSize; AllocateHt ( ); for ( int i = 0; i TableSize; i+ ) hti = ht2.hti; CurrentSize = ht2.CurrentSize; return *this; template int HashTable:Insert (const Type & x ) /将新表项将新表项 x 插入到当前的散列表中插入到当前的散列表中 if ( ( int i = Find (x) ) = 0 ) return 0; /不插入不插入 e

34、lse if ( i != -TableSize & ht-i.info != Active ) /在在 -i 处插入表项处插入表项x ht-i.Element = x; ht-i.info = Active; CurrentSize+; return 1; else return 0;template int HashTable : Remove ( const Type & x ) /在当前散列表中删除表项在当前散列表中删除表项x if ( ( int i = Find (x) ) = 0 ) /找到找到,删除删除 hti.info = deleted; /做删除标记做删除标记 Curre

35、ntSize-; return 1; else return 0;若设若设是散列表的装填因子：是散列表的装填因子：用不同的方法溢出处理冲突时散列表的平均搜索长度用不同的方法溢出处理冲突时散列表的平均搜索长度用不同的方法溢出处理冲突时散列表的平均搜索长度用不同的方法溢出处理冲突时散列表的平均搜索长度如图所示。如图所示。如图所示。如图所示。散散散散列列列列表表表表的的的的装装装装填填填填因因因因子子子子表表表表明明明明了了了了表表表表中中中中的的的的装装装装满满满满程程程程度度度度。越越越越大大大大，说说说说明明明明表表表表越越越越满满满满，再再再再插插插插入入入入新新新新元元元元素素素素时时

36、时时发发发发生生生生冲冲冲冲突突突突的的的的可可可可能性就越大。能性就越大。能性就越大。能性就越大。散散散散列列列列表表表表的的的的搜搜搜搜索索索索性性性性能能能能，即即即即平平平平均均均均搜搜搜搜索索索索长长长长度度度度依依依依赖赖赖赖于于于于散散散散列列列列表的装填因子，不直接依赖于表的装填因子，不直接依赖于表的装填因子，不直接依赖于表的装填因子，不直接依赖于 n n 或或或或 mm。不不不不论论论论表表表表的的的的长长长长度度度度有有有有多多多多大大大大，我我我我们们们们总总总总能能能能选选选选择择择择一一一一个个个个合合合合适适适适的的的的装填因子，以把平均搜索长度限制在一定范围内。装填因子，以把平均搜索长度限制在一定范围内。装填因子，以把平均搜索长度限制在一定范围内。装填因子，以把平均搜索长度限制在一定范围内。Give Input4371，1323，6173，4199，4344，9679，1989,3081,1505,4930 and hash function h(X) = X mod 13.Table size is fixed to 23,show the result of open addressing hash table using quadratic probing, and compute the number of collisions.

展开阅读全文

数据结构英文教学课件：chapter10 Hashing

最新文档