Source Code for Module CedarBackup2.knapsack

# -*- coding: iso-8859-1 -*-
# vim: set ft=python ts=3 sw=3 expandtab:
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
#
#              C E D A R
#          S O L U T I O N S       "Software done right."
#           S O F T W A R E
#
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
#
# Copyright (c) 2004-2005 Kenneth J. Pronovici.
# All rights reserved.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License,
# Version 2, as published by the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Copies of the GNU General Public License are available from
# the Free Software Foundation website, http://www.gnu.org/.
#
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
#
# Author   : Kenneth J. Pronovici <pronovic@ieee.org>
# Language : Python (>= 2.3)
# Project  : Cedar Backup, release 2
# Revision : $Id: knapsack.py 687 2007-02-18 04:59:52Z pronovic $
# Purpose  : Provides knapsack algorithms used for "fit" decisions
#
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

########
# Notes
########

"""
Provides the implementation for various knapsack algorithms.

Knapsack algorithms are "fit" algorithms, used to take a set of "things" and
decide on the optimal way to fit them into some container.  The focus of this
code is to fit files onto a disc, although the interface (in terms of item,
item size and capacity size, with no units) is generic enough that it can
be applied to items other than files.

All of the algorithms implemented below assume that "optimal" means "use up as
much of the disc's capacity as possible", but each produces slightly different
results.  For instance, the best fit and first fit algorithms tend to include
fewer files than the worst fit and alternate fit algorithms, even if they use
the disc space more efficiently.

Usually, for a given set of circumstances, it will be obvious to a human which
algorithm is the right one to use, based on trade-offs between number of files
included and ideal space utilization.  It's a little more difficult to do this
programmatically.  For Cedar Backup's purposes (i.e. trying to fit a small
number of collect-directory tarfiles onto a disc), worst-fit is probably the
best choice if the goal is to include as many of the collect directories as
possible.

@sort: firstFit, bestFit, worstFit, alternateFit

@author: Kenneth J. Pronovici <pronovic@ieee.org>
"""

#######################################################################
# Public functions
#######################################################################

######################
# firstFit() function
######################

def firstFit(items, capacity):

   """
   Implements the first-fit knapsack algorithm.

   The first-fit algorithm proceeds through an unsorted list of items until
   running out of items or meeting capacity exactly. If capacity is exceeded,
   the item that caused capacity to be exceeded is thrown away and the next one
   is tried. This algorithm generally performs more poorly than the other
   algorithms both in terms of capacity utilization and item utilization, but
   can be as much as an order of magnitude faster on large lists of items
   because it doesn't require any sorting.

   The "size" values in the items and capacity arguments must be comparable,
   but they are unitless from the perspective of this function. Zero-sized
   items and capacity are considered degenerate cases. If capacity is zero,
   no items fit, period, even if the items list contains zero-sized items.

   The dictionary is indexed by its key, and then includes its key. This
   seems kind of strange on first glance. It works this way to facilitate
   easy sorting of the list on key if needed.

   The function assumes that the list of items may be used destructively, if
   needed. This avoids the overhead of having the function make a copy of the
   list, if this is not required. Callers should pass C{items.copy()} if they
   do not want their version of the list modified.

   The function returns a list of chosen items and the unitless amount of
   capacity used by the items.

   @param items: Items to operate on
   @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer

   @param capacity: Capacity of container to fit to
   @type capacity: integer

   @returns: Tuple C{(items, used)} as described above
   """

   # Use dict since insert into dict is faster than list append
   included = { }

   # Search the list as it stands (arbitrary order)
   used = 0
   remaining = capacity
   for key in items.keys():
      if remaining == 0:
         break
      if remaining - items[key][1] >= 0:
         included[key] = None
         used += items[key][1]
         remaining -= items[key][1]

   # Return results
   return (included.keys(), used)
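
The first-fit loop above can be tried out with a small standalone sketch. This is not the module's own code: `first_fit_sketch` and the sample file names are hypothetical, and the sketch assumes Python 3.7 or later, where a dictionary iterates in insertion order (the listing itself targets Python 2 and treats the order as arbitrary).

```python
def first_fit_sketch(items, capacity):
    """Python 3 sketch of the first-fit loop (hypothetical helper, not the module's code)."""
    included = []
    used = 0
    remaining = capacity
    # Walk the items in whatever order the dictionary yields them.
    for key, (_name, size) in items.items():
        if remaining == 0:
            break  # capacity met exactly, or capacity was zero to begin with
        if size <= remaining:
            included.append(key)
            used += size
            remaining -= size
        # Otherwise the item is skipped and the next one is tried.
    return included, used


# Illustrative data in the documented shape: key -> (item, size), sizes unitless.
items = {
    "a.tar": ("a.tar", 40),
    "b.tar": ("b.tar", 70),
    "c.tar": ("c.tar", 10),
    "d.tar": ("d.tar", 30),
}
chosen, used = first_fit_sketch(items, 100)
print(chosen, used)  # ['a.tar', 'c.tar', 'd.tar'] 80
```

Here `b.tar` is skipped because it would exceed the remaining capacity of 60, and the walk continues with the smaller items that follow it.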


#####################
# bestFit() function
#####################

def bestFit(items, capacity):

   """
   Implements the best-fit knapsack algorithm.

   The best-fit algorithm proceeds through a sorted list of items (sorted from
   largest to smallest) until running out of items or meeting capacity exactly.
   If capacity is exceeded, the item that caused capacity to be exceeded is
   thrown away and the next one is tried. The algorithm effectively includes
   the minimum number of items possible in its search for optimal capacity
   utilization. For large lists of mixed-size items, it's not unusual to see
   the algorithm achieve 100% capacity utilization by including fewer than 1%
   of the items. Probably because it often has to look at fewer of the items
   before completing, it tends to be a little faster than the worst-fit or
   alternate-fit algorithms.

   The "size" values in the items and capacity arguments must be comparable,
   but they are unitless from the perspective of this function. Zero-sized
   items and capacity are considered degenerate cases. If capacity is zero,
   no items fit, period, even if the items list contains zero-sized items.

   The dictionary is indexed by its key, and then includes its key. This
   seems kind of strange on first glance. It works this way to facilitate
   easy sorting of the list on key if needed.

   The function assumes that the list of items may be used destructively, if
   needed. This avoids the overhead of having the function make a copy of the
   list, if this is not required. Callers should pass C{items.copy()} if they
   do not want their version of the list modified.

   The function returns a list of chosen items and the unitless amount of
   capacity used by the items.

   @param items: Items to operate on
   @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer

   @param capacity: Capacity of container to fit to
   @type capacity: integer

   @returns: Tuple C{(items, used)} as described above
   """

   # Use dict since insert into dict is faster than list append
   included = { }

   # Sort the list from largest to smallest
   itemlist = items.items()
   itemlist.sort(lambda x,y: cmp(y[1][1], x[1][1]))  # sort descending
   keys = []
   for item in itemlist:
      keys.append(item[0])

   # Search the list
   used = 0
   remaining = capacity
   for key in keys:
      if remaining == 0:
         break
      if remaining - items[key][1] >= 0:
         included[key] = None
         used += items[key][1]
         remaining -= items[key][1]

   # Return the results
   return (included.keys(), used)
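
The comparator-based sort in `bestFit` relies on the Python 2 built-in `cmp()` and on passing a comparison function to `list.sort()`, neither of which exists in Python 3. The sketch below is one possible way to get the same largest-to-smallest ordering under Python 3; the local `cmp` stand-in and the sample data are assumptions for illustration, not part of the module.

```python
from functools import cmp_to_key


def cmp(a, b):
    """Stand-in for the Python 2 built-in cmp(), which Python 3 removed."""
    return (a > b) - (a < b)


# Illustrative data in the documented shape: key -> (item, size).
items = {
    "a.tar": ("a.tar", 40),
    "b.tar": ("b.tar", 70),
    "c.tar": ("c.tar", 10),
    "d.tar": ("d.tar", 30),
    "e.tar": ("e.tar", 20),
}

# Comparator form, mirroring the sort call in the listing.
by_cmp = sorted(items.items(), key=cmp_to_key(lambda x, y: cmp(y[1][1], x[1][1])))

# Key-function form, which needs no comparator at all.
by_key = sorted(items.items(), key=lambda kv: kv[1][1], reverse=True)

keys_cmp = [key for key, _ in by_cmp]
keys_key = [key for key, _ in by_key]
print(keys_key)  # ['b.tar', 'a.tar', 'd.tar', 'e.tar', 'c.tar']

# Best-fit walk over the descending keys with a capacity of 100.
chosen, used, remaining = [], 0, 100
for key in keys_key:
    if remaining == 0:
        break
    size = items[key][1]
    if size <= remaining:
        chosen.append(key)
        used += size
        remaining -= size
print(chosen, used)  # ['b.tar', 'd.tar'] 100
```

Both sort forms are stable, so items of equal size keep their original relative order in either case.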


######################
# worstFit() function
######################

def worstFit(items, capacity):

   """
   Implements the worst-fit knapsack algorithm.

   The worst-fit algorithm proceeds through a sorted list of items (sorted
   from smallest to largest) until running out of items or meeting capacity
   exactly. If capacity is exceeded, the item that caused capacity to be
   exceeded is thrown away and the next one is tried. The algorithm
   effectively includes the maximum number of items possible in its search for
   optimal capacity utilization. It tends to be somewhat slower than either
   the best-fit or alternate-fit algorithm, probably because on average it has
   to look at more items before completing.

   The "size" values in the items and capacity arguments must be comparable,
   but they are unitless from the perspective of this function. Zero-sized
   items and capacity are considered degenerate cases. If capacity is zero,
   no items fit, period, even if the items list contains zero-sized items.

   The dictionary is indexed by its key, and then includes its key. This
   seems kind of strange on first glance. It works this way to facilitate
   easy sorting of the list on key if needed.

   The function assumes that the list of items may be used destructively, if
   needed. This avoids the overhead of having the function make a copy of the
   list, if this is not required. Callers should pass C{items.copy()} if they
   do not want their version of the list modified.

   The function returns a list of chosen items and the unitless amount of
   capacity used by the items.

   @param items: Items to operate on
   @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer

   @param capacity: Capacity of container to fit to
   @type capacity: integer

   @returns: Tuple C{(items, used)} as described above
   """

   # Use dict since insert into dict is faster than list append
   included = { }

   # Sort the list from smallest to largest
   itemlist = items.items()
   itemlist.sort(lambda x,y: cmp(x[1][1], y[1][1]))  # sort ascending
   keys = []
   for item in itemlist:
      keys.append(item[0])

   # Search the list
   used = 0
   remaining = capacity
   for key in keys:
      if remaining == 0:
         break
      if remaining - items[key][1] >= 0:
         included[key] = None
         used += items[key][1]
         remaining -= items[key][1]

   # Return results
   return (included.keys(), used)
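
The docstrings state that best-fit tends to include the minimum number of items and worst-fit the maximum. A rough way to see this is to run the same selection loop over one input in both sort orders. The sketch below assumes Python 3; `greedy_fill` is a hypothetical helper that restates the shared loop, and the sample data is made up for illustration.

```python
def greedy_fill(keys, items, capacity):
    """Apply the skip-if-too-big loop to keys in the given order (hypothetical helper)."""
    included, used, remaining = [], 0, capacity
    for key in keys:
        if remaining == 0:
            break
        size = items[key][1]
        if size <= remaining:
            included.append(key)
            used += size
            remaining -= size
    return included, used


# Illustrative data in the documented shape: key -> (item, size).
items = {
    "a.tar": ("a.tar", 40),
    "b.tar": ("b.tar", 70),
    "c.tar": ("c.tar", 10),
    "d.tar": ("d.tar", 30),
    "e.tar": ("e.tar", 20),
}

ascending = sorted(items, key=lambda k: items[k][1])                 # worst-fit order
descending = sorted(items, key=lambda k: items[k][1], reverse=True)  # best-fit order

worst = greedy_fill(ascending, items, 100)
best = greedy_fill(descending, items, 100)
print(worst)  # (['c.tar', 'e.tar', 'd.tar', 'a.tar'], 100)
print(best)   # (['b.tar', 'd.tar'], 100)
```

On this input both orders fill the container completely, but the ascending walk packs four items while the descending walk packs two.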


##########################
# alternateFit() function
##########################

def alternateFit(items, capacity):

   """
   Implements the alternate-fit knapsack algorithm.

   This algorithm (which I'm calling "alternate-fit" as in "alternate from one
   to the other") tries to balance small and large items to achieve better
   end-of-disk performance. Instead of just working one direction through a
   list, it alternately works from the start and end of a sorted list (sorted
   from smallest to largest), throwing away any item which causes capacity to
   be exceeded. The algorithm tends to be slower than the best-fit and
   first-fit algorithms, and slightly faster than the worst-fit algorithm,
   probably because of the number of items it considers on average before
   completing. It often achieves slightly better capacity utilization than the
   worst-fit algorithm, while including slightly fewer items.

   The "size" values in the items and capacity arguments must be comparable,
   but they are unitless from the perspective of this function. Zero-sized
   items and capacity are considered degenerate cases. If capacity is zero,
   no items fit, period, even if the items list contains zero-sized items.

   The dictionary is indexed by its key, and then includes its key. This
   seems kind of strange on first glance. It works this way to facilitate
   easy sorting of the list on key if needed.

   The function assumes that the list of items may be used destructively, if
   needed. This avoids the overhead of having the function make a copy of the
   list, if this is not required. Callers should pass C{items.copy()} if they
   do not want their version of the list modified.

   The function returns a list of chosen items and the unitless amount of
   capacity used by the items.

   @param items: Items to operate on
   @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer

   @param capacity: Capacity of container to fit to
   @type capacity: integer

   @returns: Tuple C{(items, used)} as described above
   """

   # Use dict since insert into dict is faster than list append
   included = { }

   # Sort the list from smallest to largest
   itemlist = items.items()
   itemlist.sort(lambda x,y: cmp(x[1][1], y[1][1]))  # sort ascending
   keys = []
   for item in itemlist:
      keys.append(item[0])

   # Search the list
   used = 0
   remaining = capacity

   front = keys[0:len(keys)/2]
   back = keys[len(keys)/2:len(keys)]
   back.reverse()

   i = 0
   j = 0

   while remaining > 0 and (i < len(front) or j < len(back)):
      if i < len(front):
         if remaining - items[front[i]][1] >= 0:
            included[front[i]] = None
            used += items[front[i]][1]
            remaining -= items[front[i]][1]
         i += 1
      if j < len(back):
         if remaining - items[back[j]][1] >= 0:
            included[back[j]] = None
            used += items[back[j]][1]
            remaining -= items[back[j]][1]
         j += 1

   # Return results
   return (included.keys(), used)
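
The alternating walk is easier to follow with a concrete trace. The sketch below is a hedged Python 3 restatement of the loop above, not the module's code: `alternate_fit_sketch` is a hypothetical name, the sample data is invented, and floor division (`//`) is used because plain `/` yields a float in Python 3 and cannot be used as a slice index.

```python
def alternate_fit_sketch(items, capacity):
    """Python 3 sketch of the alternate-fit walk (hypothetical helper)."""
    keys = sorted(items, key=lambda k: items[k][1])  # smallest to largest
    half = len(keys) // 2
    front = keys[:half]        # smaller half, walked smallest-first
    back = keys[half:][::-1]   # larger half, walked largest-first
    included, used, remaining = [], 0, capacity
    i = j = 0
    while remaining > 0 and (i < len(front) or j < len(back)):
        if i < len(front):
            size = items[front[i]][1]
            if size <= remaining:
                included.append(front[i])
                used += size
                remaining -= size
            i += 1  # advance whether or not the item fit
        if j < len(back):
            size = items[back[j]][1]
            if size <= remaining:
                included.append(back[j])
                used += size
                remaining -= size
            j += 1  # advance whether or not the item fit
    return included, used


# Illustrative data in the documented shape: key -> (item, size).
items = {
    "a.tar": ("a.tar", 40),
    "b.tar": ("b.tar", 70),
    "c.tar": ("c.tar", 10),
    "d.tar": ("d.tar", 30),
    "e.tar": ("e.tar", 20),
}
print(alternate_fit_sketch(items, 100))  # (['c.tar', 'b.tar', 'e.tar'], 100)
```

With this input the walk takes the smallest item (10), then the largest (70), then the next smallest (20), at which point capacity is met exactly and the loop stops.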