0xzhang的博客

广泛应用的算法

· 0xzhang

ds.algorithms - Core algorithms deployed - Theoretical Computer Science Stack Exchange

Q: Manu

A: Vijay D, Huck Bennett

为了向其他领域的人展示算法的重要性,用一个列表记录已经广泛地部署在商业、政府的软硬件设施上的算法。

Preface§

Algorithms that are the main driver behind a system are, in my opinion, easier to find in non-algorithms courses for the same reason theorems with immediate applications are easier to find in applied mathematics rather than pure mathematics courses. It is rare for a practical problem to have the exact structure of the abstract problem in a lecture. To be argumentative, I see no reason why fashionable algorithms course material such as Strassen's multiplication, the AKS primality test, or the Moser-Tardos algorithm is relevant for low-level practical problems of implementing a video database, an optimizing compiler, an operating system, a network congestion control system or any other system. The value of these courses is learning that there are intricate ways to exploit the structure of a problem to find efficient solutions. Advanced algorithms is also where one meets simple algorithms whose analysis is non-trivial. For this reason, I would not dismiss simple randomized algorithms or PageRank.

I think you can choose any large piece of software and find basic and advanced algorithms implemented in it. As a case study, I've done this for the Linux kernel, and shown a few examples from Chromium.

Linux kernel§

  1. Linked list, doubly linked list, lock-free linked list.
  2. B+ Trees with comments telling you what you can't find in the textbooks.
  3. Priority sorted lists used for mutexes, drivers, etc.
  4. Red-Black trees are used for scheduling, virtual memory management, to track file descriptors and directory entries,etc.
  5. Interval trees
  6. Radix trees, are used for memory management, NFS related lookups and networking related functionality.
  7. Priority heap, which is literally, a textbook implementation, used in the control group system.
  8. Hash functions, with a reference to Knuth and to a paper.
  9. Some parts of the code, such as this driver, implement their own hash function.
  10. Hash tables used to implement inodes, file system integrity checks etc.
  11. Bit arrays, which are used for dealing with flags, interrupts, etc. and are featured in Knuth Vol. 4.
  12. Semaphores and spin locks
  13. Binary search is used for interrupt handling, register cache lookup, etc.
  14. Binary search with B-trees
  15. Depth first search and variant used in directory configuration.
  16. Breadth first search is used to check correctness of locking at runtime.
  17. Merge sort on linked lists is used for garbage collection, file system management, etc.
  18. Bubble sort is amazingly implemented too, in a driver library.
  19. Knuth-Morris-Pratt string matching,
  20. Boyer-Moore pattern matching with references and recommendations for when to prefer the alternative.

Chromium Web Browser§

  1. Splay trees.
  2. Voronoi diagrams are used in a demo.
  3. Tabbing based on Bresenham's algorithm.
  4. Binary trees
  5. Red-Black trees
  6. AVL trees
  7. Rabin-Karp string matching is used for compression.
  8. Compute the suffixes of an automaton.
  9. Bloom filter implemented by Apple Inc.
  10. Bresenham's algorithm.

Programming Language Libraries§

  1. The C++ STL includes lists, stacks, queues, maps, vectors, and algorithms for sorting, searching and heap manipulation.
  2. The Java API is very extensive and covers much more.
  3. The Boost C++ library includes algorithms like Boyer-Moore and Knuth-Morris-Pratt string matching algorithms.

Allocation and Scheduling Algorithms§

  1. Least Recently Used can be implemented in multiple ways. A list-based implementation in the Linux kernel.
  2. Other possibilities are First In First Out, Least Frequently Used, and Round Robin.
  3. A variant of FIFO was used by the VAX/VMS system.
  4. The Clock algorithm by Richard Carr is used for page frame replacement in Linux.
  5. The Intel i860 processor used a random replacement policy.
  6. Adaptive Replacement Cache is used in some IBM storage controllers, and was used in PostgreSQL though only briefly due to patent concerns.
  7. The Buddy memory allocation algorithm, which is discussed by Knuth in TAOCP Vol. 1 is used in the Linux kernel, and the jemalloc concurrent allocator used by FreeBSD and in facebook.

Core utils in *nix systems§

  1. grep and awk both implement the Thompson-McNaughton-Yamada construction of NFAs from regular expressions, which apparently even beats the Perl implementation.
  2. tsort implements topological sort.
  3. fgrep implements the Aho-Corasick string matching algorithm.
  4. GNU grep, implements the Boyer-Moore algorithm according to the author Mike Haertel.
  5. crypt(1) on Unix implemented a variant of the encryption algorithm in the Enigma machine.
  6. Unix diff implemented by Doug McIllroy, based on a prototype co-written with James Hunt, performs better than the standard dynamic programming algorithm used to compute Levenshtein distances. The Linux version computes the shortest edit distance.

Cryptographic Algorithms§

  1. Merkle trees, specifically the Tiger Tree Hash variant, were used in peer-to-peer applications such as GTK Gnutella and LimeWire.
  2. MD5 is used to provide a checksum for software packages and is used for integrity checks on *nix systems (Linux implementation) and is also supported on Windows and OS X.
  3. OpenSSL implements many cryptographic algorithms including AES, Blowfish, DES, SHA-1, SHA-2, RSA, DES, etc.

Compilers§

  1. LALR parsing is implemented by yacc and bison.
  2. Dominator algorithms are used in most optimizing compilers based on SSA form.
  3. lex and flex compile regular expressions into NFAs.

Compression and Image Processing§

  1. The Lempel-Ziv algorithms for the GIF image format are implemented in image manipulation programs, starting from the *nix utility convert to complex programs.
  2. Run length encoding is used to generate PCX files (used by the original Paintbrush program), compressed BMP files and TIFF files.
  3. Wavelet compression is the basis for JPEG 2000 so all digital cameras that produce JPEG 2000 files will be implementing this algorithm.
  4. Reed-Solomon error correction is implemented in the Linux kernel, CD drives, barcode readers and was combined with convolution for image transmission from Voyager.

PageRank§

PageRank is one of the best-known such algorithms. Developed by Google co-founder Larry Page and co-authors, it formed the basis of Google's original search engine and is widely credited with helping them to achieve better search results than their competitors at the time.