ds.algorithms - Core algorithms deployed - Theoretical Computer Science Stack Exchange
Q: Manu
A: Vijay D, Huck Bennett
为了向其他领域的人展示算法的重要性,用一个列表记录已经广泛地部署在商业、政府的软硬件设施上的算法。
Preface§
Algorithms that are the main driver behind a system are, in my opinion, easier to find in non-algorithms courses for the same reason theorems with immediate applications are easier to find in applied mathematics rather than pure mathematics courses. It is rare for a practical problem to have the exact structure of the abstract problem in a lecture. To be argumentative, I see no reason why fashionable algorithms course material such as Strassen's multiplication, the AKS primality test, or the Moser-Tardos algorithm is relevant for low-level practical problems of implementing a video database, an optimizing compiler, an operating system, a network congestion control system or any other system. The value of these courses is learning that there are intricate ways to exploit the structure of a problem to find efficient solutions. Advanced algorithms is also where one meets simple algorithms whose analysis is non-trivial. For this reason, I would not dismiss simple randomized algorithms or PageRank.
I think you can choose any large piece of software and find basic and advanced algorithms implemented in it. As a case study, I've done this for the Linux kernel, and shown a few examples from Chromium.
Linux kernel§
- Linked list, doubly linked list, lock-free linked list.
- B+ Trees with comments telling you what you can't find in the textbooks.
- Priority sorted lists used for mutexes, drivers, etc.
- Red-Black trees are used for scheduling, virtual memory management, to track file descriptors and directory entries,etc.
- Interval trees
- Radix trees, are used for memory management, NFS related lookups and networking related functionality.
- Priority heap, which is literally, a textbook implementation, used in the control group system.
- Hash functions, with a reference to Knuth and to a paper.
- Some parts of the code, such as this driver, implement their own hash function.
- Hash tables used to implement inodes, file system integrity checks etc.
- Bit arrays, which are used for dealing with flags, interrupts, etc. and are featured in Knuth Vol. 4.
- Semaphores and spin locks
- Binary search is used for interrupt handling, register cache lookup, etc.
- Binary search with B-trees
- Depth first search and variant used in directory configuration.
- Breadth first search is used to check correctness of locking at runtime.
- Merge sort on linked lists is used for garbage collection, file system management, etc.
- Bubble sort is amazingly implemented too, in a driver library.
- Knuth-Morris-Pratt string matching,
- Boyer-Moore pattern matching with references and recommendations for when to prefer the alternative.
Chromium Web Browser§
- Splay trees.
- Voronoi diagrams are used in a demo.
- Tabbing based on Bresenham's algorithm.
- Binary trees
- Red-Black trees
- AVL trees
- Rabin-Karp string matching is used for compression.
- Compute the suffixes of an automaton.
- Bloom filter implemented by Apple Inc.
- Bresenham's algorithm.
Programming Language Libraries§
- The C++ STL includes lists, stacks, queues, maps, vectors, and algorithms for sorting, searching and heap manipulation.
- The Java API is very extensive and covers much more.
- The Boost C++ library includes algorithms like Boyer-Moore and Knuth-Morris-Pratt string matching algorithms.
Allocation and Scheduling Algorithms§
- Least Recently Used can be implemented in multiple ways. A list-based implementation in the Linux kernel.
- Other possibilities are First In First Out, Least Frequently Used, and Round Robin.
- A variant of FIFO was used by the VAX/VMS system.
- The Clock algorithm by Richard Carr is used for page frame replacement in Linux.
- The Intel i860 processor used a random replacement policy.
- Adaptive Replacement Cache is used in some IBM storage controllers, and was used in PostgreSQL though only briefly due to patent concerns.
- The Buddy memory allocation algorithm, which is discussed by Knuth in TAOCP Vol. 1 is used in the Linux kernel, and the jemalloc concurrent allocator used by FreeBSD and in facebook.
Core utils in *nix systems§
- grep and awk both implement the Thompson-McNaughton-Yamada construction of NFAs from regular expressions, which apparently even beats the Perl implementation.
- tsort implements topological sort.
- fgrep implements the Aho-Corasick string matching algorithm.
- GNU grep, implements the Boyer-Moore algorithm according to the author Mike Haertel.
- crypt(1) on Unix implemented a variant of the encryption algorithm in the Enigma machine.
- Unix diff implemented by Doug McIllroy, based on a prototype co-written with James Hunt, performs better than the standard dynamic programming algorithm used to compute Levenshtein distances. The Linux version computes the shortest edit distance.
Cryptographic Algorithms§
- Merkle trees, specifically the Tiger Tree Hash variant, were used in peer-to-peer applications such as GTK Gnutella and LimeWire.
- MD5 is used to provide a checksum for software packages and is used for integrity checks on *nix systems (Linux implementation) and is also supported on Windows and OS X.
- OpenSSL implements many cryptographic algorithms including AES, Blowfish, DES, SHA-1, SHA-2, RSA, DES, etc.
Compilers§
- LALR parsing is implemented by yacc and bison.
- Dominator algorithms are used in most optimizing compilers based on SSA form.
- lex and flex compile regular expressions into NFAs.
Compression and Image Processing§
- The Lempel-Ziv algorithms for the GIF image format are implemented in image manipulation programs, starting from the *nix utility convert to complex programs.
- Run length encoding is used to generate PCX files (used by the original Paintbrush program), compressed BMP files and TIFF files.
- Wavelet compression is the basis for JPEG 2000 so all digital cameras that produce JPEG 2000 files will be implementing this algorithm.
- Reed-Solomon error correction is implemented in the Linux kernel, CD drives, barcode readers and was combined with convolution for image transmission from Voyager.
PageRank§
PageRank is one of the best-known such algorithms. Developed by Google co-founder Larry Page and co-authors, it formed the basis of Google's original search engine and is widely credited with helping them to achieve better search results than their competitors at the time.