docs(dev-infra): document limitation in ts-circular-deps tool (#36659)

Adds documentation on discovered limitations in the ts-circular-deps
tool, so that we can reference it when needed.

PR Close #36659
Paul Gschwendtner 2020-04-16 23:37:03 +02:00 committed by Matias Niemelä
parent 81d23b33ef
commit 37bfb14956
2 changed files with 82 additions and 0 deletions

@ -0,0 +1,82 @@
### ts-circular-dependencies
This tool requires a test configuration that declares the set of source files which
should be checked for cyclic dependencies. For example:
```
yarn ts-circular-deps --config ./test-config.js <check|approve>
```
### Limitations
In order to detect cycles, the tool currently visits each source file and runs a
depth-first search (DFS). If the DFS comes across any node that is part of the current
DFS path, then a cycle has been detected and the tool will capture it.
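The traversal described above can be sketched roughly as follows. This is a minimal illustration and not the tool's actual source: the `Graph` type, the `findCycles` function and the file names `a`, `b`, `c` are hypothetical, and imports are assumed to be available as adjacency lists in import order.

```typescript
// Hypothetical representation: file name -> imported files, in import order.
type Graph = Map<string, string[]>;

function findCycles(graph: Graph): string[][] {
  const visited = new Set<string>();
  const cycles: string[][] = [];

  const visit = (node: string, path: string[]): void => {
    const index = path.indexOf(node);
    if (index !== -1) {
      // The node is on the current DFS path: everything from that point on is a cycle.
      cycles.push(path.slice(index));
      return;
    }
    // Already explored through another path: skip it without re-checking its neighbours.
    if (visited.has(node)) return;
    visited.add(node);
    for (const next of graph.get(node) ?? []) {
      visit(next, [...path, node]);
    }
  };

  // Start a DFS from every source file so that disconnected parts are covered too.
  Array.from(graph.keys()).forEach((node) => visit(node, []));
  return cycles;
}

const graph: Graph = new Map([
  ['a', ['b']],
  ['b', ['c']],
  ['c', ['a']],
]);
console.log(JSON.stringify(findCycles(graph))); // [["a","b","c"]]
```

The shared `visited` set is what keeps the running time linear in the size of the graph, and it is also the source of the limitation discussed next.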
This algorithm has limitations. For example, consider the following graph:
![Example graph](./example-graph.png)
Depending on which source file is considered first, the output of the circular dependency tool
will be different. This is because the tool does not recursively find _all_ possible cycles. This
would be too inefficient for large graphs (especially in the `angular/angular` repository).
In this concrete example, the tool will visit `r3_test_bed` first. Then the first neighbour
(based on the import in the source file) will be visited. This is `test_bed`. Once done, the
tool will visit the first neighbour of `test_bed`. This is `r3_test_bed` again. The node has
already been visited and is also part of the current DFS path. The tool captures this as a cycle.
As no more nodes can be visited within that path, the tool continues (as per the DFS algorithm)
with visiting the remaining neighbours of `r3_test_bed`. It will visit `test_bed_common` and
then come across `test_bed`. The tool only knows that `test_bed` has already been visited, but
it does not know that it would close a cycle. The tool certainly could know this by recursively
checking neighbours of `test_bed` again, but this is inefficient and will cause the algorithm
to eventually degenerate into brute-force.
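The walk-through above, and its dependence on the starting file, can be reproduced with a small sketch. It is an illustration under assumptions rather than the tool's code: `findCycles` is a hypothetical simplified DFS, and the edge list only contains the edges implied by the walk-through (the actual figure may contain more).

```typescript
type Graph = Map<string, string[]>;

// Simplified DFS with a shared "visited" set (hypothetical, not the tool's source).
function findCycles(graph: Graph): string[][] {
  const visited = new Set<string>();
  const cycles: string[][] = [];
  const visit = (node: string, path: string[]): void => {
    const index = path.indexOf(node);
    if (index !== -1) {
      cycles.push(path.slice(index)); // node closes a cycle on the current path
      return;
    }
    if (visited.has(node)) return; // seen before: neighbours are not re-checked
    visited.add(node);
    for (const next of graph.get(node) ?? []) visit(next, [...path, node]);
  };
  Array.from(graph.keys()).forEach((node) => visit(node, []));
  return cycles;
}

// Edges as implied by the walk-through, listed in import order.
const edges: [string, string[]][] = [
  ['r3_test_bed', ['test_bed', 'test_bed_common']],
  ['test_bed', ['r3_test_bed']],
  ['test_bed_common', ['test_bed']],
];

// Starting at r3_test_bed: only the two-file cycle is captured.
console.log(JSON.stringify(findCycles(new Map(edges))));
// [["r3_test_bed","test_bed"]]

// Starting at test_bed_common instead: both cycles are captured.
const reordered = [edges[2], edges[0], edges[1]];
console.log(JSON.stringify(findCycles(new Map(reordered))));
// [["test_bed","r3_test_bed"],["test_bed_common","test_bed","r3_test_bed"]]
```

In the first run, `test_bed` is already marked as visited when it is reached through `test_bed_common`, so the longer cycle goes unreported.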
In summary, the tool is unable to capture _all_ elementary cycles in the graph. This does not
mean, though, that the tool incorrectly suggests that there are _no_ cycles in a graph. The
tool is still able to correctly detect whether there are _any_ cycles in a graph or not. For
example, if the edge from `r3_test_bed` to `test_bed` is removed, then the tool will be able to
capture at least one of the other cycles. The golden will change in an unexpected way, but this is
**expected** given the trade-off we take for an acceptable running time.
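The edge-removal scenario can be illustrated with the same kind of sketch. As before, `findCycles` is a hypothetical simplified DFS rather than the tool's implementation, and the two graphs only contain the edges implied by the text, before and after the edge from `r3_test_bed` to `test_bed` is removed.

```typescript
type Graph = Map<string, string[]>;

// Simplified DFS with a shared "visited" set (hypothetical, not the tool's source).
function findCycles(graph: Graph): string[][] {
  const visited = new Set<string>();
  const cycles: string[][] = [];
  const visit = (node: string, path: string[]): void => {
    const index = path.indexOf(node);
    if (index !== -1) {
      cycles.push(path.slice(index));
      return;
    }
    if (visited.has(node)) return;
    visited.add(node);
    for (const next of graph.get(node) ?? []) visit(next, [...path, node]);
  };
  Array.from(graph.keys()).forEach((node) => visit(node, []));
  return cycles;
}

const before: Graph = new Map([
  ['r3_test_bed', ['test_bed', 'test_bed_common']],
  ['test_bed', ['r3_test_bed']],
  ['test_bed_common', ['test_bed']],
]);

// Same graph without the edge r3_test_bed -> test_bed.
const after: Graph = new Map([
  ['r3_test_bed', ['test_bed_common']],
  ['test_bed', ['r3_test_bed']],
  ['test_bed_common', ['test_bed']],
]);

console.log(JSON.stringify(findCycles(before)));
// [["r3_test_bed","test_bed"]]
console.log(JSON.stringify(findCycles(after)));
// [["r3_test_bed","test_bed_common","test_bed"]]
```

Removing one import makes a previously unreported cycle appear in the output, which is the kind of surprising golden change described above. In both cases at least one cycle is reported, so the presence of cycles is never hidden.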
Other algorithms exist which are proven to print out _all_ the elementary cycles in a directed
graph. For example:
* [Johnson's algorithm for finding simple cycles][johnson-cycles].
* [Tarjan's algorithm for enumerating elementary circuits][tarjan-cycles].
Experiments with these algorithms showed that the source file graphs we typically have in Angular
repositories are too large to be processed in acceptable time. At the time of writing, the
source file graph of `angular/angular` consists of 3350 nodes and 8730 edges.
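To make "all elementary cycles" concrete, the following sketch enumerates them with a naive backtracking search. It is neither Johnson's nor Tarjan's algorithm, only a simple exhaustive search whose running time can grow exponentially. The name `enumerateElementaryCycles` and the sample edges (taken from the walk-through above, possibly incomplete compared to the figure) are assumptions made for illustration.

```typescript
type Graph = Map<string, string[]>;

// Naive exhaustive enumeration (hypothetical helper). Each elementary cycle is
// reported exactly once, starting at its first vertex in insertion order.
function enumerateElementaryCycles(graph: Graph): string[][] {
  const order = Array.from(graph.keys());
  const rank = new Map<string, number>();
  order.forEach((name, i) => rank.set(name, i));
  const cycles: string[][] = [];

  order.forEach((start, startRank) => {
    const path: string[] = [];
    const onPath = new Set<string>();

    const extend = (node: string): void => {
      path.push(node);
      onPath.add(node);
      for (const next of graph.get(node) ?? []) {
        const nextRank = rank.get(next);
        // Ignore unknown files and vertices ordered before `start`: cycles
        // through those were already reported from an earlier start vertex.
        if (nextRank === undefined || nextRank < startRank) continue;
        if (next === start) {
          cycles.push([...path]); // the current simple path closes a cycle
        } else if (!onPath.has(next)) {
          extend(next);
        }
      }
      // Backtrack so the vertex can be reused on other simple paths.
      path.pop();
      onPath.delete(node);
    };

    extend(start);
  });
  return cycles;
}

const sample: Graph = new Map([
  ['r3_test_bed', ['test_bed', 'test_bed_common']],
  ['test_bed', ['r3_test_bed']],
  ['test_bed_common', ['test_bed']],
]);
console.log(JSON.stringify(enumerateElementaryCycles(sample)));
// [["r3_test_bed","test_bed"],["r3_test_bed","test_bed_common","test_bed"]]
```

Unlike the DFS with a shared visited set, this search revisits vertices on every simple path, which is what makes complete enumeration expensive on large graphs.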
Algorithms like the one from Donald B. Johnson, which first split the graph into strongly
connected components and then search for elementary cycles in all components with at least
two vertices, are too inefficient for the source file graphs we have. The time complexity of
such algorithms is described as `O((n + e)(c + 1))`, where `n` is the number of vertices, `e`
the number of edges and `c` the number of elementary circuits. Donald B. Johnson describes the
number of elementary circuits as follows:
> Thus the number of elementary circuits in a directed graph can grow faster with n than
> the exponential 2^n.
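
The growth mentioned in the quote can be checked for a worst case with a short calculation. The sketch below counts the elementary circuits of length two or more in a complete directed graph (every ordered pair of distinct vertices is connected) using a standard counting argument: choose `k` vertices, then arrange them in one of `(k - 1)!` cyclic orders. The function name `circuitsInCompleteDigraph` is hypothetical, and real source file graphs are far sparser than this worst case.

```typescript
// Counts elementary circuits (length >= 2) in a complete directed graph with
// n vertices: the sum over k = 2..n of C(n, k) * (k - 1)!.
function circuitsInCompleteDigraph(n: number): number {
  let total = 0;
  let binomial = n; // C(n, 1)
  let cyclicOrders = 1; // (k - 1)! for k = 1
  for (let k = 2; k <= n; k++) {
    binomial = (binomial * (n - k + 1)) / k; // C(n, k) derived from C(n, k - 1)
    cyclicOrders *= k - 1; // (k - 1)!
    total += binomial * cyclicOrders;
  }
  return total;
}

for (const n of [3, 4, 5, 10]) {
  console.log(`n=${n}: circuits=${circuitsInCompleteDigraph(n)}, 2^n=${2 ** n}`);
}
// n=3: circuits=5, 2^n=8
// n=4: circuits=20, 2^n=16
// n=5: circuits=84, 2^n=32
// n=10: circuits=1112073, 2^n=1024
```

Already for ten vertices the number of circuits in this worst case is roughly a thousand times larger than `2^n`, and `c` enters the `O((n + e)(c + 1))` bound as a factor.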
This shows quite well that these algorithms quickly become inefficient the more vertices, edges
and simple cycles a graph has. Finding elementary cycles of arbitrary length appears to be
NP-complete, since finding a Hamiltonian cycle (a cycle of length `n`) is NP-complete as well.
Below is a quote from a [paper describing a randomized algorithm][np-complete-cycles] for finding
simple cycles of a _fixed_ length that seems to confirm this hypothesis:
> It is well known that finding the longest cycle in a graph is a hard problem, since finding
a hamiltonian cycle is NP-complete. Hence finding a simple cycle of length k, for an arbitrary
k, is NP-complete.
Other tools like `madge` or `dpdm` have the same limitations.
**Resources**:
* [Finding all the elementary circuits of a directed graph - Donald B. Johnson][johnson-cycles]
* [Enumeration of the elementary circuits of a directed graph - Robert Tarjan][tarjan-cycles]
* [Once again: Finding simple cycles in graphs - Carsten Dorgerloh; Jürgen Wirtgen][np-complete-cycles]
[johnson-cycles]: https://www.cs.tufts.edu/comp/150GA/homeworks/hw1/Johnson%2075.PDF
[tarjan-cycles]: https://ecommons.cornell.edu/bitstream/handle/1813/5941/72-145.pdf?sequence=1&isAllowed=y
[np-complete-cycles]: https://pdfs.semanticscholar.org/16b2/d1a3cf4a8a5dbcad10bb901724631ebead33.pdf
