Is there a CLI parameter in Graphviz to replace all labels with meaningless strings when processing a DOT file for analysis?

I’d like to seek help regarding an issue with one of my graphs on a forum or similar platform, but I prefer not to share the actual content directly. I recall that Graphviz might have a feature that replaces all text labels with meaningless strings, allowing the processed graph to be shared safely for analysis. However, I can’t seem to locate the specific CLI option for this. Could anyone confirm whether such a feature exists and how to use it?

If your output format does not preserve the actual text, you can get by with using a font like Dummy Text Font (dafont.com).

I’ve looked for a solution to anonymization myself, and would be happy to learn that I’ve missed a CLI option like this, but I’m not aware of it. For cases where the output format preserves text, I ended up building the text replacement right into my application that generates the DOT code.

[Anonymizing is complex. This will be more detail than you want]

  • Sorry, there is no CLI parameter or single object-level attribute to anonymize a graph
  • Two basic versions of the problem:
    • When only the resulting diagram is to be shared
      • Label/text changes
        • node
        • edge (label, xlabel, headlabel, taillabel)
        • graphs (subgraphs, clusters, root)
        • there are many, less common, attributes (see this)
      • NOTE 1: because they can be “decompiled”, SVG, PostScript, and PDF output should be treated as input files, not result/image files. See below.
      • NOTE 2: escape-strings should be handled carefully (i.e. saving the escapes)
      • NOTE 3: record nodes and html nodes are quite complex
    • When the input file is to be shared (e.g., when creating a bug report)
      • All the above
      • Name changes
        • nodes
        • edges
        • graphs (subgraphs, clusters, root)
      • comments
      • extra (non-standard) attributes

The right way to anonymize is to incorporate the feature in the Graphviz code.

I have a (lightly tested) pre-processor that does much of the above, except for record and HTML nodes. I’ll post it if anyone is interested.

I’ve done some thinking about this in the past, but have been unable to convince myself this can be done robustly. Because, as Steve says, label text affects graph layout, even if you could remove or hide the original text content, a motivated third party could likely reverse engineer your redacted content from the layout.

As long as the characters themselves are hidden, at most the length of the text can be recovered. I think this is enough.

Depending on the renderer, you may be able to infer character height as well.

If your goal is simply to replace text with blank space, my intuition is that this is most easily done by post-processing an SVG. You can probably take an SVG from the core plugin (-Tsvg) and just delete any <text> element. With output from the Pango plugin (-Tsvg:cairo) things will be more complicated, as text will have become paths.
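A minimal sketch of that SVG post-processing in Python, using only the standard library. The function name strip_svg_text is made up for illustration. Removing <title> elements as well is my own assumption (the core SVG output also carries node and edge names there); attributes such as tooltips or links are not touched, so check the result before sharing.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def strip_svg_text(svg_source: str) -> str:
    """Return the SVG with every <text> and <title> element removed."""
    # Keep the output free of ns0: prefixes when serializing.
    ET.register_namespace("", SVG_NS)
    ET.register_namespace("xlink", "http://www.w3.org/1999/xlink")

    # The default parser also discards XML comments.
    root = ET.fromstring(svg_source)

    removable = {f"{{{SVG_NS}}}text", f"{{{SVG_NS}}}title"}
    # Snapshot the tree first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in removable:
                parent.remove(child)

    return ET.tostring(root, encoding="unicode")
```

Shapes and edge paths are left exactly where the layout put them, so the size of each node still hints at how long its label was.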

Alternatively you could set a transparent font color and render to a bitmap format.
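A rough sketch of the transparent-font idea, driven from Python. It assumes dot is on your PATH and relies on the standard -G, -N and -E options to set default graph, node and edge attributes; the helper names are made up. Any fontcolor set explicitly inside the DOT file will override these defaults, so this only works as-is for files that don't set their own font colors.

```python
import subprocess


def transparent_text_command(dot_path: str, png_path: str) -> list:
    """Build a dot command line that defaults all font colors to transparent."""
    return [
        "dot",
        "-Gfontcolor=transparent",  # graph and cluster labels
        "-Nfontcolor=transparent",  # node labels
        "-Efontcolor=transparent",  # edge labels
        "-Tpng",
        dot_path,
        "-o",
        png_path,
    ]


def render_without_visible_text(dot_path: str, png_path: str) -> None:
    """Run dot; raises CalledProcessError if rendering fails."""
    subprocess.run(transparent_text_command(dot_path, png_path), check=True)
```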

Or you could write a script to replace text in the input file with arbitrary similar length content and hope this doesn’t affect the layout in significant ways.
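Something along these lines could serve as a starting point for such a script. It is only a sketch under several assumptions: it handles double-quoted label, xlabel, headlabel and taillabel values only, it does not understand record or HTML labels, and it leaves node names, comments and other attributes untouched. The function names are made up. Backslash escapes such as \n are kept so line breaks stay where they were.

```python
import re

# Double-quoted values of the common label attributes.
LABEL_ATTR = re.compile(
    r'\b(label|xlabel|headlabel|taillabel)(\s*=\s*)"((?:[^"\\]|\\.)*)"',
    re.DOTALL,
)
# A token is either a two-character backslash escape or a single character.
TOKEN = re.compile(r"\\.|.", re.DOTALL)


def mask_text(text: str) -> str:
    """Replace every visible character with 'x'; keep escapes and whitespace."""
    def mask(match):
        token = match.group(0)
        if len(token) == 2 or token.isspace():
            return token
        return "x"

    return TOKEN.sub(mask, text)


def anonymize_labels(dot_source: str) -> str:
    """Mask the text of quoted label attributes in DOT source."""
    def replace(match):
        name, equals, value = match.groups()
        return f'{name}{equals}"{mask_text(value)}"'

    return LABEL_ATTR.sub(replace, dot_source)
```

Because the replacement has the same character count but different glyph widths, node sizes can still shift slightly, which is the "hope this doesn't affect the layout" part.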

I am not sure how we would implement this inside Graphviz. As I said, all the ways I can imagine doing this leak information about your original text. As you point out, not a lot of information. But in the past, some other people requesting this feature have wanted strong privacy guarantees. I’m not comfortable including a feature that implies your original text is unrecoverable unless we’re really sure that is the case.

Good comments. Agree with @smattr. In answer to the immediate question, I don’t think there are any tools or features in Graphviz itself to anonymize graph content.

If the goal is to anonymize the graphical output of Graphviz, agree with the suggestion to perform this downstream by operating on the SVG. There must be a generic approach to this.

If the goal is to anonymize the graph file itself by modifying the text, agree this seems difficult to get 100% right (due to complexities of text rendering with kerning, ligatures, etc.), but ChatGPT suggests a couple of things. It claims that Graphviz text label height is not content-dependent (I think that’s true), and it suggests approximating width closely using a small set of glyphs (H, W, i, etc.) to generate a substitute label. It offers some pseudocode in Python.
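For what it's worth, here is my own rough sketch of that width-matching idea (not the ChatGPT pseudocode). It assumes you can supply a char_width function backed by real metrics for the font Graphviz will use; the function name, the greedy strategy and the palette default are all illustrative choices, and kerning and ligatures are ignored.

```python
def substitute_label(text, char_width, palette=("W", "H", "i")):
    """Build a filler string whose estimated width is close to that of text.

    char_width maps one character to its advance width in integer units.
    The result never exceeds the target width and falls short of it by
    less than the narrowest usable palette glyph.
    """
    target = sum(char_width(ch) for ch in text)

    # Widest glyphs first, so narrow ones only fill the remainder.
    glyphs = sorted(palette, key=char_width, reverse=True)

    pieces = []
    remaining = target
    for glyph in glyphs:
        width = char_width(glyph)
        if width <= 0:
            continue  # skip glyphs with no usable width
        count = remaining // width
        pieces.append(glyph * count)
        remaining -= count * width
    return "".join(pieces)
```

The substitute reveals roughly the width of the original text and nothing about its characters, which matches the level of leakage discussed earlier in this thread.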

A transformer like this seems like a reasonable exercise to write with AI generated code.