How to enter certain UTF-8 characters

How can I get this character to display: U+1D404 (bold capital E) (see UTF-8 Character Set 1D400-1D4FF)?

Not sure I understand the question. Graphviz strings are UTF-8 internally, so can't you just put the character directly in your input file? 𝐄

Are you getting some error when you try this?

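You can specify a Unicode character numerically in plain ASCII, using either decimal or hexadecimal notation, but it needs to be inside an HTML-like label (delimited by < and >). For example, both labels below render U+1D400 (bold capital A); for bold capital E, use 1D404 in hex or 119812 in decimal: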

graph {
  vn [label=<&#x1D400;>]
  un [label=<&#119808;>]
}

The documentation mentions the decimal form but not the hex form. Also, I just noticed a problem in the expat library: it doesn’t accept a capital X in the hex form, only a lowercase x. The documentation should have a few more examples.

It looks like only a lowercase ‘x’ is allowed in numeric character references, so the Graphviz code that accepts ‘X’ will never be reached.

Can you elaborate? I assume you’re referring to lib/common/xml.c:29:

 17 /* return true if *s points to &[A-Za-z]+;      (e.g. &Ccedil; )
 18  *                          or &#[0-9]*;        (e.g. &#38; )
 19  *                          or &#x[0-9a-fA-F]*; (e.g. &#x6C34; )
 20  */
 21 static bool xml_isentity(const char *s)
 22 {
 23     s++;                        /* already known to be '&' */
 24     if (*s == ';') { // '&;' is not a valid entity
 25         return false;
 26     }
 27     if (*s == '#') {
 28         s++;
 29         if (*s == 'x' || *s == 'X') {

There are code paths into this XML processing from several places, and I don’t see any of them restricting the input to a lowercase “x”.

I’m also a little unclear about this:

I’m not too familiar with the expat codebase, but I think you’re referring to expat/lib/xmltok_impl.c:516 (as of commit 441f98d02deafd9b090aea568282b28f66a50e36):

 512 static int PTRCALL
 513 PREFIX(scanCharRef)(const ENCODING *enc, const char *ptr, const char *end,
 514                     const char **nextTokPtr) {
 515   if (HAS_CHAR(enc, ptr, end)) {
 516     if (CHAR_MATCHES(enc, ptr, ASCII_x))
 517       return PREFIX(scanHexCharRef)(enc, ptr + MINBPC(enc), end, nextTokPtr);

This comparison does indeed appear to bottom out on an exclusively lowercase “x”, regardless of encoding. But my reading of the XML 1.0 specification (its CharRef production is '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';') is that lowercase “x” is the only legal form, so I don’t think this is an expat problem. It seems to be Graphviz (and Chrome and Firefox, now that I check) being overly liberal.