Convertor PRO — Manual

Complete reference for Convertor PRO: every encoding field, every data-format mode, the libraries used, and all the limits of each conversion.

Live tool: www.jpkc.com/tools/convertor/

The tool's interface is in English, so field, button, and option names are given here exactly as they appear in the UI.

Convertor PRO is split into two tabs: Encoding (character and encoding conversion) and Data Formats (conversion between JSON, YAML, TOML, XML, and INI). Both run entirely in the browser.

Tab 1: Encoding

The Encoding tab uses Richard Ishida's conversion functions (rishida.net, GPL). The core principle matters: there is one central pivot — the actual characters — surrounded by several fields, each of which is a different textual representation of those characters.

How it works: Convert fills everything

Each field has its own Convert button and a copy button. When you type something into a field and click its Convert, two things happen: the content is first decoded to actual characters, and from those characters all other fields are recomputed. So if you type é into the Characters field and click Convert, the other fields automatically fill with the matching representations: %C3%A9 for percent encoding, &#xE9; for hexadecimal NCRs, &#233; for decimal NCRs, and so on. Conversely, if you type &eacute; into the HTML/XML field and click its Convert, é appears in Characters and everything else fills in too.

Conversion is therefore possible in both directions, always through the characters as the common center. On empty input, the tool politely tells you it needs some text.
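The pivot model can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's code (which is JavaScript built on Richard Ishida's functions); the names `all_fields` and `convert_from_percent` and the dictionary keys are hypothetical.

```python
from urllib.parse import quote, unquote


def all_fields(chars: str) -> dict[str, str]:
    """Recompute a few representations from the pivot (the actual characters)."""
    code_points = [ord(ch) for ch in chars]
    return {
        "characters": chars,
        "percent": quote(chars, safe=""),
        "hex_ncr": "".join(f"&#x{cp:X};" for cp in code_points),
        "dec_ncr": "".join(f"&#{cp};" for cp in code_points),
    }


def convert_from_percent(text: str) -> dict[str, str]:
    """Step 1: decode the field to characters. Step 2: refill everything."""
    return all_fields(unquote(text))


print(all_fields("é")["percent"])                # %C3%A9
print(convert_from_percent("%C3%A9")["hex_ncr"])  # &#xE9;
```

Every field needs only one decoder back to characters; it never has to know about any other field.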

ASCII and Latin1 options

Several fields have two checkboxes, ASCII and Latin1. They control which characters get escaped. With ASCII (the default for NCRs, Unicode, and 0x), ASCII characters (U+0000–U+007F) are left untouched and only characters above that range are converted into the given notation. With Latin1, the Latin-1 range (U+0080–U+00FF) is left alone as well. With neither checked, all characters are converted. This is handy when you only want to escape the "exotic" characters of an otherwise readable string.
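A minimal sketch of how the two checkboxes could filter characters, using hexadecimal NCRs as the output notation. The function name `to_hex_ncrs` and its parameters are hypothetical, and treating the two flags as independent switches is an assumption; the tool may couple them differently.

```python
def to_hex_ncrs(chars: str, keep_ascii: bool = True, keep_latin1: bool = False) -> str:
    """Escape characters as hex NCRs, skipping the ranges the flags protect."""
    out = []
    for ch in chars:
        cp = ord(ch)
        if keep_ascii and cp <= 0x7F:
            out.append(ch)               # ASCII checked: leave U+0000-U+007F alone
        elif keep_latin1 and 0x80 <= cp <= 0xFF:
            out.append(ch)               # Latin1 checked: leave U+0080-U+00FF alone
        else:
            out.append(f"&#x{cp:X};")    # everything else is escaped
    return "".join(out)


print(to_hex_ncrs("Café €"))                     # Caf&#xE9; &#x20AC;
print(to_hex_ncrs("Café €", keep_latin1=True))   # Café &#x20AC;
```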

Mixed input

The topmost field, Mixed input, is a catch-all: you can paste text that mixes escapes in several different notations, and the tool resolves them all to characters. It has several buttons:

  • Convert — resolves the recognized escapes to characters.
  • Hex CP, Dec CP — additionally outputs the hexadecimal or decimal code points.
  • UTF-8, UTF-16 — outputs the respective code units.

The Convert \x checkbox enables handling of single-letter \x escapes. Mixed input is the all-rounder for "I have some escaped mess here and want the plain text."
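To make the catch-all idea concrete, here is a rough sketch that resolves three notations in one pass. It is not the tool's recognizer: the pattern `ESCAPE` and the function `resolve_mixed` are hypothetical, and the real field understands more notations than these three.

```python
import re

# Hypothetical pattern covering three notations only.
ESCAPE = re.compile(
    r"&#x([0-9A-Fa-f]+);"       # hexadecimal NCR, e.g. &#xE9;
    r"|&#([0-9]+);"             # decimal NCR, e.g. &#233;
    r"|U\+([0-9A-Fa-f]{4,6})"   # U+hex notation, e.g. U+00E9
)


def resolve_mixed(text: str) -> str:
    """Replace every recognized escape with the character it denotes."""

    def replace(match: re.Match) -> str:
        hex_ncr, dec_ncr, u_plus = match.groups()
        if dec_ncr is not None:
            code_point = int(dec_ncr)
        else:
            code_point = int(hex_ncr or u_plus, 16)
        # Leave out-of-range values untouched instead of raising.
        return chr(code_point) if code_point <= 0x10FFFF else match.group(0)

    return ESCAPE.sub(replace, text)


print(resolve_mixed("caf&#xE9; costs 3 &#8364; U+1F600"))  # café costs 3 € 😀
```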

HTML/XML

HTML/XML field: numeric and named HTML/XML entities. When decoding, the field understands named entities (&amp;, &nbsp;, &copy;, …) as well as numeric ones. When generating from characters, there are two options:

  • Escape invisibles (default: on) — turns invisible characters (control characters, non-breaking spaces, etc.) into visible entities so they don't vanish silently in the markup.
  • Bidi to markup — converts bidi control characters (for right-to-left text) into corresponding markup.
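Both directions can be approximated with Python's standard library. This is a sketch under assumptions: which characters the tool counts as "invisible" and which entity form it emits are not documented here, so the category set and the hex-NCR output below are guesses, and the helper names are hypothetical. Escaping of markup characters such as & and < is left out.

```python
import html
import unicodedata

# Assumption: controls, format characters, and non-ordinary spaces are "invisible".
INVISIBLE_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}
ORDINARY_WHITESPACE = {" ", "\t", "\n", "\r"}


def decode_entities(markup: str) -> str:
    """Resolve named and numeric entities to characters."""
    return html.unescape(markup)


def escape_invisibles(chars: str) -> str:
    """Turn invisible characters into visible hex NCRs."""
    out = []
    for ch in chars:
        invisible = (
            unicodedata.category(ch) in INVISIBLE_CATEGORIES
            and ch not in ORDINARY_WHITESPACE
        )
        out.append(f"&#x{ord(ch):X};" if invisible else ch)
    return "".join(out)


print(decode_entities("caf&eacute; &amp; &copy;"))  # café & ©
print(escape_invisibles("a\u00A0b\u200Bc"))         # a&#xA0;b&#x200B;c
```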

Percent encoding (URIs)

Percent encoding (URIs) field: the %XX notation for URLs. ü becomes %C3%BC (its two UTF-8 bytes, C3 and BC, each percent-encoded), and conversely the field resolves a percent-encoded string back to plain text. This is the right mode when you need to put special characters into a URL path or query string.
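The byte-level rule is easy to verify by hand. The sketch below spells it out with a hypothetical helper, `percent_encode_all`, and compares it with Python's `urllib.parse` functions; none of this is the tool's own code.

```python
from urllib.parse import quote, unquote


def percent_encode_all(chars: str) -> str:
    """Percent-encode every UTF-8 byte, including ASCII ones."""
    return "".join(f"%{byte:02X}" for byte in chars.encode("utf-8"))


print(percent_encode_all("ü"))     # %C3%BC
print(quote("a b/ü", safe="/"))    # a%20b/%C3%BC
print(unquote("%C3%BC"))           # ü
```

In practice a URL encoder leaves unreserved ASCII characters alone, which is what `quote` does; the `safe` argument decides whether a path separator like / is kept.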

Hexadecimal NCRs / Decimal NCRs

Two fields for Numeric Character References as HTML uses them:

  • Hexadecimal NCRs — form &#xHHHH; (e.g. &#x1F600; for 😀).
  • Decimal NCRs — form &#NNNN; (e.g. &#128512; for 😀).

Both have the ASCII/Latin1 options. NCRs work in HTML and XML text content and attribute values regardless of the file's own encoding, which makes them a robust way to write an arbitrary character into markup.

Unicode U+hex / 0x… notation

Two fields for code-point notations as they appear in specs and source code:

  • Unicode U+hex — the standard notation U+00E9, U+1F600.
  • 0x… notation — the 0x style many programming languages use for hex literals.

Both with ASCII/Latin1 options.

Hex code points / Decimal code points

The bare code-point values without a prefix:

  • Hex code points — hexadecimal code points, space-separated.
  • Decimal code points — the same values in decimal.

Useful when passing code points to a program or table that expects no particular notation prefix.
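The four code-point notations from this section and the previous one differ only in prefix and number base. A small sketch, with a hypothetical function name and dictionary keys; the zero-padding shown for each notation is an assumption about the tool's output, and the ASCII/Latin1 options are ignored here.

```python
def code_point_notations(chars: str) -> dict[str, str]:
    """Render each character's code point in four notations."""
    code_points = [ord(ch) for ch in chars]
    return {
        "u_plus": " ".join(f"U+{cp:04X}" for cp in code_points),
        "zero_x": " ".join(f"0x{cp:X}" for cp in code_points),
        "hex": " ".join(f"{cp:X}" for cp in code_points),
        "dec": " ".join(str(cp) for cp in code_points),
    }


for name, value in code_point_notations("é😀").items():
    print(f"{name}: {value}")
# u_plus: U+00E9 U+1F600
# zero_x: 0xE9 0x1F600
# hex: E9 1F600
# dec: 233 128512
```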

UTF-8 code units / UTF-16 code units

The actual encoding bytes or code units:

  • UTF-8 code units — the UTF-8 byte sequence of a character.
  • UTF-16 code units — the UTF-16 code units, including surrogate pairs for characters beyond the Basic Multilingual Plane.

Here you see how a character is actually stored, byte by byte in UTF-8 and code unit by code unit in UTF-16, which is important when debugging encoding problems.
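The same values can be reproduced with Python's built-in codecs, which is a handy cross-check when debugging. The helper names are hypothetical, and the space-separated uppercase hex layout is an assumption about how the fields display their values.

```python
def utf8_code_units(chars: str) -> str:
    """UTF-8 bytes as two-digit hex values."""
    return " ".join(f"{byte:02X}" for byte in chars.encode("utf-8"))


def utf16_code_units(chars: str) -> str:
    """UTF-16 code units as four-digit hex values (surrogate pairs included)."""
    data = chars.encode("utf-16-be")  # big-endian, no byte order mark
    units = (int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    return " ".join(f"{unit:04X}" for unit in units)


print(utf8_code_units("é"))     # C3 A9
print(utf8_code_units("😀"))    # F0 9F 98 80
print(utf16_code_units("é"))    # 00E9
print(utf16_code_units("😀"))   # D83D DE00
```

The emoji lies beyond the Basic Multilingual Plane, so it takes four bytes in UTF-8 and a surrogate pair in UTF-16.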

JavaScript escapes / CSS escapes

Two fields for the escape syntax of two specific languages:

  • JavaScript escapes — \uXXXX escapes for JS strings. The C-style Supp. checkbox enables C-style handling of supplementary-plane characters.
  • CSS escapes — the backslash-hex escapes CSS allows in selectors and content values.
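A sketch of both escape styles, assuming ASCII characters are left as they are and that CSS escapes are padded to six hex digits (CSS also accepts shorter forms followed by a space, so the tool's exact output may differ). The function names are hypothetical, and the C-style option is not modeled.

```python
def js_escape(chars: str) -> str:
    """Escape non-ASCII text as \\uXXXX, one escape per UTF-16 code unit."""
    data = chars.encode("utf-16-be")
    out = []
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        out.append(chr(unit) if unit <= 0x7F else f"\\u{unit:04X}")
    return "".join(out)


def css_escape(chars: str) -> str:
    """Escape non-ASCII text as a backslash plus six hex digits."""
    return "".join(ch if ord(ch) <= 0x7F else f"\\{ord(ch):06X}" for ch in chars)


print(js_escape("é😀"))    # \u00E9\uD83D\uDE00
print(css_escape("é😀"))   # \0000E9\01F600
```

The JavaScript form needs two escapes for 😀 because \uXXXX addresses UTF-16 code units, while a CSS escape addresses the code point directly.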

Tab 2: Data Formats

The second tab converts between JSON, YAML, TOML, XML, and INI. The interaction model is source → target: you enter data on the left, pick a target format on the right, and click Convert.

Layout and controls

On the left, the Source panel with a format selector (JSON, YAML, TOML, XML, INI); on the right, the Output panel with its own selector. Both panels are syntax-highlighting ACE editors (theme "Dracula", word wrap on). The Output field is read-only.

Source-panel buttons:

  • Paste — pastes from the clipboard and auto-detects the format.
  • Open file — loads a local file (accepts .json, .yaml, .yml, .toml, .xml, .htm, .html, .ini, .cfg, .conf); the format is derived from the extension, otherwise from content detection.
  • Clear — empties source and output.

Output-panel buttons: Copy (to the clipboard) and Save (download as a file with the matching extension). Below: Convert and Swap (output becomes the new input, formats swap).

How conversion works

Every conversion runs in two steps through a shared intermediate stage: the source format is parsed into a JavaScript object, and that object is serialized into the target format. That is why any direction between the five formats is possible — there are no fixed pairs, but five parsers and five serializers around one object center.

Building blocks used:

  • JSON — native JSON.parse / JSON.stringify (output with 2-space indentation).
  • YAML — the js-yaml library (load / dump, indent 2, no anchors/references).
  • TOML — a lightweight, bundled TOML 1.0 implementation (tables, array-of-tables, strings, multiline strings, numbers in decimal/hex/octal/binary, booleans, dates, arrays, inline tables).
  • XML — the native DOMParser for reading, a custom serializer for writing.
  • INI — a custom parser and serializer.
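The parse-then-serialize design can be sketched with two of the five formats. This Python version uses the standard library's `json` and `configparser` as stand-ins; it is not the tool's code and does not use the libraries listed above. The names `PARSERS`, `SERIALIZERS`, `parse_ini`, `dump_ini`, and `convert` are hypothetical.

```python
import configparser
import io
import json


def parse_ini(text: str) -> dict:
    """INI text -> plain dict of sections (all values stay strings here)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def dump_ini(obj: dict) -> str:
    """Dict of sections -> INI text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_dict(obj)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# One parser and one serializer per format, all meeting at a plain object.
PARSERS = {"json": json.loads, "ini": parse_ini}
SERIALIZERS = {"json": lambda obj: json.dumps(obj, indent=2), "ini": dump_ini}


def convert(text: str, source: str, target: str) -> str:
    obj = PARSERS[source](text)       # step 1: source format -> object
    return SERIALIZERS[target](obj)   # step 2: object -> target format


ini_text = "[server]\nhost = example.org\nport = 8080\n"
print(convert(ini_text, "ini", "json"))
```

Adding a sixth format would mean adding one entry to each table rather than a converter for every pair. Note that `configparser` does not auto-type values the way the tool's INI parser is described to, so `port` stays a string in this sketch.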

Automatic format detection

On paste or file open, the tool tries to determine the source format itself: XML by a leading <, JSON by a leading {/[ with a valid parse, INI/TOML by [section] headers and comment or value patterns, YAML by --- or key: patterns. This is a heuristic — with ambiguous input (INI and TOML resemble each other) it can be wrong; then simply set the source format correctly by hand.
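A heuristic along these lines might look as follows. The exact rules the tool applies are not documented in detail, so the patterns below (in particular how INI and TOML are told apart) are assumptions, and `detect_format` is a hypothetical name.

```python
import json
import re
from typing import Optional


def detect_format(text: str) -> Optional[str]:
    """Guess the format of a snippet; returns None when nothing matches."""
    s = text.lstrip()
    if s.startswith("<"):
        return "xml"
    if s.startswith(("{", "[")):
        try:
            json.loads(s)
            return "json"
        except ValueError:
            pass  # not valid JSON, e.g. an INI or TOML [section] header
    if re.search(r"^\s*\[[^\]]+\]\s*$", s, re.MULTILINE):
        # Assumption: [[tables]] or quoted/structured values hint at TOML.
        if re.search(r'^\s*\[\[|=\s*["\[{]', s, re.MULTILINE):
            return "toml"
        return "ini"
    if s.startswith("---") or re.search(r"^\s*[\w.-]+:(\s|$)", s, re.MULTILINE):
        return "yaml"
    return None


print(detect_format('{"a": 1}'))                            # json
print(detect_format("[server]\nhost = example.org\n"))      # ini
print(detect_format('[server]\nhost = "example.org"\n'))    # toml
```

The last two calls show why the result can be wrong: a single pair of quotes is all that separates the INI guess from the TOML guess.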

Limits and edge cases

Because everything runs through a generic object intermediate stage, some quirks are unavoidable — good to know before you puzzle over an output:

  • Comments are lost. Comments from YAML, TOML, or INI do not survive the conversion, because the intermediate object has no notion of comments.
  • XML → object: attributes land as keys with an @ prefix, mixed text content as #text. Repeated same-named elements become an array. Text-only elements are auto-typed to number or boolean where possible.
  • Object → XML: if the object has exactly one root key, that becomes the root element, otherwise the root is named root. Invalid element names are sanitized (disallowed characters become _). Arrays are emitted as repeated same-named elements. An XML declaration with encoding="UTF-8" is produced.
  • INI is flat. INI knows only one section level: root-level primitives come first (without a section header), sub-objects become [section] blocks, more deeply nested objects are flattened with dot notation (a.b.c), and arrays become comma-separated values. An array at the root level cannot be emitted as INI — the tool reports that INI requires an object at the top level.
  • INI typing on read: unquoted values are auto-typed (true/false/yes/no/on/off → boolean, numbers → number); ; and # start comments.
  • Errors are reported, not swallowed. Invalid input (broken JSON, invalid XML, …) produces an error message with a reason rather than a silently wrong output.
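The INI flattening rules are the easiest to misread, so here is a small sketch of them. It follows the description above, not the tool's source: `to_ini` is a hypothetical name, and placing dotted keys inside their parent section (rather than in a dotted section name) is an assumption.

```python
def to_ini(obj) -> str:
    """Serialize a nested dict to flat INI text."""
    if not isinstance(obj, dict):
        raise ValueError("INI requires an object at the top level")

    def fmt(value) -> str:
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, list):
            return ",".join(fmt(item) for item in value)  # arrays: comma-separated
        return str(value)

    def flatten(section: dict, prefix: str = ""):
        for key, value in section.items():
            if isinstance(value, dict):
                yield from flatten(value, f"{prefix}{key}.")  # deeper levels: a.b.c
            else:
                yield f"{prefix}{key}", fmt(value)

    # Root-level primitives first, without a section header.
    lines = [f"{k} = {fmt(v)}" for k, v in obj.items() if not isinstance(v, dict)]
    for name, value in obj.items():
        if isinstance(value, dict):
            if lines:
                lines.append("")
            lines.append(f"[{name}]")
            lines.extend(f"{k} = {v}" for k, v in flatten(value))
    return "\n".join(lines) + "\n"


config = {
    "name": "demo",
    "tags": ["a", "b"],
    "server": {"host": "example.org", "tls": {"enabled": True}},
}
print(to_ini(config))
```

For this input the sketch emits `name` and `tags` first, then a `[server]` block containing `host = example.org` and `tls.enabled = true`. Reading that text back would not restore the nesting or the array, which is the kind of loss the list above warns about.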

For sensible ordering and low-loss round-trips, see Tips & Tricks; concrete runs are in the Examples.