First there was the encoding_sampler Gem
EncodingSampler helps solve the problem of what to do when the character encoding is unknown, for example when a user is uploading a file but has no idea of its encoding (or typically, even what “character encoding” means.) EncodingSampler extracts a concise set of samples from the selected file for display so the user can choose wisely.
If you deal with client data files, you know the problems… the client’s eye’s glaze over when you ask about character encoding. And the encoding problems may be sparse; If there’s one line with an “®” character 2MB down in the file, you’re just not going to find it by inspection. Encoding_sampler does the heavy lifting.
But… while getting the encoding right is critical, we have found that building an encoding detection tool into our general data import applications is overkill. It complicates the user interface and confuses our clients.
So… the encoding-inspector web service
My solution was to create a separate web application to determine a text file’s encoding:
encoding-inspector, at http://encoding-inspector.triresources.com
The inspiration comes from common tools like the W3C Validations Service, and this online JSON parser. Encoding-inspector is essentially a front-end to the encoding_sampler Gem. Here’s a screen shot of a typical result:
Results provide simple visual samples so non-programmer/non-tech users can understand the options and choose the right encoding. In this example, you can see that UTF-8 is the correct encoding.
encoding-inspector is built and maintained by Roll No Rocks and it’s hosted by TRI Resources International. It’s our hope that people find this resource useful, and that we can continue to provide it as a free service.