Java source code encoding

Unicode Tutorials — Herong’s Tutorial Examples — v5.31, by Herong Yang

String Literals and Source Code Encoding

This section provides a tutorial example on how to represent non-ASCII characters as UTF-8 encoded byte sequences in String literals in Java source code.

In previous tutorials, we have learned how to represent non-ASCII characters in \uXXXX escape sequences as part of String literals in Java source code.

In this tutorial, we will learn how to represent non-ASCII characters in UTF-8 encoding byte sequences as part of String literals in Java source code.

Here is our test string, which contains two non-ASCII characters:

Delicious food U+1F60B takes time U+23F3

Where:
U+1F60B: FACE SAVOURING DELICIOUS FOOD
U+23F3: HOURGLASS WITH FLOWING SAND

Our test string should be displayed like this, if you have the correct Unicode font installed on your computer.

Delicious food 😋 takes time ⏳

In our first test program, we will continue to use \uXXXX sequences in our source code. Note that U+1F60B character needs to be encoded as a surrogate pair of \uD83D\uDE0B based on the UTF-16 encoding rule.

/* UnicodeStringLiterals.java
 * Copyright (c) 2019 HerongYang.com. All Rights Reserved.
 */
class UnicodeStringLiterals {
   public static void main(String[] arg) {
      try {
         String str = "Delicious food \uD83D\uDE0B takes time \u23F3";
         System.out.print("\ncodePointCount(): "
            +str.codePointCount(0,str.length()));
         System.out.print("\n length(): "
            +str.length());
         System.out.print("\n String dump: ");
         printString(str);
      } catch (Exception e) {
         System.out.print("\n"+e.toString());
      }
   }
   public static void printString(String s) {
      char[] chars = s.toCharArray();
      for (char c : chars) {
         byte hi = (byte) (c >>> 8);
         byte lo = (byte) (c & 0xff);
         System.out.print(String.format("%02X%02X ", hi, lo));
      }
   }
}

Compile and run it with Java 11:

C:\herong>javac UnicodeStringLiterals.java

C:\herong>java UnicodeStringLiterals

codePointCount(): 29
 length(): 30
 String dump: 0044 0065 006C 0069 0063 0069 006F 0075 0073 0020
   0066 006F 006F 0064 0020 D83D DE0B 0020 0074 0061
   006B 0065 0073 0020 0074 0069 006D 0065 0020 23F3
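The surrogate pair \uD83D\uDE0B used in the first program does not have to be worked out by hand. The following is a minimal sketch, not part of the original tutorial, that derives the escape sequences for a code point in two ways: with Character.toChars(), and with the UTF-16 arithmetic spelled out. The class name SurrogatePairSketch and its method names are hypothetical.

```java
/* SurrogatePairSketch.java
 * Illustrative sketch only; class and method names are hypothetical.
 */
class SurrogatePairSketch {

   // Builds the Java escape sequence(s) for one code point using
   // Character.toChars(), which returns 1 char for BMP code points
   // and a high/low surrogate pair for supplementary ones.
   static String toEscapes(int codePoint) {
      StringBuilder sb = new StringBuilder();
      for (char c : Character.toChars(codePoint)) {
         sb.append(String.format("\\u%04X", (int) c));
      }
      return sb.toString();
   }

   // Same result computed by hand, following the UTF-16 rule.
   static String toEscapesByHand(int codePoint) {
      if (codePoint < 0x10000) {
         return String.format("\\u%04X", codePoint);
      }
      int v = codePoint - 0x10000;     // 20-bit value
      int hi = 0xD800 + (v >> 10);     // top 10 bits
      int lo = 0xDC00 + (v & 0x3FF);   // bottom 10 bits
      return String.format("\\u%04X\\u%04X", hi, lo);
   }

   public static void main(String[] arg) {
      System.out.println(toEscapes(0x1F60B));
      System.out.println(toEscapesByHand(0x1F60B));
      System.out.println(toEscapes(0x23F3));
   }
}
```

For U+1F60B both methods give the pair D83D DE0B that appears in the string dump above, while U+23F3 lies in the Basic Multilingual Plane and needs only a single escape.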

In our second test program, we will write the non-ASCII characters directly into the source code as UTF-8 encoded byte sequences. This program is arguably easier to read than the first one, because you can actually see the non-ASCII characters displayed in the source code.

/* UnicodeStringLiteralsUTF8.java
 * Copyright (c) 2019 HerongYang.com. All Rights Reserved.
 */
import java.io.*;
class UnicodeStringLiteralsUTF8 {
   public static void main(String[] arg) {
      try {
         String str = "Delicious food 😋 takes time ⏳";
         System.out.print("\ncodePointCount(): "
            +str.codePointCount(0,str.length()));
         System.out.print("\n length(): "
            +str.length());
         System.out.print("\n String dump: ");
         printString(str);
      } catch (Exception e) {
         System.out.print("\n"+e.toString());
      }
   }
   public static void printString(String s) {
      char[] chars = s.toCharArray();
      for (char c : chars) {
         byte hi = (byte) (c >>> 8);
         byte lo = (byte) (c & 0xff);
         System.out.print(String.format("%02X%02X ", hi, lo));
      }
   }
}

This time, we need to make sure that UnicodeStringLiteralsUTF8.java is saved as a UTF-8 encoded file and compiled with the "-encoding UTF8" option:

C:\herong>javac -encoding UTF8 UnicodeStringLiteralsUTF8.java

C:\herong>java UnicodeStringLiteralsUTF8

codePointCount(): 29
 length(): 30
 String dump: 0044 0065 006C 0069 0063 0069 006F 0075 0073 0020
   0066 006F 006F 0064 0020 D83D DE0B 0020 0074 0061
   006B 0065 0073 0020 0074 0069 006D 0065 0020 23F3

The output is identical to that of the first program. This confirms that we have properly represented non-ASCII characters as UTF-8 encoded byte sequences in String literals in the Java source code.
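To see why the "-encoding" option matters, it helps to look at the bytes that the two characters occupy in a UTF-8 source file, and at what a reader would get if it decoded those bytes with a single-byte encoding instead. The following is a minimal sketch under stated assumptions: the class name SourceEncodingSketch and its helpers are hypothetical, and ISO-8859-1 merely stands in for whatever non-UTF-8 default a platform might use.

```java
/* SourceEncodingSketch.java
 * Illustrative sketch only; class and method names are hypothetical.
 */
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SourceEncodingSketch {

   // Formats bytes as space-separated upper-case hex pairs.
   static String hex(byte[] bytes) {
      StringBuilder sb = new StringBuilder();
      for (byte b : bytes) {
         if (sb.length() > 0) sb.append(' ');
         sb.append(String.format("%02X", b & 0xff));
      }
      return sb.toString();
   }

   // Encodes s as UTF-8, then decodes the bytes with 'assumed'.
   // This mimics reading a UTF-8 file with the wrong encoding.
   static String misread(String s, Charset assumed) {
      return new String(s.getBytes(StandardCharsets.UTF_8), assumed);
   }

   public static void main(String[] arg) {
      String hourglass = "\u23F3";
      String face = "\uD83D\uDE0B";

      // Bytes stored in a UTF-8 encoded source file
      System.out.println(hex(hourglass.getBytes(StandardCharsets.UTF_8))); // E2 8F B3
      System.out.println(hex(face.getBytes(StandardCharsets.UTF_8)));      // F0 9F 98 8B

      // With a single-byte encoding, every byte becomes its own char
      System.out.println(misread(hourglass, StandardCharsets.ISO_8859_1).length()); // 3
      System.out.println(misread(face, StandardCharsets.ISO_8859_1).length());      // 4
   }
}
```

If the compiler made the same wrong assumption about the source file, the literal would hold more characters than intended, and the length() and string dump would no longer match the first program.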


Transform Java source code to UTF-8 Encoding

A whole-file diff is most likely caused by differing line endings rather than by the encoding itself. This can be resolved in two ways: either make Git use a specific line ending regardless of the editor used, or ensure that all editors and IDEs are configured to use the same line endings. Git lets you control line-ending handling locally through the core.autocrlf setting in its config.

How to force Javapoet to create UTF-8 Java source code?

According to JavaPoet's GitHub page, you can use the following method:

JavaFile.writeTo(PrintStream)

You can create a PrintStream that uses UTF-8 and write the file like this:

PrintStream stream = new PrintStream("YourTargetFile.java", "UTF-8");
yourJavaFileObject.writeTo(stream);
stream.close(); // flush buffered output to the file

Convert «Java source code characters» in JSON string using PHP

It is necessary to specify the encoding provided to the web browser.

If you are using PHP 5.4 or later, you can pass the JSON_UNESCAPED_UNICODE option to json_encode() as follows:

echo $b = json_encode('Dalé', JSON_UNESCAPED_UNICODE);
echo json_decode($b);

Change encoding to UTF-8 in Java

String charset = "ISO-8859-1"; // or whatever encoding the file actually uses
BufferedReader in = new BufferedReader(
   new InputStreamReader(new FileInputStream(file), charset));
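The reader above only decodes the file. Converting a source file to UTF-8 also needs a matching write step. The following is a minimal sketch, assuming the file is small enough to hold in memory and that its current encoding is known. The class name TranscodeSketch, the transcode() helper and the command-line arguments are hypothetical.

```java
/* TranscodeSketch.java
 * Illustrative sketch only; class and method names are hypothetical.
 */
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class TranscodeSketch {

   // Decodes 'input' using 'from' and re-encodes the text using 'to'.
   static byte[] transcode(byte[] input, Charset from, Charset to) {
      String text = new String(input, from);
      return text.getBytes(to);
   }

   // Usage: java TranscodeSketch <input file> <input charset> <output file>
   public static void main(String[] arg) throws IOException {
      if (arg.length != 3) {
         System.out.println(
            "usage: java TranscodeSketch <in> <charset> <out>");
         return;
      }
      byte[] input = Files.readAllBytes(Paths.get(arg[0]));
      byte[] output = transcode(input, Charset.forName(arg[1]),
         StandardCharsets.UTF_8);
      Files.write(Paths.get(arg[2]), output);
   }
}
```

Note that new String(bytes, charset) silently substitutes a replacement character for byte sequences that are invalid in the given charset, so a wrong guess about the input encoding will not raise an error. Writing to a separate output file and comparing the result is the safer choice.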

Convert from C/C++/Java source code to unicode with python 3

I have raw data that needs to be transformed. The data includes URIs, some of which contain escape sequences of the kind used in C/C++/Java source code. For instance, a string like "\u03A5" needs to be converted to the Upsilon character, as shown on this website: https://www.fileformat.info/info/unicode/char/03a5/index.htm. Thank you for your assistance and warm regards.
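The question asks about Python 3, but the transformation itself is language independent. Here is a minimal sketch of it in Java, to match the other examples on this page. It assumes that only four-digit backslash-u escapes need decoding and that the input contains no escaped backslashes. The class name UnescapeSketch and its methods are hypothetical.

```java
/* UnescapeSketch.java
 * Illustrative sketch only; class and method names are hypothetical.
 */
class UnescapeSketch {

   // Replaces each backslash-u escape (4 hex digits) with the char
   // it names. Anything else is copied through unchanged.
   static String unescape(String s) {
      StringBuilder sb = new StringBuilder();
      int i = 0;
      while (i < s.length()) {
         if (s.startsWith("\\u", i) && i + 6 <= s.length()
               && isHex(s, i + 2)) {
            int unit = Integer.parseInt(s.substring(i + 2, i + 6), 16);
            sb.append((char) unit);
            i += 6;
         } else {
            sb.append(s.charAt(i));
            i++;
         }
      }
      return sb.toString();
   }

   // True if the 4 chars starting at 'from' are all hex digits.
   static boolean isHex(String s, int from) {
      for (int k = from; k < from + 4; k++) {
         if (Character.digit(s.charAt(k), 16) < 0) return false;
      }
      return true;
   }

   public static void main(String[] arg) {
      // The doubled backslash keeps the escape as literal text here.
      String raw = "http://example.org/\\u03A5";
      String decoded = unescape(raw);
      System.out.println(raw.length() + " -> " + decoded.length());
   }
}
```

Two consecutive escapes that form a surrogate pair are decoded into two adjacent chars, which Java then treats as a single supplementary code point.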

Encoding of a java source file

Encoding differences are not typically the cause of newline differences, as the issue is more nuanced.

On Windows, a file encoded in UTF-8 may have newlines represented as \r\n (CRLF), while on a Unix-like OS, a UTF-8 encoded file may have newlines represented as \n (just LF).

The variation is possibly the reason behind the entire-file comparison, and there are multiple approaches to resolve it.

Either make Git use LF consistently, irrespective of what the editors produce, or
set up uniform line endings in all editors, depending on the specific editors or IDEs in use.
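Before choosing either approach, it can help to confirm that line endings really are what differs between two copies of a file. The following is a minimal sketch, assuming the file content has already been read into a String. The class name LineEndingSketch and its methods are hypothetical.

```java
/* LineEndingSketch.java
 * Illustrative sketch only; class and method names are hypothetical.
 */
class LineEndingSketch {

   // Returns { number of CRLF endings, number of bare LF endings }.
   static int[] countLineEndings(String text) {
      int crlf = 0;
      int lf = 0;
      for (int i = 0; i < text.length(); i++) {
         if (text.charAt(i) == '\n') {
            if (i > 0 && text.charAt(i - 1) == '\r') {
               crlf++;
            } else {
               lf++;
            }
         }
      }
      return new int[] { crlf, lf };
   }

   // Normalizes CRLF endings to LF; bare LF endings are unchanged.
   static String toLf(String text) {
      return text.replace("\r\n", "\n");
   }

   public static void main(String[] arg) {
      String mixed = "class A {\r\n}\n";
      int[] counts = countLineEndings(mixed);
      System.out.println("CRLF: " + counts[0] + ", LF: " + counts[1]);
      System.out.println(toLf(mixed).equals("class A {\n}\n"));
   }
}
```

If two versions of a file are equal after toLf() is applied to both, the whole-file diff comes from line endings alone and not from the encoding.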

Typically, one or more of the following factors account for these differences.

Source encoding

Java source files do not declare their own encoding, so team members must agree on one. UTF-8 is the usual choice, and there is rarely a compelling reason to pick anything else. If your editor supports only the system encoding, choose a different editor.

Line endings

To handle LF in a local configuration, Git allows you to set the core.autocrlf config. However, I advise against using it. Instead, opt for a project-wide configuration and put a .gitattributes file in the root of your project.

# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto

# Explicitly declare text files you want to always be normalized and converted
# to native line endings on checkout.
*.java text

# Declare files that will always have Unix LF line endings on checkout.
*.sh text eol=lf
Dockerfile text eol=lf

Indentation

Improper usage of tabs and whitespaces can lead to a chaotic commit history. While most Integrated Development Environments perform automatic formatting or pretty printing during each save action, there are variations in how they execute this function. To avoid confusion, it is advisable to establish a standard source formatting that is agreed upon by all team members, although it may require extensive discussions.

Here’s a compelling argument for those who can’t come to a consensus on using tabs or spaces — developers who opt for spaces tend to earn higher salaries than those who choose tabs.
