- Finding the Difference Between Two Strings in Java
- 2. Problem
- 3. diff-match-patch
- 4. StringUtils
- 5. Performance
- 6. Conclusion
Library for performing the comparison operations between texts
dnaumenko/java-diff-utils
README.md
Diff Utils is an open-source library for performing comparison operations between texts: computing diffs, applying patches, generating or parsing unified diffs, producing diff output for later display (such as a side-by-side view), and so on.
The main reason for building this library was the lack of easy-to-use libraries with all the usual features you need when working with diff files. It was originally inspired by the JRCS library and the nice design of its diff module.
- computing the difference between two texts.
- capable of handling more than plain ASCII: arrays or lists of any type that implements hashCode() and equals() correctly can be diffed with this library
- patch and unpatch the text with the given patch
- parsing the unified diff format
- producing human-readable differences
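The unified diff format named in the list above groups changes into hunks, each introduced by a header such as `@@ -1,3 +1,4 @@`. As a rough sketch, and not this library's parser, such a header could be read with a regular expression. The class name `HunkHeaderSketch` is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: parses a unified-diff hunk header such as "@@ -1,3 +1,4 @@". */
final class HunkHeaderSketch {
    // The line counts are optional in the format and default to 1 when omitted.
    private static final Pattern HEADER =
            Pattern.compile("^@@ -(\\d+)(?:,(\\d+))? \\+(\\d+)(?:,(\\d+))? @@.*$");

    /** Returns {oldStart, oldCount, newStart, newCount}. */
    static int[] parse(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a hunk header: " + line);
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            m.group(2) == null ? 1 : Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3)),
            m.group(4) == null ? 1 : Integer.parseInt(m.group(4))
        };
    }

    public static void main(String[] args) {
        int[] h = parse("@@ -1,3 +1,4 @@");
        System.out.println(h[0] + "," + h[1] + " -> " + h[2] + "," + h[3]); // 1,3 -> 1,4
    }
}
```

The header says that the hunk covers 3 lines starting at line 1 of the original and 4 lines starting at line 1 of the revised text.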
This library implements Myers' diff algorithm, but it can easily be replaced by any other algorithm that handles your texts better. I plan to add implementations of some others in the future.
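To give a feel for diffing lists of arbitrary elements that rely only on `equals()`, here is a minimal sketch. It deliberately uses a plain longest-common-subsequence table rather than Myers' algorithm, so it is not how the library computes its result. `ListDiffSketch` and its output format are invented for this example, and null elements are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative LCS-based diff over lists; not the library's Myers implementation. */
final class ListDiffSketch {
    /** Returns edit lines prefixed with "  " (kept), "- " (removed) or "+ " (added). */
    static <T> List<String> diff(List<T> original, List<T> revised) {
        int n = original.size();
        int m = revised.size();
        // lcs[i][j] = length of the longest common subsequence of the two suffixes
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = original.get(i).equals(revised.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table from the start, emitting one edit line per step.
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (original.get(i).equals(revised.get(j))) {
                out.add("  " + original.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + original.get(i++));
            } else {
                out.add("+ " + revised.get(j++));
            }
        }
        while (i < n) {
            out.add("- " + original.get(i++));
        }
        while (j < m) {
            out.add("+ " + revised.get(j++));
        }
        return out;
    }

    public static void main(String[] args) {
        // Integer elements instead of text: only a correct equals() is required.
        diff(Arrays.asList(1, 2, 3, 4), Arrays.asList(1, 3, 4, 5))
                .forEach(System.out::println);
    }
}
```

The table costs O(n·m) time and memory, which is one reason real diff tools prefer Myers' algorithm for long inputs.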
- JDK 1.5 compatibility
- Ant build script
- Generate output in unified diff format (thanks to Bill James)
Just add the code below to your Maven dependencies:

    <dependency>
        <groupId>com.googlecode.java-diff-utils</groupId>
        <artifactId>diffutils</artifactId>
        <version>1.3.0</version>
    </dependency>

or, using Ivy:

    <dependency org="com.googlecode.java-diff-utils" name="diffutils" rev="1.3.0"/>
- support for inline diffs in output
- helpers for showing side-by-side, line-by-line diffs or text with inter-line and intra-line change highlights
- customization of diff algorithm for better experience while computing diffs between strings (ignoring blank lines or spaces, etc)
- generating output in other formats (not only unified). E.g. CVS.
This work is licensed under The Apache Software License, Version 1.1. Reason: the code contains work by HP, which contributed it under Apache-1.1 (see the example code). It was easier to change the license to Apache-1.1 than to contact HP Legal about code created in 2003 at HP Bristol.
Finding the Difference Between Two Strings in Java
This quick tutorial shows how to find the difference between two strings using Java.
In this tutorial, we'll use two existing Java libraries and compare their approaches to this problem.
2. Problem
Consider the following requirement: we want to find the difference between the strings "ABCDELMN" and "ABCFGLMN".
Depending on the output format we need, and setting aside the option of writing our own code for this, we found two main options available.
The first is a library written by Google called diff-match-patch. As its authors state, the library offers robust algorithms for synchronizing plain text.
The other option is the StringUtils class from Apache Commons Lang.
Let's look at the differences between the two.
3. diff-match-patch
For this article, we'll use a fork of the original Google library, since artifacts for the original are not published on Maven Central. In addition, some class names differ from the original codebase and follow Java conventions more closely.
First, we need to add its dependency to our pom.xml file:

    <dependency>
        <groupId>org.bitbucket.cowwoc</groupId>
        <artifactId>diff-match-patch</artifactId>
        <version>1.2</version>
    </dependency>

Next, consider this code:

    import java.util.LinkedList;
    import org.bitbucket.cowwoc.diffmatchpatch.DiffMatchPatch;
    import org.bitbucket.cowwoc.diffmatchpatch.DiffMatchPatch.Diff;

    String text1 = "ABCDELMN";
    String text2 = "ABCFGLMN";
    DiffMatchPatch dmp = new DiffMatchPatch();
    // false = do not run a line-level diff first (the "checklines" speed-up)
    LinkedList<Diff> diff = dmp.diffMain(text1, text2, false);

Running the code above, which computes the difference between text1 and text2, and printing the diff variable produces this output:

    [Diff(EQUAL,"ABC"), Diff(DELETE,"DE"), Diff(INSERT,"FG"), Diff(EQUAL,"LMN")]

The output is in fact a list of Diff objects, each consisting of an operation type (INSERT, DELETE, or EQUAL) and the portion of text associated with that operation.
Running the diff between text2 and text1 instead gives:

    [Diff(EQUAL,"ABC"), Diff(DELETE,"FG"), Diff(INSERT,"DE"), Diff(EQUAL,"LMN")]
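To see where a result of this shape can come from, here is a deliberately simplified sketch that only trims the common prefix and suffix and reports whatever remains in the middle as one DELETE and one INSERT. It is not the library's algorithm, which is considerably more elaborate; `TrimDiffSketch` is a name we made up, and the output format merely imitates the printed form shown above.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified sketch: common prefix/suffix trimming only, not diff-match-patch itself. */
final class TrimDiffSketch {
    static List<String> diff(String a, String b) {
        int max = Math.min(a.length(), b.length());
        int prefix = 0;
        while (prefix < max && a.charAt(prefix) == b.charAt(prefix)) {
            prefix++;
        }
        int suffix = 0;
        // The suffix must not overlap the prefix that was already consumed.
        while (suffix < max - prefix
                && a.charAt(a.length() - 1 - suffix) == b.charAt(b.length() - 1 - suffix)) {
            suffix++;
        }
        List<String> ops = new ArrayList<>();
        add(ops, "EQUAL", a.substring(0, prefix));
        add(ops, "DELETE", a.substring(prefix, a.length() - suffix));
        add(ops, "INSERT", b.substring(prefix, b.length() - suffix));
        add(ops, "EQUAL", a.substring(a.length() - suffix));
        return ops;
    }

    // Empty segments are skipped so that equal inputs yield a single EQUAL entry.
    private static void add(List<String> ops, String type, String text) {
        if (!text.isEmpty()) {
            ops.add("Diff(" + type + ",\"" + text + "\")");
        }
    }

    public static void main(String[] args) {
        System.out.println(diff("ABCDELMN", "ABCFGLMN"));
        System.out.println(diff("ABCFGLMN", "ABCDELMN"));
    }
}
```

For our two sample strings this happens to yield the same four entries as the library, because all the changes sit in one contiguous block. For inputs with several separate changes the sketch would lump everything between the first and last change into a single DELETE/INSERT pair.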
4. StringUtils
The class from Apache Commons takes a simpler approach.
First, we add the Apache Commons Lang dependency to our pom.xml file:

    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-lang3</artifactId>
        <version>3.12.0</version>
    </dependency>

Then, to find the difference between two texts with Apache Commons, we call StringUtils#difference:

    StringUtils.difference(text1, text2)

The result is a simple string:

    FGLMN

whereas running the diff between text2 and text1 returns:

    DELMN

This simple approach can be improved with StringUtils.indexOfDifference(), which returns the index at which the two strings start to differ (in our case index 3, the fourth character of the string). That index can be used to take a substring of the original string, showing what the two inputs have in common in addition to what differs.
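To make the relationship between the two calls concrete, here is a plain-Java sketch that mirrors the behaviour described above. It is our own re-implementation for illustration, not the Commons Lang source, and it skips the null handling that the real methods perform.

```java
/** Plain-Java sketch of the behaviour described above; not the Commons Lang source. */
final class StringDifferenceSketch {
    /** Index of the first differing character, or -1 if the strings are equal. */
    static int indexOfDifference(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        for (int i = 0; i < limit; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // One string is a prefix of the other, or they are identical.
        return a.length() == b.length() ? -1 : limit;
    }

    /** Remainder of b, starting where it first differs from a. */
    static String difference(String a, String b) {
        int at = indexOfDifference(a, b);
        return at == -1 ? "" : b.substring(at);
    }

    public static void main(String[] args) {
        String text1 = "ABCDELMN";
        String text2 = "ABCFGLMN";
        int at = indexOfDifference(text1, text2); // 3 for these inputs
        System.out.println("common prefix: " + text1.substring(0, at)); // ABC
        System.out.println("differs from:  " + difference(text1, text2)); // FGLMN
    }
}
```

Note that everything after the first mismatch is returned as-is, so the shared "LMN" tail is not recognised as common text.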
5. Performance
For our benchmarks, we generate a list of 10,000 strings, each with a fixed 10-character part followed by 20 random alphabetic characters.
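The article does not show how the inputs are built, so the following is only one possible way to do it. The class name, the fixed part "0123456789", and the seed are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical input generator: fixed prefix followed by random letters. */
final class BenchmarkInputsSketch {
    private static final String LETTERS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    static List<String> generate(int count, String fixedPart, int randomLength, long seed) {
        Random random = new Random(seed); // seeded so that runs are repeatable
        List<String> inputs = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            StringBuilder sb = new StringBuilder(fixedPart);
            for (int k = 0; k < randomLength; k++) {
                sb.append(LETTERS.charAt(random.nextInt(LETTERS.length())));
            }
            inputs.add(sb.toString());
        }
        return inputs;
    }

    public static void main(String[] args) {
        // Assumed values: the 10-character fixed part and the seed are illustrative.
        List<String> inputs = generate(10_000, "0123456789", 20, 42L);
        System.out.println(inputs.size() + " strings of length " + inputs.get(0).length());
    }
}
```

Because every string shares the same first 10 characters, each comparison has a guaranteed common prefix before the random part begins.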
Then we iterate over the list and compare the n-th element with the (n+1)-th element:

    // inputs and diffMatchPatch are fields of the benchmark class
    @Benchmark
    public int diffMatchPatch() {
        for (int i = 0; i < inputs.size() - 1; i++) {
            diffMatchPatch.diffMain(inputs.get(i), inputs.get(i + 1), false);
        }
        return inputs.size();
    }

    @Benchmark
    public int stringUtils() {
        for (int i = 0; i < inputs.size() - 1; i++) {
            StringUtils.difference(inputs.get(i), inputs.get(i + 1));
        }
        return inputs.size();
    }

Finally, let's run the benchmarks and compare the two libraries:

    Benchmark                                   Mode  Cnt    Score   Error  Units
    StringDiffBenchmarkUnitTest.diffMatchPatch  avgt   50  130.559 ± 1.501  ms/op
    StringDiffBenchmarkUnitTest.stringUtils     avgt   50    0.211 ± 0.003  ms/op
6. Conclusion
In terms of pure execution speed, StringUtils is clearly faster, although it only returns the substring from which the two strings start to differ.
Diff-Match-Patch, by contrast, provides a more thorough comparison result at the cost of performance.
The implementation of these examples and snippets is available on GitHub.
Diff Utils library is an OpenSource library for performing the comparison / diff operations between texts or some kind of data: computing diffs, applying patches, generating unified diffs or parsing them, generating diff output for easy future displaying (like side-by-side view) and so on.
java-diff-utils/java-diff-utils
README.md
Diff Utils is an open-source library for performing comparison operations between texts: computing diffs, applying patches, generating or parsing unified diffs, producing diff output for later display (such as a side-by-side view), and so on.
The main reason for building this library was the lack of easy-to-use libraries with all the usual features you need when working with diff files. It was originally inspired by the JRCS library and the nice design of its diff module.
This was originally a fork of java-diff-utils from the Google Code Archive.
Javadocs of the current release version: JavaDocs java-diff-utils
Look here to find more helpful information and examples.
The two outputs below were generated with java-diff-utils. The source code can also be found on the Examples page:
Producing a one liner including all difference information.

    // create a configured DiffRowGenerator
    DiffRowGenerator generator = DiffRowGenerator.create()
            .showInlineDiffs(true)
            .mergeOriginalRevised(true)
            .inlineDiffByWord(true)
            .oldTag(f -> "~")      // introduce markdown style for strikethrough
            .newTag(f -> "**")     // introduce markdown style for bold
            .build();

    // compute the differences for two test texts
    List<DiffRow> rows = generator.generateDiffRows(
            Arrays.asList("This is a test senctence."),
            Arrays.asList("This is a test for diffutils."));

    System.out.println(rows.get(0).getOldLine());

    This is a test ~senctence~**for diffutils**.

Producing a side by side view of computed differences.

    DiffRowGenerator generator = DiffRowGenerator.create()
            .showInlineDiffs(true)
            .inlineDiffByWord(true)
            .oldTag(f -> "~")
            .newTag(f -> "**")
            .build();
    List<DiffRow> rows = generator.generateDiffRows(
            Arrays.asList("This is a test senctence.", "This is the second line.", "And here is the finish."),
            Arrays.asList("This is a test for diffutils.", "This is the second line."));

    System.out.println("|original|new|");
    System.out.println("|--------|---|");
    for (DiffRow row : rows) {
        System.out.println("|" + row.getOldLine() + "|" + row.getNewLine() + "|");
    }

|original|new|
|--------|---|
|This is a test ~senctence~.|This is a test **for diffutils**.|
|This is the second line.|This is the second line.|
|~And here is the finish.~||
- computing the difference between two texts.
- capable of handling more than plain ASCII: arrays or lists of any type that implements hashCode() and equals() correctly can be diffed with this library
- patch and unpatch the text with the given patch
- parsing the unified diff format
- producing human-readable differences
- inline difference construction
- Algorithms:
  - Myers standard algorithm
  - Myers with linear space improvement
  - HistogramDiff using the JGit library

But the algorithm can easily be replaced by any other that handles your texts better. I plan to add implementations of some others in the future.
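The "patch and unpatch" feature from the list above can be pictured with a minimal sketch. It models a single change as a position plus the removed and added lines, and applies it in either direction. `PatchSketch` is a hypothetical class written for this illustration and is unrelated to the library's own patch and delta types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical single-change delta, for illustration only (not the library's classes). */
final class PatchSketch {
    final int position;          // index in the original where the change starts
    final List<String> removed;  // lines taken out of the original
    final List<String> added;    // lines put in their place

    PatchSketch(int position, List<String> removed, List<String> added) {
        this.position = position;
        this.removed = removed;
        this.added = added;
    }

    /** Applies the change: original -> revised. */
    List<String> patch(List<String> original) {
        return splice(original, removed, added);
    }

    /** Reverts the change: revised -> original. */
    List<String> unpatch(List<String> revised) {
        return splice(revised, added, removed);
    }

    private List<String> splice(List<String> source, List<String> expected,
            List<String> replacement) {
        int end = position + expected.size();
        // Refuse to apply when the text at the position is not what the delta expects.
        if (end > source.size() || !source.subList(position, end).equals(expected)) {
            throw new IllegalStateException("Patch does not apply at position " + position);
        }
        List<String> result = new ArrayList<>(source.subList(0, position));
        result.addAll(replacement);
        result.addAll(source.subList(end, source.size()));
        return result;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("a", "b", "c");
        PatchSketch delta = new PatchSketch(1, Arrays.asList("b"), Arrays.asList("x", "y"));
        List<String> revised = delta.patch(original);
        System.out.println(revised);                // [a, x, y, c]
        System.out.println(delta.unpatch(revised)); // [a, b, c]
    }
}
```

Unpatching is simply patching with the roles of the removed and added lines swapped, which is why both directions share one helper.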
A Checkstyle process was recently integrated into the build. java-diff-utils follows the Sun Java format convention. Tabs are not allowed; use spaces.

    public static <T> Patch<T> diff(List<T> original, List<T> revised,
            BiPredicate<T, T> equalizer) throws DiffException {
        if (equalizer != null) {
            return DiffUtils.diff(original, revised,
                    new MyersDiff<>(equalizer));
        }
        return DiffUtils.diff(original, revised, new MyersDiff<>());
    }

This is a valid piece of source code:
- blocks without braces are not allowed
- after control statements (if, while, for) a whitespace is expected
- the opening brace should be in the same line as the control statement
Just add the code below to your Maven dependencies:

    <dependency>
        <groupId>io.github.java-diff-utils</groupId>
        <artifactId>java-diff-utils</artifactId>
        <version>4.12</version>
    </dependency>

or, using Gradle:

    // https://mvnrepository.com/artifact/io.github.java-diff-utils/java-diff-utils
    implementation "io.github.java-diff-utils:java-diff-utils:4.12"