URL Checking in Java

This tutorial shows you how to use Java Platform, Standard Edition 8 (Java SE 8) and NetBeans 8 to create a link checker with the HTTPClient class.

Time to Complete

Introduction

  • The HTTPClient class helps build HTTP-aware client applications, such as web browsers and web service clients for distributed communication.
  • The URL class is a pointer to a resource on the web. A resource can be something as simple as a file or a directory, or it can be a reference to a more complicated object, such as a query to a database or to a search engine.
  • The HttpURLConnection class helps establish an HTTP connection between the HTTPClient and server.
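
To make the description of the URL class concrete, here is a minimal sketch that splits a URL string into its parts with the standard java.net.URL accessors. The class name UrlParts, the describe helper, and the example.com address are hypothetical and are not part of the tutorial's project.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class; not part of the tutorial's project.
class UrlParts {

    // Returns "protocol|host|path|query" for the given URL string.
    static String describe(String spec) {
        try {
            URL url = new URL(spec);
            return url.getProtocol() + "|" + url.getHost() + "|"
                    + url.getPath() + "|" + url.getQuery();
        } catch (MalformedURLException e) {
            // Thrown when the string has no protocol or an unknown one.
            throw new IllegalArgumentException("Not a URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        // The query part plays the role of the "query to a search engine".
        System.out.println(describe("https://www.example.com/search?q=java"));
        // prints: https|www.example.com|/search|q=java
    }
}
```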

Scenario

A testing team wants to verify and validate a given set of URLs.

Hardware and Software Requirements

Creating a Java Application

In this section, you create a Java application that you will use to demonstrate the HTTP link checker application.

  1. In NetBeans IDE 8.0, select New Project from the File menu.
  2. On the Choose Project page, perform the following steps:
    1. Select Java from Categories.
    2. Select Java Application from Projects.
    3. Click Next.

  3. On the Name and Location page, perform the following steps:
    1. Enter LinkChecker as the project name.
    2. Select Create Main Class.
    3. Enter com.example.HTTPClient as the main class name.
    4. Click Finish.

The Java SE 8 LinkChecker project is created in NetBeans. You’re now ready to use the HTTPClient.java file to implement the link checker application.

Creating a Java enum Data Type

In this section, you create an enum data type to store the HTTP response codes. An enum data type is a special data type that defines a set of predefined constants; a variable of that type must be equal to one of them. You declare the HTTP response codes as enum constants and validate the URLs against them.

You use the URLStatus enum, whose constants look like HTTP_OK(200, "OK", "SUCCESS"), NO_CONTENT(204, "No Content", "SUCCESS"), and INTERNAL_SERVER_ERROR(500, "Internal Server Error", "ERROR").

    Create URLStatus.java and initialize the enum constants by using a constructor.

public enum URLStatus {

    HTTP_OK(200, "OK", "SUCCESS"), NO_CONTENT(204, "No Content", "SUCCESS"),
    MOVED_PERMANENTLY(301, "Moved Permanently", "SUCCESS"), NOT_MODIFIED(304, "Not modified", "SUCCESS"),
    USE_PROXY(305, "Use Proxy", "SUCCESS"), INTERNAL_SERVER_ERROR(500, "Internal Server Error", "ERROR"),
    NOT_FOUND(404, "Not Found", "ERROR");

    private int statusCode;
    private String httpMessage;
    private String result;

    public int getStatusCode() {
        return statusCode;
    }

    // Each constant passes its code, message, and result to this constructor.
    private URLStatus(int code, String message, String status) {
        statusCode = code;
        httpMessage = message;
        result = status;
    }

You define the set of HTTP response code values as constants inside the URLStatus enum type, and you initialize each constant through the constructor.

  • Retrieve the HTTP status message.

    public static String getStatusMessageForStatusCode(int httpcode) {
        String returnStatusMessage = "Status Not Defined";
        for (URLStatus object : URLStatus.values()) {
            if (object.statusCode == httpcode) {
                returnStatusMessage = object.httpMessage;
            }
        }
        return returnStatusMessage;
    }

    The getStatusMessageForStatusCode() method receives httpcode as the input parameter and compares it against every defined enum value. If an enum constant is defined for httpcode, the HTTP message for that code is returned; otherwise, "Status Not Defined" is returned.
  • Retrieve the result of the URL.

    public static String getResultForStatusCode(int code) {
        String returnResultMessage = "Result Not Defined";
        for (URLStatus object : URLStatus.values()) {
            if (object.statusCode == code) {
                returnResultMessage = object.result;
            }
        }
        return returnResultMessage;
    }
    } // closes the URLStatus enum

    The getResultForStatusCode() method receives code as the input parameter and compares it against every defined enum value. If an enum constant is defined for code, the result for that code is returned; otherwise, "Result Not Defined" is returned.
  • Review the code. Your code should look like the following:
  • Note: Here is an explanation of some of the HTTP response codes:

    • 200, OK: The client request was received, understood, and processed successfully.
    • 301, Moved Permanently: The location was moved, and you’re directed to the new location.
    • 500, Internal Server Error: An error occurred during execution.
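
Assembled from the fragments in this section, the enum might look like the following sketch. The constants and method names come from the tutorial. The final fields, the package-private visibility, and the URLStatusDemo class are assumptions made here so that the sketch compiles as a single file.

```java
// Sketch of the URLStatus enum, assembled from the fragments in this section.
enum URLStatus {

    HTTP_OK(200, "OK", "SUCCESS"),
    NO_CONTENT(204, "No Content", "SUCCESS"),
    MOVED_PERMANENTLY(301, "Moved Permanently", "SUCCESS"),
    NOT_MODIFIED(304, "Not modified", "SUCCESS"),
    USE_PROXY(305, "Use Proxy", "SUCCESS"),
    INTERNAL_SERVER_ERROR(500, "Internal Server Error", "ERROR"),
    NOT_FOUND(404, "Not Found", "ERROR");

    // Declared final here (an assumption) because they never change.
    private final int statusCode;
    private final String httpMessage;
    private final String result;

    URLStatus(int code, String message, String status) {
        statusCode = code;
        httpMessage = message;
        result = status;
    }

    public int getStatusCode() {
        return statusCode;
    }

    // Returns the HTTP message for a known code, or a default text.
    public static String getStatusMessageForStatusCode(int httpcode) {
        String returnStatusMessage = "Status Not Defined";
        for (URLStatus object : URLStatus.values()) {
            if (object.statusCode == httpcode) {
                returnStatusMessage = object.httpMessage;
            }
        }
        return returnStatusMessage;
    }

    // Returns "SUCCESS" or "ERROR" for a known code, or a default text.
    public static String getResultForStatusCode(int code) {
        String returnResultMessage = "Result Not Defined";
        for (URLStatus object : URLStatus.values()) {
            if (object.statusCode == code) {
                returnResultMessage = object.result;
            }
        }
        return returnResultMessage;
    }
}

// Hypothetical demo class, only here to exercise the enum.
class URLStatusDemo {
    public static void main(String[] args) {
        System.out.println(URLStatus.getStatusMessageForStatusCode(404)); // Not Found
        System.out.println(URLStatus.getResultForStatusCode(404));        // ERROR
        System.out.println(URLStatus.getStatusMessageForStatusCode(418)); // Status Not Defined
    }
}
```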

    Verifying and Validating URLs

    In this section, you verify and validate the URLs that are listed in the url-list.txt file. You check each URL for correct format by using the verifyUrl method, and then you validate the verified URLs by using the validateUrl method to find broken URLs. Add the url-list.txt file to the source package.

    Verifying the URLs

    In this section, you use a regular expression (the java.util.regex API in Java SE 8) to check the URL format. The HTTPClient.java file has a verifyUrl method, which accepts the URL as the input parameter.

      Import the following classes and add the verifyUrl method to the HTTPClient class:

    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class HTTPClient {

        private boolean verifyUrl(String url) {
            String urlRegex = "^(http|https)://[-a-zA-Z0-9+&@#/%?=~_|. ;]*[-a-zA-Z0-9+@#/%=&_|]";
            Pattern pattern = Pattern.compile(urlRegex);
            Matcher m = pattern.matcher(url);
            // matches() requires the entire string to fit the pattern.
            if (m.matches()) {
                return true;
            } else {
                return false;
            }
        }

    The verifyUrl method verifies the url parameter passed as the input parameter by matching it with the regular expression. If the match is successful, then it returns true; otherwise, it returns false.
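
As a hedged illustration, the format check can be tried on its own. The sketch below copies the pattern from the verifyUrl method as it appears in this tutorial and compiles it once as a constant. The class name UrlFormatCheck and the sample URLs are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical standalone class; in the tutorial, verifyUrl lives in HTTPClient.
class UrlFormatCheck {

    // Pattern copied from the tutorial's verifyUrl method; compiled once and reused.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "^(http|https)://[-a-zA-Z0-9+&@#/%?=~_|. ;]*[-a-zA-Z0-9+@#/%=&_|]");

    static boolean verifyUrl(String url) {
        // matches() succeeds only if the whole string fits the pattern.
        return URL_PATTERN.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://www.oracle.com/java", // accepted: http/https scheme, allowed characters
            "ftp://example.com/file.txt",  // rejected: scheme is neither http nor https
            "www.example.com"              // rejected: no scheme at all
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + verifyUrl(sample));
        }
    }
}
```

Compiling the pattern once avoids recompiling it for every URL in the list; the matching behavior is the same as in the tutorial's method.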

  • Review the code. Your code should look like the following:
    Validating the URLs

    In this section, you validate the URLs listed in the url-list.txt file.

    1. Modify HTTPClient.java.
      1. Declare the following variables:

        private String failedURLS = "";
        private String succeededURLS = "";
        private String incorrectURLS = "";

      2. Retrieve the URLs from the url-list.txt file.

        public void validateUrl() throws Exception {
            Path filePath = Paths.get("src/url-list.txt");
            List<String> myURLArrayList = Files.readAllLines(filePath);

        You retrieve the file location by using the get method of the Paths class. Using the readAllLines method, you read the URLs in filePath into a List of type String.

      3. Invoke the verifyUrl method.

        myURLArrayList.forEach((String url) -> {
            if (verifyUrl(url)) {
                try {

        Here, you're using the forEach method with a lambda expression. The loop retrieves each URL from myURLArrayList and passes it as an input parameter to the verifyUrl method, which returns true for a valid URL format and false otherwise. If verifyUrl returns true, the if condition is true and the code in the try block is executed.

      4. Create the HttpURLConnection connection.

        URL myURL = new URL(url);
        HttpURLConnection myConnection = (HttpURLConnection) myURL.openConnection();

        The openConnection method returns a connection to the resource that myURL points to. You cast it to HttpURLConnection, which is safe here because verifyUrl accepts only http and https URLs.

      5. Validate the URL with the response code.

        if (myConnection.getResponseCode() == URLStatus.HTTP_OK.getStatusCode()) {
            succeededURLS = succeededURLS + "\n" + url + "****** Status message is : "
                    + URLStatus.getStatusMessageForStatusCode(myConnection.getResponseCode());
        } else {
            failedURLS = failedURLS + "\n" + url + "****** Status message is : "
                    + URLStatus.getStatusMessageForStatusCode(myConnection.getResponseCode());
        }

        The myConnection instance receives the URL's response code and checks the status. If the status code is 200 (HTTP_OK), then the URL is added to succeededURLS; otherwise, it's added to failedURLS.

      6. Close try with the catch block.

        } catch (Exception e) {
            System.out.print("For url- " + url + "" + e.getMessage());
        }

        The catch block is executed when an exception is thrown while HttpURLConnection is created and opened.

      7. Collect the incorrect URLs.

            } else {
                incorrectURLS += "\n" + url;
            }
        }); // closes the lambda body and the forEach call
        }   // closes the validateUrl method

        The else block is executed when the verifyUrl method returns false because the URL format check failed.
    2. Review the code. Your code should look like the following:
    3. Add the following code to the main() method in the HTTPClient.java file:

      public static void main(String[] args) {
          try {
              HTTPClient myClient = new HTTPClient();
              myClient.validateUrl();
              System.out.println("Valid URLS that have successfully connected :");
              System.out.println(myClient.succeededURLS);
              System.out.println("\n---------------\n\n");
              System.out.println("Broken URLS that did not successfully connect :");
              System.out.println(myClient.failedURLS);
          } catch (Exception e) {
              System.out.print(e.getMessage());
          }
      }
      } // closes the HTTPClient class

      The main() method creates an instance of HTTPClient named myClient and uses it to invoke the validateUrl method. It then prints to the console the valid URLs that connected and the broken URLs that did not connect, each with its status message.

    4. Review the code. Your code should look like the following:
    5. On the Projects tab, right-click HTTPClient.java and select Run File.
    6. Review the set of URLs displayed in the url-list.txt file.

      For the given set of URLs, the application retrieves each URL, verifies it, validates it, and classifies it accordingly.

      Verify the output.

      You successfully used the URLs listed in the url-list.txt file and classified them as valid URLs or broken URLs. The status codes of the broken URLs are displayed in the console.

      By running this application and adding URLs to the url-list.txt file, the testing team can use the link checker functionality to verify and validate URLs.
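
The validation flow described in this section can be condensed into a sketch like the one below. It is not the tutorial's HTTPClient class: the class name LinkCheckSketch, the helper methods, the HEAD request, and the 5-second timeouts are assumptions added for illustration. Only the rule that a 200 response counts as success is taken from the tutorial.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical condensed variant of the tutorial's link checker.
class LinkCheckSketch {

    // Reads one URL per line, trimming whitespace and skipping blank lines.
    static List<String> readUrls(Path file) throws IOException {
        List<String> urls = new ArrayList<>();
        for (String line : Files.readAllLines(file)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                urls.add(trimmed);
            }
        }
        return urls;
    }

    // Sends a HEAD request and returns the HTTP response code.
    // HEAD and the timeout values are assumptions, not taken from the tutorial.
    static int fetchStatus(String url) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(url).openConnection();
        try {
            connection.setRequestMethod("HEAD");
            connection.setConnectTimeout(5000);
            connection.setReadTimeout(5000);
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    // Mirrors the tutorial's rule: only 200 (HTTP_OK) counts as success.
    static boolean isSuccess(int responseCode) {
        return responseCode == HttpURLConnection.HTTP_OK;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "src/url-list.txt");
        for (String url : readUrls(file)) {
            try {
                int code = fetchStatus(url);
                String label = isSuccess(code) ? "OK     " : "BROKEN ";
                System.out.println(label + url + " (" + code + ")");
            } catch (Exception e) {
                // Covers malformed URLs, non-HTTP URLs, and network failures.
                System.out.println("ERROR  " + url + " (" + e.getMessage() + ")");
            }
        }
    }
}
```

Setting timeouts keeps one unreachable host from stalling the whole run. Some servers answer HEAD differently from GET, so a GET request, as in the tutorial, may be the safer choice when results look inconsistent.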

      Note: When you run this application on the Oracle network, some URLs may be blocked, and a connection timeout error is displayed.

      Summary

      In this tutorial, you learned how to create a Java SE project. You also learned how to use the URL and HttpURLConnection classes.

      Resources

      • To learn more about HttpURLConnection in Java, see Java SE docs: HttpURLConnection.
      • To learn more about URL in Java, see Java SE docs: URL.
      • To learn more about Java SE, refer to additional OBEs in the Oracle Learning Library.
      • For more information about HTTP status codes.

      Validate a URL in Java

      This post covers various methods to validate a URL in Java.

      1. Using Apache Commons Validator

      The Apache Commons Validator package contains several standard validation routines. We can use its UrlValidator class, which validates a URL by checking the scheme, authority, path, query, and fragment.

      Output:

      The URL https://www.techiedelight.com/ is valid

      We can also specify the valid schemes to be used in validating in addition to or instead of the default values (HTTP, HTTPS, FTP).

      We can also change the UrlValidator parsing rules by passing any of the following options to the validator.

      1. The ALLOW_2_SLASHES option allows two slashes in the path component of the URL.
      2. The ALLOW_ALL_SCHEMES option allows all validly formatted schemes to pass validation instead of supplying a set of valid schemes.
      3. The ALLOW_LOCAL_URLS option allows local URLs, such as http://localhost/.
      4. The NO_FRAGMENTS option disallows any URL fragments.

      The following program demonstrates it:

      2. Using Regular Expression provided by OWASP

      We can also use the OWASP validation regex, which is generally considered safe for this purpose. We can use the following regular expression to check for a valid URL.

      The ESAPI validation routine, which uses the following regular expression, can also be used.

      Output:

      The URL https://www.techiedelight.com/ is valid

      3. Using java.net.URL

      We can also use the java.net.URL class to validate a URL. The idea is to create a URL object from the specified string representation. A MalformedURLException is thrown if no protocol is specified, an unknown protocol is found, or spec is null. Then we call the toURI() method, which throws a URISyntaxException if the URL is not formatted strictly according to RFC 2396 and cannot be converted to a URI.
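
A minimal sketch of this approach follows. The class name UrlSyntaxCheck and the isValid helper are hypothetical; only new URL(String) and toURI() come from the description above. The check is purely syntactic and does not contact the server.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper class illustrating the java.net.URL approach.
class UrlSyntaxCheck {

    // Returns true if the string parses as a URL and converts to a URI.
    static boolean isValid(String spec) {
        try {
            new URL(spec).toURI();
            return true;
        } catch (MalformedURLException | URISyntaxException e) {
            // Missing or unknown protocol, or characters a URI does not allow.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("https://www.techiedelight.com/")); // true
        System.out.println(isValid("www.techiedelight.com"));          // false: no protocol
        System.out.println(isValid("https://example.com/a b"));        // false: space in path
    }
}
```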
