Test if json python

Determine if a file is either in JSON or XML format

The purpose of this function is to determine if report_input_file is either in JSON format or XML format, this is what I came up with, just want to know if this is best practice / pythonic way of achieving this?

def parse_report_file(self, report_input_file): """ Checks if input report file is of type json or xml """ parsed_report_file = None try: parsed_report_file = self._parse_json_file(report_input_file) self.is_json_report_file = True except Exception, e: parsed_report_file = self._parse_xml_file(report_input_file) self.is_json_report_file = False return parsed_report_file 

\$\begingroup\$ As far as I know, it’s enough to check if(s.startswith(‘<') where s is the first in your file. An XML document entity (in common parlance, an XML document) must start with < . \$\endgroup\$

\$\begingroup\$ @Dex’ter that’s a good point, but I will I guess always need the try-catch for cases where the report_input_file is on neither of these two formats. \$\endgroup\$

\$\begingroup\$ @JonB is the purpose to check if they’re intended to be JSON/XML, or if they’re valid JSON/XML? The two have to potential to diverge wildly. I also imagine that you’d be able to identify, for at least some files, that they appear to be invalid JSON as opposed to invalid XML, but that might be more than what you want. \$\endgroup\$

4 Answers 4

Well, if you only need to distinguish between JSON and XML formats I’d keep it simple. If the encoding of that string is known to you (or is ASCII or UTF), then looking at the very first char in the file should be enough. Your code looks pythonic to me, but I’d rather do it this way:

def parse_report_file(report_input_file): with open(report_input_file) as unknown_file: c = unknown_file.read(1) if c != ' 

While it is legal for JSON data structures to begin with null , true , false you can avoid those situations if you already know a bit about your data structures.

If you also need to distinguish between JSON , XML and other random file, you should simply strip any whitespace before you look at the "first" char.

\$\begingroup\$ @Dancrumb But if the invalid XML is parsed as JSON, it will still be flagged as invalid. \$\endgroup\$

\$\begingroup\$ @Dancrumb An invalid XML that is identified as JSON isn't a big deal though. Invalid is invalid and will get rejected when parsing the JSON. If there's a problem with this, it's that it ignores whitespace, and I think that's a pretty minor problem that would be beyond the scope of this review anyway. \$\endgroup\$

\$\begingroup\$ Right, but the JSON parser's error message will refer to the invalidity of the file wrt the JSON spec, not the XML spec. This makes things harder to debug for users of this function. \$\endgroup\$

\$\begingroup\$ It's not true that "invalid is invalid". Parsers report the reason for the invalidity. Applying a parser implies a set of expectations around the format of the file and allows the parser to report when those expectations were missed. If you apply the wrong parser to a file, you'll get reports about irrelevant expectations, making debugging unnecessarily harder. \$\endgroup\$

Go the way you did already. Taking a cheaper approach like testing the first (non whitespace) character might work on input that always succeeds on parsing. But as soon as you encounter an invalid file you're stuck. You probably won't get the correct error messages.

Implementing a parser for XML and / or JSON with regex only will not work. You will have several edge cases where you achieve false positives or false negatives (e.g. XML recognized as JSON and vice versa). You're parser won't be as efficient as the libraries you're using to handle these formats. You will achieve less performance AND more bugs. Just don't do it!

About the python style guide thing I can't judge as I'm no python guru.

\$\begingroup\$ Listen to this advice, don't Implement your own xml/html/json parser in regex unless you are a masochist. \$\endgroup\$

As Dexter said, you could keep it simple and just look at the first character. The flaw being noted with that is it might return "Is JSON" if it's malformed XML and vice-versa, so if you wanted, you could return 1 of three options. So to expand a bit and give you something to build upon. You could use regular expressions to match the first and last characters (and build from here for anything in between).

import 're' def parse_report_file(report_input_file): with open(report_input_file, 'r') as unknown_file: # Remove tabs, spaces, and new lines when reading data = re.sub(r'\s+', '', unknown_file.read()) if (re.match(r'^<.+>$', data)): return 'Is XML' if (re.match(r'^(<|[).+(>|])$', data)): return 'Is JSON' return 'Is INVALID' 

The second regex isn't matching the same opening braces with closing braces, so you could do that there. You could also read the entire file in without removing the spaces and then ignore any number of beginning and ending spaces when doing additional regex matches, but I did it this way for a little more readability.

Anyway, just another example. Definitely not the most performant but showing you some options.

\$\begingroup\$ Regex are not for parsing. Going the way you suggest just leads to the same problems which the "simpler" or at least more light weight suggestion suffers from. Reimplemeting the whole parser for JSON is one thing. But XML? This just leads to awful maintainability. And your performance won't be as good as you think at all. There is a reason why pre-built, highly optimized parsing libraries do exist at all. \$\endgroup\$

\$\begingroup\$ I commented that the performance would not be good, but that I was showing a regex option. And though the name of the function begins with parse, I'm not sure the op's question was to completely determine whether the entire file was valid. I'm not going to write an entire parser in my answer, but showing the possibility to match some things with regex. I already noted in the answer both things you bring up in the comment. \$\endgroup\$

\$\begingroup\$ I just wanted to express another clear warning to not go that path as it will lead to the problems you already stated. Not having a full parser is (in my opinion) a bug. You don't know when it fails. But when it does you're f***ed. I didn't want to offend you! \$\endgroup\$

\$\begingroup\$ Not offended at all. I just think that the op is not looking for a full parsing solution, as judging by the other answers/comments, it will fail up if invalid. \$\endgroup\$

I'll assume that you want to actually check these are valid xml or json documents (or neither).

I see no issue with your method structure, apart from catching Exception which is probably too general. You either want to be catching something more specific like ValueError or custom error for your types.

I'm not a fan of the other suggestions, just checking the first character in the file, nor using regex to strip whitespace then look for a certain variety of brackets. There are good modules in the standard library available that will parse JSON or XML, and do the work for you, for example json and ElementTree.

Here's a python 2 / 3 compatible version of how you could fit those modules into your existing class.

I've done it fairly simply. Some people prefer not to do IO or slow things in the object __init__ but if your class is really an abstract representation of a json/xml document then it is the perfect situation to use @property for getter-only attributes and set them in the parse_report_file method.

To take this further of course it needs some unit tests, but I've done a quick test by adding some things to the entry point. It's expecting 3 files, here's the contents if you want to try.

This file is really empty but it could just be anything that is not valid xml or json. 
from __future__ import print_function import xml.etree.cElementTree as ET import json class ReportDocument(object): """ ReportDocument is an abstraction of a JSON, XML or neither file type. """ def __init__(self, report_input_file=None, *args, **kwargs): self._is_json_report_file=False self._is_xml_report_file=False self._parsed_report_file=None if report_input_file is not None: self.parse_report_file(report_input_file) @property def is_json_report_file(self): return self._is_json_report_file @property def is_xml_report_file(self): return self._is_xml_report_file def _parse_json_file(self,report_input_file): """ Given a text file, returns a JSON object or None """ with open(report_input_file,'rb') as in_file: return json.load(in_file) def _parse_xml_file(self,report_input_file): """ Given a text file, returns an XML element or None """ with open(report_input_file,'rb') as in_file: return ET.parse(in_file) def parse_report_file(self, report_input_file): """ Checks if input report file is of type json or xml returns a json object or an xml element or None """ try: self._parsed_report_file = self._parse_xml_file(report_input_file) self._is_xml_report_file = True except ET.ParseError: try: self._parsed_report_file = self._parse_json_file(report_input_file) self._is_json_report_file = True except ValueError: # or in Python3 the specific error json.decoder.JSONDecodeError pass if __name__ == '__main__': files_to_test = ['real.xml', 'real.json', 'blank.txt'] for file_type in files_to_test: test_class = ReportDocument(file_type) print(": \n\t is_json: \n\t is_xml:".format( file_type=file_type, is_json_report_file=test_class.is_json_report_file, is_xml_report_file = test_class.is_xml_report_file) ) 

Источник

How to validate JSON data in Python

In this post, we are going to learn how to validate JSON data in Python. We first check if a given string is valid JSON as per standard convention format. In Python that can be achieved by using JSON module built-in methods loads() which checks if a string is valid JSON or load() to validate the contents of the file are valid JSON. The JSON. Tool module is also used to validate a file by the command line. We will understand each of them will python code examples.

Methods to check string or file is valid JSON

  • json.loads() : check if a string is valid json
  • json.load() : This method to validate a string from a file.

Note : The json.loads() or json.load() method throw valuerror,if given string is not valid json

1. Python Check if a string is valid json

To check if the string is valid JSON in Python, We will use the JSON module that is imported using “import JSON“, First need to make sure the JSON module is installed or we can install it using “pip install jsonlib-python3”.

  • Inside the try/exception block, we have used JSON.loads() method to parses the given string.
  • if the string is valid JSON then it parses the JSON.s
  • When a string is not valid JSON then it raises ValueError, and the exception block will execute.
import json json_string = '< "Empname":"John", "age":30, "Salary":10000,"Dept":"IT">' try: json = json.loads(json_string) print('valid json string') except ValueError as error: print("invalid json string:\n",error)

3. Python validate json file command line

To validate a file content is valid JSON instead of directly parsing it. we can check it before parsing by using the command line with the help of the following JSON.tool module command “python -m json.tool filename”.

C:\Users\Admin\Desktop\Python_program>python -m json.tool data.json

Summary

In this post, we have learned How to validate JSON data in Python by using JSON and JSOm.tool modules. We have checked if a string is valid JSON, Python validates JSON file, Python validates JSON file command line.

Источник

Читайте также:  TinyMCE PHP Editor
Оцените статью