Python sql from greenplum

how to play plython3 with gpdb6

We recently released GreenplumPython, a Python library that allows users to interact with Greenplum or PostgreSQL in a Pythonic way. GreenplumPython provides a pandas-like table API that is familiar and intuitive to Python users. GreenplumPython is making it powerful for performing complex analyses such as statistical analysis with UDFs and UDAs. It encapsulates common best practices and avoids common pitfalls in Greenplum compared to writing SQL directly.

However, for most users still using gpdb6 and plpython2, they will need to install plpython3. This article describes how to install plpython3 on gpdb6.

For commercial Greenplum users, plpython3 is shipped with the Greenplum Database package since 6.22.0 . Please refer to PL/Python Language on how to use it.

Prerequisites

Greenplum Database (GPDB) is an open-source data warehousing and analytics platform based on PostgreSQL. It supports several procedural languages, including SQL, C, Perl, and Python, among others. It uses Python2 by default for the Python procedural language.

Follow this section to build and install plpython for your Greenplum server.

Before we start, make sure you have the following software components installed on your system:

  • GPDB6 source code: You can download and install the latest gpdb6 version of using git clone https://github.com/greenplum-db/gpdb.git then git checkout 6X_STABLE ;
  • Python3: You should have Python 3 installed on your system and make sure it’s accessible from the PATH environment variable. Using python3 —version or python —version to check python version to make sure that python version higher than 3.9;
    Since there may be not have default python3.9 , we need to install python3.9 using the following steps.
sudo yum install gcc openssl-devel bzip2-devel libffi-devel zlib-devel # requirements wget https://www.python.org/ftp/python/3.9.13/Python-3.9.13.tgz tar -xvf Python-3.9.13.tgz cd Python-3.9.13 ./configure --enable-optimizations make && make install python3.9 --version 
sudo apt update sudo apt install software-properties-common sudo add-apt-repository ppa:deadsnakes/ppa sudo apt install python3.9-dev python3.9 --version 
  • You may need to install python-devel in your env;
  • GCC and Make: You will need to have the GCC compiler and the Make utility installed in order to compile the PL/Python3 source.
Читайте также:  Строка комментарий в питоне

Build and install PL/Python3 with Greenplum Database 6

PYTHON=/path/to/python3 ./configure --with-perl --with-python --with-libxml --prefix=gpdb6_path 

Install PL/Python3 to an existing Greenplum Database 6 cluster

# gpdb project should be configured with the same --prefix param. cd gpdb_src/pl/plpython source path_to_installed_greenplum/greenplum_path.sh PYTHON=/path/to/python3 make -j8 && make install 

Testing

Once you have compiled and installed PL/Python3, please follow these steps to use it in GPDB:

Connect to GPDB: Connect to GPDB as a superuser and navigate to the database where you want to install PL/Python3.

psql postgres CREATE LANGUAGE plpython3u; 

Then validate the installation: You can run a simple test to validate the installation of PL/Python3 by creating a function that returns the current Python version. For example:

psql postgres CREATE FUNCTION test_plpython3_version() RETURNS text AS $$ import sys return str(sys.version_info) $$ language plpython3u; select test_plpython3_version(); 

This should return the current Python version used by PL/Python3, and you have successfully installed and tested PL/Python3 with Greenplum Database 6.

Frequently Asked Questions

Q: ERROR: could not access file “$libdir/plpython3”: No such file or directory
A: Use the ‘–prefiix’ arguments in the ‘.configure’ command line, it specifies where ‘make install’ copies files to plpython3u requires following files to be installed in the $GPHOME

├── lib │ └── postgresql │ └── plpython3.so └── share └── postgresql └── extension ├── plpython3u--1.0.sql ├── plpython3u.control └── plpython3u--unpackaged--1.0.sql 

Q: Error: could not load library “/usr/local/greenplum-db-6.23.0/lib/postgresql/plpython3.so”: libpython3.9.so.1.0: cannot open shared object file: No such file or directory (dfmgr.c:240)
A: please find where libpython3.9.so locate then

export LD_LIBRARY_PATH="path_to_libpython3.9.so:$LD_LIBRARY_PATH" export PATH="path_to_python3.9/bin:$PATH" 

Q: I use pip installed some packages like numpy but in plpython3u it raise this error

postgres=# CREATE OR REPLACE FUNCTION test_import() RETURNS text AS $$ import sys import numpy return str(numpy.__file__) $$ language plpython3u; select test_import(); ERROR: ModuleNotFoundError: No module named 'numpy' (plpy_elog.c:121) CONTEXT: Traceback (most recent call last): PL/Python function "test_import", line 3, in module> import numpy PL/Python function "test_import" 

A: You can use GUC to set custom location then ensure that you can import the package

python -m pip install --prefix=/home/gpadmin/my_python dill $ psql -d testdb testdb=# load 'plpython3'; testdb=# SET plpython3.python_path='/home/gpadmin/my_python'; 

Ensure that you configure plpython3.python_path before you create or call plpython3 functions in a session. If you set or change the parameter after plpython3u is initialized you receive the error:

ERROR: SET PYTHONPATH failed, the GUC value can only be changed before initializing the python interpreter. 

To set a default value for the configuration parameter, use gpconfig instead:

gpconfig -c plpython3.python_path \ -v "'/home/gpadmin/my_python'" \ --skipvalidation gpstop -u 

Q: Can I use a different version of Python with PL/Python3 both python2 and python3?
A: Yes, plpython2 and plpython3 can co-exist, but they cannot be used in the same session.

Читайте также:  Css стиль для textarea

Conclusion

In this article, we have shown you how to compile and install the PL/Python3 language extension for Greenplum Database 6. PL/Python3 is a powerful procedural language that allows you to write complex analytics functions in Python and execute them within the Greenplum Database, with plpython3 installed you can use GreenplumPython now.

If you encounter any other problems, you post a problem on Greenplumpython and we will get back to you as soon as possible.

Источник

Use SQLAlchemy ORMs to Access Greenplum Data in Python

Download a free trial: Download Now Learn more: Greenplum Python Connector Python Connector Libraries for Greenplum Data Connectivity. Integrate Greenplum with popular Python tools like Pandas, SQLAlchemy, Dash & petl.

The CData Python Connector for Greenplum enables you to create Python applications and scripts that use SQLAlchemy Object-Relational Mappings of Greenplum data.

The rich ecosystem of Python modules lets you get to work quickly and integrate your systems effectively. With the CData Python Connector for Greenplum and the SQLAlchemy toolkit, you can build Greenplum-connected Python applications and scripts. This article shows how to use SQLAlchemy to connect to Greenplum data to query, update, delete, and insert Greenplum data.

With built-in optimized data processing, the CData Python Connector offers unmatched performance for interacting with live Greenplum data in Python. When you issue complex SQL queries from Greenplum, the CData Connector pushes supported SQL operations, like filters and aggregations, directly to Greenplum and utilizes the embedded SQL engine to process unsupported operations client-side (often SQL functions and JOIN operations).

Connecting to Greenplum Data

Connecting to Greenplum data looks just like connecting to any relational data source. Create a connection string using the required connection properties. For this article, you will pass the connection string as a parameter to the create_engine function.

To connect to Greenplum, set the Server, Port (the default port is 5432), and Database connection properties and set the User and Password you wish to use to authenticate to the server. If the Database property is not specified, the default database for the authenticate user is used.

Follow the procedure below to install SQLAlchemy and start accessing Greenplum through Python objects.

Install Required Modules

Use the pip utility to install the SQLAlchemy toolkit:

Be sure to import the module with the following:

Model Greenplum Data in Python

You can now connect with a connection string. Use the create_engine function to create an Engine for working with Greenplum data.

Читайте также:  Хаки

NOTE: Users should URL encode the any connection string properties that include special characters. For more information, refer to the SQL Alchemy documentation.

Declare a Mapping Class for Greenplum Data

After establishing the connection, declare a mapping class for the table you wish to model in the ORM (in this article, we will model the Orders table). Use the sqlalchemy.ext.declarative.declarative_base function and create a new class with some or all of the fields (columns) defined.

base = declarative_base() class Orders(base): __tablename__ = «Orders» Freight = Column(String,primary_key=True) ShipName = Column(String) .

Query Greenplum Data

With the mapping class prepared, you can use a session object to query the data source. After binding the Engine to the session, provide the mapping class to the session query method.

Using the query Method

engine = create_engine(«greenplum:///?User=user&Password=admin&Database=dbname&Server=127.0.0.1&Port=5432″) factory = sessionmaker(bind=engine) session = factory() for instance in session.query(Orders).filter_by(ShipCountry=»USA»): print(«Freight: «, instance.Freight) print(«ShipName: «, instance.ShipName) print(«———«)

Alternatively, you can use the execute method with the appropriate table object. The code below works with an active session .

Using the execute Method

Orders_table = Orders.metadata.tables[«Orders»] for instance in session.execute(Orders_table.select().where(Orders_table.c.ShipCountry == «USA»)): print(«Freight: «, instance.Freight) print(«ShipName: «, instance.ShipName) print(«———«)

For examples of more complex querying, including JOINs, aggregations, limits, and more, refer to the Help documentation for the extension.

Insert Greenplum Data

To insert Greenplum data, define an instance of the mapped class and add it to the active session . Call the commit function on the session to push all added instances to Greenplum.

new_rec = Orders(Freight=»placeholder», ShipCountry=»USA») session.add(new_rec) session.commit()

Update Greenplum Data

To update Greenplum data, fetch the desired record(s) with a filter query. Then, modify the values of the fields and call the commit function on the session to push the modified record to Greenplum.

updated_rec = session.query(Orders).filter_by(SOME_ID_COLUMN=»SOME_ID_VALUE»).first() updated_rec.ShipCountry = «USA» session.commit()

Delete Greenplum Data

To delete Greenplum data, fetch the desired record(s) with a filter query. Then delete the record with the active session and call the commit function on the session to perform the delete operation on the provided records (rows).

deleted_rec = session.query(Orders).filter_by(SOME_ID_COLUMN=»SOME_ID_VALUE»).first() session.delete(deleted_rec) session.commit()

Free Trial & More Information

Download a free, 30-day trial of the Greenplum Python Connector to start building Python apps and scripts with connectivity to Greenplum data. Reach out to our Support Team if you have any questions.

CData Software is a leading provider of data access and connectivity solutions. Our standards-based connectors streamline data access and insulate customers from the complexities of integrating with on-premise or cloud databases, SaaS, APIs, NoSQL, and Big Data.

Источник

Оцените статью