Auto feature engineering python

An open source python library for automated feature engineering

alteryx/featuretools

"One of the holy grails of machine learning is to automate more and more of the feature engineering process." - Pedro Domingos, A Few Useful Things to Know about Machine Learning

Featuretools is a Python library for automated feature engineering. See the documentation for more information.

Install with pip:

python -m pip install featuretools 

or from the conda-forge channel with conda:

conda install -c conda-forge featuretools 

You can install add-ons individually or all at once by running:

python -m pip install "featuretools[complete]"

Update checker: receive automatic notifications of new Featuretools releases

python -m pip install "featuretools[updater]"

Premium Primitives: use Premium Primitives, including Natural Language Processing primitives

python -m pip install "featuretools[premium]"

TSFresh Primitives: use 60+ primitives from tsfresh within Featuretools

python -m pip install "featuretools[tsfresh]"

Dask Support: use Dask DataFrames to create EntitySets or run DFS with n_jobs > 1

python -m pip install "featuretools[dask]"

SQL: automatic EntitySet generation from relational data stored in a SQL database

python -m pip install "featuretools[sql]"

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.

>>> import featuretools as ft
>>> es = ft.demo.load_mock_customer(return_entityset=True)
>>> es.plot()

Featuretools can automatically create a single table of features for any "target dataframe":

>>> feature_matrix, features_defs = ft.dfs(entityset=es, target_dataframe_name="customers")
>>> feature_matrix.head(5)
             zip_code  COUNT(transactions)  COUNT(sessions)  SUM(transactions.amount)  MODE(sessions.device)  \
customer_id
1               60091                  131               10                  10236.77                desktop
2               02139                  122                8                   9118.81                 mobile
3               02139                   78                5                   5758.24                desktop
4               60091                  111                8                   8205.28                desktop
5               02139                   58                4                   4571.37                 tablet

             MIN(transactions.amount)  MAX(transactions.amount)  YEAR(join_date)  SKEW(transactions.amount)  DAY(join_date)  ...  \
customer_id
1                                5.60                    149.95             2008                   0.070041               1  ...
2                                5.81                    149.15             2008                   0.028647              20  ...
3                                6.78                    147.73             2008                   0.070814              10  ...
4                                5.73                    149.56             2008                   0.087986              30  ...
5                                5.91                    148.17             2008                   0.085883              19  ...

             SUM(sessions.MIN(transactions.amount))  MAX(sessions.SKEW(transactions.amount))  MAX(sessions.MIN(transactions.amount))  SUM(sessions.MEAN(transactions.amount))  STD(sessions.SUM(transactions.amount))  \
customer_id
1                                            169.77                                 0.610052                                   41.95                               791.976505                              175.939423
2                                            114.85                                 0.492531                                   42.96                               596.243506                              230.333502
3                                             64.98                                 0.645728                                   21.77                               369.770121                              471.048551
4                                             83.53                                 0.516262                                   17.27                               584.673126                              322.883448
5                                             73.09                                 0.830112                                   27.46                               313.448942                              198.522508

             STD(sessions.MEAN(transactions.amount))  SKEW(sessions.MEAN(transactions.amount))  STD(sessions.MAX(transactions.amount))  NUM_UNIQUE(sessions.DAY(session_start))  MIN(sessions.SKEW(transactions.amount))
customer_id
1                                           9.299023                                 -0.377150                                5.857976                                        1                                -0.395358
2                                          10.925037                                  0.962350                                7.420480                                        1                                -0.470007
3                                           9.819148                                 -0.244976                               12.537259                                        1                                -0.630425
4                                          13.065436                                 -0.548969                               12.738488                                        1                                -0.497169
5                                           8.950528                                  0.098885                                5.599228                                        1                                -0.396571

[5 rows x 69 columns]

We now have a feature vector for each customer that can be used for machine learning. See the documentation on Deep Feature Synthesis for more examples.
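The column names in the feature matrix read from the inside out. For example, MAX(sessions.MIN(transactions.amount)) is the largest, across a customer's sessions, of each session's smallest transaction amount. The plain Python sketch below reproduces that stacking by hand on a tiny made-up dataset. It is only an illustration of what the names mean, not how Featuretools computes them; the `group_by` helper and the sample rows are assumptions introduced for this example.

```python
from collections import defaultdict

# Toy data, invented for illustration: each transaction belongs to a session,
# and each session belongs to a customer.
sessions = {1: {"customer_id": 1}, 2: {"customer_id": 1}, 3: {"customer_id": 2}}
transactions = [
    {"session_id": 1, "amount": 10.0},
    {"session_id": 1, "amount": 30.0},
    {"session_id": 2, "amount": 5.0},
    {"session_id": 3, "amount": 20.0},
    {"session_id": 3, "amount": 40.0},
]


def group_by(rows, key_fn, value_fn):
    """Hypothetical helper: collect value_fn(row) into lists keyed by key_fn(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(value_fn(row))
    return dict(groups)


# Depth 1, per session: MIN(transactions.amount)
amounts_by_session = group_by(
    transactions, lambda t: t["session_id"], lambda t: t["amount"]
)
session_min = {sid: min(vals) for sid, vals in amounts_by_session.items()}

# Depth 2, per customer: MAX(sessions.MIN(transactions.amount))
mins_by_customer = group_by(
    session_min.items(),
    lambda item: sessions[item[0]]["customer_id"],
    lambda item: item[1],
)
max_of_session_min = {cid: max(vals) for cid, vals in mins_by_customer.items()}

# Per customer, aggregating transactions directly:
# COUNT(transactions) and SUM(transactions.amount)
amounts_by_customer = group_by(
    transactions,
    lambda t: sessions[t["session_id"]]["customer_id"],
    lambda t: t["amount"],
)
count_transactions = {cid: len(vals) for cid, vals in amounts_by_customer.items()}
sum_amount = {cid: sum(vals) for cid, vals in amounts_by_customer.items()}

print(max_of_session_min, count_transactions, sum_amount)
```

Each extra level of nesting in a feature name corresponds to one more pass of this group-and-reduce step along a relationship between tables.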

Featuretools contains many different types of built-in primitives for creating features. If the primitive you need is not included, Featuretools also allows you to define your own custom primitives.
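As a rough mental model, a transform primitive produces one value per input row (as in YEAR(join_date)), while an aggregation primitive reduces a group of related values to a single value (as in SUM(transactions.amount)). The sketch below expresses both kinds as plain Python functions and adds one custom aggregation. It deliberately does not use the Featuretools primitive classes, so consult the documentation for the real API; `value_range`, the `aggregations` dictionary, and the sample values are hypothetical.

```python
from datetime import date
from statistics import mean


# Transform-style primitives: one output value per input value.
def year(values):
    return [d.year for d in values]


def day(values):
    return [d.day for d in values]


# Aggregation-style primitives: many values in, one value out.
# These are toy stand-ins named after common aggregations.
aggregations = {"SUM": sum, "MEAN": mean, "MAX": max, "MIN": min}


# A hypothetical custom aggregation that the toy registry lacks.
def value_range(values):
    return max(values) - min(values)


aggregations["RANGE"] = value_range

join_dates = [date(2008, 1, 1), date(2008, 2, 20)]
amounts = [5, 150, 20]

transformed = {"YEAR": year(join_dates), "DAY": day(join_dates)}
aggregated = {name: fn(amounts) for name, fn in aggregations.items()}

print(transformed)
print(aggregated)
```

Once registered, the custom function is applied in exactly the same way as the others, which is the property that makes user-defined primitives composable with built-in ones.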

Predict Next Purchase

In this demonstration, we use a multi-table dataset of 3 million online grocery orders from Instacart to predict what a customer will buy next. We show how to generate features with automated feature engineering and build an accurate machine learning pipeline using Featuretools, which can be reused for multiple prediction problems. For more advanced users, we show how to scale that pipeline to a large dataset using Dask.

For more examples of how to use Featuretools, check out our demos page.

The Featuretools community welcomes pull requests. Instructions for testing and development are available here.

The Featuretools community is happy to provide support to users of Featuretools. Project support can be found in four places depending on the type of question:

  1. For usage questions, use Stack Overflow with the featuretools tag.
  2. For bugs, issues, or feature requests, start a GitHub issue.
  3. For discussion regarding development on the core library, use Slack.
  4. For everything else, the core developers can be reached by email at open_source_support@alteryx.com

If you use Featuretools, please consider citing the following paper:

@inproceedings{kanter2015deep,
  author = {James Max Kanter and Kalyan Veeramachaneni},
  title = {Deep feature synthesis: Towards automating data science endeavors},
  booktitle = {2015 International Conference on Data Science and Advanced Analytics, DSAA 2015, Paris, France, October 19-21, 2015},
  pages = {1--10},
  year = {2015},
  organization = {IEEE}
}

Featuretools is an open source project maintained by Alteryx. To see the other open source projects we're working on, visit Alteryx Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.

Automated Feature Engineering in Python

A guide to augmenting your dataset with new and informative features using Upgini

One of the most vital skills of any data scientist or machine learning professional is the ability to extract deeper and more meaningful features from a given dataset. This practice, known as feature engineering, is perhaps one of the most powerful techniques to master when building machine learning models.

Learning from data involves a lot of engineering. Although most of its complexities have now been abstracted away by modern high-level tools such as sklearn, there still remains the critical need to fully understand the data and shape it into the problem that you want to solve.

Better features give the model additional, and potentially stronger, signals about the business domain and the factors that influence it.

Needless to say, feature engineering is incredibly time-consuming and exhausting. It requires a lot of creativity, technical expertise, and, in most cases, trial and error.

I recently came across a new tool, Upgini. In keeping with the current trend of large language models (LLMs), Upgini uses OpenAI's GPT models to automate the entire feature engineering process for our dataset.

In this article, we will go through the Upgini package and discuss its functionality.

For the purpose of this article, we will be using the Amazon Fine Food Review dataset (licensed under CC0: Public Domain).
