Python dataset to array

Python: How to convert Tensorflow dataset to 2D numpy array

It doesn’t sound like you set things up using the TensorFlow Dataset pipeline. Here is the guide for doing so:

You can either follow that (it’s the right approach, but there’s a small learning curve), or you can pass the numpy array to sess.run as part of the feed_dict parameter. If you go this way, create a tf.placeholder, which will be populated by the value in feed_dict. Many of the basic tutorial examples here follow this approach:
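The feed_dict route described above can be sketched roughly as follows. This is a minimal illustration rather than the code from the linked tutorials: it assumes TensorFlow 2.x with the tf.compat.v1 API (on TensorFlow 1.x the same calls live directly under tf), and the array values and the names x and doubled are made up for the example.

```python
import numpy as np
import tensorflow as tf

# Placeholders only work in graph mode (assumption: TensorFlow 2.x,
# where eager execution has to be switched off explicitly).
tf.compat.v1.disable_eager_execution()

# Illustrative input data, not taken from the question.
data = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)

# The placeholder is an empty slot in the graph; feed_dict fills it at run time.
x = tf.compat.v1.placeholder(tf.float32, shape=[None, 2], name="x")
doubled = x * 2.0

with tf.compat.v1.Session() as sess:
    # sess.run returns a plain numpy.ndarray.
    result = sess.run(doubled, feed_dict={x: data})

print(type(result))
print(result)
```

The placeholder keeps the graph independent of any particular array, at the cost of feeding the data on every sess.run call.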

You could try eager execution. Previously I gave an answer using session run (shown below).
During eager execution, calling .numpy() on a tensor converts that tensor to a numpy array.
Example code (from my use case):

    # Enable eager execution (TF 1.x API; eager execution is already the default in TF 2.x)
    from __future__ import absolute_import, division, print_function, unicode_literals

    import matplotlib.pyplot as plt
    import tensorflow as tf
    import tensorflow_datasets as tfds

    tf.enable_eager_execution()
    print('Is executing eagerly?', tf.executing_eagerly())

    # Load the datasets
    dataset, metadata = tfds.load('cycle_gan/horse2zebra', with_info=True, as_supervised=True)
    train_horses, train_zebras = dataset['trainA'], dataset['trainB']

    # Load the dataset into a numpy array: take one batch of up to 1000
    # (image, label) pairs and keep only the images
    train_A = train_horses.batch(1000).make_one_shot_iterator().get_next()[0].numpy()
    print(train_A.shape)

    # Preview one of the images (%matplotlib inline is a Jupyter magic)
    %matplotlib inline
    plt.imshow(train_A[1])
    plt.show()
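For a quick check that does not require downloading a dataset, the same idea can be sketched on a toy dataset. This assumes TensorFlow 2.x, where eager execution is on by default; the values and variable names are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# Toy dataset (illustrative values, not from the answer above).
ds = tf.data.Dataset.from_tensor_slices([[1, 2], [3, 4], [5, 6]])

# Batch all elements into one tensor, fetch it, and convert it with .numpy().
batch = next(iter(ds.batch(3)))
array = batch.numpy()

print(type(array))
print(array.shape)  # (3, 2)

# Alternative: iterate element by element and stack the results.
stacked = np.stack(list(ds.as_numpy_iterator()))
print(np.array_equal(array, stacked))  # True
```

Batching works when every element has the same shape. The batch size must be at least the number of elements if the whole dataset should land in one array.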

I recently had this problem, and I did it like this:

    import matplotlib.pyplot as plt
    import numpy as np
    import tensorflow as tf
    import tensorflow_datasets as tfds

    # Load the datasets
    dataset, metadata = tfds.load('cycle_gan/horse2zebra', with_info=True, as_supervised=True)
    train_horses, train_zebras = dataset['trainA'], dataset['trainB']

    # Load the dataset into a numpy array by running one batch through a session
    sess = tf.compat.v1.Session()
    tra = train_horses.batch(1000).make_one_shot_iterator().get_next()
    train_A = np.array(sess.run(tra)[0])  # [0] keeps the images, drops the labels
    print(train_A.shape)
    sess.close()

    # Preview one of the images (%matplotlib inline is a Jupyter magic)
    %matplotlib inline
    plt.imshow(train_A[1])
    plt.show()

I also needed to accomplish this task (Dataset to array), but without turning on eager mode. I came up with the following:

    import tensorflow as tf

    dataset = tf.data.Dataset.from_tensor_slices([[1, 2], [3, 4]])

    # Collect every element of the dataset in a dynamically sized TensorArray
    tensor_array = tf.TensorArray(dtype=dataset.element_spec.dtype,
                                  size=0,
                                  dynamic_size=True,
                                  element_shape=dataset.element_spec.shape)
    tensor_array = dataset.reduce(tensor_array, lambda a, t: a.write(a.size(), t))

    # concat() joins the elements end to end, so reshape to (-1,) + element shape
    tensor = tf.reshape(tensor_array.concat(), (-1,) + tuple(dataset.element_spec.shape))

    array = tf.Session().run(tensor)
    print(type(array))
    # <class 'numpy.ndarray'>
    print(array)
    # [[1 2]
    #  [3 4]]

What this does:
We start with a dataset containing 2 tensors of shape (2,).


Since eager is off, we need to run the dataset through a Tensorflow session. And since a session requires a tensor, we have to convert the dataset into a tensor.

To accomplish this, we use Dataset.reduce() to put all the elements into a TensorArray (symbolically).

We now use TensorArray.concat() to convert the whole array into a single tensor. However, when we do this, the whole dataset is flattened into a 1-D array. So we need tf.reshape() to get it back into our original tensor’s shape, plus an extra leading dimension to stack the elements.

Finally we take the tensor and run it through a session. This gives us our numpy ndarray.
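The concat-then-reshape step can be easier to follow with plain NumPy. The sketch below is only an analogy under that assumption: it uses np.concatenate and reshape in place of TensorArray.concat() and tf.reshape(), and does not go through TensorFlow at all.

```python
import numpy as np

# Two elements of shape (2,), mirroring the toy dataset above.
elements = [np.array([1, 2]), np.array([3, 4])]
element_shape = elements[0].shape  # (2,)

# Joining the elements end to end loses the per-element boundary.
flat = np.concatenate(elements)
print(flat.shape)  # (4,)

# Reshaping to (-1,) + element_shape restores one row per element.
restored = flat.reshape((-1,) + tuple(element_shape))
print(restored.shape)  # (2, 2)
print(restored)
```

The -1 lets the reshape infer the number of elements, which is why the dataset size never has to be known in advance.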

This was the simplest method for me for a supervised problem with (X, y) pairs.

    import tensorflow_datasets as tfds

    def dataset_to_numpy(ds):
        """
        Convert tensorflow dataset to numpy arrays
        """
        images = []
        labels = []

        # Iterate over a dataset
        for image, label in tfds.as_numpy(ds):
            images.append(image)
            labels.append(label)

        # Print the shape and label of the first three examples
        for i, img in enumerate(images):
            if i < 3:
                print(img.shape, labels[i])

        return images, labels

    ds = tfds.load('mnist', split='train', as_supervised=True)
    images, labels = dataset_to_numpy(ds)


Converting Pandas DataFrame to Numpy Array [Step-By-Step]


Hello Reader! In this article, we will see what a data frame is and how to convert a Pandas DataFrame to a NumPy array and vice versa. So let’s begin:

Introduction

A data frame in Python is a two-dimensional, tabular data structure consisting of rows and columns defining different features of the data frame.
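To make the definition concrete, here is a minimal sketch that builds the same small data frame twice: once from a Python dict and once from CSV text. The column names and values are made up for illustration, and io.StringIO stands in for a real .csv file.

```python
import io

import pandas as pd

# Build a data frame directly from a dict (hypothetical columns and values).
df_from_dict = pd.DataFrame({"item": ["pen", "book"], "price": [1.5, 12.0]})

# Build the same table from CSV text; a file path would work the same way.
csv_text = "item,price\npen,1.5\nbook,12.0\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_dict.shape)  # (2, 2)
print(list(df_from_csv.columns))  # ['item', 'price']
```

Each column is one feature of the data, and each row is one record.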

We can create a data frame using the Pandas library, or we can import an existing data set (for example, a .csv file) and work on it. You can install Pandas using the pip command:

    pip install pandas

The command above installs pandas, and we are ready to use the functions of the Pandas library. In the same way, we will install the numpy library:

    pip install numpy

So first, we will see the conversion of this tabular structure (pandas data frame) into a numpy array.


1. Converting Pandas Dataframe to Numpy Array

We can do this with the DataFrame.to_numpy() method, which converts the given Pandas DataFrame to a NumPy array.

    # importing pandas
    import pandas as pd

    # creating dataframes
    student_data =
    set_of_numbers =

    print("This is our first dataset :")
    student_dataframe = pd.DataFrame(student_data)
    print("\n", student_dataframe)

    print("\nThis is our second dataset :")
    numbers_dataframe = pd.DataFrame(set_of_numbers)
    print("\n", numbers_dataframe)
  • We have created two data frames, student_dataframe and numbers_dataframe, from student_data and set_of_numbers.

[Figure: Dataframes]

print(type(student_dataframe))
print(type(numbers_dataframe))

The output for both statements above is the same, i.e.:

    <class 'pandas.core.frame.DataFrame'>

Converting student_dataframe to a NumPy array:

    student_array = student_dataframe.to_numpy()
    print(student_array)

    [['Alice' 201 'Pass']
     ['Sam' 202 'Pass']
     ['Kevin' 203 'Fail']
     ['Max' 204 'Pass']
     ['Tom' 205 'Fail']]

For the second data frame (set_of_numbers)

    numbers_array = numbers_dataframe.to_numpy()
    print(numbers_array)

    print(type(student_array))
    print(type(numbers_array))

So, we can clearly see that we converted our Pandas Dataframe to Numpy Array in just a few steps. This is the simplest way to handle data frames and their conversion.

  • Further, we can also change the data type of the resulting array. Our second data frame contains some integer values and some floating-point values, so let’s convert all of them to float:

    print(numbers_dataframe.to_numpy(dtype='float64'))
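The steps in this section can be put together in one self-contained sketch. The student values are taken from the printed array above, but the column names (name, roll_no, result) are assumptions, and the numbers data set is entirely hypothetical.

```python
import numpy as np
import pandas as pd

# Student values come from the output shown above; the column names are assumed.
student_data = {
    "name": ["Alice", "Sam", "Kevin", "Max", "Tom"],
    "roll_no": [201, 202, 203, 204, 205],
    "result": ["Pass", "Pass", "Fail", "Pass", "Fail"],
}
# Hypothetical mix of integer and floating-point columns.
set_of_numbers = {"ints": [1, 2, 3], "floats": [1.5, 2.5, 3.5]}

student_dataframe = pd.DataFrame(student_data)
numbers_dataframe = pd.DataFrame(set_of_numbers)

# Mixed string and integer columns produce an array of Python objects.
student_array = student_dataframe.to_numpy()
print(student_array.shape)  # (5, 3)
print(student_array.dtype)  # object

# Passing dtype forces every value to the requested type.
numbers_array = numbers_dataframe.to_numpy(dtype="float64")
print(numbers_array.dtype)  # float64
```

Note that to_numpy() drops the column names and the index; only the values are carried over.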

2. Converting Numpy Arrays to Pandas Dataframes

Now that you have understood how to convert a Pandas DataFrame to a NumPy array, you may also need to convert the data back to a Pandas DataFrame. Let’s see how to do that:

  • First, define a numpy array. Then perform the conversion using the pandas.DataFrame() function of the pandas library.

    # importing pandas and numpy
    import pandas as pd
    import numpy as np

    # defining numpy array
    arr1 = np.array([[1, 6, 4, 5], [3, 7, 2, 4], [9, 5, 3, 7]])
    print("Numpy array : ")
    print(arr1)

So, our array is like this:

    Numpy array :
    [[1 6 4 5]
     [3 7 2 4]
     [9 5 3 7]]

    # converting array to dataframe
    df = pd.DataFrame(arr1)
    print("\npandas dataframe :")
    df

The converted data frame is:

       0  1  2  3
    0  1  6  4  5
    1  3  7  2  4
    2  9  5  3  7

  • We can also give our own headers to the rows and columns of the data frame. Row headers are given with the index keyword, and column headers with the columns keyword.

    # converting and providing headers
    df = pd.DataFrame(arr1, index=["1", "2", "3"], columns=["A", "B", "C", "D"])
    print("\npandas dataframe :")
    df

This will make our data frame look like this:


       A  B  C  D
    1  1  6  4  5
    2  3  7  2  4
    3  9  5  3  7
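As a quick sanity check on both directions, the sketch below reuses arr1 from this section, builds the labelled data frame, and converts it back. The variable name back is made up for the example.

```python
import numpy as np
import pandas as pd

arr1 = np.array([[1, 6, 4, 5], [3, 7, 2, 4], [9, 5, 3, 7]])

# Array -> data frame, with custom row and column headers.
df = pd.DataFrame(arr1, index=["1", "2", "3"], columns=["A", "B", "C", "D"])

# Headers allow label-based access to single cells and whole columns.
print(df.loc["2", "C"])  # 2
print(df["B"].to_numpy())  # [6 7 5]

# Data frame -> array: the values survive the round trip, the headers do not.
back = df.to_numpy()
print(np.array_equal(arr1, back))  # True
```

If the headers matter, keep df.index and df.columns separately, because the array returned by to_numpy() holds values only.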

Conclusion

With this, we come to the end of this article. In this article, you learned:

  • The basics of pandas dataframe and numpy array
  • How to convert pandas data frame to numpy array
  • How to convert numpy array to pandas dataframe

I hope this article was useful to you. Thank you! 🙂


How to Convert a Pandas DataFrame to a NumPy Array

  1. The to_numpy method for converting a DataFrame to a NumPy array
  2. The to_records() method for converting a DataFrame to a NumPy record array

We will look at the to_numpy() method for converting a pandas DataFrame to a NumPy array. It was introduced in pandas v0.24.0 to replace the older .values attribute. to_numpy is defined on Index, Series, and DataFrame objects.

The older DataFrame.values behaves inconsistently, and the pandas API documentation recommends against using it. However, we will show an example of it in case you are using an older version.

Another old method, DataFrame.as_matrix(), is deprecated (and was removed in pandas 1.0). Do not use it!

We will also present another approach that uses the DataFrame.to_records() method to convert a given DataFrame to a NumPy record array.

The to_numpy method for converting a DataFrame to a NumPy array

A pandas DataFrame is a 2-D tabular data structure with rows and columns. This data structure can be converted to a NumPy array with the to_numpy method:

    # python 3.x
    import pandas as pd
    import numpy as np

    df = pd.DataFrame(
        data=np.random.randint(0, 10, (6, 4)),
        columns=["a", "b", "c", "d"])
    nmp = df.to_numpy()
    print(nmp)
    print(type(nmp))

Sample output (the values are random):

    [[5 5 1 3]
     [1 6 6 0]
     [9 1 2 0]
     [9 3 5 3]
     [7 9 4 9]
     [8 1 8 9]]
    <class 'numpy.ndarray'>

The same can be done with the DataFrame.values attribute, as follows:

    # python 3.x
    import pandas as pd
    import numpy as np

    df = pd.DataFrame(
        data=np.random.randint(0, 10, (6, 4)),
        columns=["a", "b", "c", "d"])
    nmp = df.values
    print(nmp)
    print(type(nmp))
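The to_records() approach mentioned earlier is not shown in the text, so here is a minimal sketch of it. The data frame and variable names are made up for illustration, and index=False is used in the second call to leave the index out of the result.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with one integer and one float column.
df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})

# By default the index becomes the first field, named "index".
rec = df.to_records()
print(type(rec))
print(rec.dtype.names)  # ('index', 'a', 'b')

# index=False keeps only the data columns.
rec_no_index = df.to_records(index=False)
print(rec_no_index.dtype.names)  # ('a', 'b')

# Fields are accessed by column name and keep their own dtype.
print(rec_no_index["b"])  # [0.5 1.5]
```

Unlike to_numpy(), which coerces all columns to one common dtype, a record array keeps a separate dtype per column and retains the column names.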
