Machine Learning.. Endless Struggle

Joined
Feb 6, 2023
Messages
42
Reaction score
2
Hello. I have been attempting to do some ML with Python and I have been struggling a lot. Most of the struggle is actually converting the data into NumPy arrays. Virtually every source imports its data from Excel files, and I have no idea how to do it otherwise. I want to predict sales data using a random forest regression with scikit-learn. My data is structured like this:
Code:
{'2022': {1: [[0, 0, 8, 10, 7, 6, 8, 0, 0, 5, 16, 12, 8, 7, 1, 0, 15, 7, 6, 9, 13, 0, 0, 4, 6, 18, 11, 16, 0, 0, 5], [5, 13, 8, 9, 0, 0, 12, 9, 17, 20, 15, 0, 0, 11, 17, 14, 16, 4, 1, 0, 25, 6, 17, 15, 10, 2, 0, 13], [11, 22, 14, 9, 1, 0, 13, 23, 14, 13, 7, 2, 0, 12, 11, 8, 1, 5, 0, 0, 4, 4, 4, 3, 9, 0, 0, 4, 19, 11, 4], [14, 2, 0, 6, 16, 4, 11, 16, 0, 13, 8, 9, 4, 9, 5, 2, 0, 11, 12, 4, 5, 5, 0, 0, 11, 6, 12, 12, 10, 3], [0, 0, 0, 0, 12, 8, 1, 0, 12, 13, 18, 10, 7, 0, 0, 24, 9, 12, 0, 10, 2, 0, 14, 14, 14, 4, 5, 0, 0, 7, 5], [10, 4, 10, 0, 0, 8, 7, 5, 1, 6, 0, 0, 16, 3, 16, 12, 9, 3, 0, 8, 6, 2, 3, 6, 1, 0, 5, 7, 10, 3], [12, 2, 0, 3, 7, 7, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 4, 14, 1, 0, 0, 8, 0, 0, 6], [7, 2, 3, 1, 7, 0, 0, 7, 5, 0, 9, 4, 0, 0, 1, 7, 8, 6, 3, 0, 0, 4, 10, 5, 4, 5, 2, 0, 5, 0, 7], [12, 9, 4, 0, 16, 8, 7, 5, 16, 1, 0, 2, 12, 11, 11, 7, 1, 0, 5, 9, 20, 11, 14, 1, 0, 14, 9, 6, 9, 15], [1, 0, 8, 11, 14, 12, 13, 1, 0, 13, 12, 4, 15, 7, 1, 0, 10, 7, 8, 6, 8, 0, 0, 9, 12, 2, 12, 1, 0, 0, 15], [6, 12, 6, 7, 0, 0, 8, 15, 13, 6, 1, 0, 0, 8, 8, 3, 9, 15, 0, 0, 16, 7, 11, 4, 6, 0, 0, 11, 10, 6], [10, 21, 10, 0, 13, 6, 13, 7, 12, 0, 0, 12, 16, 13, 4, 12, 1, 0, 13, 4, 10, 13, 21, 2, 0, 16, 10, 12, 13, 4]]}}
The deeper question is: how should I go about doing ML? Part of the problem is that I'm not really that interested in ML; I just have to use it in my projects. So while I don't want to delve into it, maybe I just have to? I'm lost, please help me find a direction.
 
Joined
Jan 30, 2023
Messages
107
Reaction score
13
It sounds like you're struggling with loading your data into a format that scikit-learn's machine learning algorithms can use. Your data is a dictionary containing nested lists. One way to convert it is to build a pandas DataFrame from the nested lists and then turn that DataFrame into a NumPy array.

Here's an example of how you can do this with your data:

Python:
import pandas as pd

# your_data_dict is the nested dictionary from your post:
# {'2022': {1: [[month 1 values], [month 2 values], ...]}}
monthly_lists = your_data_dict['2022'][1]

# One row per month, one column per day; shorter months are padded with NaN
df = pd.DataFrame(monthly_lists)

# Convert the DataFrame to a 2D NumPy array
X = df.to_numpy()



Once you have your data in the correct format, you can use scikit-learn to split your data into training and testing sets, train a random forest regression model, and make predictions on new data.

Here's an example of how you can do this:

Python:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# y must hold the target value (e.g. the sales figure) for each row of X
# Split your data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a random forest regression model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Make predictions on the held-out test set
y_pred = model.predict(X_test)

This is just a basic example, and there are many other techniques you can use to preprocess your data and improve the performance of your model. I would recommend reading through scikit-learn's documentation and working through some examples to get a better understanding of the tools and techniques available to you. Good luck!
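For daily sales, one preprocessing step that often matters more than the choice of algorithm is feature engineering combined with a time-ordered split. The sketch below is one possible approach, not a definitive recipe: it builds a day-of-week feature and two lag features (yesterday's sales and the same weekday last week) with pandas `shift`, then trains on the earlier days and tests on the final 14 days. A random split, as in `train_test_split`, can leak future days into training for time series. It uses only the first three months of the question's data; the hypothetical helper `add_features`, the lag choices (1 and 7 days), the 14-day test window and `n_estimators=200` are illustrative assumptions, not tuned values. Because of `lag_1`, each prediction uses the previous day's actual sales, so this is one-day-ahead forecasting.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# First three months of the 2022 data from the question (Jan, Feb, Mar), one value per day
jan = [0, 0, 8, 10, 7, 6, 8, 0, 0, 5, 16, 12, 8, 7, 1, 0, 15, 7, 6, 9, 13, 0, 0, 4, 6, 18, 11, 16, 0, 0, 5]
feb = [5, 13, 8, 9, 0, 0, 12, 9, 17, 20, 15, 0, 0, 11, 17, 14, 16, 4, 1, 0, 25, 6, 17, 15, 10, 2, 0, 13]
mar = [11, 22, 14, 9, 1, 0, 13, 23, 14, 13, 7, 2, 0, 12, 11, 8, 1, 5, 0, 0, 4, 4, 4, 3, 9, 0, 0, 4, 19, 11, 4]
sales = jan + feb + mar

df = pd.DataFrame({
    "date": pd.date_range("2022-01-01", periods=len(sales), freq="D"),
    "sales": sales,
})


def add_features(frame):
    """Hypothetical helper: add calendar and lag features, drop rows without full history."""
    out = frame.copy()
    out["dayofweek"] = out["date"].dt.dayofweek  # Monday=0 ... Sunday=6
    out["lag_1"] = out["sales"].shift(1)         # yesterday's sales
    out["lag_7"] = out["sales"].shift(7)         # same weekday last week
    return out.dropna().reset_index(drop=True)


data = add_features(df)
features = ["dayofweek", "lag_1", "lag_7"]

# Time-ordered split: train on the earlier days, test on the final 14 days
split = len(data) - 14
train, test = data.iloc[:split], data.iloc[split:]

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(train[features], train["sales"])
pred = model.predict(test[features])

mae = mean_absolute_error(test["sales"], pred)
print(f"Test MAE over the last 14 days: {mae:.2f}")
```

The first seven days are dropped because they have no value for `lag_7`. With a full year of data, scikit-learn's `TimeSeriesSplit` can repeat this kind of forward-in-time evaluation over several windows.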
 
Joined
Feb 6, 2023
Messages
42
Reaction score
2
The problem is that your solution doesn't work, because the 12 arrays are not all the same length.
I have managed to get a few algorithms working, but I can't get decent accuracy.
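When accuracy seems poor, it can help to compare the model against a simple baseline before tuning anything. A common baseline for daily data with a weekly pattern is the seasonal-naive forecast: predict each day with the value from the same weekday one week earlier. The sketch below, using a hypothetical helper `seasonal_naive_mae`, computes that baseline's mean absolute error (MAE) on the last 14 days of a series with NumPy. The 7-day season and 14-day window are assumptions; a random forest scored with `sklearn.metrics.mean_absolute_error` on the same days would need to beat this number to be adding value.

```python
import numpy as np


def seasonal_naive_mae(sales, season=7, test_days=14):
    """Hypothetical helper: MAE of forecasting each of the last `test_days`
    values with the value observed `season` days earlier."""
    y = np.asarray(sales, dtype=float)
    if season < 1 or test_days < 1 or len(y) < season + test_days:
        raise ValueError("need season >= 1, test_days >= 1 and at least season + test_days values")
    actual = y[-test_days:]
    forecast = y[-test_days - season:-season]  # same positions, shifted back one season
    return float(np.mean(np.abs(actual - forecast)))


# January 2022 from the question
january = [0, 0, 8, 10, 7, 6, 8, 0, 0, 5, 16, 12, 8, 7, 1, 0, 15, 7, 6, 9, 13, 0, 0, 4, 6, 18, 11, 16, 0, 0, 5]
baseline = seasonal_naive_mae(january)
print(f"Seasonal-naive MAE on the last 14 days of January: {baseline:.2f}")
```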
 
Joined
Feb 6, 2023
Messages
42
Reaction score
2
Yeah, that was the first thing that jumped out at me. You can easily do a check in Excel and make sure all the cells have a value, so all the rows are the same length.

@Wictorian as far as ML goes, you first want to have a pretty good idea of what problem you are trying to solve. There are a lot of tools out there that already have AutoML baked into their reports for the most straightforward use cases.
The thing is, the months all have different lengths. I'm also running into problems like simple syntax errors and such.
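Since the months have different lengths, one option is to stop treating each month as a fixed-width row and reshape the data into "long" format instead: one row per day, with its date and sales value. Then unequal month lengths simply mean different numbers of rows. Below is a minimal sketch with pandas, assuming the outer key is the year, each inner list is one calendar month in order (index 0 = January), and position j within a month is day j + 1. The inner key (`1` in the question) is kept as a `series_id` column on the hypothetical assumption that it identifies a product or store; the helper name `to_long_format` is also illustrative. In the posted data the zeros appear to fall on weekends (1 January 2022 was a Saturday), so a day-of-week column is likely to be a useful feature.

```python
import pandas as pd


def to_long_format(data):
    """Hypothetical helper: flatten
    {year: {series_id: [[days of month 1], ..., [days of month 12]]}}
    into a DataFrame with one row per day."""
    rows = []
    for year, series in data.items():
        for series_id, months in series.items():
            # enumerate(..., start=1) turns list positions into month/day numbers
            for month, daily_values in enumerate(months, start=1):
                for day, sales in enumerate(daily_values, start=1):
                    rows.append({
                        "series_id": series_id,
                        "date": pd.Timestamp(year=int(year), month=month, day=day),
                        "sales": sales,
                    })
    df = pd.DataFrame(rows)
    # Calendar features a model can use (dayofweek: Monday=0 ... Sunday=6)
    df["month"] = df["date"].dt.month
    df["dayofweek"] = df["date"].dt.dayofweek
    return df


# Usage with a shortened version of the question's structure
sample = {'2022': {1: [[0, 0, 8, 10, 7], [5, 13, 8]]}}
long_df = to_long_format(sample)
print(long_df)
```

With the full dictionary, `long_df[["month", "dayofweek"]]` could serve as features and `long_df["sales"]` as the target, with no padding needed. If a month ever contains more entries than it has calendar days, `pd.Timestamp` raises a `ValueError`, which is a useful check on the input.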
 
