Soccer Dashboard
Data from Brazil, Mexico, USA and Argentina Soccer Leagues
For this project I scraped data from a webpage, loaded it into a database, and created this HTML page to display the information.
Applied Technologies
- Python
- BeautifulSoup
- SQL (PostgreSQL)
- JSON
- HTML/CSS
- JavaScript
- D3
Steps
- Scrape
- Website scraped: Transfermarkt
- Using BeautifulSoup, scrape the website and save the data in a .csv file.
- SQL and JSON
- With SQLAlchemy, store all the data scraped from the URL above in a single database.
- Create a JSON file with all the data and use it to build the dashboard.
- HTML and JavaScript
- Build a webpage to present the information.
- Using JavaScript and D3, parse the JSON file and display the data.
Further Improvements
- Presentation
- Improve Technical Info with code snippets and better descriptions
- Improve About
- Add Leagues Data page
- Add charts
- Improve Tableau Dashboards
- Structure
- Improve Database
- Develop a Flask App and API
- Prediction
- Develop a Machine Learning Model to predict results based on past games
- Use sentiment analysis to gauge fan reaction to a game.
Step 1 - Scraping
- I wanted the values to be displayed in US dollars instead of euros. To handle this, I built a simple function that gets the exchange rate for the day and applies it to all the values.
Detailed Descriptions
To kick off this project, I first needed to select a data source for the players’ market values. I wanted a source that was reliable and kept up to date with transfers and values, so I chose the Transfermarkt website, which has just completed 20 years of working with soccer data. The leagues I decided to work with were the first divisions of Brazil, the USA, Mexico and Argentina.
Once I had chosen the source, the next step was to obtain the data to work with. Starting from the direct link to each league’s page, I used Beautiful Soup to read and scrape information about that league’s clubs, and then looked up the players’ info on each club page. I then used pandas to create a DataFrame and write it to a .csv file.
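The club-scraping step can be sketched roughly like this. Note that the HTML snippet, tag names, and class names below are purely illustrative stand-ins; Transfermarkt's real markup is different:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Illustrative HTML standing in for a league page (not Transfermarkt's real markup)
league_html = """
<table class="clubs">
  <tr class="club"><td class="name">Flamengo</td><td class="value">€180.50m</td></tr>
  <tr class="club"><td class="name">Palmeiras</td><td class="value">€205.00m</td></tr>
</table>
"""

soup = BeautifulSoup(league_html, "html.parser")

# Collect one record per club row
clubs = []
for row in soup.find_all("tr", class_="club"):
    clubs.append({
        "club": row.find("td", class_="name").get_text(strip=True),
        "market_value": row.find("td", class_="value").get_text(strip=True),
    })

# Build a DataFrame and write it out, as in the project
df = pd.DataFrame(clubs)
df.to_csv("clubs.csv", index=False)
```

In the real project this loop ran over each league page fetched with `requests`, following the link to every club page for player details.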
While scraping I ran into some issues:
```python
import requests
from bs4 import BeautifulSoup

# A User-Agent header so Google serves the full results page
headers = {"User-Agent": "Mozilla/5.0"}

# Get the EUR -> USD exchange rate for the day from a Google search result
convert_url = "https://www.google.com/search?q=euro+to+dollar....."
html = requests.get(convert_url, headers=headers)
soup = BeautifulSoup(html.content, 'html.parser')
curr_value = soup.find('div', class_="b1hJbf")
curr_value = round(float(curr_value["data-exchange-rate"]), 2)

def get_value_us(x):
    """Convert a Transfermarkt euro value string to USD, in millions."""
    if x[-1] == "m":
        # value given in millions, e.g. '€2.50m'
        float_value = x[1:-1]
        return round(float(float_value) * curr_value, 2)
    else:
        # value given in thousands, e.g. '€500Th.' -> convert to millions
        float_value = x[1:-3]
        return round((float(float_value) * curr_value) / 1000, 2)
```
```python
def fix_heights(df):
    """Fill missing heights with the average of comparable players."""
    heights = []
    for index, row in df.iterrows():
        if pd.notnull(row["Height"]):
            heights.append(row["Height"])
        else:
            nat = row["Nat"]
            position = row["position"]
            field_position = row["field_position"]
            # 1st fallback: same nationality and exact position
            comp_players = df.loc[(df["Nat"] == nat) & (df["position"] == position)]
            row["Height"] = round(comp_players["Height"].mean(skipna=True), 2)
            if pd.isnull(row["Height"]):
                # 2nd fallback: same nationality and broader field position
                comp_players2 = df.loc[(df["Nat"] == nat) & (df["field_position"] == field_position)]
                row["Height"] = round(comp_players2["Height"].mean(skipna=True), 2)
            if pd.isnull(row["Height"]):
                # 3rd fallback: same exact position, any nationality
                comp_players3 = df.loc[df["position"] == position]
                row["Height"] = round(comp_players3["Height"].mean(skipna=True), 2)
            heights.append(row["Height"])
    # assign back to the column (df.height would create a new attribute, not update "Height")
    df["Height"] = heights
```
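The same cascading imputation can also be done without an explicit row loop, using pandas `groupby`/`transform` and applying the fallbacks from most to least specific. This is a sketch with made-up sample data and the same column names as above:

```python
import pandas as pd
import numpy as np

# Small made-up sample with one missing height
df = pd.DataFrame({
    "Nat": ["BRA", "BRA", "BRA", "USA"],
    "position": ["CB", "CB", "CB", "ST"],
    "field_position": ["Defender", "Defender", "Defender", "Attack"],
    "Height": [1.85, np.nan, 1.91, 1.80],
})

# Apply the three fallbacks, most specific first; each fillna only
# touches rows that are still missing after the previous pass
for keys in [["Nat", "position"], ["Nat", "field_position"], ["position"]]:
    group_mean = df.groupby(keys)["Height"].transform("mean").round(2)
    df["Height"] = df["Height"].fillna(group_mean)
```

Here the missing Brazilian centre-back gets the 1.88 m average of the other two Brazilian centre-backs.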
Step 2 - SQL and JSON
Once I had the data, I decided to store it in a SQL database. Using the SQLAlchemy library, I built three tables (Leagues, Teams and Players) and loaded the database into PostgreSQL. I then exported the results of a SQL JOIN query to a JSON file.
```python
# read the joined tables from the database and write them to a JSON file
pd.read_sql_query(
    'SELECT * FROM leagues JOIN (teams JOIN players USING("team_ID")) USING ("league_ID")',
    con=engine
).to_json(r'../dashboard/static/data/data.json', orient='records')
```
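The three tables might be declared roughly like this with SQLAlchemy's declarative mapping. The column names here are illustrative guesses (only `team_ID` and `league_ID` appear in the query above), and an in-memory SQLite engine stands in for the project's PostgreSQL connection just to keep the sketch self-contained:

```python
from sqlalchemy import create_engine, Column, Integer, String, Float, ForeignKey
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class League(Base):
    __tablename__ = "leagues"
    league_ID = Column(Integer, primary_key=True)
    name = Column(String)

class Team(Base):
    __tablename__ = "teams"
    team_ID = Column(Integer, primary_key=True)
    league_ID = Column(Integer, ForeignKey("leagues.league_ID"))
    name = Column(String)

class Player(Base):
    __tablename__ = "players"
    player_ID = Column(Integer, primary_key=True)
    team_ID = Column(Integer, ForeignKey("teams.team_ID"))
    name = Column(String)
    value_usd = Column(Float)  # hypothetical column for the converted market value

# The project would point this at PostgreSQL instead, e.g.
# create_engine("postgresql://user:pass@localhost/soccer")
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(League(league_ID=1, name="Brasileirão"))
    session.add(Team(team_ID=1, league_ID=1, name="Flamengo"))
    session.add(Player(player_ID=1, team_ID=1, name="Pedro", value_usd=25.0))
    session.commit()
```

With the tables keyed on `league_ID` and `team_ID` like this, the double-`JOIN ... USING` query above flattens all three levels into one row per player for the dashboard's JSON file.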