Sean Lahman's Baseball Database
This database contains pitching, hitting, and fielding statistics for Major League Baseball from 1871 through 2023. It includes data from the two current leagues (American and National), the four other "major" leagues (American Association, Union Association, Players League, and Federal League), and the National Association of 1871-1875.
This database was created by Sean Lahman, who pioneered the effort to make baseball statistics freely available to the general public. What started as a one-man effort in 1994 has grown tremendously, and a team of researchers has since pooled its efforts to make this the largest and most accurate source for baseball statistics available anywhere.
This database, in the form of an R package, offers a variety of interesting challenges and opportunities for data processing and visualization in R.
In the current version, the examples make extensive use of the dplyr package for data manipulation (tabulation, queries, summaries, merging, etc.), reflecting the original relational database design, and ggplot2 for graphics.
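As a rough illustration of that workflow, the sketch below summarises the Batting table by season with dplyr and plots the result with ggplot2. It assumes the Lahman, dplyr, and ggplot2 packages are installed; the object names hr_by_year and p are made up for this example and are not part of the package.

```r
library(Lahman)
library(dplyr)
library(ggplot2)

# Total home runs hit in each season, across all leagues and teams
hr_by_year <- Batting %>%
  group_by(yearID) %>%
  summarise(HR = sum(HR, na.rm = TRUE), .groups = "drop")

# Line plot of the yearly totals
p <- ggplot(hr_by_year, aes(x = yearID, y = HR)) +
  geom_line() +
  labs(x = "Season", y = "Total home runs",
       title = "Home runs per season (Batting table)")

print(p)
```

Totals are not adjusted for the number of teams or games played, so the trend mixes changes in league size with changes in hitting.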
Details
Package: | Lahman |
Type: | Package |
Version: | 12.0-0 |
Date: | 2024-08-24 |
License: | GPL version 2 or newer |
LazyLoad: | yes |
LazyData: | yes |
The main form of this database is a relational database in Microsoft Access format.
The design follows these general principles: Each player is assigned a unique code (playerID). All of the information in different tables relating to that player is tagged with his playerID. The playerIDs are linked to names and birthdates in the People table. Similar links exist among other tables via analogous *ID variables.
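The playerID linkage can be sketched as a join. The example below totals career home runs per playerID from Batting and then attaches names and birth years from People. It assumes dplyr is available; career_hr and career_hr_named are illustrative names only.

```r
library(Lahman)
library(dplyr)

# Career home run totals, one row per playerID
career_hr <- Batting %>%
  group_by(playerID) %>%
  summarise(HR = sum(HR, na.rm = TRUE), .groups = "drop")

# Look up each player's name and birth year in People via playerID
career_hr_named <- career_hr %>%
  left_join(select(People, playerID, nameFirst, nameLast, birthYear),
            by = "playerID") %>%
  arrange(desc(HR))

head(career_hr_named, 5)
```

A left join keeps every player found in Batting, even if a matching People record were missing; an inner join would silently drop such rows.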
The database is composed of the following main tables:
People
Player names, dates of birth, death and other biographical info
Batting
batting statistics
Pitching
pitching statistics
Fielding
fielding statistics
A collection of other tables is also provided:
Teams:
Teams | yearly stats and standings |
TeamsHalf | split season data for teams |
TeamsFranchises | franchise information |
Post-season play:
BattingPost | post-season batting statistics |
PitchingPost | post-season pitching statistics |
FieldingPost | post-season fielding data |
SeriesPost | post-season series information |
Awards:
AwardsManagers | awards won by managers |
AwardsPlayers | awards won by players |
AwardsShareManagers | award voting for manager awards |
AwardsSharePlayers | award voting for player awards |
Hall of Fame: links to People via hofID
HallOfFame | Hall of Fame voting data |
Other tables:
AllstarFull - All-Star game appearances;
Managers - managerial statistics;
FieldingOF - outfield position data;
ManagersHalf - split season data for managers;
Salaries - player salary data;
Appearances - data on player appearances;
Schools - information on schools players attended;
CollegePlaying - information on schools players attended, by player and year;
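The supplementary tables link to one another through the same kind of *ID keys. As a small sketch, assuming dplyr is installed, the code below totals wins by franchID in Teams and looks up the franchise name in TeamsFranchises; franchise_wins is a made-up name for this example.

```r
library(Lahman)
library(dplyr)

# Seasons played and total wins for each franchise
franchise_wins <- Teams %>%
  group_by(franchID) %>%
  summarise(seasons = n(), W = sum(W, na.rm = TRUE), .groups = "drop") %>%
  # franchID is the key shared by Teams and TeamsFranchises
  left_join(select(TeamsFranchises, franchID, franchName),
            by = "franchID") %>%
  arrange(desc(W))

head(franchise_wins, 10)
```

Grouping by franchID rather than teamID keeps a franchise together across relocations and renamings.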
Variable label tables are provided for some of the tables:
Author
Michael Friendly, Dennis Murphy, Chris Dalzell, Martin Monkman
Maintainer: Chris Dalzell <cdalzell@gmail.com>
Source
Lahman, S. (2024) Lahman's Baseball Database, 1871-2023, Main page, http://www.seanlahman.com/