Guide - Data Science with F#
Data science is the application of statistical analysis, machine learning, data visualization and programming to real-world data sources to bring understanding and insight to data-oriented problem domains. F# is an excellent solution for programmatic data science as it combines efficient execution, REPL-scripting, powerful libraries and scalable data integration.
Because data science draws on techniques from many problem domains, it relies on many base technologies. F# integrates well with other systems and libraries, both through direct use of .NET libraries and through type providers, with interoperability support for Excel, R, Python, MATLAB, Mathematica and more. For details on interoperability with these systems, see the section for each system later in this guide.
Many other resources are available for integrating F#, C# and .NET with these systems. If a resource specific to F# can’t be found, then search for C# instead and adjust the technique appropriately. Data science also requires strong support for many technologies covered in other Guides. For detailed information, refer to the guides for Math and Statistics, Data Access, Machine Learning and Cloud Programming.
This guide includes resources related to data science programming and scripting with F#. To contribute to this guide, log on to GitHub, edit this page and send a pull request.
Note that the resources listed below are provided only for educational purposes related to the F# programming language. The F# Software Foundation does not endorse or recommend any commercial products, processes, or services. Therefore, mention of commercial products, processes, or services should not be construed as an endorsement or recommendation.
Integrated Data Science Packages
FsLab is an integrated, cross-platform collection of open source data science packages for F#, including FSharp.Data, Deedle, RProvider, Math.NET Numerics and more.
Interactive Charting on Windows
FSharp.Charting - an interactive charting library frequently used on Windows.
Using R, MATLAB, Mathematica, Excel and Python for Data Visualization
F# integrates with systems such as R, MATLAB, Mathematica, Excel and Python, all of which can be used for data visualization. See the sections below for more details, and the following tutorials specific to visualization:
- Using F#, R and GGPlot2
- Tutorial: Charting with Excel from F#
- Tutorial: Charting with Gnuplot from F#
Time Series Programming
Deedle is an easy-to-use, high-quality package for data and time series manipulation and for scientific programming. Its design is similar to the Pandas library from Python and the ‘tseries’ and ‘zoo’ packages in R, though with stronger typing. Deedle supports structured data frames, ordered and unordered data, and time series. It is designed to work well for exploratory programming in the F# and C# interactive consoles, but it can also be used in efficient compiled .NET code.
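As a rough illustration of the exploratory style described above, the following F# script is a minimal sketch. It assumes the Deedle NuGet package can be restored from F# Interactive; the dates, prices and column names are made up for the example.

```fsharp
// Assumes F# Interactive with NuGet access (.NET 5 or later).
#r "nuget: Deedle"

open System
open Deedle

// Hypothetical daily closing prices, keyed by date.
let prices =
    series [ DateTime(2024, 1, 1) => 10.0
             DateTime(2024, 1, 2) => 11.0
             DateTime(2024, 1, 3) => 12.5 ]

// Summary statistic over the whole series.
let meanPrice = Stats.mean prices

// Day-over-day change; the first date has no predecessor, so it is dropped.
let changes = prices |> Series.diff 1

// Place both series side by side in a data frame, aligned on the date key.
let table = frame [ "Price" => prices; "Change" => changes ]
```

Because both series share the same date key, the frame aligns them automatically and leaves the first "Change" value missing rather than requiring manual padding.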
F# and Excel
Integrating F# and Excel through Excel-DNA
Excel-DNA is an independent project to integrate .NET into Excel. With Excel-DNA you can make native (.xll) add-ins for Excel using C#, Visual Basic.NET or F#, providing high-performance user-defined functions (UDFs), custom ribbon interfaces and more. Your entire add-in can be packed into a single .xll file requiring no installation or registration:
- Excel-DNA home pages
- Combining F# and Excel using Excel DNA
- Async and event-streaming Excel UDFs with F#
- Machine Learning with Excel: Combine the power of Excel, F# and R
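The user-defined functions mentioned above can be sketched as follows. This is a minimal example rather than a complete add-in: it assumes a class-library project that references the ExcelDna.AddIn NuGet package, and the module and function names are hypothetical.

```fsharp
// Assumes a class-library project referencing the ExcelDna.AddIn NuGet package.
module ExcelFunctions

open ExcelDna.Integration

// A public function marked with ExcelFunction is registered as a worksheet function.
// The name addTwo and its parameters are illustrative only.
[<ExcelFunction(Description = "Adds two numbers")>]
let addTwo (a: float) (b: float) : float = a + b
```

Once the add-in is built and its .xll file is loaded, a cell formula such as `=addTwo(1.5, 2)` should call into the F# function.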
Integrating F# and Excel through Office Open XML file manipulation
NPOI is the .NET version of the Apache POI Java project (http://poi.apache.org/), an open source project for reading and writing xls, doc and ppt files.
EPPlus is a .NET library that reads and writes Excel 2007/2010/2013 files using the Office Open XML format (xlsx).
ExcelPackageF is a simple F# wrapper over the EPPlus library.
Both NPOI and EPPlus read and write the files directly, so they do not require Excel to be installed and do not use Interop. You can read, create, and edit Excel documents using this approach.
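A minimal sketch of this file-based approach with EPPlus is shown below. It assumes F# Interactive with NuGet access and pins EPPlus to 4.5.3.3, on the assumption that this is an LGPL-licensed release (later major versions require a license to be configured). The sheet name, cell contents and helper names are hypothetical.

```fsharp
// Assumes F# Interactive with NuGet access; the version pin is an assumption.
#r "nuget: EPPlus, 4.5.3.3"

open System.IO
open OfficeOpenXml

// Create a new workbook at the given path with one small sheet.
// Assumes no file exists at the path yet.
let writeSheet (path: string) =
    use package = new ExcelPackage(FileInfo(path))
    let sheet = package.Workbook.Worksheets.Add("Data")
    sheet.Cells.[1, 1].Value <- box "x"
    sheet.Cells.[1, 2].Value <- box "y"
    sheet.Cells.[2, 1].Value <- box 1.0
    sheet.Cells.[2, 2].Value <- box 2.5
    package.Save()

// Reopen the workbook and read back the value in row 2, column 2.
let readCell (path: string) : float =
    use package = new ExcelPackage(FileInfo(path))
    let sheet = package.Workbook.Worksheets.["Data"]
    sheet.Cells.[2, 2].GetValue<float>()
```

Neither function starts Excel or uses COM Interop, so the script can run on a machine where Excel is not installed.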
Interoperating with Excel through type providers
The F# Excel Type Provider is a prototypical F# type provider for Excel that lets you read Excel files as typed data.
Interoperating with Excel through APIs
F# can interoperate with Excel through existing Excel APIs. For example:
There are also some F# versions of Excel functions, useful when migrating code:
F# and R
- R Type Provider for F# - An F# type provider for high-fidelity integration between F# and R.
- R.NET - Core interoperability component used by the R Type Provider with some F# extensions.
F# and MATLAB
- MATLAB Type Provider for F# - An F# type provider for higher-fidelity integration between F# and MATLAB
F# and Python
- Python for .NET - Allows Python to be integrated into F# and C# programs
F# and Mathematica/Wolfram Language