Python Pandas equivalent in JavaScript
With this CSV example:
Source,col1,col2,col3
foo,1,2,3
bar,3,4,5
The standard way I use Pandas is this:
- Parse CSV
- Select columns into a data frame (col1 and col3)
- Process the columns (e.g. average the values of col1 and col3)
Is there a JavaScript library that does that like Pandas?
All answers are good. Hoping my answer is comprehensive (i.e. tries to list all options). I hope to return and revise this answer with any criteria to help make a choice.
I hope anyone coming here is familiar with d3. d3 is a very useful "swiss army knife" for handling data in JavaScript, like pandas is helpful for Python. You may see d3 used frequently like pandas, even if d3 is not exactly a DataFrame/Pandas replacement (i.e. d3 doesn't have the same API; d3 doesn't have Series / DataFrame objects which behave like those in pandas).
Ahmed's answer explains how d3 can be used to achieve some DataFrame functionality, and some of the libraries below were inspired by things like LearnJsData, which uses d3 and lodash.
As for DataFrame-focused features, I was overwhelmed by the number of JS libraries that help. Here's a quick list of some of the options you might have encountered. I haven't checked any of them in detail yet (most I found through a combination of Google and NPM searches).
Be careful to pick one that you can actually work with: some are for Node.js (server-side JavaScript), some are browser-compatible (client-side JavaScript), and some are TypeScript.
- dataframe-js
- "DataFrame-js provides an immutable data structure for javascript and datascience, the DataFrame, which allows to work on rows and columns with a sql and functional programming inspired api."
- data-forge
- Seen in Ashley Davis' answer
- "JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ."
- Note the old data-forge JS repository is no longer maintained; now a new repository uses Typescript
- jsdataframe
- "Jsdataframe is a JavaScript data wrangling library inspired by data frame functionality in R and Python Pandas."
- dataframe
- "explore data by grouping and reducing."
Then after coming to this question, checking other answers here and doing more searching, I found options like:
- Apache Arrow in JS
- Thanks to user Back2Basics suggestion:
- "Apache Arrow is a columnar memory layout specification for encoding vectors and table-like containers of flat and nested data. Apache Arrow is the emerging standard for large in-memory columnar data (Spark, Pandas, Drill, Graphistry, ...)"
- Observable
- At first glance, seems like a JS alternative to the IPython/Jupyter "notebooks"
- Observable's page promises: "Reactive programming", a "Community", on a "Web Platform"
- See 5 minute intro here
- recline (from Rufus' answer)
- I expected an emphasis on the DataFrame API, in the way Pandas itself tries to document its replacement/improvement/correspondence to every R function.
- Instead I find recline's example emphasizes its (awesome) Multiview (the UI), which doesn't require jQuery but does require a browser! More examples
- ...or an emphasis on its MVC-ish architecture, including back-end stuff (i.e. database connections)
- I am probably being too harsh; after all, one of the nice things about pandas is how easily it can create visualizations out of the box.
- js-data
- Really more of an ORM! Most of its modules correspond to different data storage questions (js-data-mongodb, js-data-redis, js-data-cloud-datastore), sorting, filtering, etc.
- On the plus side, it does work on Node.js as a first priority; "Works in Node.js and in the Browser."
- miso (another suggestion from Rufus)
- AlaSQL
- "AlaSQL is an open source SQL database for Javascript with a strong focus on query speed and data source flexibility for both relational data and schemaless data. It works in your browser, Node.js, and Cordova."
- Some thought experiments:
I hope this post can become a community wiki, and evaluate (i.e. compare the different options above) against different criteria like:
- Pandas' criteria in its R comparison
- Performance
- Functionality/flexibility
- Ease-of-use
- My own suggestions
- Similarity to Pandas / Dataframe API's
- Specifically hits on their main features
- Data-science emphasis > UI emphasis
- Demonstrated integration in combination with other tools like Jupyter (interactive notebooks), etc.
Some things a JS library may never do (but could it?)
- Use an underlying framework that is a best-in-class JavaScript numbers/math library? (i.e. an equivalent of NumPy)
- Use any optimizing compilers that might result in faster code (i.e. an equivalent of Pandas' use of Cython)
- Be sponsored by any data-science-flavored consortium, à la Pandas and NumFOCUS
I've been working on a data wrangling library for JavaScript called data-forge. It's inspired by LINQ and Pandas.
It can be installed like this:
npm install --save data-forge
Your example would work like this:
var csvData = "Source,col1,col2,col3\n" +
"foo,1,2,3\n" +
"bar,3,4,5\n";
var dataForge = require('data-forge');
var dataFrame =
dataForge.fromCSV(csvData)
.parseInts([ "col1", "col2", "col3" ])
;
If your data was in a CSV file you could load it like this:
var dataFrame = dataForge.readFileSync(fileName)
.parseCSV()
.parseInts([ "col1", "col2", "col3" ])
;
You can use the select method to transform rows.
You can extract a column using getSeries, then use the select method to transform values in that column.
You get your data back out of the data-frame like this:
var data = dataFrame.toArray();
To average a column:
var avg = dataFrame.getSeries("col1").average();
There is much more you can do with this.
You can find more documentation on npm.
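For comparison, the same pipeline can be sketched without any library. This is only an illustration of the steps above (parse, convert to integers, pull out a column, average). The helper names parseCsv, getColumn and average are hypothetical and are not part of data-forge, and the naive comma split assumes there are no quoted fields.

```javascript
// Hypothetical helpers for illustration; not part of data-forge.
var csvData = "Source,col1,col2,col3\n" +
    "foo,1,2,3\n" +
    "bar,3,4,5\n";

// Parse a simple CSV string (no quoted fields) into an array of row objects.
function parseCsv(text) {
    var lines = text.trim().split("\n");
    var header = lines[0].split(",");
    return lines.slice(1).map(function (line) {
        var cells = line.split(",");
        var row = {};
        header.forEach(function (name, i) { row[name] = cells[i]; });
        return row;
    });
}

// Extract one column as an array of integers.
function getColumn(rows, name) {
    return rows.map(function (row) { return parseInt(row[name], 10); });
}

// Arithmetic mean of an array of numbers.
function average(values) {
    return values.reduce(function (sum, v) { return sum + v; }, 0) / values.length;
}

var rows = parseCsv(csvData);
var avgCol1 = average(getColumn(rows, "col1")); // (1 + 3) / 2
var avgCol3 = average(getColumn(rows, "col3")); // (3 + 5) / 2
console.log(avgCol1, avgCol3);
```

A library earns its keep once you need quoted fields, type inference, or chained transformations, which this sketch deliberately ignores.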
I think the closest things are libraries like Recline and Miso.
Recline in particular has a Dataset object with a structure somewhat similar to Pandas data frames. It then allows you to connect your data with "Views" such as a data grid, graphing, maps etc. Views are usually thin wrappers around existing best of breed visualization libraries such as D3, Flot, SlickGrid etc.
Here's an example for Recline:
// Load some data
var dataset = new recline.Model.Dataset({
  records: [
    { value: 1, date: '2012-08-07' },
    { value: 5, b: '2013-09-07' }
  ]
  // Load CSV data instead
  // (And Recline has support for many more data source types)
  // url: 'my-local-csv-file.csv',
  // backend: 'csv'
});
// get an element from your HTML for the viewer
var $el = $('#data-viewer');
var allInOneDataViewer = new recline.View.MultiView({
  model: dataset,
  el: $el
});
// Your new Data Viewer will be live!
Caveat: The following is applicable only to d3 v3, and not the latest d3 v4!
I am partial to d3.js, and while it won't be a total replacement for Pandas, if you spend some time learning its paradigm, it should be able to take care of all your data wrangling for you. (And if you wind up wanting to display results in the browser, it's ideally suited to that.)
Example. My CSV file data.csv:
name,age,color
Mickey,65,black
Donald,58,white
Pluto,64,orange
In the same directory, create an index.html containing the following:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<title>My D3 demo</title>
<script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script>
</head>
<body>
<script charset="utf-8" src="demo.js"></script>
</body>
</html>
and also a demo.js file containing the following:
d3.csv('/data.csv',
// How to format each row. Since the CSV file has a header, `row` will be
// an object with keys derived from the header.
function(row) {
return {name : row.name, age : +row.age, color : row.color};
},
// Callback to run once all data's loaded and ready.
function(data) {
// Log the data to the JavaScript console
console.log(data);
// Compute some interesting results
var averageAge = data.reduce(function(prev, curr) {
return prev + curr.age;
}, 0) / data.length;
// Also, display it
var ulSelection = d3.select('body').append('ul');
var valuesSelection =
ulSelection.selectAll('li').data(data).enter().append('li').text(
function(d) { return d.age; });
var totalSelection =
ulSelection.append('li').text('Average: ' + averageAge);
});
In the directory, run python -m SimpleHTTPServer 8181 (on Python 3, python3 -m http.server 8181), and open http://localhost:8181 in your browser to see a simple listing of the ages and their average.
This simple example shows a few relevant features of d3:
- Excellent support for ingesting online data (CSV, TSV, JSON, etc.)
- Data wrangling smarts baked in
- Data-driven DOM manipulation (maybe the hardest thing to wrap one's head around): your data gets transformed into DOM elements.
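The row-formatting and averaging steps in demo.js do not depend on the browser, so they can be tried on their own. A minimal sketch, assuming the rows have already been parsed into objects whose values are all strings (which is how the accessor receives them); rawRows, formatRow and concatenated are names made up for this illustration. It also shows why the unary + in the accessor matters.

```javascript
// Rows as the accessor would receive them: every value is a string.
var rawRows = [
  {name: 'Mickey', age: '65', color: 'black'},
  {name: 'Donald', age: '58', color: 'white'},
  {name: 'Pluto', age: '64', color: 'orange'}
];

// Same conversion as the accessor passed to d3.csv: unary + makes age a number.
function formatRow(row) {
  return {name: row.name, age: +row.age, color: row.color};
}

var data = rawRows.map(formatRow);

// Same reduce as in demo.js: sum the ages, then divide by the row count.
var averageAge = data.reduce(function(prev, curr) {
  return prev + curr.age;
}, 0) / data.length;

// Without the conversion, + concatenates strings instead of adding numbers.
var concatenated = rawRows.reduce(function(prev, curr) {
  return prev + curr.age;
}, 0);

console.log(averageAge);   // 187 / 3
console.log(concatenated); // a string, not a sum
```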
Below is an example using Python's numpy and pandas:
```
import numpy as np
import pandas as pd
data_frame = pd.DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], [1, 2, 3, 4])
data_frame[5] = np.random.randint(1, 50, 5)
print(data_frame.loc[['C', 'D'], [2, 3]])
# axis=1 means columns; axis=0 means rows
data_frame.drop(5, axis=1, inplace=True)
print(data_frame)
```
The same can be achieved in JavaScript (note that numjs works only with Node.js), although D3.js has much more advanced options for loading data files. Both numjs and pandas-js are still works in progress.
import np from 'numjs';
import { DataFrame } from 'pandas-js';
const df = new DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], [1, 2, 3, 4])
// df
/*
1 2 3 4
A 0.023126 1.078130 -0.521409 -1.480726
B 0.920194 -0.201019 0.028180 0.558041
C -0.650564 -0.505693 -0.533010 0.441858
D -0.973549 0.095626 -1.302843 1.109872
E -0.989123 -1.382969 -1.682573 -0.637132
*/
Pandas.js is an experimental library at the moment, but it seems very promising. Under the hood it uses immutable.js and NumPy logic, and both data objects, Series and DataFrame, are there.
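For readers without either library installed, the label-based selection that loc performs in the pandas snippet can be approximated in plain JavaScript. This is a rough sketch with a hypothetical loc helper and fixed numbers instead of random ones; it is not the pandas-js API.

```javascript
// A tiny labeled table: row labels, column labels, and row-major values.
var table = {
  index: ['A', 'B', 'C', 'D', 'E'],
  columns: [1, 2, 3, 4],
  values: [
    [0.1, 0.2, 0.3, 0.4],
    [1.1, 1.2, 1.3, 1.4],
    [2.1, 2.2, 2.3, 2.4],
    [3.1, 3.2, 3.3, 3.4],
    [4.1, 4.2, 4.3, 4.4]
  ]
};

// Hypothetical helper: select rows and columns by label, like data_frame.loc[rows, cols].
function loc(t, rowLabels, colLabels) {
  var rowIdx = rowLabels.map(function (r) { return t.index.indexOf(r); });
  var colIdx = colLabels.map(function (c) { return t.columns.indexOf(c); });
  return {
    index: rowLabels,
    columns: colLabels,
    values: rowIdx.map(function (i) {
      return colIdx.map(function (j) { return t.values[i][j]; });
    })
  };
}

// Equivalent in spirit to data_frame.loc[['C', 'D'], [2, 3]].
var subset = loc(table, ['C', 'D'], [2, 3]);
console.log(subset.values);
```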
Here is a dynamic approach, assuming a header exists on line 1. The CSV is loaded with d3.js.
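function csvToColumnArrays(csv) {
    // One empty array per header name, taken from the keys of the first row.
    var mainObj = {},
        header = Object.keys(csv[0]);
    for (var i = 0; i < header.length; i++) {
        mainObj[header[i]] = [];
    }
    // Append each row's values to the matching column array.
    csv.forEach(function(d) {
        for (var key in mainObj) {
            mainObj[key].push(d[key]);
        }
    });
    return mainObj;
}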
d3.csv(path, function(csv) {
var df = csvToColumnArrays(csv);
});
Then you are able to access each column of the data, similar to an R, Python or Matlab dataframe, with df.column_header[row_number].
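As a quick usage sketch, here is the same idea applied to rows held in memory rather than loaded through d3.csv. The sample rows are made up from the question's CSV, and meanCol1 is a name introduced only for this illustration. Note that the values stay strings unless they are converted.

```javascript
// Same column-building logic as above, repeated so the sketch runs on its own.
function csvToColumnArrays(csv) {
    var mainObj = {};
    Object.keys(csv[0]).forEach(function (name) { mainObj[name] = []; });
    csv.forEach(function (d) {
        for (var key in mainObj) {
            mainObj[key].push(d[key]);
        }
    });
    return mainObj;
}

// Rows shaped like the output of a CSV parser with a header: objects of strings.
var sampleRows = [
    { Source: 'foo', col1: '1', col2: '2', col3: '3' },
    { Source: 'bar', col1: '3', col2: '4', col3: '5' }
];

var df = csvToColumnArrays(sampleRows);

// Column access by name, then row number.
console.log(df.Source[1]);
console.log(df.col1);

// Convert to numbers before doing arithmetic on a column.
var meanCol1 = df.col1.map(Number).reduce(function (a, b) { return a + b; }, 0) / df.col1.length;
console.log(meanCol1);
```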
It's pretty easy to parse CSV in javascript because each line's already essentially a javascript array. If you load your csv into an array of strings (one per line) it's pretty easy to load an array of arrays with the values:
var pivot = function(data){
var result = [];
for (var i = 0; i < data.length; i++){
for (var j=0; j < data[i].length; j++){
if (i === 0){
result[j] = [];
}
result[j][i] = data[i][j];
}
}
return result;
};
var getData = function() {
var csvString = $(".myText").val();
var csvLines = csvString.split(/\n?$/m);
var dataTable = [];
for (var i = 0; i < csvLines.length; i++){
var values;
eval("values = [" + csvLines[i] + "]");
dataTable[i] = values;
}
return pivot(dataTable);
};
Then getData() returns a multidimensional array of values by column.
I've demonstrated this in a jsFiddle for you.
Of course, you can't do it quite this easily if you don't trust the input - if there could be script in your data which eval might pick up, etc.
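If the input is not trusted, one eval-free option is to split each line on commas and convert numeric-looking cells explicitly. A minimal sketch, assuming no quoted fields or embedded commas; parseLine and toColumns are hypothetical names, and the result has the same column-by-column shape that getData() returns. Unlike the eval version, unquoted text cells are accepted as plain strings.

```javascript
// Convert one CSV line to an array, turning numeric-looking cells into numbers.
function parseLine(line) {
  return line.split(',').map(function (cell) {
    var trimmed = cell.trim();
    return trimmed !== '' && !isNaN(Number(trimmed)) ? Number(trimmed) : trimmed;
  });
}

// Parse all non-empty lines, then transpose rows into columns (same idea as pivot).
function toColumns(csvString) {
  var rows = csvString.split(/\r?\n/).filter(function (line) {
    return line.trim() !== '';
  }).map(parseLine);
  return rows[0].map(function (_, j) {
    return rows.map(function (row) { return row[j]; });
  });
}

var columns = toColumns('foo,1,2,3\nbar,3,4,5\n');
console.log(columns);
```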
ReferenceURL : https://stackoverflow.com/questions/30610675/python-pandas-equivalent-in-javascript