

news 2025/12/19 3:28:36

I. Usage in Detail

44. pandas.crosstab function

44-1. Syntax

    # 44. pandas.crosstab function
    pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)

    Compute a simple cross tabulation of two (or more) factors.
    By default, computes a frequency table of the factors unless an array of values and an aggregation function are passed.

    Parameters:
    index : array-like, Series, or list of arrays/Series
        Values to group by in the rows.
    columns : array-like, Series, or list of arrays/Series
        Values to group by in the columns.
    values : array-like, optional
        Array of values to aggregate according to the factors.
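        Requires aggfunc be specified.
    rownames : sequence, default None
        If passed, must match number of row arrays passed.
    colnames : sequence, default None
        If passed, must match number of column arrays passed.
    aggfunc : function, optional
        If specified, requires values be specified as well.
    margins : bool, default False
        Add row/column margins (subtotals).
    margins_name : str, default 'All'
        Name of the row/column that will contain the totals when margins is True.
    dropna : bool, default True
        Do not include columns whose entries are all NaN.
    normalize : bool, {'all', 'index', 'columns'}, or {0,1}, default False
        Normalize by dividing all values by the sum of values.
        - If passed 'all' or True, will normalize over all values.
        - If passed 'index' will normalize over each row.
        - If passed 'columns' will normalize over each column.
        - If margins is True, will also normalize margin values.

    Returns:
    DataFrame
        Cross tabulation of the data.

44-2. Parameters

44-2-1. index (required): an array, Series, or list of arrays/Series whose values become the row labels of the cross table; usually one or more columns of a DataFrame.
44-2-2. columns (required): an array, Series, or list of arrays/Series whose values become the column labels of the cross table; usually one or more columns of a DataFrame.
44-2-3. values (optional, default None): an array of values to aggregate with the function given in aggfunc; if values is passed, aggfunc must be passed too. If omitted, each cell holds the number of observations for that combination (a count).
44-2-4. rownames/colnames (optional, default None): names for the row and column index levels. If passed, their lengths must match the number of row arrays and column arrays, respectively; if omitted, the names of the Series passed as index and columns are used.
44-2-5. aggfunc (optional, default None): the function used to aggregate values, e.g. 'sum', 'mean', 'max' or 'min'; it requires values. If both values and aggfunc are None, crosstab builds a frequency table (counts).
44-2-6. margins (optional, default False): if True, an extra row and an extra column holding the totals are appended (aggregated with aggfunc when values is given, otherwise counts).
44-2-7. margins_name (optional, default 'All'): the label of the totals row and column when margins=True.
44-2-8. dropna (optional, default True): if True, columns whose entries are all NaN are left out of the result; if False, they are kept.
44-2-9. normalize (optional, default False): a bool, 'all', 'index', 'columns', or 0/1. True or 'all' divides every cell by the grand total, so the whole table sums to 1; 'index' (or 0) makes each row sum to 1; 'columns' (or 1) makes each column sum to 1. If margins=True, the margin values are normalized as well.

44-3. Functionality: builds a cross tabulation, also called a contingency table or frequency table.

44-4. Return value: a new DataFrame containing the cross tabulation, with rows labeled by the values of index and columns labeled by the values of columns.

44-5. Notes
44-5-1. If neither values nor aggfunc is given, each cell holds the number of observations for that combination.
44-5-2. If values and aggfunc are given, each cell holds the result of applying aggfunc to the values that fall into that combination.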
44-5-3. If margins=True, the returned DataFrame also contains an extra row and an extra column, both labeled with margins_name, holding the totals across all rows and all columns (aggregated with aggfunc when values is given, otherwise counts).
44-5-4. If normalize is True or 'all', every value is divided by the grand total so that the whole table sums to 1; if normalize is 'index' or 'columns', each row or each column, respectively, sums to 1 instead.

44-6. Usage

44-6-1. Data preparation: none

44-6-2. Code example

    # 44. pandas.crosstab function
    import pandas as pd
    import numpy as np

    # Create a sample dataset
    data = {
        'Date': pd.date_range('2023-01-01', periods=6, freq='D'),
        'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York', 'Los Angeles'],
        'Category': ['A', 'A', 'B', 'B', 'A', 'B'],
        'Values': [100, 200, 150, 250, np.nan, 300]
    }
    df = pd.DataFrame(data)
    print('Original dataset:')
    print(df)

    # Build a cross table: rows = (Date, City), columns = Category,
    # each cell = sum of Values for that combination
    crosstab_result = pd.crosstab(
        index=[df['Date'], df['City']],
        columns=df['Category'],
        values=df['Values'],
        rownames=['Date', 'City'],
        colnames=['Category'],
        aggfunc='sum',
        margins=True,
        margins_name='All',
        dropna=True,
        normalize=False
    )
    print('\ncrosstab result:')
    print(crosstab_result)

44-6-3. Output

    # 44. pandas.crosstab function
    # Original dataset:
    #         Date         City Category  Values
    # 0 2023-01-01     New York        A   100.0
    # 1 2023-01-02  Los Angeles        A   200.0
    # 2 2023-01-03     New York        B   150.0
    # 3 2023-01-04  Los Angeles        B   250.0
    # 4 2023-01-05     New York        A     NaN
    # 5 2023-01-06  Los Angeles        B   300.0
    #
    # crosstab result:
    # Category                              A      B     All
    # Date                City
    # 2023-01-01 00:00:00 New York      100.0    NaN   100.0
    # 2023-01-02 00:00:00 Los Angeles   200.0    NaN   200.0
    # 2023-01-03 00:00:00 New York        NaN  150.0   150.0
    # 2023-01-04 00:00:00 Los Angeles     NaN  250.0   250.0
    # 2023-01-05 00:00:00 New York        0.0    NaN     NaN
    # 2023-01-06 00:00:00 Los Angeles     NaN  300.0   300.0
    # All                               300.0  700.0  1000.0

In this result, the 2023-01-05 / New York row shows 0.0 under A because summing a group whose only value is NaN yields 0. Its All entry is NaN because, with dropna=True, pandas leaves rows containing missing values out when it computes the margins.

45. pandas.cut function

45-1. Syntax

    # 45. pandas.cut function
    pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)

    Bin values into discrete intervals.

    Use cut when you need to segment and sort data values into bins. This function is also useful for going from a continuous variable to a categorical variable. For example, cut could convert ages to groups of age ranges. Supports binning into an equal number of bins, or a pre-specified array of bins.

    Parameters:
    x : array-like
        The input array to be binned.
        Must be 1-dimensional.
    bins : int, sequence of scalars, or IntervalIndex
        The criteria to bin by.
        - int : Defines the number of equal-width bins in the range of x. The range of x is extended by .1% on each side to include the minimum and maximum values of x.
        - sequence of scalars : Defines the bin edges allowing for non-uniform width. No extension of the range of x is done.
        - IntervalIndex : Defines the exact bins to be used. Note that IntervalIndex for bins must be non-overlapping.
    right : bool, default True
        Indicates whether bins includes the rightmost edge or not. If right == True (the default), then the bins [1, 2, 3, 4] indicate (1,2], (2,3], (3,4]. This argument is ignored when bins is an IntervalIndex.
    labels : array or False, default None
        Specifies the labels for the returned bins. Must be the same length as the resulting bins. If False, returns only integer indicators of the bins. This affects the type of the output container (see below). This argument is ignored when bins is an IntervalIndex. If True, raises an error. When ordered=False, labels must be provided.
    retbins : bool, default False
        Whether to return the bins or not. Useful when bins is provided as a scalar.
    precision : int, default 3
        The precision at which to store and display the bins labels.
    include_lowest : bool, default False
        Whether the first interval should be left-inclusive or not.
    duplicates : {default 'raise', 'drop'}, optional
        If bin edges are not unique, raise ValueError or drop non-uniques.
    ordered : bool, default True
        Whether the labels are ordered or not. Applies to returned types Categorical and Series (with Categorical dtype). If True, the resulting categorical will be ordered. If False, the resulting categorical will be unordered (labels must be provided).

    Returns:
    out : Categorical, Series, or ndarray
        An array-like object representing the respective bin for each value of x. The type depends on the value of labels.
        - None (default) : returns a Series for Series x or a Categorical for all other inputs.
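          The values stored within are Interval dtype.
        - sequence of scalars : returns a Series for Series x or a Categorical for all other inputs. The values stored within are whatever the type in the sequence is.
        - False : returns an ndarray of integers.
    bins : numpy.ndarray or IntervalIndex.
        The computed or specified bins. Only returned when retbins=True. For scalar or sequence bins, this is an ndarray with the computed bins. If set duplicates=drop, bins will drop non-unique bin. For an IntervalIndex bins, this is equal to bins.

45-2. Parameters

45-2-1. x (required): the input array or sequence holding the continuous data to be binned; it must be one-dimensional.
45-2-2. bins (required): the binning criterion. If bins is an integer, cut creates that many equal-width intervals over the range of x (the range is widened by 0.1% so that the minimum and maximum are included); for example, bins=4 yields 4 intervals. If bins is a sequence, its values are the bin edges, so n edges define n-1 intervals, which may have unequal widths. If bins is an IntervalIndex, exactly those intervals are used.
45-2-3. right (optional, default True): if True, intervals are closed on the right, i.e. each interval includes its right endpoint, as in (20, 30]; if False, intervals are closed on the left instead, as in [20, 30).
45-2-4. labels (optional, default None): labels for the resulting bins; if given, there must be exactly one label per bin. If omitted, the intervals themselves serve as labels, e.g. (0, 1], (1, 2], .... If False, only the integer index of each bin is returned.
45-2-5. retbins (optional, default False): if True, cut returns a tuple containing the binned result and the array of bin edges.
45-2-6. precision (optional, default 3): the number of digits used to store and display the bin labels. It only matters when labels is None, i.e. when the interval labels are generated automatically.
45-2-7. include_lowest (optional, default False): if True, the first interval also includes its left edge. This matters when the smallest value of x equals the first bin edge: with bins=[20, 30, ...] and right=True, a value of exactly 20 would otherwise fall outside (20, 30] and become NaN.
45-2-8. duplicates (optional, default 'raise'): {'raise', 'drop'}, what to do when the bin edges are not unique:
45-2-8-1. 'raise': raise a ValueError.
45-2-8-2. 'drop': drop the duplicate edges and keep only the unique ones.
45-2-9. ordered (optional, default True): if True, the returned categorical is ordered, which matters for later analysis such as sorting; if False, it is unordered and labels must be provided.

45-3. Functionality: splits continuous numeric data into the specified intervals (bins, or "buckets"), converting a continuous variable into a discrete categorical one. This is especially useful in data analysis and in feature engineering for machine learning, where it helps reveal how the data is distributed across intervals or simplifies model inputs.

45-4. Return value
45-4-1. With retbins=False (the default), pandas.cut returns the bin of every value in x: a Series with categorical dtype when x is a Series, a Categorical for other inputs, or integer bin indicators when labels=False. The categories are ordered when ordered=True.
45-4-2. With retbins=True, pandas.cut returns a tuple: the result described above plus an array of the bin edges, so the labels and the edges can be obtained together for later processing.

45-5. Notes: none

45-6. Usage

45-6-1. Data preparation: none

45-6-2. Code example

    # 45. pandas.cut function
    import pandas as pd

    # Create a sample dataset
    data = {
        'Age': [22, 25, 45, 33, 50, 41, 23, 37, 29, 31, 35, 48, 52, 44, 27]
    }
    df = pd.DataFrame(data)
    print('Original dataset:')
    print(df)

    # Define the bin edges
    bins = [20, 30, 40, 50, 60]

    # Split the ages into intervals (20, 30], (30, 40], (40, 50], (50, 60]
    df['Age Group'] = pd.cut(
        x=df['Age'],
        bins=bins,
        right=True,
        labels=['20-30', '30-40', '40-50',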
                '50-60'],
        retbins=False,
        precision=0,          # no effect here: precision only formats auto-generated interval labels
        include_lowest=True,  # the first interval also includes its left edge (20)
        duplicates='raise',
        ordered=True
    )
    print('\nBinned dataset:')
    print(df)

45-6-3. Output

    # 45. pandas.cut function
    # Original dataset:
    #     Age
    # 0    22
    # 1    25
    # 2    45
    # 3    33
    # 4    50
    # 5    41
    # 6    23
    # 7    37
    # 8    29
    # 9    31
    # 10   35
    # 11   48
    # 12   52
    # 13   44
    # 14   27
    #
    # Binned dataset:
    #     Age Age Group
    # 0    22     20-30
    # 1    25     20-30
    # 2    45     40-50
    # 3    33     30-40
    # 4    50     40-50
    # 5    41     40-50
    # 6    23     20-30
    # 7    37     30-40
    # 8    29     20-30
    # 9    31     30-40
    # 10   35     30-40
    # 11   48     40-50
    # 12   52     50-60
    # 13   44     40-50
    # 14   27     20-30

46. pandas.qcut function

46-1. Syntax

    # 46. pandas.qcut function
    pandas.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise')

    Quantile-based discretization function.

    Discretize variable into equal-sized buckets based on rank or based on sample quantiles. For example 1000 values for 10 quantiles would produce a Categorical object indicating quantile membership for each data point.

    Parameters:
    x : 1d ndarray or Series
    q : int or list-like of float
        Number of quantiles. 10 for deciles, 4 for quartiles, etc. Alternately array of quantiles, e.g. [0, .25, .5, .75, 1.] for quartiles.
    labels : array or False, default None
        Used as labels for the resulting bins. Must be of the same length as the resulting bins. If False, return only integer indicators of the bins. If True, raises an error.
    retbins : bool, optional
        Whether to return the (bins, labels) or not. Can be useful if bins is given as a scalar.
    precision : int, optional
        The precision at which to store and display the bins labels.
    duplicates : {default 'raise', 'drop'}, optional
        If bin edges are not unique, raise ValueError or drop non-uniques.

    Returns:
    out : Categorical or Series or array of integers if labels is False
        The return type (Categorical or Series) depends on the input: a Series of type category if input is a Series else Categorical.
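        Bins are represented as categories when categorical data is returned.
    bins : ndarray of floats
        Returned only if retbins is True.

    Notes:
    Out of bounds values will be NA in the resulting Categorical object.

46-2. Parameters

46-2-1. x (required): the one-dimensional array or array-like object to be binned (bucketed).
46-2-2. q (required): an int or an array-like of quantiles. An integer gives the number of quantile-based bins (e.g. 4 for quartiles, 10 for deciles); an array must contain floats between 0 and 1 that serve as the quantile edges. For example, [0, 0.25, 0.5, 0.75, 1.] splits the data into four quartile bins that each hold roughly the same number of observations (equal frequency, not equal width).
46-2-3. labels (optional, default None): labels for the bins. If None, the intervals themselves are used as labels; if False, only the integer index of each bin is returned; if an array is given, its length must equal the number of bins.
46-2-4. retbins (optional, default False): if True, qcut also returns the array of bin edges, i.e. the computed quantile values of x.
46-2-5. precision (optional, default 3): the number of digits used to store and display the bin labels.
46-2-6. duplicates (optional, default 'raise'): if the computed bin edges are not unique (for example, when many values of x are identical), 'raise' raises a ValueError and 'drop' removes the duplicate edges, which yields fewer bins.

46-3. Functionality: pandas.qcut divides continuous data into equal-frequency (or approximately equal-frequency) intervals based on quantiles. Its main uses are:
46-3-1. Equal-frequency binning: the bins follow the quantiles of the data, so each bin holds roughly the same number of samples where possible, which is useful when the categories need balanced sample counts.
46-3-2. Custom quantiles: besides equal-frequency binning, an array of quantiles can be passed as q to define custom split points for finer-grained partitioning.
46-3-3. Discretization: in data preprocessing and feature engineering, pandas.qcut is commonly used to discretize continuous variables for subsequent analysis, modeling, or visualization.

46-4. Return value
46-4-1. With retbins=False (the default), qcut returns a single object holding the bin of each element of x: a Series with categorical dtype when x is a Series, otherwise a Categorical; with labels=False it returns integer bin indicators instead.
46-4-2. With retbins=True, qcut returns a tuple (out, bins), where out is the result above and bins is an ndarray of floats containing the bin edges.

46-5. Notes: none

46-6. Usage

46-6-1. Data preparation: none

46-6-2. Code example

    # 46. pandas.qcut function
    import pandas as pd

    # Create a sample dataset
    data = {
        'Age': [22, 25, 45, 33, 50, 41, 23, 37, 29, 31, 35, 48, 52, 44, 27]
    }
    df = pd.DataFrame(data)
    print('Original dataset:')
    print(df)

    # Split the ages into four quantile-based bins (quartiles)
    df['Age Group'] = pd.qcut(
        x=df['Age'],
        q=4,
        labels=['Q1', 'Q2', 'Q3', 'Q4'],
        retbins=False,
        precision=3,
        duplicates='raise'
    )
    print('\nDataset binned by quantile:')
    print(df)

46-6-3. Output

    # 46. pandas.qcut function
    # Original dataset:
    #     Age
    # 0    22
    # 1    25
    # 2    45
    # 3    33
    # 4    50
    # 5    41
    # 6    23
    # 7    37
    # 8    29
    # 9    31
    # 10   35
    # 11   48
    # 12   52
    # 13   44
    # 14   27
    #
    # Dataset binned by quantile:
    #     Age Age Group
    # 0    22        Q1
    # 1    25        Q1
    # 2    45        Q4
    # 3    33        Q2
    # 4    50        Q4
    # 5    41        Q3
    # 6    23        Q1
    # 7    37        Q3
    # 8    29        Q2
    # 9    31        Q2
    # 10   35        Q2
    # 11   48        Q4
    # 12   52        Q4
    # 13   44        Q3
    # 14   27        Q1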
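As a supplement to section 44 (pandas.crosstab), the sketch below shows the default behavior described in 44-5-1 and 44-2-6. With neither values nor aggfunc, crosstab simply counts observations, and margins=True appends an 'All' row and column with the totals. The City/Category data is made up for illustration.

```python
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({
    'City': ['NY', 'LA', 'NY', 'LA', 'NY', 'LA'],
    'Category': ['A', 'A', 'B', 'B', 'A', 'B'],
})

# No values/aggfunc: each cell is the number of rows with that (City, Category) pair.
# margins=True appends an 'All' row and an 'All' column holding the totals.
counts = pd.crosstab(df['City'], df['Category'], margins=True)
print(counts)
```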
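For the normalize parameter of pandas.crosstab (44-2-9), here is a minimal sketch that compares the three modes on the same made-up data. With 'all', the whole table sums to 1. With 'index', every row sums to 1, and with 'columns', every column does.

```python
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({
    'City': ['NY', 'LA', 'NY', 'LA', 'NY', 'LA'],
    'Category': ['A', 'A', 'B', 'B', 'A', 'B'],
})

# 'all' (same as True): every count divided by the grand total, so the table sums to 1
share_all = pd.crosstab(df['City'], df['Category'], normalize='all')

# 'index': each row divided by its row total, so every row sums to 1
share_row = pd.crosstab(df['City'], df['Category'], normalize='index')

# 'columns': each column divided by its column total, so every column sums to 1
share_col = pd.crosstab(df['City'], df['Category'], normalize='columns')

print(share_all, share_row, share_col, sep='\n\n')
```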
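Still on pandas.crosstab: this sketch combines values with aggfunc='mean' (44-2-3, 44-2-5). Because the Series passed in are unnamed, rownames and colnames (44-2-4) are used to name the axes, one name per array. The city, category, and amount data are hypothetical. A combination with no rows, here NY/B, comes out as NaN.

```python
import pandas as pd

# Hypothetical, unnamed Series (made up for illustration)
city = pd.Series(['NY', 'LA', 'NY', 'LA'])
category = pd.Series(['A', 'A', 'A', 'B'])
amount = pd.Series([100, 200, 300, 400])

# values + aggfunc: each cell holds the mean amount for that (city, category) pair.
# rownames/colnames supply the axis names; one name per array passed.
avg = pd.crosstab(city, category, values=amount, aggfunc='mean',
                  rownames=['city'], colnames=['category'])
print(avg)
```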
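For section 45 (pandas.cut), this sketch shows what an integer bins argument does (45-2-2). bins=3 yields three equal-width intervals, described by four edges rather than two intervals, and retbins=True returns the computed edges alongside the result. It reuses the Age values from 45-6-2.

```python
import pandas as pd

ages = pd.Series([22, 25, 45, 33, 50, 41, 23, 37, 29, 31, 35, 48, 52, 44, 27])

# bins=3 -> three equal-width intervals over the range of the data (four edges);
# pandas widens the range slightly (0.1%) so the minimum and maximum are included.
# retbins=True returns the computed edges alongside the binned values.
groups, edges = pd.cut(ages, bins=3, retbins=True)
print(edges)

# Number of ages in each interval, listed in interval order
print(groups.cat.codes.value_counts().sort_index())
```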
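Also for pandas.cut: a small sketch of how right (45-2-3), include_lowest (45-2-7) and labels=False (45-2-4) behave at the bin edges. With labels=False, the result holds integer bin indices, and a value outside every interval becomes NaN. The values are arbitrary, chosen to sit exactly on edges.

```python
import pandas as pd

values = pd.Series([20, 25, 30, 60])
edges = [20, 30, 40, 50, 60]

# Default right=True: intervals (20, 30], (30, 40], (40, 50], (50, 60].
# 20 equals the first edge but is excluded, so it becomes NaN.
default_ids = pd.cut(values, bins=edges, labels=False)

# include_lowest=True: the first interval also contains its left edge, so 20 -> bin 0.
lowest_ids = pd.cut(values, bins=edges, labels=False, include_lowest=True)

# right=False: intervals [20, 30), [30, 40), [40, 50), [50, 60).
# Now 30 moves to bin 1 and 60 falls outside every interval -> NaN.
left_ids = pd.cut(values, bins=edges, labels=False, right=False)

print(pd.DataFrame({'value': values, 'default': default_ids,
                    'include_lowest': lowest_ids, 'right=False': left_ids}))
```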
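For the duplicates parameter of pandas.cut (45-2-8), here is a sketch with a deliberately repeated edge. The default 'raise' rejects it with a ValueError, and 'drop' keeps only the unique edges, so fewer intervals remain. The values and edges are made up.

```python
import pandas as pd

values = [1, 2, 3, 4, 5]
edges = [0, 2, 2, 5]  # the edge 2 is repeated on purpose

# duplicates='raise' (default): non-unique edges are rejected with a ValueError
try:
    pd.cut(values, bins=edges)
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None
print(error_message)

# duplicates='drop': the repeated edge is removed, leaving (0, 2] and (2, 5]
groups, used_edges = pd.cut(values, bins=edges, duplicates='drop', retbins=True)
print(used_edges)
print(groups)
```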
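For section 46 (pandas.qcut), this sketch applies q=4 to the same Age data as 46-6-2, this time with retbins=True, so the quartile edges behind the Q1 to Q4 labels become visible. It also counts how many ages land in each bin. Fifteen values cannot be split exactly four ways, so the bins are only approximately equal in size.

```python
import pandas as pd

# Same Age values as in 46-6-2
ages = pd.Series([22, 25, 45, 33, 50, 41, 23, 37, 29, 31, 35, 48, 52, 44, 27])

# q=4: quartile bins; retbins=True also returns the quantile edges
quartiles, edges = pd.qcut(ages, q=4, retbins=True)
print(edges)

# How many ages land in each bin, listed in bin order
print(quartiles.cat.codes.value_counts().sort_index())
```

With pandas' default linear interpolation of quantiles, the edges should come out as 22, 28, 35, 44.5 and 52, which is consistent with the Q1 to Q4 assignment shown in 46-6-3.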
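Also for pandas.qcut: when many values are identical, several quantiles can coincide. This is the situation the duplicates parameter (46-2-6) handles. The sketch below uses hypothetical, heavily skewed click counts. With the default 'raise', qcut fails; with 'drop', it falls back to fewer bins.

```python
import pandas as pd

# Hypothetical, heavily skewed click counts: most values are 0
clicks = pd.Series([0, 0, 0, 0, 0, 0, 1, 2, 5, 9])

# The 0%, 25% and 50% quantiles are all 0, so the quartile edges are not unique,
# and duplicates='raise' (the default) fails with a ValueError.
try:
    pd.qcut(clicks, q=4)
except ValueError:
    default_raises = True
else:
    default_raises = False

# duplicates='drop' removes the repeated edges, leaving fewer than 4 bins;
# labels=False returns integer bin indices instead of interval labels.
ids, edges = pd.qcut(clicks, q=4, labels=False, retbins=True, duplicates='drop')
print(edges)
print(ids.tolist())
```

Whether losing bins this way is acceptable depends on the analysis. For heavily tied data, equal-frequency bins may simply not exist.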


http://www.w-s-a.com/news/268718/