
Open PR to remove lightgbm stubs from microsoft/python-type-stubs at next release #5863

Closed
Avasam opened this issue May 2, 2023 · 8 comments

Comments

@Avasam commented May 2, 2023

Description

Since both projects are under the Microsoft organization, it made sense to open a reminder here.

Open a PR to delete https://github.com/microsoft/python-type-stubs/tree/main/lightgbm once a typed version of LightGBM is released on PyPI.
This could be added as a checklist item to #5153 if that's the version that will include type hints and a py.typed marker.

Motivation

https://github.com/microsoft/python-type-stubs is bundled with Pylance. The long-term goal, as stated by its maintainers, is to upstream everything either to the base repository or to typeshed. Once LightGBM for Python is released with type hints, https://github.com/microsoft/python-type-stubs/tree/main/lightgbm needs to be deleted, otherwise Pylance users won't be able to properly make use of the new type hints, and there will be differences between what's seen in the IDE and the pyright CLI results.
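For reference, PEP 561 is the mechanism in play here: a package distributes inline type hints by shipping an empty py.typed marker alongside its modules, and type checkers then use those hints unless a bundled stub shadows them. A minimal packaging sketch (illustrative only, not LightGBM's actual setup; the package name is a placeholder):

# setup.py sketch: the empty py.typed file must be shipped as package data
from setuptools import setup, find_packages

setup(
    name="example-package",                           # placeholder, not the real distribution name
    packages=find_packages(),
    package_data={"example_package": ["py.typed"]},   # PEP 561 marker telling checkers to use inline hints
)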

@jameslamb (Collaborator)

😱 I've never seen this! Thanks very much for bringing it to our attention.

We've been methodically working on adding type hints here in this repo for 2+ years (#3756). 😭

@shiyu1994 did you know about that project? Is it a part of VS Code?

@bschnurr since you're the author of microsoft/python-type-stubs#257, could you help us understand the purpose of that PR and what you'd like to see happen here in LightGBM?

@bschnurr (Member) commented May 2, 2023

I see now there are return type hints.

def get_data(self) -> Optional[_LGBM_TrainDataType]:

I added stubs to address an issue with slow return-type inference for the function self.get_data in the module lightgbm.basic.
logs:

(39760) [BG(1)]                                           Re ["concat" (lightgbm.basic) [2470:33]] (3ms) [f:0, t:1, p:0, i:0, b:0]
(39760) [BG(1)]                                         Re ["self.data.getformat" (lightgbm.basic) [2455:33]] (568ms) [f:0, t:1, p:0, i:0, b:0]
(39760) [BG(1)]                                       Re ["self.data.iloc[self.used_indic <shortened> " (lightgbm.basic) [2323:33]] (10856ms) [f:0, t:1, p:0, i:0, b:0]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (10856ms)
(39760) [BG(1)]                                     Re ["self.get_data" (lightgbm.basic) [1805:25]] (10882ms) [f:0, t:1, p:0, i:0, b:0]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (10882ms)
(39760) [BG(1)]                                   Re ["self.set_group" (lightgbm.basic) [1807:25]] (10882ms) [f:0, t:1, p:0, i:0, b:0]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (10882ms)
(39760) [BG(1)]                                 Re ["self.get_label" (lightgbm.basic) [1808:24]] (10884ms) [f:0, t:1, p:0, i:0, b:0]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (10884ms)
(39760) [BG(1)]                               Re ["self.reference._predictor" (lightgbm.basic) [1810:96]] (10884ms) [f:0, t:1, p:0, i:0, b:0]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (10884ms)
(39760) [BG(1)]                             Re ["self.get_data" (lightgbm.basic) [1811:25]] (10884ms) [f:0, t:1, p:0, i:0, b:0]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (10884ms)
(39760) [BG(1)]                           Re ["self._set_init_score_by_predic <shortened> " (lightgbm.basic) [1812:25]] (10884ms) [f:0, t:1, p:0, i:0, b:0]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (10884ms)
(39760) [BG(1)]                         Re ["train_set.construct" (lightgbm.basic) [2605:13]] (11435ms) [f:1, t:1, p:2, i:3, b:1]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (11435ms)
(39760) [BG(1)]                       Re ["params.update" (lightgbm.basic) [2607:13]] (11440ms) [f:1, t:1, p:2, i:3, b:1]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (11440ms)
(39760) [BG(1)]                     Re ["predictor.predict" (lightgbm.basic) [3538:16]] (11630ms) [f:2, t:2, p:2, i:3, b:3]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (11630ms)
(39760) [BG(1)]                   Re ["self._Booster.predict" (lightgbm.sklearn) [803:16]] (11632ms) [f:2, t:2, p:2, i:3, b:3]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (11632ms)
(39760) [BG(1)]                 Re ["super().predict" (lightgbm.sklearn) [997:18]] (11633ms) [f:2, t:2, p:2, i:3, b:3]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (11633ms)
(39760) [BG(1)]               Re ["lgb_model.predict_proba" (detector) [349:37]] (11998ms) [f:5, t:13, p:25, i:5, b:36]
[Info  - 11:32:01 AM] (39760) [BG(1)] Long operation: Re (11998ms)
(39760) [BG(1)]             Re ["gc.collect" (detector) [353:5]] (11999ms) [f:5, t:13, p:25, i:5, b:36]
[Info  - 11:32:01 AM] (39760) [BG(1)] Long operation: Re (11999ms)
(39760) [BG(1)]           Re ["load_npz" (detector) [355:12]] (11999ms) [f:5, t:13, p:25, i:5, b:36]
[Info  - 11:32:01 AM] (39760) [BG(1)] Long operation: Re (11999ms)
(39760) [BG(1)]         Re ["csr_matrix" (detector) [356:12]] (11999ms) [f:5, t:13, p:25, i:5, b:36]
[Info  - 11:32:01 AM] (39760) [BG(1)] Long operation: Re (11999ms)
(39760) [BG(1)]       Re ["lgb_model.predict_proba" (detector) [357:24]] (12088ms) [f:6, t:17, p:31, i:57, b:46]
[Info  - 11:32:01 AM] (39760) [BG(1)] Long operation: Re (12088ms)
(39760) [BG(1)]     Re ["gc.collect" (detector) [362:5]] (12088ms) [f:6, t:17, p:31, i:57, b:46]
[Info  - 11:32:01 AM] (39760) [BG(1)] Long operation: Re (12088ms)
(39760) [BG(1)]   getDeclarationsForNameNode ["format" (detector) [314:23]] (12090ms) [f:6, t:17, p:31, i:57, b:46]
[Info  - 11:32:01 AM] (39760) [BG(1)] Long operation: getDeclarationsForNameNode (12090ms)
(39760) [BG(1)]   getDeclarationsForNameNode ...
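
For context, a bundled stub of the kind linked above gives pyright the return type directly, so it never has to run the inference chain shown in these logs. A hypothetical fragment (a sketch, not the actual basic.pyi contents):

# basic.pyi (hypothetical fragment, for illustration only)
from typing import Any, Optional

_LGBM_TrainDataType = Any  # stand-in for the real alias shown in the signature quoted above

class Dataset:
    # a declared return type lets pyright skip the slow inference of self.get_data
    def get_data(self) -> Optional[_LGBM_TrainDataType]: ...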

Source code example

import pandas as pd
import numpy as np
import lightgbm as lgb
#import xgboost as xgb
from scipy.sparse import vstack, csr_matrix, save_npz, load_npz
import scipy as sp
import scipy.special  # ensures sp.special.exp10 used below resolves
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
import gc

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
import seaborn as sns
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import Ridge
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import median_absolute_error
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_validate
from sklearn.model_selection import RepeatedKFold

gc.enable()

dtypes = {
        'MachineIdentifier':                                    'category',
        'ProductName':                                          'category',
        'EngineVersion':                                        'category',
        'AppVersion':                                           'category',
        'AvSigVersion':                                         'category',
        'IsBeta':                                               'int8',
        'RtpStateBitfield':                                     'float16',
        'IsSxsPassiveMode':                                     'int8',
        'DefaultBrowsersIdentifier':                            'float16',
        'AVProductStatesIdentifier':                            'float32',
        'AVProductsInstalled':                                  'float16',
        'AVProductsEnabled':                                    'float16',
        'HasTpm':                                               'int8',
        'CountryIdentifier':                                    'int16',
        'CityIdentifier':                                       'float32',
        'OrganizationIdentifier':                               'float16',
        'GeoNameIdentifier':                                    'float16',
        'LocaleEnglishNameIdentifier':                          'int8',
        'Platform':                                             'category',
        'Processor':                                            'category',
        'OsVer':                                                'category',
        'OsBuild':                                              'int16',
        'OsSuite':                                              'int16',
        'OsPlatformSubRelease':                                 'category',
        'OsBuildLab':                                           'category',
        'SkuEdition':                                           'category',
        'IsProtected':                                          'float16',
        'AutoSampleOptIn':                                      'int8',
        'PuaMode':                                              'category',
        'SMode':                                                'float16',
        'IeVerIdentifier':                                      'float16',
        'SmartScreen':                                          'category',
        'Firewall':                                             'float16',
        'UacLuaenable':                                         'float32',
        'Census_MDC2FormFactor':                                'category',
        'Census_DeviceFamily':                                  'category',
        'Census_OEMNameIdentifier':                             'float16',
        'Census_OEMModelIdentifier':                            'float32',
        'Census_ProcessorCoreCount':                            'float16',
        'Census_ProcessorManufacturerIdentifier':               'float16',
        'Census_ProcessorModelIdentifier':                      'float16',
        'Census_ProcessorClass':                                'category',
        'Census_PrimaryDiskTotalCapacity':                      'float32',
        'Census_PrimaryDiskTypeName':                           'category',
        'Census_SystemVolumeTotalCapacity':                     'float32',
        'Census_HasOpticalDiskDrive':                           'int8',
        'Census_TotalPhysicalRAM':                              'float32',
        'Census_ChassisTypeName':                               'category',
        'Census_InternalPrimaryDiagonalDisplaySizeInInches':    'float16',
        'Census_InternalPrimaryDisplayResolutionHorizontal':    'float16',
        'Census_InternalPrimaryDisplayResolutionVertical':      'float16',
        'Census_PowerPlatformRoleName':                         'category',
        'Census_InternalBatteryType':                           'category',
        'Census_InternalBatteryNumberOfCharges':                'float32',
        'Census_OSVersion':                                     'category',
        'Census_OSArchitecture':                                'category',
        'Census_OSBranch':                                      'category',
        'Census_OSBuildNumber':                                 'int16',
        'Census_OSBuildRevision':                               'int32',
        'Census_OSEdition':                                     'category',
        'Census_OSSkuName':                                     'category',
        'Census_OSInstallTypeName':                             'category',
        'Census_OSInstallLanguageIdentifier':                   'float16',
        'Census_OSUILocaleIdentifier':                          'int16',
        'Census_OSWUAutoUpdateOptionsName':                     'category',
        'Census_IsPortableOperatingSystem':                     'int8',
        'Census_GenuineStateName':                              'category',
        'Census_ActivationChannel':                             'category',
        'Census_IsFlightingInternal':                           'float16',
        'Census_IsFlightsDisabled':                             'float16',
        'Census_FlightRing':                                    'category',
        'Census_ThresholdOptIn':                                'float16',
        'Census_FirmwareManufacturerIdentifier':                'float16',
        'Census_FirmwareVersionIdentifier':                     'float32',
        'Census_IsSecureBootEnabled':                           'int8',
        'Census_IsWIMBootEnabled':                              'float16',
        'Census_IsVirtualDevice':                               'float16',
        'Census_IsTouchEnabled':                                'int8',
        'Census_IsPenCapable':                                  'int8',
        'Census_IsAlwaysOnAlwaysConnectedCapable':              'float16',
        'Wdft_IsGamer':                                         'float16',
        'Wdft_RegionIdentifier':                                'float16',
        'HasDetections':                                        'int8'
        }

# scikit-learn examples
survey = fetch_openml(data_id=534, as_frame=True)
X = survey.data[survey.feature_names]
X.describe(include="all")
X.head()
y = survey.target.values.ravel()
survey.target.head()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42
)
train_dataset = X_train.copy()
train_dataset.insert(0, "WAGE", y_train)
_ = sns.pairplot(train_dataset, kind='reg', diag_kind='kde')
survey.data.info()

categorical_columns = ['RACE', 'OCCUPATION', 'SECTOR',
                       'MARR', 'UNION', 'SEX', 'SOUTH']
numerical_columns = ['EDUCATION', 'EXPERIENCE', 'AGE']

preprocessor = make_column_transformer(
    (OneHotEncoder(drop='if_binary'), categorical_columns),
    remainder='passthrough'
)

model = make_pipeline(
    preprocessor,
    TransformedTargetRegressor(
        regressor=Ridge(alpha=1e-10),
        func=np.log10,
        inverse_func=sp.special.exp10
    )
)
_ = model.fit(X_train, y_train)

y_pred = model.predict(X_train)
mae = median_absolute_error(y_train, y_pred)
string_score = f'MAE on training set: {mae:.2f} $/hour'
y_pred = model.predict(X_test)
mae = median_absolute_error(y_test, y_pred)
string_score += f'\nMAE on testing set: {mae:.2f} $/hour'
fig, ax = plt.subplots(figsize=(5, 5))
plt.scatter(y_test, y_pred)
ax.plot([0, 1], [0, 1], transform=ax.transAxes, ls="--", c="red")
plt.text(3, 20, string_score)
plt.title('Ridge model, small regularization')
plt.ylabel('Model predictions')
plt.xlabel('Truths')
plt.xlim([0, 27])
_ = plt.ylim([0, 27])

feature_names = (model.named_steps['columntransformer']
                      .named_transformers_['onehotencoder']
                      .get_feature_names(input_features=categorical_columns))
feature_names = np.concatenate(
    [feature_names, numerical_columns])

coefs = pd.DataFrame(
    model.named_steps['transformedtargetregressor'].regressor_.coef_,
    columns=['Coefficients'], index=feature_names
)

coefs.plot(kind='barh', figsize=(9, 7))
plt.title('Ridge model, small regularization')
plt.axvline(x=0, color='.5')
plt.subplots_adjust(left=.3)

X_train_preprocessed = pd.DataFrame(
    model.named_steps['columntransformer'].transform(X_train),
    columns=feature_names
)

X_train_preprocessed.std(axis=0).plot(kind='barh', figsize=(9, 7))
plt.title('Features std. dev.')
plt.subplots_adjust(left=.3)

coefs = pd.DataFrame(
    model.named_steps['transformedtargetregressor'].regressor_.coef_ *
    X_train_preprocessed.std(axis=0),
    columns=['Coefficient importance'], index=feature_names
)
coefs.plot(kind='barh', figsize=(9, 7))
plt.title('Ridge model, small regularization')
plt.axvline(x=0, color='.5')
plt.subplots_adjust(left=.3)

cv_model = cross_validate(
    model, X, y, cv=RepeatedKFold(n_splits=5, n_repeats=5),
    return_estimator=True, n_jobs=-1
)
coefs = pd.DataFrame(
    [est.named_steps['transformedtargetregressor'].regressor_.coef_ *
     X_train_preprocessed.std(axis=0)
     for est in cv_model['estimator']],
    columns=feature_names
)
plt.figure(figsize=(9, 7))
sns.swarmplot(data=coefs, orient='h', color='k', alpha=0.5)
sns.boxplot(data=coefs, orient='h', color='cyan', saturation=0.5)
plt.axvline(x=0, color='.5')
plt.xlabel('Coefficient importance')
plt.title('Coefficient importance and its variability')
plt.subplots_adjust(left=.3)

# end of scikit learn example

print('Download Train and Test Data.\n')
train = pd.read_csv('../input/train.csv', dtype=dtypes, low_memory=True)
train['MachineIdentifier'] = train.index.astype('uint32')
test  = pd.read_csv('../input/test.csv',  dtype=dtypes, low_memory=True)
test['MachineIdentifier']  = test.index.astype('uint32')

gc.collect()

print('Transform all features to category.\n')
for usecol in train.columns.tolist()[1:-1]:

    train[usecol] = train[usecol].astype('str')
    test[usecol] = test[usecol].astype('str')
    
    #Fit LabelEncoder
    le = LabelEncoder().fit(
            np.unique(train[usecol].unique().tolist()+
                      test[usecol].unique().tolist()))

    #At the end 0 will be used for dropped values
    train[usecol] = le.transform(train[usecol])+1
    test[usecol]  = le.transform(test[usecol])+1

    agg_tr = (train
              .groupby([usecol])
              .aggregate({'MachineIdentifier':'count'})
              .reset_index()
              .rename({'MachineIdentifier':'Train'}, axis=1))
    agg_te = (test
              .groupby([usecol])
              .aggregate({'MachineIdentifier':'count'})
              .reset_index()
              .rename({'MachineIdentifier':'Test'}, axis=1))

    agg = pd.merge(agg_tr, agg_te, on=usecol, how='outer').replace(np.nan, 0)
    #Select values with more than 1000 observations
    agg = agg[(agg['Train'] > 1000)].reset_index(drop=True)
    agg['Total'] = agg['Train'] + agg['Test']
    #Drop unbalanced values
    agg = agg[(agg['Train'] / agg['Total'] > 0.2) & (agg['Train'] / agg['Total'] < 0.8)]
    agg[usecol+'Copy'] = agg[usecol]

    train[usecol] = (pd.merge(train[[usecol]], 
                              agg[[usecol, usecol+'Copy']], 
                              on=usecol, how='left')[usecol+'Copy']
                     .replace(np.nan, 0).astype('int').astype('category'))

    test[usecol]  = (pd.merge(test[[usecol]], 
                              agg[[usecol, usecol+'Copy']], 
                              on=usecol, how='left')[usecol+'Copy']
                     .replace(np.nan, 0).astype('int').astype('category'))

    del le, agg_tr, agg_te, agg, usecol
    gc.collect()
          
y_train = np.array(train['HasDetections'])
train_ids = train.index
test_ids  = test.index

del train['HasDetections'], train['MachineIdentifier'], test['MachineIdentifier']
gc.collect()

print("If you don't want use Sparse Matrix choose Kernel Version 2 to get simple solution.\n")

print('--------------------------------------------------------------------------------------------------------')
print('Transform Data to Sparse Matrix.')
print('Sparse Matrix can be used to fit a lot of models, eg. XGBoost, LightGBM, Random Forest, K-Means and etc.')
print('To concatenate Sparse Matrices by column use hstack()')
print('Read more about Sparse Matrix https://docs.scipy.org/doc/scipy/reference/sparse.html')
print('Good Luck!')
print('--------------------------------------------------------------------------------------------------------')

#Fit OneHotEncoder
ohe = OneHotEncoder(categories='auto', sparse=True, dtype='uint8').fit(train)

#Transform data using small groups to reduce memory usage
m = 100000
train = vstack([ohe.transform(train[i*m:(i+1)*m]) for i in range(train.shape[0] // m + 1)])
test  = vstack([ohe.transform(test[i*m:(i+1)*m])  for i in range(test.shape[0] // m +  1)])
save_npz('train.npz', train, compressed=True)
save_npz('test.npz',  test,  compressed=True)

del ohe, train, test
gc.collect()

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
skf.get_n_splits(train_ids, y_train)

lgb_test_result  = np.zeros(test_ids.shape[0])
lgb_train_result = np.zeros(train_ids.shape[0])
#xgb_test_result  = np.zeros(test_ids.shape[0])
#xgb_train_result = np.zeros(train_ids.shape[0])
counter = 0

print('\nLightGBM\n')

for train_index, test_index in skf.split(train_ids, y_train):
    
    print('Fold {}\n'.format(counter + 1))
    
    train = load_npz('train.npz')
    X_fit = vstack([train[train_index[i*m:(i+1)*m]] for i in range(train_index.shape[0] // m + 1)])
    X_val = vstack([train[test_index[i*m:(i+1)*m]]  for i in range(test_index.shape[0] //  m + 1)])
    X_fit, X_val = csr_matrix(X_fit, dtype='float32'), csr_matrix(X_val, dtype='float32')
    y_fit, y_val = y_train[train_index], y_train[test_index]
    
    del train
    gc.collect()

    lgb_model = lgb.LGBMClassifier(max_depth=-1,
                                   n_estimators=30000,
                                   learning_rate=0.05,
                                   num_leaves=2**12-1,
                                   colsample_bytree=0.28,
                                   objective='binary', 
                                   n_jobs=-1)
                                   
    #xgb_model = xgb.XGBClassifier(max_depth=6,
    #                              n_estimators=30000,
    #                              colsample_bytree=0.2,
    #                              learning_rate=0.1,
    #                              objective='binary:logistic', 
    #                              n_jobs=-1)
    
                               
    lgb_model.fit(X_fit, y_fit, eval_metric='auc', 
                  eval_set=[(X_val, y_val)], 
                  verbose=100, early_stopping_rounds=100)
                  
    #xgb_model.fit(X_fit, y_fit, eval_metric='auc', 
    #              eval_set=[(X_val, y_val)], 
    #              verbose=1000, early_stopping_rounds=300)

    lgb_train_result[test_index] += lgb_model.predict_proba(X_val)[:,1]
    #xgb_train_result[test_index] += xgb_model.predict_proba(X_val)[:,1]
    
    del X_fit, X_val, y_fit, y_val, train_index, test_index
    gc.collect()
    
    test = load_npz('test.npz')
    test = csr_matrix(test, dtype='float32')
    lgb_test_result += lgb_model.predict_proba(test)[:,1]
    #xgb_test_result += xgb_model.predict_proba(test)[:,1]
    counter += 1
    
    del test
    gc.collect()
    
    #Stop fitting to prevent time limit error
    #if counter == 3 : break

print('\nLightGBM VAL AUC Score: {}'.format(roc_auc_score(y_train, lgb_train_result)))
#print('\nXGBoost VAL AUC Score: {}'.format(roc_auc_score(y_train, xgb_train_result)))

submission = pd.read_csv('../input/sample_submission.csv')
submission['HasDetections'] = lgb_test_result / counter
submission.to_csv('lgb_submission.csv', index=False)
#submission['HasDetections'] = xgb_test_result / counter
#submission.to_csv('xgb_submission.csv', index=False)
#submission['HasDetections'] = 0.5 * lgb_test_result / counter  + 0.5 * xgb_test_result / counter 
#submission.to_csv('lgb_xgb_submission.csv', index=False)

print('\nDone.')

import pytz
from datetime import datetime

# assuming now contains a timezone aware datetime
pactz = pytz.timezone('America/Los_Angeles')
loc_dt = pactz.localize(datetime(2019, 10, 27, 6, 0, 0))
utcnow = pytz.utc
print(pytz.all_timezones)
dt = datetime(2019, 10, 31, 23, 30)
print (pactz.utcoffset(dt, is_dst=True))


def do_plotly():
    import plotly.graph_objs as go
    fig = go.Figure()
    fig.add_scatter

@bschnurr (Member) commented May 2, 2023

I'll remove the bundled stubs when the next version of LightGBM, with type annotations, is released.

@jameslamb (Collaborator)

Ok sure, thanks! Sorry, I probably would have been watching that microsoft/python-type-stubs repo if I'd known about it.

If you're interested in improving LightGBM's typing (or anything else), we'd also welcome any contributions you'd like to make here and would be happy to help with the process.

@bschnurr (Member) commented May 2, 2023

My stubs were generated using Pylance/pyright by adding # pyright: reportMissingTypeStubs=true (see https://microsoft.github.io/pyright/#/type-stubs?id=generating-type-stubs).
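
In case it's useful to anyone reproducing this, the workflow roughly looks like the following (an assumption based on the linked docs, not an exact transcript of what was run):

# in the file being type-checked, opt in to the stub warning
# pyright: reportMissingTypeStubs=true
import lightgbm  # pyright/Pylance flags "Stub file not found" here when no stubs or py.typed marker exist

A draft stub package can then be generated from the command line with pyright --createstub lightgbm and edited by hand.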

@bschnurr (Member) commented May 2, 2023

You can also use pyright to verify the type completeness of your public API (--verifytypes): https://microsoft.github.io/pyright/#/command-line?id=pyright-command-line-options
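
For example, assuming lightgbm is importable in the environment pyright runs in:

pyright --verifytypes lightgbm

That reports a type completeness score for the package's public symbols, which is one way to track progress on the type-hint work in #3756.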

@Avasam (Author) commented Aug 6, 2023

@bschnurr With the release of 4.0.0, LightGBM's types seem complete enough to make https://github.com/microsoft/python-type-stubs/tree/main/stubs/lightgbm-stubs obsolete. Not only have a lot of variables been removed or renamed (i.e. a handful are no longer public), but at a glance the only areas where the inline type hints aren't on par are some non-public method parameters using the Any type (they're not part of the public API anyway, so whatever) and the lack of ndarray generic types in https://github.com/microsoft/LightGBM/blob/master/python-package/lightgbm/basic.py compared to https://github.com/microsoft/python-type-stubs/blob/main/stubs/lightgbm-stubs/basic.pyi. That feels acceptable to me.
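
For illustration, the difference is roughly the following (hypothetical signatures, not the actual LightGBM or stub code):

import numpy as np
import numpy.typing as npt
from typing import Optional

# stub-style: the ndarray element type is spelled out
def get_label_stub_style() -> Optional[npt.NDArray[np.float64]]: ...

# current inline-hint style: a bare ndarray with no element type parameter
def get_label_inline_style() -> Optional[np.ndarray]: ...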

@jameslamb Are you aware of any major issue with the state of type hints in the currently published version?

@jameslamb (Collaborator)

Sorry for the delayed response. It appears that @bschnurr just went ahead and removed those stubs about 2 weeks ago: microsoft/python-type-stubs#294

So I guess this discussion about whether or not they should be removed can be closed.

the lack of ndarray generic type

We'd welcome a pull request fixing this if you're interested in contributing!

Are you aware of any major issue with the state of type hints in the currently published version?

I'm not aware of any in the public API that are incorrect. There are certainly some cases where the type hints could be more specific (e.g. the implicit Any usages and the numpy topic you mentioned). We'd welcome contributions on #3756 and #3867 if it's something you're interested in improving.
