ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements #183

Open
Citugulia40 opened this issue Jul 25, 2023 · 8 comments

Comments

@Citugulia40

Citugulia40 commented Jul 25, 2023

Hi,

I cannot run the following code; it raises an error that I do not understand:

import pyranges as pr
from pycistarget.utils import region_names_to_coordinates
region_sets = {}
region_sets['topics_otsu'] = {}
region_sets['topics_top_3'] = {}
region_sets['DARs'] = {}
for topic in region_bin_topics_otsu.keys():
    regions = region_bin_topics_otsu[topic].index[region_bin_topics_otsu[topic].index.str.startswith('chr')] #only keep regions on known chromosomes
    region_sets['topics_otsu'][topic] = pr.PyRanges(region_names_to_coordinates(regions))
for topic in region_bin_topics_top3k.keys():
    regions = region_bin_topics_top3k[topic].index[region_bin_topics_top3k[topic].index.str.startswith('chr')] #only keep regions on known chromosomes
    region_sets['topics_top_3'][topic] = pr.PyRanges(region_names_to_coordinates(regions))
for DAR in markers_dict.keys():
    regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
    region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[38], line 15
     13 for DAR in markers_dict.keys():
     14     regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
---> 15     region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/utils.py:33, in region_names_to_coordinates(region_names)
     31 regiondf=pd.concat([chrom, start, end], axis=1, sort=False)
     32 regiondf.index=[i for i in region_names if ':' in i]
---> 33 regiondf.columns=['Chromosome', 'Start', 'End']
     34 return(regiondf)

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/generic.py:5920, in NDFrame.__setattr__(self, name, value)
   5918 try:
   5919     object.__getattribute__(self, name)
-> 5920     return object.__setattr__(self, name, value)
   5921 except AttributeError:
   5922     pass

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/_libs/properties.pyx:69, in pandas._libs.properties.AxisProperty.__set__()

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/generic.py:822, in NDFrame._set_axis(self, axis, labels)
    820 def _set_axis(self, axis: int, labels: AnyArrayLike | list) -> None:
    821     labels = ensure_index(labels)
--> 822     self._mgr.set_axis(axis, labels)
    823     self._clear_item_cache()

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/internals/managers.py:228, in BaseBlockManager.set_axis(self, axis, new_labels)
    226 def set_axis(self, axis: int, new_labels: Index) -> None:
    227     # Caller is responsible for ensuring we have an Index object.
--> 228     self._validate_set_axis(axis, new_labels)
    229     self.axes[axis] = new_labels

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/internals/base.py:70, in DataManager._validate_set_axis(self, axis, new_labels)
     67     pass
     69 elif new_len != old_len:
---> 70     raise ValueError(
     71         f"Length mismatch: Expected axis has {old_len} elements, new "
     72         f"values have {new_len} elements"
     73     )

ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements

Please help me

Thanks

@SeppeDeWinter
Collaborator

Hi @Citugulia40

One of your region sets is probably empty. You can check which one by running:

for DAR in markers_dict.keys():
    regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
    if len(regions) == 0:
      print(DAR)
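
For context, the error message itself can be reproduced with plain pandas. This is a minimal sketch, assuming that an empty region list leaves `region_names_to_coordinates` with a frame that has no columns (the traceback only shows the final column assignment, so the stand-in `pd.DataFrame()` is an assumption):

```python
import pandas as pd

# Stand-in for the frame built from an empty region list: no rows, no columns.
# (Assumption: this mirrors what pycistarget ends up with for an empty input.)
empty = pd.DataFrame()

try:
    # Same assignment as line 33 of the traceback: three names for zero columns.
    empty.columns = ['Chromosome', 'Start', 'End']
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

The "0 elements" in the message refers to the number of existing columns, which is why an empty region set triggers it.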

To be sure, can you also show the output of:

markers_dict

Best,

Seppe

@Citugulia40
Author

Citugulia40 commented Jul 28, 2023

Hi, the output of these is as follows:

for DAR in markers_dict.keys():
    regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
    if len(regions) == 0:
      print(DAR)

Inhibitory
Microglia-PVM
OPC
Oligodendrocyte
VLMC

markers_dict

{'Astrocyte': Log2FC Adjusted_pval Contrast
chr1:2073547-2074047 0.674378 4.645593e-86 Astrocyte
chr1:2132169-2132669 0.674378 4.645593e-86 Astrocyte
chr1:2138117-2138617 0.674378 4.645593e-86 Astrocyte
chr1:2230690-2231190 0.674378 4.645593e-86 Astrocyte
chr1:7623862-7624362 0.674378 4.645593e-86 Astrocyte
... ... ... ...
chr13:97976181-97976681 0.588829 4.273270e-86 Astrocyte
chr1:211579050-211579550 0.586471 4.273270e-86 Astrocyte
chr3:136752192-136752692 0.586471 4.273270e-86 Astrocyte
chr19:50329086-50329586 0.586471 4.273270e-86 Astrocyte
chr20:35771784-35772284 0.586471 4.273270e-86 Astrocyte

[13580 rows x 3 columns],
'Endothelial': Log2FC Adjusted_pval Contrast
GL000205.2:1206-1706 0.591258 0.000001 Endothelial
chr1:1300489-1300989 0.591258 0.000001 Endothelial
chr1:1989119-1989619 0.591258 0.000001 Endothelial
chr1:2301001-2301501 0.591258 0.000001 Endothelial
chr1:3031465-3031965 0.591258 0.000001 Endothelial
... ... ... ...
chrY:16653321-16653821 0.586501 0.000001 Endothelial
chrY:17022248-17022748 0.586501 0.000001 Endothelial
chrY:17150978-17151478 0.586501 0.000001 Endothelial
chrY:19557739-19558239 0.586501 0.000001 Endothelial
chrY:21603769-21604269 0.586501 0.000001 Endothelial

[8015 rows x 3 columns],
'Excitatory': Log2FC Adjusted_pval Contrast
KI270728.1:988400-988900 0.662772 1.912565e-15 Excitatory
chr1:4722163-4722663 0.662772 1.912565e-15 Excitatory
chr1:5408334-5408834 0.662772 1.912565e-15 Excitatory
chr1:6950642-6951142 0.662772 1.912565e-15 Excitatory
chr1:8636339-8636839 0.662772 1.912565e-15 Excitatory
... ... ... ...
chr17:40177311-40177811 0.587594 1.912565e-15 Excitatory
chr10:23095254-23095754 0.586441 1.912565e-15 Excitatory
chr11:74171006-74171506 0.586441 1.912565e-15 Excitatory
chr16:57486140-57486640 0.585518 1.912565e-15 Excitatory
chr7:23105439-23105939 0.585184 1.912565e-15 Excitatory

[14021 rows x 3 columns],
'Inhibitory': Empty DataFrame
Columns: [Log2FC, Adjusted_pval, Contrast]
Index: [],
'Microglia-PVM': Empty DataFrame
Columns: [Log2FC, Adjusted_pval, Contrast]
Index: [],
'OPC': Empty DataFrame
Columns: [Log2FC, Adjusted_pval, Contrast]
Index: [],
'Oligodendrocyte': Empty DataFrame
Columns: [Log2FC, Adjusted_pval, Contrast]
Index: [],
'VLMC': Empty DataFrame
Columns: [Log2FC, Adjusted_pval, Contrast]
Index: []}

Most of the cell types have an empty DataFrame.
How should I handle this?

@SeppeDeWinter
Collaborator

Hi @Citugulia40

It is these cell types with empty DataFrames that are causing the issue.

Either remove these from the motif enrichment analysis or set less strict parameters for calling DARs.
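
The first option could look roughly like the sketch below. It works on plain lists of region names so that it runs on its own; the helper name `split_region_sets` and the example data are hypothetical and not part of pycistarget or SCENIC+.

```python
def split_region_sets(region_names_by_set, prefix='chr'):
    """Keep only region sets that still contain regions after the prefix filter.

    Returns (kept, skipped): kept maps set name -> filtered region names,
    skipped lists the set names that ended up empty.
    """
    kept, skipped = {}, []
    for set_name, region_names in region_names_by_set.items():
        # Same idea as the 'only keep regions on known chromosomes' filter.
        regions = [name for name in region_names if name.startswith(prefix)]
        if regions:
            kept[set_name] = regions
        else:
            skipped.append(set_name)
    return kept, skipped


# Hypothetical example data shaped like the index of markers_dict.
example = {
    'Astrocyte': ['chr1:2073547-2074047', 'chr1:2132169-2132669'],
    'Endothelial': ['GL000205.2:1206-1706', 'chr1:1300489-1300989'],
    'Inhibitory': [],
}
kept, skipped = split_region_sets(example)
print(sorted(kept))
print(skipped)
```

In the original loop the equivalent change would be an `if len(regions) == 0: continue` placed before the `pr.PyRanges(...)` call, so that empty cell types never reach `region_names_to_coordinates`.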

Best,

Seppe

@Citugulia40
Author

Thanks for the solution.

I have removed the empty cell types and was able to run the pipeline, but at the end I am still getting an error:

from scenicplus.wrappers.run_scenicplus import run_scenicplus
try:
    run_scenicplus(
        scplus_obj = scplus_obj,
        variable = ['GEX_cell_type'],
        species = 'hsapiens',
        assembly = 'hg38',
        tf_file = 'regulation/data/utoronto_human_tfs_v_1.01.txt',
        save_path = os.path.join(work_dir, 'scenicplus'),
        biomart_host = biomart_host,
        upstream = [1000, 150000],
        downstream = [1000, 150000],
        calculate_TF_eGRN_correlation = True,
        calculate_DEGs_DARs = True,
        export_to_loom_file = True,
        export_to_UCSC_file = True,
        path_bedToBigBed = 'regulation',
        n_cpu = 12,
        _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
except Exception as e:
    #in case of failure, still save the object
    dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
    raise(e)
2023-08-01 10:04:50,847 SCENIC+_wrapper INFO     regulation/scenicplus folder already exists.
2023-08-01 10:04:50,849 SCENIC+_wrapper INFO     Calculating TF-eGRNs AUC correlation
2023-08-01 10:04:55,932 SCENIC+_wrapper INFO     Making eGRNs AUC UMAP
2023-08-01 10:04:57,254 SCENIC+_wrapper INFO     Making eGRNs AUC tSNE
2023-08-01 10:04:59,227 SCENIC+_wrapper INFO     Calculating eRSS
2023-08-01 10:05:00,345 SCENIC+_wrapper INFO     Calculating DEGs/DARs
2023-08-01 10:05:00,347 SCENIC+      INFO     Calculating DEGs for variable GEX_cell_type
2023-08-01 10:05:00,945 SCENIC+      INFO     There are 4357 variable features
2023-08-01 10:05:01,172 SCENIC+      INFO     Finished calculating DEGs for variable GEX_cell_type
2023-08-01 10:05:01,174 SCENIC+      INFO     Calculating DARs for variable GEX_cell_type
2023-08-01 10:05:01,377 SCENIC+      INFO     There are 5325 variable features
2023-08-01 10:05:01,652 SCENIC+      INFO     Finished calculating DARs for variable GEX_cell_type
2023-08-01 10:05:01,654 SCENIC+_wrapper INFO     Exporting to loom file
2023-08-01 10:05:01,655 SCENIC+      INFO     Formatting data
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[51], line 23
     20 except Exception as e:
     21     #in case of failure, still save the object
     22     dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
---> 23     raise(e)

Cell In[51], line 3
      1 from scenicplus.wrappers.run_scenicplus import run_scenicplus
      2 try:
----> 3     run_scenicplus(
      4         scplus_obj = scplus_obj,
      5         variable = ['GEX_cell_type'],
      6         species = 'hsapiens',
      7         assembly = 'hg38',
      8         tf_file = 'regulation/data/utoronto_human_tfs_v_1.01.txt',
      9         save_path = os.path.join(work_dir, 'scenicplus'),
     10         biomart_host = biomart_host,
     11         upstream = [1000, 150000],
     12         downstream = [1000, 150000],
     13         calculate_TF_eGRN_correlation = True,
     14         calculate_DEGs_DARs = True,
     15         export_to_loom_file = True,
     16         export_to_UCSC_file = True,
     17         path_bedToBigBed = 'regulation',
     18         n_cpu = 12,
     19         _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
     20 except Exception as e:
     21     #in case of failure, still save the object
     22     dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)

File /data2/ccitu/software/scenicplus/src/scenicplus/wrappers/run_scenicplus.py:323, in run_scenicplus(scplus_obj, variable, species, assembly, tf_file, save_path, biomart_host, upstream, downstream, region_ranking, gene_ranking, simplified_eGRN, calculate_TF_eGRN_correlation, calculate_DEGs_DARs, export_to_loom_file, export_to_UCSC_file, tree_structure, path_bedToBigBed, n_cpu, _temp_dir, save_partial, **kwargs)
    321 if export_to_loom_file is True:
    322     log.info('Exporting to loom file')
--> 323     export_to_loom(scplus_obj, 
    324            signature_key = 'Gene_based',
    325            tree_structure = tree_structure,
    326            title =  'Gene based eGRN',
    327            nomenclature = assembly,
    328            out_fname=os.path.join(save_path,'SCENIC+_gene_based.loom'))
    329     export_to_loom(scplus_obj, 
    330            signature_key = 'Region_based',
    331            tree_structure = tree_structure,
    332            title =  'Region based eGRN',
    333            nomenclature = assembly,
    334            out_fname=os.path.join(save_path,'SCENIC+_region_based.loom'))
    336 if export_to_UCSC_file is True:

File /data2/ccitu/software/scenicplus/src/scenicplus/loom.py:141, in export_to_loom(scplus_obj, signature_key, out_fname, eRegulon_metadata_key, auc_key, auc_thr_key, keep_direct_and_extended_if_not_direct, selected_features, selected_cells, cluster_annotation, tree_structure, title, nomenclature)
    138 auc_mtx = scplus_obj.uns[auc_key][signature_key].loc[cell_names]
    139 auc_mtx.columns = [re.sub('_\(.*\)', '', x)
    140                    for x in auc_mtx.columns]
--> 141 auc_thresholds = scplus_obj.uns[auc_thr_key][signature_key]
    142 auc_thresholds.index = [re.sub('_\(.*\)', '', x)
    143                         for x in auc_thresholds.index]
    144 if auc_mtx.shape[1] > 900 and keep_direct_and_extended_if_not_direct is False:

KeyError: 'Gene_based'

Please help.

Thanks

@SeppeDeWinter
Collaborator

Hi @Citugulia40

Sorry for the late response.

It seems something went wrong when thresholding the AUC values. You can skip generating the loom file for now by setting export_to_loom_file = False.

Your object should still be fine, including all results; see os.path.join(work_dir, 'scenicplus/scplus_obj.pkl').
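
To see in advance whether the loom export would hit the same KeyError, a check along these lines might help. It is only a sketch: it assumes `scplus_obj.uns` behaves like a dict, and because the thread never shows the actual value of `auc_thr_key`, the key is passed in as a parameter (the `'thresholds'` key in the example is a placeholder, not a real SCENIC+ name).

```python
def missing_signature_keys(uns, auc_thr_key,
                           signature_keys=('Gene_based', 'Region_based')):
    """Return the signature keys that have no AUC thresholds stored.

    Mirrors the failing lookup uns[auc_thr_key][signature_key] from the
    traceback, but reports missing keys instead of raising KeyError.
    """
    thresholds = uns.get(auc_thr_key, {})
    return [key for key in signature_keys if key not in thresholds]


# Placeholder data: thresholds exist for the region-based signatures only.
uns_example = {'thresholds': {'Region_based': object()}}
print(missing_signature_keys(uns_example, 'thresholds'))
```

An empty result would suggest the export can find both threshold tables; any key listed is one for which `export_to_loom` would fail as shown above.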

Best,

Seppe

@Citugulia40
Author

Thanks @SeppeDeWinter for the recommendations!

@Oliversinn

Oliversinn commented Jun 30, 2024

I am facing the same issue, but when running the pipeline with Snakemake, and I have already removed the empty BED files. The Snakemake config.yml has no export_to_loom_file parameter. Is there a workaround for this case?

@SeppeDeWinter
Collaborator

Hi @Oliversinn

What is your error exactly? The Snakemake pipeline does not export any loom files ...

Best,

Seppe
