Spam Mail Competition EDA

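  • I did some quick visualization and analysis of the provided data. I hope it is useful to everyone.
  • If you have any opinions or ideas, feel free to leave a comment.
  • I tested this on Google Colaboratory, but it should also run locally once the required libraries are installed.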

References

I referred to the following sites when creating this notebook.

!pip install transformers
%cd "/content/drive/My Drive/Colab Notebooks/Competition/ProbSpace/Spam mail"
/content/drive/My Drive/Colab Notebooks/Competition/ProbSpace/Spam mail

Library imports, file loading, and so on

from collections import Counter
import os
import random
import string

import matplotlib.pyplot as plt
import nltk
from nltk.corpus import stopwords
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.figure_factory as ff
import plotly.graph_objects as go
import seaborn as sns
import torch
from transformers import AutoTokenizer
from wordcloud import STOPWORDS, WordCloud

import warnings
warnings.filterwarnings('ignore')
SEED = 42
def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)

seed_everything(SEED)
TRAIN_FILE = "./data/train_data.csv"
TEST_FILE = "./data/test_data.csv"
MODEL_NAME = 'bert-base-uncased'

train_df = pd.read_csv(TRAIN_FILE)
test_df = pd.read_csv(TEST_FILE)
all_df = pd.concat([train_df, test_df])

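not_spam = "0: not spam"
spam = "1: spam"

# Replace the numeric labels with readable strings for the plots.
# map() converts the whole column at once, so no string is assigned into an integer column.
train_df["y"] = train_df["y"].map({0: not_spam, 1: spam})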

Checking the data

Basic information

print(train_df.shape)
print(test_df.shape)
(8878, 3)
(24838, 2)

The train data has 8,878 rows and the test data has 24,838 rows.

print(train_df.info())
print()
print(test_df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8878 entries, 0 to 8877
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        8878 non-null   int64 
 1   contents  8878 non-null   object
 2   y         8878 non-null   object
dtypes: int64(1), object(2)
memory usage: 208.2+ KB
None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24838 entries, 0 to 24837
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        24838 non-null  int64 
 1   contents  24838 non-null  object
dtypes: int64(1), object(1)
memory usage: 388.2+ KB
None

Neither the train data nor the test data has missing values.
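The `info()` output above shows this through the non-null counts. As a minimal sketch, the same check can be made explicit with `isna()`. The `toy_df` frame below is hypothetical data standing in for `train_df`, not the competition data.

```python
import pandas as pd

# Hypothetical toy frame standing in for train_df (not the competition data).
toy_df = pd.DataFrame({
    "id": [1, 2, 3],
    "contents": ["Subject: a", None, "Subject: b"],
})

# Number of missing values in each column.
missing_per_column = toy_df.isna().sum()
print(missing_per_column)

# True if any column contains at least one missing value.
has_missing = bool(missing_per_column.any())
print("has missing values:", has_missing)
```

On the real data, every entry of `missing_per_column` should be 0 if the reading of `info()` above is right.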

train_df
id contents y
0 1 Subject: re : fw : willis phillips\r\ni just s... 0: not spam
1 2 Subject: re : factor loadings for primary curv... 0: not spam
2 3 Subject: re : meridian phone for kate symes\r\... 0: not spam
3 4 Subject: re : october wellhead\r\nvance ,\r\nd... 0: not spam
4 5 Subject: california 6 / 13\r\nexecutive summar... 0: not spam
... ... ... ...
8873 8874 Subject: uk rpi model\r\nhi zimin !\r\nplease ... 0: not spam
8874 8875 Subject: new sitara desk request\r\nthis needs... 0: not spam
8875 8876 Subject: enterprise risk management\r\ndear vi... 0: not spam
8876 8877 Subject: re : the spreadsheet for talon deal\r... 0: not spam
8877 8878 Subject: re : tenaska iv march 2001\r\nyes , t... 0: not spam

8878 rows × 3 columns

test_df
id contents
0 1 Subject: re : weather and energy price data\r\...
1 2 Subject: organizational study\r\ngpg and eott ...
2 3 Subject: re [ 7 ] : talk about our meds\r\nske...
3 4 Subject: report about your cable service\r\nhi...
4 5 Subject: start date : 1 / 26 / 02 ; hourahead ...
... ... ...
24833 24834 Subject: savvy players would be wise to | 0 ad...
24834 24835 Subject: it ' s mariah from dating service\r\n...
24835 24836 Subject: meter 9699\r\njackie -\r\ni cannot fo...
24836 24837 Subject: presentation for cal berkeley\r\nhell...
24837 24838 Subject: we are giving away ipod mini ' s !\r\...

24838 rows × 2 columns

Checking for duplicates

train_df[["contents", "y"]].describe()
contents y
count 8878 8878
unique 8675 2
top Subject: calpine daily gas nomination\r\n>\r\n... 0: not spam
freq 13 8707

For contents, unique (8,675) is smaller than count (8,878), so some texts are duplicated.
The most frequent text appears 13 times.
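The gap between count and unique can also be computed directly: 8,878 − 8,675 = 203 rows repeat a text that has already appeared. Here is a small sketch of that arithmetic on hypothetical toy data (`toy_df` is an assumption, not the competition data).

```python
import pandas as pd

# Hypothetical toy column: "a" appears 3 times, "b" twice, "c" once.
toy_df = pd.DataFrame({"contents": ["a", "b", "a", "a", "c", "b"]})

n_rows = len(toy_df)                      # corresponds to "count" in describe()
n_unique = toy_df["contents"].nunique()   # corresponds to "unique"
n_redundant = n_rows - n_unique           # rows that repeat an earlier text

# duplicated() marks every occurrence after the first, so the sums agree.
n_flagged = int(toy_df["contents"].duplicated().sum())

# value_counts() sorts in descending order, so the first entry is "freq".
max_repeats = int(toy_df["contents"].value_counts().iloc[0])

print(n_rows, n_unique, n_redundant, n_flagged, max_repeats)
```

Note that "freq" counts all occurrences of the most common text, including the first one.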

test_df[["contents"]].describe()
contents
count 24838
unique 22147
top Subject: \r\n
freq 51

Likewise, the test data contains duplicated contents.
In particular, Subject: \r\n (an email with an empty subject and no body) appears 51 times.
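Emails that consist only of the `Subject:` prefix carry no text for a model to use, so it may be worth flagging them. The following is a rough sketch on hypothetical toy data; `toy_test_df` and the blank-detection rule (strip the leading `Subject:` and surrounding whitespace) are my own assumptions.

```python
import pandas as pd

# Hypothetical toy frame standing in for test_df (not the competition data).
toy_test_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "contents": [
        "Subject: \r\n",
        "Subject: meter 9699\r\njackie -",
        "Subject: \r\n",
        "Subject: hello\r\n",
    ],
})

# Drop the leading "Subject:" prefix, then trim whitespace and line breaks.
body = (
    toy_test_df["contents"]
    .str.replace(r"^Subject:", "", regex=True)
    .str.strip()
)

# Rows where nothing remains are effectively empty emails.
is_blank = body.eq("")
n_blank = int(is_blank.sum())
blank_ids = toy_test_df.loc[is_blank, "id"].tolist()

print(n_blank, blank_ids)
```

How to treat such rows (for example, predicting a fixed label for them) is a separate decision that this sketch does not make.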

print('duplicated contents:', len(train_df[train_df.duplicated(subset=["contents"])]))
print('duplicated contents and y:', len(train_df[train_df.duplicated(subset=["contents", "y"])]))
train_df[train_df.duplicated(subset=["contents", "y"])]
duplicated contents: 203
duplicated contents and y: 203
id contents y
624 625 Subject: calpine daily gas nomination\r\n>\r\n... 0: not spam
764 765 Subject: calpine daily gas nomination\r\n>\r\n... 0: not spam
855 856 Subject: fw : rahil jafry : carly fiorina tops... 0: not spam
865 866 Subject: enron japan weekly update\r\nhello ej... 0: not spam
994 995 Subject: attention : changes in remote access\... 0: not spam
... ... ... ...
8789 8790 Subject: entouch newsletter\r\nbusiness highli... 0: not spam
8840 8841 Subject: re : global risk management operation... 0: not spam
8869 8870 Subject: thanks for the offsite\r\nthank you f... 0: not spam
8872 8873 Subject: salary increase for logistics schedul... 0: not spam
8874 8875 Subject: new sitara desk request\r\nthis needs... 0: not spam

203 rows × 3 columns

In the train data, the number of rows with duplicated contents equals the number of rows where both contents and y are duplicated, so no identical text has been given different labels.
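A more direct way to check for conflicting labels is to count the distinct labels per text. The sketch below uses hypothetical toy data (`toy_df` is an assumption) in which one text deliberately has two labels, so the difference between the two duplicate counts is visible.

```python
import pandas as pd

# Hypothetical toy data: text "b" carries two different labels.
toy_df = pd.DataFrame({
    "contents": ["a", "a", "b", "b", "c"],
    "y": [0, 0, 1, 0, 1],
})

# The same two counts as in the cell above.
n_dup_text = int(toy_df.duplicated(subset=["contents"]).sum())
n_dup_both = int(toy_df.duplicated(subset=["contents", "y"]).sum())

# Number of distinct labels attached to each text.
labels_per_text = toy_df.groupby("contents")["y"].nunique()
conflicting = labels_per_text[labels_per_text > 1].index.tolist()

print(n_dup_text, n_dup_both, conflicting)
```

Here the two counts differ because "b" has conflicting labels. On the real train data they are both 203, so `conflicting` should come out empty.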

train_value_counts = train_df["contents"].value_counts()
train_value_counts[train_value_counts>=2]
Subject: calpine daily gas nomination\r\n>\r\nricky a . archer\r\nfuel supply\r\n700 louisiana , suite 2700\r\nhouston , texas 77002\r\n713 - 830 - 8659 direct\r\n713 - 830 - 8722 fax\r\n- calpine daily gas nomination 1 . doc
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
13
Subject: formation of enron management committee\r\ni am pleased to announce the formation of the enron management committee . the management committee comprises our business unit and function leadership and will focus on the key management , strategy , and policy issues facing enron . the management committee will supplant the former policy committee and will include the following individuals :\r\nken lay - chairman and ceo , enron corp .\r\nray bowen - coo , enron industrial markets\r\nmichael brown - coo , enron europe\r\nrick buy - exec vp & chief risk officer , enron corp .\r\nrick causey - exec vp & chief accounting officer , enron corp .\r\ndave delainey - chairman and ceo , enron energy services\r\njim derrick - exec vp & general counsel , enron corp .\r\njanet dietrich - president , enron energy services\r\njim fallon - president & ceo , enron broadband services\r\nandy fastow - exec vp & cfo , enron corp .\r\nmark frevert - chairman & ceo , enron wholesale services\r\nben glisan - managing director & treasurer , enron corp .\r\nmark haedicke - managing director & general counsel , enron wholesale services\r\nkevin hannon - ceo , enron global assets & services\r\nstan horton - chairman & ceo , enron transportation services\r\njim hughes - president & coo , enron global assets & services\r\nsteve kean - exec . vp & chief of staff , enron corp .\r\nlouise kitchen - coo , enron americas\r\nmark koenig - exec vp , investor relations , enron corp .\r\njohn lavorato - president & ceo , enron americas\r\nmike mcconnell - president & ceo , enron global markets\r\njeff mcmahon - president & ceo , enron industrial markets\r\njeff shankman - coo , enron global markets\r\njohn sherriff - president & ceo , enron europe\r\ngreg whalley - president & coo , enron wholesale services\r\nafter we convene the management committee later this week , i will make a further announcement regarding the executive committee .
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
3
Subject: important video announcement\r\ni have a very important video announcement about the future of our company . please go to to access the video . thank you .
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
3
Subject: enron net works\r\nit is becoming increasingly clear that the development of ecommerce will have\r\na significant and continuing impact on the conduct of business in a broad\r\narray of industries . through enrononline , enron has quickly become a major\r\ncatalyst for the transition to the web in the gas and electric industries .\r\nenrononline has been an enormous success since its launch . since launch , we\r\nhave completed 67 , 043 transactions on line , with a total dollar value of over\r\n$ 25 billion . enrononline is now the largest ecommerce site in the world .\r\nwe believe that the competitive success of enrononline is due to one very\r\nspecific reason . in addition to providing a web - based platform for\r\ntransactions , enron acts as principal to provide direct liquidity to the\r\nsite . we stand ready at all times , in any market conditions , to buy and sell\r\nat the posted price . this converts a  & bulletin board  8 ( the more typical\r\necommerce concept ) into a true market . there are very few , if any ,\r\ncompetitors that can provide this capability .\r\nwe are increasingly convinced that this competitive advantage can be\r\ndramatically expanded to other products and other geographies . if we are\r\ncorrect , this could provide an enormous new opportunity for growth for enron .\r\naccordingly , we are initiating a major new effort to capture this\r\nopportunity . effective today we are creating a new business , enron net\r\nworks , to pursue new market development opportunities in ecommerce across a\r\nbroad range of industries . it is likely that this business will ultimately\r\nbe our fifth business segment , joining transmission mike\r\nmcconnell , chief operating officer ; and jeff mcmahon , chief commercial\r\nofficer . 
these individuals will comprise the office of the chairman for\r\nenron net works and remain on the executive committee of enron corp .\r\nreplacing greg whalley as president and chief operating officer of enron\r\nnorth america is dave delainey , who will also join enron  , s executive\r\ncommittee .\r\nglobal technology will remain intact but will now be a part of enron net\r\nworks . it will maintain all of the same businesses and services as it did as\r\nan enron global function . philippe bibi will remain the chief technology\r\nofficer for all of enron corp . and continues to be responsible for the\r\ndevelopment of worldwide technology standards and platforms .\r\nenrononline , headed by louise kitchen , will also remain intact and will now\r\nbe a part of enron net works . the success of enrononline enables us to\r\nutilize this site as a model as we explore other markets . in addition , the\r\nfollowing individuals are included in enron net works along with their\r\ncurrent ecommerce initiatives : harry arora , public financial securities ; jay\r\nfitzgerald , new markets identification ; bruce garner , metals ; and greg piper ,\r\npulp and paper .\r\nover the next several weeks we will complete staffing and organizational\r\ndesign and will provide full details on this exciting new business\r\nopportunity .                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
3
Subject: capital book\r\nto further the process of reaching the stated objectives of increasing enron america ' s velocity of capital and associated return on invested capital , we have decided to create a capital book . the capital book will have no profit target associated with it and will be managed by joe deffner . the purpose of creating this book is to ensure that all transactions within enron americas , with any form of capital requirement , are structured correctly and are allocated the appropriate cost of capital charge .\r\nthe previous numbers used in the business plans at the beginning of this year will remain for all transactions in place and where we hold assets . therefore , on any assets currently held within each business area , the capital charge will remain at 15 % . internal ownership of these assets will be maintained by the originating business unit subject to the internal ownership policy outlined below .\r\nthe cost of capital associated with all transactions in enron americas will be set by joe . this process is separate and apart from the current rac process for transactions which will continue unchanged .\r\ncapital investments on balance sheet will continue to accrue a capital charge at the previously established rate of 15 % . transactions which are structured off credit will receive a pure market pass through of the actually incurred cost of capital as opposed to the previous 15 % across the board charge . transactions which are structured off balance sheet , but on credit will be priced based upon the financial impact on enron america ' s overall credit capacity .\r\non transactions that deploy capital through the trading books , the capital book will take a finance reserve on each transaction , similar to the way the credit group takes a credit reserve . this finance reserve will be used specifically to fund the capital required for the transaction . 
as noted above , the capital book will have no budget and will essentially charge out to the origination and trading groups at actual cost .\r\nby sending market - based capital pricing signals internally , enron america ' s sources of capital and liquidity should be better optimized across the organization .\r\nquestions regarding the capital book can be addressed to :\r\njoe deffner 853 - 7117\r\nalan quaintance 345 - 7731
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
..
Subject: enron mentions\r\nthe rise and fall of enron\r\nthe new york times - 11 / 02 / 01\r\nthe enron debacle\r\nbusiness week - 11 / 02 / 01\r\nderivatives danger ?\r\nbusiness week - 11 / 02 / 01\r\ncommentary : enron ' s clout won ' t sway the sec\r\nbusiness week - 11 / 02 / 01\r\nafl - cio , amalgamated bank call on enron to redirect the mission of newly created special committee : outside directors must adopt independent role\r\npr newswire - 11 / 02 / 01\r\nwilliams ceo sees validation of co ' s bandwidth model\r\ndow jones energy service - 11 / 02 / 01\r\nusa : enron shares plunge after s section a\r\nthe rise and fall of enron\r\n11 / 02 / 2001\r\nthe new york times\r\npage 24 , column 1\r\nc . 2001 new york times company\r\nearlier this year , most companies would have loved to have enron ' s problems . californians resented the energy trading company ' s huge profits during their energy crisis , and democrats in washington raised questions about enron ' s influence within the white house and about the cozy relationship between enron ' s chairman , kenneth lay , and vice president dick cheney . nobody seemed better positioned to thrive during the bush presidency than this houston - based apostle of deregulation .\r\nwall street was impressed with enron ' s strategy of swooping into formerly regulated markets to broker contracts for natural gas , electricity or unused telecom bandwidth . the company was celebrated as a paragon of american ingenuity , a stodgy gas pipeline company that had reinvented itself as a high - tech clearinghouse in an ever - expanding roster of markets . enron ' s push to force utilities into the internet age with its online trading systems , at a seemingly handsome profit , became an epic tale of the dot - com revolution .\r\nit now appears that enron ' s tale may be more cautionary than epic . enron envy has crashed , along with the company ' s stock price , as serious questions emerge about its bookkeeping . 
enron disclosed earlier this month that $ 1 . 2 billion in market value had vanished as a result of a controversial deal it entered into with private partnerships run by its chief financial officer , andrew fastow .\r\nmost alarming was enron ' s reluctance to shed light on management ' s wheeling and dealing . ' ' related - party transactions , ' ' as the accountants call them , are fraught with conflicts of interest . though much remains to be learned about these transactions , their scope and lack of transparency suggest that enron may have in effect created its own private hedge fund to assume some of the risk and mask the losses of its complex trading . the extent to which company insiders profited from the partnerships is not yet clear .\r\nenron has scrambled to dampen wall street ' s concerns , acknowledging its credibility problem while insisting on the health of its core businesses . on wednesday it brought in william powers , the dean of the university of texas school of law , to review the transactions . the securities and exchange commission has launched its own formal investigation . mr . fastow was forced to resign , following jeffrey skilling , the man credited with driving enron into new cutting - edge businesses , out the door .\r\nenron ' s former admirers on wall street , mindful of recent scandals involving high - profile companies doctoring their earnings , and of the spectacular collapse of the long - term capital management hedge fund in 1998 , are alarmed . carole coale of prudential securities summed up the prevailing sentiment when she told the times : ' ' the bottom line is , it ' s really difficult to recommend an investment when management does not disclose facts . ' ' analysts , as well as the media , are not entirely blameless . enron did mention , albeit in passing , the troubling related - party deals as early as march 2000 . 
but few analysts bothered to raise questions at a time when the company ' s revenues , profits and stock price were soaring .\r\nharvey pitt , the new securities and exchange commissioner , must pursue the enron inquiry aggressively in order to assure investors that he will be as vigilant as his predecessor , arthur levitt , when it comes to protecting the integrity of financial markets . indeed , even if enron is cleared of any wrongdoing and regains some of its past luster , as it well might , the company that preaches the merits of self - regulating marketplaces has reminded us all of the need for a strong regulator on wall street .\r\ncopyright ? 2000 dow jones & company , inc . all rights reserved .\r\nnovember 12 , 2001\r\nfinance\r\nbusiness week\r\nthe enron debacle\r\nbyzantine deals have shattered the energy outfit ' s credibility\r\nexecutives at high - flying enron corp . ( ene ) never seemed overly concerned with how the rest of the world viewed their business practices . earlier this year , the california attorney general had to get a court order to collect documents in an industrywide investigation into energy price fixing . and when an analyst challenged former ceo jeffrey k . skilling in a conference call to produce enron ' s balance sheet , skilling called him an " ass - - - - . " still , even some enron executives worried that the company had gone too far with two complex partnerships set up in 1999 to buy company assets and hedge investments . with enron ' s then - chief financial officer acting as general manager of the partnerships and in a position to personally benefit from their investments , the potential for a conflict of interest and backlash from investors seemed overwhelming . " internally , everybody said this is not a good idea , " says a source close to the company .\r\nbut no one could have predicted such a jaw - dropping outcome for the nation ' s largest and most innovative energy trader . since oct . 
16 , when enron revealed a $ 35 million charge to earnings to reflect losses on those partnerships and was forced to knock $ 1 . 2 billion off its shareholders ' equity , the company ' s stock has plunged 60 % . the securities its highly ballyhooed business for trading high - speed communications capacity was crippled by the telecom industry meltdown ; and its calamitous foray into the water business with azurix corp . has already cost enron at least $ 574 million in write - offs .\r\nthese bad bets and the expensive ljm shock have investors worried about what else might be lurking at enron . many aren ' t sticking around to find out . " i think the lack of disclosure on their financial engineering killed the credibility of the management team , " says richard a . giesen , who manages the munder power plus fund , which dumped its enron shares about a month ago .\r\nit ' s not clear just how many off - balance - sheet financing vehicles enron has used over time . some were created years ago to finance oil and gas producers . analysts and sources close to the deals say there ' s no particular risk in these to enron shareholders . other interconnected entities , such as whitewing , osprey , atlantic water trust , and marlin water trust , were a way to get assets no longer central to enron ' s strategy off its balance sheet , freeing capital and credit for the core energy business and ventures like broadband ( table ) .\r\nto entice institutional investors such as pension funds and insurers into these deals , enron promised to kick in equity if asset sales weren ' t enough to cover debt . such " mandatory equity " deals have been used by at least a half - dozen others in the energy and telecom industries , including el paso ( epg ) , williams , and dominion , says standard & poor ' s director todd a . shipman . 
" nobody ' s going to find anything that ' s particularly unique or below board " in such deals , says one investment banker specializing in the energy business .\r\nworst case , which shipman considers unlikely , enron could be on the hook for about $ 3 billion in its mandatory - equity deals . that could mean diluting its shares by more than 25 % at today ' s prices . analyst john e . olson at sanders morris harris inc . ( smhg ) figures a 9 % dilution is more likely . even if this $ 3 billion in debt were included on enron ' s balance sheet now , the debt - to - capital ratio would climb to 54 % from about 49 % at the end of june . such a change would pressure enron ' s credit rating but not push it below investment grade , says shipman .\r\nhuge hit . fastow ' s ljm , a private equity fund , was a different kind of animal , according to sources familiar with the arrangements . it bought energy and other assets from enron , which booked gains and losses on those deals . ljm was also involved in complex hedging that was supposed to reduce the volatility of some of enron ' s investments , including stakes in high - tech and telecom businesses and an interest in new power co . , which markets power to consumers . when enron terminated these deals in september , it took the $ 1 . 2 billion hit to equity .\r\nbut the more immediate question is whether trading partners will stick with the company . the first place that might show up is in enron ' s highly successful online platform , which trades everything from gas and electricity to weather derivatives . to reassure its partners , enron is scrambling to shore up its liquidity . it has already tapped $ 3 billion in credit lines and is trying to arrange another $ 1 billion . shipman says he has seen no signs of massive customer defections or drastically worsened credit terms . still , rival traders are wary . 
" we certainly have taken a closer look at enron in the last week to 10 days and will continue to manage the credit risk , but we ' re still doing business with the company as usual , " says keith g . stamm , ceo of power and gas trader aquila inc . ( ila )\r\n" speeding train . " still , with the stock battered and rating agencies considering further downgrades , that could rapidly change . in recent days , the uncertainty about enron ' s future has reduced investors ' appetite for enron debt . " no one wants to speculate on the direction or their likelihood of survival , " says a credit - derivatives salesperson . " it ' s really difficult to get in front of a speeding train . "\r\neven if lay can calm his trading partners , he and his management team face a much tougher task of restoring their credibility on wall street . with the stock now hovering around $ 14 - - down from a high of $ 90 in august , 2000 - - some even believe that enron could be a takeover target for the likes of ge capital or royal dutch / shell group . ( rd ) both declined to comment . would enron sell ? one source close to the company says enron has talked about possible mergers and strategic alliances in the past with royal dutch / shell , among others . " if they ' re really worried about liquidity , they might take the easy way out , " he says .\r\nif enron does pull through this crisis , some suspect it will be a humbler , more risk - averse place . the company that once believed it could expand its trading and logistics empire to all manner of commodities - - from advertising space to steel - - will be forced to scale back its grandiose visions . that ' s something some investors applaud . " the company should focus on its strengths , " says william n . adams , senior energy analyst at banc of america capital management , a major shareholder . 
but that ' s a far less exciting place than enron ' s energy cowboys ever hoped to roam .\r\nby stephanie anderson forest and wendy zellner in dallas , with heather timmons in new york\r\nnovember 12 , 2001\r\nfinance\r\nbusiness week\r\nderivatives danger ?\r\nlately , owners of enron ' s ( ene ) equity , bonds , and loans have been struggling to understand how exposed the company is to risks of losses that they didn ' t know about before . now , as a fuller picture of enron ' s entanglements with partnerships begins to emerge , investors have something else to worry about : credit default swaps , known as cdss for short .\r\nthat ' s financial marketspeak for insurance on bonds and bank loans . their owners pay a premium for coverage that reimburses them for any losses they have if their investments go bad . the cds market has existed only for about two years , but it ' s growing fast . goldman , sachs & co . ( gs ) and others estimate that bonds and loans with a face value of between $ 1 trillion and $ 1 . 5 trillion are covered . not surprisingly , big banks with hefty balance sheets such as j . p . morgan chase ( jpm ) , merrill lynch ( mer ) , and deutsche bank ( db ) dominate the market .\r\nenron , however , is a player - - and the only significant one that isn ' t also a bank . competitors say that although enron has issued only between $ 500 million and $ 700 million worth of cdss so far this year , it had ambitious plans to offer them online . " they don ' t belong in this market , " says one trader . " they don ' t understand the implications . " enron did not return calls seeking comment .\r\nof course , neither enron nor others will have to pay out unless the loans and bonds they ' re insuring turn bad . trouble is , this year is potentially a doozie for losses on corporate debt . corporate defaults could reach a record of $ 100 billion , says standard & poor ' s , like businessweek a unit of the mcgraw - hill companies . 
regulators say shaky bank loans hit a record $ 193 billion by early october .\r\nif enron has insured any of the bad debt , it might have to take charges for losses if they exceed the premiums it has been getting . with all that has been happening in recent weeks , that ' s the last thing it needs .\r\nby heather timmons in new york\r\nnovember 12 , 2001\r\nfinance\r\nby mike mcnamee\r\ncommentary : enron ' s clout won ' t sway the sec\r\nfor securities & exchange commission chairman harvey l . pitt , the sec ' s investigation into enron corp . ( ene ) could hardly have come at a worse time . the future of pitt ' s ambitious agenda of reforms in securities regulation could depend on how well he handles this case .\r\nenron ' s political clout and close ties to president george w . bush create real risks for the sec . enron ceo kenneth l . lay is a longtime bush backer , and the company was the biggest corporate contributor to the president ' s campaign . a bush appointee , pitt is attempting a delicate balancing act . he has made it clear he wants to speed up the sec ' s enforcement , in part by rewarding companies that cooperate with probes . but he insists the sec will still come down hard on true corporate miscreants - - and knows that any signs of let - up could jeopardize the rest of his reform agenda .\r\nenter the enron probe . the fine shadings of securities enforcement - - where most cases are settled by negotiated penalties , not court - imposed fines - - often make it hard for outsiders to tell whether the sec is being tough or lenient . but pitt must go out of his way to make it clear that the enron case is handled by the book - - getting the same strict scrutiny from the sec as any other , less connected company .\r\nfor now , top sec aides say that ' s happening . the sec enforcement div . in washington is looking into whether enron adequately disclosed to shareholders the risks of its complex deals with andrew s . 
fastow , the company ' s former chief financial officer . agency insiders say pitt and his fellow commissioners will be briefed on the case as it proceeds . but they insist pitt hasn ' t heard from the white house or enron ' s other political allies .\r\ntop lobbyist . enron ' s connections are numerous . besides lay ' s links to bush , an enron director , wendy lee gramm , is the wife of texas senator phil gramm , top republican on the senate banking committee . and enron spreads its lobbying budget - - $ 2 . 13 million in 2000 - - across both parties . just this year it hired four lobbying firms with democratic roots . enron says it lobbies heavily because it operates in regulated industries . it notes that electric utilities outspend it 35 to 1 .\r\non oct . 31 , a special committee of enron ' s board hired william r . mclucas , former sec enforcement director , to represent it . mclucas should know that any attempt to muscle the stock cops is likely to backfire . pitt , who joined the sec out of law school in 1968 , " remembers how the [ nixon - era ] sec tainted itself by turning a blind eye to [ fugitive financier ] robert vesco , " says an agency veteran . pitt has too much riding on the enron probe to let its connections sway his judgment .\r\nmcnamee covers finance in washington .\r\nafl - cio , amalgamated bank call on enron to redirect the mission of newly created special committee : outside directors must adopt independent role\r\n11 / 02 / 2001\r\npr newswire\r\n( copyright ( c ) 2001 , pr newswire )\r\nwashington , nov . 2 / prnewswire / - - in a shareholder letter sent today to the board of directors of the enron corporation ( nyse : ene ) , the afl - cio and the amalgamated bank called on enron to expand the mission of its newly created special committee and for enron ' s board to adopt a package of reforms designed to restore investor confidence in the battered energy firm . 
america ' s working families are significant shareholders of enron stock through their pension , health and welfare benefit funds .\r\namong other suggestions , the letter urges the board to expand the special committee ' s mandate to : * examine all transactions with entities in which enron employees\r\nor directors have an interest\r\n* adopt procedures for reviewing insider participation in investments\r\n* commence an extraordinary review of executive compensation\r\n* adopt a stricter definition of director independence and disclose\r\ndirector conflicts of interest\r\n" the special committee ' s mandate is far too narrow to address the current crisis , " said richard trumka , secretary - treasurer of the afl - cio . " in light of enron ' s recent balance sheet write - downs , share price decline , and credit rating deterioration , the special committee must be forward looking and consider sweeping governance reform measures , " he explained .\r\n" in this time of crisis , outside directors must reform enron ' s traditional lack of transparency and communicate directly with shareholders , " said gabriel caprio , president and ceo of the amalgamated bank . " this is particularly important since several outside directors have apparent conflicts of interest that compromise their objectivity , " he added .\r\nafl - cio affiliate unions sponsor benefit funds with over $ 400 billion in assets and hold an estimated 3 . 1 million enron shares . the afl - cio is joined in its call for reform by the amalgamated bank , the trustee of the longview funds which hold 251 , 304 shares of enron . the amalgamated bank ' s longview funds are collective investment trusts that manage equity assets on behalf of workers ' benefit funds .\r\na copy of the letter sent to enron ' s board of directors is available by contacting the afl - cio office of investment at 202 - 637 - 3900 or online at http : / / www . shareholdervalue . 
org .\r\nmake your opinion count - click here\r\n/ contact : lane windham , + 1 - 202 - 637 - 5018 , or bill patterson , + 1 - 202 - 637 - 3900 , both of afl - cio / 13 : 39 est\r\ncopyright ? 2000 dow jones & company , inc . all rights reserved .\r\nwilliams ceo sees validation of co ' s bandwidth model\r\n11 / 02 / 2001\r\ndow jones energy service\r\n( copyright ( c ) 2001 , dow jones & company , inc . )\r\nhouston - ( dow jones ) - at the beginning of the year , two telecommunications units born of energy companies seemed to symbolize the different ways to sell bandwidth .\r\nenron corp . ( ene ) was viewed as the hot pioneer blazing a trail for selling telecommunications network capacity as a commodity like natural gas or electricity .\r\non the other hand , williams communications group ( wcg ) seemed to promote the old - fashioned method that telecommunications companies have used for making long - term agreements .\r\non thursday , wcg reported third - quarter earnings of $ 22 . 6 million before taxes and interest . wcg chairman and chief executive howard janzen told dow jones newswires that the earnings validated williams ' model of a bandwidth company being a provider of services , not a trader of commodities .\r\n" i think bandwidth trading has its place , " janzen said . " it ' s developing relatively slowly . the reality is ( that ) bandwidth is not a commodity except for lower - capacity services and on certain routes with large amounts of capacity . "\r\nenron reported an $ 80 million third - quarter loss for its broadband services unit .\r\nas janzen views it , bandwidth is defined by the technology used to provide it .\r\n" the bandwidth we ' ll sell five years from now will be different from what we ' re selling today because of changes in technology , " he said .\r\nafter taxes and interest , williams reported a loss of $ 272 . 7 million on revenue of $ 297 . 8 million in the third quarter . an extraordinary gain of $ 223 . 
7 million from the repurchase of senior redeemable notes in the open market enabled williams to post the positive result before taxes and interest .\r\nin the same period a year ago , williams reported a net loss of $ 150 . 5 million on revenue of $ 209 million .\r\nwilliams also announced thursday that it has agreed to buy the assets of coreexpress , a company which operated on williams ' network . coreexpress guarantees data delivery over multiple networks and developed software to monitor quality of service over multiple networks .\r\nterms of the agreement weren ' t disclosed . the deal won ' t close until undisclosed conditions are met .\r\nwilliams also announced a 20 - year agreement valued at $ 267 million to provide bandwidth to boeing co .\r\n- by erwin seba , dow jones newswires , 713 - 547 - 9214 erwin . seba @ dowjones . com\r\ncopyright ? 2000 dow jones & company , inc . all rights reserved .\r\nusa : enron shares plunge after s & p credit rating cut .\r\n11 / 02 / 2001\r\nreuters english news service\r\n( c ) reuters limited 2001 .\r\nnew york , nov 2 ( reuters ) - shares of enron corp . dropped more than 8 percent in early trade on friday , as the stock took a new plunge in a two - week free - fall , a day after its credit rating was cut for the second time this week .\r\nenron was down 89 cents , or 7 . 4 percent , to $ 11 . 10 on the new york stock exchange , after briefly slipping to $ 10 . 95 , a price it also touched briefly on tuesday and last closed at in july 1992 .\r\nenron , the nation ' s largest energy trader , lined up $ 1 billion of new credit on thursday in a bid to restore investor confidence . even so , standard & poor ' s cut enron ' s credit rating after stock markets closed on thursday , saying it could do so again if the situation worsens .\r\nenron has been rocked by a stock slide that has slashed two - thirds of its share price since oct . 
16 , when the company reported a $ 1 billion charge that was caused , in part , by dealings linked to a chief financial officer who was subsequently ousted .\r\nhouston - based enron is facing an investigation by the u . s . securities and exchange commission into those dealings . at issue are off - balance sheet deals with limited partnerships , run by then - cfo andrew fastow , which contributed to a $ 1 . 2 billion reduction in shareholder equity .\r\ncopyright ? 2000 dow jones & company , inc . all rights reserved .\r\ngoldman declines enron ' s request for loan amid credit concerns\r\n2001 - 11 - 02 11 : 22 ( new york )\r\ngoldman declines enron ' s request for loan amid credit concerns\r\nnew york , nov . 2 ( bloomberg ) - - goldman sachs group inc .\r\nrefused to participate in a $ 1 billion loan to enron corp . because\r\nit was unwilling to risk its capital on a client with falling\r\ncredit ratings that has been using its investment - banking services\r\nless often , according to people familiar with matter .\r\nenron chief executive kenneth lay asked for the line of\r\ncredit after meeting with bankers from goldman , j . p . morgan chase\r\n& co . and citigroup inc . last month to discuss ways to alleviate a\r\ncash crunch . j . p . morgan and citigroup agreed to lend on the\r\ncondition that the houston - based energy trader pledge two\r\npipelines as collateral .\r\ngoldman ' s refusal highlights its reluctance to provide credit\r\nlines to all but its top fee - paying clients . by not lending to\r\nenron , the firm also is able to advise companies that want to buy\r\nenron or some of its assets , or make a bid of its own .\r\nfor enron , the decision makes it harder to resist the demands\r\nof its existing lenders , who made the company give up collateral\r\nand pay higher interest rates to get its new loan .\r\n` ` people smell resistance to enron , ' ' said glenn reynolds , an\r\nanalyst with creditsights . com , an independent research firm . 
` ` the\r\nbalance of power is the with the banks now . ' '\r\nkathleen baum , a spokeswoman for goldman , said the firm had\r\n` ` a long history of putting its capital to work for clients . ' ' she\r\nwouldn ' t comment on the firm ' s relationship with enron . karen\r\ndenne , a spokeswoman for enron , said , ` ` i won ' t confirm who we\r\nhave had talks with . ' '\r\nshut out\r\nenron needed the $ 1 billion secured loan to supplement cash\r\nreserves and help the company pay off existing debt . yesterday ,\r\nstandard & poor ' s cut enron ' s long - term credit rating to ` ` bbb , ' '\r\nthe second - lowest investment grade rating , from ` ` bbb + . ' '\r\nthe sec is investigating partnerships run by former chief\r\nfinancial officer andrew fastow that bought and sold enron shares\r\nand assets . those trades cost enron $ 35 million . the company also\r\nlost $ 1 . 2 billion in shareholder equity .\r\nenron ' s dealings with the partnerships have shut it out of\r\ncommercial - paper markets , where corporations borrow money for days\r\nor weeks . lenders are concerned that the company may have further\r\nlosses from its trading business .\r\na week ago enron tapped $ 3 billion in credit lines , arranged\r\nby j . p . morgan and citibank , to pay off $ 2 . 2 billion in commercial\r\npaper it has outstanding .\r\nalready loaded down with enron debt , j . p . morgan and citibank\r\nforced enron to pay as much as 2 . 5 percentage points more than the\r\nlondon interbank offered rate , or libor , on its new loan ,\r\naccording to the people familiar with the matter . that ' s about\r\nfive times the spread enron is paying on its existing lines ,\r\naccording to bloomberg data .\r\nthe extra yield wasn ' t sufficient incentive for goldman to\r\nlend . securities firms such as goldman are required to value\r\nlending commitments at market prices , exposing them to potential\r\nlosses from declines in loan prices . banks such as j . p . 
morgan and\r\ncitibank are allowed to carry the assets at full value .\r\nfalling prices\r\nenron ' s existing loans have already fallen to between 80 and\r\n90 cents on the dollar since the company drew them down , according\r\nto traders .\r\nthe company ' s 6 . 4 percent coupon notes due in 2006 fell as\r\nmuch as 6 cents on the dollar today , with price indications\r\nbetween 68 and 73 cents , traders said . yesterday , traders bid for\r\nthe bonds at 74 cents and offered to sell them at 77 cents\r\ngoldman has been getting less investment - banking business\r\nfrom enron .\r\nthe no . 3 u . s . securities firm by capital hasn ' t arranged a\r\nbond sale for enron since 1995 , and enron hasn ' t picked goldman to\r\nadvise on an acquisition or sale since at least 1993 , according to\r\nbloomberg data . the firm is a dealer on enron ' s $ 3 billion\r\ncommercial paper program and has arranged six of the company ' s 15\r\npreferred share sales .\r\nwithout a steady flow of investment banking fees , goldman is\r\nreluctant to make loans that tie up capital in a business that is\r\nless profitable than advising on mergers or underwriting\r\nsecurities .\r\namong clients the firm has turned down : ford motor co . , the\r\nsecond - largest automaker , and vodafone group plc , europe ' s largest\r\ncellular phone company .\r\ngoldman did extend $ 2 billion in credit to at & t corp . as part\r\nof a $ 25 billion credit facility last year . the largest u . s . long -\r\ndistance phone company is one of the firm ' s most lucrative\r\nclients , however , paying out more than $ 100 million in investment -\r\nbanking fees during the past five years .\r\nafter refusing to lend to enron , goldman is unlikely to\r\nadvise the energy trader on any of its planned $ 2 . 1 billion in\r\nasset sales , the people familiar said .\r\nj . p . 
morgan is advising enron on the sale of its azurix north\r\namerican water business and may be retained , along with\r\ncitigroup ' s salomon smith barney unit , to look at other disposals\r\nand a possible sale of the whole company , the people said .\r\ngoldman , which has an energy trading business of its own , may\r\nalready be advising potential buyers of enron ' s businesses , or\r\nlining up a bid of its own , reynolds said .\r\n` ` we may well see goldman pop up on the other side , which\r\nwould explain its reticence to lend , ' ' he said .\r\npower points : time for a bronx miracle in houston\r\nby mark golden\r\n11 / 02 / 2001\r\ndow jones energy service\r\n( copyright ( c ) 2001 , dow jones & company , inc . )\r\na dow jones newswires column\r\nnew york - ( dow jones ) - with two outs in the bottom of the ninth and down by two runs with one man on , enron corp . ( ene ) needs nothing less than a home run just to stay in the game .\r\nthis is the home run enron must come up with : full disclosure of the assets and liabilities of those troubled partnerships that have been kept off their balance sheet , and proof the corporation can cover the difference . it ' s what stockholders , bondholders and trading partners have been demanding , but time is running out , and enron hasn ' t produced .\r\n" we ' re going to get to that number soon and tell people . we have to do that , " said a source at enron , who didn ' t know how quickly the disclosure would come or whether it would be independently audited . " in this environment , we are going to be absolutely sure that it ' s accurate . "\r\nenron has about $ 3 . 3 billion in debt on the two key partnerships payable in 2003 . the company will sell the assets in those partnerships and , in a worse case scenario , will face a $ 1 billion shortfall that could be covered by selling stock and other company assets , enron has said . 
the company hopes that when the time comes to issue new stock , shares will be trading closer to $ 20 than their current $ 11 .\r\ninvestors , however , aren ' t acting as if they ' re reassured . part of the trouble , analysts with credit - rating firm standard & poor ' s said friday morning , is that until enron sells the assets , nobody knows for sure what they ' re worth or how deep the partnerships are in the hole .\r\ns & p is confident that enron can raise enough capital to fill that hole , provided its trading partners in key energy markets continue to business with the company .\r\ncredit derivative fizzle\r\nto see the pickle the world ' s premier energy franchise is in , one needs to look no further than its credit derivatives trading desk . that relatively new and innovative operation was getting off the ground nicely , but the desk hasn ' t been able to transact for three weeks , since the value of enron ' s stock and bonds began to plummet , according to another enron employee .\r\nenron didn ' t respond to a request for comment , but the enron credit division in europe said in a release that lines are " constrained " as the market continues to evaluate the company ' s credit position .\r\nuntil recently , customers would buy a credit derivative from enron to cover its exposure to bankruptcy by a third company . a big supplier to a financially troubled company like xerox corp . ( xrx ) or lucent technologies ( lu ) , for example , might pay enron $ 1 million a year for a payout of , say , $ 10 million in the event of a bankruptcy .\r\nbut you wouldn ' t pay the premium on your insurance unless you were sure that the insurance company could pay your claims . likewise , with enron ' s creditworthiness in doubt , there ' s no reason to pay a lot of money just to exchange lucent risk for enron risk .\r\nthe same holds true , though less obviously , for enron ' s core business of energy trading . 
some utilities , large industrial companies and other energy trading companies could see their profits ruined if natural gas prices - already on the rise at $ 3 per million british thermal units - shoot to $ 10 this winter as they did last winter .\r\nto guard against that risk , companies buy contracts now for delivery this winter at set prices . if a company wants to lock in winter gas at $ 3 , should it turn to enron ? if gas were to rise to $ 10 in january and enron couldn ' t deliver , enron ' s counterparty paid a good price for no protection . and if enron failed in a $ 10 gas market , that would send hundreds of companies scrambling for supplies . gas could go to $ 20 in a heartbeat .\r\nleading role an asset\r\nenron does have a man on base . the company is so important to energy market participants that they desperately want enron to survive . and any company that moves to lock enron out of the market now could regret doing so if enron survives as the big dog .\r\nenron ' s trading partners have a pretty consistent position at this point : they are continuing to trade with enron , but are avoiding long - term deals that increase their exposure to the company . the only acceptable long - term deals are those that offset deals done earlier . unless and until a bill goes unpaid , they ' ll keep delivering .\r\nenron , like the rest of the u . s . energy industry , paid its bills for september gas deliveries on oct . 20 and for its september electricity deliveries on oct . 25 . its next power and gas bills come due in the third week of november .\r\nthat means trading companies generally have just seven weeks of receivables at risk before enron ' s creditworthiness faces another test . 
in exchange for taking on that small risk , they keep the great market maker in business .\r\nlenders show concerns\r\nlenders , whether through holding enron bonds or providing loans , don ' t have the same luxury as energy trading companies that are keeping deals on a short leash . lenders supply cash up front and for longer periods of time , usually years .\r\noutside of long - term commitments , what worries the banks , which presumably got a good look at enron ' s books ? if enron is now unable to make the big bets with long - term energy deals that have kept it profitable for years , can it service all of its liabilities , known and unknown ? if it has to sell profitable assets like pipelines and power plants , or otherwise put those assets at risk , will lower earnings be enough to pay rising credit costs ? what happens if its greatest asset - its traders - leave in droves to work elsewhere ?\r\nif enron is to have a chance , it needs energy trading profits to keep rolling in so that it can continue to service its debt . so far this year , trading and other wholesale operations have accounted for $ 2 . 2 billion out of $ 2 . 4 billion in net recurring income before interest and taxes . by comparison , enron has made $ 617 million in interest payments so far this year .\r\nj . p . morgan chase 201 - 938 - 4604 ; mark . golden @ dowjones . com\r\ncopyright ? 2000 dow jones & company , inc . all rights reserved .\r\nenron says it remains ' biggest ' player in eu gas , power\r\n11 / 02 / 2001\r\ndow jones energy service\r\n( copyright ( c ) 2001 , dow jones & company , inc . )\r\nlondon - ( dow jones ) - enron corp . ( ene ) said in a statement friday that it remains the " the biggest buyer and seller of gas and power in europe , " and that its enrononline platform " continues to be the key trading platform in europe . 
"\r\nthe company issued the statement in response to widespread rumors that several companies have ceased trading gas and power with enron since its share price took a nosedive last week .\r\nin the statement , enron added that worldwide transactions in the power and gas markets were averaging $ 3 billion to $ 4 billion a day , up from a 30 - day average of $ 2 . 5 billion at close of business friday , oct . 26 .\r\nenron has been on the defensive after announcing a third - quarter loss of $ 618 million two weeks ago , followed by news that it took a $ 1 . 2 billion equity write - down , based partly on transactions involving a handful of its own officers .\r\non wednesday , enron disclosed that the sec elevated its inquiry into enron ' s alleged related - party transactions to a formal probe .\r\nenron corp . secured $ 1 billion in new credit lines this week , but the deal did not appease credit - rating agency standard + 44 - ( 0 ) 20 - 7842 - 9345 ; sarah . spikes @ dowjones . com\r\ncopyright ? 2000 dow jones & company , inc . all rights reserved .\r\nusa : update 1 - enron shares fall as investor concerns linger .\r\nby janet mcgurty\r\n11 / 02 / 2001\r\nreuters english news service\r\n( c ) reuters limited 2001 .\r\nnew york , nov 2 ( reuters ) - shares in enron corp . , the u . s . energy trading giant facing a federal probe into its dealings , declined on friday amid lingering investor concern about management credibility and the outcome of pending lawsuits .\r\nshares of enron were off 62 cents , or 5 . 2 percent , at $ 11 . 37 in early afternoon trade on the new york stock exchange , recouping some losses after plunging briefly to $ 10 . 95 , a closing price last seen in july 1992 and touched in intraday trade on tuesday .\r\nenron debt due in 2006 declined about another 6 points , and was trading at about 65 cents on the dollar . the bonds were trading at about $ 1 . 01 two weeks ago , before enron announced on oct . 
16 a $ 1 billion charge , caused in part by dealings linked to partnerships run until recently by a chief financial officer who was ousted last week , and that form part of the federal probe .\r\n" there remain several fundamental issues that we believe need to be addressed before the clouds can clear above enron ' s skies , " said ronald barone of brokerage ubs warburg .\r\napart from the probe by the securities and exchange commission , barone said concerns include " the evolving state of the company ' s balance sheet , management credibility and ultimate outcome of shareholder lawsuits . "\r\na handful of law firms have sued enron , saying it overstated operating results , failed to write down assets on a timely basis and concealed investments that might require the company to issue large amounts of shares to cover loses .\r\nenron ' s woes have put it under intense investor scrutiny , with the company losing about $ 17 billion in market capitalization in the past two weeks . the afl - cio union umbrella urged enron on friday to review executive compensation and adopt procedures for insider investments , among other recommendations .\r\nafl - cio affiliate unions sponsor benefit funds that hold an estimated 3 . 
1 million enron shares .\r\npremium player in risk management\r\nshares also plunged on friday - they have fallen about two - thirds in the past two weeks - after standard & poor ' s cut enron ' s credit rating late thursday , the second cut this week .\r\nalthough s & p said another rate cut could ensue if enron ' s situation worsens , a team of s & p analysts told investors on a conference call on friday they were confident they were aware of all the company ' s financial obligations .\r\nenron has been criticized for providing scant details about its dealings , leading to concerns about the riskiness of its obligations .\r\n" enron is a premium player in the risk management area , " said ronald barone , a member of the s & p team who is not related to ubs ' s barone .\r\nthe s & p analysts said they were using about $ 1 . 5 billion as the amount of off - balance sheet items , about half the $ 3 billion calculated in past years , because of a change in reporting international energy assets , which no longer puts non - recourse debt onto the balance sheet in places like india and south america .\r\nthe s & p analysts also said they felt enron ' s move to secure a $ 1 billion line of additional credit earlier this week sent out enron ' s clear commitment to credit quality .\r\nthey also said it seemed reasonable to expect a long - term solution to fix enron ' s hard to understand balance sheet would assume an infusion of long - term equity as well as asset sales .\r\nthe s & p ratings team also said it was closely monitoring enron ' s trading partners for a change in their credit stance toward enron and , to date , have seen no significant change .\r\nbut they cautioned that could change at any time and that was the reason for putting enron on the credit watch listing .\r\nenron chairman and chief executive officer ken lay , who has been criticized for not providing the public enough details on enron , was a no - show at a business conference in houston where he 
had been scheduled to speak on friday morning .\r\nenron executive robert bradley , who declined to respond to questions from reuters after his speech , said he had been sent to replace lay because " ken is better at putting out fires at enron . "\r\ncopyright ? 2000 dow jones & company , inc . all rights reserved .     2
Subject: re : weather and energy price data\r\nmulong wang on 04 / 24 / 2001 10 : 58 : 43 am\r\nto :\r\ncc :\r\nsubject : re : weather and energy price data\r\nhello , elena :\r\nthank you very much for your data . i sent an email to ft but had no\r\nresponse so far . as soon as i got their permission i will let you know .\r\nhave a great day !\r\nmulong\r\non thu , 19 apr 2001 elena . chilkina @ enron . com wrote :\r\n>\r\n> mulong ,\r\n>\r\n> please find attached a file with henry hub natural gas prices . the data\r\n> starts from 1995 and given on the daily basis , please let us know when we\r\n> can proceed with electricity prices .\r\n>\r\n> sincerely ,\r\n> elena chilkina\r\n>\r\n> ( see attached file : henryhub . xls )\r\n>\r\n>\r\n>\r\n>\r\n>\r\n>\r\n> vince j kaminski @ ect\r\n> 04 / 16 / 2001 08 : 19 am\r\n>\r\n> to : mulong wang @ enron\r\n> cc : vince j kaminski / hou / ect @ ect , elena chilkina / corp / enron @ enron ,\r\n> macminnr @ uts . cc . utexas . edu\r\n>\r\n> subject : re : weather and energy price data ( document link : elena\r\n> chilkina )\r\n>\r\n> mulong ,\r\n>\r\n> we shall send you natural gas henry hub prices right away .\r\n> please look at the last winter and the winter of\r\n> 95 / 96 .\r\n>\r\n> we shall prepare for you the electricity price\r\n> information ( cinergy , cobb and palo verde ) but\r\n> you have to approach ft ( the publishers of\r\n> megawatts daily , a newsletter that produces the price\r\n> index we recommend using ) and request the permision\r\n> to use the data . we are not allowed to distribute\r\n> this information .\r\n>\r\n> please , explain that this is for academic research and that\r\n> we can produce the time series for you ,\r\n> conditional on the permission from the publishers\r\n> of megawatts daily .\r\n>\r\n> vince kaminski\r\n>\r\n>\r\n>\r\n> mulong wang on 04 / 15 / 2001 03 : 43 : 26 am\r\n>\r\n> to : vkamins @ ect . enron . 
com\r\n> cc : richard macminn\r\n> subject : weather and energy price data\r\n>\r\n>\r\n> dear dr . kaminski :\r\n>\r\n> i am a phd candidate under the supervision of drs . richard macminn and\r\n> patrick brockett . i am now working on my dissertation which is focused on\r\n> the weather derivatives and credit derivatives .\r\n>\r\n> could you kindly please offer me some real weather data information about\r\n> the price peak or plummet because of the weather conditions ?\r\n>\r\n> the past winter of 2000 was very cold nationwide , and there may be a\r\n> significant price jump for natural gas or electricity . could you\r\n> please offer me some energy price data during that time period ?\r\n>\r\n> your kind assistance will be highly appreciated and have a great day !\r\n>\r\n> mulong\r\n>\r\n>\r\n>\r\n>\r\n>\r\n>\r\n>\r\n>\r\n>\r\n>\r\n>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2
Subject: nesa / hea ' s 24 th annual meeting\r\nsaddle up for nesa / hea ' s 24 th annual meeting\r\n" deep in the heart of texas "\r\nseptember 9 - 11 , 2001\r\nhyatt regency hill country resort\r\nsan antonio , texas\r\nthe annual meeting planning committee has put together an outstanding\r\nprogram for your review . it is attached below in an adobe acrobat pdf file\r\n- if you have problems with the attachment please call nesa / hea\r\nheadquarters at ( 713 ) 856 - 6525 and we ' ll mail or fax a copy out immediately .\r\ngeneral session topics include :\r\n* will tomorrow ' s restructured electric infrastructure support\r\ntomorrow ' s economy ?\r\n* power deregulation panel : developer / ipp , utility / transmission , power\r\nmarketer , government , retail\r\n* power demand\r\n* the state of the energy industry\r\n* new political administration - impact on energy policy and\r\nenvironment\r\nnetworking opportunities :\r\n* opening reception - sunday , september 9\r\n* golf tournament & tours of area attractions - monday , september 10\r\nthe hyatt regency hill country resort has a limited block of rooms available\r\nfor nesa / hea members and guests - be sure to check page 6 for lodging\r\ninformation and make your reservation as soon as possible .\r\nwe hope that you take this opportunity to meet with your colleagues and\r\ncustomers in this relaxing yet professional environment to exchange ideas on\r\nmatters of importance covering a broad spectrum of subjects . the annual\r\nmeeting agenda includes timely issues presented by knowledgeable industry\r\nleaders who will discuss formidable and thought provoking issues affecting\r\nthe energy industry today .\r\nplease take a moment to review the attached brochure . nesa / hea encourages\r\nand appreciates you taking the time to pass the brochure on to industry\r\npersonnel who would benefit from participating at this conference .\r\nnesa / hea ' s 24 th annual meeting\r\neducating the energy professional\r\nto unsubscribe from the nesa / hea member email blast list please respond to\r\nthis email with the word unsubscribe typed into the subject field . this\r\nwill preclude you from receiving any email blasts in the future , but hard\r\ncopies of the material will be sent to your attention .\r\n>\r\n- 512 _ nesa 2001 annlmtgjam . pdf
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2    Subject: hpl nom for march 17 , 2001\r\n( see attached file : hplno 317 . xls )\r\n- hplno 317 . xls
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2
Subject: temporary spaces in new building\r\ni think it would be a great idea if we could get a few extra spaces for\r\nit support in the new building .\r\nwe would need five spots .\r\ntwo for enpower ( one for risk , and one for volume mangement and confirms )\r\ntwo for unify ( one for logistics and one for edi )\r\none for gas risk\r\ni won ' t make requests for specific personnel ( yet ) just not the most junior\r\npersonnel .\r\nlet me know if you need anything else .\r\nthanks\r\nbob
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
    2
Name: contents, Length: 185, dtype: int64

There are 185 distinct texts that appear more than once.
One text appears 13 times; the others appear 2 or 3 times each.
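
The summary above can be read straight off the `value_counts()` result. Here is a minimal sketch on toy data (the `toy` series and its values are made up for illustration, not taken from the competition files) that extracts the same three figures: the number of distinct repeated texts, the largest repeat count, and how many texts occur k times.

```python
import pandas as pd

# Hypothetical stand-in for the "contents" column.
toy = pd.Series(["a", "b", "a", "c", "a", "b", "d"], name="contents")

counts = toy.value_counts()        # occurrences of each distinct text
repeated = counts[counts >= 2]     # keep only texts that occur at least twice

n_repeated_texts = len(repeated)   # number of distinct texts that repeat
max_repeats = int(repeated.max())  # largest number of occurrences

# How many distinct texts occur exactly k times (index = k).
repeat_histogram = repeated.value_counts().sort_index()

print(n_repeated_texts, max_repeats)
print(repeat_histogram)
```

Applied to `train_df["contents"]`, the same steps should give the figures quoted above, assuming the column is used as-is without any normalization.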

print('duplicated contents:', len(test_df[test_df.duplicated(subset=["contents"])]))
test_df[test_df.duplicated(subset=["contents"])]
duplicated contents: 2691
id contents
596 597 Subject: re : ink prices got you down ? 11956\...
871 872 Subject: hi paliourg get all pills . everythin...
924 925 Subject: save your money buy getting this thin...
1129 1130 Subject: select small - cap for astute investo...
1173 1174 Subject: caiso notification - tswg conference ...
... ... ...
24817 24818 Subject: fulton bank online security message\r...
24818 24819 Subject: request for transfer assistance\r\nfr...
24827 24828 Subject: freedom - $ 1 , 021 , 320 . 00 per ye...
24830 24831 Subject: delivery status notification ( failur...
24834 24835 Subject: it ' s mariah from dating service\r\n...

2691 rows × 2 columns
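
Note that `duplicated()` with its default `keep="first"` flags only the second and later copies of each text, so the row count above is the number of extra copies rather than the number of rows involved in a duplicate. The sketch below uses a made-up `toy_df` (hypothetical data) to show the difference from `keep=False`, and how the count relates to `value_counts()`.

```python
import pandas as pd

# Hypothetical frame with the same column names as test_df.
toy_df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "contents": ["x", "y", "x", "z", "x", "y"],
})

# Default keep="first": the first occurrence of each text is NOT flagged.
extra_copies = toy_df.duplicated(subset=["contents"])
# keep=False: every row whose text occurs more than once is flagged.
all_copies = toy_df.duplicated(subset=["contents"], keep=False)

n_extra = int(extra_copies.sum())
n_all = int(all_copies.sum())

# The default count equals the sum of (occurrences - 1) over repeated texts.
counts = toy_df["contents"].value_counts()
n_extra_from_counts = int((counts[counts >= 2] - 1).sum())

print(n_extra, n_all, n_extra_from_counts)
```

Which variant is more useful depends on the goal: the default is convenient for dropping redundant rows, while `keep=False` is convenient for inspecting every copy side by side.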

test_value_counts = test_df["contents"].value_counts()
test_value_counts[test_value_counts>=2]
Subject: \r\n
    51
Subject: fwd :\r\nyour needed soffttwares at rock bottom prri ce ! - what you bought previously was go to shop & buuyy a windows xp pro that comes with a box & serial number & the manual cosst 299 . 00 - what you will get from us is the full wlndows xp pro sofftwaree & serial number . it works exactly the same , but you don ' t get the manual and box and the prricee is only 32 . 00 . that is a savviing of 254 . 00\r\nsooftware title\r\nour low priicce\r\nadobbe creative suite ( 5 cds )\r\nadobbe photooshop cs 8 . 0 ( 1 cd )\r\n3 d studio max 6 . 0 ( 3 cds )\r\nadobbe premiere pro 7 . 0 ( 1 cd )\r\nalias wavefront maya 5 . 0 unlimited\r\nautocad 2005\r\nautodesk architectural desktop 2005\r\ncakewalk sonar 3 producer edition ( 3 cds )\r\ncanopus procoder 1 . 5 ( 1 cd )\r\ncorel draw 12 graphic suite ( 3 cds )\r\ndragon naturally speaking preferred 7 . 0\r\nmacromedia dreamweaver mx 2004 v 7 . 0\r\nmacromedia fireworks mx 2004 v 7 . 0\r\nmacromedia flash mx 2004 v 7 . 0 professional\r\nmacromedia studio mx 2004 ( 1 cd )\r\nmicrosoft money 2004 deluxe ( 1 cd )\r\nmicrosoft office 2003 system professional ( 5 cds )\r\nmicrosoft office 2003 multilingual user interface pack ( 2 cds )\r\nmicrosoft project 2002 pro\r\nmicrosoft publisher xp 2002\r\nmicrosoft visio for enterprise architects 2003\r\nmicrosoft windows xp corporate edition with spl\r\nmicrosoft windows xp professional\r\nnorton antivirus 2004 pro\r\nnorton systemworks pro 2004 ( 1 cd )\r\nomnipage 14 office ( 1 cd )\r\npinnacle impression dvd pro 2 . 2 ( 1 cd )\r\nptc pro engineer wildfire datecode 2003451 ( 3 cds )\r\npowerquest drive image 7 . 01 multilanguage ( 1 cd )\r\nulead dvd workshop 2 . 0\r\nmicrosoft visual studio . net 2003 enterprise architect ( 8 cds )\r\nwinfax pro 10 . 03\r\nand more soft wares - have 850 soft ware titles on our site for u\r\n55 . 00\r\n32 . 00\r\n50 . 00\r\n32 . 00\r\n40 . 00\r\n32 . 00\r\n32 . 00\r\n36 . 00\r\n25 . 00\r\n32 . 00\r\n25 . 00\r\n25 . 00\r\n32 . 00\r\n30 . 00\r\n50 . 00\r\n20 . 00\r\n40 . 00\r\n25 . 00\r\n32 . 00\r\n20 . 00\r\n25 . 00\r\n40 . 00\r\n32 . 00\r\n20 . 00\r\n20 . 00\r\n25 . 00\r\n25 . 00\r\n40 . 00\r\n20 . 00\r\n20 . 00\r\n93 . 00\r\n20 . 00\r\ndownload your sofftwaares from our superfast ( 100 mbits connection ) site & you will be given your own exclusive registration key to register the sofftwaares you bought from us , and now you have your own registered copy of sofftwaares ( will never expired again )\r\nit ' s oem version of sofftwaares which is an original / genuine sofftwaares , strictly no piracy sofftwaares\r\nover 850 popular titles for you to choose fromact quick now before all soldstart using your needed sofftwaares now = = c l i c k - h e r e = = ( plz give 2 - 3 mins to complete the page loading bcos the page has 850 titles on it )\r\ntake me down\r\n
    8
Subject: calpine daily gas nomination\r\n>\r\nricky a . archer\r\nfuel supply\r\n700 louisiana , suite 2700\r\nhouston , texas 77002\r\n713 - 830 - 8659 direct\r\n713 - 830 - 8722 fax\r\n- calpine daily gas nomination 1 . doc
    7
Subject: \r\nh e l l od e a rh o m eo w n e r ,\r\nw eh a v eb e e nn o t i f i e dt h a ty o u rm o r t g a g e\r\nr a t ei sf i x e da tav e r yh i g hi n t e r e s tr a t e .\r\nt h e r e f o r ey o ua r ec u r r e n to v e r p a y i n g ,\r\nw h i c hs u m s - u pt ot h o u s a n d so f\r\nd o l l a r sa n n u a l l y .\r\nl u c k i l yf o ry o uw ec a n\r\ng u a r a n t e e\r\nt h el o w e s tr a t e si nt h eu . s .\r\n( 3 . 5 2 % ) .\r\ns oh u r r yb e c a u s et h er a t ef o r e c a s ti s\r\nn o tl o o k i n gg o o d !\r\nt h e r ei sn oo b l i g a t i o n s , a n di tf r e e\r\nl o c ko nt h e 3 . 5 2 % , e v e nw i t hb a dc r e d i t !\r\nc l i c kh e r en o wf o rd e t a i l s\r\nr e - m o v eh e r e\r\n
    7
Subject: new product ! cialis soft tabs .\r\nhi !\r\nwe have a new product that we offer to you , c _ i _ a _ l _ i _ s soft tabs ,\r\ncialis soft tabs is the new impotence treatment drug that everyone is talking\r\nabout . soft tabs acts up to 36 hours , compare this to only two or three hours\r\nof viagra action ! the active ingredient is tadalafil , same as in brand cialis .\r\nsimply disolve half a pill under your tongue , 10 min before sex , for the best\r\nerections you ' ve ever had !\r\nsoft tabs also have less sidebacks ( you can drive or mix alcohol drinks with them ) .\r\nyou can get it at : http : / / go - medz . com / soft /\r\nno thanks : http : / / go - medz . com / rr . php\r\n
    6
    ..
Subject: failure notice\r\nhi . this is the qmail - send program at mail - 03 . cdsnet . net .\r\ni ' m afraid i wasn ' t able to deliver your message to the following addresses .\r\nthis is a permanent error ; i ' ve given up . sorry it didn ' t work out .\r\n:\r\nsorry , i couldn ' t find any host named bb . internetcds . com . ( # 5 . 1 . 2 )\r\n- - - below this line is a copy of the message .\r\nreturn - path :\r\nreceived : ( qmail 50355 invoked by alias ) ; 19 jul 2005 10 : 58 : 51 - 0000\r\ndelivered - to : nic - notify @ internetcds . com\r\nreceived : ( qmail 50352 invoked from network ) ; 19 jul 2005 10 : 58 : 51 - 0000\r\nreceived : from unknown ( helo localhost ) ( 127 . 0 . 0 . 1 )\r\nby mail - 03 . cdsnet . net with smtp ; 19 jul 2005 10 : 58 : 51 - 0000\r\nreceived : from mail - 03 . cdsnet . net ( [ 127 . 0 . 0 . 1 ] )\r\nby localhost ( mail - 03 . cdsnet . net [ 127 . 0 . 0 . 1 ] ) ( amavisd - new , port 10024 )\r\nwith smtp id 46679 - 09 for ;\r\ntue , 19 jul 2005 03 : 58 : 51 - 0700 ( pdt )\r\nreceived : ( qmail 50346 invoked from network ) ; 19 jul 2005 10 : 58 : 50 - 0000\r\nreceived : from yahoobb 220056020109 . bbtec . net ( helo mailwisconsin . com ) ( 220 . 56 . 20 . 109 )\r\nby mail - 03 . cdsnet . net with smtp ; 19 jul 2005 10 : 58 : 50 - 0000\r\nreceived : from 205 . 214 . 42 . 66\r\n( squirrelmail authenticated user projecthoneypot @ projecthoneypot . org ) ;\r\nby mailwisconsin . com with http id j 87 gzo 09360462 ;\r\ntue , 19 jul 2005 10 : 57 : 46 + 0000\r\nmessage - id :\r\ndate : tue , 19 jul 2005 10 : 57 : 46 + 0000\r\nsubject : just to her . . .\r\nfrom : " barry castillo "\r\nto : nic - notify @ internetcds . com\r\nuser - agent : squirrelmail / 1 . 4 . 3 a\r\nx - mailer : squirrelmail / 1 . 4 . 3 a\r\nmime - version : 1 . 0\r\ncontent - type : text / html ; charset = iso - 8859 - 1\r\ncontent - transfer - encoding : 8 bit\r\nx - priority : 3 ( normal )\r\nimportance : normal\r\nx - virus - scanned : by amavisd - new at internetcds . com\r\nsoft viagra at $ 1 . 62 per dose\r\nready to boost your sex life ? positive ?\r\ntime to do it right now !\r\norder soft viagra at incredibly low prices\r\nstarting at $ 1 . 99 per dose ! unbelivabie !
    2
Subject: the future of continuing education\r\nselect your state then press " go " to view ce courses available\r\n( aol users\r\nclick here )\r\nal\r\nak\r\naz\r\nar\r\nca\r\nco\r\nct\r\nde\r\ndc\r\nfl\r\nga\r\nhi\r\nid\r\nil\r\nin\r\nia\r\nks\r\nky\r\nla\r\nme\r\nmd\r\nma\r\nmi\r\nmn\r\nms\r\nmo\r\nmt\r\nne\r\nnv\r\nnh\r\nnj\r\nnm\r\nny\r\nnc\r\nnd\r\noh\r\nok\r\nor\r\npa\r\nri\r\nsc\r\nsd\r\ntn\r\ntx\r\nut\r\nvt\r\nva\r\nwa\r\nwv\r\nwi\r\nwy\r\nwe don ' t want anyone to receive our mailings who does not\r\nwish to receive them . this is a professional communication\r\nsent to insurance professionals . to be removed from this mailing\r\nlist , do not reply to this message . instead , go here :\r\nhttp : / / www . insurancemail . net\r\nlegal notice\r\n                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                      2
Subject: winning notification ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !\r\nluckyday lottery international program .\r\ninternational promotions / prize award dep .\r\nlaan van hoornwijck 2289 dg rijswijk ,\r\nden haag - the netherlands .\r\ne - mail : infoluckydayinfo @ netscape . net\r\nwebsite : luckyday . nl\r\nref . kfm / 9083428767 / 02 / tca\r\nbacth no : 10 / 25 / 0742\r\nattn : winning notification ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !\r\nwe happily announce to you the draw of the netherlands lottery international programs held on the 10 th of may 2005 in netherlands . your e - mail address attached to ticket number : 564 75644545 188 with serial number 3398 / 09 drew the lucky numbers : 31 - 6 - 97 - 13 - 40 - 6 , which subsequently won you the lottery in the a category .\r\nyou have therefore been approved to claim a total sum of 1 , 000 , 000 . 00 ( one million euros in cash credited to file kpc / 9080118308 / 02 / tca . this is from a total cash prize of 10 , 000 , 000 . 00 million euros shared amongst the first ten ( 10 ) lucky winners in this category .\r\nall participants were selected randomly from ' world wide web ' site through computer draw system and extracted from over 100 , 000 companies . this promotion takes place annually . for security reasons , you are advised to keep your winning information confidential till your claims is processed and your money remitted to you in whatever manner you deem fit to claim your prize . this is a part of our precautionary measure to avoid double claiming and unwarranted abuse of this program by some unscrupulous elements .\r\nnote : all winnings must be notarized and a certificate of award must be obtained from the netherlands gaming control board to complete the claims process , this certificate can only be obtained through legal representation so winners will be referred to our regional director to assist in this process . 
winners are to cover the legal charges for the notarization of the claims form and the acquisition of the certificate of award not luckyday .\r\nto file for your claim , please contact our regional office with the details below for processing and release of your winning .\r\nname : mr . kelvin morrison .\r\ntel : + 31 624 408267\r\nfax : + 31 847 306501\r\ne - mail : infoluckydayinfo @ netscape . net\r\nluckyday redemption centre , amsterdam . the netherlands .\r\nnote that all claims process and clearance procedures must be duly completed not later than 30 th of may 2005 to avoid impersonation arising to the issue of double claim .\r\nto avoid unnecessary delays and complications , please quote your reference / batch numbers in any correspondences with us or our designated agent .\r\ncongratulations once more from all members and staffs of this program .\r\nyours faithfully ,\r\nmrs . harrieth smith\r\nzonal lottery coordinator .\r\ncheck - out go . com\r\ngo get your free go e - mail account with expanded storage of 6 mb !\r\nhttp : / / mail . go . 
com                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                2
Subject: r o l e x watches starting under . .\r\nwho can resist a 24 kt . white gold r o l e x watch surrounded in stainless steal ? the high profile jewelry you ' re looking for at the lowest prices in the nation !\r\nclick here and choose from our collection of r o l e x watches .\r\n                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                      2
Subject: software para igrejas - promoção por tempo limitado\r\na visual inform?tica ltda esta completando 7 anos de vida e quem vai ganhar presentes?o as igrejas . . . temos mais de 2000 clientes em todo o brasil e exterior , agora chegou a sua vez ! promo??o espetacular - por tempo limitadoo melhor software eclesi?stico pelo menor pre?o : de r $ 250 , 00 + sedex ( a vista ) por apenas r $ 199 , 00 + sedex ( a vista ) na compra ? vista , enviar comprovante do dep?sito para tele - fax ( 033 ) 3278 - 4610 ou por e - mail : jardim @ visualinformatica . com . br valor a vista : r $ 199 , 00 + sedexvalor a prazo : 3 x de r $ 70 , 00 entrada de r $ 70 , 00 + sedex e mais dois cheques para 30 e 60 dias no valor de r $ 70 , 00 . os cheques devem estar cruzados , pr? - datados e nominais a visual inform?tica ltda . o programa ? enviado pelo sedex . na compra a vista o valor do sedex deve ser depositado juntamente com o valor do programa . na compra a prazo , o valor do sedex deve ser inclu?do no primeiro cheque . tabela de sedex para todo o brasilna compra ? vista o cliente deve depositar o valor do programa mais o pre?o do sedex , conforme a tabela abaixo . na compra a prazo o valordo sedex deve ser inclu?do no cheque da entrada . mgr $ 13 , 00 df - es - rj - spr $ 22 , 00 go - ms - prr $ 25 , 00 ba - sc - se - to r $ 29 , 00 al - mt - rsr $ 30 , 00 ce - ma - pb - pe - pi - rnr $ 32 , 00 pa r $ 35 , 00 ac - ap - am - ror $ 39 , 00 rrr $ 40 , 00 licen?a para 3 computadores * a concorr?ncia por at? r $ 200 , 00 , s? permite instalar em 1 computador ! ser?o reproduzidos apenas 1000 cd ? s para esta promo??o ! este software auxilia na administra??o da secretaria , tesouraria e demais minist?rios e departamentos de uma igreja . visual igrejas 2001 - vers?o 6 . 
0 - revis?o 2004 para windows 95 , 98 , me , 2000 , xp , 2003 temos uma vers?o personalizada para as igrejas em c?lulas modelo dos 12 - portugu?s ou espanhol nesta vers?o o sistema controla todos os passos da vis?o , ( ganhar , consolidar , discipular e enviar ) . controle de novos convertidos , consolida??o , pre - encontros , encontros , p?s - encontros , etc . controle da escola de l?deres ( alunos , notas , frequ?ncias , etc . . ) controle das c?lulas . controles existentes : membros . m?sicas do de materiais crian?asa??o dominical serm?esc?lulas ( consolida??o , pre - encontro , encontro , p?s - encontro , reencontro , escola de l?deres ) e ( semin?rios , participantes , local , hospedagem ) caixa , bancos , a pagar , a receber , or?amento . direta of?cios ( apresenta??es crian?as , batismos , casamentos , f?nebres ) diversos relat?rios cantinacart?o de membrobiblioteca e v?deo por mission?rias e mission?rios . backup autom?tico dos do programapara windows 95 ou superior . de seguran?a definido pelo pr?prio administrador do banco de dados , onde ? poss?vel determinar quem s?o os usu?rios , as senhas e tamb?m o que cada usu?rio tem o direito de usar . help onlineroda em microcomputador petium ou superior , com 64 mb de ram m?nima , 128 mb ? recomend?vel . o hd precisa ter no m?nimo 55 mb de espa?o livre para instala??o do software . suporte t?cnico - gratuito por 1 anotelefax : ( 33 ) 3278 - 4610 hor?rio de 08 : 00 as 12 : 00 e 14 : 00 as 18 : 00 horasde segunda ? sexta - feirae - mail : jardim @ visualinformatica . com . br - messenger : visual _ inf @ hotmail . compromo??o v?lida por tempo limitadomaiores : ( 0 xx 33 ) 3278 - 4610 / 9953 - 2794 9107 - 1838 / 9963 - 0805 e - mail : jardim @ visualinformatica . com . brou messenger : visual _ inf @ hotmail . comcontato : andressa / raquel ou jardimdemonstra??o no site : www . visualinformatica . com . bresta mensagem ? enviada com a complac?ncia da nova legisla??o sobre correio eletr?nico . 
se??o 301 par?grafo ( a ) ( 2 ) decreto 5 1618 , t?tulo terceiro aprovado pelo 105 o . este e - mail n?o poder? ser considerado spam quando inclua uma forma de ser removido , por favor , envie um e - mail com o subject " remover " . caso tenha interesse , por favor , guarde - o . este e - mail n?o ser? spam se incluido na lista de remo??o . envie um e - mail para jardim @ visualinformatica . com . br , colocando no assunto remover , nome e email .     2
Name: contents, Length: 2564, dtype: int64

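In the test data, 2,564 distinct texts appear more than once.
One text appears 51 times; the others appear 2 to 8 times each.

Both the train and test data contain a fair number of duplicates. This differs from what is described in the paper covered in cha_kabu's topic "About the original dataset (translation of the source paper)", so it needs to be verified.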
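As a rough sketch of how duplicate counts like these can be obtained, the snippet below counts texts that occur more than once. The toy DataFrame and the helper name count_duplicated_texts are made up for illustration; only the column name contents comes from the competition data.

```python
import pandas as pd

def count_duplicated_texts(df, column="contents"):
    """Return how often each text occurs, keeping only texts seen 2+ times."""
    counts = df[column].value_counts()
    return counts[counts >= 2]

# Toy data standing in for the real train/test files (assumption).
toy_df = pd.DataFrame({"contents": ["Subject: a\r\nx", "Subject: b\r\ny",
                                    "Subject: a\r\nx", "Subject: a\r\nx"]})
dup_counts = count_duplicated_texts(toy_df)
print(len(dup_counts))   # number of distinct texts that are duplicated
print(dup_counts.max())  # largest number of copies of a single text
```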

Checking for leakage

test_df[test_df["contents"].isin(train_df["contents"])]
id contents
0 1 Subject: re : weather and energy price data\r\...
43 44 Subject: ena analysts and associates\r\ni have...
239 240 Subject: board presentation - revised\r\nlouis...
523 524 Subject: overview of investor conference call\...
535 536 Subject: organization announcement\r\nenron pu...
... ... ...
24592 24593 Subject: gmm 21 sep 01\r\nplease find attached...
24626 24627 Subject: get debts off your back - time : 5 : ...
24683 24684 Subject: internet connectivity that beats the ...
24743 24744 Subject: enron mentions\r\nenron taps $ 3 bill...
24781 24782 Subject: california update 5 / 22 / 01\r\nplea...

354 rows × 2 columns

354 test rows contain a text that also appears in the train data.
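To see which train labels the overlapping test rows would correspond to, something like the following could be used. This is only a sketch on toy data: toy_train, toy_test and the column name train_label are hypothetical, and texts whose train copies carry conflicting labels are deliberately skipped. Whether using such labels is appropriate for the competition is a separate question that this sketch does not address.

```python
import pandas as pd

# Toy frames standing in for train_df / test_df (assumption).
toy_train = pd.DataFrame({"contents": ["Subject: a", "Subject: b", "Subject: b"],
                          "y": [0, 1, 1]})
toy_test = pd.DataFrame({"id": [1, 2, 3],
                         "contents": ["Subject: b", "Subject: c", "Subject: a"]})

# One entry per distinct train text; keep only texts with a single, unambiguous label.
per_text = toy_train.groupby("contents")["y"].agg(["nunique", "first"])
label_per_text = per_text.loc[per_text["nunique"] == 1, "first"]

# Look up each test text; rows without a matching train text get NaN.
overlap = toy_test.assign(train_label=toy_test["contents"].map(label_per_text))
print(overlap)
```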

Other checks

all_df[all_df['contents'].str[0:9] != "Subject: "]
id contents y

Every contents value starts with the nine characters "Subject: " (including the trailing space).

all_df[~all_df['contents'].str.contains("\r\n")]
id contents y

Every contents value contains \r\n (a line break) at least once.
This suggests that every contents value includes a line break separating the subject from the body.
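Given these two observations, the subject and the body could be separated at the first line break. The helper below is a minimal sketch under that assumption; the function name split_subject_body and the sample string are made up for illustration.

```python
def split_subject_body(contents):
    """Split a raw text into (subject, body) at the first CRLF."""
    head, _, body = contents.partition("\r\n")
    prefix = "Subject: "
    # Drop the leading "Subject: " marker if it is present.
    subject = head[len(prefix):] if head.startswith(prefix) else head
    return subject, body

subject, body = split_subject_body("Subject: re : weather data\r\nhello ,\r\nsee attached")
print(repr(subject))
print(repr(body))
```

On the real data this could be applied with train_df["contents"].apply(split_subject_body), for example to compare subject-only and body-only features.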

Checking distributions

We check the distribution of each label.

top_labels = [not_spam, spam]

colors = ['#1f77b4', '#ff7f0e']

# test counts (not spam, spam) come from the official announcement
x_data = np.array([[7838, 17000],
                   [train_df['y'].value_counts()[not_spam], train_df['y'].value_counts()[spam]]])
x_data = np.round(x_data / x_data.sum(axis=1, keepdims=True), 3)*100

y_data = ['test', 'train']

fig = go.Figure()

for i in range(0, len(x_data[0])):
    for xd, yd in zip(x_data, y_data):
        fig.add_trace(go.Bar(
            x=[xd[i]], y=[yd],
            orientation='h',
            marker=dict(
                color=colors[i],
                line=dict(color='rgb(248, 248, 249)', width=1)
            )
        ))

fig.update_layout(
    xaxis=dict(
        showgrid=False,
        showline=False,
        showticklabels=False,
        zeroline=False,
        domain=[0.15, 1]
    ),
    yaxis=dict(
        showgrid=False,
        showline=False,
        showticklabels=False,
        zeroline=False,
    ),
    barmode='stack',
    paper_bgcolor='rgb(248, 248, 255)',
    plot_bgcolor='rgb(248, 248, 255)',
    margin=dict(l=10, r=10, t=80, b=80),
    showlegend=False,
)

annotations = []

for yd, xd in zip(y_data, x_data):
    # labeling the y-axis
    annotations.append(dict(xref='paper', yref='y',
                            x=0.14, y=yd,
                            xanchor='right',
                            text=str(yd),
                            font=dict(family='Arial', size=14),
                            showarrow=False, align='right'))
    # labeling the first percentage of each bar (x_axis)
    annotations.append(dict(xref='x', yref='y',
                            x=xd[0] / 2, y=yd,
                            text=str(xd[0]) + '%',
                            font=dict(family='Arial', size=14,
                                      color='rgb(248, 248, 255)'),
                            showarrow=False))
    # labeling the first class name (on the top)
    if yd == y_data[-1]:
        annotations.append(dict(xref='x', yref='paper',
                                x=xd[0] / 2, y=1.1,
                                text=top_labels[0],
                                font=dict(family='Arial', size=14),
                                showarrow=False))
    space = xd[0]
    for i in range(1, len(xd)):
            # labeling the rest of percentages for each bar (x_axis)
            annotations.append(dict(xref='x', yref='y',
                                    x=space + (xd[i]/2), y=yd,
                                    text=str(xd[i]) + '%',
                                    font=dict(family='Arial', size=14,
                                              color='rgb(248, 248, 255)'),
                                    showarrow=False))
            # labeling the remaining class names (on the top)
            if yd == y_data[-1]:
                annotations.append(dict(xref='x', yref='paper',
                                        x=space + (xd[i]/2), y=1.1,
                                        text=top_labels[i],
                                        font=dict(family='Arial', size=14),
                                        showarrow=False))
            space += xd[i]

fig.update_layout(annotations=annotations)

fig.show()

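The train data is heavily skewed toward non-spam.
The test data, on the other hand, is skewed toward spam (according to the official announcement).
How to deal with this imbalance looks like the key to this competition.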
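One common way to react to class imbalance is to weight classes inversely to their frequency. The sketch below is not something used in this notebook; it only illustrates the usual n_samples / (n_classes * count) heuristic on made-up labels, and the function name balanced_class_weights is hypothetical. It also does not address the fact that the test data is skewed in the opposite direction.

```python
import numpy as np

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Made-up labels with a 98:2 split (assumption, for illustration only).
toy_labels = np.array([0] * 98 + [1] * 2)
weights = balanced_class_weights(toy_labels)
print(weights)
```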

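Analysis after tokenization

We analyze contents after splitting it into tokens. As the tokenizer, we use the transformers tokenizer trained for bert-base-uncased.
Line breaks such as \r\n appear to be removed automatically during tokenization (the exact behavior still needs to be checked).

Number of tokens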

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
train_df['num_word'] = train_df['contents'].apply(lambda x:len(tokenizer.encode(x, truncation=False)))
test_df['num_word'] = test_df['contents'].apply(lambda x:len(tokenizer.encode(x, truncation=False)))
train_df.head()
id contents y num_word
0 1 Subject: re : fw : willis phillips\r\ni just s... 0: not spam 96
1 2 Subject: re : factor loadings for primary curv... 0: not spam 391
2 3 Subject: re : meridian phone for kate symes\r\... 0: not spam 98
3 4 Subject: re : october wellhead\r\nvance ,\r\nd... 0: not spam 168
4 5 Subject: california 6 / 13\r\nexecutive summar... 0: not spam 659
print(f'longer than 512 (train): {len(train_df[train_df["num_word"]>512].index)} / {len(train_df.index)}')
print(f'longer than 512 (test):  {len(test_df[test_df["num_word"]>512].index)} / {len(test_df.index)}')
longer than 512 (train): 1628 / 8878
longer than 512 (test):  4380 / 24838

A considerable share of the texts exceed 512 tokens.
Models with a limit on the number of input tokens, such as BERT, will need some workaround.
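One possible workaround is to keep the beginning and the end of a long text and drop the middle. The sketch below shows this head-plus-tail truncation on a plain list of token ids. The function name and the default head_len of 128 are arbitrary choices for illustration, and special tokens such as [CLS] and [SEP] are ignored here.

```python
def head_tail_truncate(token_ids, max_len=512, head_len=128):
    """Keep the first head_len and the last (max_len - head_len) tokens."""
    if not 0 <= head_len <= max_len:
        raise ValueError("head_len must be between 0 and max_len")
    if len(token_ids) <= max_len:
        return list(token_ids)
    tail_len = max_len - head_len
    # Slice from an explicit start index so that tail_len == 0 yields an empty tail.
    tail_start = len(token_ids) - tail_len
    return list(token_ids[:head_len]) + list(token_ids[tail_start:])

ids = list(range(1000))          # stand-in for real token ids
short = head_tail_truncate(ids)
print(len(short), short[127], short[128], short[-1])
```

Other options, such as splitting a text into overlapping windows and aggregating the predictions, follow the same pattern of working on the token id list before it reaches the model.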

Distribution of token counts

hist_data = [train_df["num_word"][train_df["num_word"]<3000],\
             test_df["num_word"][test_df["num_word"]<3000]]
group_labels = ["train", "test"]
fig = ff.create_distplot(hist_data, group_labels, show_curve=True)
fig.update_layout(title_text='Distribution of Number of words')
fig.update_layout(
    autosize=False,
    width=900,
    height=700,
    paper_bgcolor="LightSteelBlue",
)
fig.show()

This is the distribution of token counts in the train and test data.
There appears to be no difference in the token-count distribution between train and test.

hist_data = [train_df.loc[(train_df["y"]==not_spam) & (train_df["num_word"]<2000), "num_word"],
             train_df.loc[(train_df["y"]==spam) & (train_df["num_word"]<2000), "num_word"]]
group_labels = ["not spam", "spam"]
fig = ff.create_distplot(hist_data, group_labels, show_curve=True)
fig.update_layout(title_text='Distribution of Number of words')
fig.update_layout(
    autosize=False,
    width=900,
    height=700,
    paper_bgcolor="LightSteelBlue"
)
fig.show()

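This is the distribution of token counts for spam and non-spam.
The distribution for spam looks irregular. However, the vertical axis is normalized, and because there are few spam samples, each one accounts for a larger share of the total. Taking that into account, there does not seem to be a real difference between the spam and non-spam distributions.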
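The effect of sample size on a normalized histogram can be illustrated with synthetic data. In the sketch below, both groups are drawn from the same distribution (an assumption made purely for the demonstration), and the histogram is normalized to a density. The height that a single sample adds to its bin is 1 / (n * bin_width), so the smaller group looks far more jagged even though the underlying distribution is identical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic token counts (assumption): same distribution, very different sample sizes.
samples = {
    "large group": rng.normal(loc=300, scale=50, size=5000),
    "small group": rng.normal(loc=300, scale=50, size=50),
}
# Shared bins that cover every value, so no sample falls outside the histogram.
all_values = np.concatenate(list(samples.values()))
bins = np.linspace(all_values.min(), all_values.max(), 31)
bin_width = bins[1] - bins[0]

height_per_sample = {}
for name, values in samples.items():
    density, _ = np.histogram(values, bins=bins, density=True)
    # With density=True, each sample raises its bin by 1 / (n * bin_width).
    height_per_sample[name] = 1.0 / (len(values) * bin_width)
    print(name, "area:", (density * bin_width).sum(),
          "height per sample:", height_per_sample[name])
```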

Word frequency

We visualize how often each word appears.

train_df['word_list'] = train_df['contents'].apply(lambda x: tokenizer.tokenize(x))
top = Counter([item for sublist in train_df['word_list'] for item in sublist])
temp = pd.DataFrame(top.most_common(20))
temp.columns = ['Common_words','count']
temp.style.background_gradient(cmap='Blues')
Common_words count
0 - 161488
1 . 145953
2 , 118498
3 the 94052
4 : 69049
5 to 67404
6 / 66664
7 and 43695
8 of 40371
9 en 34759
10 a 34056
11 in 32163
12 ##ron 32103
13 ' 26443
14 for 25955
15 ##t 21659
16 on 21508
17 @ 20221
18 is 19928
19 i 19643
fig = px.bar(temp, x="count", y="Common_words", title='Common Words in contents', orientation='h', 
             width=700, height=700, color='Common_words')
fig.show()

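Words that are common everywhere (the, and, of, and so on) dominate the list.
Since this makes it hard to see anything distinctive, we remove stop words.
(Conversely, en and ##ron, which still rank this high, must be extremely frequent. This seems consistent with what is described in the topic "About the original dataset (translation of the source paper)".)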

nltk.download('stopwords')
stopwords_list = set(stopwords.words('english') + list(string.punctuation))
stopwords_list
def remove_stopword(x):
    return [y for y in x if y not in stopwords_list]
train_df['word_list'] = train_df['word_list'].apply(lambda x:remove_stopword(x))
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
top = Counter([item for sublist in train_df['word_list'] for item in sublist])
temp = pd.DataFrame(top.most_common(20))
temp.columns = ['Common_words','count']
temp.style.background_gradient(cmap='Blues')
Common_words count
0 en 34759
1 ##ron 32103
2 ##t 21659
3 ec 18433
4 subject 15162
5 ##s 13695
6 ##u 11654
7 ho 8983
8 2001 8274
9 2000 7293
10 1 7112
11 com 6979
12 please 6818
13 would 6515
14 e 6501
15 company 5991
16 2 5747
17 energy 5246
18 10 5210
19 said 5156
fig = px.bar(temp, x="count", y="Common_words", title='Common Words in contents', orientation='h', 
             width=700, height=700, color='Common_words')
fig.show()

The 1st and 2nd entries together form enron, and the 3rd and 4th together form ect. There seems to be room to improve how such words are split.
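One simple way to handle such splits is to glue WordPiece continuation tokens (those starting with ##) back onto the preceding token before counting. The helper below is a minimal sketch; the name merge_wordpieces is made up, and it should be applied to the raw token list before stop words are removed, since removal can break the adjacency between a word and its continuation pieces.

```python
def merge_wordpieces(tokens):
    """Join '##' continuation pieces onto the token that precedes them."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
    return words

merged = merge_wordpieces(["en", "##ron", "ec", "##t", "subject"])
print(merged)  # ['enron', 'ect', 'subject']
```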

top = Counter([item for sublist in train_df['word_list'][train_df["y"]==not_spam] for item in sublist])
temp = pd.DataFrame(top.most_common(20))
temp.columns = ['Common_words','count']
fig = px.bar(temp, x="count", y="Common_words", title='Common Words in "not spam"', orientation='h', 
             width=700, height=700,color='Common_words')
fig.show()

These are the word frequencies in non-spam. Because non-spam makes up 98% of the train data, they differ little from the frequencies over the whole train data.

top = Counter([item for sublist in train_df['word_list'][train_df["y"]==spam] for item in sublist])
temp = pd.DataFrame(top.most_common(20))
temp.columns = ['Common_words','count']
fig = px.bar(temp, x="count", y="Common_words", title='Common Words in "spam"', orientation='h', 
             width=700, height=700,color='Common_words')
fig.show()

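These are the word frequencies in spam. This one is harder to interpret. Many tokens are broken down to single letters, such as ##o, ##l and ##e, which suggests that spam contains many out-of-vocabulary words or random strings. It is also worth noting that http and com rank near the top.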

Words unique to each label

We visualize the words that appear under only one of the two labels.

def words_unique(label, numwords, raw_words):
    '''
    Input:
        label - the label to inspect (spam or not_spam);
        numwords - how many unique words to return;
        raw_words - the set of all words appearing in train_df['word_list'].
    Output:
        DataFrame with the words that appear only in texts with the given label
        and how many times each occurs, in descending order of count.
    '''
    all_other = set()
    for item in train_df[train_df.y != label]['word_list']:
        for word in set(item):
            all_other.add(word)
    
    unique_words = set([x for x in raw_words if x not in all_other])
    
    counter = Counter()
    
    for item in train_df[train_df.y == label]['word_list']:
        for word in item:
            counter[word] += 1
    
    for word in list(counter):
        if word not in unique_words:
            del counter[word]
    
    unique_words_df = pd.DataFrame(counter.most_common(numwords), columns = ['words','count'])
    
    return unique_words_df
raw_text = set([word for word_list in train_df['word_list'] for word in word_list])
unique_not_spam = words_unique(not_spam, 20, raw_text)
print("The top 20 unique words in 'not spam':")
unique_not_spam.style.background_gradient(cmap='Greens')
The top 20 unique words in 'not spam':
words count
0 cc 4911
1 vince 4739
2 houston 3027
3 ##yne 2875
4 ##inski 2762
5 kam 2753
6 71 2232
7 mm 2010
8 john 1964
9 hp 1913
10 louise 1826
11 jones 1815
12 california 1736
13 mark 1465
14 schedule 1365
15 07 1303
16 meter 1286
17 chief 1163
18 ##bt 1142
19 pipeline 1123

These are the words unique to non-spam. We can see place names such as houston and california, and work-related words such as schedule and chief.

unique_spam= words_unique(spam, 20, raw_text)
print("The top 20 unique words in 'spam':")
unique_spam.style.background_gradient(cmap='Reds')
The top 20 unique words in 'spam':
words count
0 ion 13
1 php 11
2 ##sphere 10
3 penis 7
4 ##cion 6
5 portraits 5
6 orgasm 5
7 ##wear 5
8 pali 5
9 bbc 5
10 goa 4
11 ##acion 4
12 ##bber 4
13 ##iii 4
14 ##mobile 4
15 ##hism 4
16 sperm 4
17 kernel 4
18 pharmacy 4
19 ##eum 4

These are the words unique to spam. Could ##wear and ##mobile be parts of company names? Sexually explicit words also appear.

Word clouds

Finally, we look at word clouds.
This part is based on cha_kabu's topic "Baseline using MultinomialNB (reference)". It is almost identical to cha_kabu's version, but the tokenizer is different, so we take a look anyway.

not_spam_words = []
for item in train_df[train_df.y == not_spam]['word_list']:
    not_spam_words += item
not_spam_words = ' '.join(not_spam_words)
plt.figure(figsize=(20,20))
wc = WordCloud(max_words = 2000, width = 1600, height = 800, stopwords = STOPWORDS).generate(not_spam_words)
plt.imshow(wc, interpolation="bilinear")

As in the frequency tables above, enron and similar words stand out.

spam_words = []
for item in train_df[train_df.y == spam]['word_list']:
    spam_words += item
spam_words = ' '.join(spam_words)
plt.figure(figsize=(20,20))
wc = WordCloud(max_words = 2000, width = 1600, height = 800, stopwords = STOPWORDS).generate(spam_words)
plt.imshow(wc, interpolation="bilinear")

The word subject stands out, but nothing else is particularly noteworthy.

If you have any opinions or ideas, feel free to leave a comment.
(An upvote would also be much appreciated!)

Attachments

  • ProbSpace_Spam_Competition_EDA.ipynb
  • EDA.ipynb
    cha_kabu

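    Thank you for sharing your EDA. I have not yet gotten as far as working with tokens, so I will use this as a reference. I am also honored that you referred to my topic!

    Regarding the duplicated data, I think the following are possible (I have not verified this, so take it with caution).

    1. The source paper explicitly states that duplicates were removed only for spam, so non-spam may simply contain duplicates.
    2. The spam originally came from four separate sources. For sources ①, ② and ③, the paper says duplicates were removed based on the number of shared texts (so perhaps by an imperfect method?). For source ④, the context reads as if the way it was collected rules out duplicates, but the paper neither says that there are none at all nor that duplicates were removed, so the duplicates may come from source ④. The other preprocessing is said to have been applied to ① through ④ alike.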