Friday, October 30, 2009
Things to do:
Things to do for the next week:
- Make changes to the CMD files to use a single log file and a single error file, to see if this corrects the PittGrid issues we have been running into
- Clarification on the PittGrid issues: "I ran a DAG that had 2000 jobs for 3 separate programs.
When I checked this morning, I saw there were only 10 left, but they were all held.
Also, when I checked the folder, I was missing a lot of output that said it complete (about 1/2).
I checked the DAG and cmd files, everything looked good.
Checked some input, again looked good (didn't verify everything, too much to check)."
- Continue with the proposal so Dr. Hirtle can help with finding resources for the completion
Update to Clean_Entromics_1.0.sh
Updated Clean_Entromics_1.0.sh:
- Application will now zip the output folder
- Updated pep9 and romkes Clean_Entromics_1.0.sh
Also spoke with Senthil; his response to the current PittGrid issue:
"Looks like it is IO issue, since the version of software we have keep the files opened until jobs are complete.
If possible try to use the same log file for all the cmd files instead of unique log file for each job (e.g. segment-12_03_82-1065.Nicol.log), something like entromics.log and see if it fixes the problem.
I think it should fix the problem ,if not try to reduce to 1000 jobs instead of 2000.
Let me know, how it goes."
Will make these updates today and run another test tonight.
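A rough sketch of how the CMD files could be switched to a shared log, following Senthil's suggestion above. It assumes each CMD file is a Condor submit file with lines of the form "log = <file>" and "error = <file>"; the function name set_submit_key, the demo file job1.cmd, its "executable" line, and the shared error file name entromics.err are made up for illustration. Values containing "|" or "&" would need escaping before being passed to sed.

```shell
#!/bin/sh
# set_submit_key DIR KEY VALUE
# Rewrites every "KEY = ..." line in DIR/*.cmd so that it reads "KEY = VALUE".
# Hypothetical helper; assumes one "key = value" pair per line.
set_submit_key() {
    dir=$1
    key=$2
    value=$3
    for f in "$dir"/*.cmd; do
        [ -f "$f" ] || continue
        # Keep the "key = " prefix (group 1) and swap only the right-hand side.
        sed "s|^\([[:space:]]*$key[[:space:]]*=[[:space:]]*\).*|\1$value|" "$f" > "$f.tmp" \
            && mv "$f.tmp" "$f"
    done
}

# Demo on a throwaway directory with one made-up CMD file.
demo_logs=$(mktemp -d)
printf 'executable = Nicol\nlog = segment-12_03_82-1065.Nicol.log\nerror = segment-12_03_82-1065.Nicol.err\nqueue\n' \
    > "$demo_logs/job1.cmd"

set_submit_key "$demo_logs" log entromics.log
set_submit_key "$demo_logs" error entromics.err
cat "$demo_logs/job1.cmd"
```

Only the "log" change comes from Senthil's email; the "error" call mirrors the to-do item about a single error file and may not be needed.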
Thursday, October 29, 2009
First/Second Entromics Tests
Test variables:
2000 sequences
401 sequence size
All exact copies (name and sequences)
>gene1_hg19_chr7_87127948_87347564_gene_1_A_rs1002205
AddBuffer - true
Nicol n e n 10 90 (Max matrix: 500)
Nisa 1 1e36 -1e36
eH 20 500 100 10
----------------------------------------------------------------------------------------
10/27/09 - Scheduled 2000 sequences in PittGrid and locally, starting at 11:20pm.
10/28/09 morning -
PittGrid results - Checked PittGrid and saw that most of the jobs were off the queue, but when I checked the working directory I saw output only from the Nicol and Nicol Profile jobs. No Nisa or eH. Emailed Senthil about this and checked the CMD and DAG files. Both looked good, and the input file looked good as well.
Local results - AddBuffer failed after 509 files. Everything (eH and eS) was computed for those 509, but nothing after that. I checked the AddBuffer application and made modifications (changed the char sizes, allocated more space for each sequence, etc.) but was not able to fix the problem. After some research I found the cause: AddBuffer opens each file (509), and stdout, stdin, and stderr add three more (509 + 3 = 512). Apparently a given process in Windows can open at most 512 files by default (I hit the limit even though I close each file in the application). The limit can be raised with _setmaxstdio(2048);
10/28/09 night - After making changes to AddBuffer, I tested both the PittGrid and local Entromics with 2000 files. Started at 8:33pm.
10/29/09 morning - Checked results:
PittGrid results - same as yesterday morning; emailed Senthil again
Local results - All files finished!
The eH application was incredibly fast, ending at 11:14 pm.
Nisa finished at 6:50am.
The final file size = 18.9 GB
Tuesday, October 27, 2009
Update to Entromics_1.0.sh
Entromics_1.0.sh is now Entromics_1.1.sh
Updates:
- Now able to choose to run eS and/or eH (must run both Nicol and Nisa when running eS).
- Able to choose which sequence to start from (verified for more than 400 sequences so far)
- Files no longer named with date, but a fixed number (to aid Clean_Entromics_1.0.sh)
Removed clean_folder application and replaced it with Clean_Entromics_1.0.sh
- Creates a folder with the current date
- Moves all outputted files to this folder
- Changes their names to match the following schema (Name.#.Program.txt), as in the local Entromics application
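The three steps above could be sketched roughly as follows. This is not the actual Clean_Entromics_1.0.sh: the input naming (<Name>_<number>_<Program>.out), the date format of the folder, and the function name clean_outputs are all assumptions made for illustration. Only the target schema Name.#.Program.txt comes from the post.

```shell
#!/bin/sh
# clean_outputs SRC
# Creates SRC/<current date>, moves the output files into it, and renames them
# to Name.#.Program.txt. Prints the folder it created.
# Assumed (hypothetical) input naming: <Name>_<number>_<Program>.out
clean_outputs() {
    src=$1
    dest="$src/$(date +%Y-%m-%d)"
    mkdir -p "$dest" || return 1
    for f in "$src"/*_*_*.out; do
        [ -f "$f" ] || continue
        base=$(basename "$f" .out)   # e.g. gene1_12_Nicol
        program=${base##*_}          # text after the last underscore
        rest=${base%_*}              # everything before it
        number=${rest##*_}           # the fixed sequence number
        name=${rest%_*}              # what remains is the sequence name
        mv "$f" "$dest/$name.$number.$program.txt"
    done
    echo "$dest"
}

# Demo with two made-up output files in a throwaway directory.
demo_clean=$(mktemp -d)
: > "$demo_clean/gene1_12_Nicol.out"
: > "$demo_clean/gene1_12_Nisa.out"
out_dir=$(clean_outputs "$demo_clean")
ls "$out_dir"
```

Splitting from the right means a sequence name that itself contains underscores still ends up intact in the Name part.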
How to run Entromics_1.1.sh:
- Create folder for sequences (mkdir name)
- Move sequences to this folder (FileZilla)
- cd to given folder in PuTTY
- Copy Entromics_1.1.sh (cp $HOME/Applications/Entromics_1.1.sh $PWD) <-*this will work from any folder, any folder name*
- Run Entromics_1.1.sh
- (Clean_Entromics_1.1.sh is copied automatically)
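The copy-and-run part of the steps above could be wrapped up as in the sketch below. It assumes the folder already exists and the sequences have been uploaded; the function name run_entromics is made up, and starting the script with "sh" is an assumption, since the post does not say how it is invoked. The demo uses a stand-in script in a temporary directory rather than the real Entromics_1.1.sh.

```shell
#!/bin/sh
# run_entromics FOLDER [APPS_DIR]
# cd into FOLDER, copy Entromics_1.1.sh there, and start it.
# APPS_DIR defaults to $HOME/Applications, the location named in the post.
run_entromics() {
    folder=$1
    apps_dir=${2:-$HOME/Applications}
    (
        # Subshell, so the caller's working directory is left unchanged.
        cd "$folder" || exit 1
        cp "$apps_dir/Entromics_1.1.sh" "$PWD" || exit 1
        # Assumption: the script is run with sh.
        sh ./Entromics_1.1.sh
    )
}

# Demo: a stand-in script that only records that it ran.
demo_run=$(mktemp -d)
mkdir "$demo_run/Applications" "$demo_run/sequences"
printf '#!/bin/sh\necho "stand-in run in $PWD" > ran.txt\n' \
    > "$demo_run/Applications/Entromics_1.1.sh"

run_entromics "$demo_run/sequences" "$demo_run/Applications"
cat "$demo_run/sequences/ran.txt"
```

On PittGrid the call would presumably be just run_entromics name, after the mkdir and upload steps.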
Tuesday, October 6, 2009
10/6/09
- Created directions and the Entromics shell flowchart
- http://www.sis.pitt.edu/~pdudas/BloggerFiles/Entromics%20Shell%20Directions.doc
- http://www.sis.pitt.edu/~pdudas/BloggerFiles/Entromics%20Shell%20FlowChart.doc
** Note **
Entromics.sh will now be numbered to make reading the posts and updates easier.
- Entromics.sh is now.... Entromics_1.0.sh
Sunday, October 4, 2009
10/4/09
- Checked to make sure R file worked locally
- Updated file with some comments
- Edited the R file to allow it to work on Unix
- Was able to write a script to run it as a command on PittGrid
- Was able to get the Unix version to print to PDF
Tuesday, September 22, 2009
9/22/2009
Updated the Windows Entromics application:
- Corrected the Nicol -> Max Window size issue
- Output files will now be put in one folder (Name_Date) with names: Name.#.Program.txt
- Changed the date format to use the name of the month
Spoke with Dr. Pancoska about possible papers and patent information: U.S. Patent Application 01707/PIT-10-6845-PCT from September 19, 2009, "Discovery of t-homology in a set of sequences and production of lists of t-homologous sequences with predefined properties." Petr Pancoska, Robert A. Branch, Patrick M. Dudas
Possible paper: PittGrid application and architecture
Still no resolution for Entromics Site issue in processing sequences
Wednesday, September 9, 2009
Quote
Possible usable quote:
The skill of writing is to create a context in which other people can think. - Edwin Schlossberg